Re: [sqlite] Very, very slow commits - Possibly solved

2018-07-31 Thread Rob Willett

Chris,

I'll try and summarise.

1. We have a 60GB database collecting data. This database is accessed by 
a single process once every five minutes, and around 5MB of data (approx 
600-800 rows) is added each time. Data has never been deleted.


2. The database is getting too big for the server it's hosted on. We're 
struggling to back it up, or do much else with it, as it's hosted on a 
Virtual Private Server.


3. We took a long, hard look at the database a few months ago and tried 
to work out what we could do. When we designed the database we weren't 
completely sure what data we would need, so we went overboard and stored 
a lot of it; hence the database keeps growing.


4. We realised that a lot of the data is redundant: not redundant in the 
normalised-database sense, but data we don't actually need any more. We 
thought we did, but our expectations are now different. Most of the data 
is held in a single table which is currently 200M rows long.


5. We worked out that we could remove approx 99% of the data and 
everything we currently do *should* work as before. The work we have 
been discussing in this thread is our testing of this reduction, or 
de-duplication, work. The production system is currently untouched and 
works well and is performant.


6. The work to reduce the main table has been difficult because the 
table is so large AND we are using a Virtual Private Server which has IO 
limitations, as it's based on OpenVZ. The supplier doesn't want us 
consuming all the available resources.


7. We developed a couple of techniques to try to speed up the reduction 
of the main database table. Rather than deleting rows from the table, we 
copied the required rows out to a new, identical table: we only needed 
to copy approx 500,000 rows as opposed to 200,000,000. We then 
discovered that dropping a 200M-row table on a VPS is slow: circa 10 
hours. On a new, large, home-built server it's a few minutes. We only 
found this out late in the process.


8. Once we had constructed the new table and new database (600MB now 
rather than 60GB) we started testing it on a test server. This is a 
smaller version of the main production server, e.g. it has two cores 
rather than eight and 2GB of RAM rather than 8GB. Both servers use a 
RAID array of spinning rust at the back end; as customers we have no 
idea what this array actually is.


9. After various tests, we noticed that the database seemed to be 
slowing down, especially around the commit statement. It was taking 
around 7 secs to commit what should be a tiny amount of data (5MB). 
Most of the work each run does is outside the database, parsing and 
processing an XML file. The database actions are normally simple 
inserts adding rows to the main table, with very occasional updates of 
other tables.
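
To be concrete, each run's database work is essentially just the 
following (a sketch: the real INSERT has the full column list, and the 
values here are made up):

BEGIN;
-- approx 600-800 rows like this per five-minute run
INSERT INTO Disruptions ("version", "Disruption_id", "location")
VALUES (1, 12345, 42);
COMMIT;   -- this is the step that has grown to around 7 seconds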


10. We then built a large server in our office under ESXi to replicate 
the production server and to move the work closer to us, so we could 
try to see what the problem is. This local server is faster than our 
production server BUT it doesn't have the network connections, 
redundancy and other features we need for production. We tried to 
replicate the steps we did last week to see if we could reproduce the 
problem. We used the technique above: copy to a new table, drop the 
200M-row table, and alter the name of the new table back. We have other 
techniques which involve working with the 200M-row table in situ, but 
the copy technique seemed to be faster on our VPS server. On our 
home-built server, we think that working with the table as-is would be 
faster.


11. We worked through our steps one by one to reproduce our smaller 
database. We vacuumed and analysed the database and then copied it back 
to a test server on our VPS estate.


12. We benchmarked the database on the test VPS server and got around 
9-10 secs per run. As this is a test server it's significantly slower 
than our prod server, but it's a baseline we can work with. We sent 
through 25 iterations of data to get the baseline.


13. We then started 'playing' about with indexes: creating them with 
different collations, and creating tables with collations, including 
collations on integer columns, which we thought should be cost neutral. 
As we copied data from table to table to see what happened, we noticed 
that the speed changed significantly, from 10 secs to around 16-18 
secs. As far as we could see this was due simply to moving the data 
around. We always created the 'right' schema to copy into and didn't 
allow SQLite to work out the types. We ran ANALYZE and VACUUM after 
moving tables, and created and recreated indexes as needed.


14. We think that the constant moving of data around between tables is 
fragmenting tables and indexes on disk, so when we add new rows to the 
vacuumed table we are adding them all over the place, and commits are 
taking longer and longer. There was also a discussion that SSDs may 
mean that we are constantly getting 

Re: [sqlite] Very, very slow commits - Possibly solved

2018-07-31 Thread Chris Locke
I've been following this thread with interest, but this just doesn't make
sense...

> Logically speaking SQLite shouldn't notice the difference in row
> order, but things do slow down, even with analyse.

Are you accessing each row via its ID?  Even so, that should still be
indexed.
I thought you were simply adding records into the database - I'm failing to
grasp how this is slowing down in the new database.
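
One quick way to check is EXPLAIN QUERY PLAN against the new database, 
e.g. (the query and values here are just an example):

EXPLAIN QUERY PLAN
SELECT * FROM Disruptions
WHERE "version" = 1 AND "Disruption_id" = 12345;
-- a "SEARCH ... USING INDEX" line means an index is being used;
-- a "SCAN TABLE" line means a full table scan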


Thanks,
Chris



On Tue, Jul 31, 2018 at 3:30 PM Rob Willett wrote:

> Dear all,
>
> We think we have now found the issue with the slow commits.
>
> We believe this is due to an inherent (and old) defect in our database
> design. We think our original design has an implicit ordering of rows
> in a table; while the table is only ever growing, this flaw isn't
> apparent.
>
> However, when we started deduping the table AND copying rows from one
> table to another to move things around, we changed the underlying
> order of rows. SQLite handles the design change, BUT the flaw in our
> design becomes apparent as we keep moving the data around and the data
> gets mixed up. The database slows down when we create a second table
> with an identical structure to the first, copy the data into the new
> table, drop the old one and rename the new table to the old name.
> Logically speaking SQLite shouldn't notice the difference in row
> order, but things do slow down, even with analyse.
>
> We think that a better index definition could solve the problem for
> us; a better database design certainly would, but that's a tricky
> problem.
>
> We're now going back to our 60GB database and starting from scratch to
> see if we can recreate the issue (now that we think we know what it
> is).
>
> Thanks to everybody who contributed ideas, we appreciate the help.
>
> Rob
>
> On 31 Jul 2018, at 15:19, Rob Willett wrote:
>
> > Simon,
> >
> > As an exercise we have just added COLLATE NOCASE to our integer
> > columns.
> >
> > Whoops! We thought this would make no difference, but it added an
> > extra 70% to our processing time.
> >
> > We've now got to the stage where we can make changes quickly, so
> > we'll back that change out and go back to the integer definition
> > without COLLATE NOCASE.
> >
> > Rob
> >
> > On 31 Jul 2018, at 14:59, Rob Willett wrote:
> >
> >> Simon,
> >>
> >> Apologies for taking so long to get back; we've been building a test
> >> system and it's taken a long time.
> >>
> >> We're just getting round to trying your ideas out to see what
> >> difference they make.
> >>
> >> We've created a new table based on your ideas, moved the collate
> >> clauses into the table definition, and analysed the database. We did
> >> **not** add COLLATE NOCASE to the columns which are defined as
> >> integers. Would that make a difference?
> >>
> >> We've found it now takes around 10% longer to do the queries than
> >> before.
> >>
> >> Rob
> >>
> >>
> >>> Please try moving your COLLATE clauses into the table definition.
> >>> e.g. instead of
> >>>
> >>> CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version"
> >>> COLLATE NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location"
> >>> COLLATE NOCASE ASC);
> >>>
> >>> Your table definition should have
> >>>
> >>>  "version" integer NOT NULL COLLATE NOCASE,
> >>>  "Disruption_id" INTEGER NOT NULL COLLATE NOCASE,
> >>> ...
> >>>  "location" integer NOT NULL COLLATE NOCASE,
> >>>
> >>> and the index should be
> >>>
> >>> CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions
> >>> ("version" ASC, "Disruption_id" ASC, "location" ASC);
> >>>
> >>> Once data has been entered, do ANALYZE.  This step may take a long
> >>> time.
> >>>
> >>> Simon.
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

