Re: [sqlite] Very, very slow commits - Possibly solved
Chris, I'll try and summarise.

1. We have a 60GB database collecting data. This database is accessed by a single process once every five minutes, and around 5MB of data (approx 600-800 rows) is added. Data has never been deleted.

2. The database is getting too big for the server it's hosted on. We're struggling to back it up, or do much else with it, as it's hosted on a Virtual Private Server.

3. We took a long hard look at the database a few months ago and tried to work out what we could do. When we designed the database we weren't completely sure what data we would need, so we went overboard and stored a lot of data; hence the database is growing.

4. We realised that a lot of the data is redundant, not in the normalised-database sense, but data we don't actually need now. We thought we did, but our expectations are now different. Most of the data is held in a single table which is currently 200M rows long.

5. We worked out we could remove approx 99% of the data and everything we currently do *should* work as before. The work we have been discussing in this thread is our testing of this reduction or de-duplication work. The production system is currently untouched, works well and is performant.

6. The work to reduce the main table has been difficult because the table is so large AND we are using a Virtual Private Server, which has IO limitations as it's based on OpenVZ. The supplier doesn't want us consuming all the available resources.

7. We developed a couple of techniques for speeding up the reduction of the main database table. Rather than removing rows from the table, we copied the required rows out to a new, identical table: we only needed to copy approx 500,000 rows as opposed to 200,000,000. We then discovered that dropping a 200M-row table on a VPS is slow, circa 10 hours. On a newly home-built, large server it's a few minutes. We only found this out late in the process.

8.
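The copy-out, drop and rename technique described in point 7 can be sketched as follows. This is a minimal, self-contained illustration using Python's sqlite3 module: the schema, the sample rows and the de-duplication rule (keep only the latest version of each disruption) are all invented for the sketch, since the real schema and rule aren't shown in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Stand-in for the real 200M-row table (schema is illustrative only).
cur.execute("""
    CREATE TABLE Disruptions (
        id            INTEGER PRIMARY KEY,
        version       INTEGER NOT NULL,
        Disruption_id INTEGER NOT NULL,
        location      INTEGER NOT NULL
    )
""")
rows = [(1, 1, 100, 7), (2, 2, 100, 7), (3, 1, 101, 9), (4, 2, 101, 9)]
cur.executemany("INSERT INTO Disruptions VALUES (?, ?, ?, ?)", rows)

# 1. Copy only the rows still needed into an identical new table.
#    The WHERE clause is a placeholder for the real de-duplication rule;
#    here we keep just the newest row for each Disruption_id.
cur.execute("""
    CREATE TABLE Disruptions_new (
        id            INTEGER PRIMARY KEY,
        version       INTEGER NOT NULL,
        Disruption_id INTEGER NOT NULL,
        location      INTEGER NOT NULL
    )
""")
cur.execute("""
    INSERT INTO Disruptions_new
    SELECT * FROM Disruptions
    WHERE id IN (SELECT MAX(id) FROM Disruptions GROUP BY Disruption_id)
""")

# 2. Drop the old table (the ~10 hour step on the VPS) and rename
#    the copy back into place.
cur.execute("DROP TABLE Disruptions")
cur.execute("ALTER TABLE Disruptions_new RENAME TO Disruptions")
conn.commit()

# 3. Reclaim free pages and refresh the query planner's statistics.
cur.execute("VACUUM")
cur.execute("ANALYZE")

print(cur.execute("SELECT COUNT(*) FROM Disruptions").fetchone()[0])  # 2
```

Note that VACUUM rewrites the whole database file, which is why both it and the DROP are painful on an IO-throttled host.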
Once we constructed the new table and new database (600MB now rather than 60GB), we started testing it on a test server. This is a smaller version of the main production server, e.g. it has two cores rather than eight and 2GB of RAM rather than 8GB. Both servers use a RAID array of spinning rust at the back end; as customers, we have no idea what this array is.

9. After various tests, we noticed that the database seemed to be slowing down, especially around the commit statement. It was taking around 7 secs to commit what should be a tiny amount of data (5MB). Most of the work each run does is off the database: parsing and processing an XML file. The database actions we do are normally simple inserts adding rows to the main table, with very occasional updates of other tables.

10. We then built a large server in our office under ESXi to replicate the production server and to move the work closer to us, so we could try to see what the problem is. This local server is faster than our production server BUT it doesn't have the network connections, redundancy and other features we need for production. We tried to replicate last week's steps to see if we could reproduce the problem. We used the technique of copying to a new table, dropping the 200M-row table and changing the new table's name back. We have other techniques which involve working with the 200M-row table in situ, but copying seemed to be faster on our VPS. On our home-built server, we think that working with the table as-is would be faster.

11. We worked through our steps one by one to reproduce our smaller database. We vacuumed and analysed the database and then copied it back to a test server on our VPS estate.

12. We benchmarked the database on the test VPS server and got around 9-10 secs per run. As this is a test server it's significantly slower than our prod server, but it's a baseline we can work with.
We sent through 25 iterations of data to get the baseline.

13. We then started 'playing' about with indexes: creating them with different collations, and creating tables with collations, including collations on integer columns, which we think should be cost-neutral. As we copied data from table to table to see what happened, we noticed that the speed changed significantly, from 10 secs to around 16-18 secs. As far as we could see, this was due to simply moving the data around. We always created the 'right' schema to copy into and didn't allow SQLite to work out the types. We ran ANALYZE and VACUUM on the data after moving tables. We also created and recreated indexes as needed.

14. We think that the constant moving of data around between tables is fragmenting tables and indexes on disk, so when we add new rows to the vacuumed table we are adding them all over the place, and commits are taking longer and longer. There was also a discussion that SSD's may mean that we are constantly getting
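The benchmarking loop described in points 12 and 13 can be sketched as a small harness. This is an illustrative sketch, not the real test rig: the schema, batch shape (~700 rows padded to roughly 5MB) and 25-iteration count follow the figures in the thread, while everything else is invented.

```python
import os
import sqlite3
import tempfile
import time

# Use a real file on disk so the COMMIT actually pays fsync costs,
# which is where the slowdown was observed.
db_path = os.path.join(tempfile.mkdtemp(), "bench.db")
conn = sqlite3.connect(db_path)
conn.execute("""
    CREATE TABLE Disruptions (
        id            INTEGER PRIMARY KEY,
        version       INTEGER NOT NULL,
        Disruption_id INTEGER NOT NULL,
        payload       TEXT
    )
""")

def one_run(run_no, rows_per_run=700):
    """Simulate one five-minute ingest: ~700 rows, ~5MB, one transaction."""
    batch = [(run_no * rows_per_run + i, 1, i, "x" * 7000)  # ~7KB per row
             for i in range(rows_per_run)]
    conn.executemany(
        "INSERT INTO Disruptions (id, version, Disruption_id, payload) "
        "VALUES (?, ?, ?, ?)", batch)
    start = time.perf_counter()
    conn.commit()          # the step that crept from ~10s towards 16-18s
    return time.perf_counter() - start

# 25 iterations, as in the baseline test.
timings = [one_run(n) for n in range(25)]
print(f"mean commit: {sum(timings) / len(timings):.4f}s")
```

Watching whether the per-commit time stays flat or grows across the 25 runs is the signal the thread is chasing; on a healthy, unfragmented file it should stay roughly constant.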
Re: [sqlite] Very, very slow commits - Possibly solved
I've been following this thread with interest, but this just doesn't make sense...

> Logically speaking SQLite shouldn't notice the difference in row order,
> but things do slow down, even with analyse.

Are you accessing each row via its ID? Even so, that should still be indexed. I thought you were simply adding records into the database - I'm failing to grasp how this is slowing down in the new database.

Thanks,
Chris

On Tue, Jul 31, 2018 at 3:30 PM Rob Willett wrote:

> Dear all,
>
> We think we have now found the issue with the slow commits.
>
> We believe this is due to an inherent (and old) defect in our database
> design. We think our original design has an implicit ordering of rows
> in a table; while the table is only ever growing, this flaw in the
> design isn't apparent.
>
> However, when we started deduping the table AND we copied rows from one
> table to another to move things around, we changed the underlying order
> of rows. SQLite handles the design change, BUT the flaw in our design
> becomes apparent as we keep moving the data around and data gets mixed
> up. The database slows down when we create a second table with an
> identical structure to the first table, copy the data into the new
> table, drop the old table and rename the new one to take its place.
> Logically speaking SQLite shouldn't notice the difference in row order,
> but things do slow down, even with analyse.
>
> We think that a better index definition could solve the problem for us;
> a better database design certainly would, but that's a tricky problem.
>
> We're now going back to our 60GB database and starting from scratch to
> see if we can recreate the issue (now we think we know what it is).
>
> Thanks to everybody who contributed ideas, we appreciate the help.
>
> Rob
>
> On 31 Jul 2018, at 15:19, Rob Willett wrote:
>
> > Simon,
> >
> > As an exercise we have just added COLLATE NOCASE to our integer
> > columns.
> >
> > Whoops!
> > We thought this would make no difference, but it's added an extra
> > 70% to our processing times.
> >
> > We've now got to the stage where we can make changes quickly, so
> > we'll back that change out and go back to the integer definition
> > without COLLATE NOCASE.
> >
> > Rob
> >
> > On 31 Jul 2018, at 14:59, Rob Willett wrote:
> >
> >> Simon,
> >>
> >> Apologies for taking so long to get back; we've been building a
> >> test system and it's taken a long time.
> >>
> >> We're just getting round to trying your ideas out to see what
> >> difference they make.
> >>
> >> We've created a new table based on your ideas, moved the collate
> >> into the table, and analysed the database. We did **not** add
> >> COLLATE NOCASE to the columns which are defined as integers. Would
> >> that make a difference?
> >>
> >> We've found it now takes around 10% longer to do the queries than
> >> before.
> >>
> >> Rob
> >>
> >>> Please try moving your COLLATE clauses into the table definition.
> >>> e.g. instead of
> >>>
> >>> CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version"
> >>> COLLATE NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location"
> >>> COLLATE NOCASE ASC);
> >>>
> >>> Your table definition should have
> >>>
> >>> "version" integer NOT NULL COLLATE NOCASE,
> >>> "Disruption_id" INTEGER NOT NULL COLLATE NOCASE,
> >>> ...
> >>> "location" integer NOT NULL COLLATE NOCASE,
> >>>
> >>> and the index should be
> >>>
> >>> CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions
> >>> ("version" ASC, "Disruption_id" ASC, "location" ASC);
> >>>
> >>> Once data has been entered, do ANALYZE. This step may take a long
> >>> time.
> >>>
> >>> Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
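Simon's quoted suggestion, declaring the collation once on the columns so the index inherits it, can be turned into a small runnable sketch via Python's sqlite3 module. The column and index names come from the quoted definitions; the sample data and query are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Collation declared once, on the columns, rather than repeated on the
# index. Note: COLLATE NOCASE only affects text comparisons; on columns
# with INTEGER affinity the values compare numerically, so declaring it
# there *should* be cost-neutral, which is why the 70% slowdown reported
# in the thread was surprising.
conn.execute("""
    CREATE TABLE Disruptions (
        "version"       INTEGER NOT NULL COLLATE NOCASE,
        "Disruption_id" INTEGER NOT NULL COLLATE NOCASE,
        "location"      INTEGER NOT NULL COLLATE NOCASE
    )
""")

# The index now inherits the columns' collations and stays short.
conn.execute("""
    CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions
        ("version" ASC, "Disruption_id" ASC, "location" ASC)
""")

rows = [(v, d, l) for v in (1, 2) for d in (100, 101) for l in (7, 8, 9)]
conn.executemany("INSERT INTO Disruptions VALUES (?, ?, ?)", rows)

# After loading data, refresh the planner's statistics.
conn.execute("ANALYZE")

# Confirm the index is used for an equality lookup on all three columns.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Disruptions "
    "WHERE version = 1 AND Disruption_id = 100 AND location = 7").fetchall()
print(plan[0][3])  # the plan detail names Disruptions_Idx3
```

Whether the index-level and column-level forms behave identically at runtime is exactly what the thread is probing; the sketch only shows that the column-level form produces a usable index.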