Hi Rob,

Answers are in the text below.

On Mon, Jul 30, 2018, at 13:45, Rob Willett wrote:
> Droedel,
>
> We don't think there is significant read access. The database is a
> single database on a single thread in a single process. The only access
> to it is a Perl script that logs the incoming information. We never have
> two accesses at the same time.
Can you also _measure_ read access, preferably at system level on your
production database, e.g. by using iostat? I've seen cases where (other,
non-SQLite) databases had unexpected disk access patterns due to an
application error.

> We have a nagging feeling (and that's all it is) about the autoincrement
> value. We do use that feature in the table, but we have deleted so many
[snip]

If both databases use autoincrement, then performance should be similar
(or at least that's what I expect). Can you easily check if the newly
generated IDs are as expected and larger than any existing ID in your
table?

> We did wonder if we are filling up pages in the middle or something.
> However we expected the vacuum and analyse to sort this out. Now it's
[snip]

sqlite3_analyzer can give some measurements, e.g. unused bytes on index
pages.

> We've built the replica test system now and we're going to have some
> initial checks and get some benchmarks in place.
>
> It could be an interesting and exciting ride :)

Sure. It's always fun learning something new. But it's less fun in full
production when customers are yelling :-(

Regards,

Droedel

>
> Rob
>
> On 30 Jul 2018, at 12:32, Droedel wrote:
>
> > Hi Rob,
> >
> > Is there significant read access (iostat: r/s) during these slow
> > writes? If yes, it might be due to a small cache, requiring the
> > database to read (index) pages before updating them.
> >
> > And is the data you're adding in both databases (large/small) added at
> > the end of the table using the autoincrement, or do you insert some
> > items in the middle? I'm not a SQLite performance expert, but in
> > other databases performance can be very different, because in the
> > former case fewer pages must be updated.
> >
> > Microsoft SQL Server has something called "fill factor", basically the
> > max percentage of an index page that is used during the initial fill,
> > which helps avoid too much page shuffling in the index when extra
> > items are added.
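The autoincrement ID check suggested above can be sketched with Python's
sqlite3 module. This is purely illustrative: an in-memory database and a
cut-down two-column version of the Disruptions table stand in for the real
one.

```python
import sqlite3

# With AUTOINCREMENT, sqlite_sequence records the largest id ever issued
# for the table, so new ids stay above every id that ever existed,
# including deleted ones.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Disruptions ("
            "id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, "
            "version INTEGER NOT NULL)")
con.executemany("INSERT INTO Disruptions (version) VALUES (?)",
                [(1,), (2,), (3,)])
con.execute("DELETE FROM Disruptions WHERE id > 1")  # delete rows 2 and 3

# Despite the deletes, the sequence still remembers that id 3 was used ...
seq = con.execute("SELECT seq FROM sqlite_sequence "
                  "WHERE name = 'Disruptions'").fetchone()[0]
# ... so the next insert gets an id larger than any that ever existed.
cur = con.execute("INSERT INTO Disruptions (version) VALUES (4)")
print(seq, cur.lastrowid)  # 3 4
```

If the new ids ever came out lower than an existing id, inserts would land
in the middle of the index rather than at the end, which is the page-update
pattern described above.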
> > Disadvantage: it makes DBAs argue endlessly about the best fill
> > factor ;-) Maybe there's something similar possible in SQLite, but I
> > couldn't find a pragma for this.
> >
> > Oh, and do both databases have the same page size, preferably 4K?
> >
> > Regards,
> >
> > Droedel
> >
> > On Mon, Jul 30, 2018, at 12:51, Rob Willett wrote:
> >> Droedel,
> >>
> >> Thanks for the comprehensive reply. We have actually done all of
> >> this.
> >>
> >> The system has been running for 2-3 years and we have taken the
> >> opportunity to try and prune the database from 60GB down to 600MB.
> >> Currently the live system is working OK with a 60GB database, but our
> >> test system (which is smaller) is struggling with 600MB.
> >>
> >> The system has a restriction of IOPS as it's a Virtual Private
> >> Server. Technically it's running Ubuntu 16.04 under OpenVZ. We can
> >> get 69MB/sec with a disk-to-disk copy, which isn't brilliant if we
> >> had sustained disk traffic, which we don't.
> >>
> >> We log approx 600-800 items of around 3-5K every five minutes. These
> >> 600-800 items are mainly inserts into a single table; there are other
> >> things happening as well, but this is the bulk of the work. We can
> >> see that the -wal files grow a small amount (4-5MB) just before the
> >> commit. It then takes 7 seconds to execute the commit. This is the
> >> bit that we're struggling with. We know we can get circa 70MB/sec
> >> data throughput, so this should take a fraction of a second. Now
> >> SQLite needs to work out which pages to commit, so that's a little
> >> longer, but we know SQLite is fast, so that shouldn't take 7 seconds
> >> on the small database as it doesn't take that long on the large 60GB
> >> database. That's the puzzling bit: the large database is quick, the
> >> small one slow.
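A minimal harness for timing just the COMMIT step, as described above,
might look like the following sketch (Python's sqlite3 rather than
Perl::DBI; a throwaway WAL database and synthetic ~4K payloads stand in
for the real table and data):

```python
import os
import sqlite3
import tempfile
import time

# isolation_level=None puts the connection in autocommit mode so we can
# issue BEGIN/COMMIT explicitly and time only the commit.
path = os.path.join(tempfile.mkdtemp(), "bench.db")
con = sqlite3.connect(path, isolation_level=None)
con.execute("PRAGMA journal_mode=WAL")
con.execute("CREATE TABLE items ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "payload TEXT NOT NULL)")

con.execute("BEGIN")
con.executemany("INSERT INTO items (payload) VALUES (?)",
                [("x" * 4096,) for _ in range(700)])  # ~700 items, ~4K each
t0 = time.perf_counter()
con.execute("COMMIT")  # the step reported as taking 7 seconds
elapsed = time.perf_counter() - t0
print(f"commit took {elapsed:.3f}s")
```

Running a loop of such batches against copies of the small and large
databases would give the per-commit averages for the benchmark runs
mentioned above.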
> >>
> >> We have no logging turned on. We can turn SQL logging on at the DBI
> >> level, but that turns a 20 sec run into a 2-3 minute run as it
> >> captures everything :) Nothing in the log files gives us any concern
> >> (apart from the really long commit time). Simon Slavin suggested
> >> dropping the indexes, which we did, and that turned the commit into a
> >> fast commit, so it's something to do with the indexes but we can't
> >> see what.
> >>
> >> What we are now doing is going back to the very beginning:
> >>
> >> 1. We built a replacement system yesterday with 8GB memory and 8
> >> cores and 150GB disk space. It's virtualised (ESXi) but under our
> >> control.
> >> 2. We've installed a copy of the old 60GB database on the new system.
> >> 3. We're going to benchmark the new system over a couple of thousand
> >> runs to see what the average time is.
> >> 4. We'll then work our way through the deduping of the database step
> >> by step to see when the commit time blows up. This will take a few
> >> days, as working out the duplications of 200,000,000 rows isn't that
> >> quick :) As we found out, dropping a very large table is really,
> >> really, really slow.
> >> 5. We'll apply some of the ideas that people have suggested since
> >> yesterday to see if they work, but I'm keen that we have a repeatable
> >> problem that we solve rather than use a scatter-gun approach to
> >> fixing it. We think SQLite is well written, so we figure the problem
> >> is ours to solve rather than simply blaming the software.
> >>
> >> Thanks
> >>
> >> Rob
> >>
> >> On 30 Jul 2018, at 11:11, Droedel wrote:
> >>
> >>> Hi,
> >>>
> >>> When having bad performance, I usually first try to find out if the
> >>> slowness is due to disk throughput (sequential), slow random access
> >>> or something else. In Linux, try "iostat -xtc 5". Do this with and
> >>> without your application writing to disk.
> >>>
> >>> If you see high CPU %iowait and high %util on your disk, then disk
> >>> is the bottleneck. If not: start profiling / analyzing other
> >>> bottlenecks (CPU / network / ...).
> >>>
> >>> If the disk throughput (wMB/s) is close to your normal sequential
> >>> throughput (69 MB/s): try to write less data or get a faster disk.
> >>> If the disk throughput is low, but with a high number of writes
> >>> (w/s): there's too much seeking / too many small writes to your
> >>> disk. Page cache too small? Checkpointing too often?
> >>>
> >>> Sometimes this kind of problem is caused by other applications
> >>> (logging / ...) causing too much base load. %util should be low
> >>> when your application isn't running.
> >>>
> >>> Just my 2 cents.
> >>>
> >>> Kind regards,
> >>>
> >>> Droedel
> >>>
> >>> On Sun, Jul 29, 2018, at 10:14, Rob Willett wrote:
> >>>> Hi,
> >>>>
> >>>> Background
> >>>>
> >>>> We've been refactoring our database to reduce the size of it.
> >>>> Through some simple logic we've managed to pull out 99% of the data
> >>>> to reduce the size from 51GB down to approx 600MB. This logic has
> >>>> been to remove rows that are almost the same but not quite
> >>>> identical. As with all things, the thinking was the difficult bit;
> >>>> the execution was somewhat easier.
> >>>>
> >>>> As part of the testing we've been doing, we've now hit on an odd
> >>>> and weird problem to do with the COMMIT statement. A commit of a
> >>>> few hundred (circa 600-800) rows takes approx 7 seconds, whereas
> >>>> before we never even noticed it, though we now know it was two
> >>>> seconds before. Each row is probably 1-2K of data, so it's not very
> >>>> much at all.
> >>>>
> >>>> Details of what we have tried:
> >>>>
> >>>> 1. We've turned synchronous on and off
> >>>>
> >>>> PRAGMA synchronous=ON
> >>>>
> >>>> and that's not made any difference.
> >>>>
> >>>> 2. We are using and have been using WAL mode for years.
> >>>>
> >>>> PRAGMA journal_mode;
> >>>> journal_mode
> >>>> wal
> >>>>
> >>>> 3. We've tested that the server copies OK; we get a consistent
> >>>> 69MB/sec. This is not as fast as we would like, but it's the same
> >>>> across all our virtual servers.
> >>>>
> >>>> 4. We've tested the commit on our existing 60GB database and it
> >>>> takes 2 seconds, which is longer than we thought it would be. The
> >>>> server for the 60GB database is a large VPS with 8GB/8 cores and
> >>>> runs Ubuntu 14.04. The server we are testing on is a 2GB/2 core
> >>>> test server running Ubuntu 16.04. Whilst the test server is
> >>>> smaller, we wouldn't expect it to take 3 times longer to do a
> >>>> commit.
> >>>>
> >>>> 5. The code is identical across the servers. We are running Perl
> >>>> and the DBI module. The code for doing a commit in Perl::DBI is
> >>>> $dbh->do("COMMIT");
> >>>>
> >>>> We are getting the expected performance elsewhere on the system and
> >>>> in the code. It's just the commit that is taking a long time.
> >>>>
> >>>> 6. The code we are committing is adding 600-800 rows to a table
> >>>> that used to be 200,000,000 rows in size. It's now 400,000 rows in
> >>>> size. We are wondering if the deletion of the rows has had an
> >>>> impact we didn't expect. We have vacuumed and analysed the
> >>>> database.
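One way to see what a mass deletion leaves behind, and whether VACUUM
actually reclaims it, is to compare `PRAGMA page_count` and
`PRAGMA freelist_count` before and after. A sketch with a throwaway
database (the real file path would be substituted):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "check.db")
con = sqlite3.connect(path, isolation_level=None)  # autocommit, VACUUM needs it
con.execute("PRAGMA page_size=4096")  # must be set before the db is populated
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
con.executemany("INSERT INTO t (payload) VALUES (?)",
                [("x" * 1024,) for _ in range(5000)])

# Mimic the pruning: drop ~99% of the rows, keeping every 100th.
con.execute("DELETE FROM t WHERE id % 100 != 0")

pages_before = con.execute("PRAGMA page_count").fetchone()[0]
free = con.execute("PRAGMA freelist_count").fetchone()[0]  # pages left unused
con.execute("VACUUM")
pages_after = con.execute("PRAGMA page_count").fetchone()[0]
print(free > 0, pages_after < pages_before)  # True True
```

After a VACUUM the freelist should be empty; sqlite3_analyzer, mentioned
above, then shows the remaining fragmentation inside the index pages
themselves, which a plain page count cannot.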
> >>>>
> >>>> The schema for the table we insert into is
> >>>>
> >>>> CREATE TABLE IF NOT EXISTS "Disruptions" (
> >>>>    "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
> >>>>    "version" integer NOT NULL,
> >>>>    "Disruption_id" INTEGER NOT NULL,
> >>>>    "status" integer NOT NULL,
> >>>>    "severity" integer NOT NULL,
> >>>>    "levelOfInterest" integer NOT NULL,
> >>>>    "category" integer NOT NULL,
> >>>>    "subCategory" integer NOT NULL,
> >>>>    "startTime" TEXT NOT NULL,
> >>>>    "endTime" text NOT NULL,
> >>>>    "location" integer NOT NULL,
> >>>>    "corridor" integer NOT NULL,
> >>>>    "comments" integer NOT NULL,
> >>>>    "currentUpdate" integer NOT NULL,
> >>>>    "remarkTime" TEXT NOT NULL,
> >>>>    "lastModTime" TEXT NOT NULL,
> >>>>    "CauseAreaPointX" real NOT NULL,
> >>>>    "CauseAreaPointY" real NOT NULL,
> >>>>    "Direction" TEXT
> >>>> );
> >>>> CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location" COLLATE
> >>>> NOCASE ASC, "corridor" COLLATE NOCASE ASC, "status" COLLATE NOCASE
> >>>> ASC, "category" COLLATE NOCASE ASC, "severity" COLLATE NOCASE ASC,
> >>>> "levelOfInterest" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC,
> >>>> "subCategory" COLLATE NOCASE ASC, "startTime" COLLATE NOCASE ASC);
> >>>> CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions
> >>>> ("Disruption_id" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC,
> >>>> "category" COLLATE NOCASE ASC, "subCategory" COLLATE NOCASE ASC);
> >>>> CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version"
> >>>> COLLATE NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location"
> >>>> COLLATE NOCASE ASC);
> >>>> CREATE INDEX Disruptions_Idx5 ON Disruptions ("status",
> >>>> "Disruption_id", "Severity", "levelOfInterest", "category",
> >>>> "subCategory", "version");
> >>>>
> >>>> We have checked that this schema is consistent across the
> >>>> databases.
> >>>>
> >>>> We're about to recreate the table to see if that makes a
> >>>> difference.
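The drop-the-indexes experiment mentioned earlier in the thread can be
sketched as follows: insert the batch with the index absent, then rebuild
it once, instead of updating it row by row. An in-memory database and one
toy index stand in for the real table and its four wide indexes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Disruptions ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "version INTEGER NOT NULL, location INTEGER NOT NULL)")
con.execute("CREATE INDEX Disruptions_Idx1 "
            "ON Disruptions (location, version)")

rows = [(v, v % 50) for v in range(800)]    # the 600-800 item batch
con.execute("DROP INDEX Disruptions_Idx1")  # commit is now index-free
con.executemany("INSERT INTO Disruptions (version, location) "
                "VALUES (?, ?)", rows)
con.commit()
# Rebuild once: one sequential pass instead of 800 scattered index updates.
con.execute("CREATE INDEX Disruptions_Idx1 "
            "ON Disruptions (location, version)")
count = con.execute("SELECT count(*) FROM Disruptions").fetchone()[0]
print(count)  # 800
```

Whether the rebuild pays off depends on table size; with four indexes over
400,000 rows the rebuild cost may well exceed the commit saving, so timing
both variants on the replica system would settle it.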
> >>>>
> >>>> Any help or advice welcomed.
> >>>>
> >>>> Thanks
> >>>>
> >>>> Rob
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users