Hi Rob,

Answers are in the text below.

On Mon, Jul 30, 2018, at 13:45, Rob Willett wrote:
> Droedel,
>
> We don't think there is significant read access. The database is a
> single database on a single thread in a single process. The only access
> to it is a Perl script that logs the incoming information. We never have
> two accesses at the same time.
Can you also _measure_ read access, preferably at system level on your
production database, e.g. by using iostat? I've seen cases where (other,
non-SQLite) databases had unexpected disk access patterns due to an
application error.

> We have a nagging feeling (and that's all it is) about the autoincrement
> value. We do use that feature in the table, but we have deleted so many
[snip]

If both databases use autoincrement, then performance should be similar
(or at least that's what I expect). Can you easily check if the newly
generated IDs are as expected and larger than any existing ID in your
table?

> We did wonder if we are filling up pages in the middle or something.
> However we expected the vacuum and analyse to sort this out. Now it's
[snip]

sqlite3_analyzer can give some measurements, e.g. unused bytes on index
pages.

> We've built the replica test system now and we're going to have some
> initial checks and get some benchmarks in place.
>
> It could be an interesting and exciting ride :)

Sure. It's always fun learning something new. But it's less fun in full
production when customers are yelling :-(

Regards,

Droedel

>
> Rob
>
> On 30 Jul 2018, at 12:32, Droedel wrote:
>
> > Hi Rob,
> >
> > Is there significant read access (iostat: r/s) during these slow
> > writes? If yes, it might be due to a small cache, requiring the
> > database to read (index) pages before updating them.
> >
> > And is the data you're adding in both databases (large/small) added at
> > the end of the table using the autoincrement, or do you insert some
> > items in the middle? I'm not a SQLite performance expert, but in
> > other databases performance can be very different, because in the
> > former case fewer pages must be updated.
> >
> > Microsoft SQL Server has something called "fill factor", basically the
> > max percentage of an index page that is used during the initial fill,
> > which helps avoid too much page shuffling in the index when extra
> > items are added.
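The autoincrement ID check suggested above can be sketched with Python's
sqlite3 module. This is purely illustrative: an in-memory database and a
cut-down two-column version of the Disruptions table stand in for the real
one.

```python
import sqlite3

# With AUTOINCREMENT, sqlite_sequence records the largest id ever issued
# for the table, so new ids stay above every id that ever existed,
# including deleted ones.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Disruptions ("
            "id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, "
            "version INTEGER NOT NULL)")
con.executemany("INSERT INTO Disruptions (version) VALUES (?)",
                [(1,), (2,), (3,)])
con.execute("DELETE FROM Disruptions WHERE id > 1")  # delete rows 2 and 3

# Despite the deletes, the sequence still remembers that id 3 was used ...
seq = con.execute("SELECT seq FROM sqlite_sequence "
                  "WHERE name = 'Disruptions'").fetchone()[0]
# ... so the next insert gets an id larger than any that ever existed.
cur = con.execute("INSERT INTO Disruptions (version) VALUES (4)")
print(seq, cur.lastrowid)  # 3 4
```

If the new ids ever came out lower than an existing id, inserts would land
in the middle of the index rather than at the end, which is the page-update
pattern described above.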
> > Disadvantage: it makes DBAs argue endlessly about the best fill
> > factor ;-) Maybe there's something similar possible in SQLite, but I
> > couldn't find a pragma for this.
> >
> > Oh, and do both databases have the same page size, preferably 4K?
> >
> > Regards,
> >
> > Droedel
> >
> > On Mon, Jul 30, 2018, at 12:51, Rob Willett wrote:
> >> Droedel,
> >>
> >> Thanks for the comprehensive reply. We have actually done all of
> >> this.
> >>
> >> The system has been running for 2-3 years and we have taken the
> >> opportunity to try and prune the database from 60GB down to 600MB.
> >> Currently the live system is working OK with a 60GB database, but our
> >> test system (which is smaller) is struggling with 600MB.
> >>
> >> The system has a restriction of IOPS as it's a Virtual Private
> >> Server. Technically it's running Ubuntu 16.04 under OpenVZ. We can
> >> get 69MB/sec with a disk-to-disk copy, which isn't brilliant if we
> >> had sustained disk traffic, which we don't.
> >>
> >> We log approx 600-800 items of around 3-5K every five minutes. These
> >> 600-800 items are mainly inserts into a single table; there are other
> >> things happening as well, but this is the bulk of the work. We can
> >> see that the -wal files grow a small amount (4-5MB) just before the
> >> commit. It then takes 7 seconds to execute the commit. This is the
> >> bit that we're struggling with. We know we can get circa 70MB/sec
> >> data throughput, so this should take a fraction of a second. Now
> >> SQLite needs to work out which pages to commit, so that's a little
> >> longer, but we know SQLite is fast, so that shouldn't take 7 seconds
> >> on the small database as it doesn't take that long on the large 60GB
> >> database. That's the puzzling bit: the large database is quick, the
> >> small one slow.
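A minimal harness for timing just the COMMIT step, as described above,
might look like the following sketch (Python's sqlite3 rather than
Perl::DBI; a throwaway WAL database and synthetic ~4K payloads stand in
for the real table and data):

```python
import os
import sqlite3
import tempfile
import time

# isolation_level=None puts the connection in autocommit mode so we can
# issue BEGIN/COMMIT explicitly and time only the commit.
path = os.path.join(tempfile.mkdtemp(), "bench.db")
con = sqlite3.connect(path, isolation_level=None)
con.execute("PRAGMA journal_mode=WAL")
con.execute("CREATE TABLE items ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "payload TEXT NOT NULL)")

con.execute("BEGIN")
con.executemany("INSERT INTO items (payload) VALUES (?)",
                [("x" * 4096,) for _ in range(700)])  # ~700 items, ~4K each
t0 = time.perf_counter()
con.execute("COMMIT")  # the step reported as taking 7 seconds
elapsed = time.perf_counter() - t0
print(f"commit took {elapsed:.3f}s")
```

Running a loop of such batches against copies of the small and large
databases would give the per-commit averages for the benchmark runs
mentioned above.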
> >>
> >> We have no logging turned on. We can turn SQL logging on at the DBI
> >> level, but that turns a 20 sec run into a 2-3 minute run as it
> >> captures everything :) Nothing in the log files gives us any concern
> >> (apart from the really long commit time). Simon Slavin suggested
> >> dropping the indexes, which we did, and that turned the commit into a
> >> fast commit, so it's something to do with the indexes but we can't
> >> see what.
> >>
> >> What we are now doing is going back to the very beginning:
> >>
> >> 1. We built a replacement system yesterday with 8GB memory and 8
> >> cores and 150GB disk space. It's virtualised (ESXi) but under our
> >> control.
> >> 2. We've installed a copy of the old 60GB database on the new system.
> >> 3. We're going to benchmark the new system over a couple of thousand
> >> runs to see what the average time is.
> >> 4. We'll then work our way through the deduping of the database step
> >> by step to see when the commit time blows up. This will take a few
> >> days, as working out the duplications of 200,000,000 rows isn't that
> >> quick :) As we found out, dropping a very large table is really,
> >> really, really slow.
> >> 5. We'll apply some of the ideas that people have suggested since
> >> yesterday to see if they work, but I'm keen that we have a repeatable
> >> problem that we solve rather than use a scatter-gun approach to
> >> fixing it. We think SQLite is well written, so we figure the problem
> >> is ours to solve rather than simply blaming the software.
> >>
> >> Thanks
> >>
> >> Rob
> >>
> >> On 30 Jul 2018, at 11:11, Droedel wrote:
> >>
> >>> Hi,
> >>>
> >>> When having bad performance, I usually first try to find out if the
> >>> slowness is due to disk throughput (sequential), slow random access
> >>> or something else. In Linux, try "iostat -xtc 5". Do this with and
> >>> without your application writing to disk.
> >>>
> >>> If you see high CPU %iowait and high %util on your disk, then disk
> >>> is the bottleneck. If not: start profiling / analyzing other
> >>> bottlenecks (CPU / network / ...).
> >>>
> >>> If the disk throughput (wMB/s) is close to your normal sequential
> >>> throughput (69 MB/s): try to write less data or get a faster disk.
> >>> If the disk throughput is low, but with a high number of writes
> >>> (w/s): there's too much seeking / too many small writes to your
> >>> disk. Page cache too small? Checkpointing too often?
> >>>
> >>> Sometimes this kind of problem is caused by other applications
> >>> (logging / ...) causing too much base load. %util should be low
> >>> when your application isn't running.
> >>>
> >>> Just my 2 cents.
> >>>
> >>> Kind regards,
> >>>
> >>> Droedel
> >>>
> >>> On Sun, Jul 29, 2018, at 10:14, Rob Willett wrote:
> >>>> Hi,
> >>>>
> >>>> Background
> >>>>
> >>>> We've been refactoring our database to reduce the size of it.
> >>>> Through some simple logic we've managed to pull out 99% of the data
> >>>> to reduce the size from 51GB down to approx 600MB. This logic has
> >>>> been to remove rows that are almost the same but not quite
> >>>> identical. As with all things, the thinking was the difficult bit;
> >>>> the execution was somewhat easier.
> >>>>
> >>>> As part of the testing we've been doing, we've now hit on an odd
> >>>> and weird problem to do with the COMMIT statement. A commit of a
> >>>> few hundred (circa 600-800) rows takes approx 7 seconds, whereas
> >>>> before we never even noticed it, though we now know it was two
> >>>> seconds before. Each row is probably 1-2K of data, so it's not very
> >>>> much at all.
> >>>>
> >>>> Details of what we have tried:
> >>>>
> >>>> 1. We've turned synchronous on and off
> >>>>
> >>>> PRAGMA synchronous=ON
> >>>>
> >>>> and that's not made any difference.
> >>>>
> >>>> 2. We are using and have been using WAL mode for years.
> >>>>
> >>>> PRAGMA journal_mode;
> >>>> journal_mode
> >>>> wal
> >>>>
> >>>> 3. We've tested that the server copies OK; we get a consistent
> >>>> 69MB/sec. This is not as fast as we would like, but it's the same
> >>>> across all our virtual servers.
> >>>>
> >>>> 4. We've tested the commit on our existing 60GB database and it
> >>>> takes 2 seconds, which is longer than we thought it would be. The
> >>>> server for the 60GB database is a large VPS with 8GB/8 cores and
> >>>> runs Ubuntu 14.04. The server we are testing on is a 2GB/2 core
> >>>> test server running Ubuntu 16.04. Whilst the test server is
> >>>> smaller, we wouldn't expect it to take 3 times longer to do a
> >>>> commit.
> >>>>
> >>>> 5. The code is identical across the servers. We are running Perl
> >>>> and the DBI module. The code for doing a commit in Perl::DBI is
> >>>> $dbh->do("COMMIT");
> >>>>
> >>>> We are getting the expected performance elsewhere on the system and
> >>>> in the code. It's just the commit that is taking a long time.
> >>>>
> >>>> 6. The code we are committing is adding 600-800 rows to a table
> >>>> that used to be 200,000,000 rows in size. It's now 400,000 rows in
> >>>> size. We are wondering if the deletion of the rows has had an
> >>>> impact we didn't expect. We have vacuumed and analysed the
> >>>> database.
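One way to see what a mass deletion leaves behind, and whether VACUUM
actually reclaims it, is to compare `PRAGMA page_count` and
`PRAGMA freelist_count` before and after. A sketch with a throwaway
database (the real file path would be substituted):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "check.db")
con = sqlite3.connect(path, isolation_level=None)  # autocommit, VACUUM needs it
con.execute("PRAGMA page_size=4096")  # must be set before the db is populated
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
con.executemany("INSERT INTO t (payload) VALUES (?)",
                [("x" * 1024,) for _ in range(5000)])

# Mimic the pruning: drop ~99% of the rows, keeping every 100th.
con.execute("DELETE FROM t WHERE id % 100 != 0")

pages_before = con.execute("PRAGMA page_count").fetchone()[0]
free = con.execute("PRAGMA freelist_count").fetchone()[0]  # pages left unused
con.execute("VACUUM")
pages_after = con.execute("PRAGMA page_count").fetchone()[0]
print(free > 0, pages_after < pages_before)  # True True
```

After a VACUUM the freelist should be empty; sqlite3_analyzer, mentioned
above, then shows the remaining fragmentation inside the index pages
themselves, which a plain page count cannot.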
> >>>>
> >>>> The schema for the table we insert into is
> >>>>
> >>>> CREATE TABLE IF NOT EXISTS "Disruptions" (
> >>>>    "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
> >>>>    "version" integer NOT NULL,
> >>>>    "Disruption_id" INTEGER NOT NULL,
> >>>>    "status" integer NOT NULL,
> >>>>    "severity" integer NOT NULL,
> >>>>    "levelOfInterest" integer NOT NULL,
> >>>>    "category" integer NOT NULL,
> >>>>    "subCategory" integer NOT NULL,
> >>>>    "startTime" TEXT NOT NULL,
> >>>>    "endTime" text NOT NULL,
> >>>>    "location" integer NOT NULL,
> >>>>    "corridor" integer NOT NULL,
> >>>>    "comments" integer NOT NULL,
> >>>>    "currentUpdate" integer NOT NULL,
> >>>>    "remarkTime" TEXT NOT NULL,
> >>>>    "lastModTime" TEXT NOT NULL,
> >>>>    "CauseAreaPointX" real NOT NULL,
> >>>>    "CauseAreaPointY" real NOT NULL,
> >>>>    "Direction" TEXT
> >>>> );
> >>>> CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location" COLLATE
> >>>> NOCASE ASC, "corridor" COLLATE NOCASE ASC, "status" COLLATE NOCASE
> >>>> ASC, "category" COLLATE NOCASE ASC, "severity" COLLATE NOCASE ASC,
> >>>> "levelOfInterest" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC,
> >>>> "subCategory" COLLATE NOCASE ASC, "startTime" COLLATE NOCASE ASC);
> >>>> CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions
> >>>> ("Disruption_id" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC,
> >>>> "category" COLLATE NOCASE ASC, "subCategory" COLLATE NOCASE ASC);
> >>>> CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version"
> >>>> COLLATE NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location"
> >>>> COLLATE NOCASE ASC);
> >>>> CREATE INDEX Disruptions_Idx5 ON Disruptions ("status",
> >>>> "Disruption_id", "Severity", "levelOfInterest", "category",
> >>>> "subCategory", "version");
> >>>>
> >>>> We have checked that this schema is consistent across the
> >>>> databases.
> >>>>
> >>>> We're about to recreate the table to see if that makes a
> >>>> difference.
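The drop-the-indexes experiment mentioned earlier in the thread can be
sketched as follows: insert the batch with the index absent, then rebuild
it once, instead of updating it row by row. An in-memory database and one
toy index stand in for the real table and its four wide indexes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Disruptions ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "version INTEGER NOT NULL, location INTEGER NOT NULL)")
con.execute("CREATE INDEX Disruptions_Idx1 "
            "ON Disruptions (location, version)")

rows = [(v, v % 50) for v in range(800)]    # the 600-800 item batch
con.execute("DROP INDEX Disruptions_Idx1")  # commit is now index-free
con.executemany("INSERT INTO Disruptions (version, location) "
                "VALUES (?, ?)", rows)
con.commit()
# Rebuild once: one sequential pass instead of 800 scattered index updates.
con.execute("CREATE INDEX Disruptions_Idx1 "
            "ON Disruptions (location, version)")
count = con.execute("SELECT count(*) FROM Disruptions").fetchone()[0]
print(count)  # 800
```

Whether the rebuild pays off depends on table size; with four indexes over
400,000 rows the rebuild cost may well exceed the commit saving, so timing
both variants on the replica system would settle it.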
> >>>>
> >>>> Any help or advice welcomed.
> >>>>
> >>>> Thanks
> >>>>
> >>>> Rob
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users