Re: [sqlite] Very, very slow commits

2018-07-31 Thread Rob Willett

Simon,

Absolutely no need to apologise. We should apologise for all the time we 
have taken from other people :(


We recognise that the COLLATE NOCASE is inappropriate for our database. 
We suspect it was added by a SQLite tool we used some time ago. We are 
going to use this opportunity to remove this sort of nonsense.


We still think we have an inherent design fault in our database that we 
are trying to understand. One of the problems we had was that checking 
the performance of a 200M row table for bad indexes is time consuming. 
We now have a workflow to get from a 60GB database to a 600MB database 
in a few hours. We cannot do all the work in SQL as it involves an 
external program to analyse the data but a few hours to run isn't bad. 
As we now have the database held locally, we can thrash the local server 
silly to get the performance we need.


Rob

On 31 Jul 2018, at 16:18, Simon Slavin wrote:

On 31 Jul 2018, at 2:59pm, Rob Willett  
wrote:


We've created a new table based on your ideas, moved the collate into 
the table, analysed the database. We did **not** add COLLATE NOCASE 
to the columns which are defined as integers. Would that make a 
difference?


What you did is correct.  I gave wrong advice for which I apologise.  
But I am now confused since your original code is a little strange.  
Your original has a table definition including


"version" integer NOT NULL,

but then

	CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions 
("Disruption_id" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, 
"category" COLLATE NOCASE ASC, "subCategory" COLLATE NOCASE ASC);


If "version" really is an INTEGER, then it is incorrect to use version 
COLLATE NOCASE in the index.  NOCASE is purely for text values.  This 
may be slowing things down.


To solve it, in the table definition, use COLLATE NOCASE for TEXT 
columns and not for INTEGER columns.  Also, remove all mentions of 
COLLATE NOCASE in your index definitions.  Collation methods should be 
set in the table definition, not in indexes, except for some unusual 
situations.


This should increase your speed relative to your original timings.  If 
it slows things down, something else I haven't spotted is wrong.


We've found it now takes around 10% longer to do the queries than 
before.


That is understandable given the incorrect advice I gave you before.

In another post you report some strange timing problems with no simple 
explanation.  When I get such things I suspect database corruption or 
hardware problems and run an integrity_check.  But with a 60Gig 
database I might think twice.


Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits - Possibly solved

2018-07-31 Thread Rob Willett

Chris,

I'll try and summarise.

1. We have a 60GB database collecting data. This database is accessed by 
a single process once every five mins and around 5MB of data (approx 
600-800 rows) is added. Data has never been deleted.


2. The database is getting too big for the server it's hosted on. We're 
struggling to back it up, or do much with it, as it's hosted on a Virtual 
Private Server.


3. We took a long hard look at the database a few months ago and tried 
to work out what we could do. When we designed the database we weren't 
completely sure what data we would need so we went overboard and stored 
a lot of data, hence the database is growing.


4. We realised that a lot of the data is redundant: not redundant in the 
normalised-database sense, but data we don't actually need now. We 
thought we did, but our expectations are now different. Most of the data 
is held in a single table which is currently 200M rows long.


5. We worked out we could remove approx 99% of the data and everything 
that we currently do *should* work as before. The work we have been 
discussing in this thread is our testing of this reduction or 
de-duplication work. Currently the production system is untouched and 
works well and is performant.


6. The work to reduce the main table has been difficult as the table is 
so large AND we are using a Virtual Private Server which has IO 
limitations as it's based on OpenVZ. The supplier doesn't want us 
consuming all the available resources.


7. We developed a couple of techniques for trying to speed up the 
reduction of the main database table. Rather than removing rows from the 
table, we copied out the required rows to a new identical table but we 
only needed to copy out approx 500,000 rows as opposed to 200,000,000. 
We then discovered that dropping a 200M row table on a VPS server is 
slow. Circa 10 hours. On a new home built and large server it's a few 
minutes. We only found this out late in the process.


8. Once we constructed the new table and new database (600MB now rather 
than 60GB) we started testing it on a test server. This is a smaller 
version of the main production server, e.g. it has two cores rather than 
eight, 2GB rather than 8GB. Both the servers use a RAID array of 
spinning rust at the back end. We as customers have no idea what this 
array is.


9. After various tests, we noticed that the database seemed to be 
slowing down, especially around the commit statement. It was taking 
around 7 secs to commit what should be a tiny amount of data (5MB). Most 
of the work each process does is off the database, parsing and 
processing an XML file. The database actions we do are normally a simple 
insert to add rows to the main table, with very occasional updates of 
other tables.


10. We then built a large server in our office under ESXi to replicate 
the production server and to try and move the work closer to us, so we 
could try and see what the problem is. This local server is faster than 
our production server BUT it doesn't have the network connections, 
redundancy and other features we need for production.  We tried to 
replicate the steps we did last week to see if we could reproduce the 
problem. We used the technique of copying to a new table, dropping the 
200M row table and renaming the new table back to the original name. We 
have other techniques which involve working with the 200M row table 
in-situ, but this one seemed to be faster on our VPS server. On our home 
built server, we think that working with the table as-is would be 
faster.


11. We worked through our steps one by one to reproduce our smaller 
database. We vacuumed and analysed the database and then copied it back 
to a test server back on our VPS estate.


12. We benchmarked the database on the test VPS server and got around 
9-10 secs per run. As this is a test server it's significantly slower 
than our prod server, but it's a baseline we can work with. We sent 
through 25 iterations of data to get the baseline.


13. We then started 'playing' about with indexes: creating them with 
different collations, and creating tables with collations, including 
integer collations which we think should be cost neutral. As we copied 
data from table to table to see what happened, we noticed that the speed 
changed significantly from 10 secs to around 16-18 secs. As far as we 
could see this was due to simply moving the data around. We always 
created the 'right' schema to copy into and didn't allow SQLite to work 
out the types. We ran analyse and vacuum on the data after moving 
tables. We also created and recreated indexes as needed.


14. We think that the constant moving of data around between tables is 
fragmenting tables and indexes on the disk, so when we add new rows to 
the vacuumed table we are adding them all over the place and commits are 
taking longer and longer. There was also a discussion that SSDs may mean 
that we are constantly getting 

Re: [sqlite] Very, very slow commits

2018-07-31 Thread Simon Slavin
On 31 Jul 2018, at 2:59pm, Rob Willett  wrote:

> We've created a new table based on your ideas, moved the collate into the 
> table, analysed the database. We did **not** add COLLATE NOCASE to the 
> columns which are defined as integers. Would that make a difference?

What you did is correct.  I gave wrong advice for which I apologise.  But I am 
now confused since your original code is a little strange.  Your original has a 
table definition including

"version" integer NOT NULL,

but then

CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions ("Disruption_id" 
COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category" COLLATE NOCASE 
ASC, "subCategory" COLLATE NOCASE ASC);

If "version" really is an INTEGER, then it is incorrect to use version COLLATE 
NOCASE in the index.  NOCASE is purely for text values.  This may be slowing 
things down.

To solve it, in the table definition, use COLLATE NOCASE for TEXT columns and 
not for INTEGER columns.  Also, remove all mentions of COLLATE NOCASE in your 
index definitions.  Collation methods should be set in the table definition, 
not in indexes, except for some unusual situations.
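
For example, a minimal sketch against an abbreviated version of your 
Disruptions schema (only a few columns shown; the rest follow the same 
rule):

    CREATE TABLE IF NOT EXISTS "Disruptions" (
        "id"            INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
        "version"       integer NOT NULL,                -- INTEGER: no COLLATE NOCASE
        "Disruption_id" INTEGER NOT NULL,
        "category"      integer NOT NULL,
        "subCategory"   integer NOT NULL,
        "startTime"     TEXT NOT NULL COLLATE NOCASE,    -- TEXT: collation set here
        "endTime"       text NOT NULL COLLATE NOCASE
        -- ... remaining columns as in your original schema ...
    );

    CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions
        ("Disruption_id" ASC, "version" ASC, "category" ASC, "subCategory" ASC);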

This should increase your speed relative to your original timings.  If it slows 
things down, something else I haven't spotted is wrong.

> We've found it now takes around 10% longer to do the queries than before.

That is understandable given the incorrect advice I gave you before.

In another post you report some strange timing problems with no simple 
explanation.  When I get such things I suspect database corruption or hardware 
problems and run an integrity_check.  But with a 60Gig database I might think 
twice.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits - Possibly solved

2018-07-31 Thread Chris Locke
I've been following this thread with interest, but this just doesn't make
sense...

>  Logically speaking SQLite shouldn't notice the difference in row order,
but things do slow down,
> even with analyse.

Are you accessing each row via its ID?  Even so, that should still be
indexed.
I thought you were simply adding records into the database - I'm failing to
grasp how this is slowing down in the new database.


Thanks,
Chris



On Tue, Jul 31, 2018 at 3:30 PM Rob Willett 
wrote:

> Dear all,
>
> We think we have now found the issue with the slow commits.
>
> We believe this is due to an inherent (and old) defect in our database
> design. We think our original design has an implicit ordering of rows in
> a table, when the table is only increasing this flaw in the design isn't
> apparent.
>
> However when we started deduping the table AND we copied rows from one
> table to another to move things around, we changed the underlying order
> of rows. Sqlite handles the design change BUT the flaw in our design
> becomes apparent as we keep moving the data around and data gets mixed
> up. The database slows down when we create a second table with an
> identical structure to the first table, copy the data into the new
> table, drop the old and then when we rename the old table to the new
> table, things appear to slow down. Logically speaking SQLite shouldn't
> notice the difference in row order, but things do slow down, even with
> analyse.
>
> We think that a better index definition could solve the problem for us,
> a better database design would, but thats a tricky problem.
>
> We're now going back to our 60GB database and start from scratch to see
> if we can create the issue (now we think we know what it is).
>
> Thanks to everybody who contributed ideas, we appreciate the help.
>
> Rob
>
> On 31 Jul 2018, at 15:19, Rob Willett wrote:
>
> > Simon,
> >
> > As an exercise we have just added in COLLATE NOCASE to our integer
> > columns.
> >
> > Whoops! We thought this would make no difference but its added extra
> > 70% to our processing speeds.
> >
> > We've now got to the stage where we can make changes quickly, so we'll
> > back that change out and go back to the integer defn without COLLATE
> > NOCASE.
> >
> > Rob
> >
> > On 31 Jul 2018, at 14:59, Rob Willett wrote:
> >
> >> Simon,
> >>
> >> Apologies for taking so long to get back, we've been building a test
> >> system and its taken a long time.
> >>
> >> We're just getting round to trying your ideas out to see what
> >> difference they make,
> >>
> >> We've created a new table based on your ideas, moved the collate into
> >> the table, analysed the database. We did **not** add COLLATE NOCASE
> >> to the columns which are defined as integers. Would that make a
> >> difference?
> >>
> >> We've found it now takes around 10% longer to do the queries than
> >> before.
> >>
> >> Rob
> >>
> >>
> >>> Please try moving your COLLATE clauses into the table definition.
> >>> e.g. instead of
> >>>
> >>> CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version"
> >>> COLLATE NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location"
> >>> COLLATE NOCASE ASC);
> >>>
> >>> Your table definition should have
> >>>
> >>>  "version" integer NOT NULL COLLATE NOCASE,
> >>>  "Disruption_id" INTEGER NOT NULL COLLATE NOCASE,
> >>> ...
> >>>  "location" integer NOT NULL COLLATE NOCASE,
> >>>
> >>> and the index should be
> >>>
> >>> CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions
> >>> ("version" ASC, "Disruption_id" ASC, "location" ASC);
> >>>
> >>> Once data has been entered, do ANALYZE.  This step may take a long
> >>> time.
> >>>
> >>> Simon.
> >>> ___
> >>> sqlite-users mailing list
> >>> sqlite-users@mailinglists.sqlite.org
> >>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >> ___
> >> sqlite-users mailing list
> >> sqlite-users@mailinglists.sqlite.org
> >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> > ___
> > sqlite-users mailing list
> > sqlite-users@mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits - Possibly solved

2018-07-31 Thread Rob Willett

Dear all,

We think we have now found the issue with the slow commits.

We believe this is due to an inherent (and old) defect in our database 
design. We think our original design assumes an implicit ordering of 
rows in a table; while the table is only ever growing, this flaw in the 
design isn't apparent.


However, when we started deduping the table AND copying rows from one 
table to another to move things around, we changed the underlying order 
of rows. SQLite handles the change, BUT the flaw in our design becomes 
apparent as we keep moving the data around and data gets mixed up. The 
database slows down when we create a second table with an identical 
structure to the first table, copy the data into the new table, drop the 
old one and rename the new table to the old name. Logically speaking 
SQLite shouldn't notice the difference in row order, but things do slow 
down, even with analyse.
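
For anyone following along, the copy/drop/rename cycle we keep repeating 
is roughly this (a sketch only: the column list is abbreviated and 
rows_to_keep stands in for the selection our external analysis program 
produces):

    CREATE TABLE "Disruptions_new" (
        "id"            INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
        "version"       integer NOT NULL,
        "Disruption_id" INTEGER NOT NULL
        -- ... remaining columns exactly as in the original Disruptions schema ...
    );

    INSERT INTO Disruptions_new ("id", "version", "Disruption_id")
        SELECT "id", "version", "Disruption_id"
        FROM Disruptions
        WHERE id IN (SELECT id FROM rows_to_keep);

    DROP TABLE Disruptions;                              -- very slow on the 200M row table
    ALTER TABLE Disruptions_new RENAME TO Disruptions;
    -- recreate the indexes, then:
    VACUUM;
    ANALYZE;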


We think that a better index definition could solve the problem for us; 
a better database design would, but that's a tricky problem.


We're now going back to our 60GB database and starting from scratch to 
see if we can recreate the issue (now we think we know what it is).


Thanks to everybody who contributed ideas, we appreciate the help.

Rob

On 31 Jul 2018, at 15:19, Rob Willett wrote:


Simon,

As an exercise we have just added in COLLATE NOCASE to our integer 
columns.


Whoops! We thought this would make no difference but its added extra 
70% to our processing speeds.


We've now got to the stage where we can make changes quickly, so we'll 
back that change out and go back to the integer defn without COLLATE 
NOCASE.


Rob

On 31 Jul 2018, at 14:59, Rob Willett wrote:


Simon,

Apologies for taking so long to get back, we've been building a test 
system and its taken a long time.


We're just getting round to trying your ideas out to see what 
difference they make,


We've created a new table based on your ideas, moved the collate into 
the table, analysed the database. We did **not** add COLLATE NOCASE 
to the columns which are defined as integers. Would that make a 
difference?


We've found it now takes around 10% longer to do the queries than 
before.


Rob


Please try moving your COLLATE clauses into the table definition.  
e.g. instead of


CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version" 
COLLATE NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location" 
COLLATE NOCASE ASC);


Your table definition should have

 "version" integer NOT NULL COLLATE NOCASE,
 "Disruption_id" INTEGER NOT NULL COLLATE NOCASE,
...
 "location" integer NOT NULL COLLATE NOCASE,

and the index should be

CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions
("version" ASC, "Disruption_id" ASC, "location" ASC);

Once data has been entered, do ANALYZE.  This step may take a long 
time.


Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits

2018-07-31 Thread Rob Willett

Simon,

As an exercise we have just added in COLLATE NOCASE to our integer 
columns.


Whoops! We thought this would make no difference but it's added an extra 
70% to our processing times.


We've now got to the stage where we can make changes quickly, so we'll 
back that change out and go back to the integer defn without COLLATE 
NOCASE.


Rob

On 31 Jul 2018, at 14:59, Rob Willett wrote:


Simon,

Apologies for taking so long to get back, we've been building a test 
system and its taken a long time.


We're just getting round to trying your ideas out to see what 
difference they make,


We've created a new table based on your ideas, moved the collate into 
the table, analysed the database. We did **not** add COLLATE NOCASE to 
the columns which are defined as integers. Would that make a 
difference?


We've found it now takes around 10% longer to do the queries than 
before.


Rob


Please try moving your COLLATE clauses into the table definition.  
e.g. instead of


CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version" 
COLLATE NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location" 
COLLATE NOCASE ASC);


Your table definition should have

 "version" integer NOT NULL COLLATE NOCASE,
 "Disruption_id" INTEGER NOT NULL COLLATE NOCASE,
...
 "location" integer NOT NULL COLLATE NOCASE,

and the index should be

CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions
("version" ASC, "Disruption_id" ASC, "location" ASC);

Once data has been entered, do ANALYZE.  This step may take a long 
time.


Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits

2018-07-31 Thread Rob Willett

Simon,

Apologies for taking so long to get back; we've been building a test 
system and it's taken a long time.


We're just getting round to trying your ideas out to see what difference 
they make.


We've created a new table based on your ideas, moved the collate into 
the table, analysed the database. We did **not** add COLLATE NOCASE to 
the columns which are defined as integers. Would that make a difference?


We've found it now takes around 10% longer to do the queries than 
before.


Rob


Please try moving your COLLATE clauses into the table definition.  
e.g. instead of


CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version" 
COLLATE NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location" 
COLLATE NOCASE ASC);


Your table definition should have

 "version" integer NOT NULL COLLATE NOCASE,
 "Disruption_id" INTEGER NOT NULL COLLATE NOCASE,
...
 "location" integer NOT NULL COLLATE NOCASE,

and the index should be

CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions
("version" ASC, "Disruption_id" ASC, "location" ASC);

Once data has been entered, do ANALYZE.  This step may take a long 
time.


Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits

2018-07-30 Thread Rob Willett

Droedel,

Fortunately we have no performance issues in production with the 60GB 
database. These issues came out in testing (which is what testing is 
for).


We're investigating the newly generated ID's as we speak or we will be 
once we get our replica production system setup.


sqlite3_analyzer has been a problem for us. We've struggled to get a 
build for it.


Rob

On 30 Jul 2018, at 13:49, Droedel wrote:


Hi Rob,
Answers are in the text below

On Mon, Jul 30, 2018, at 13:45, Rob Willett wrote:

Droedel,

We don't think there is significant read access. The database is a 
single database on a single thread on a single process. The only access 
to it is a Perl script that logs the incoming information. We never have 
two accesses at the same time.


Can you also _measure_ read access, preferably on system level on your 
production database, e.g. by using iostat ? I've seen cases where 
(other, non-SQLite) databases had unexpected disk access patterns due 
to an application error.


We have a nagging feeling (and that's all it is) about the autoincrement 
value. We do use that feature in the table, but we have deleted so many


[snip]
If both databases use autoincrement, then performance should be 
similar (or at least that's what I expect). Can you easily check if 
the newly generated IDs are as expected and larger than any existing 
ID in your table ?



We did wonder if we are filling up pages in the middle or something.
However we expected the vacuum and analyse to sort this out. Now its


[snip]
sqlite3_analyzer can give some measurements, e.g. unused bytes on 
index pages.



We've built the replica test system now and we're going to have some
initial checks and get some benchmarks in place.

It could be an interesting and exciting ride :)


Sure. It's always fun learning something new. But it's less fun in 
full production when customers are yelling :-(


Regards,

Droedel




Rob


On 30 Jul 2018, at 12:32, Droedel wrote:


Hi Rob,

Is there significant read access (iostat: r/s) during these slow
writes ? If yes, it might be due to a small cache, requiring the
database to read (index) pages before updating them.

And is the data you're adding in both databases (large/small) added at 
the end of the table using the autoincrement, or do you insert some 
items in the middle ? I'm not a SQLite performance expert, but in 
other databases performance can be very different because in the 
former case fewer pages must be updated.

Microsoft SQL Server has something called "fill factor", basically the 
max percentage of an index page that is used during initial fill, 
which helps avoiding too many page shuffling in the index when extra 
items are added. Disadvantage: it makes DBAs argue endlessly about the 
best fill factor ;-) Maybe there's something similar possible in 
SQLite but I couldn't find a pragma for this.

Oh, and do both databases have the same page size, preferably 4K ?

Regards,

Droedel



On Mon, Jul 30, 2018, at 12:51, Rob Willett wrote:

Droedel,

Thanks for the comprehensive reply. We have actually done all of
this.

The system has been running for 2-3 years and we have taken the 
opportunity to try and prune the database from 60GB down to 600MB. 
Currently the live system is working OK with a 60GB database, but our 
test system (which is smaller) is struggling with 600MB.

The system has a restriction of IOPS as it's a Virtual Private Server. 
Technically it's running Ubuntu 16.04 under OpenVZ. We can get 69MB/sec 
with a disk to disk copy, which isn't brilliant if we had sustained disk 
traffic which we don't.

We log approx 600 - 800 items of around 3-5K every five minutes.
These
600-800 items are mainly an insert into a single table, there are
other
things happening as well, but this is the bulk of the work. We can
see
that the -wal files grow a small amount (4-5MB) just before the
commit.
It then takes 7 seconds to execute the commit. This is the bit that
we're struggling with. We know we can get circa 70MB/sec data
throughput, so this should take a fraction of a second. Now SQLite
needs
to work out which pages to commit so thats a little longer, but we
know
SQLite is fast, so that shouldn't take 7 seconds on the small
database
as it doesn't take that long on the large 60GB database. Thats the
puzzling bit, the large database is quick, the small one slow.

We have no logging turned on, we can turn SQL logging on at the DBI
level but that turns a 20 sec run into a 2-3 minute run as it
captures
everything :) Nothing in the log files gives us any concern (apart
from
the really long commit time). Simon Slavin suggested dropping the
indexes which we did, that turned the commit into a fast commit, so
its
something to do with the indexes but we can't see what.

What we are now doing is going back to the very beginning:

1. We built a replacement system yesterday with 8GB memory and 8
cores
and 150GB disk space. Its virtualised (ESXI) but under our control.
2. We've installed a copy of the 

Re: [sqlite] Very, very slow commits

2018-07-30 Thread Droedel
Hi Rob, 
Answers are in the text below

On Mon, Jul 30, 2018, at 13:45, Rob Willett wrote:
> Droedel,
> 
> We don't think there are significant read access. The database is a 
> single database on a single thread on a single process. The only access 
> to it is a Perl script that logs the incoming information. We never have 
> two accesses at the same time.

Can you also _measure_ read access, preferably on system level on your 
production database, e.g. by using iostat ? I've seen cases where (other, 
non-SQLite) databases had unexpected disk access patterns due to an application 
error. 

> We have a nagging feeling (and thats all it is) about the autoincrement 
> value. We do use that feature in the table, but we have deleted so many 

[snip]
If both databases use autoincrement, then performance should be similar (or at 
least that's what I expect). Can you easily check if the newly generated IDs 
are as expected and larger than any existing ID in your table ?
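
A quick way to check, assuming the id column really is declared 
AUTOINCREMENT (it is in the schema Rob posted), is to compare the 
sequence counter with the largest id actually in the table:

    -- high-water mark SQLite will use for the next AUTOINCREMENT id
    SELECT seq FROM sqlite_sequence WHERE name = 'Disruptions';
    -- largest id currently in the table
    SELECT max(id) FROM Disruptions;
    -- seq should be >= max(id); if not, something has interfered with the counter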

> We did wonder if we are filling up pages in the middle or something. 
> However we expected the vacuum and analyse to sort this out. Now its 

[snip]
sqlite3_analyzer can give some measurements, e.g. unused bytes on index pages. 

> We've built the replica test system now and we're going to have some 
> initial checks and get some benchmarks in place.
> 
> It could be an interesting and exciting ride :)

Sure. It's always fun learning something new. But it's less fun in full 
production when customers are yelling :-(

Regards,

Droedel


> 
> Rob
> 
> 
> On 30 Jul 2018, at 12:32, Droedel wrote:
> 
> > Hi Rob,
> >
> > Is there significant read access (iostat: r/s) during these slow 
> > writes ? If yes, it might be due to a small cache, requiring the 
> > database to read (index) pages before updating them.
> >
> > And is the data you're adding in both databases (large/small) added at 
> > the end of the table using the autoincrement, or do you insert some 
> > items in the middle ? I'm not a SQLite performance expert, but in 
> > other databases performance can be very different because in the 
> > former case fewer pages must be updated.
> >
> > Microsoft SQL Server has something called "fill factor", basically the 
> > max percentage of an index page that is used during initial fill, 
> > which helps avoiding too many page shuffling in the index when extra 
> > items are added. Disadvantage: it makes DBAs argue endlessly about the 
> > best fill factor ;-) Maybe there's something similar possible in 
> > SQLite but I couldn't find a pragma for this.
> >
> > Oh, and do both databases have the same page size, preferably 4K ?
> >
> > Regards,
> >
> > Droedel
> >
> >
> >
> > On Mon, Jul 30, 2018, at 12:51, Rob Willett wrote:
> >> Droedel,
> >>
> >> Thanks for the comprehensive reply. We have actually done all of 
> >> this.
> >>
> >> The system has been running for 2-3 years and we have taken the
> >> opportunity to try and prune the database from 60GB down to 600MB.
> >> Currently the live system is working OK with a 60GB database, but our
> >> test system (which is smaller) is struggling with 600MB.
> >>
> >> The system has a restriction of IOPS as it's a Virtual Private 
> >> Server.
> >> Technically it's running Ubuntu 16.04 under OpenVZ. We can get 
> >> 69MB/sec
> >> with a disk to disk copy, which isn't brilliant if we had sustained 
> >> disk
> >> traffic which we don't.
> >>
> >> We log approx 600 - 800 items of around 3-5K every five minutes. 
> >> These
> >> 600-800 items are mainly an insert into a single table, there are 
> >> other
> >> things happening as well, but this is the bulk of the work. We can 
> >> see
> >> that the -wal files grow a small amount (4-5MB) just before the 
> >> commit.
> >> It then takes 7 seconds to execute the commit. This is the bit that
> >> we're struggling with. We know we can get circa 70MB/sec data
> >> throughput, so this should take a fraction of a second. Now SQLite 
> >> needs
> >> to work out which pages to commit so thats a little longer, but we 
> >> know
> >> SQLite is fast, so that shouldn't take 7 seconds on the small 
> >> database
> >> as it doesn't take that long on the large 60GB database. Thats the
> >> puzzling bit, the large database is quick, the small one slow.
> >>
> >> We have no logging turned on, we can turn SQL logging on at the DBI
> >> level but that turns a 20 sec run into a 2-3 minute run as it 
> >> captures
> >> everything :) Nothing in the log files gives us any concern (apart 
> >> from
> >> the really long commit time). Simon Slavin suggested dropping the
> >> indexes which we did, that turned the commit into a fast commit, so 
> >> its
> >> something to do with the indexes but we can't see what.
> >>
> >> What we are now doing is going back to the very beginning:
> >>
> >> 1. We built a replacement system yesterday with 8GB memory and 8 
> >> cores
> >> and 150GB disk space. Its virtualised (ESXI) but under our control.
> >> 2. We've installed a copy of the old 60GB 

Re: [sqlite] Very, very slow commits

2018-07-30 Thread Rob Willett

Warren,

The hardware is different; at the time we didn't want to spin up a 
complete production replica as that's quite expensive. We used a smaller 
machine. Both have the same type of back-end spinning RAID array, but we 
would think that writing 4-5MB of changed data back shouldn't take 7 
seconds. We had seen far better performance on the slower machine 
earlier in testing.


We will go back to step one and work our way through step by step from 
60GB to 600MB as our thinking is that we have somehow screwed our 
database up.


Rob


On 30 Jul 2018, at 13:29, Warren Young wrote:

On Jul 30, 2018, at 5:53 AM, Rob Willett 
 wrote:


I would wonder why writing the data to a 60GB database and doing a 
commit is fast and writing exactly the same data to the 600MB 
database is different. The programs for doing it are the same, the 
database schema is identical.


I assume the hardware is different.  Is that not the case?

If the small DB is on a machine with a spinning disk but the large DB 
is on a machine with either an SSD or a many-spindled RAID, there’s 
your key difference.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits

2018-07-30 Thread Warren Young
On Jul 30, 2018, at 5:53 AM, Rob Willett  wrote:
> 
> I would wonder why writing the data to a 60GB database and doing a commit is 
> fast and writing exactly the same data to the 600MB database is different. 
> The programs for doing it are the same, the database schema is identical.

I assume the hardware is different.  Is that not the case?

If the small DB is on a machine with a spinning disk but the large DB is on a 
machine with either an SSD or a many-spindled RAID, there’s your key difference.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits

2018-07-30 Thread Rob Willett

Warren,

On 30 Jul 2018, at 12:28, Warren Young wrote:

On Jul 30, 2018, at 4:51 AM, Rob Willett 
 wrote:


The system has been running for 2-3 years


Has performance changed over that span?  Which direction?


Performance hasn't changed on the large 60GB database. It's pretty 
consistent.




we have taken the opportunity to try and prune the database from 60GB 
down to 600MB.


SQLite’s speed is only weakly affected by database size.  A starting 
guess is log2(N) where N is file size.


Since your data access didn’t get 10x faster from your 100x size 
drop, you can fairly guess that the speed problem isn’t due to the 
inherent time required to traverse tree-based data structures in 
SQLite.




The reason for the pruning is not for the benefit of SQLite, it's for 
our own administration. Backing up 60GB files is hard work, we're also 
struggling to fire up replicas, so we wanted to look at the database as 
a whole and get the size down. The 99% reduction is great, we were 
hoping for a 50% to 80% reduction.


The system has a restriction of IOPS as it's a Virtual Private 
Server. Technically it's running Ubuntu 16.04 under OpenVZ. We can 
get 69MB/sec with a disk to disk copy, which isn't brilliant if we 
had sustained disk traffic which we don’t.


I am sensing a spinning disk.  (Else, throughput should be a lot 
higher.)


I suspect that is the case. However 69MB/sec is adequate for the 60GB 
database and for normal usage. Our commits are fine on the 60GB 
database, just not the 600MB one.


SQLite takes data durability (the D in ACID) seriously, unlike most 
other software applications, so it is uncommon in that it flushes each 
transaction to disk before proceeding with further writes to that 
table.


A flush to disk takes a full disk rotation, and due to the way 
SQLite’s journal operates, each transaction requires two flushes.  
That means that with a 7200 RPM disk, you can get a maximum of 60 
transactions per second per table with SQLite.


I agree with your logic, but I would wonder why writing the data to a 
60GB database and doing a commit is fast and writing exactly the same 
data to the 600MB database is different. The programs for doing it are 
the same, the database schema is identical.




Sound familiar?

If I’ve guessed the problem correctly, the solutions are:

1. Batch multiple writes in a transaction.


Already done.



2. Switch to an SSD.


Not an option in the short term for production, BUT the test system we 
have set up has an SSD. This may skew the results though. It's difficult 
to get an identical system setup, but the other option is a VMware 
Fusion system on a Mac with a spinning disk. Most of our local systems 
are SSD; we have a Mac with a spinning disk for backup.




3. Use multiple tables and/or multiple DB files.  In your case, I’d 
suggest one SQLite DB per sensor, with one thread per sensor, each of 
which keeps one of the SQLite DBs open continuously.  That way, a 
blocked DB conn won’t block any other writers.


We have one process that reads a single file in every 5 mins. No need 
for multiple databases or multiple threads.




Those solutions are given in order of ease of application and cost of 
implementation.



Nothing in the log files gives us any concern


Have you tried SQLite’s new .expert feature?


No, but we will now :)


   https://sqlite.org/cli.html#index_recommendations_sqlite_expert_


dropping a very large table is really, really, really slow.


If you put your main data table in a file of its own, you can quickly 
“drop” the table by just closing the DB and removing the DB file 
from disk.


That's what we will do on our test system.



When you then recreate the DB file with a fresh schema, it’s 
effectively defragged/vacuumed as well.


We have a copy of the 60GB data file (took a long time to download) 
locally now. We will use this (or rather a copy) to start the testing.


Thanks very much for the thoughtful and useful comments.

Rob


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits

2018-07-30 Thread Rob Willett

Droedel,

We don't think there is significant read access. The database is a 
single database on a single thread on a single process. The only access 
to it is a Perl script that logs the incoming information. We never have 
two accesses at the same time.


We have a nagging feeling (and that's all it is) about the autoincrement 
value. We do use that feature in the table, but we have deleted so many 
rows, and we may have messed something up as we move data from table to 
table. We did read https://www.sqlite.org/autoinc.html a week or so ago 
and we wonder whether, having deleted circa 199,500,000 rows from the 
table of 200,000,000 rows, we may have cocked it up somehow. We have 
never directly accessed the sqlite_sequence table (except to read it) 
but that page makes us wonder if we have missed an error somewhere.


We did wonder if we are filling up pages in the middle or something. 
However we expected the vacuum and analyse to sort this out. Now it's 
entirely possible we have had this problem before now, hence we are going 
to go back to the very beginning with our 61GB database and take it step 
by step so we understand what is happening. What's frustrating is that 
the 61GB database seems fast and the 600MB one seems slow, yet they both 
have the same database schema. We feel that we have made an error 
somewhere but have only now discovered it.


We've built the replica test system now and we're going to have some 
initial checks and get some benchmarks in place.


It could be an interesting and exciting ride :)

Rob


On 30 Jul 2018, at 12:32, Droedel wrote:


Hi Rob,

Is there significant read access (iostat: r/s) during these slow 
writes ? If yes, it might be due to a small cache, requiring the 
database to read (index) pages before updating them.


And is the data you're adding in both databases (large/small) added at 
the end of the table using the autoincrement, or do you insert some 
items in the middle ? I'm not a SQLite performance expert, but in 
other databases performance can be very different because in the 
former case fewer pages must be updated.


Microsoft SQL Server has something called "fill factor", basically the 
max percentage of an index page that is used during initial fill, 
which helps avoiding too many page shuffling in the index when extra 
items are added. Disadvantage: it makes DBAs argue endlessly about the 
best fill factor ;-) Maybe there's something similar possible in 
SQLite but I couldn't find a pragma for this.


Oh, and do both databases have the same page size, preferably 4K ?

Regards,

Droedel



On Mon, Jul 30, 2018, at 12:51, Rob Willett wrote:

Droedel,

Thanks for the comprehensive reply. We have actually done all of 
this.


The system has been running for 2-3 years and we have taken the
opportunity to try and prune the database from 60GB down to 600MB.
Currently the live system is working OK with a 60GB database, but our
test system (which is smaller) is struggling with 600MB.

The system has a restriction of IOPS as it's a Virtual Private Server. 
Technically it's running Ubuntu 16.04 under OpenVZ. We can get 69MB/sec 
with a disk to disk copy, which isn't brilliant if we had sustained disk 
traffic which we don't.

We log approx 600 - 800 items of around 3-5K every five minutes. These 
600-800 items are mainly an insert into a single table, there are other 
things happening as well, but this is the bulk of the work. We can see 
that the -wal files grow a small amount (4-5MB) just before the commit.

It then takes 7 seconds to execute the commit. This is the bit that
we're struggling with. We know we can get circa 70MB/sec data
throughput, so this should take a fraction of a second. Now SQLite needs 
to work out which pages to commit so that's a little longer, but we know 
SQLite is fast, so that shouldn't take 7 seconds on the small database 
as it doesn't take that long on the large 60GB database. That's the 
puzzling bit, the large database is quick, the small one slow.

We have no logging turned on, we can turn SQL logging on at the DBI 
level but that turns a 20 sec run into a 2-3 minute run as it captures 
everything :) Nothing in the log files gives us any concern (apart from 
the really long commit time). Simon Slavin suggested dropping the 
indexes which we did, that turned the commit into a fast commit, so it's 
something to do with the indexes but we can't see what.

What we are now doing is going back to the very beginning:

1. We built a replacement system yesterday with 8GB memory and 8 cores 
and 150GB disk space. It's virtualised (ESXi) but under our control.
2. We've installed a copy of the old 60GB database on the new system.
3. We're going to benchmark the new system over a couple of thousand
runs to see what the average time is.
4. We'll then work our way through the deduping of the database step 
by
step to see when the commit time blow up. This will take a few days 
as
working out the duplications of 200,000,000 rows isn't that quick :) 

Re: [sqlite] Very, very slow commits

2018-07-30 Thread Droedel
Hi Rob,

Is there significant read access (iostat: r/s) during these slow writes ? If 
yes, it might be due to a small cache, requiring the database to read (index) 
pages before updating them.

And is the data you're adding in both databases (large/small) added at the end 
of the table using the autoincrement, or do you insert some items in the middle 
? I'm not a SQLite performance expert, but in other databases performance can 
be very different because in the former case fewer pages must be updated. 

Microsoft SQL Server has something called "fill factor", basically the max 
percentage of an index page that is used during initial fill, which helps 
avoid too much page shuffling in the index when extra items are added. 
Disadvantage: it makes DBAs argue endlessly about the best fill factor ;-) 
Maybe there's something similar possible in SQLite but I couldn't find a pragma 
for this. 

Oh, and do both databases have the same page size, preferably 4K ?
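
Page size and the cache size mentioned above are both cheap to check 
from the sqlite3 shell; a sketch (the 200 MB figure is only an example, 
not a recommendation):

    PRAGMA page_size;              -- bytes per page; 4096 is the usual modern default
    PRAGMA cache_size;             -- positive = pages, negative = approximate KiB
    PRAGMA cache_size = -200000;   -- e.g. allow roughly 200 MB of page cache on this connection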

Regards,

Droedel



On Mon, Jul 30, 2018, at 12:51, Rob Willett wrote:
> Droedel,
> 
> Thanks for the comprehensive reply. We have actually done all of this.
> 
> The system has been running for 2-3 years and we have taken the 
> opportunity to try and prune the database from 60GB down to 600MB. 
> Currently the live system is working OK with a 60GB database, but our 
> test system (which is smaller) is struggling with 600MB.
> 
> The system has a restriction of IOPS as it's a Virtual Private Server. 
> Technically it's running Ubuntu 16.04 under OpenVZ. We can get 69MB/sec 
> with a disk to disk copy, which isn't brilliant if we had sustained disk 
> traffic which we don't.
> 
> We log approx 600 - 800 items of around 3-5K every five minutes. These 
> 600-800 items are mainly an insert into a single table, there are other 
> things happening as well, but this is the bulk of the work. We can see 
> that the -wal files grow a small amount (4-5MB) just before the commit. 
> It then takes 7 seconds to execute the commit. This is the bit that 
> we're struggling with. We know we can get circa 70MB/sec data 
> throughput, so this should take a fraction of a second. Now SQLite needs 
> to work out which pages to commit so thats a little longer, but we know 
> SQLite is fast, so that shouldn't take 7 seconds on the small database 
> as it doesn't take that long on the large 60GB database. Thats the 
> puzzling bit, the large database is quick, the small one slow.
> 
> We have no logging turned on, we can turn SQL logging on at the DBI 
> level but that turns a 20 sec run into a 2-3 minute run as it captures 
> everything :) Nothing in the log files gives us any concern (apart from 
> the really long commit time). Simon Slavin suggested dropping the 
> indexes which we did, that turned the commit into a fast commit, so its 
> something to do with the indexes but we can't see what.
> 
> What we are now doing is going back to the very beginning:
> 
> 1. We built a replacement system yesterday with 8GB memory and 8 cores 
> and 150GB disk space. Its virtualised (ESXI) but under our control.
> 2. We've installed a copy of the old 60GB database on the new system.
> 3. We're going to benchmark the new system over a couple of thousand 
> runs to see what the average time is.
> 4. We'll then work our way through the deduping of the database step by 
> step to see when the commit time blow up. This will take a few days as 
> working out the duplications of 200,000,000 rows isn't that quick :) As 
> we found out, dropping a very large table is really, really, really 
> slow.
> 5. We'll apply some of the ideas that people have suggested since 
> yesterday to see if they work, but I'm keen that we have a repeatable 
> problem that we solve rather than we use a scatter gun approach to 
> fixing it. We think SQLite is well written so we figure the problem is 
> ours to solve rather than simply blaming the software.
> 
> 
> Thanks
> 
> Rob
> 
> On 30 Jul 2018, at 11:11, Droedel wrote:
> 
> > Hi,
> >
> > When having bad performance, I usually first try to find out if the 
> > slowness is due to disk througput (sequential), slow random access or 
> > something else. In Linux, try "iostat -xtc 5". Do this with and 
> > without your application writing to disk.
> >
> > If you see high CPU %iowait and high %util on your disk, then disk is 
> > the bottleneck. If not: start profiling / analyzing other bottlenecks 
> > (CPU / network / ...)
> >
> > If the disk throughput (wMB/s) is close to your normal sequential 
> > throughput (69 MB/s): try to write less data or get a faster disk.
> > If the disk troughput is low, but high numbers of writes (w/s): 
> > there's too much seeking / too many small writes to your disk. Page 
> > cache too small ? Checkpointing too often ?
> >
> > Sometimes this kind of problems is caused by other applications 
> > (logging / ...) causing too much baseload. %util should be low when 
> > your application isn't running.
> >
> > Just my 2 cents.
> >
> 

Re: [sqlite] Very, very slow commits

2018-07-30 Thread Warren Young
On Jul 30, 2018, at 4:51 AM, Rob Willett  wrote:
> 
> The system has been running for 2-3 years

Has performance changed over that span?  Which direction?

> we have taken the opportunity to try and prune the database from 60GB down to 
> 600MB.

SQLite’s speed is only weakly affected by database size.  A starting guess is 
log2(N) where N is file size.

Since your data access didn’t get 10x faster from your 100x size drop, you can 
fairly guess that the speed problem isn’t due to the inherent time required to 
traverse tree-based data structures in SQLite.

> The system has a restriction of IOPS as it's a Virtual Private Server. 
> Technically it's running Ubuntu 16.04 under OpenVZ. We can get 69MB/sec with 
> a disk to disk copy, which isn't brilliant if we had sustained disk traffic 
> which we don’t.

I am sensing a spinning disk.  (Else, throughput should be a lot higher.)

SQLite takes data durability (the D in ACID) seriously, unlike most other 
software applications, so it is uncommon in that it flushes each transaction to 
disk before proceeding with further writes to that table.

A flush to disk takes a full disk rotation, and due to the way SQLite’s journal 
operates, each transaction requires two flushes.  That means that with a 7200 
RPM disk, you can get a maximum of 60 transactions per second per table with 
SQLite.
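
(For anyone checking the arithmetic: 7200 RPM is 120 rotations per 
second, and at two flushes per transaction that gives 120 / 2 = 60 
transactions per second.)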

Sound familiar?

If I’ve guessed the problem correctly, the solutions are:

1. Batch multiple writes in a transaction.

2. Switch to an SSD.

3. Use multiple tables and/or multiple DB files.  In your case, I’d suggest one 
SQLite DB per sensor, with one thread per sensor, each of which keeps one of 
the SQLite DBs open continuously.  That way, a blocked DB conn won’t block any 
other writers.

Those solutions are given in order of ease of application and cost of 
implementation.

> Nothing in the log files gives us any concern

Have you tried SQLite’s new .expert feature?

   https://sqlite.org/cli.html#index_recommendations_sqlite_expert_
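
For example, from the sqlite3 shell (a sketch; the query is only an 
illustration, not one of your real ones):

    .expert
    SELECT * FROM Disruptions WHERE Disruption_id = 1234 AND version = 2;

The shell then prints any CREATE INDEX suggestions it has for that query.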

> dropping a very large table is really, really, really slow.

If you put your main data table in a file of its own, you can quickly “drop” 
the table by just closing the DB and removing the DB file from disk.  

When you then recreate the DB file with a fresh schema, it’s effectively 
defragged/vacuumed as well.
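
A sketch of that layout using ATTACH (the file name and the abbreviated 
column list here are just for illustration):

    -- the big table lives alone in bulk.db; "dropping" it is then just
    -- closing the connection and deleting that one file
    ATTACH DATABASE 'bulk.db' AS bulk;
    CREATE TABLE IF NOT EXISTS bulk.Disruptions (
        "id"            INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
        "version"       integer NOT NULL,
        "Disruption_id" INTEGER NOT NULL
        -- ... remaining columns as in the original schema ...
    );
    DETACH DATABASE bulk;   -- detach when done with this connection
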
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits

2018-07-30 Thread Rob Willett

Droedel,

Thanks for the comprehensive reply. We have actually done all of this.

The system has been running for 2-3 years and we have taken the 
opportunity to try and prune the database from 60GB down to 600MB. 
Currently the live system is working OK with a 60GB database, but our 
test system (which is smaller) is struggling with 600MB.


The system has a restriction of IOPS as it's a Virtual Private Server. 
Technically it's running Ubuntu 16.04 under OpenVZ. We can get 69MB/sec 
with a disk to disk copy, which isn't brilliant if we had sustained disk 
traffic which we don't.


We log approx 600 - 800 items of around 3-5K every five minutes. These 
600-800 items are mainly an insert into a single table, there are other 
things happening as well, but this is the bulk of the work. We can see 
that the -wal files grow a small amount (4-5MB) just before the commit. 
It then takes 7 seconds to execute the commit. This is the bit that 
we're struggling with. We know we can get circa 70MB/sec data 
throughput, so this should take a fraction of a second. Now SQLite needs 
to work out which pages to commit so that's a little longer, but we know 
SQLite is fast, so that shouldn't take 7 seconds on the small database 
as it doesn't take that long on the large 60GB database. That's the 
puzzling bit, the large database is quick, the small one slow.


We have no logging turned on, we can turn SQL logging on at the DBI 
level but that turns a 20 sec run into a 2-3 minute run as it captures 
everything :) Nothing in the log files gives us any concern (apart from 
the really long commit time). Simon Slavin suggested dropping the 
indexes which we did; that turned the commit into a fast commit, so it's 
something to do with the indexes but we can't see what.
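
(For reference, the index-dropping experiment amounts to something like 
this, using the index names from our original schema, and then 
recreating them afterwards from the original CREATE INDEX statements:)

    DROP INDEX "Disruptions_Idx1";
    DROP INDEX "Disruptions_Idx2";
    DROP INDEX "Disruptions_Idx3";
    DROP INDEX Disruptions_Idx5;
    -- run the usual inserts and COMMIT, time it, then recreate the indexes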


What we are now doing is going back to the very beginning:

1. We built a replacement system yesterday with 8GB memory and 8 cores 
and 150GB disk space. It's virtualised (ESXi) but under our control.

2. We've installed a copy of the old 60GB database on the new system.
3. We're going to benchmark the new system over a couple of thousand 
runs to see what the average time is.
4. We'll then work our way through the deduping of the database step by 
step to see when the commit time blows up. This will take a few days as 
working out the duplications of 200,000,000 rows isn't that quick :) As 
we found out, dropping a very large table is really, really, really 
slow.
5. We'll apply some of the ideas that people have suggested since 
yesterday to see if they work, but I'm keen that we have a repeatable 
problem that we solve rather than use a scatter-gun approach to 
fixing it. We think SQLite is well written so we figure the problem is 
ours to solve rather than simply blaming the software.



Thanks

Rob

On 30 Jul 2018, at 11:11, Droedel wrote:


Hi,

When having bad performance, I usually first try to find out if the 
slowness is due to disk throughput (sequential), slow random access or 
something else. In Linux, try "iostat -xtc 5". Do this with and 
without your application writing to disk.


If you see high CPU %iowait and high %util on your disk, then disk is 
the bottleneck. If not: start profiling / analyzing other bottlenecks 
(CPU / network / ...)


If the disk throughput (wMB/s) is close to your normal sequential 
throughput (69 MB/s): try to write less data or get a faster disk.
If the disk throughput is low, but high numbers of writes (w/s): 
there's too much seeking / too many small writes to your disk. Page 
cache too small ? Checkpointing too often ?


Sometimes this kind of problems is caused by other applications 
(logging / ...) causing too much baseload. %util should be low when 
your application isn't running.


Just my 2 cents.

Kind regards,

Droedel


On Sun, Jul 29, 2018, at 10:14, Rob Willett wrote:

Hi,

Background

We've been refactoring our database to reduce the size of it. Through 
some simple logic we've managed to pull out 99% of the data to reduce 
the size from 51GB down to approx 600MB. This logic has been to remove 
rows that are almost the same but not quite identical. As with all 
things, the thinking was the difficult bit, the execution somewhat 
easier.

As part of the testing we've been doing, we've now hit on an odd and 
weird problem to do with the COMMIT statement. A commit of a few hundred 
(circa 600-800) rows takes approx 7 seconds whereas before we never even 
noticed it, though we now know it was two seconds before. Each row is 
probably 1-2K of data, so it's not very much at all.

Details of what we have tried:

1. We've turned synchronous  on and off

PRAGMA synchronous=ON

and that's not made any difference.

2. We are using and have been using WAL mode for years.

PRAGMA journal_mode;
journal_mode
wal

3. We've tested that the server copies OK, we get a consistent 69MB/sec. 
This is not as fast as we would like, but it's the same across all our 
virtual servers.

4. We've tested the commit on our existing 60GB database 

Re: [sqlite] Very, very slow commits

2018-07-30 Thread Droedel
Hi,

When having bad performance, I usually first try to find out if the slowness is 
due to disk throughput (sequential), slow random access or something else. In 
Linux, try "iostat -xtc 5". Do this with and without your application writing 
to disk.

If you see high CPU %iowait and high %util on your disk, then disk is the 
bottleneck. If not: start profiling / analyzing other bottlenecks (CPU / 
network / ...)

If the disk throughput (wMB/s) is close to your normal sequential throughput 
(69 MB/s): try to write less data or get a faster disk.
If the disk throughput is low, but high numbers of writes (w/s): there's too 
much seeking / too many small writes to your disk. Page cache too small ? 
Checkpointing too often ?
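
If you want to rule checkpointing in or out, the relevant settings are 
easy to inspect (a sketch; these are standard pragmas, and the values in 
the comments are just the defaults to compare against):

    PRAGMA journal_mode;              -- should report wal
    PRAGMA wal_autocheckpoint;        -- default is 1000 pages
    PRAGMA wal_checkpoint(TRUNCATE);  -- force a checkpoint and reset the -wal file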

Sometimes this kind of problems is caused by other applications (logging / ...) 
causing too much baseload. %util should be low when your application isn't 
running.

Just my 2 cents.

Kind regards,

Droedel


On Sun, Jul 29, 2018, at 10:14, Rob Willett wrote:
> Hi,
> 
> Background
> 
> We've been refactoring our database to reduce the size of it. Through 
> some simple logic we've managed to pull out 99% of the data to reduce 
> the size from 51GB down to approx 600MB. This logic has been to remove 
> rows that are almost the same but not quite identical. As with all 
> things, the thinking was the difficult bit, the execution somewhat 
> easier.
> 
> As part of the testing we've been doing, we've now hit on an odd and 
> weird problem to do with the COMMIT statement. A commit of a few hundred 
> (circa 600-800) rows takes approx 7 seconds whereas before we never even 
> noticed it, though we now know it was two seconds before. Each row is 
> probably 1-2K of data, so its not very much at all.
> 
> Details of what we have tried:
> 
> 1. We've turned synchronous  on and off
> 
> PRAGMA synchronous=ON
> 
> and thats not made any difference.
> 
> 2. We are using and have been using WAL mode for years.
> 
> PRAGMA journal_mode;
> journal_mode
> wal
> 
> 3. We've tested that the server copies OK, we get a consistent 69MB/sec. 
> This is not as fast we would like, but it's the same across all our 
> virtual servers.
> 
> 4. We've tested the commit on our existing 60GB database and it takes 2 
> seconds, which is longer than we thought it would be. The server for the 
> 60GB database is a large VPS with 8GB/8 cores and runs Ubuntu 14.04. The 
> server we are testing on is a 2GB/2 core test server running Ubuntu 
> 16.04. Whilst the test server is smaller, we wouldn't expect it to take 
> 3 times longer to do a commit.
> 
> 5. The code is identical across the servers. We are running Perl and the 
> DBI module. The code for doing a commit in Perl::DBI is
>   $dbh->do("COMMIT");
> 
>We are getting the expected performance elsewhere on the system and 
> in the code. It's just the commit that is taking a long time.
> 
> 6. The code we are committing is adding 600-800 lines to a table that 
> used to be 200,000,000 rows in size. It's now 400,000 lines in size. We 
> are wondering if the deletion of the lines has had an impact we didn't 
> expect. We have vacuumed and analysed the database.
> 
> The schema for the table we insert into is
> 
> CREATE TABLE IF NOT EXISTS "Disruptions" (
>"id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
>"version" integer NOT NULL,
>"Disruption_id" INTEGER NOT NULL,
>"status" integer NOT NULL,
>"severity" integer NOT NULL,
>"levelOfInterest" integer NOT NULL,
>"category" integer NOT NULL,
>"subCategory" integer NOT NULL,
>"startTime" TEXT NOT NULL,
>"endTime" text NOT NULL,
>"location" integer NOT NULL,
>"corridor" integer NOT NULL,
>"comments" integer NOT NULL,
>"currentUpdate" integer NOT NULL,
>"remarkTime" TEXT NOT NULL,
>"lastModTime" TEXT NOT NULL,
>"CauseAreaPointX" real NOT NULL,
>"CauseAreaPointY" real NOT NULL,
>"Direction" TEXT
> );
> CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location" COLLATE 
> NOCASE ASC, "corridor" COLLATE NOCASE ASC, "status" COLLATE NOCASE ASC, 
> "category" COLLATE NOCASE ASC, "severity" COLLATE NOCASE ASC, 
> "levelOfInterest" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, 
> "subCategory" COLLATE NOCASE ASC, "startTime" COLLATE NOCASE ASC);
> CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions ("Disruption_id" 
> COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category" COLLATE 
> NOCASE ASC, "subCategory" COLLATE NOCASE ASC);
> CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version" COLLATE 
> NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location" COLLATE 
> NOCASE ASC);
> CREATE INDEX Disruptions_Idx5 ON Disruptions ("status", "Disruption_id", 
> "Severity", "levelOfInterest", "category", "subCategory", "version");
> 
> We have checked that this schema is consistent across the databases.
> 
> We're about to recreate the table to see if that makes a difference.

Re: [sqlite] Very, very slow commits

2018-07-29 Thread Simon Slavin
Please try moving your COLLATE clauses into the table definition.  e.g. instead 
of

CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version" COLLATE 
NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location" COLLATE NOCASE ASC);

Your table definition should have

 "version" integer NOT NULL COLLATE NOCASE,
 "Disruption_id" INTEGER NOT NULL COLLATE NOCASE,
...
 "location" integer NOT NULL COLLATE NOCASE,

and the index should be

CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions
("version" ASC, "Disruption_id" ASC, "location" ASC);

Once data has been entered, do ANALYZE.  This step may take a long time.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits

2018-07-29 Thread Rob Willett

Thanks for the mail.

We ran analyze with no indexes, made no difference.

We recreated the indexes and ran analyze again. The very long commit is 
back, this time it took 14 secs :)


It does appear that the indexes have something to do with this, whereas 
they have not been an issue until now.


Rob

On 29 Jul 2018, at 11:45, J. King wrote:

On July 29, 2018 5:47:29 AM EDT, Rob Willett 
 wrote:

John,

Thanks for the prompt reply and a very good question..

We've dropped the indexes and the commit is now very quick, approx 
two

seconds

However the overall performance of the run is much the same as other
areas of the code are now significantly slower, whereas before they
were
quick.

Where were you going with that question?


Might ANALYZE help?
--
J. King
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits

2018-07-29 Thread Rob Willett

Anton,

Dropped the indexes and created them without order.

We'll need to look at what your second para means. It could be a major 
and massive change.


Rob



On 29 Jul 2018, at 11:52, Djelf wrote:


Rob,

Try creating indexes without order.

Or, try to make a column with a hash of the values entering the index 
and
search for the value by the index of. This will complicate the logic 
of your

program, but it will decrease the volume of the database, and possibly
significantly speed up both reading and writing.

---
Anton Azanov



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits

2018-07-29 Thread Djelf
Rob,

Try creating indexes without order.

Or, try to make a column holding a hash of the values entering the index and
search for the value via an index on that hash. This will complicate the logic of your
program, but it will decrease the volume of the database, and possibly
significantly speed up both reading and writing.
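
A minimal sketch of that second idea, assuming the application registers its 
own hash function when it opens the database (app_hash() below is 
hypothetical, not a built-in SQLite function; it would be supplied via 
SQLite's create-function API, e.g. sqlite3_create_function in C or 
DBD::SQLite's sqlite_create_function from Perl):

-- app_hash() is hypothetical and must be registered by the application.
ALTER TABLE Disruptions ADD COLUMN idx_hash INTEGER;

UPDATE Disruptions
   SET idx_hash = app_hash(Disruption_id, version, category, subCategory);

CREATE INDEX Disruptions_IdxHash ON Disruptions (idx_hash);

-- Look up via the narrow hash index, then re-check the real columns
-- to filter out hash collisions.
SELECT *
  FROM Disruptions
 WHERE idx_hash = app_hash(:id, :ver, :cat, :sub)
   AND Disruption_id = :id AND version = :ver
   AND category = :cat AND subCategory = :sub;

It cannot replace the UNIQUE constraints behind Disruptions_Idx2/Idx3, though, 
since two different rows can share a hash value.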

---
Anton Azanov



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits

2018-07-29 Thread J. King
On July 29, 2018 5:47:29 AM EDT, Rob Willett  
wrote:
>John,
>
>Thanks for the prompt reply and a very good question..
>
>We've dropped the indexes and the commit is now very quick, approx two 
>seconds
>
>However the overall performance of the run is much the same as other 
>areas of the code are now significantly slower, whereas before they
>were 
>quick.
>
>Where were you going with that question?
>
Might ANALYZE help? 
-- 
J. King
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits

2018-07-29 Thread R Smith
Also, kindly clarify one bit - I'm not sure whether you use the word 
"commit" as an easy substitute for the entire process of updating the DB 
(i.e. all SQL INSERT/UPDATE code that runs up to and including the COMMIT 
statement), or if you actually mean the "COMMIT" operation itself. The 
things that take time are the updating processes; the "COMMIT" step is 
typically a single quick file operation (depending on journal mode, of 
course). If the COMMIT itself is not quick, that in itself might 
give a clue towards the problem.
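
In the sqlite3 shell the two can be separated, because .timer prints an 
elapsed time per statement. A rough sketch against a throwaway table 
(commit_probe is made up here purely for illustration):

.timer on
CREATE TABLE IF NOT EXISTS commit_probe (i INTEGER);
BEGIN;
WITH RECURSIVE n(i) AS (SELECT 1 UNION ALL SELECT i + 1 FROM n WHERE i < 800)
INSERT INTO commit_probe SELECT i FROM n;
COMMIT;            -- the time printed for this statement is the COMMIT alone
DROP TABLE commit_probe;

Running the real insert batch the same way would show whether the seven 
seconds sit in the INSERTs or in the final COMMIT.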




On 2018/07/29 11:47 AM, Rob Willett wrote:



What is the performance without the indexes?

On Sun, 29 Jul 2018 10:20:11 +0100
"Rob Willett"  wrote:


Update 1

We've copied the disruptions table to a new table, dropped the old
table, copied the new table back in and recreated all the indexes.

Exactly the sam commit performance.

We've also tracked the -shm and -wal files and they are around 5MB in
size.

Mmmm tricky (as Deepthought said).

Rob

On 29 Jul 2018, at 9:14, Rob Willett wrote:


Hi,

Background

We've been refactoring our database to reduce the size of it. Through
some simple logic we've managed to pull out 99% of the data to reduce
the size from 51GB down to approx 600MB. This logic has been to remove
rows that are almost the same but not quite identical. As with all
things, the thinking was the difficult bit, the execution somewhat
easier.

As part of the testing we've been doing, we've now hit on an odd and
weird problem to do with the COMMIT statement. A commit of a few
hundred (circa 600-800) rows takes approx 7 seconds whereas before we
never even noticed it, though we now know it was two seconds before.
Each row is probably 1-2K of data, so its not very much at all.

Details of what we have tried:

1. We've turned synchronous  on and off

PRAGMA synchronous=ON

and thats not made any difference.

2. We are using and have been using WAL mode for years.

PRAGMA journal_mode;
journal_mode
wal

3. We've tested that the server copies OK, we get a consistent
69MB/sec. This is not as fast we would like, but it's the same across
all our virtual servers.

4. We've tested the commit on our existing 60GB database and it takes
2 seconds, which is longer than we thought it would be. The server for
the 60GB database is a large VPS with 8GB/8 cores and runs Ubuntu
14.04. The server we are testing on is a 2GB/2 core test server
running Ubuntu 16.04. Whilst the test server is smaller, we wouldn't
expect it to take 3 times longer to do a commit.

5. The code is identical across the servers. We are running Perl and
the DBI module. The code for doing a commit in Perl::DBI is
 $dbh->do("COMMIT");

  We are getting the expected performance elsewhere on the system and
in the code. It's just the commit that is taking a long time.

6. The code we are committing is adding 600-800 lines to a table that
used to be 200,000,000 rows in size. It's now 400,000 lines in size.
We are wondering if the deletion of the lines has had an impact we
didn't expect. We have vacuumed and analysed the database.

The schema for the table we insert into is

CREATE TABLE IF NOT EXISTS "Disruptions" (
 "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
 "version" integer NOT NULL,
 "Disruption_id" INTEGER NOT NULL,
 "status" integer NOT NULL,
 "severity" integer NOT NULL,
 "levelOfInterest" integer NOT NULL,
 "category" integer NOT NULL,
 "subCategory" integer NOT NULL,
 "startTime" TEXT NOT NULL,
 "endTime" text NOT NULL,
 "location" integer NOT NULL,
 "corridor" integer NOT NULL,
 "comments" integer NOT NULL,
 "currentUpdate" integer NOT NULL,
 "remarkTime" TEXT NOT NULL,
 "lastModTime" TEXT NOT NULL,
 "CauseAreaPointX" real NOT NULL,
 "CauseAreaPointY" real NOT NULL,
 "Direction" TEXT
);
CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location" COLLATE
NOCASE ASC, "corridor" COLLATE NOCASE ASC, "status" COLLATE NOCASE
ASC, "category" COLLATE NOCASE ASC, "severity" COLLATE NOCASE ASC,
"levelOfInterest" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC,
"subCategory" COLLATE NOCASE ASC, "startTime" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions ("Disruption_id"
COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category" COLLATE
NOCASE ASC, "subCategory" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version"
COLLATE NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location"
COLLATE NOCASE ASC);
CREATE INDEX Disruptions_Idx5 ON Disruptions ("status",
"Disruption_id", "Severity", "levelOfInterest", "category",
"subCategory", "version");

We have checked that this schema is consistent across the databases.

We're about to recreate the table to see if that makes a difference.

Any help or advice welcomed.

Thanks

Rob
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org

Re: [sqlite] Very, very slow commits

2018-07-29 Thread R Smith

On 2018/07/29 11:47 AM, Rob Willett wrote:

John,

Thanks for the prompt reply and a very good question..

We've dropped the indexes and the commit is now very quick, approx two 
seconds


However the overall performance of the run is much the same as other 
areas of the code are now significantly slower, whereas before they 
were quick.


Where were you going with that question?


Possibly you have too many indexes.

An Index is an expensive thing for a DB to maintain; it's only ever 
useful when the improvement gained for look-ups significantly outweighs 
the cost of updating the Indexes.


The typical process here is to remove all Indexes, then add them back 
one by one and run all queries, noting which adds benefit and which does 
not. Then, when all are installed, remove them again in the same order 
(i.e. if you added A then B then C... start removing A then B then C 
also) and measure once more; you will quickly find the useless Indexes.


This is the very last step in design though; it's the kind of 
optimization everyone talks about when they warn against "premature 
optimization". A prior step would be to study the queries and see if you 
can find better Indexes, or ones that cover (i.e. are helpful with) a 
wider range of queries, etc.
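
A rough sketch of that measuring loop in the sqlite3 shell, using the index 
names from the schema posted earlier (the SELECT is only a stand-in for a 
real query from the workload, and the literal values are placeholders):

.timer on

-- Does the planner actually use the index for a representative query?
EXPLAIN QUERY PLAN
SELECT * FROM Disruptions WHERE status = 1 AND Disruption_id = 12345;

-- Cost side: drop the index, re-run the usual insert batch, compare times...
DROP INDEX Disruptions_Idx5;

-- ...then put it back and refresh statistics before the next candidate.
CREATE INDEX Disruptions_Idx5 ON Disruptions ("status", "Disruption_id",
    "Severity", "levelOfInterest", "category", "subCategory", "version");
ANALYZE;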


Apart from all that... did you add any triggers since the big DB? Which 
thing is re-forming the previously "big" sets of data records into the 
new streamlined set? If this is a Trigger or UDF, does that not eat any 
time?



Cheers,
Ryan

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits

2018-07-29 Thread Rob Willett

John,

Thanks for the prompt reply and a very good question..

We've dropped the indexes and the commit is now very quick, approx two 
seconds


However the overall performance of the run is much the same as other 
areas of the code are now significantly slower, whereas before they were 
quick.


Where were you going with that question?

Thanks

Rob

On 29 Jul 2018, at 10:33, John Found wrote:


What is the performance without the indexes?

On Sun, 29 Jul 2018 10:20:11 +0100
"Rob Willett"  wrote:


Update 1

We've copied the disruptions table to a new table, dropped the old
table, copied the new table back in and recreated all the indexes.

Exactly the sam commit performance.

We've also tracked the -shm and -wal files and they are around 5MB in
size.

Mmmm tricky (as Deepthought said).

Rob

On 29 Jul 2018, at 9:14, Rob Willett wrote:


Hi,

Background

We've been refactoring our database to reduce the size of it. 
Through
some simple logic we've managed to pull out 99% of the data to 
reduce
the size from 51GB down to approx 600MB. This logic has been to 
remove

rows that are almost the same but not quite identical. As with all
things, the thinking was the difficult bit, the execution somewhat
easier.

As part of the testing we've been doing, we've now hit on an odd and
weird problem to do with the COMMIT statement. A commit of a few
hundred (circa 600-800) rows takes approx 7 seconds whereas before 
we

never even noticed it, though we now know it was two seconds before.
Each row is probably 1-2K of data, so its not very much at all.

Details of what we have tried:

1. We've turned synchronous  on and off

PRAGMA synchronous=ON

and thats not made any difference.

2. We are using and have been using WAL mode for years.

PRAGMA journal_mode;
journal_mode
wal

3. We've tested that the server copies OK, we get a consistent
69MB/sec. This is not as fast we would like, but it's the same 
across

all our virtual servers.

4. We've tested the commit on our existing 60GB database and it 
takes
2 seconds, which is longer than we thought it would be. The server 
for

the 60GB database is a large VPS with 8GB/8 cores and runs Ubuntu
14.04. The server we are testing on is a 2GB/2 core test server
running Ubuntu 16.04. Whilst the test server is smaller, we wouldn't
expect it to take 3 times longer to do a commit.

5. The code is identical across the servers. We are running Perl and
the DBI module. The code for doing a commit in Perl::DBI is
 $dbh->do("COMMIT");

  We are getting the expected performance elsewhere on the system 
and

in the code. It's just the commit that is taking a long time.

6. The code we are committing is adding 600-800 lines to a table 
that

used to be 200,000,000 rows in size. It's now 400,000 lines in size.
We are wondering if the deletion of the lines has had an impact we
didn't expect. We have vacuumed and analysed the database.

The schema for the table we insert into is

CREATE TABLE IF NOT EXISTS "Disruptions" (
 "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
 "version" integer NOT NULL,
 "Disruption_id" INTEGER NOT NULL,
 "status" integer NOT NULL,
 "severity" integer NOT NULL,
 "levelOfInterest" integer NOT NULL,
 "category" integer NOT NULL,
 "subCategory" integer NOT NULL,
 "startTime" TEXT NOT NULL,
 "endTime" text NOT NULL,
 "location" integer NOT NULL,
 "corridor" integer NOT NULL,
 "comments" integer NOT NULL,
 "currentUpdate" integer NOT NULL,
 "remarkTime" TEXT NOT NULL,
 "lastModTime" TEXT NOT NULL,
 "CauseAreaPointX" real NOT NULL,
 "CauseAreaPointY" real NOT NULL,
 "Direction" TEXT
);
CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location" COLLATE
NOCASE ASC, "corridor" COLLATE NOCASE ASC, "status" COLLATE NOCASE
ASC, "category" COLLATE NOCASE ASC, "severity" COLLATE NOCASE ASC,
"levelOfInterest" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC,
"subCategory" COLLATE NOCASE ASC, "startTime" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions 
("Disruption_id"

COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category" COLLATE
NOCASE ASC, "subCategory" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version"
COLLATE NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location"
COLLATE NOCASE ASC);
CREATE INDEX Disruptions_Idx5 ON Disruptions ("status",
"Disruption_id", "Severity", "levelOfInterest", "category",
"subCategory", "version");

We have checked that this schema is consistent across the databases.

We're about to recreate the table to see if that makes a difference.

Any help or advice welcomed.

Thanks

Rob
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Very, very slow commits

2018-07-29 Thread John Found
What is the performance without the indexes?

On Sun, 29 Jul 2018 10:20:11 +0100
"Rob Willett"  wrote:

> Update 1
> 
> We've copied the disruptions table to a new table, dropped the old 
> table, copied the new table back in and recreated all the indexes.
> 
> Exactly the sam commit performance.
> 
> We've also tracked the -shm and -wal files and they are around 5MB in 
> size.
> 
> Mmmm tricky (as Deepthought said).
> 
> Rob
> 
> On 29 Jul 2018, at 9:14, Rob Willett wrote:
> 
> > Hi,
> >
> > Background
> >
> > We've been refactoring our database to reduce the size of it. Through 
> > some simple logic we've managed to pull out 99% of the data to reduce 
> > the size from 51GB down to approx 600MB. This logic has been to remove 
> > rows that are almost the same but not quite identical. As with all 
> > things, the thinking was the difficult bit, the execution somewhat 
> > easier.
> >
> > As part of the testing we've been doing, we've now hit on an odd and 
> > weird problem to do with the COMMIT statement. A commit of a few 
> > hundred (circa 600-800) rows takes approx 7 seconds whereas before we 
> > never even noticed it, though we now know it was two seconds before. 
> > Each row is probably 1-2K of data, so its not very much at all.
> >
> > Details of what we have tried:
> >
> > 1. We've turned synchronous  on and off
> >
> > PRAGMA synchronous=ON
> >
> > and thats not made any difference.
> >
> > 2. We are using and have been using WAL mode for years.
> >
> > PRAGMA journal_mode;
> > journal_mode
> > wal
> >
> > 3. We've tested that the server copies OK, we get a consistent 
> > 69MB/sec. This is not as fast we would like, but it's the same across 
> > all our virtual servers.
> >
> > 4. We've tested the commit on our existing 60GB database and it takes 
> > 2 seconds, which is longer than we thought it would be. The server for 
> > the 60GB database is a large VPS with 8GB/8 cores and runs Ubuntu 
> > 14.04. The server we are testing on is a 2GB/2 core test server 
> > running Ubuntu 16.04. Whilst the test server is smaller, we wouldn't 
> > expect it to take 3 times longer to do a commit.
> >
> > 5. The code is identical across the servers. We are running Perl and 
> > the DBI module. The code for doing a commit in Perl::DBI is
> >  $dbh->do("COMMIT");
> >
> >   We are getting the expected performance elsewhere on the system and 
> > in the code. It's just the commit that is taking a long time.
> >
> > 6. The code we are committing is adding 600-800 lines to a table that 
> > used to be 200,000,000 rows in size. It's now 400,000 lines in size. 
> > We are wondering if the deletion of the lines has had an impact we 
> > didn't expect. We have vacuumed and analysed the database.
> >
> > The schema for the table we insert into is
> >
> > CREATE TABLE IF NOT EXISTS "Disruptions" (
> >  "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
> >  "version" integer NOT NULL,
> >  "Disruption_id" INTEGER NOT NULL,
> >  "status" integer NOT NULL,
> >  "severity" integer NOT NULL,
> >  "levelOfInterest" integer NOT NULL,
> >  "category" integer NOT NULL,
> >  "subCategory" integer NOT NULL,
> >  "startTime" TEXT NOT NULL,
> >  "endTime" text NOT NULL,
> >  "location" integer NOT NULL,
> >  "corridor" integer NOT NULL,
> >  "comments" integer NOT NULL,
> >  "currentUpdate" integer NOT NULL,
> >  "remarkTime" TEXT NOT NULL,
> >  "lastModTime" TEXT NOT NULL,
> >  "CauseAreaPointX" real NOT NULL,
> >  "CauseAreaPointY" real NOT NULL,
> >  "Direction" TEXT
> > );
> > CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location" COLLATE 
> > NOCASE ASC, "corridor" COLLATE NOCASE ASC, "status" COLLATE NOCASE 
> > ASC, "category" COLLATE NOCASE ASC, "severity" COLLATE NOCASE ASC, 
> > "levelOfInterest" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, 
> > "subCategory" COLLATE NOCASE ASC, "startTime" COLLATE NOCASE ASC);
> > CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions ("Disruption_id" 
> > COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category" COLLATE 
> > NOCASE ASC, "subCategory" COLLATE NOCASE ASC);
> > CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version" 
> > COLLATE NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location" 
> > COLLATE NOCASE ASC);
> > CREATE INDEX Disruptions_Idx5 ON Disruptions ("status", 
> > "Disruption_id", "Severity", "levelOfInterest", "category", 
> > "subCategory", "version");
> >
> > We have checked that this schema is consistent across the databases.
> >
> > We're about to recreate the table to see if that makes a difference.
> >
> > Any help or advice welcomed.
> >
> > Thanks
> >
> > Rob
> > ___
> > sqlite-users mailing list
> > sqlite-users@mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> 

Re: [sqlite] Very, very slow commits

2018-07-29 Thread Rob Willett

Update 1

We've copied the disruptions table to a new table, dropped the old 
table, copied the new table back in and recreated all the indexes.
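
For reference, the copy/drop/recreate step described above boils down to 
something like this (a sketch only; it reuses the column definitions and one 
of the indexes from the schema posted earlier, and the remaining indexes 
would be recreated the same way):

BEGIN;
ALTER TABLE Disruptions RENAME TO Disruptions_old;

CREATE TABLE "Disruptions" (
 "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
 "version" integer NOT NULL,
 "Disruption_id" INTEGER NOT NULL,
 "status" integer NOT NULL,
 "severity" integer NOT NULL,
 "levelOfInterest" integer NOT NULL,
 "category" integer NOT NULL,
 "subCategory" integer NOT NULL,
 "startTime" TEXT NOT NULL,
 "endTime" text NOT NULL,
 "location" integer NOT NULL,
 "corridor" integer NOT NULL,
 "comments" integer NOT NULL,
 "currentUpdate" integer NOT NULL,
 "remarkTime" TEXT NOT NULL,
 "lastModTime" TEXT NOT NULL,
 "CauseAreaPointX" real NOT NULL,
 "CauseAreaPointY" real NOT NULL,
 "Direction" TEXT
);

INSERT INTO Disruptions SELECT * FROM Disruptions_old;
DROP TABLE Disruptions_old;       -- its old indexes go with it
COMMIT;

CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions ("Disruption_id" 
COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category" COLLATE 
NOCASE ASC, "subCategory" COLLATE NOCASE ASC);
-- ...recreate the remaining indexes here...
ANALYZE;
VACUUM;   -- must run outside the transaction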


Exactly the same commit performance.

We've also tracked the -shm and -wal files and they are around 5MB in 
size.


Mmmm, tricky (as Deep Thought said).

Rob

On 29 Jul 2018, at 9:14, Rob Willett wrote:


Hi,

Background

We've been refactoring our database to reduce the size of it. Through 
some simple logic we've managed to pull out 99% of the data to reduce 
the size from 51GB down to approx 600MB. This logic has been to remove 
rows that are almost the same but not quite identical. As with all 
things, the thinking was the difficult bit, the execution somewhat 
easier.


As part of the testing we've been doing, we've now hit on an odd and 
weird problem to do with the COMMIT statement. A commit of a few 
hundred (circa 600-800) rows takes approx 7 seconds whereas before we 
never even noticed it, though we now know it was two seconds before. 
Each row is probably 1-2K of data, so its not very much at all.


Details of what we have tried:

1. We've turned synchronous  on and off

PRAGMA synchronous=ON

and thats not made any difference.

2. We are using and have been using WAL mode for years.

PRAGMA journal_mode;
journal_mode
wal

3. We've tested that the server copies OK, we get a consistent 
69MB/sec. This is not as fast we would like, but it's the same across 
all our virtual servers.


4. We've tested the commit on our existing 60GB database and it takes 
2 seconds, which is longer than we thought it would be. The server for 
the 60GB database is a large VPS with 8GB/8 cores and runs Ubuntu 
14.04. The server we are testing on is a 2GB/2 core test server 
running Ubuntu 16.04. Whilst the test server is smaller, we wouldn't 
expect it to take 3 times longer to do a commit.


5. The code is identical across the servers. We are running Perl and 
the DBI module. The code for doing a commit in Perl::DBI is

 $dbh->do("COMMIT");

  We are getting the expected performance elsewhere on the system and 
in the code. It's just the commit that is taking a long time.


6. The code we are committing is adding 600-800 lines to a table that 
used to be 200,000,000 rows in size. It's now 400,000 lines in size. 
We are wondering if the deletion of the lines has had an impact we 
didn't expect. We have vacuumed and analysed the database.


The schema for the table we insert into is

CREATE TABLE IF NOT EXISTS "Disruptions" (
 "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
 "version" integer NOT NULL,
 "Disruption_id" INTEGER NOT NULL,
 "status" integer NOT NULL,
 "severity" integer NOT NULL,
 "levelOfInterest" integer NOT NULL,
 "category" integer NOT NULL,
 "subCategory" integer NOT NULL,
 "startTime" TEXT NOT NULL,
 "endTime" text NOT NULL,
 "location" integer NOT NULL,
 "corridor" integer NOT NULL,
 "comments" integer NOT NULL,
 "currentUpdate" integer NOT NULL,
 "remarkTime" TEXT NOT NULL,
 "lastModTime" TEXT NOT NULL,
 "CauseAreaPointX" real NOT NULL,
 "CauseAreaPointY" real NOT NULL,
 "Direction" TEXT
);
CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location" COLLATE 
NOCASE ASC, "corridor" COLLATE NOCASE ASC, "status" COLLATE NOCASE 
ASC, "category" COLLATE NOCASE ASC, "severity" COLLATE NOCASE ASC, 
"levelOfInterest" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, 
"subCategory" COLLATE NOCASE ASC, "startTime" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions ("Disruption_id" 
COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category" COLLATE 
NOCASE ASC, "subCategory" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version" 
COLLATE NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location" 
COLLATE NOCASE ASC);
CREATE INDEX Disruptions_Idx5 ON Disruptions ("status", 
"Disruption_id", "Severity", "levelOfInterest", "category", 
"subCategory", "version");


We have checked that this schema is consistent across the databases.

We're about to recreate the table to see if that makes a difference.

Any help or advice welcomed.

Thanks

Rob
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Very, very slow commits

2018-07-29 Thread Rob Willett

Hi,

Background

We've been refactoring our database to reduce the size of it. Through 
some simple logic we've managed to pull out 99% of the data to reduce 
the size from 51GB down to approx 600MB. This logic has been to remove 
rows that are almost the same but not quite identical. As with all 
things, the thinking was the difficult bit, the execution somewhat 
easier.


As part of the testing we've been doing, we've now hit on an odd and 
weird problem to do with the COMMIT statement. A commit of a few hundred 
(circa 600-800) rows takes approx 7 seconds whereas before we never even 
noticed it, though we now know it was two seconds before. Each row is 
probably 1-2K of data, so it's not very much at all.


Details of what we have tried:

1. We've turned synchronous on and off

PRAGMA synchronous=ON

and that's not made any difference.

2. We are using and have been using WAL mode for years.

PRAGMA journal_mode;
journal_mode
wal

3. We've tested that the server copies OK; we get a consistent 69MB/sec. 
This is not as fast as we would like, but it's the same across all our 
virtual servers.


4. We've tested the commit on our existing 60GB database and it takes 2 
seconds, which is longer than we thought it would be. The server for the 
60GB database is a large VPS with 8GB/8 cores and runs Ubuntu 14.04. The 
server we are testing on is a 2GB/2 core test server running Ubuntu 
16.04. Whilst the test server is smaller, we wouldn't expect it to take 
3 times longer to do a commit.


5. The code is identical across the servers. We are running Perl and the 
DBI module. The code for doing a commit in Perl::DBI is

 $dbh->do("COMMIT");

  We are getting the expected performance elsewhere on the system and 
in the code. It's just the commit that is taking a long time.


6. The code we are committing is adding 600-800 rows to a table that 
used to be 200,000,000 rows in size. It's now 400,000 rows in size. We 
are wondering if the deletion of those rows has had an impact we didn't 
expect. We have vacuumed and analysed the database.


The schema for the table we insert into is

CREATE TABLE IF NOT EXISTS "Disruptions" (
 "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
 "version" integer NOT NULL,
 "Disruption_id" INTEGER NOT NULL,
 "status" integer NOT NULL,
 "severity" integer NOT NULL,
 "levelOfInterest" integer NOT NULL,
 "category" integer NOT NULL,
 "subCategory" integer NOT NULL,
 "startTime" TEXT NOT NULL,
 "endTime" text NOT NULL,
 "location" integer NOT NULL,
 "corridor" integer NOT NULL,
 "comments" integer NOT NULL,
 "currentUpdate" integer NOT NULL,
 "remarkTime" TEXT NOT NULL,
 "lastModTime" TEXT NOT NULL,
 "CauseAreaPointX" real NOT NULL,
 "CauseAreaPointY" real NOT NULL,
 "Direction" TEXT
);
CREATE INDEX "Disruptions_Idx1" ON Disruptions ("location" COLLATE 
NOCASE ASC, "corridor" COLLATE NOCASE ASC, "status" COLLATE NOCASE ASC, 
"category" COLLATE NOCASE ASC, "severity" COLLATE NOCASE ASC, 
"levelOfInterest" COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, 
"subCategory" COLLATE NOCASE ASC, "startTime" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Disruptions_Idx2" ON Disruptions ("Disruption_id" 
COLLATE NOCASE ASC, "version" COLLATE NOCASE ASC, "category" COLLATE 
NOCASE ASC, "subCategory" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Disruptions_Idx3" ON Disruptions ("version" COLLATE 
NOCASE ASC, "Disruption_id" COLLATE NOCASE ASC, "location" COLLATE 
NOCASE ASC);
CREATE INDEX Disruptions_Idx5 ON Disruptions ("status", "Disruption_id", 
"Severity", "levelOfInterest", "category", "subCategory", "version");


We have checked that this schema is consistent across the databases.

We're about to recreate the table to see if that makes a difference.

Any help or advice welcomed.

Thanks

Rob
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users