Re: [sqlite] How to increase performance when inserting a lot of small data into table using indices

2019-09-12 Thread Jens Alfke


> On Sep 12, 2019, at 12:33 PM, mailing lists  wrote:
> 
> the application also runs on mobile devices and the insertion (excluding 
> indexing) also takes a couple of minutes. This means that the user might 
> switch to another app during insertion and - in the worst case - the system 
> simply terminates my backgrounded app. So, with a single transaction, all the 
> inserted data is lost.

[Off-topic]

Your app should not be getting terminated without notice! Use the platform's 
APIs for requesting background time. I'm not an Android programmer, but here's 
what to do on iOS:

When notified the app is being backgrounded, call 
UIApplication.beginBackgroundTask. The OS will give you more time to run. More 
importantly, you'll be notified when the OS wants you to quit, and can clean up 
(e.g. commit the transaction).

To guard against conditions where the OS needs to terminate the app while it's 
in the foreground (low memory or low battery), implement the 
UIApplicationDelegate method applicationWillTerminate and perform the same 
cleanup there.

Note that you won't get notifications like this if your DB processing is 
blocking the main thread's runloop. To prevent that, do the work on a 
background thread, or at least put the runloop in a modal state and 
periodically run it for an iteration.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to increase performance when inserting a lot of small data into table using indices

2019-09-12 Thread mailing lists
Hi,

> On 2019-09-12 at 10:55, Keith Medcalf  wrote:
> 
> 
> On Tuesday, 10 September, 2019 09:26, mailing lists  
> wrote:
> 
>> I cannot really put all the inserts into one transaction because in case of
>> a failure I lose all the already inserted data. 
> 
> Why is that important?  Can't you just load it again from whence it came in 
> the first place on failure?  

the application also runs on mobile devices and the insertion (excluding 
indexing) also takes a couple of minutes. This means that the user might switch 
to another app during insertion and - in the worst case - the system simply 
terminates my backgrounded app. So, with a single transaction, all the inserted 
data is lost. This scenario should be prevented.

> 
>> There is hardly any performance gain anymore when doing 1000 or 10 000
>> insertions in one transaction including immediate insertion into indices
>> (in my case the difference is in the per cent range).
> 
> I find that the difference between (a) "inserting all the data into a table 
> with indexes in a single transaction"; (b) "inserting all the data into a 
> table without indexes in a single transaction and then create the indexes"; 
> and, (c) "within a single transaction drop the indexes, insert all the data, 
> then create the indexes" is:
> 
> (b) is about 10% faster than (a)
> (c) is about 40% faster than (a)
> 
> Smaller batch sizes result in more random I/O, and performance decreases as 
> the batch size decreases.
> 



Re: [sqlite] How to increase performance when inserting a lot of small data into table using indices

2019-09-12 Thread Keith Medcalf

On Tuesday, 10 September, 2019 09:26, mailing lists  
wrote:

>I cannot really put all the inserts into one transaction because in case of
>a failure I lose all the already inserted data. 

Why is that important?  Can't you just load it again from whence it came in 
the first place on failure?  

>There is hardly any performance gain anymore when doing 1000 or 10 000
>insertions in one transaction including immediate insertion into indices
>(in my case the difference is in the per cent range).

I find that the difference between (a) "inserting all the data into a table 
with indexes in a single transaction"; (b) "inserting all the data into a table 
without indexes in a single transaction and then create the indexes"; and, (c) 
"within a single transaction drop the indexes, insert all the data, then create 
the indexes" is:

(b) is about 10% faster than (a)
(c) is about 40% faster than (a)

Smaller batch sizes result in more random I/O, and performance decreases as the 
batch size decreases.
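Keith's comparison can be reproduced with a small benchmark. A rough sketch
using Python's sqlite3 module, comparing strategies (a) and (b); the schema,
row count, and index names are invented for illustration, and an in-memory
database will understate the I/O effect an on-disk file would show:

```python
import random
import sqlite3
import time

def load(index_first, rows):
    """Insert all rows in one transaction; build the index before or after."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE t(a INTEGER, b TEXT)")
    if index_first:
        db.execute("CREATE INDEX t_a ON t(a)")
    start = time.perf_counter()
    with db:  # one transaction for everything, including CREATE INDEX
        db.executemany("INSERT INTO t VALUES(?,?)", rows)
        if not index_first:
            db.execute("CREATE INDEX t_a ON t(a)")
    db.close()
    return time.perf_counter() - start

rows = [(random.randrange(10**9), "x" * 90) for _ in range(100_000)]
t_a = load(True, rows)   # (a) index maintained during insertion
t_b = load(False, rows)  # (b) index created afterwards
print(f"(a) {t_a:.2f}s  (b) {t_b:.2f}s")
```

With an on-disk database and more rows, the gap between (a) and (b) grows in
line with the percentages Keith reports.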

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.





Re: [sqlite] How to increase performance when inserting a lot of small data into table using indices

2019-09-12 Thread Valentin Davydov
On Tue, Sep 10, 2019 at 05:25:38PM +0200, mailing lists wrote:
> Hi,
> 
> I cannot really put all the inserts into one transaction because in case of a 
> failure I lose all the already inserted data. Though I made some tests. 
> There is hardly any performance gain anymore when doing 1000 or 10 000 
> insertions in one transaction including immediate insertion into indices (in 
> my case the difference is in the per cent range).

What do you mean by "to lose data"? Do you need the data to be immediately 
available via SQL, or just written to persistent storage? In the latter case you 
can implement your own data cache, such as sequential log files, which are 
periodically (and/or on demand) rotated and afterwards asynchronously parsed, 
inserted into the SQLite database with an optimized cache_size, transaction 
size, journal mode, etc., and deleted only after a successful commit. Thus you 
shift the burden from SQL to the filesystem, which is less constrained by the 
data structure and may perform better.
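Valentin's staging idea can be sketched roughly as follows; the log-file name,
schema, and record format are invented for illustration, and a real app would
use an on-disk database rather than ":memory:":

```python
import json
import os
import sqlite3

LOG = "pending.log"  # hypothetical staging-file name

def append_record(rec):
    """Cheap append-only persistence; survives app termination."""
    with open(LOG, "a") as f:
        f.write(json.dumps(rec) + "\n")
        f.flush()
        os.fsync(f.fileno())

def drain_log(db):
    """Later, asynchronously: parse the log, bulk-insert, delete after commit."""
    if not os.path.exists(LOG):
        return 0
    with open(LOG) as f:
        rows = [tuple(json.loads(line)) for line in f]
    with db:  # single transaction; the log still holds the data if this fails
        db.executemany("INSERT INTO t VALUES(?,?)", rows)
    os.remove(LOG)  # delete the log only after a successful commit
    return len(rows)

db = sqlite3.connect(":memory:")  # illustrative; use a file in practice
db.execute("CREATE TABLE t(a INTEGER, b TEXT)")
append_record([1, "alpha"])
append_record([2, "beta"])
print(drain_log(db))  # → 2
```

The key property is that a crash between `append_record` and `drain_log` loses
nothing: the records are replayed from the log on the next drain.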

Valentin Davydov.



Re: [sqlite] How to increase performance when inserting a lot of small data into table using indices

2019-09-10 Thread Simon Slavin
On 10 Sep 2019, at 4:02pm, mailing lists  wrote:

> Insertion really slows down after about 100 000 items have been inserted. I 
> suppose that the slow down is related to indexing because:
> 
> [...]
> c) changing the cache size has only a minor impact

SQLite speed does degrade with table size, but it degrades gracefully.  
Depending on which OS you're using, you have at least three caches in play.  A 
sudden change in speed indicates that you have just busted one of those caches, as you 
wrote.  But in this case it's probably the OS's memory cache assigned to your 
program.


Re: [sqlite] How to increase performance when inserting a lot of small data into table using indices

2019-09-10 Thread Keith Medcalf

On Tuesday, 10 September, 2019 09:26, mailing lists  
wrote:

>I cannot really put all the inserts into one transaction because in case of
>a failure I lose all the already inserted data. Though I made some tests.
>There is hardly any performance gain anymore when doing 1000 or 10 000
>insertions in one transaction including immediate insertion into indices
>(in my case the difference is in the per cent range).

>Why is index creation so much faster than insertion using indices? Once I
>heard something about fragmentation, but on solid-state disks fragmentation
>should not play a role as long as indices and data are separated, should it?

When you "create" an index as a single operation, you scan the table to collect 
the key data, do a sort, insert the keys into the B-Tree in order, and then 
write out the entire tree all at once (it is a single transaction).

When the index already exists, you have to "fiddle" with the B-Tree for each 
record because it is no longer being built in order.  You have to split and 
combine pages and shuffle the data about as each record is inserted into the 
index.  The more records you can insert per transaction, the less I/O will be 
required (provided you have a big enough cache).

If you can pre-sort the records so that they are inserted in-order and use a 
relatively large number of records inserted per transaction, then there will be 
very little difference between the two.  Of course, if you have multiple 
indexes then pre-sorting into the most efficient insertion order is a high art 
(and often not entirely possible anyway).
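The pre-sorting idea can be sketched like this; the schema is invented, and only
a single index is shown, where sorting into insertion order is straightforward:

```python
import random
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t(a INTEGER, b TEXT)")
db.execute("CREATE INDEX t_a ON t(a)")

rows = [(random.randrange(10**9), "payload") for _ in range(50_000)]
rows.sort(key=lambda r: r[0])  # pre-sort so index insertion is in key order

BATCH = 10_000  # relatively large batches: fewer transactions, less I/O
for i in range(0, len(rows), BATCH):
    with db:  # one transaction per batch
        db.executemany("INSERT INTO t VALUES(?,?)", rows[i:i + BATCH])

print(db.execute("SELECT count(*) FROM t").fetchone()[0])  # → 50000
```

With more than one index, as Keith notes, no single sort order can match every
index, so this only fully helps the index whose key order you sorted by.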

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.






Re: [sqlite] How to increase performance when inserting a lot of small data into table using indices

2019-09-10 Thread Richard Hipp
On 9/10/19, mailing lists  wrote:

> Why is index creation so much faster than insertion using indices?

Indexes are maintained in key order.  So inserts are happening at
random spots all across the index.  For each insert, the system has to
(1) read a 4096-byte page, (2) update the 10 or 20 bytes corresponding
to the index entry, and finally (3) write the 4096-byte page.

If you keep pages in cache, you might get lucky and update two or three
entries on each page before you have to write it back.  But even then,
you are doing a lot of I/O relative to the amount of content you are
changing.  Your I/O is hundreds of times more than the amount of
content being updated.

CREATE INDEX does an external merge sort on the keys, such that keys
are always written in ascending order.  Hence, the pattern is (1) fill
up a 4096-byte page with new entries, then (2) write the page.  The
I/O to content change ratio is now close to 1.0, which is what you
want.  About a hundred times less I/O than random inserts.

The fact that all pages are written in order also helps, as
filesystems tend to be optimized for that case.
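The "hundreds of times" figure follows directly from the numbers in the post:
changing one 10-20 byte entry costs a full page read plus a full page write.

```python
PAGE_SIZE = 4096   # bytes read and then written per touched page
ENTRY_SIZE = 20    # upper end of the 10-20 byte index entry

# Random insertion: read a whole page, change ~20 bytes, write the whole page.
amplification = (2 * PAGE_SIZE) / ENTRY_SIZE
print(amplification)  # → 409.6, i.e. hundreds of times more I/O than content
```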

-- 
D. Richard Hipp
d...@sqlite.org


Re: [sqlite] How to increase performance when inserting a lot of small data into table using indices

2019-09-10 Thread mailing lists
Hi,

I cannot really put all the inserts into one transaction because in case of a 
failure I lose all the already inserted data. Though I made some tests. There 
is hardly any performance gain anymore when doing 1000 or 10 000 insertions in 
one transaction including immediate insertion into indices (in my case the 
difference is in the per cent range).

Why is index creation so much faster than insertion using indices? Once I heard 
something about fragmentation, but on solid-state disks fragmentation should not 
play a role as long as indices and data are separated, should it?

Regards,
Hartwig


> On 2019-09-10 at 17:16, Richard Hipp  wrote:
> 
> On 9/10/19, mailing lists  wrote:
> 
>> So, the best solution I found so far is to disable indexing during insertion
>> and to index the table afterwards
> 
> I think that is the best solution.  Be sure to also do all of your
> inserts (and the CREATE INDEX statements) inside of a transaction.
> 
> -- 
> D. Richard Hipp
> d...@sqlite.org



Re: [sqlite] How to increase performance when inserting a lot of small data into table using indices

2019-09-10 Thread Richard Hipp
On 9/10/19, mailing lists  wrote:

> So, the best solution I found so far is to disable indexing during insertion
> and to index the table afterwards

I think that is the best solution.  Be sure to also do all of your
inserts (and the CREATE INDEX statements) inside of a transaction.
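Richard's recommendation - all inserts and the CREATE INDEX statements inside
one transaction - looks like this in outline; a sketch with an invented schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t(a INTEGER, b TEXT)")

rows = [(i, "x" * 90) for i in range(100_000)]

with db:  # everything, including CREATE INDEX, in a single transaction
    db.executemany("INSERT INTO t VALUES(?,?)", rows)
    db.execute("CREATE INDEX t_a ON t(a)")
    db.execute("CREATE INDEX t_b ON t(b)")

print(db.execute("SELECT count(*) FROM t").fetchone()[0])  # → 100000
```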

-- 
D. Richard Hipp
d...@sqlite.org


Re: [sqlite] How to increase performance when inserting a lot of small data into table using indices

2019-09-10 Thread Rob Willett

Hartwig,

You have most of the tricks we know about covered. Other, more experienced 
developers may provide better insight.


We had to move about 60 GB of table data and we ended up doing what you have 
done, with one extra bit: we batched the jobs up in multiples of 10,000 between 
BEGIN and END to make transactions out of them. It's not clear if you are doing 
that.
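Rob's batching scheme can be sketched with explicit BEGIN/COMMIT statements;
the schema and row count are invented for illustration:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t(a INTEGER, b TEXT)")

BATCH = 10_000
db.isolation_level = None  # manage transactions by hand
cur = db.cursor()
cur.execute("BEGIN")
for i in range(60_000):
    cur.execute("INSERT INTO t VALUES(?,?)", (i, "payload"))
    if (i + 1) % BATCH == 0:  # COMMIT and start a new transaction every 10,000
        cur.execute("COMMIT")
        cur.execute("BEGIN")
cur.execute("COMMIT")

print(cur.execute("SELECT count(*) FROM t").fetchone()[0])  # → 60000
```

A crash mid-run then loses at most one batch of 10,000 rows rather than the
whole load, which is the trade-off being discussed in this thread.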


Rob

On 10 Sep 2019, at 16:02, mailing lists wrote:


I have the following situation:

- in one table relatively small data is inserted (100 bytes per record)
- this table contains three indices
- about 100 million or more records have to be inserted

Insertion really slows down after about 100 000 items have been inserted. I 
suppose that the slowdown is related to indexing because:

a) removing the indices brings the speed up
b) it does not matter whether using a solid-state drive or a conventional one 
(the overall speed differs but not the phenomenon itself)
c) changing the cache size has only a minor impact

So, the best solution I found so far is to disable indexing during insertion 
and to index the table afterwards (this is orders of magnitude faster than 
insertion with indexes). Are there any better solutions or other tricks I might 
try (splitting the table into a data part and an index part)?

BTW: I am using journal_mode DELETE. WAL mode only delays the problem and 
increases the speed a bit, but not significantly.

Regards,
Hartwig






[sqlite] How to increase performance when inserting a lot of small data into table using indices

2019-09-10 Thread mailing lists
I have the following situation:

- in one table relatively small data is inserted (100 bytes per record)
- this table contains three indices
- about 100 million or more records have to be inserted

Insertion really slows down after about 100 000 items have been inserted. I 
suppose that the slowdown is related to indexing because:

a) removing the indices brings the speed up
b) it does not matter whether using a solid-state drive or a conventional one 
(the overall speed differs but not the phenomenon itself)
c) changing the cache size has only a minor impact

So, the best solution I found so far is to disable indexing during insertion 
and to index the table afterwards (this is orders of magnitude faster than 
insertion with indexes). Are there any better solutions or other tricks I might 
try (splitting the table into a data part and an index part)?

BTW: I am using journal_mode DELETE. WAL mode only delays the problem and 
increases the speed a bit, but not significantly.
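The journal mode and cache size discussed here are set via PRAGMAs; a sketch
of a typical bulk-load configuration (the values are illustrative, and
`synchronous = OFF` deliberately trades durability for speed):

```python
import sqlite3

db = sqlite3.connect(":memory:")  # illustrative; use a file path in practice
# Typical bulk-load tuning; each setting trades something for speed.
db.execute("PRAGMA journal_mode = WAL")   # or DELETE, as in the post
db.execute("PRAGMA synchronous = OFF")    # risks corruption on power loss
db.execute("PRAGMA cache_size = -65536")  # negative = size in KiB (64 MiB)
print(db.execute("PRAGMA cache_size").fetchone()[0])  # → -65536
```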

Regards,
Hartwig


