Re: [sqlite] Query preparation time does not scale linearly with growth of no. of tables
From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Clemens Ladisch [clem...@ladisch.de]
Sent: Wednesday, September 11, 2013 18:57
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Query preparation time does not scale linearly with growth of no. of tables

>Harmen de Jong - CoachR Group B.V. wrote:
>> http://www.coachrdevelopment.com/share/callstack_tree.html
>>
>> This shows most time is spent on sqlite3CodeRowTriggerDirect.
>
>I'd guess the actual culprit is the loop in getRowTrigger (which does
>not show up because it is inlined):
>
>  /* It may be that this trigger has already been coded (or is in the
>  ** process of being coded). If this is the case, then an entry with
>  ** a matching TriggerPrg.pTrigger field will be present somewhere
>  ** in the Parse.pTriggerPrg list. Search for such an entry. */
>  for(pPrg=pRoot->pTriggerPrg;
>      pPrg && (pPrg->pTrigger!=pTrigger || pPrg->orconf!=orconf);
>      pPrg=pPrg->pNext
>  );

We have put a timer around this 'inlined' function and indeed, as you suggest, it is causing a huge part of the 'overhead': this specific code takes 45.28% of the total time. What it does is keep a list (Parse::pTriggerPrg) of trigger programs that have already been created, and every time a trigger program is created, it searches this list to see whether it already exists. Obviously this list becomes longer as the foreign keys are looped through. This means our earlier assumption that the increase was polynomial because of two nested loops seems to be wrong. So after improving this we still have to find about another 25% ;-).

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Query preparation time does not scale linearly with growth of no. of tables
On 12 Sep 2013, at 07:20, "James K. Lowden" <jklow...@schemamania.org> wrote:

> On Tue, 10 Sep 2013 12:58:21 +0000, Harmen de Jong - CoachR Group B.V. <har...@coachr.com> wrote:
>> I think the way I wrote our timings was not that clear, since they are definitely exponential. The numbers from my previous post refer to the multiplier between the test cases. Just to make it clear, here follow the same tests, but expressed in msec of total time per test.
>>
>> 500 tables - 10 msec in total
>> 1000 tables - 25 msec in total
>> 5000 tables - 298 msec in total
>> 10000 tables - 985 msec in total
>
> I don't know what you mean by "exponentially".
>
> 500 .020 ms/table
> 1000 .025 ms/table
> 5000 .0596 ms/table
> 10000 .0985 ms/table
>
> Linearly, I'd say. It may help to look at it graphically.

Well, actually it is neither (calling it exponential was a mistake on my side). The increase in time is polynomial, where we would expect an increase that is more or less linear.

> http://www.schemamania.org/sqlite/graph.pdf
>
>> we cannot find anything in there that would explain an exponential growth in time.
>
> I doubt you will.

Well, the non-linear (polynomial) increase in time is there, so sooner or later we will find an explanation. In the meantime we have already traced it down further by doing some profiling. You will find the result of this profiling (where time per function is expressed as a percentage of the total time taken) here: http://www.coachrdevelopment.com/share/callstack_tree.html

This shows most time is spent in sqlite3CodeRowTriggerDirect (the second one, where it loops through FKs that point to B). However, the question why this piece of code causes a polynomial increase still remains. The next question is whether it can be improved for use cases with a large number of tables.
Re: [sqlite] Query preparation time does not scale linearly with growth of no. of tables
On Tue, 10 Sep 2013 12:58:21 +0000, Harmen de Jong - CoachR Group B.V. wrote:

> I think the way I wrote our timings was not that clear, since they
> are definitely exponential. The numbers from my previous post refer
> to the multiplier between the test cases. Just to make it clear, here
> follow the same tests, but expressed in msec of total time per test.
>
> 500 tables - 10 msec in total
> 1000 tables - 25 msec in total
> 5000 tables - 298 msec in total
> 10000 tables - 985 msec in total

I don't know what you mean by "exponentially".

500 .020 ms/table
1000 .025 ms/table
5000 .0596 ms/table
10000 .0985 ms/table

Linearly, I'd say. It may help to look at it graphically.

http://www.schemamania.org/sqlite/graph.pdf

> we cannot find anything in there that would explain an exponential
> growth in time.

I doubt you will.

--jkl
Re: [sqlite] Query preparation time does not scale linearly with growth of no. of tables
Harmen de Jong - CoachR Group B.V. wrote:
> http://www.coachrdevelopment.com/share/callstack_tree.html
>
> This shows most time is spent on sqlite3CodeRowTriggerDirect.

I'd guess the actual culprit is the loop in getRowTrigger (which does not show up because it is inlined):

  /* It may be that this trigger has already been coded (or is in the
  ** process of being coded). If this is the case, then an entry with
  ** a matching TriggerPrg.pTrigger field will be present somewhere
  ** in the Parse.pTriggerPrg list. Search for such an entry. */
  for(pPrg=pRoot->pTriggerPrg;
      pPrg && (pPrg->pTrigger!=pTrigger || pPrg->orconf!=orconf);
      pPrg=pPrg->pNext
  );

Regards, Clemens
Re: [sqlite] Query preparation time does not scale linearly with growth of no. of tables
To get rid of the question of WHERE exactly the time is consumed, we did some profiling on the application that ran the query (using the 10000-tables test DB). As a result, you will find an overview of the time consumed per function (shown as a percentage of the total time) at this link: http://www.coachrdevelopment.com/share/callstack_tree.html

This shows most time is spent in sqlite3CodeRowTriggerDirect. Now the questions remain whether this function is causing the polynomial increase in time, if so WHY, and whether any optimizations are possible. We don't see it yet. Looking forward to any suggestions.

Best regards, Harmen
Re: [sqlite] Query preparation time does not scale linearly with growth of no. of tables
On 10 Sep 2013, at 21:24, "E.Pasma" <pasm...@concepts.nl> wrote:

> My suppositions that the time was spent in the execute step and that this has been fixed in the new release appeared both wrong. Thus I may be wrong again, but I think I have an explanation now.
> It is as Simon guesses: a list of lists is being scanned. It is however not the contents of tables being scanned, but the list of foreign key constraints as loaded into memory when a database is opened. When preparing a DELETE statement, the global list of FKs is scanned to see if the current table is referred to. This is the outer loop. If a referring FK is found, and if it has an ON DELETE clause, there comes an inner loop over the same global list to see if the referrer is itself referred to. In the case that every table has such a constraint, as is the case here, the time becomes n * n. If I'm right, this is hard to fix and inherent to the representation of the database schema in memory.
>
> This also means that if you leave out the cascading delete from the constraints, the time becomes linear. Actually that is what I observed before coming up with the above explanation. This was easy to check by extracting the schemas from the test databases and removing ON .. CASCADE. Thanks for making these databases available.

To get rid of the question of WHERE exactly the time is consumed, we did some profiling on the application that ran the query (using the 10000-tables test DB). As a result, you will find an overview of the time consumed per function (shown as a percentage of the total time) at this link: http://www.coachrdevelopment.com/share/callstack_tree.html

This shows most time is spent in sqlite3CodeRowTriggerDirect. Now the questions remain whether this function is causing the polynomial increase in time, if so WHY, and whether any optimizations are possible. We don't see it yet. Looking forward to any suggestions.
Best regards, Harmen
Re: [sqlite] Query preparation time does not scale linearly with growth of no. of tables
On 10 Sep 2013, at 21:24, "E.Pasma" wrote:

> My suppositions that the time was spent in the execute step and that this has been fixed in the new release appeared both wrong. Thus I may be wrong again, but I think I have an explanation now.
> It is as Simon guesses: a list of lists is being scanned. It is however not the contents of tables being scanned, but the list of foreign key constraints as loaded into memory when a database is opened. When preparing a DELETE statement, the global list of FKs is scanned to see if the current table is referred to. This is the outer loop. If a referring FK is found, and if it has an ON DELETE clause, there comes an inner loop over the same global list to see if the referrer is itself referred to. In the case that every table has such a constraint, as is the case here, the time becomes n * n. If I'm right, this is hard to fix and inherent to the representation of the database schema in memory.
>
> This also means that if you leave out the cascading delete from the constraints, the time becomes linear. Actually that is what I observed before coming up with the above explanation. This was easy to check by extracting the schemas from the test databases and removing ON .. CASCADE. Thanks for making these databases available.

Your suggestion that, when preparing a DELETE statement, the global list of FKs is scanned to see if the current table is referred to seems to be wrong, as far as we could find. From sqlite3FkActions the method sqlite3FkReferences(pTab) is called; this method uses a hash table (pTab->pSchema->fkeyHash) to know immediately which FKs refer to the given table, without having to loop through a global list of FKs.
Re: [sqlite] Query preparation time does not scale linearly with growth of no. of tables
On 10 Sep 2013, at 21:24, "E.Pasma" wrote:

> On 10 Sep 2013, at 19:48, Simon Slavin wrote:
>>
>> On 10 Sep 2013, at 4:15pm, Harmen de Jong - CoachR Group B.V. wrote:
>>
>>> That is something we suspected too. We already made some tests where we timed the time needed for all memory allocations executed in the entire operation. In total, for the 10000-tables test, this was somewhere around 25 msec. Since this is just a little overhead, and the instructions as you point out increase linearly, this still does not explain the polynomial increase in preparation time.
>>
>> Polynomial time would be explained if the software is scanning every item in a list of lists for each table mentioned. So a guess about what's happening is that for each DELETE FROM instruction the software has to look up the entry being deleted, and to check that none of the items in any of the other tables refers to it.
>>
>> This makes sense given how FOREIGN KEYs are implemented in SQLite: there is no master 'check-for-changes' table, but each table which has the foreign key has to be scanned. It is irrelevant that there are no entries in any of these tables: the table (actually, the index) still has to be found so that SQLite can figure out that there are no entries in it. Because SQLite hashes table names, rather than scanning a simple list, it is scanning a list of lists. Hence, polynomial time.
>>
>> I see that E. Pasma has posted that the long scanning time has been fixed in a later release. Good.
>>
>> Simon
>
> My suppositions that the time was spent in the execute step and that this has been fixed in the new release appeared both wrong. Thus I may be wrong again, but I think I have an explanation now.
> It is as Simon guesses: a list of lists is being scanned. It is however not the contents of tables being scanned, but the list of foreign key constraints as loaded into memory when a database is opened. When preparing a DELETE statement, the global list of FKs is scanned to see if the current table is referred to. This is the outer loop. If a referring FK is found, and if it has an ON DELETE clause, there comes an inner loop over the same global list to see if the referrer is itself referred to. In the case that every table has such a constraint, as is the case here, the time becomes n * n. If I'm right, this is hard to fix and inherent to the representation of the database schema in memory.
>
> This also means that if you leave out the cascading delete from the constraints, the time becomes linear. Actually that is what I observed before coming up with the above explanation. This was easy to check by extracting the schemas from the test databases and removing ON .. CASCADE. Thanks for making these databases available.

What you are supposing makes sense. Actually, we have already been searching for two loops that are related to each other, since that would explain the polynomial growth, but thus far we cannot find them (we are not yet that familiar with the source code of SQLite). Could you point us to the code where these inner and outer loops are done? The next challenge would of course be to find a solution. Actually, we already found a similar optimization last week that had quite some impact with this many tables; we provided it to the developers and it has been added to the fork right now ;-). Especially since these are quite exceptional use cases, there might be a chance to do some optimizations that seem useless for more normal usage of SQLite. No thanks needed for making these databases available — we would like to thank every one of you who helps point us in the right direction in this thread.
Re: [sqlite] Query preparation time does not scale linearly with growth of no. of tables
On 10 Sep 2013, at 19:48, Simon Slavin wrote:

> On 10 Sep 2013, at 4:15pm, Harmen de Jong - CoachR Group B.V. wrote:
>
>> That is something we suspected too. We already made some tests where we timed the time needed for all memory allocations executed in the entire operation. In total, for the 10000-tables test, this was somewhere around 25 msec. Since this is just a little overhead, and the instructions as you point out increase linearly, this still does not explain the polynomial increase in preparation time.
>
> Polynomial time would be explained if the software is scanning every item in a list of lists for each table mentioned. So a guess about what's happening is that for each DELETE FROM instruction the software has to look up the entry being deleted, and to check that none of the items in any of the other tables refers to it.
>
> This makes sense given how FOREIGN KEYs are implemented in SQLite: there is no master 'check-for-changes' table, but each table which has the foreign key has to be scanned. It is irrelevant that there are no entries in any of these tables: the table (actually, the index) still has to be found so that SQLite can figure out that there are no entries in it. Because SQLite hashes table names, rather than scanning a simple list, it is scanning a list of lists. Hence, polynomial time.
>
> I see that E. Pasma has posted that the long scanning time has been fixed in a later release. Good.
>
> Simon

My suppositions that the time was spent in the execute step and that this has been fixed in the new release appeared both wrong. Thus I may be wrong again, but I think I have an explanation now.

It is as Simon guesses: a list of lists is being scanned. It is however not the contents of tables being scanned, but the list of foreign key constraints as loaded into memory when a database is opened. When preparing a DELETE statement, the global list of FKs is scanned to see if the current table is referred to. This is the outer loop. If a referring FK is found, and if it has an ON DELETE clause, there comes an inner loop over the same global list to see if the referrer is itself referred to. In the case that every table has such a constraint, as is the case here, the time becomes n * n. If I'm right, this is hard to fix and inherent to the representation of the database schema in memory.

This also means that if you leave out the cascading delete from the constraints, the time becomes linear. Actually that is what I observed before coming up with the above explanation. This was easy to check by extracting the schemas from the test databases and removing ON .. CASCADE. Thanks for making these databases available.
Re: [sqlite] Query preparation time does not scale linearly with growth of no. of tables
On 10 Sep 2013, at 4:15pm, Harmen de Jong - CoachR Group B.V. wrote:

> That is something we suspected too. We already made some tests where we timed the time needed for all memory allocations executed in the entire operation. In total, for the 10000-tables test, this was somewhere around 25 msec. Since this is just a little overhead, and the instructions as you point out increase linearly, this still does not explain the polynomial increase in preparation time.

Polynomial time would be explained if the software is scanning every item in a list of lists for each table mentioned. So a guess about what's happening is that for each DELETE FROM instruction the software has to look up the entry being deleted, and to check that none of the items in any of the other tables refers to it.

This makes sense given how FOREIGN KEYs are implemented in SQLite: there is no master 'check-for-changes' table, but each table which has the foreign key has to be scanned. It is irrelevant that there are no entries in any of these tables: the table (actually, the index) still has to be found so that SQLite can figure out that there are no entries in it. Because SQLite hashes table names, rather than scanning a simple list, it is scanning a list of lists. Hence, polynomial time.

I see that E. Pasma has posted that the long scanning time has been fixed in a later release. Good.

Simon.
Re: [sqlite] Query preparation time does not scale linearly with growth of no. of tables
On 10 Sep 2013, at 16:44, "E.Pasma" wrote:

> On 10 Sep 2013, at 16:36, Harmen de Jong - CoachR Group B.V. wrote:
>
>> On 10 Sep 2013, at 16:16, "E.Pasma" wrote:
>>
>>> On 10 Sep 2013, at 11:37, Harmen de Jong - CoachR Group B.V. wrote:
>>>> I included 5 databases that we used for testing in this link: http://wikisend.com/download/570088/test_databases.zip
>>>>
>>>> The query performed on these databases is:
>>>> delete from A where id=1;
>>>
>>> I could not resist trying this but the tables in the databases appear all empty and I get negligible timings. Do I need to run an insert script first?
>>
>> No, it is all about preparing, so there is no need to insert data. When we perform the query "delete from A where id=1;" on the databases from the zip file, we get the following timings:
>>
>> 500 tables - 10 msec in total
>> 1000 tables - 25 msec in total
>> 5000 tables - 298 msec in total
>> 10000 tables - 985 msec in total
>>
>> What we would indeed expect is to have negligible timings. Perhaps our interpretation of negligible differs? Could you please tell me what your timing is on the 10000-tables database?
>>
>> We used the .timer ON command in the SQLite command shell utility as well as timing from our own application that has SQLite integrated.
>
> Clear, and after downgrading to SQLite version 3.7.7.1 I got exactly your timings.
> But with SQLite version 3.7.17 or 3.8.0.1 the timings are negligible.
> Hope this is good news.

Unfortunately not; I forgot to mention that foreign keys have to be turned on, since they are off by default in the SQLite command shell utility, and we found the time-consuming operations are located somewhere in sqlite3FkActions. Could you please test again with "PRAGMA foreign_keys=1;"? Then the time increase is no longer linear.
Re: [sqlite] Query preparation time does not scale linearly with growth of no. of tables
On 10 Sep 2013, at 16:36, Harmen de Jong - CoachR Group B.V. wrote:

> On 10 Sep 2013, at 16:16, "E.Pasma" wrote:
>
>> On 10 Sep 2013, at 11:37, Harmen de Jong - CoachR Group B.V. wrote:
>>> I included 5 databases that we used for testing in this link: http://wikisend.com/download/570088/test_databases.zip
>>>
>>> The query performed on these databases is:
>>> delete from A where id=1;
>>
>> I could not resist trying this but the tables in the databases appear all empty and I get negligible timings. Do I need to run an insert script first?
>
> No, it is all about preparing, so there is no need to insert data. When we perform the query "delete from A where id=1;" on the databases from the zip file, we get the following timings:
>
> 500 tables - 10 msec in total
> 1000 tables - 25 msec in total
> 5000 tables - 298 msec in total
> 10000 tables - 985 msec in total
>
> What we would indeed expect is to have negligible timings. Perhaps our interpretation of negligible differs? Could you please tell me what your timing is on the 10000-tables database?
>
> We used the .timer ON command in the SQLite command shell utility as well as timing from our own application that has SQLite integrated.

Clear, and after downgrading to SQLite version 3.7.7.1 I got exactly your timings.
But with SQLite version 3.7.17 or 3.8.0.1 the timings are negligible.
Hope this is good news.
Re: [sqlite] Query preparation time does not scale linearly with growth of no. of tables
>>> I included 5 databases that we used for testing in this link: http://wikisend.com/download/570088/test_databases.zip
>>>
>>> The query performed on these databases is:
>>> delete from A where id=1;
>>
>> I could not resist trying this but the tables in the databases appear all empty and I get negligible timings. Do I need to run an insert script first?
>
> No, it is all about preparing, so there is no need to insert data. When we perform the query "delete from A where id=1;" on the databases from the zip file, we get the following timings:
>
> 500 tables - 10 msec in total
> 1000 tables - 25 msec in total
> 5000 tables - 298 msec in total
> 10000 tables - 985 msec in total
>
> What we would indeed expect is to have negligible timings. Perhaps our interpretation of negligible differs? Could you please tell me what your timing is on the 10000-tables database?
>
> We used the .timer ON command in the SQLite command shell utility as well as timing from our own application that has SQLite integrated.

The vdbe program is 20 times larger for 10000 tables versus 500 tables (400K instructions for the former versus 20K for the latter), and this appears to be where the time is spent -- generating all the foreign key checking and trigger code. Perhaps the memory allocator has issues when building such a huge vdbe set (I haven't looked at the code, but is it perchance doing a lot of realloc's as more triggers are added?)
Re: [sqlite] Query preparation time does not scale linearly with growth of no. of tables
Sent from my iPad

On 10 Sep 2013, at 17:04, "Keith Medcalf" wrote:

>>> No, it is all about preparing, so there is no need to insert data. When we perform the query "delete from A where id=1;" on the databases from the zip file, we get the following timings:
>>>
>>> 500 tables - 10 msec in total
>>> 1000 tables - 25 msec in total
>>> 5000 tables - 298 msec in total
>>> 10000 tables - 985 msec in total
>>>
>>> What we would indeed expect is to have negligible timings. Perhaps our interpretation of negligible differs? Could you please tell me what your timing is on the 10000-tables database?
>>>
>>> We used the .timer ON command in the SQLite command shell utility as well as timing from our own application that has SQLite integrated.
>
> The vdbe program is 20 times larger for 10000 tables versus 500 tables (400K instructions for the former versus 20K for the latter), and this appears to be where the time is spent -- generating all the foreign key checking and trigger code. Perhaps the memory allocator has issues when building such a huge vdbe set (I haven't looked at the code, but is it perchance doing a lot of realloc's as more triggers are added?)

That is something we suspected too. We already made some tests where we timed the time needed for all memory allocations executed in the entire operation. In total, for the 10000-tables test, this was somewhere around 25 msec. Since this is just a little overhead, and the instructions as you point out increase linearly, this still does not explain the polynomial increase in preparation time.
Re: [sqlite] Query preparation time does not scale linearly with growth of no. of tables
On 10 Sep 2013, at 16:16, "E.Pasma" wrote:

> On 10 Sep 2013, at 11:37, Harmen de Jong - CoachR Group B.V. wrote:
>> I included 5 databases that we used for testing in this link: http://wikisend.com/download/570088/test_databases.zip
>>
>> The query performed on these databases is:
>> delete from A where id=1;
>
> I could not resist trying this but the tables in the databases appear all empty and I get negligible timings. Do I need to run an insert script first?

No, it is all about preparing, so there is no need to insert data. When we perform the query "delete from A where id=1;" on the databases from the zip file, we get the following timings:

500 tables - 10 msec in total
1000 tables - 25 msec in total
5000 tables - 298 msec in total
10000 tables - 985 msec in total

What we would indeed expect is to have negligible timings. Perhaps our interpretation of negligible differs? Could you please tell me what your timing is on the 10000-tables database?

We used the .timer ON command in the SQLite command shell utility as well as timing from our own application that has SQLite integrated.
Re: [sqlite] Query preparation time does not scale linearly with growth of no. of tables
On 10 Sep 2013, at 11:37, Harmen de Jong - CoachR Group B.V. wrote:

> I included 5 databases that we used for testing in this link: http://wikisend.com/download/570088/test_databases.zip
>
> The query performed on these databases is:
> delete from A where id=1;

I could not resist trying this but the tables in the databases appear all empty and I get negligible timings. Do I need to run an insert script first?
Re: [sqlite] Query preparation time does not scale linearly with growth of no. of tables
On 10 Sep 2013, at 15:41, "Igor Tandetnik" wrote:

> Not exponential - polynomial. Between 500 and 10000 the size of the input increases x20, so a time increase of x400 would be consistent with a quadratic algorithm. Your observed measurements are even better than that.

You're right, thanks for correcting me. However, our basic issue stays the same: we would expect a linear increase while the increase is actually polynomial, and we still can't figure out why the time does not scale linearly in this case.
Re: [sqlite] Query preparation time does not scale linearly with growth of no. of tables
On 9/10/2013 5:37 AM, Harmen de Jong - CoachR Group B.V. wrote:

> The time factors it takes on each database are as follows (where the time needed for the 500 tables was taken as the starting point to calculate the other factors):
>
> 500 tables - 1x
> 1000 tables - 2.5x
> 5000 tables - 29x
> 10000 tables - 98x
>
> As you can see this is an exponential growth in the time it takes to execute the query.

Not exponential - polynomial. Between 500 and 10000 the size of the input increases x20, so a time increase of x400 would be consistent with a quadratic algorithm. Your observed measurements are even better than that.

Exponential means adding each one new table would cause the running time to be multiplied by some factor. You certainly don't have it *that* bad.
--
Igor Tandetnik
Re: [sqlite] Query preparation time does not scale linearly with growth of no. of tables
On 10 Sep 2013, at 14:43, "E.Pasma" wrote:

> The timings do not look truly exponential to me. It looks more as if there is a graduated charge (NL: staffeltoeslag) on the time per table. For instance:
> tables 1 - 500: 2 msec/table
> tables 501 - 1000: 3 msec/table
> tables 1001 - 5000: 5 msec/table
> tables 5001 - 10000: 10 msec/table
> It could be that there is no further extra charge above 10000 tables. Is it worth trying that, or did you do that already?

I'm sorry, but I think the way I wrote our timings was not that clear, since they are definitely exponential. The numbers from my previous post refer to the multiplier between the test cases. Just to make it clear, here follow the same tests, but expressed in msec of total time per test.

500 tables - 10 msec in total
1000 tables - 25 msec in total
5000 tables - 298 msec in total
10000 tables - 985 msec in total

> Your last observation, that the time is spent in sqlite3FkActions, means that this is not in query preparation but in execution (if I understand the code right). That is understandable, because the engine needs to perform a vast amount of internal queries to check all the related tables.

sqlite3FkActions is part of the call stack created by sqlite3_prepare_v2, so I think it actually is part of the prepare. The execution time of sqlite3_step is actually negligible (< 1 msec) in our test cases, because nothing is actually deleted; all tables are empty in the test case.

> This leaves room for the hypothesis that the observed increase is a matter of caching. For instance, the SQLite default page cache may just fit 2000 tables but not 10000. It may be worth trying an increased cache_size.

We already have a very high cache_size set in our test cases: 819200, so I think that can be ruled out too.
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
On 10 Sep 2013, at 11:37, Harmen de Jong - CoachR Group B.V. wrote:

> On 9 sep. 2013, at 22:11, "E.Pasma" wrote:
>> Ha, I did not mean the length of the names but the length of the hash table
>> (NL: klutstabel), that is, the number of buckets over which the hash values
>> are distributed. I looked some further in the code and now believe that this
>> is 1024. That is still generous for a normal database, but with 10.000
>> tables you will have about 10 table names in each bucket. So the engine
>> will need to do a lot of searching within each bucket. That involves
>> checking the name of each item in the bucket against the searched name.
>
> We've looked into that code. Actually the 1024 is not the number of buckets used for the hash table, but the total size. So we did some tests, changing the 1024 bytes into a value large enough to store the hash table for 10K tables, but unfortunately that gave a performance increase of 'just' 5 percent (which is at least something, but still does not get us to the point where it is fast enough for our usage, and does not prevent an exponential growth in time as the number of tables increases).
>
> Still we have no idea why the query preparation time increases exponentially instead of linearly, which is what we would expect since it is using a hash table. I included 5 databases that we used for testing in this link:
>
> http://wikisend.com/download/570088/test_databases.zip
>
> The query performed on these databases is:
>
> delete from A where id=1;
>
> The time factors it takes on each database are as follows (where the time needed for the 500 tables was taken as the starting point to calculate the other factors):
>
> 500 tables - 1x
> 1000 tables - 2.5x
> 5000 tables - 29x
> 10000 tables - 98x
>
> As you can see this is an exponential growth in the time it takes to execute the query. So far we're missing the point of why this growth should be exponential. ...
> We tried some further debugging and it seems that SQLite spends its time in "sqlite3FkActions", though we cannot find anything in there that would explain an exponential growth in time.

The timings do not look truly exponential to me. It looks more as if there is a graduated charge (NL: staffeltoeslag) on the time per table. For instance:

table 1 - 500: 2 msec/table
table 501 - 1.000: 3 msec/table
table 1001 - 5.000: 5 msec/table
table 5001 - 10.000: 10 msec/table

It could be that there is no further extra charge above 10.000 tables. Is it worth trying that or did you do that already?

Your last observation, that the time is spent in sqlite3FKActions, means that this is not in query preparation but in execution (if I understand the code right). That is understandable because the engine needs to perform a vast amount of internal queries to check all the related tables.

This leaves room for the hypothesis that the observed increase is a matter of caching. For instance the sqlite default page cache may just fit 2.000 tables but not 10.000. It may be worth trying an increased cache_size.
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
On 10 sep. 2013, at 11:37, "Harmen de Jong - CoachR Group B.V." wrote:

> As you can see this is an exponential growth in the time it takes to execute
> the query. So far we're missing the point of why this growth should be
> exponential.

We tried some further debugging and it seems that SQLite spends its time in "sqlite3FkActions", though we cannot find anything in there that would explain an exponential growth in time.
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
On 9 sep. 2013, at 22:11, "E.Pasma" wrote:

> Ha, I did not mean the length of the names but the length of the hash table
> (NL: klutstabel), that is, the number of buckets over which the hash values
> are distributed. I looked some further in the code and now believe that this
> is 1024. That is still generous for a normal database, but with 10.000
> tables you will have about 10 table names in each bucket. So the engine
> will need to do a lot of searching within each bucket. That involves
> checking the name of each item in the bucket against the searched name.

We've looked into that code. Actually the 1024 is not the number of buckets used for the hash table, but the total size. So we did some tests, changing the 1024 bytes into a value large enough to store the hash table for 10K tables, but unfortunately that gave a performance increase of 'just' 5 percent (which is at least something, but still does not get us to the point where it is fast enough for our usage, and does not prevent an exponential growth in time as the number of tables increases).

Still we have no idea why the query preparation time increases exponentially instead of linearly, which is what we would expect since it is using a hash table. I included 5 databases that we used for testing in this link:

http://wikisend.com/download/570088/test_databases.zip

The query performed on these databases is:

delete from A where id=1;

The time factors it takes on each database are as follows (where the time needed for the 500 tables was taken as the starting point to calculate the other factors):

500 tables - 1x
1000 tables - 2.5x
5000 tables - 29x
10000 tables - 98x

As you can see this is an exponential growth in the time it takes to execute the query. So far we're missing the point of why this growth should be exponential.
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
On 9 Sep 2013, at 10:06, Harmen de Jong - CoachR Group B.V. wrote:

> Our table and column names are not too long either as E.Pasma suggests.

Ha, I did not mean the length of the names but the length of the hash table (NL: klutstabel), that is, the number of buckets over which the hash values are distributed. I looked some further in the code and now believe that this is 1024. That is still generous for a normal database, but with 10.000 tables you will have about 10 table names in each bucket. So the engine will need to do a lot of searching within each bucket. That involves checking the name of each item in the bucket against the searched name.

With regards to the number of buckets I found a meaningful comment in the code:

/* The inability to allocates space for a larger hash table is
** a performance hit but it is not a fatal error. So mark the
** allocation as a benign. */

Another generosity is that the code uses a constant for the length of the hash tables, so you can change that with a single edit. There are of course advantages and disadvantages of doing that.
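[Editor's note] E.Pasma's back-of-the-envelope estimate is easy to reproduce with a toy fixed-bucket chained hash table. This is only a sketch of the general technique, not SQLite's actual code: with a fixed bucket count, every chain grows linearly with the number of tables, so each name lookup performs linearly more comparisons, and looking up all N names costs O(N^2) overall.

```python
BUCKETS = 1024  # analogous to the fixed constant discussed above

def build(names, buckets=BUCKETS):
    # Chained hash table: each bucket holds a list of colliding names.
    table = [[] for _ in range(buckets)]
    for name in names:
        table[hash(name) % buckets].append(name)
    return table

def comparisons(table, name, buckets=BUCKETS):
    # Number of name comparisons a successful lookup performs.
    chain = table[hash(name) % buckets]
    return chain.index(name) + 1

names = [f"table_{i}" for i in range(10_000)]
t = build(names)
avg = sum(comparisons(t, n) for n in names) / len(names)
# With ~10 names per bucket, the average lookup walks about half a chain.
print(f"average comparisons per lookup: {avg:.1f}")
```

Doubling the bucket count halves the average chain, which is consistent with resizing helping only by a constant factor, as reported above, when the real bottleneck lies elsewhere.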
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
On 7 sep. 2013, at 04:45, "David de Regt" wrote:

> Mayhaps the CROSS JOIN trick is your friend in this case, if you can be
> pretty sure of the correct direction of the join order. :)

In the examples I gave it was actually about a simple delete query from one table without any joins, so that still does not explain the huge performance decrease.

> From: sqlite-users-boun...@sqlite.org
> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of James K. Lowden
> Sent: Friday, September 6, 2013 7:40 PM
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] Query preperation time does not scale linearly with
> growth of no. of tables
>
> Factorial, actually. After three tables, each additional table increases
> potential join sequences by roughly an order of magnitude.

This factorial increase would be the case if there were any joins, but there are none. Does anyone have any other explanation for why the performance gets so bad when having many tables? Our table and column names are not too long either, as E.Pasma suggests.

Best regards,
Harmen
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
The developers chose the C type 'int' to represent the hash value. Possibly this is too small for your case?

On 6 Sep 2013, at 22:00, Harmen de Jong - CoachR Group B.V. wrote:

> On 6 sep. 2013, at 20:09, "Kevin Benson" wrote:
>> Dr. Hipp does a little bit of explaining on this topic, generally, in his
>> replies on this thread:
>>
>> http://www.mail-archive.com/sqlite-users@sqlite.org/msg78602.html
>
> Thanks for pointing me to that thread, but as Dr. Hipp states in this thread, the tables are stored in a hash. Therefore I would not expect a large performance decrease on a large number of tables at all, or am I missing something?
>
> Best regards,
> Harmen
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
Mayhaps the CROSS JOIN trick is your friend in this case, if you can be pretty sure of the correct direction of the join order. :)

-David

-----Original Message-----
From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of James K. Lowden
Sent: Friday, September 6, 2013 7:40 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables

On Fri, 6 Sep 2013 17:29:25 +0000 Harmen de Jong - CoachR Group B.V. wrote:

> > If I recall correctly, query planner's behavior is worst-case
> > quadratic in the number of tables participating in the query. This
> > includes tables mentioned directly, and also those pulled in
> > indirectly via views, triggers or foreign keys.

Factorial, actually. After three tables, each additional table increases potential join sequences by roughly an order of magnitude. Given tables A, B, and C, 1 * 2 * 3 = 6:

sqlite> select a.T, b.T, c.T from F a join F b on a.T <> b.T join F c
   ...> on b.T <> c.T where a.T <> c.T order by a.T, b.T, c.T;
A B C
A C B
B A C
B C A
C A B
C B A

That's six plans for the order in which the system could choose to access the tables to execute the query. Factorial grows quickly, as is demonstrated by adding table D:

sqlite> select a.T, b.T, c.T, d.T from F a join F b on a.T <> b.T
   ...> cross join F c on b.T <> c.T join F as d on c.T <> d.T where a.T <>
   ...> c.T and a.T <> d.T and b.T <> d.T order by a.T, b.T, c.T, d.T;
A B C D
A B D C
A C B D
A C D B
A D B C
A D C B
B A C D
B A D C
B C A D
B C D A
B D A C
B D C A
C A B D
C A D B
C B A D
C B D A
C D A B
C D B A
D A B C
D A C B
D B A C
D B C A
D C A B
D C B A

Pity the query optimizer facing an 8-way join. Or, say, a 20-table join:

$ FACT=1; seq 20 | while read F; do FACT=$(( ${FACT} * $F )); printf '% 3d! = %d\n' $F ${FACT}; done
  1! = 1
  2! = 2
  3! = 6
  4! = 24
  5! = 120
  6! = 720
  7! = 5040
  8! = 40320
  9! = 362880
 10! = 3628800
 11! = 39916800
 12! = 479001600
 13! = 6227020800
 14! = 87178291200
 15! = 1307674368000
 16! = 20922789888000
 17! = 355687428096000
 18! = 6402373705728000
 19! = 121645100408832000
 20! = 2432902008176640000

There is such a thing as too many choices!

--jkl
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
On Fri, 6 Sep 2013 17:29:25 +0000 Harmen de Jong - CoachR Group B.V. wrote:

> > If I recall correctly, query planner's behavior is worst-case
> > quadratic in the number of tables participating in the query. This
> > includes tables mentioned directly, and also those pulled in
> > indirectly via views, triggers or foreign keys.

Factorial, actually. After three tables, each additional table increases potential join sequences by roughly an order of magnitude. Given tables A, B, and C, 1 * 2 * 3 = 6:

sqlite> select a.T, b.T, c.T from F a join F b on a.T <> b.T
   ...> join F c on b.T <> c.T where a.T <> c.T order by a.T, b.T, c.T;
A B C
A C B
B A C
B C A
C A B
C B A

That's six plans for the order in which the system could choose to access the tables to execute the query. Factorial grows quickly, as is demonstrated by adding table D:

sqlite> select a.T, b.T, c.T, d.T from F a join F b on a.T <> b.T
   ...> cross join F c on b.T <> c.T join F as d on c.T <> d.T where
   ...> a.T <> c.T and a.T <> d.T and b.T <> d.T order by a.T, b.T,
   ...> c.T, d.T;
A B C D
A B D C
A C B D
A C D B
A D B C
A D C B
B A C D
B A D C
B C A D
B C D A
B D A C
B D C A
C A B D
C A D B
C B A D
C B D A
C D A B
C D B A
D A B C
D A C B
D B A C
D B C A
D C A B
D C B A

Pity the query optimizer facing an 8-way join. Or, say, a 20-table join:

$ FACT=1; seq 20 | while read F; do FACT=$(( ${FACT} * $F )); printf '% 3d! = %d\n' $F ${FACT}; done
  1! = 1
  2! = 2
  3! = 6
  4! = 24
  5! = 120
  6! = 720
  7! = 5040
  8! = 40320
  9! = 362880
 10! = 3628800
 11! = 39916800
 12! = 479001600
 13! = 6227020800
 14! = 87178291200
 15! = 1307674368000
 16! = 20922789888000
 17! = 355687428096000
 18! = 6402373705728000
 19! = 121645100408832000
 20! = 2432902008176640000

There is such a thing as too many choices!

--jkl
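[Editor's note] The enumerations above are simply permutations of the join set; a few lines make the factorial growth concrete (same counts as the sqlite output quoted in the message):

```python
from itertools import permutations
from math import factorial

# Join orders for N tables = N! - the counts behind the listings above.
print(len(list(permutations("ABC"))))   # 6 orders for tables A, B, C
print(len(list(permutations("ABCD"))))  # 24 orders after adding table D
print(factorial(20))                    # 2432902008176640000 for a 20-way join
```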
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
Two things:

1. The longer the table names, the longer it will take to compute the hash of each table name.

2. Because the entire schema must be reprocessed after each change, all the table names will be rehashed after each table has been created. Creating 10,000 tables will result in re-reading all that data and re-hashing all the table names. After adding the 10,000th table, SQLite will have computed at least 50,005,000 hash operations. Many more if column names are hashed too.

SDR

On Sep 6, 2013 2:00 PM, "Harmen de Jong - CoachR Group B.V." <har...@coachr.com> wrote:

> On 6 sep. 2013, at 20:09, "Kevin Benson" wrote:
>> Dr. Hipp does a little bit of explaining on this topic, generally, in his
>> replies on this thread:
>>
>> http://www.mail-archive.com/sqlite-users@sqlite.org/msg78602.html
>
> Thanks for pointing me to that thread, but as Dr. Hipp states in this
> thread, the tables are stored in a hash. Therefore I would not expect a
> large performance decrease on a large number of tables at all, or am I
> missing something?
>
> Best regards,
> Harmen
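[Editor's note] The 50,005,000 figure follows from the triangular sum: after creating table k, the schema reparse re-hashes all k existing names, so the running total over 10,000 tables is 1 + 2 + ... + 10000. A quick check:

```python
# Cumulative re-hash count after each of the 10,000 CREATE TABLE statements.
total = sum(range(1, 10_001))
print(total)  # 50005000, i.e. n*(n+1)/2 with n = 10000
assert total == 10_000 * 10_001 // 2
```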
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
On 6 Sep 2013, at 9:00pm, Harmen de Jong - CoachR Group B.V. wrote:

> as dr. Hipp states in this thread, the tables are stored in a hash. Therefore
> I would not expect a large performance decrease on large number of tables at
> all, or am I missing something?

The /names/ of the tables are hashed. Not the data in the tables, or the list of page numbers that the data is stored in.

Simon.
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
On 6 sep. 2013, at 20:09, "Kevin Benson" wrote:

> Dr. Hipp does a little bit of explaining on this topic, generally, in his
> replies on this thread:
>
> http://www.mail-archive.com/sqlite-users@sqlite.org/msg78602.html

Thanks for pointing me to that thread, but as Dr. Hipp states in this thread, the tables are stored in a hash. Therefore I would not expect a large performance decrease on a large number of tables at all, or am I missing something?

Best regards,
Harmen
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
On Fri, Sep 6, 2013 at 1:29 PM, Harmen de Jong - CoachR Group B.V. <har...@coachr.com> wrote:

> On 6 sep. 2013, at 18:42, "Igor Tandetnik" wrote:
>> If I recall correctly, query planner's behavior is worst-case quadratic
>> in the number of tables participating in the query. This includes tables
>> mentioned directly, and also those pulled in indirectly via views, triggers
>> or foreign keys.
>
> Ok, maybe that explains it. Do you remember some topics or explanations
> that cover this more in depth? At least it will be worth it for us to do some
> debugging on the SQLite code to see if we can pinpoint this behaviour and
> see if we can work around it.

Dr. Hipp does a little bit of explaining on this topic, generally, in his replies on this thread:

http://www.mail-archive.com/sqlite-users@sqlite.org/msg78602.html

--
--Ô¿Ô--
K e V i N
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
On 6 sep. 2013, at 18:42, "Igor Tandetnik" wrote:

> If I recall correctly, query planner's behavior is worst-case quadratic in
> the number of tables participating in the query. This includes tables
> mentioned directly, and also those pulled in indirectly via views, triggers
> or foreign keys.

Ok, maybe that explains it. Do you remember some topics or explanations that cover this more in depth? At least it will be worth it for us to do some debugging on the SQLite code to see if we can pinpoint this behaviour and see if we can work around it.

> If I may be so bold, I would say that a design that calls for a database with
> 10,000 tables doesn't feel right

The main reason for this is that the application is used to do research on and examine the quality of processes that take place within organizations. For this research the users can create highly customized research forms with varying numbers of columns, varying field types and a lot of specific restrictions, which we store in tables that are generated on the fly with the proper indexes (later on the research results are stored in these tables and used for real-time dashboard reporting and for downloading reports that are generated with the latest data).

Furthermore, the data that is examined and imported into the application contains a lot of privacy-sensitive data. Since we already create these separate tables, we use this as a sort of 'automated' security wall to rule out the risk of users from different organizations reaching or hacking into each other's privacy-sensitive data in case programming weaknesses or mistakes are made.
For this project we went on board with SQLite (which we already used in several other projects) and have built our own 'SQL server' app that takes care of things like:

- Asynchronous real-time backups to a secondary location (a sort of binary logging as found in other DBs, though our implementation is not binary).
- Automatically splitting time-intensive writes and updates, to prevent these from locking the app (since SQLite has DB-level locking) for users that perform simple tasks that can be completed fast.
- Ruling out database startup times in the case of application crashes: since the DB is several gigabytes in size, it takes SQLite a few seconds to 'initialize', and this 'stable' server app thereby hides application crashes from the users (who access the app through a web interface).
- Automatically backing up the DB in compressed and encrypted format.

Best regards,
Harmen
Re: [sqlite] Query preperation time does not scale linearly with growth of no. of tables
On 9/6/2013 10:35 AM, Harmen de Jong - CoachR Group B.V. wrote:

> We ran into an issue where specific queries are getting non-linearly slower
> when the total number of tables grows.

If I recall correctly, query planner's behavior is worst-case quadratic in the number of tables participating in the query. This includes tables mentioned directly, and also those pulled in indirectly via views, triggers or foreign keys.

If I may be so bold, I would say that a design that calls for a database with 10,000 tables doesn't feel right to me.

--
Igor Tandetnik
[sqlite] Query preperation time does not scale linearly with growth of no. of tables
We ran into an issue where specific queries are getting non-linearly slower as the total number of tables grows.

Example 1 (not slow):
- Database has table A
- Database has 1,000 other tables with a foreign key to table A
- A row is deleted from table A (no deletion of actual data in other tables is involved).

Example 2 (more than 50 times slower than example 1):
- Database has table A
- Database has 10,000 other tables with a foreign key to table A
- A row is deleted from table A (no deletion of actual data in other tables is involved).

Does anybody have an idea why the preparation time of the query scales up so fast in the above examples?

Best regards,
Harmen de Jong
CoachR Group B.V.
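[Editor's note] The two examples can be reproduced with a short script. The schema and names below are assumptions for illustration, not the poster's actual database: create N empty child tables with an ON DELETE CASCADE foreign key to A, enable foreign key enforcement, and time the DELETE. With all tables empty, the elapsed time is dominated by statement preparation, as discussed later in the thread.

```python
import sqlite3
import time

def delete_time(n_tables):
    """Seconds to run `DELETE FROM A WHERE id=1` with n_tables FK children."""
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys=ON")
    db.execute("CREATE TABLE A(id INTEGER PRIMARY KEY)")
    for i in range(n_tables):
        # Each child table pulls A into the FK-action code generation.
        db.execute(f"CREATE TABLE t{i}(id INTEGER PRIMARY KEY,"
                   f" a_id INTEGER REFERENCES A(id) ON DELETE CASCADE)")
    start = time.perf_counter()
    db.execute("DELETE FROM A WHERE id=1")  # tables are empty: mostly prepare
    return time.perf_counter() - start

for n in (100, 500, 1000):
    print(n, f"{delete_time(n) * 1000:.1f} msec")
```

If the growth were linear, the printed times would scale with N; the thread's measurements show them growing considerably faster.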