Re: [sqlite] 50% faster than 3.7.17
On Thu, 2014-10-09 at 17:41 +0700, Dan Kennedy wrote: > I think I understand the second optimization. You're saying that given this: > >SELECT DISTINCT main.uid > FROM main LEFT JOIN email_list ON main.uid = email_list.uid > WHERE email_list.email LIKE 'al%' > > SQLite should deduce that since the LIKE implies "email_list.email IS > NOT NULL" ( Digression: SQLite *should* deduce the LIKE implies the IS NOT NULL condition. My *actual* queries all have, for some bizarre historical reason I haven't yet explored, email_list.value IS NOT NULL AND email_list.value LIKE 'foo%' You'd think that the former condition would end up optimised away, but timings show that it clearly isn't. I wasn't going to bring that up here since I think it is so clearly a stupid thing to be putting in our query in the first place. But you just made me do it anyway. ) > the LEFT JOIN is equivalent to a regular JOIN. Which would allow > SQLite to reorder the tables and perhaps come up with a more efficient > query plan. Correct? Indeed. As I said, a glib response to this suggested optimisation in *isolation* might be that the user shouldn't have asked for a LEFT JOIN at all if they didn't need it. But of course when coupled with the other suggested transforms, that glib response is a lot less valid. > > The first optimization might be trickier. ... > I guess queries that have a DISTINCT clause don't have this problem. Right. I think it only works with DISTINCT, but that's OK. > So it could in theory transform your query to: > >SELECT DISTINCT > main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid > WHERE email_list.email LIKE 'al%' > UNION ALL >SELECT main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid > WHERE main.first_name like 'al%' > UNION ALL >SELECT main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid > WHERE main.last_name like 'al%'; > > The the hypothetical optimization above could figure out that the > first LEFT JOIN could just as easily be a JOIN. Right. > And that the other two LEFT JOINs are not required at all due to the > DISTINCT and the fact that the WHERE clauses on the sub-selects do not > reference the joined table at all. Yeah, that's probably true. I'd assumed that we'd do it as part of the "first optimisation" introducing the UNION but it's probably better handled as a separate transform as you have it here. > It seems like there are a few moving parts here. And none of these are > trivial changes. So, good ideas that might show up in an SQLite > release at some point, but it's not realistic to expect these > optimizations to be implemented quickly. TBH that's all I was really looking for; thanks. I *hate* it when people silently work around limitations in my software without bringing them to my attention, and I didn't want to be guilty of doing the same. I certainly don't expect an instant fix; just knowing that it's on the radar is perfectly sufficient. Now I can commit my hackish workaround to rewrite certain special-case queries to optimise the output of the current query planner for them, and still sleep at night :) Thanks. > Unless, of course, you can propose a patch and they turn out to be > simpler than they look. That would be interesting but realistically it's outside my capacity at the moment. I am already *so* far into shaving this yak that I can barely remember what I was actually intending to do in the first place :) -- David WoodhouseOpen Source Technology Centre david.woodho...@intel.com Intel Corporation ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
On 10/09/2014 04:38 PM, David Woodhouse wrote: On Thu, 2014-09-25 at 11:13 +0100, David Woodhouse wrote: I suggested a couple of specific optimisations which the query planner might be able to make, which should hopefully have benefits wider than just my own use case. Are those not viable? I'm preparing to commit a workaround to Evolution to avoid this issue, and then move on with my life and forget about it. Before I do, is it worth me rephrasing this as a 'suboptimal query plan' bug report so it gets tracked and might get attention later? Or should I just forget the idea of getting it fixed in sqlite? Well, you could always create a patch... I think I understand the second optimization. You're saying that given this: SELECT DISTINCT main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid WHERE email_list.email LIKE 'al%' SQLite should deduce that since the LIKE implies "email_list.email IS NOT NULL" the LEFT JOIN is equivalent to a regular JOIN. Which would allow SQLite to reorder the tables and perhaps come up with a more efficient query plan. Correct? Seems like a reasonable idea. The first optimization might be trickier. With queries that feature a single table: SELECT cols FROM tbl WHERE a=? OR b=? OR c=? the planner may elect to run something very close to this: SELECT cols FROM tbl WHERE a=? UNION ALL SELECT cols FROM tbl WHERE b=? UNION ALL SELECT cols FROM tbl WHERE c=? However, after returning each row, we remember its PRIMARY KEY (either the rowid or real PK for WITHOUT ROWID tables). Similar transformations for individual loops within join queries are also possible. However, with a JOIN query, we don't currently attempt this kind of transform. I think because we would have to create some kind of composite key to use in place of the PRIMARY KEY to avoid returning duplicates. I guess queries that have a DISTINCT clause don't have this problem. So it could in theory transform your query to: SELECT DISTINCT main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid WHERE email_list.email LIKE 'al%' UNION ALL SELECT main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid WHERE main.first_name like 'al%' UNION ALL SELECT main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid WHERE main.last_name like 'al%'; The the hypothetical optimization above could figure out that the first LEFT JOIN could just as easily be a JOIN. And that the other two LEFT JOINs are not required at all due to the DISTINCT and the fact that the WHERE clauses on the sub-selects do not reference the joined table at all. It seems like there are a few moving parts here. And none of these are trivial changes. So, good ideas that might show up in an SQLite release at some point, but it's not realistic to expect these optimizations to be implemented quickly. Unless, of course, you can propose a patch and they turn out to be simpler than they look. Regards, Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
On Thu, 2014-09-25 at 11:13 +0100, David Woodhouse wrote: > > I suggested a couple of specific optimisations which the query planner > might be able to make, which should hopefully have benefits wider than > just my own use case. Are those not viable? I'm preparing to commit a workaround to Evolution to avoid this issue, and then move on with my life and forget about it. Before I do, is it worth me rephrasing this as a 'suboptimal query plan' bug report so it gets tracked and might get attention later? Or should I just forget the idea of getting it fixed in sqlite? -- dwmw2 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
"Simon Slavin" wrote... On 24 Sep 2014, at 2:13pm, jose isaias cabrerawrote: This would be a nice set of options. On my case, I would set all connections to our project to be" max_performance", as it is what we need. Just thinking out loud. How much max is max ? Are you willing to give up ACID ? Are you willing to have your database corrupted if there's power loss ? Ok, back to reality... ;-), no, I will not. Good questions, Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
On Tue, 2014-09-23 at 17:48 +0100, David Woodhouse wrote: > That looks really promising; thanks for all this work. > > Tristan, you have a comprehensive set of benchmarks for Evolution's > addressbook; is it possible for someone else to run those or would it > take more of your time to babysit than it would to run them yourself? I was able to get these benchmarks running, and compared the 3.8.7 snapshot against the version of 3.8.6 which is shipped in Fedora 20. Results temporarily at http://westmere.infradead.org/charts/sqlite387/ This isn't a perfect comparison since the Fedora package is built from sqlite-src-3080600.zip and the equivalent source for the 3.8.7 alpha didn't seem to be available. But it was built as a shared library using the same configuration and compiler flags, and the benchmark compares identical code running against both the 3.8.6 and 3.8.7 libraries. I haven't done any real analysis other than looking at the pretty pictures, but certainly nothing seems to go slower (or break), and there are some noticeable improvements on some of the benchmarks (filter-by-long-full-name-suffix.png, filter-by-short-full-name-suffix.png, filter-containing-full-name.png, filter-containing-given-name.png, filter-containing-phone-number.png). -- dwmw2 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
On Wed, 2014-09-24 at 19:36 -0600, Keith Medcalf wrote: > > Interesting. From that code you might want to try something like this: > > SELECT uid, vcard, bdata > FROM folder_id > WHERE uid in ( select uid FROM email_list where value like 'p%' >union > select uid from folder_id where nickname LIKE 'p%' >union > select uid from folder_id where full_name LIKE 'p%' >union > select uid from folder_id where family_name LIKE 'p%' >union > select uid from folder_id where given_name LIKE 'p%' >union > select uid from folder_id where nickname LIKE 'p%' >union > select uid from folder_id where file_as LIKE 'p%' > ); > > Then having nocase indexes on the various search fields will all work as > expected. Yeah, that achieves the same speed. I'm not sure it addresses the real problem though. It still only really applies when the user's query (which we're translating to SQL) contains only 'OR' and no 'AND' clauses. It doesn't help me translate a query in the general case. Now, I'm not *entirely* averse to having a special case for the 'all OR' query — this particular query is the address autocompletion so it's common and the user is actually waiting for it as they type. In fact, as a proof of concept I've already *implemented* a hackish special case to spot this case and basically submit a hand-crafted query instead of the normal translation to SQL: https://bugzilla.gnome.org/show_bug.cgi?id=699597#c19 The problem is that I don't *want* to have to have that special case. This is just a query optimisation, which is something the query planner is supposed to do. I don't *want* to implement this and other optimisations in the client, just to trick *today's* sqlite query planner into spotting the best way to do it. That's the Wrong Way™ to do things. There are two alternative approaches which *don't* seem as wrong. Firstly, if there's a sane way to rewrite our translator so that it naturally uses UNION for OR clauses, that might make sense. But to cope with AND clauses, AFAICT the natural extension of that approach would be to use 'SELECT FROM ... SELECT FROM' and then we lose the use of indices for *those* cases, right¹? Tristan started a thread about this 'nested select' last year, which I picked up a couple of weeks ago. It didn't seem like it was a viable strategy for the general case. The second approach, and this is why I started this thread, is to 'fix' the query planner so that that it can see for *itself* the best way to implement a given query given the constraints. I suggested a couple of specific optimisations which the query planner might be able to make, which should hopefully have benefits wider than just my own use case. Are those not viable? -- dwmw2 ¹ I do realise that in the special case of a single top-level AND such that all its sub-clauses are necessary conditions, I can do something nicer. But again, it's a special case and doesn't handle the general case of nested AND and OR clauses. And it's still the *client* code doing the job of the optimiser, spotting necessary vs. sufficient conditions and pulling them out to the top level for more efficient implementation. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
In the vein of configurability, and in a day dream I just had, it would be nice (But probably not possible as there could be compiler directives you can't use at the same time) that we could have a single output DLL/SO/whatever dumped from the compiler that had everything available, then, via defaults in the source code, IF/THEN/ELSE (or whatever the C equiv is) operations are considered to have certain functionality run during the query process. Via a configuration file, or via pragma which would be generated based on a configuration, feature sets could be set at run time. One fault with this would be the issue of COMPILED size. I realize we're in an age where we generally talk megabytes instead of kilobytes in regards to main thread compiled code, but I'm sure there are platforms out there that are using SQLite that have to fit in the kilobyte range, and need special compiled sources. The advantage to this is that I wouldn't have to compile X number of DLLs for Y number of programs to get the results I need, just have one configuration file kicking around that I can adjust as needed. It'd also help with fine tuning to validate whether I need certain features turned on or off. On Wed, Sep 24, 2014 at 1:15 PM, Roger Binnswrote: > On 24/09/14 06:19, Simon Slavin wrote: > > How much max is max ? > > Not giving up ACID. But for example stat4 is better than the default > stat1. > Memory mapping (especially on 64 bit) is great. So is WAL. All are off > by > default. > > If you want to give up ACID then you should really be on your own to look > at > the various tradeoffs. > > Roger > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
Interesting. From that code you might want to try something like this: SELECT uid, vcard, bdata FROM folder_id WHERE uid in ( select uid FROM email_list where value like 'p%' union select uid from folder_id where nickname LIKE 'p%' union select uid from folder_id where full_name LIKE 'p%' union select uid from folder_id where family_name LIKE 'p%' union select uid from folder_id where given_name LIKE 'p%' union select uid from folder_id where nickname LIKE 'p%' union select uid from folder_id where file_as LIKE 'p%' ); Then having nocase indexes on the various search fields will all work as expected. >-Original Message- >From: David Woodhouse [mailto:dw...@infradead.org] >Sent: Wednesday, 24 September, 2014 07:25 >To: Keith Medcalf >Cc: General Discussion of SQLite Database >Subject: Re: [sqlite] 50% faster than 3.7.17 > >On Wed, 2014-09-24 at 06:13 -0600, Keith Medcalf wrote: >> >> Would it not be more efficient to skip the join altogether since all >> you want is the list of uid's, and assuming that you have maintained >> the referential integrity of your database mail_list(list_uid) >> references main(uid)? >> >> SELECT list_uid >> FROM mail_list >> WHERE email LIKE 'al%' >> UNION >> SELECT uid >> FROM main >> WHERE first_name LIKE 'al%' >> OR last_name LIKE 'al%'; > >Yes, but only because I oversimplified. In fact my queries are actually >after *other* fields from the main table. It's just that I'd elided >those fields from the example. > >For the sake of the example, please assume my queries were actually > 'SELECT DISTINCT main.uid, main.first_name, main.last_name FROM …' > >The real-life query that this is simplified *from* is discussed at >https://git.gnome.org/browse/evolution-data-server/commit/?id=5f9f5b52807 > >-- >dwmw2 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
On 24/09/14 06:19, Simon Slavin wrote: > How much max is max ? Not giving up ACID. But for example stat4 is better than the default stat1. Memory mapping (especially on 64 bit) is great. So is WAL. All are off by default. If you want to give up ACID then you should really be on your own to look at the various tradeoffs. Roger ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
On Wed, 2014-09-24 at 06:13 -0600, Keith Medcalf wrote: > > Would it not be more efficient to skip the join altogether since all > you want is the list of uid's, and assuming that you have maintained > the referential integrity of your database mail_list(list_uid) > references main(uid)? > > SELECT list_uid > FROM mail_list > WHERE email LIKE 'al%' > UNION > SELECT uid > FROM main > WHERE first_name LIKE 'al%' > OR last_name LIKE 'al%'; Yes, but only because I oversimplified. In fact my queries are actually after *other* fields from the main table. It's just that I'd elided those fields from the example. For the sake of the example, please assume my queries were actually 'SELECT DISTINCT main.uid, main.first_name, main.last_name FROM …' The real-life query that this is simplified *from* is discussed at https://git.gnome.org/browse/evolution-data-server/commit/?id=5f9f5b52807 -- dwmw2 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
On 24 Sep 2014, at 2:13pm, jose isaias cabrerawrote: > This would be a nice set of options. On my case, I would set all connections > to our project to be" max_performance", as it is what we need. Just thinking > out loud. How much max is max ? Are you willing to give up ACID ? Are you willing to have your database corrupted if there's power loss ? Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
"Roger Binns" wrote... On 22/09/14 10:48, Richard Hipp wrote: But if you have any new ideas on how we can further reduce the I/O, we'd love to hear from you. The single biggest problem for me is defaults. SQLite supports memory mapped i/o which has many advantages. The stat4 analyze does a really good job. WAL reduces writing contention. These are all off by default for various good reasons. But it also means that by default the majority of developers are not getting the best performance SQLite already has to offer, unless they happen to stumble on these. Some like stat4 are especially problematic since it requires recompilation to address. It would be nice if by default things were best. One weak suggestion is to have a few profiles, and have them selectable by pragma. The profile would then go ahead and set various options. Future versions of SQLite would also then update the profiles. For example: pragma profile='max_performance' -- turns on all above, ups caches etc pragma profile='min_memory'-- tunes down everything related to memory Various members of the SQLite consortium can then do things like: pragma profile='firefox' This would be a nice set of options. On my case, I would set all connections to our project to be" max_performance", as it is what we need. Just thinking out loud. josé ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
Would it not be more efficient to skip the join altogether since all you want is the list of uid's, and assuming that you have maintained the referential integrity of your database mail_list(list_uid) references main(uid)? SELECT list_uid FROM mail_list WHERE email LIKE 'al%' UNION SELECT uid FROM main WHERE first_name LIKE 'al%' OR last_name LIKE 'al%'; >-Original Message- >Speaking of better query plans, though... here's a query which takes >about 1700ms on my data set, followed by a couple of optimisations which >seem like they might be generically useful, and would help it go 300 >times faster: > >/* This is an address book. The 'main' table holds the primary > information while the 'email_list' table holds email addresses > because each contact can have more than one email address */ >PRAGMA case_sensitive_like=on; >CREATE TABLE main (uid text primary key, first_name text, last_name >text); >INSERT INTO main VALUES('','alice', 'X'); >INSERT INTO main VALUES('','bert', 'Y'); >INSERT INTO main VALUES('','alison', 'Z'); >CREATE INDEX main_first_name_index on main (first_name); >CREATE INDEX main_last_name_index on main (last_name); >CREATE TABLE email_list (uid text, email text); >INSERT INTO email_list VALUES('','al...@example.com'); >INSERT INTO email_list VALUES('','ali...@example.com'); >INSERT INTO email_list VALUES('','alb...@example.org'); >CREATE INDEX email_uid_index on email_list (uid); >CREATE INDEX email_email_index on email_list (email); > >SELECT DISTINCT main.uid > FROM main LEFT JOIN email_list ON main.uid = email_list.uid > WHERE email_list.email LIKE 'al%' > OR main.first_name like 'al%' > OR main.last_name like 'al%'; >/* >0|0|0|SCAN TABLE main USING INDEX sqlite_autoindex_main_1 >0|1|1|SEARCH TABLE email_list USING INDEX email_uid_index (uid=?) >*/ > >The first optimisation is to realise that sometimes, operations on the >result of a JOIN can be more efficiently resolved by using on the >original tables. Specifically, in this example, there's no point in >looking for a given first_name or last_name in the *joined* table, when >we have indices on those fields in the original 'main' table and can do >it much faster by querying those and then combining the results. > >So the original query becomes: > >SELECT DISTINCT main.uid > FROM main LEFT JOIN email_list ON main.uid = email_list.uid > WHERE email_list.email LIKE 'al%' >UNION SELECT main.uid > FROM main WHERE main.first_name LIKE 'al%' > OR main.last_name LIKE 'al%'; >/* >1|0|0|SCAN TABLE main USING COVERING INDEX sqlite_autoindex_main_1 >1|1|1|SEARCH TABLE email_list USING INDEX email_uid_index (uid=?) >2|0|0|SEARCH TABLE main USING INDEX main_first_name_index (first_name>? >AND first_name2|0|0|SEARCH TABLE main USING INDEX main_last_name_index (last_name>? AND >last_name0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (UNION) >*/ > >With that optimisation, the time taken on my data set is reduced to >about 1000ms. Which is quite a nice improvement from 1700ms but that's >just the first step... > >The second optimisation is to realise that there's no point in that >being a LEFT JOIN any more, given that it's now only ever going to >return results where the right-hand-side actually does exist. Perhaps >normally we'd blame the user for such a naïve query, but when coupled >with the first optimisation, spotting this optimisation does actually >start to make sense. So now the query becomes: > >SELECT DISTINCT main.uid > FROM main JOIN email_list ON main.uid = email_list.uid > WHERE email_list.email LIKE 'al%' >UNION SELECT main.uid > FROM main WHERE main.first_name LIKE 'al%' > OR main.last_name LIKE 'al%'; >/* >1|0|1|SEARCH TABLE email_list USING INDEX email_email_index (email>? AND >email1|1|0|SEARCH TABLE main USING COVERING INDEX sqlite_autoindex_main_1 >(uid=?) >2|0|0|SEARCH TABLE main USING INDEX main_first_name_index (first_name>? >AND first_name2|0|0|SEARCH TABLE main USING INDEX main_last_name_index (last_name>? AND >last_name0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (UNION) >*/ > >Now the query takes 6ms on my data set. > >If this were a hard-coded query, of course it would make sense for me to >just do it this way in the client. But in my case — and I suspect it's >this way in a number of other real-world examples — we're actually just >representing a set of user-input search criteria in the SQL query. To >make this optimisation in the client would be distinctly non-trivial, >and it would be *really* nice if the sqlite query planner could do it. > >Is that feasible? > >-- >dwmw2 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
On Fri, 2014-09-19 at 21:14 -0400, Richard Hipp wrote: > The 50% faster number above is not about better query plans. Speaking of better query plans, though... here's a query which takes about 1700ms on my data set, followed by a couple of optimisations which seem like they might be generically useful, and would help it go 300 times faster: /* This is an address book. The 'main' table holds the primary information while the 'email_list' table holds email addresses because each contact can have more than one email address */ PRAGMA case_sensitive_like=on; CREATE TABLE main (uid text primary key, first_name text, last_name text); INSERT INTO main VALUES('','alice', 'X'); INSERT INTO main VALUES('','bert', 'Y'); INSERT INTO main VALUES('','alison', 'Z'); CREATE INDEX main_first_name_index on main (first_name); CREATE INDEX main_last_name_index on main (last_name); CREATE TABLE email_list (uid text, email text); INSERT INTO email_list VALUES('','al...@example.com'); INSERT INTO email_list VALUES('','ali...@example.com'); INSERT INTO email_list VALUES('','alb...@example.org'); CREATE INDEX email_uid_index on email_list (uid); CREATE INDEX email_email_index on email_list (email); SELECT DISTINCT main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid WHERE email_list.email LIKE 'al%' OR main.first_name like 'al%' OR main.last_name like 'al%'; /* 0|0|0|SCAN TABLE main USING INDEX sqlite_autoindex_main_1 0|1|1|SEARCH TABLE email_list USING INDEX email_uid_index (uid=?) */ The first optimisation is to realise that sometimes, operations on the result of a JOIN can be more efficiently resolved by using on the original tables. Specifically, in this example, there's no point in looking for a given first_name or last_name in the *joined* table, when we have indices on those fields in the original 'main' table and can do it much faster by querying those and then combining the results. So the original query becomes: SELECT DISTINCT main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid WHERE email_list.email LIKE 'al%' UNION SELECT main.uid FROM main WHERE main.first_name LIKE 'al%' OR main.last_name LIKE 'al%'; /* 1|0|0|SCAN TABLE main USING COVERING INDEX sqlite_autoindex_main_1 1|1|1|SEARCH TABLE email_list USING INDEX email_uid_index (uid=?) 2|0|0|SEARCH TABLE main USING INDEX main_first_name_index (first_name>? AND first_name? AND last_name? AND email? AND first_name? AND last_name___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
On 22/09/14 10:48, Richard Hipp wrote: > But if you have any new ideas on how we can further reduce the I/O, we'd love > to hear from you. The single biggest problem for me is defaults. SQLite supports memory mapped i/o which has many advantages. The stat4 analyze does a really good job. WAL reduces writing contention. These are all off by default for various good reasons. But it also means that by default the majority of developers are not getting the best performance SQLite already has to offer, unless they happen to stumble on these. Some like stat4 are especially problematic since it requires recompilation to address. It would be nice if by default things were best. One weak suggestion is to have a few profiles, and have them selectable by pragma. The profile would then go ahead and set various options. Future versions of SQLite would also then update the profiles. For example: pragma profile='max_performance' -- turns on all above, ups caches etc pragma profile='min_memory'-- tunes down everything related to memory Various members of the SQLite consortium can then do things like: pragma profile='firefox' Roger ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
On Fri, 2014-09-19 at 21:14 -0400, Richard Hipp wrote: > The latest SQLite 3.8.7 alpha version (available on the download page > http://www.sqlite.org/download.html) is 50% faster than the 3.7.17 release > from 16 months ago. That is to say, it does 50% more work using the same > number of CPU cycles. > > This performance gain is over and above the query planner improvements that > have also been made. We are constantly looking for new ways to run queries > and adding those ways into the query planner. For example, in the previous > release, we added a new way to evaluate IN operators with non-constant > right-hand-sides that was reported on this mailing list to make some > queries run 5 times faster. > > The 50% faster number above is not about better query plans. This is 50% > faster at the low-level grunt work of moving bits on and off disk and > search b-trees. We have achieved this by incorporating hundreds of > micro-optimizations. Each micro-optimization might improve the performance > by as little as 0.05%. If we get one that improves performance by 0.25%, > that is considered a huge win. Each of these optimizations is unmeasurable > on a real-world system (we have to use cachegrind to get repeatable > run-times) but if you do enough of them, they add up. > > A full 10% of the performance gain has come since the previous release. > There have been a lot of changes. All our tests pass, and we still have > 100% branch test coverage, so we are confident that we didn't break too > much. But your testing is an important part of our quality process. > Please download a source archive or a DLL and give the latest alpha a > whirl, and let us know if you encounter any problems. That looks really promising; thanks for all this work. Tristan, you have a comprehensive set of benchmarks for Evolution's addressbook; is it possible for someone else to run those or would it take more of your time to babysit than it would to run them yourself? -- dwmw2 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
On Tue, Sep 23, 2014 at 2:33 AM, Donald Shepherdwrote: > Are any of these improvements specifically in the area of the online backup > API, or are they more in the general running of SQLite? > The "speedtest1" benchmark does not use the backup API. However, many of the performance improvements are broadly applicable, so you would expect that you would see some improvement in that API too. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
Are any of these improvements specifically in the area of the online backup API, or are they more in the general running of SQLite? On 20 September 2014 11:14, Richard Hippwrote: > The latest SQLite 3.8.7 alpha version (available on the download page > http://www.sqlite.org/download.html) is 50% faster than the 3.7.17 release > from 16 months ago. That is to say, it does 50% more work using the same > number of CPU cycles. > > This performance gain is over and above the query planner improvements that > have also been made. We are constantly looking for new ways to run queries > and adding those ways into the query planner. For example, in the previous > release, we added a new way to evaluate IN operators with non-constant > right-hand-sides that was reported on this mailing list to make some > queries run 5 times faster. > > The 50% faster number above is not about better query plans. This is 50% > faster at the low-level grunt work of moving bits on and off disk and > search b-trees. We have achieved this by incorporating hundreds of > micro-optimizations. Each micro-optimization might improve the performance > by as little as 0.05%. If we get one that improves performance by 0.25%, > that is considered a huge win. Each of these optimizations is unmeasurable > on a real-world system (we have to use cachegrind to get repeatable > run-times) but if you do enough of them, they add up. > > A full 10% of the performance gain has come since the previous release. > There have been a lot of changes. All our tests pass, and we still have > 100% branch test coverage, so we are confident that we didn't break too > much. But your testing is an important part of our quality process. > Please download a source archive or a DLL and give the latest alpha a > whirl, and let us know if you encounter any problems. > > P.S.: Measurements were done using the "speedtest1 --size 5" workload on > Ubuntu 10.13 and gcc 4.8.1 with -Os. YMMV. Version 3.7.17 requires > 1432835574 CPU cycles and the 3.8.7 alpha requires just 953861485 CPU > cycles, as measured by cachegrind. > -- > D. Richard Hipp > d...@sqlite.org > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
ok, Nearly all the time is spent in a big 'CTE' select. So maybe this sort of ugly CTE query is not threadable. with f0_k as (SELECT f.rowid as nof2, f.*, p.Dn, p.g1, p.g2, p.w, p.other FROM F0 AS f, Pt AS p WHERE f.Io =p.Io) ,F0_mcalc as (SELECT f.*, p.*, (case when Priority=999 then Cg else Source end) AS Sourcef FROM F0_k AS f, Sor AS p WHERE p.omg in (f.Cg, '') And f.Slic Between Be And En And p.PtGroup In (f.Io,f.g1,f.g2,f.w,f.other,'PL','SO','ST','Not_found') ) ,F0_mcalcmin as (SELECT nof2, min(priority) AS minp FROM F0_mcalc GROUP BY nof2) ,F0_mcalc_final as (SELECT f.*, round(qty*coefficient,3) AS qtyf FROM F0_mcalcmin AS fm inner join F0_mcalc AS f on f.nof2=fm.nof2 and f.priority=fm.minp) select * from F0_mcalc_final ; ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
On Mon, Sep 22, 2014 at 8:29 AM, Valentin Davydovwrote: > On Fri, Sep 19, 2014 at 09:14:17PM -0400, Richard Hipp wrote: > > > The latest SQLite 3.8.7 alpha version (available on the download page > > http://www.sqlite.org/download.html) is 50% faster than the 3.7.17 > release > > from 16 months ago. That is to say, it does 50% more work using the same > > number of CPU cycles. > > Is there any similar benchmarks with regard to disk i/o operations rather > than CPU? Especially random read of cache misses, I mean. > No, there are no such benchmarks. On the other hand, we've been focused on minimizing I/O in SQLite for the past decade. I think we have extracted about as much as we can fro m that vein. But if you have any new ideas on how we can further reduce the I/O, we'd love to hear from you. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
On Mon, Sep 22, 2014 at 1:43 PM, big stonewrote: > Hi, > > This 3.8.7alpha release seems to bring about 5% win from 3.8.6 , on my > particular SQL test case. > > Question : "PRAGMA threads=2" didn't bring any speed-up on my "2 true" > cores machine. > > Did I miss a compilation option ? > The multi-thread sort should be on by default. Probably you are just not doing a big enough sort to make it worthwhile to start any new threads. The "threads=2" pragma sets an upper limit on the number of threads. There is no guarantee that SQLite will use that many. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
Hi, This 3.8.7alpha release seems to bring about 5% win from 3.8.6 , on my particular SQL test case. Question : "PRAGMA threads=2" didn't bring any speed-up on my "2 true" cores machine. Did I miss a compilation option ? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
On Fri, Sep 19, 2014 at 09:14:17PM -0400, Richard Hipp wrote: > The latest SQLite 3.8.7 alpha version (available on the download page > http://www.sqlite.org/download.html) is 50% faster than the 3.7.17 release > from 16 months ago. That is to say, it does 50% more work using the same > number of CPU cycles. Is there any similar benchmarks with regard to disk i/o operations rather than CPU? Especially random read of cache misses, I mean. Valentin Davydov. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
On Sat, Sep 20, 2014 at 1:34 PM,wrote: > In trying to see if the new version breaks any of my queries, I ran > several of them repeatedly, and they all appear to have produced the > expected output. > > The only thing I noticed which maybe of interest in relation to speed > performance was (with .timer on) that although the first two run time > numbers (real & user) were consistently smaller in 3.8.7 (when compared to > 3.8.6), the third number (sys) was consistently higher (or same in one > occasion). I guess the first number is the actual time (in seconds) it > took to run the query. I don't even know what the 2nd and 3rd numbers > represent, and how or if they maybe related to the first one. Is that > increase in sys to be expected? > Thanks for the report. I think the increased system time is harmless, but I want to investigate further to be sure. Except right now I'm preoccupied with Mr. Xu's new bug. So please remind me next week if I don't bring this up again. :-) What OS are you using? Can you share your database and test script with us? > > A few examples from many more I tried that all follow the same pattern > (same query & database in each case): > > 3.8.6: Run Time: real 2.434 user 2.386815 sys 0.00 > 3.8.7: Run Time: real 1.856 user 1.778411 sys 0.062400 > --- > 3.8.6: Run Time: real 584.465 user 560.293192 sys 1.638011 > 3.8.7: Run Time: real 518.227 user 430.469159 sys 53.617544 > --- > 3.8.6: Run Time: real 2.449 user 2.340015 sys 0.046800 > 3.8.7: Run Time: real 1.935 user 1.794012 sys 0.046800 > > (Thank you for two great solutions I use daily -- SQLite3 and Fossil) > > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
In trying to see if the new version breaks any of my queries, I ran several of them repeatedly, and they all appear to have produced the expected output. The only thing I noticed which maybe of interest in relation to speed performance was (with .timer on) that although the first two run time numbers (real & user) were consistently smaller in 3.8.7 (when compared to 3.8.6), the third number (sys) was consistently higher (or same in one occasion). I guess the first number is the actual time (in seconds) it took to run the query. I don't even know what the 2nd and 3rd numbers represent, and how or if they maybe related to the first one. Is that increase in sys to be expected? A few examples from many more I tried that all follow the same pattern (same query & database in each case): 3.8.6: Run Time: real 2.434 user 2.386815 sys 0.00 3.8.7: Run Time: real 1.856 user 1.778411 sys 0.062400 --- 3.8.6: Run Time: real 584.465 user 560.293192 sys 1.638011 3.8.7: Run Time: real 518.227 user 430.469159 sys 53.617544 --- 3.8.6: Run Time: real 2.449 user 2.340015 sys 0.046800 3.8.7: Run Time: real 1.935 user 1.794012 sys 0.046800 (Thank you for two great solutions I use daily -- SQLite3 and Fossil) ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
I, as well, wish to thank you for this tool Dr. Hipp. I've never published a public application using this engine, but, at my place of employment, where my primary responsibility is to just monitor servers world wide, I've coded a few tidbit web and desktop applications that have made my job SO much easier. Without this, my desktop apps would have to rely on MySQL, which is massive overkill (resource wise) for some of the things I've needed to use it for. Thanks again! On Fri, Sep 19, 2014 at 9:22 PM, jose isaias cabrera < jic...@cinops.xerox.com> wrote: > "Richard Hipp" wrote... > > > > The latest SQLite 3.8.7 alpha version (available on the download page >> http://www.sqlite.org/download.html) is 50% faster than the 3.7.17 >> release >> from 16 months ago. That is to say, it does 50% more work using the same >> number of CPU cycles. >> >> This performance gain is over and above the query planner improvements >> that >> have also been made. We are constantly looking for new ways to run >> queries >> and adding those ways into the query planner. For example, in the >> previous >> release, we added a new way to evaluate IN operators with non-constant >> right-hand-sides that was reported on this mailing list to make some >> queries run 5 times faster. >> >> The 50% faster number above is not about better query plans. This is 50% >> faster at the low-level grunt work of moving bits on and off disk and >> search b-trees. We have achieved this by incorporating hundreds of >> micro-optimizations. Each micro-optimization might improve the >> performance >> by as little as 0.05%. If we get one that improves performance by 0.25%, >> that is considered a huge win. Each of these optimizations is >> unmeasurable >> on a real-world system (we have to use cachegrind to get repeatable >> run-times) but if you do enough of them, they add up. >> >> A full 10% of the performance gain has come since the previous release. >> There have been a lot of changes. All our tests pass, and we still have >> 100% branch test coverage, so we are confident that we didn't break too >> much. But your testing is an important part of our quality process. >> Please download a source archive or a DLL and give the latest alpha a >> whirl, and let us know if you encounter any problems. >> >> P.S.: Measurements were done using the "speedtest1 --size 5" workload on >> Ubuntu 10.13 and gcc 4.8.1 with -Os. YMMV. Version 3.7.17 requires >> 1432835574 CPU cycles and the 3.8.7 alpha requires just 953861485 CPU >> cycles, as measured by cachegrind. >> > > I don't know if folks have ever thank you, Dr. Hipp, for this wonderful > gift to the world called SQLite. I have become a legend in my own world > with this tool. :-) I do have to say that I have used it since 2006 and it > has increased in speed every year. Thank you! Thank you! And in my own > native language, muchas gracias! > > josé > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] 50% faster than 3.7.17
"Richard Hipp" wrote... The latest SQLite 3.8.7 alpha version (available on the download page http://www.sqlite.org/download.html) is 50% faster than the 3.7.17 release from 16 months ago. That is to say, it does 50% more work using the same number of CPU cycles. This performance gain is over and above the query planner improvements that have also been made. We are constantly looking for new ways to run queries and adding those ways into the query planner. For example, in the previous release, we added a new way to evaluate IN operators with non-constant right-hand-sides that was reported on this mailing list to make some queries run 5 times faster. The 50% faster number above is not about better query plans. This is 50% faster at the low-level grunt work of moving bits on and off disk and search b-trees. We have achieved this by incorporating hundreds of micro-optimizations. Each micro-optimization might improve the performance by as little as 0.05%. If we get one that improves performance by 0.25%, that is considered a huge win. Each of these optimizations is unmeasurable on a real-world system (we have to use cachegrind to get repeatable run-times) but if you do enough of them, they add up. A full 10% of the performance gain has come since the previous release. There have been a lot of changes. All our tests pass, and we still have 100% branch test coverage, so we are confident that we didn't break too much. But your testing is an important part of our quality process. Please download a source archive or a DLL and give the latest alpha a whirl, and let us know if you encounter any problems. P.S.: Measurements were done using the "speedtest1 --size 5" workload on Ubuntu 10.13 and gcc 4.8.1 with -Os. YMMV. Version 3.7.17 requires 1432835574 CPU cycles and the 3.8.7 alpha requires just 953861485 CPU cycles, as measured by cachegrind. I don't know if folks have ever thank you, Dr. Hipp, for this wonderful gift to the world called SQLite. I have become a legend in my own world with this tool. :-) I do have to say that I have used it since 2006 and it has increased in speed every year. Thank you! Thank you! And in my own native language, muchas gracias! josé ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] 50% faster than 3.7.17
The latest SQLite 3.8.7 alpha version (available on the download page http://www.sqlite.org/download.html) is 50% faster than the 3.7.17 release from 16 months ago. That is to say, it does 50% more work using the same number of CPU cycles. This performance gain is over and above the query planner improvements that have also been made. We are constantly looking for new ways to run queries and adding those ways into the query planner. For example, in the previous release, we added a new way to evaluate IN operators with non-constant right-hand-sides that was reported on this mailing list to make some queries run 5 times faster. The 50% faster number above is not about better query plans. This is 50% faster at the low-level grunt work of moving bits on and off disk and search b-trees. We have achieved this by incorporating hundreds of micro-optimizations. Each micro-optimization might improve the performance by as little as 0.05%. If we get one that improves performance by 0.25%, that is considered a huge win. Each of these optimizations is unmeasurable on a real-world system (we have to use cachegrind to get repeatable run-times) but if you do enough of them, they add up. A full 10% of the performance gain has come since the previous release. There have been a lot of changes. All our tests pass, and we still have 100% branch test coverage, so we are confident that we didn't break too much. But your testing is an important part of our quality process. Please download a source archive or a DLL and give the latest alpha a whirl, and let us know if you encounter any problems. P.S.: Measurements were done using the "speedtest1 --size 5" workload on Ubuntu 10.13 and gcc 4.8.1 with -Os. YMMV. Version 3.7.17 requires 1432835574 CPU cycles and the 3.8.7 alpha requires just 953861485 CPU cycles, as measured by cachegrind. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users