Re: [sqlite] 50% faster than 3.7.17

2014-10-09 Thread David Woodhouse
On Thu, 2014-10-09 at 17:41 +0700, Dan Kennedy wrote:
> I think I understand the second optimization. You're saying that given this:
> 
>SELECT DISTINCT main.uid
>   FROM main LEFT JOIN email_list ON main.uid = email_list.uid
>   WHERE email_list.email LIKE 'al%'
> 
> SQLite should deduce that since the LIKE implies "email_list.email IS
> NOT NULL" 

(
Digression: SQLite *should* deduce the LIKE implies the IS NOT NULL
condition. My *actual* queries all have, for some bizarre
historical reason I haven't yet explored,

email_list.value IS NOT NULL AND email_list.value LIKE 'foo%'

You'd think that the former condition would end up optimised away,
but timings show that it clearly isn't. I wasn't going to bring that
up here since I think it is so clearly a stupid thing to be putting
in our query in the first place. But you just made me do it anyway.
)

> the LEFT JOIN is equivalent to a regular JOIN. Which would allow
> SQLite to reorder the tables and perhaps come up with a more efficient
> query plan. Correct?

Indeed.

As I said, a glib response to this suggested optimisation in *isolation*
might be that the user shouldn't have asked for a LEFT JOIN at all if
they didn't need it. But of course when coupled with the other suggested
transforms, that glib response is a lot less valid.

> 
> The first optimization might be trickier.
 ...
>  I guess queries that have a DISTINCT clause don't have this problem.

Right. I think it only works with DISTINCT, but that's OK.

>  So it could in theory transform your query to:
> 
>SELECT DISTINCT
> main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid
> WHERE email_list.email LIKE 'al%'
>  UNION ALL
>SELECT main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid
> WHERE main.first_name like 'al%'
>  UNION ALL
>SELECT main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid
> WHERE main.last_name like 'al%';
> 
> The the hypothetical optimization above could figure out that the
> first LEFT JOIN could just as easily be a JOIN.

Right.

> And that the other two LEFT JOINs are not required at all due to the
> DISTINCT and the fact that the WHERE clauses on the sub-selects do not
> reference the joined table at all.

Yeah, that's probably true. I'd assumed that we'd do it as part of the
"first optimisation" introducing the UNION but it's probably better
handled as a separate transform as you have it here.

> It seems like there are a few moving parts here. And none of these are
> trivial changes. So, good ideas that might show up in an SQLite
> release at some point, but it's not realistic to expect these
> optimizations to be implemented quickly.

TBH that's all I was really looking for; thanks. I *hate* it when people
silently work around limitations in my software without bringing them to
my attention, and I didn't want to be guilty of doing the same.

I certainly don't expect an instant fix; just knowing that it's on the
radar is perfectly sufficient. Now I can commit my hackish workaround to
rewrite certain special-case queries to optimise the output of the
current query planner for them, and still sleep at night :)

Thanks.

>  Unless, of course, you can propose a patch and they turn out to be
> simpler than they look.

That would be interesting but realistically it's outside my capacity at
the moment. I am already *so* far into shaving this yak that I can
barely remember what I was actually intending to do in the first
place :)

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-10-09 Thread Dan Kennedy

On 10/09/2014 04:38 PM, David Woodhouse wrote:

On Thu, 2014-09-25 at 11:13 +0100, David Woodhouse wrote:

I suggested a couple of specific optimisations which the query planner
might be able to make, which should hopefully have benefits wider than
just my own use case. Are those not viable?

I'm preparing to commit a workaround to Evolution to avoid this issue,
and then move on with my life and forget about it.

Before I do, is it worth me rephrasing this as a 'suboptimal query plan'
bug report so it gets tracked and might get attention later? Or should I
just forget the idea of getting it fixed in sqlite?



Well, you could always create a patch...

I think I understand the second optimization. You're saying that given this:

  SELECT DISTINCT main.uid
 FROM main LEFT JOIN email_list ON main.uid = email_list.uid
 WHERE email_list.email LIKE 'al%'

SQLite should deduce that since the LIKE implies "email_list.email IS NOT NULL" 
the LEFT JOIN is equivalent to a regular JOIN. Which would allow SQLite to reorder the 
tables and perhaps come up with a more efficient query plan. Correct?

Seems like a reasonable idea.

The first optimization might be trickier. With queries that feature a single 
table:

  SELECT cols FROM tbl WHERE a=? OR b=? OR c=?

the planner may elect to run something very close to this:

  SELECT cols FROM tbl WHERE a=?
UNION ALL
  SELECT cols FROM tbl WHERE b=?
UNION ALL
  SELECT cols FROM tbl WHERE c=?

However, after returning each row, we remember its PRIMARY KEY (either the 
rowid or real PK for WITHOUT ROWID tables). Similar transformations for 
individual loops within join queries are also possible.

However, with a JOIN query, we don't currently attempt this kind of transform. 
I think because we would have to create some kind of composite key to use in 
place of the PRIMARY KEY to avoid returning duplicates. I guess queries that 
have a DISTINCT clause don't have this problem. So it could in theory transform 
your query to:

  SELECT DISTINCT
   main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid
   WHERE email_list.email LIKE 'al%'
UNION ALL
  SELECT main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid
   WHERE main.first_name like 'al%'
UNION ALL
  SELECT main.uid FROM main LEFT JOIN email_list ON main.uid = email_list.uid
   WHERE main.last_name like 'al%';

The the hypothetical optimization above could figure out that the first LEFT 
JOIN could just as easily be a JOIN.

And that the other two LEFT JOINs are not required at all due to the DISTINCT 
and the fact that the WHERE clauses on the sub-selects do not reference the 
joined table at all.

It seems like there are a few moving parts here. And none of these are trivial 
changes. So, good ideas that might show up in an SQLite release at some point, 
but it's not realistic to expect these optimizations to be implemented quickly. 
Unless, of course, you can propose a patch and they turn out to be simpler than 
they look.

Regards,
Dan.






___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-10-09 Thread David Woodhouse
On Thu, 2014-09-25 at 11:13 +0100, David Woodhouse wrote:
> 
> I suggested a couple of specific optimisations which the query planner
> might be able to make, which should hopefully have benefits wider than
> just my own use case. Are those not viable?

I'm preparing to commit a workaround to Evolution to avoid this issue,
and then move on with my life and forget about it.

Before I do, is it worth me rephrasing this as a 'suboptimal query plan'
bug report so it gets tracked and might get attention later? Or should I
just forget the idea of getting it fixed in sqlite?

-- 
dwmw2
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-30 Thread jose isaias cabrera

"Simon Slavin" wrote...



On 24 Sep 2014, at 2:13pm, jose isaias cabrera  
wrote:


This would be a nice set of options. On my case, I would set all 
connections to our project to be" max_performance", as it is what we 
need.  Just thinking out loud.


How much max is max ?  Are you willing to give up ACID ?  Are you willing 
to have your database corrupted if there's power loss ?


Ok, back to reality... ;-), no, I will not.  Good questions, Simon.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-25 Thread David Woodhouse
On Tue, 2014-09-23 at 17:48 +0100, David Woodhouse wrote:
> That looks really promising; thanks for all this work.
> 
> Tristan, you have a comprehensive set of benchmarks for Evolution's
> addressbook; is it possible for someone else to run those or would it
> take more of your time to babysit than it would to run them yourself?

I was able to get these benchmarks running, and compared the 3.8.7
snapshot against the version of 3.8.6 which is shipped in Fedora 20.

Results temporarily at http://westmere.infradead.org/charts/sqlite387/

This isn't a perfect comparison since the Fedora package is built from
sqlite-src-3080600.zip and the equivalent source for the 3.8.7 alpha
didn't seem to be available. But it was built as a shared library using
the same configuration and compiler flags, and the benchmark compares
identical code running against both the 3.8.6 and 3.8.7 libraries.

I haven't done any real analysis other than looking at the pretty
pictures, but certainly nothing seems to go slower (or break), and there
are some noticeable improvements on some of the benchmarks
(filter-by-long-full-name-suffix.png,
filter-by-short-full-name-suffix.png, filter-containing-full-name.png,
filter-containing-given-name.png, filter-containing-phone-number.png).

-- 
dwmw2
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-25 Thread David Woodhouse
On Wed, 2014-09-24 at 19:36 -0600, Keith Medcalf wrote:
> 
> Interesting.  From that code you might want to try something like this:
> 
> SELECT uid, vcard, bdata 
>   FROM folder_id
>  WHERE uid in ( select uid FROM email_list where value like 'p%'
>union
> select uid from folder_id where nickname LIKE 'p%'
>union
> select uid from folder_id where full_name LIKE 'p%'
>union
> select uid from folder_id where family_name LIKE 'p%'
>union
> select uid from folder_id where given_name LIKE 'p%'
>union
> select uid from folder_id where nickname LIKE 'p%'
>union
> select uid from folder_id where file_as LIKE 'p%'
>   );
> 
> Then having nocase indexes on the various search fields will all work as 
> expected.

Yeah, that achieves the same speed.

I'm not sure it addresses the real problem though. It still only really
applies when the user's query (which we're translating to SQL) contains
only 'OR' and no 'AND' clauses. It doesn't help me translate a query in
the general case.

Now, I'm not *entirely* averse to having a special case for the 'all OR'
query — this particular query is the address autocompletion so it's
common and the user is actually waiting for it as they type. In fact, as
a proof of concept I've already *implemented* a hackish special case to
spot this case and basically submit a hand-crafted query instead of the
normal translation to SQL:
https://bugzilla.gnome.org/show_bug.cgi?id=699597#c19

The problem is that I don't *want* to have to have that special case.
This is just a query optimisation, which is something the query planner
is supposed to do. I don't *want* to implement this and other
optimisations in the client, just to trick *today's* sqlite query
planner into spotting the best way to do it. That's the Wrong Way™ to do
things.

There are two alternative approaches which *don't* seem as wrong. 

Firstly, if there's a sane way to rewrite our translator so that it
naturally uses UNION for OR clauses, that might make sense. But to cope
with AND clauses, AFAICT the natural extension of that approach would be
to use 'SELECT FROM ... SELECT FROM' and then we lose the use of indices
for *those* cases, right¹? Tristan started a thread about this 'nested
select' last year, which I picked up a couple of weeks ago. It didn't
seem like it was a viable strategy for the general case.

The second approach, and this is why I started this thread, is to 'fix'
the query planner so that that it can see for *itself* the best way to
implement a given query given the constraints.

I suggested a couple of specific optimisations which the query planner
might be able to make, which should hopefully have benefits wider than
just my own use case. Are those not viable?

-- 
dwmw2

¹ I do realise that in the special case of a single top-level AND such
  that all its sub-clauses are necessary conditions, I can do something
  nicer. But again, it's a special case and doesn't handle the general
  case of nested AND and OR clauses. And it's still the *client* code
  doing the job of the optimiser, spotting necessary vs. sufficient
  conditions and pulling them out to the top level for more efficient
  implementation.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-25 Thread Stephen Chrzanowski
In the vein of configurability, and in a day dream I just had, it would be
nice (But probably not possible as there could be compiler directives you
can't use at the same time) that we could have a single output
DLL/SO/whatever dumped from the compiler that had everything available,
then, via defaults in the source code, IF/THEN/ELSE (or whatever the C
equiv is) operations are considered to have certain functionality run
during the query process.  Via a configuration file, or via pragma which
would be generated based on a configuration, feature sets could be set at
run time.

One fault with this would be the issue of COMPILED size.  I realize we're
in an age where we generally talk megabytes instead of kilobytes in regards
to main thread compiled code, but I'm sure there are platforms out there
that are using SQLite that have to fit in the kilobyte range, and need
special compiled sources.

The advantage to this is that I wouldn't have to compile X number of DLLs
for Y number of programs to get the results I need, just have one
configuration file kicking around that I can adjust as needed.  It'd also
help with fine tuning to validate whether I need certain features turned on
or off.

On Wed, Sep 24, 2014 at 1:15 PM, Roger Binns  wrote:

> On 24/09/14 06:19, Simon Slavin wrote:
> > How much max is max ?
>
> Not giving up ACID.  But for example stat4 is better than the default
> stat1.
>  Memory mapping (especially on 64 bit) is great.  So is WAL.  All are off
> by
> default.
>
> If you want to give up ACID then you should really be on your own to look
> at
> the various tradeoffs.
>
> Roger
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-24 Thread Keith Medcalf

Interesting.  From that code you might want to try something like this:

SELECT uid, vcard, bdata 
  FROM folder_id
 WHERE uid in ( select uid FROM email_list where value like 'p%'
   union
select uid from folder_id where nickname LIKE 'p%'
   union
select uid from folder_id where full_name LIKE 'p%'
   union
select uid from folder_id where family_name LIKE 'p%'
   union
select uid from folder_id where given_name LIKE 'p%'
   union
select uid from folder_id where nickname LIKE 'p%'
   union
select uid from folder_id where file_as LIKE 'p%'
  );

Then having nocase indexes on the various search fields will all work as 
expected.

>-Original Message-
>From: David Woodhouse [mailto:dw...@infradead.org]
>Sent: Wednesday, 24 September, 2014 07:25
>To: Keith Medcalf
>Cc: General Discussion of SQLite Database
>Subject: Re: [sqlite] 50% faster than 3.7.17
>
>On Wed, 2014-09-24 at 06:13 -0600, Keith Medcalf wrote:
>>
>> Would it not be more efficient to skip the join altogether since all
>> you want is the list of uid's, and assuming that you have maintained
>> the referential integrity of your database mail_list(list_uid)
>> references main(uid)?
>>
>> SELECT list_uid
>>   FROM mail_list
>>  WHERE email LIKE 'al%'
>> UNION
>> SELECT uid
>>   FROM main
>>  WHERE first_name LIKE 'al%'
>> OR last_name LIKE 'al%';
>
>Yes, but only because I oversimplified. In fact my queries are actually
>after *other* fields from the main table. It's just that I'd elided
>those fields from the example.
>
>For the sake of the example, please assume my queries were actually
> 'SELECT DISTINCT main.uid, main.first_name, main.last_name FROM …'
>
>The real-life query that this is simplified *from* is discussed at
>https://git.gnome.org/browse/evolution-data-server/commit/?id=5f9f5b52807
>
>--
>dwmw2
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-24 Thread Roger Binns
On 24/09/14 06:19, Simon Slavin wrote:
> How much max is max ?  

Not giving up ACID.  But for example stat4 is better than the default stat1.
 Memory mapping (especially on 64 bit) is great.  So is WAL.  All are off by
default.

If you want to give up ACID then you should really be on your own to look at
the various tradeoffs.

Roger

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-24 Thread David Woodhouse
On Wed, 2014-09-24 at 06:13 -0600, Keith Medcalf wrote:
> 
> Would it not be more efficient to skip the join altogether since all
> you want is the list of uid's, and assuming that you have maintained
> the referential integrity of your database mail_list(list_uid)
> references main(uid)?
> 
> SELECT list_uid 
>   FROM mail_list
>  WHERE email LIKE 'al%'
> UNION 
> SELECT uid
>   FROM main 
>  WHERE first_name LIKE 'al%'
> OR last_name LIKE 'al%';

Yes, but only because I oversimplified. In fact my queries are actually
after *other* fields from the main table. It's just that I'd elided
those fields from the example.

For the sake of the example, please assume my queries were actually
 'SELECT DISTINCT main.uid, main.first_name, main.last_name FROM …'

The real-life query that this is simplified *from* is discussed at
https://git.gnome.org/browse/evolution-data-server/commit/?id=5f9f5b52807

-- 
dwmw2
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-24 Thread Simon Slavin

On 24 Sep 2014, at 2:13pm, jose isaias cabrera  wrote:

> This would be a nice set of options. On my case, I would set all connections 
> to our project to be" max_performance", as it is what we need.  Just thinking 
> out loud.

How much max is max ?  Are you willing to give up ACID ?  Are you willing to 
have your database corrupted if there's power loss ?

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-24 Thread jose isaias cabrera


"Roger Binns" wrote...


On 22/09/14 10:48, Richard Hipp wrote:
But if you have any new ideas on how we can further reduce the I/O, we'd 
love to hear from you.


The single biggest problem for me is defaults.  SQLite supports memory
mapped i/o which has many advantages.  The stat4 analyze does a really 
good

job.  WAL reduces writing contention.  These are all off by default for
various good reasons.  But it also means that by default the majority of
developers are not getting the best performance SQLite already has to 
offer,

unless they happen to stumble on these.  Some like stat4 are especially
problematic since it requires recompilation to address.

It would be nice if by default things were best.  One weak suggestion is 
to
have a few profiles, and have them selectable by pragma.  The profile 
would
then go ahead and set various options.  Future versions of SQLite would 
also

then update the profiles.

For example:

 pragma profile='max_performance' -- turns on all above, ups caches etc
 pragma profile='min_memory'-- tunes down everything related to memory

Various members of the SQLite consortium can then do things like:

 pragma profile='firefox'


This would be a nice set of options. On my case, I would set all connections 
to our project to be" max_performance", as it is what we need.  Just 
thinking out loud.


josé 


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-24 Thread Keith Medcalf

Would it not be more efficient to skip the join altogether since all you want 
is the list of uid's, and assuming that you have maintained the referential 
integrity of your database mail_list(list_uid) references main(uid)?

SELECT list_uid 
  FROM mail_list
 WHERE email LIKE 'al%'
UNION 
SELECT uid
  FROM main 
 WHERE first_name LIKE 'al%'
OR last_name LIKE 'al%';

>-Original Message-
>Speaking of better query plans, though... here's a query which takes
>about 1700ms on my data set, followed by a couple of optimisations which
>seem like they might be generically useful, and would help it go 300
>times faster:
>
>/* This is an address book. The 'main' table holds the primary
>   information while the 'email_list' table holds email addresses
>   because each contact can have more than one email address */
>PRAGMA case_sensitive_like=on;
>CREATE TABLE main (uid text primary key, first_name text, last_name
>text);
>INSERT INTO main VALUES('','alice', 'X');
>INSERT INTO main VALUES('','bert', 'Y');
>INSERT INTO main VALUES('','alison', 'Z');
>CREATE INDEX main_first_name_index on main (first_name);
>CREATE INDEX main_last_name_index on main (last_name);
>CREATE TABLE email_list (uid text, email text);
>INSERT INTO email_list VALUES('','al...@example.com');
>INSERT INTO email_list VALUES('','ali...@example.com');
>INSERT INTO email_list VALUES('','alb...@example.org');
>CREATE INDEX email_uid_index on email_list (uid);
>CREATE INDEX email_email_index on email_list (email);
>
>SELECT DISTINCT main.uid
>   FROM main LEFT JOIN email_list ON main.uid = email_list.uid
>   WHERE email_list.email LIKE 'al%'
>   OR main.first_name like 'al%'
>   OR main.last_name like 'al%';
>/*
>0|0|0|SCAN TABLE main USING INDEX sqlite_autoindex_main_1
>0|1|1|SEARCH TABLE email_list USING INDEX email_uid_index (uid=?)
>*/
>
>The first optimisation is to realise that sometimes, operations on the
>result of a JOIN can be more efficiently resolved by using on the
>original tables. Specifically, in this example, there's no point in
>looking for a given first_name or last_name in the *joined* table, when
>we have indices on those fields in the original 'main' table and can do
>it much faster by querying those and then combining the results.
>
>So the original query becomes:
>
>SELECT DISTINCT main.uid
>   FROM main LEFT JOIN email_list ON main.uid = email_list.uid
>   WHERE email_list.email LIKE 'al%'
>UNION SELECT main.uid
>   FROM main WHERE main.first_name LIKE 'al%'
>   OR main.last_name LIKE 'al%';
>/*
>1|0|0|SCAN TABLE main USING COVERING INDEX sqlite_autoindex_main_1
>1|1|1|SEARCH TABLE email_list USING INDEX email_uid_index (uid=?)
>2|0|0|SEARCH TABLE main USING INDEX main_first_name_index (first_name>?
>AND first_name2|0|0|SEARCH TABLE main USING INDEX main_last_name_index (last_name>? AND
>last_name0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (UNION)
>*/
>
>With that optimisation, the time taken on my data set is reduced to
>about 1000ms. Which is quite a nice improvement from 1700ms but that's
>just the first step...
>
>The second optimisation is to realise that there's no point in that
>being a LEFT JOIN any more, given that it's now only ever going to
>return results where the right-hand-side actually does exist. Perhaps
>normally we'd blame the user for such a naïve query, but when coupled
>with the first optimisation, spotting this optimisation does actually
>start to make sense. So now the query becomes:
>
>SELECT DISTINCT main.uid
>   FROM main JOIN email_list ON main.uid = email_list.uid
>   WHERE email_list.email LIKE 'al%'
>UNION SELECT main.uid
>   FROM main WHERE main.first_name LIKE 'al%'
>   OR main.last_name LIKE 'al%';
>/*
>1|0|1|SEARCH TABLE email_list USING INDEX email_email_index (email>? AND
>email1|1|0|SEARCH TABLE main USING COVERING INDEX sqlite_autoindex_main_1
>(uid=?)
>2|0|0|SEARCH TABLE main USING INDEX main_first_name_index (first_name>?
>AND first_name2|0|0|SEARCH TABLE main USING INDEX main_last_name_index (last_name>? AND
>last_name0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (UNION)
>*/
>
>Now the query takes 6ms on my data set.
>
>If this were a hard-coded query, of course it would make sense for me to
>just do it this way in the client. But in my case — and I suspect it's
>this way in a number of other real-world examples — we're actually just
>representing a set of user-input search criteria in the SQL query. To
>make this optimisation in the client would be distinctly non-trivial,
>and it would be *really* nice if the sqlite query planner could do it.
>
>Is that feasible?
>
>--
>dwmw2



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-24 Thread David Woodhouse
On Fri, 2014-09-19 at 21:14 -0400, Richard Hipp wrote:
> The 50% faster number above is not about better query plans. 

Speaking of better query plans, though... here's a query which takes
about 1700ms on my data set, followed by a couple of optimisations which
seem like they might be generically useful, and would help it go 300
times faster:

/* This is an address book. The 'main' table holds the primary
   information while the 'email_list' table holds email addresses
   because each contact can have more than one email address */
PRAGMA case_sensitive_like=on;
CREATE TABLE main (uid text primary key, first_name text, last_name text);
INSERT INTO main VALUES('','alice', 'X');
INSERT INTO main VALUES('','bert', 'Y');
INSERT INTO main VALUES('','alison', 'Z');
CREATE INDEX main_first_name_index on main (first_name);
CREATE INDEX main_last_name_index on main (last_name);
CREATE TABLE email_list (uid text, email text);
INSERT INTO email_list VALUES('','al...@example.com');
INSERT INTO email_list VALUES('','ali...@example.com');
INSERT INTO email_list VALUES('','alb...@example.org');
CREATE INDEX email_uid_index on email_list (uid);
CREATE INDEX email_email_index on email_list (email);

SELECT DISTINCT main.uid
   FROM main LEFT JOIN email_list ON main.uid = email_list.uid
   WHERE email_list.email LIKE 'al%'
   OR main.first_name like 'al%'
   OR main.last_name like 'al%';
/*
0|0|0|SCAN TABLE main USING INDEX sqlite_autoindex_main_1
0|1|1|SEARCH TABLE email_list USING INDEX email_uid_index (uid=?)
*/

The first optimisation is to realise that sometimes, operations on the
result of a JOIN can be more efficiently resolved by using on the
original tables. Specifically, in this example, there's no point in
looking for a given first_name or last_name in the *joined* table, when
we have indices on those fields in the original 'main' table and can do
it much faster by querying those and then combining the results.

So the original query becomes:

SELECT DISTINCT main.uid
   FROM main LEFT JOIN email_list ON main.uid = email_list.uid
   WHERE email_list.email LIKE 'al%'
UNION SELECT main.uid
   FROM main WHERE main.first_name LIKE 'al%'
   OR main.last_name LIKE 'al%';
/*
1|0|0|SCAN TABLE main USING COVERING INDEX sqlite_autoindex_main_1
1|1|1|SEARCH TABLE email_list USING INDEX email_uid_index (uid=?)
2|0|0|SEARCH TABLE main USING INDEX main_first_name_index (first_name>? AND 
first_name? AND 
last_name? AND 
email? AND 
first_name? AND 
last_name___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-23 Thread Roger Binns
On 22/09/14 10:48, Richard Hipp wrote:
> But if you have any new ideas on how we can further reduce the I/O, we'd love 
> to hear from you.

The single biggest problem for me is defaults.  SQLite supports memory
mapped i/o which has many advantages.  The stat4 analyze does a really good
job.  WAL reduces writing contention.  These are all off by default for
various good reasons.  But it also means that by default the majority of
developers are not getting the best performance SQLite already has to offer,
unless they happen to stumble on these.  Some like stat4 are especially
problematic since it requires recompilation to address.

It would be nice if by default things were best.  One weak suggestion is to
have a few profiles, and have them selectable by pragma.  The profile would
then go ahead and set various options.  Future versions of SQLite would also
then update the profiles.

For example:

  pragma profile='max_performance' -- turns on all above, ups caches etc
  pragma profile='min_memory'-- tunes down everything related to memory

Various members of the SQLite consortium can then do things like:

  pragma profile='firefox'

Roger
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-23 Thread David Woodhouse
On Fri, 2014-09-19 at 21:14 -0400, Richard Hipp wrote:
> The latest SQLite 3.8.7 alpha version (available on the download page
> http://www.sqlite.org/download.html) is 50% faster than the 3.7.17 release
> from 16 months ago.  That is to say, it does 50% more work using the same
> number of CPU cycles.
> 
> This performance gain is over and above the query planner improvements that
> have also been made.  We are constantly looking for new ways to run queries
> and adding those ways into the query planner.  For example, in the previous
> release, we added a new way to evaluate IN operators with non-constant
> right-hand-sides that was reported on this mailing list to make some
> queries run 5 times faster.
> 
> The 50% faster number above is not about better query plans.  This is 50%
> faster at the low-level grunt work of moving bits on and off disk and
> search b-trees.  We have achieved this by incorporating hundreds of
> micro-optimizations.  Each micro-optimization might improve the performance
> by as little as 0.05%.  If we get one that improves performance by 0.25%,
> that is considered a huge win.  Each of these optimizations is unmeasurable
> on a real-world system (we have to use cachegrind to get repeatable
> run-times) but if you do enough of them, they add up.
> 
> A full 10% of the performance gain has come since the previous release.
> There have been a lot of changes.  All our tests pass, and we still have
> 100% branch test coverage, so we are confident that we didn't break too
> much.  But your testing is an important part of our quality process.
> Please download a source archive or a DLL and give the latest alpha a
> whirl, and let us know if you encounter any problems.

That looks really promising; thanks for all this work.

Tristan, you have a comprehensive set of benchmarks for Evolution's
addressbook; is it possible for someone else to run those or would it
take more of your time to babysit than it would to run them yourself?

-- 
dwmw2
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-23 Thread Richard Hipp
On Tue, Sep 23, 2014 at 2:33 AM, Donald Shepherd 
wrote:

> Are any of these improvements specifically in the area of the online backup
> API, or are they more in the general running of SQLite?
>

The "speedtest1" benchmark does not use the backup API.  However, many of
the performance improvements are broadly applicable, so you would expect
that you would see some improvement in that API too.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-23 Thread Donald Shepherd
Are any of these improvements specifically in the area of the online backup
API, or are they more in the general running of SQLite?

On 20 September 2014 11:14, Richard Hipp  wrote:

> The latest SQLite 3.8.7 alpha version (available on the download page
> http://www.sqlite.org/download.html) is 50% faster than the 3.7.17 release
> from 16 months ago.  That is to say, it does 50% more work using the same
> number of CPU cycles.
>
> This performance gain is over and above the query planner improvements that
> have also been made.  We are constantly looking for new ways to run queries
> and adding those ways into the query planner.  For example, in the previous
> release, we added a new way to evaluate IN operators with non-constant
> right-hand-sides that was reported on this mailing list to make some
> queries run 5 times faster.
>
> The 50% faster number above is not about better query plans.  This is 50%
> faster at the low-level grunt work of moving bits on and off disk and
> search b-trees.  We have achieved this by incorporating hundreds of
> micro-optimizations.  Each micro-optimization might improve the performance
> by as little as 0.05%.  If we get one that improves performance by 0.25%,
> that is considered a huge win.  Each of these optimizations is unmeasurable
> on a real-world system (we have to use cachegrind to get repeatable
> run-times) but if you do enough of them, they add up.
>
> A full 10% of the performance gain has come since the previous release.
> There have been a lot of changes.  All our tests pass, and we still have
> 100% branch test coverage, so we are confident that we didn't break too
> much.  But your testing is an important part of our quality process.
> Please download a source archive or a DLL and give the latest alpha a
> whirl, and let us know if you encounter any problems.
>
> P.S.:  Measurements were done using the "speedtest1 --size 5" workload on
> Ubuntu 10.13 and gcc 4.8.1 with -Os.  YMMV.  Version 3.7.17 requires
> 1432835574 CPU cycles and the 3.8.7 alpha requires just 953861485 CPU
> cycles, as measured by cachegrind.
> --
> D. Richard Hipp
> d...@sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-22 Thread big stone
ok,

Nearly all the time is spent in a big 'CTE' select.

So maybe this sort of ugly CTE query  is not threadable.

with
f0_k as
  (SELECT f.rowid as nof2, f.*, p.Dn, p.g1, p.g2, p.w, p.other FROM F0 AS
f, Pt AS p WHERE  f.Io =p.Io)
,F0_mcalc as
  (SELECT f.*, p.*, (case when Priority=999 then Cg else Source end) AS
Sourcef
   FROM F0_k AS f, Sor AS p WHERE  p.omg  in (f.Cg, '') And
   f.Slic Between Be And En
   And p.PtGroup In (f.Io,f.g1,f.g2,f.w,f.other,'PL','SO','ST','Not_found')
)
,F0_mcalcmin as
  (SELECT nof2, min(priority) AS minp FROM F0_mcalc GROUP BY nof2)
,F0_mcalc_final as
 (SELECT f.*, round(qty*coefficient,3) AS qtyf FROM  F0_mcalcmin
  AS fm inner join  F0_mcalc AS f on f.nof2=fm.nof2 and
f.priority=fm.minp)
 select *   from F0_mcalc_final ;
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-22 Thread Richard Hipp
On Mon, Sep 22, 2014 at 8:29 AM, Valentin Davydov 
wrote:

> On Fri, Sep 19, 2014 at 09:14:17PM -0400, Richard Hipp wrote:
>
> > The latest SQLite 3.8.7 alpha version (available on the download page
> > http://www.sqlite.org/download.html) is 50% faster than the 3.7.17
> release
> > from 16 months ago.  That is to say, it does 50% more work using the same
> > number of CPU cycles.
>
> Is there any similar benchmarks with regard to disk i/o operations rather
> than CPU? Especially random read of cache misses, I mean.
>

No, there are no such benchmarks.  On the other hand, we've been focused on
minimizing I/O in SQLite for the past decade.  I think we have extracted
about as much as we can fro m that vein.  But if you have any new ideas on
how we can further reduce the I/O, we'd love to hear from you.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-22 Thread Richard Hipp
On Mon, Sep 22, 2014 at 1:43 PM, big stone  wrote:

> Hi,
>
> This 3.8.7alpha release seems to bring about 5%  win from 3.8.6 , on my
> particular SQL test case.
>
> Question : "PRAGMA threads=2" didn't bring any speed-up on my "2 true"
> cores machine.
>
> Did I miss a compilation option ?
>

The multi-thread sort should be on by default.  Probably you are just not
doing a big enough sort to make it worthwhile to start any new threads.
The "threads=2" pragma sets an upper limit on the number of threads.  There
is no guarantee that SQLite will use that many.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-22 Thread big stone
Hi,

This 3.8.7alpha release seems to bring about 5%  win from 3.8.6 , on my
particular SQL test case.

Question : "PRAGMA threads=2" didn't bring any speed-up on my "2 true"
cores machine.

Did I miss a compilation option ?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-22 Thread Valentin Davydov
On Fri, Sep 19, 2014 at 09:14:17PM -0400, Richard Hipp wrote:

> The latest SQLite 3.8.7 alpha version (available on the download page
> http://www.sqlite.org/download.html) is 50% faster than the 3.7.17 release
> from 16 months ago.  That is to say, it does 50% more work using the same
> number of CPU cycles.

Is there any similar benchmarks with regard to disk i/o operations rather 
than CPU? Especially random read of cache misses, I mean.

Valentin Davydov.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-20 Thread Richard Hipp
On Sat, Sep 20, 2014 at 1:34 PM,  wrote:

> In trying to see if the new version breaks any of my queries, I ran
> several of them repeatedly, and they all appear to have produced the
> expected output.
>
> The only thing I noticed which maybe of interest in relation to speed
> performance was (with .timer on) that although the first two run time
> numbers (real & user) were consistently smaller in 3.8.7 (when compared to
> 3.8.6), the third number (sys) was consistently higher (or same in one
> occasion).  I guess the first number is the actual time (in seconds) it
> took to run the query.  I don't even know what the 2nd and 3rd numbers
> represent, and how or if they maybe related to the first one.  Is that
> increase in sys to be expected?
>

Thanks for the report.  I think the increased system time is harmless, but
I want to investigate further to be sure.  Except right now I'm preoccupied
with Mr. Xu's new bug.  So please remind me next week if I don't bring this
up again.  :-)

What OS are you using?

Can you share your database and test script with us?



>
> A few examples from many more I tried that all follow the same pattern
> (same query & database in each case):
>
> 3.8.6: Run Time: real 2.434 user 2.386815 sys 0.00
> 3.8.7: Run Time: real 1.856 user 1.778411 sys 0.062400
> ---
> 3.8.6: Run Time: real 584.465 user 560.293192 sys 1.638011
> 3.8.7: Run Time: real 518.227 user 430.469159 sys 53.617544
> ---
> 3.8.6: Run Time: real 2.449 user 2.340015 sys 0.046800
> 3.8.7: Run Time: real 1.935 user 1.794012 sys 0.046800
>
> (Thank you for two great solutions I use daily -- SQLite3 and Fossil)
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-20 Thread tonyp
In trying to see if the new version breaks any of my queries, I ran several 
of them repeatedly, and they all appear to have produced the expected 
output.


The only thing I noticed which maybe of interest in relation to speed 
performance was (with .timer on) that although the first two run time 
numbers (real & user) were consistently smaller in 3.8.7 (when compared to 
3.8.6), the third number (sys) was consistently higher (or same in one 
occasion).  I guess the first number is the actual time (in seconds) it took 
to run the query.  I don't even know what the 2nd and 3rd numbers represent, 
and how or if they maybe related to the first one.  Is that increase in sys 
to be expected?


A few examples from many more I tried that all follow the same pattern (same 
query & database in each case):


3.8.6: Run Time: real 2.434 user 2.386815 sys 0.00
3.8.7: Run Time: real 1.856 user 1.778411 sys 0.062400
---
3.8.6: Run Time: real 584.465 user 560.293192 sys 1.638011
3.8.7: Run Time: real 518.227 user 430.469159 sys 53.617544
---
3.8.6: Run Time: real 2.449 user 2.340015 sys 0.046800
3.8.7: Run Time: real 1.935 user 1.794012 sys 0.046800

(Thank you for two great solutions I use daily -- SQLite3 and Fossil)

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-19 Thread Stephen Chrzanowski
I, as well, wish to thank you for this tool Dr. Hipp.  I've never published
a public application using this engine, but, at my place of employment,
where my primary responsibility is to just monitor servers world wide, I've
coded a few tidbit web and desktop applications that have made my job SO
much easier.  Without this, my desktop apps would have to rely on MySQL,
which is massive overkill (resource wise) for some of the things I've
needed to use it for.

Thanks again!

On Fri, Sep 19, 2014 at 9:22 PM, jose isaias cabrera <
jic...@cinops.xerox.com> wrote:

> "Richard Hipp" wrote...
>
>
>
>  The latest SQLite 3.8.7 alpha version (available on the download page
>> http://www.sqlite.org/download.html) is 50% faster than the 3.7.17
>> release
>> from 16 months ago.  That is to say, it does 50% more work using the same
>> number of CPU cycles.
>>
>> This performance gain is over and above the query planner improvements
>> that
>> have also been made.  We are constantly looking for new ways to run
>> queries
>> and adding those ways into the query planner.  For example, in the
>> previous
>> release, we added a new way to evaluate IN operators with non-constant
>> right-hand-sides that was reported on this mailing list to make some
>> queries run 5 times faster.
>>
>> The 50% faster number above is not about better query plans.  This is 50%
>> faster at the low-level grunt work of moving bits on and off disk and
>> search b-trees.  We have achieved this by incorporating hundreds of
>> micro-optimizations.  Each micro-optimization might improve the
>> performance
>> by as little as 0.05%.  If we get one that improves performance by 0.25%,
>> that is considered a huge win.  Each of these optimizations is
>> unmeasurable
>> on a real-world system (we have to use cachegrind to get repeatable
>> run-times) but if you do enough of them, they add up.
>>
>> A full 10% of the performance gain has come since the previous release.
>> There have been a lot of changes.  All our tests pass, and we still have
>> 100% branch test coverage, so we are confident that we didn't break too
>> much.  But your testing is an important part of our quality process.
>> Please download a source archive or a DLL and give the latest alpha a
>> whirl, and let us know if you encounter any problems.
>>
>> P.S.:  Measurements were done using the "speedtest1 --size 5" workload on
>> Ubuntu 10.13 and gcc 4.8.1 with -Os.  YMMV.  Version 3.7.17 requires
>> 1432835574 CPU cycles and the 3.8.7 alpha requires just 953861485 CPU
>> cycles, as measured by cachegrind.
>>
>
> I don't know if folks have ever thank you, Dr. Hipp, for this wonderful
> gift to the world called SQLite.  I have become a legend in my own world
> with this tool. :-)  I do have to say that I have used it since 2006 and it
> has increased in speed every year.  Thank you!  Thank you!  And in my own
> native language, muchas gracias!
>
> josé
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 50% faster than 3.7.17

2014-09-19 Thread jose isaias cabrera

"Richard Hipp" wrote...



The latest SQLite 3.8.7 alpha version (available on the download page
http://www.sqlite.org/download.html) is 50% faster than the 3.7.17 release
from 16 months ago.  That is to say, it does 50% more work using the same
number of CPU cycles.

This performance gain is over and above the query planner improvements 
that
have also been made.  We are constantly looking for new ways to run 
queries
and adding those ways into the query planner.  For example, in the 
previous

release, we added a new way to evaluate IN operators with non-constant
right-hand-sides that was reported on this mailing list to make some
queries run 5 times faster.

The 50% faster number above is not about better query plans.  This is 50%
faster at the low-level grunt work of moving bits on and off disk and
search b-trees.  We have achieved this by incorporating hundreds of
micro-optimizations.  Each micro-optimization might improve the 
performance

by as little as 0.05%.  If we get one that improves performance by 0.25%,
that is considered a huge win.  Each of these optimizations is 
unmeasurable

on a real-world system (we have to use cachegrind to get repeatable
run-times) but if you do enough of them, they add up.

A full 10% of the performance gain has come since the previous release.
There have been a lot of changes.  All our tests pass, and we still have
100% branch test coverage, so we are confident that we didn't break too
much.  But your testing is an important part of our quality process.
Please download a source archive or a DLL and give the latest alpha a
whirl, and let us know if you encounter any problems.

P.S.:  Measurements were done using the "speedtest1 --size 5" workload on
Ubuntu 10.13 and gcc 4.8.1 with -Os.  YMMV.  Version 3.7.17 requires
1432835574 CPU cycles and the 3.8.7 alpha requires just 953861485 CPU
cycles, as measured by cachegrind.


I don't know if folks have ever thank you, Dr. Hipp, for this wonderful gift 
to the world called SQLite.  I have become a legend in my own world with 
this tool. :-)  I do have to say that I have used it since 2006 and it has 
increased in speed every year.  Thank you!  Thank you!  And in my own native 
language, muchas gracias!


josé 


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] 50% faster than 3.7.17

2014-09-19 Thread Richard Hipp
The latest SQLite 3.8.7 alpha version (available on the download page
http://www.sqlite.org/download.html) is 50% faster than the 3.7.17 release
from 16 months ago.  That is to say, it does 50% more work using the same
number of CPU cycles.

This performance gain is over and above the query planner improvements that
have also been made.  We are constantly looking for new ways to run queries
and adding those ways into the query planner.  For example, in the previous
release, we added a new way to evaluate IN operators with non-constant
right-hand-sides that was reported on this mailing list to make some
queries run 5 times faster.

The 50% faster number above is not about better query plans.  This is 50%
faster at the low-level grunt work of moving bits on and off disk and
search b-trees.  We have achieved this by incorporating hundreds of
micro-optimizations.  Each micro-optimization might improve the performance
by as little as 0.05%.  If we get one that improves performance by 0.25%,
that is considered a huge win.  Each of these optimizations is unmeasurable
on a real-world system (we have to use cachegrind to get repeatable
run-times) but if you do enough of them, they add up.

A full 10% of the performance gain has come since the previous release.
There have been a lot of changes.  All our tests pass, and we still have
100% branch test coverage, so we are confident that we didn't break too
much.  But your testing is an important part of our quality process.
Please download a source archive or a DLL and give the latest alpha a
whirl, and let us know if you encounter any problems.

P.S.:  Measurements were done using the "speedtest1 --size 5" workload on
Ubuntu 10.13 and gcc 4.8.1 with -Os.  YMMV.  Version 3.7.17 requires
1432835574 CPU cycles and the 3.8.7 alpha requires just 953861485 CPU
cycles, as measured by cachegrind.
-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users