Re: [sqlite] how is "pragma threads = 4" working

2016-09-19 Thread James K. Lowden
On Fri, 16 Sep 2016 07:29:28 -0400
Richard Hipp  wrote:

> The algorithm used for "ORDER BY ... LIMIT N" uses much less memory
> than a full-up "ORDER BY" because is only keeps track of the top N
> entries seen so far, discarding the rest.  But it also only uses a
> single thread.  

My immediate thought was that this is an optimization opportunity.  As
the OP alludes to, N is the sum of LIMIT and OFFSET. 

Would you have information on how these are typically used?  My guess
is that the LIMIT argument is typically small, less than 20, but that
OFFSET marches on, and grows to be a significant fraction of the
table.  

If LIMIT N is small and OFFSET is not used, a memory-efficient,
nonlocking parallel algorithm would reserve N slots for each thread,
and divide the table among the threads, each processing 1/threads
rows.  Then merge-sort their outputs.  

Humbly submitted, 

--jkl
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] how is "pragma threads = 4" working

2016-09-17 Thread Венцислав Русев


On 08/15/2016 12:02 AM, Венцислав Русев wrote:
My computer has 4 cores. I have compile sqlite like this "gcc 
-DSQLITE_MAX_WORKER_THREADS=4 -DSQLITE_DEFAULT_WORKER_THREADS=4 
shell.c sqlite3.c -lpthread -ldl -o sqlite3". I made some tests and 
found that "pragma threads = 4" doesn't decrease runtime of the query 
that sorts 1 milion records.


SQLite version 3.8.8
sqlite> pragma threads;
4
sqlite> CREATE TABLE event (
ID INTEGER PRIMARY KEY NOT NULL,
date   INTEGER NOT NULL,
value  INTEGER NOT NULL );
sqlite> EXPLAIN QUERY PLAN SELECT ID FROM event ORDER BY date LIMIT 1;
0|0|0|SCAN TABLE event
0|0|0|USE TEMP B-TREE FOR ORDER BY
sqlite> SELECT ID FROM event ORDER BY date LIMIT 1;
4101021
Run Time: real 2.493 user 2.426000 sys 0.049000
sqlite> pragma threads = 0;
0
sqlite> SELECT ID FROM event ORDER BY date LIMIT 1;
4101021
Run Time: real 2.484 user 2.421000 sys 0.044000

To sum it up I have:
  - multi core cpu;
  - sqlite that is compiled to use "auxiliary threads";
  - a table with 1 million records;
  - a query that scans through the table, sorts all records and 
outputs the first ID.


sqlite doesn't use these "auxiliary threads" that sqlite docs talks 
about and the runtime of that query is the same with or without 
"pragma threads = 4".


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Some clarifications:
  - OS: Ubuntu 14.04 LTS;
  - "PRAGMA compile_options" gives "SYSTEM_MALLOC  THREADSAFE=1";
  - when sqlite shell is running and no query is running at the moment 
the OS reports that it has only one thread;



On 09/16/2016 11:29 AM, Richard Hipp wrote:


The algorithm used for "ORDER BY ... LIMIT N" uses much less memory 
than a full-up "ORDER BY" because is only keeps track of the top N 
entries seen so far, discarding the rest. But it also only uses a 
single thread. If you want multiple threads to be used, you'll need to 
drop the LIMIT, though I imagine that would defeat your purpose, no?


Richard Hipp has a point. I searched the sources and found that the 
sorting algorithm uses threads only if there isn't a LIMIT clause.
To be exact the sources say "This file contains code for the VdbeSorter 
object, used in concert with a VdbeCursor to sort large numbers of keys 
for CREATE INDEX statements or by SELECT statements with ORDER BY 
clauses that cannot be satisfied using indexes and without LIMIT clauses."


But I'm wondering what if your query is like so "SELECT ID FROM event 
ORDER BY date LIMIT 1 OFFSET 100;"? Isn't it better to sort the 
results using available threads?



On 09/16/2016 12:13 PM, Adam Devita wrote:


Have you proven that the cpu is the bottleneck? Sorting a million rows 
seems like a lot, but even older single core cpus may be capable of 2 
billion ops per second. [I apologize if this has been sorted out 
already I've got about 2 days of history on the thread] regards, 
Adam DeVita


This was just a test to see the performance of "PRAGMA threads". 
Actually I expect to sort 10-30 million rows or more. But after the 
tests I may choose the index.



Just to let you know what other tests I've conducted.

sqlite> .output stdout

sqlite> pragma threads = 4;
sqlite> select ID from event order by date;
Run Time: real 2.880 user 4.469000 sys 0.147000

sqlite> pragma threads = 0;
sqlite> select ID from event order by date;
Run Time: real 3.870 user 3.771000 sys 0.075000

Besides the runtime difference I inspected the process information while 
the query is running. When "pragma threads = 4;" sqlite shell starts new 
threads for sorting and stops them after the query has finished.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] how is "pragma threads = 4" working

2016-09-16 Thread J Decker
that is a single sql statement though ; it doesn't spread itself amongst
threads, the compile option says 'limits number of threads' so if you had
an application (not sqlite shell) that had threads it could have a limit?

On Fri, Sep 16, 2016 at 5:13 AM, Adam Devita  wrote:

> Have you proven that the cpu is the bottleneck? Sorting a million rows
> seems like a lot, but even older single core cpus may be capable of 2
> billion ops per second. [I apologize if this has been sorted out
> already I've got about 2 days of history on the thread]
>
> regards,
> Adam DeVita
>
> On Fri, Sep 16, 2016 at 7:56 AM, Stephen Chrzanowski 
> wrote:
>
> > Although programmatically easily done, from the SQLite point of view,
> what
> > if that query, sans LIMIT, were a subquery, and the limit was put out on
> > the outside?  Would the inner query execute, use all the threads, then
> > return just one row on the outer?
> >
> > On Fri, Sep 16, 2016 at 7:29 AM, Richard Hipp  wrote:
> >
> > > On 8/14/16, Венцислав Русев  wrote:
> > > > My computer has 4 cores. I have compile sqlite like this "gcc
> > > > -DSQLITE_MAX_WORKER_THREADS=4 -DSQLITE_DEFAULT_WORKER_THREADS=4
> > shell.c
> > > > sqlite3.c -lpthread -ldl -o sqlite3". I made some tests and found
> that
> > > > "pragma threads = 4" doesn't decrease runtime of the query that
> sorts 1
> > > > milion records.
> > > >
> > > > SQLite version 3.8.8
> > > > sqlite> pragma threads;
> > > > 4
> > > > sqlite> CREATE TABLE event (
> > > >  ID INTEGER PRIMARY KEY NOT NULL,
> > > >  date   INTEGER NOT NULL,
> > > >  value  INTEGER NOT NULL );
> > > > sqlite> EXPLAIN QUERY PLAN SELECT ID FROM event ORDER BY date LIMIT
> 1;
> > >
> > > The algorithm used for "ORDER BY ... LIMIT N" uses much less memory
> > > than a full-up "ORDER BY" because is only keeps track of the top N
> > > entries seen so far, discarding the rest.  But it also only uses a
> > > single thread.  If you want multiple threads to be used, you'll need
> > > to drop the LIMIT, though I imagine that would defeat your purpose,
> > > no?
> > >
> > > --
> > > D. Richard Hipp
> > > d...@sqlite.org
> > > ___
> > > sqlite-users mailing list
> > > sqlite-users@mailinglists.sqlite.org
> > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> > >
> > ___
> > sqlite-users mailing list
> > sqlite-users@mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
>
>
>
> --
> --
> VerifEye Technologies Inc.
> 151 Whitehall Dr. Unit 2
> Markham, ON
> L3R 9T1
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] how is "pragma threads = 4" working

2016-09-16 Thread Adam Devita
Have you proven that the cpu is the bottleneck? Sorting a million rows
seems like a lot, but even older single core cpus may be capable of 2
billion ops per second. [I apologize if this has been sorted out
already I've got about 2 days of history on the thread]

regards,
Adam DeVita

On Fri, Sep 16, 2016 at 7:56 AM, Stephen Chrzanowski 
wrote:

> Although programmatically easily done, from the SQLite point of view, what
> if that query, sans LIMIT, were a subquery, and the limit was put out on
> the outside?  Would the inner query execute, use all the threads, then
> return just one row on the outer?
>
> On Fri, Sep 16, 2016 at 7:29 AM, Richard Hipp  wrote:
>
> > On 8/14/16, Венцислав Русев  wrote:
> > > My computer has 4 cores. I have compile sqlite like this "gcc
> > > -DSQLITE_MAX_WORKER_THREADS=4 -DSQLITE_DEFAULT_WORKER_THREADS=4
> shell.c
> > > sqlite3.c -lpthread -ldl -o sqlite3". I made some tests and found that
> > > "pragma threads = 4" doesn't decrease runtime of the query that sorts 1
> > > milion records.
> > >
> > > SQLite version 3.8.8
> > > sqlite> pragma threads;
> > > 4
> > > sqlite> CREATE TABLE event (
> > >  ID INTEGER PRIMARY KEY NOT NULL,
> > >  date   INTEGER NOT NULL,
> > >  value  INTEGER NOT NULL );
> > > sqlite> EXPLAIN QUERY PLAN SELECT ID FROM event ORDER BY date LIMIT 1;
> >
> > The algorithm used for "ORDER BY ... LIMIT N" uses much less memory
> > than a full-up "ORDER BY" because is only keeps track of the top N
> > entries seen so far, discarding the rest.  But it also only uses a
> > single thread.  If you want multiple threads to be used, you'll need
> > to drop the LIMIT, though I imagine that would defeat your purpose,
> > no?
> >
> > --
> > D. Richard Hipp
> > d...@sqlite.org
> > ___
> > sqlite-users mailing list
> > sqlite-users@mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
--
VerifEye Technologies Inc.
151 Whitehall Dr. Unit 2
Markham, ON
L3R 9T1
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] how is "pragma threads = 4" working

2016-09-16 Thread Stephen Chrzanowski
Although programmatically easily done, from the SQLite point of view, what
if that query, sans LIMIT, were a subquery, and the limit was put out on
the outside?  Would the inner query execute, use all the threads, then
return just one row on the outer?

On Fri, Sep 16, 2016 at 7:29 AM, Richard Hipp  wrote:

> On 8/14/16, Венцислав Русев  wrote:
> > My computer has 4 cores. I have compile sqlite like this "gcc
> > -DSQLITE_MAX_WORKER_THREADS=4 -DSQLITE_DEFAULT_WORKER_THREADS=4 shell.c
> > sqlite3.c -lpthread -ldl -o sqlite3". I made some tests and found that
> > "pragma threads = 4" doesn't decrease runtime of the query that sorts 1
> > milion records.
> >
> > SQLite version 3.8.8
> > sqlite> pragma threads;
> > 4
> > sqlite> CREATE TABLE event (
> >  ID INTEGER PRIMARY KEY NOT NULL,
> >  date   INTEGER NOT NULL,
> >  value  INTEGER NOT NULL );
> > sqlite> EXPLAIN QUERY PLAN SELECT ID FROM event ORDER BY date LIMIT 1;
>
> The algorithm used for "ORDER BY ... LIMIT N" uses much less memory
> than a full-up "ORDER BY" because is only keeps track of the top N
> entries seen so far, discarding the rest.  But it also only uses a
> single thread.  If you want multiple threads to be used, you'll need
> to drop the LIMIT, though I imagine that would defeat your purpose,
> no?
>
> --
> D. Richard Hipp
> d...@sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] how is "pragma threads = 4" working

2016-09-16 Thread Richard Hipp
On 8/14/16, Венцислав Русев  wrote:
> My computer has 4 cores. I have compile sqlite like this "gcc
> -DSQLITE_MAX_WORKER_THREADS=4 -DSQLITE_DEFAULT_WORKER_THREADS=4 shell.c
> sqlite3.c -lpthread -ldl -o sqlite3". I made some tests and found that
> "pragma threads = 4" doesn't decrease runtime of the query that sorts 1
> milion records.
>
> SQLite version 3.8.8
> sqlite> pragma threads;
> 4
> sqlite> CREATE TABLE event (
>  ID INTEGER PRIMARY KEY NOT NULL,
>  date   INTEGER NOT NULL,
>  value  INTEGER NOT NULL );
> sqlite> EXPLAIN QUERY PLAN SELECT ID FROM event ORDER BY date LIMIT 1;

The algorithm used for "ORDER BY ... LIMIT N" uses much less memory
than a full-up "ORDER BY" because is only keeps track of the top N
entries seen so far, discarding the rest.  But it also only uses a
single thread.  If you want multiple threads to be used, you'll need
to drop the LIMIT, though I imagine that would defeat your purpose,
no?

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] how is "pragma threads = 4" working

2016-09-16 Thread Simon Slavin

On 15 Aug 2016, at 1:02am, Венцислав Русев  wrote:

> sqlite doesn't use these "auxiliary threads" that sqlite docs talks about and 
> the runtime of that query is the same with or without "pragma threads = 4".

I cannot solve your problem, but this information may help the person who does.

Which OS (include rough version number) are you running ?

What does "PRAGMA compile_options" say about THREADSAFE ?

Let's suppose that SQLite does actually launch those threads but they make no 
change in the time the command takes for some reason.  Does your OS have a 
process tracker which can show you how many threads are in use ?  For instance, 
in macOS I'd look at the "Activity Monitor.app" main window.  In Windows I'd 
use Task Manager, select the process and click "Properties".  You may have to 
increase the table size to make it take long enough to monitor.

I'm just looking for some confirmation apart from the time taken.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users