On 29 Apr 2015, at 3:22pm, Jason Vas Dias <jason.vas.dias at gmail.com> wrote:

> My question is simply how to get sqlite3 to use multiple threads to create the
> temporary table from a query

If you are working on a database on disk
        (rather than one in memory),
and most of your time is spend storing data in the table
        (rather than calculating the values to be stored)
then most of the time spent is spent on waiting for your storage medium and 
threading is a red herring.

> Currently, I have not had enough time to
> wait for SQLite to create the temporary table ( I tried for over 2
> hours)

How many rows were you trying to store ?  You mentioned 2 million.  If it's 
taking SQLite 2 hours to store 2 million rows you're doing something wrong.  
Threads are the wrong way to fix this problem.  Any attempt to split this up 
into threads will just have all the threads blocking each other as they wait 
for access to the database.

> My question is simply how to get sqlite3 to use multiple threads to create the
> temporary table from a query,  which is computing minima, averages and maxima
> over about 2 million floating point measurements that correspond to a
> set of "GROUP BY" selectors, and then back-tracking to count how many
> measurements M with the same
> selectors  fall into one of 4 "quartiles" :
>   1:  min(M) to Q1: ( min(M) + (  (max(M) - min(M) ) /4 )
>   2:  Q1  to      Q2: ( min(M) + ( (max(M) - min(M) ) /2 ) )
>   3:  Q2  to      Q3: ( min(M) + ( (max(M) - min(M) ) * 1.75) )
>   4:  Q3  to max(M)
> (the goal is eventually to plot these measurements to show the rough
> distribution
> of measurements between the maximum and minimum along with the average on
> the same plot).

I suspect you're making a common mistake, and trying to devise one (maybe two) 
SQL commands which does all of this.  SQL is meant for storage and retrieval, 
not interval maths. Here is a suggested way of proceeding.

A procedure which stores your 'M' values and selectors in a table which has an 
extra column called 'quartile' you don't use yet, and an index on (selector,M).
A procedure which reads a list of groups with minima and maxima into an array 
of triple elements (something like "SELECT min(M),max(M) FROM myTable GROUP BY 
myCategory").
A procedure which iterates through that array and does calculations to work out 
where the quartile boundaries are.  Then uses an UPDATE command to store the 
appropriate value in the quartile column (something like "UPDATE myTable SET 
quartile=CASE WHEN m<q1 THEN "1" WHEN m<q2 THEN "2" WHEN ... END WHERE 
myCategory=%thisCat").

This does the things SQLite is good at in SQLite, and the things programming 
languages are good at in your programming language.  It contains three 
different kinds of SQLite command, each of which will execute very quickly 
using an index.  It produces results for each stage which can be easily checked 
for accuracy while you're debugging.  And it produces a TABLE which clearly 
lays out your counts for later use.

Simon.

Reply via email to