[sqlite] Creating Histogram fast and efficiently :)

2008-12-31 Thread Jonathon
Hello all,

I have a column of numbers in a table, and I was wondering if it is possible
to create a histogram out of it fast and efficiently?

For example, assuming the data in the column is:  1, 5, 10, 12, 12, 15, 20,
20, 20.. I would like to return:

'less than 10' --> 2
'less than 20 and greater than or equal to 10' --> 4
'greater than or equal to 20' --> 3

I hope that makes sense.  I checked all over Google, and it seems that
different databases tackle this problem differently.  So I was just
curious how sqlite can help me make this calculation fast :)

Thanks,
J


Re: [sqlite] Creating Histogram fast and efficiently :)

2008-12-31 Thread Russell Leighton

create table numbers (val integer);

insert into numbers values (1);
insert into numbers values (5);
-- ... the remaining values (10, 12, 12, 15, 20, 20, 20) are inserted the same way

sqlite> select * from numbers order by val;
1
5
10
12
12
15
20
20
20



select case when val < 10 then 1
            when val >= 10 and val < 20 then 2
            else 3 end as bin,
       count(1) as c
from numbers group by bin;


sqlite> select case when val < 10 then 1
   ...>             when val >= 10 and val < 20 then 2
   ...>             else 3 end as bin,
   ...>        count(1) as c
   ...>   from numbers group by bin;
1|2
2|4
3|3
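
For completeness, the same bin counts can be pulled from application code
with sqlite3_exec() and a callback, and the CASE expression can be swapped
for integer division when equal-width bins are enough. A minimal sketch in
C (the "numbers.db" file name and the bin width of 10 are assumptions, not
from the thread):

#include <stdio.h>
#include <sqlite3.h>

/* Called once per result row: argv[0] = bin start, argv[1] = count. */
static int print_bin(void *unused, int argc, char **argv, char **colname)
{
    (void)unused; (void)argc; (void)colname;
    printf("%s: %s\n", argv[0], argv[1]);
    return 0;                          /* return 0 to keep iterating */
}

int main(void)
{
    sqlite3 *db;
    char *errmsg = NULL;
    /* Equal-width bins of 10: 0-9, 10-19, 20-29, ... */
    const char *sql = "select (val/10)*10 as bin_start, count(*) as c "
                      "from numbers group by bin_start order by bin_start;";

    if (sqlite3_open("numbers.db", &db) != SQLITE_OK)    /* assumed file name */
        return 1;

    if (sqlite3_exec(db, sql, print_bin, NULL, &errmsg) != SQLITE_OK) {
        fprintf(stderr, "SQL error: %s\n", errmsg);
        sqlite3_free(errmsg);
    }
    sqlite3_close(db);
    return 0;
}

With the sample values above this prints bin starts 0, 10 and 20 with
counts 2, 4 and 3, matching the CASE version.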




Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER and .import for very wide file

2008-12-31 Thread Webb Sprague
See below.

On Sun, Dec 28, 2008 at 11:56 PM, Chris Wedgwood  wrote:
> On Sun, Dec 28, 2008 at 11:49:34PM -0800, Webb Sprague wrote:
>
>> I am sure there is a better way to deal with 12K rows by 2500 columns,
>> but I can't figure it out
>
> 2500 columns sounds like a nightmare to deal with
>
> could you perhaps explain that data layout a little?
>

It is a download of a huge longitudinal survey
(www.bls.gov/nls/nlsy79.htm) that has been converted out of the
proprietary format into SAS, and now I want to convert it into a
single SQLITE database per wave.  I will wind up connecting people by
ID across the waves to show patterns of moving etc...

For each wave/table, each row contains integers that code for
information about a single respondent, such as age, whether employed in
June (either zero or one), whether employed in July, etc.  Since the
NLSY doesn't do multiple tables, this is very much NOT normalized.  What
the codes mean is described in a separate codebook (-5 = missing data,
1 = living at home, etc).

There is a separate table for each wave (1979, 1980, ... 2006).

I have managed (just now) to get it working with a hacked version of
SQLITE.  Here is a meaningless query, just to confirm:

sqlite> select W0072400, count(*) as c  from data_stuff group by
W0072400 order by c desc limit 5;
0,9204
-5,2513
100,293
1,80
3,43
CPU Time: user 0.917062 sys 0.364962

Like I say, I may be going about it all wrong, but I can't run the
proprietary software on my Mac, and SQL makes me comfortable.  I hope
to pull out the data I want via SQL (a processed 1% of the total),
then run statistical analyses and graphics with R.

I am describing all this in hopes there is another quantitative
sociologist out there using SQLITE!

TIA


[sqlite] Transaction?

2008-12-31 Thread John Hawley


OK here's the problem:-

I decided I needed a little app to manage some data in 3rd-party binary
files; specifically, I needed to store the user-generated data from the
files in a database so that I could retrieve the data in a massaged
form. As others have also decided that they need to do the same thing, I
decided to write a simple Windows native API app to do the task (I have
a command-line Linux tool that does the same thing, but that sort of
thing is not to everybody's taste!). I downloaded the VC++ 2008 Express
edition, read up on how the Windows API worked, and wrote the little app
using sqlite as the serverless db engine. No problems so far, the app
works as intended BUT

the following block of code executes 2 or 3 orders of magnitude slower
than I think it should (the db was opened in earlier code):

    RecOffset=sizeof(rec);
    fseek(infile,126,SEEK_SET);    // data starts at byte 126
    while (fread(&rec,sizeof(rec),1,infile))
    {
        iSeq++;
        GridY= (long) rec.lat*LATGRIDFACTOR + 0.5;
        GridX= (long) rec.longt*LongGridFactor(rec.lat) + 0.5;
        iGS = GridX*1000 + GridY;
        sprintf_s(SqlStr,200,"INSERT into soundings VALUES(%lf,%lf,%f,%s,%d,%ld);",
                  rec.lat,rec.longt,rec.depth,DateStr,iSeq,iGS);
        rc = sqlite3_exec(db,SqlStr,NULL,0,&zErrMsg);
        if(rc!=SQLITE_OK)
        {
            MessageBox(hwnd,zErrMsg,"SQL Error adding soundings data",MB_OK);
            exit(1);
        }
    }

The whole program will read in 10,000 records/sec if

    rc = sqlite3_exec(db,SqlStr,NULL,0,&zErrMsg);
    if(rc!=SQLITE_OK)
    {
        MessageBox(hwnd,zErrMsg,"SQL Error adding soundings data",MB_OK);
        exit(1);
    }

is commented out. With the code in place, it reads around 10 records/sec
(but does it correctly), with masses of disk I/O but very little CPU
activity.

 
I have tried using the sqlite routines as a static library, as a DLL,
and, on the last attempt, I just added the sqlite3 sources and headers
to the project and compiled the lot together. All work, but slowly!


Initially I suspected that I'd somehow done something wrong with the
compiler/linker settings, but now I think it's probable that I should be
writing the records inside a transaction.

I'm not sure how I would code this to speed things up - any tips would 
be greatly appreciated.

Many thanks


Regards


Re: [sqlite] Transaction?

2008-12-31 Thread Derrell Lipman
On Wed, Dec 31, 2008 at 11:36 AM, John Hawley wrote:

> the following block of code executes 2 or 3 orders of magnitude slower
> than I think it should (the db was opened in earlier code)
>

Yup, you're committing each insert to disk (a VERY slow process) rather than
waiting until you've inserted all of the entries before committing to disk.
Insert code at the following marked places for dramatically increased
performance:

>
>    RecOffset=sizeof(rec);
>    fseek(infile,126,SEEK_SET);    // data starts at byte 126
>

  /* Begin a transaction */
  rc = sqlite3_exec(db, "BEGIN;", NULL, 0, &zErrMsg);
  if (rc != SQLITE_OK) { /* do error handling */ }

>    while (fread(&rec,sizeof(rec),1,infile))
>    {
>        iSeq++;
>        GridY= (long) rec.lat*LATGRIDFACTOR + 0.5;
>        GridX= (long) rec.longt*LongGridFactor(rec.lat) + 0.5;
>        iGS = GridX*1000 + GridY;
>        sprintf_s(SqlStr,200,"INSERT into soundings VALUES(%lf,%lf,%f,%s,%d,%ld);",
>                  rec.lat,rec.longt,rec.depth,DateStr,iSeq,iGS);
>        rc = sqlite3_exec(db,SqlStr,NULL,0,&zErrMsg);
>        if(rc!=SQLITE_OK)
>        {
>            MessageBox(hwnd,zErrMsg,"SQL Error adding soundings data",MB_OK);
>

  /* Rollback the transaction.  No need for error checking as the journal
     will recover next time anyway. */
  (void) sqlite3_exec(db, "ROLLBACK;", NULL, 0, NULL);

>            exit(1);
>        }
>
>    }
>

  /* Commit the transaction.  This will be your single slow operation
     rather than one per inserted row. */
  rc = sqlite3_exec(db, "COMMIT;", NULL, 0, &zErrMsg);
  if (rc != SQLITE_OK) { /* do error handling */ }

Derrell
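
A further refinement on top of the transaction: prepare the INSERT once and
bind values for each record, so SQLite does not re-parse the SQL text on
every iteration and the sprintf_s formatting (and any quoting of DateStr)
drops out entirely. A minimal sketch of the loop, assuming the same db,
rec, DateStr and soundings table as above; the bind types are guesses
based on the sprintf format string:

  /* Prepare the statement once, outside the read loop. */
  sqlite3_stmt *stmt = NULL;
  int rc = sqlite3_prepare_v2(db,
          "INSERT INTO soundings VALUES(?1,?2,?3,?4,?5,?6);", -1, &stmt, NULL);
  if (rc != SQLITE_OK) { /* handle error: see sqlite3_errmsg(db) */ }

  rc = sqlite3_exec(db, "BEGIN;", NULL, 0, NULL);

  while (fread(&rec, sizeof(rec), 1, infile))
  {
      iSeq++;
      GridY = (long) rec.lat*LATGRIDFACTOR + 0.5;
      GridX = (long) rec.longt*LongGridFactor(rec.lat) + 0.5;
      iGS   = GridX*1000 + GridY;

      sqlite3_bind_double(stmt, 1, rec.lat);
      sqlite3_bind_double(stmt, 2, rec.longt);
      sqlite3_bind_double(stmt, 3, rec.depth);
      sqlite3_bind_text  (stmt, 4, DateStr, -1, SQLITE_TRANSIENT);
      sqlite3_bind_int   (stmt, 5, iSeq);
      sqlite3_bind_int64 (stmt, 6, iGS);

      if (sqlite3_step(stmt) != SQLITE_DONE) { /* handle error, ROLLBACK */ }

      sqlite3_reset(stmt);           /* ready the statement for the next row */
      sqlite3_clear_bindings(stmt);  /* optional: clear the previous values */
  }

  rc = sqlite3_exec(db, "COMMIT;", NULL, 0, NULL);
  sqlite3_finalize(stmt);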


Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER and .import for very wide file

2008-12-31 Thread Griggs, Donald
Regarding:
>> I am sure there is a better way to deal with 12K rows by 2500 
>> columns, but I can't figure it out

I wonder if you might want to use *sed* or *awk* or *perl* to preprocess
the data before import.

A "master" table could contain the unique person id, plus the fields
that you intend to index and that you are likely to filter upon most
often.  Other tables could exist for the remaining data, and could be
joined on the person id as needed.

This might:
   -- let you avoid a customized version of sqlite
   -- allow your most-used queries to run faster



[sqlite] Recording history of changes to schema

2008-12-31 Thread BareFeet
Hi all,

I want to set up a history of changes in a database schema. I hoped I  
could simply set up triggers for changes to the SQLite_Master table,  
like this:

create table "BF SQLite_Master History" -- record changes to SQLite_Master
(
      ID    integer primary key
    , Date  date    -- julianday utc of the date & time of the change
    , Event text    -- insert, delete, or update
    , SQL   text    -- SQL prior to change for delete and update, after change for insert
)
;

create trigger "BF SQLite_Master Insert"
before insert
on SQLite_Master
begin
insert into "BF SQLite_Master History"
(
  Date
, Event
, SQL
)
select
  julianday('now', 'utc')
, 'insert'
, new.SQL
;
end
;

But I get an error:

SQL error near line 1: cannot create trigger on system table

Is it possible to enable this functionality?

Thanks,
Tom
BareFeet



Re: [sqlite] Creating Histogram fast and efficiently :)

2008-12-31 Thread Jonathon
Thanks Russell :)

It works great.

J
