[sqlite] Creating Histogram fast and efficiently :)
Hello all,

I have a column of numbers in a table, and I was wondering if it is possible to create a histogram out of it fast and efficiently?

For example, assuming the data in the column is: 1, 5, 10, 12, 12, 15, 20, 20, 20, I would like to return:

'less than 10' --> 2
'less than 20 and greater than or equal to 10' --> 4
'greater than or equal to 20' --> 3

I hope that makes sense. I checked all over Google, and it seems that different databases tackle this problem differently. So I was just curious how SQLite can help me make this calculation fast :)

Thanks,
J
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Creating Histogram fast and efficiently :)
create table numbers (val integer);
insert into numbers values (1);
insert into numbers values (5);
-- (remaining values inserted the same way)

sqlite> select * from numbers order by val;
1
5
10
12
12
15
20
20
20

sqlite> select case when val < 10 then 1
   ...>             when val >= 10 and val < 20 then 2
   ...>             else 3 end as bin,
   ...>        count(1) as c
   ...> from numbers group by bin;
1|2
2|4
3|3

On Dec 31, 2008, at 6:16 AM, Jonathon wrote:
> [original question quoted in full; snipped]
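The CASE/GROUP BY approach above can be sketched end to end. Here is a minimal Python `sqlite3` script (not from the thread; table and column names are taken from Russell's reply, and the label text is illustrative), returning the human-readable labels the original poster asked for instead of numeric bin ids:

```python
import sqlite3

# In-memory database loaded with the sample data from the thread.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE numbers (val INTEGER)")
con.executemany("INSERT INTO numbers VALUES (?)",
                [(v,) for v in (1, 5, 10, 12, 12, 15, 20, 20, 20)])

# Same single-pass CASE/GROUP BY idea, but emitting text labels.
# ORDER BY MIN(val) keeps the bins in ascending order.
rows = con.execute("""
    SELECT CASE
             WHEN val < 10 THEN 'less than 10'
             WHEN val < 20 THEN '>= 10 and < 20'
             ELSE '>= 20'
           END AS bin,
           COUNT(*) AS c
    FROM numbers
    GROUP BY bin
    ORDER BY MIN(val)
""").fetchall()

for label, count in rows:
    print(label, '-->', count)
```

Because the query is a single table scan with no self-joins, it stays fast even on large tables.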
Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER and .import for very wide file
See below.

On Sun, Dec 28, 2008 at 11:56 PM, Chris Wedgwood wrote:
> On Sun, Dec 28, 2008 at 11:49:34PM -0800, Webb Sprague wrote:
>
>> I am sure there is a better way to deal with 12K rows by 2500 columns,
>> but I can't figure it out
>
> 2500 columns sounds like a nightmare to deal with
>
> could you perhaps explain that data layout a little?

It is a download of a huge longitudinal survey (www.bls.gov/nls/nlsy79.htm) that has been converted out of the proprietary format into SAS, and now I want to convert it into a single SQLite database per wave. I will wind up connecting people by ID across the waves to show patterns of moving, etc.

For each wave/table, each row contains integers that code for information about a single respondent, such as age, whether employed in June (either zero or one), whether employed in July, etc. Since the NLSY doesn't do multiple tables, this is very much NOT normalized. What the codes mean is described in a separate codebook (-5 = missing data, 1 = living at home, etc.). There is a separate table for each wave (1979, 1980, ..., 2006).

I have managed (just now) to get it working with a hacked version of SQLite. Here is a meaningless query, just to confirm:

sqlite> select W0072400, count(*) as c from data_stuff group by W0072400 order by c desc limit 5;
0,9204
-5,2513
100,293
1,80
3,43
CPU Time: user 0.917062 sys 0.364962

Like I say, I may be going about it all wrong, but I can't run the proprietary software on my Mac, and SQL makes me comfortable. I hope to pull out the data I want via SQL (a processed 1% of the total), then run statistical analyses and graphics with R. I am describing all this in hopes there is another quantitative sociologist out there using SQLite!

TIA
[sqlite] Transaction?
OK, here's the problem. I decided I needed a little app to manage some data in 3rd-party binary files; specifically, I needed to store the user-generated data from the files in a database so that I could retrieve it in a massaged form. As others have also decided they need to do the same thing, I decided to write a simple Windows native API app for the task (I have a command-line Linux program that does the same thing, but that sort of thing is not to everybody's taste!). I downloaded the VC++ 2008 Express edition, read up on how the Windows API works, and wrote the little app using SQLite as the serverless db engine.

No problems so far; the app works as intended, BUT the following block of code executes 2 or 3 orders of magnitude slower than I think it should (the db was opened in earlier code):

RecOffset = sizeof(rec);
fseek(infile, 126, SEEK_SET);   /* data starts at byte 126 */
while (fread(&rec, sizeof(rec), 1, infile))
{
    iSeq++;
    GridY = (long) rec.lat * LATGRIDFACTOR + 0.5;
    GridX = (long) rec.longt * LongGridFactor(rec.lat) + 0.5;
    iGS = GridX * 1000 + GridY;
    sprintf_s(SqlStr, 200, "INSERT into soundings VALUES(%lf,%lf,%f,%s,%d,%ld);",
              rec.lat, rec.longt, rec.depth, DateStr, iSeq, iGS);
    rc = sqlite3_exec(db, SqlStr, NULL, 0, &zErrMsg);
    if (rc != SQLITE_OK)
    {
        MessageBox(hwnd, zErrMsg, "SQL Error adding soundings data", MB_OK);
        exit(1);
    }
}

The whole program will read in 10,000 records/sec if

    rc = sqlite3_exec(db, SqlStr, NULL, 0, &zErrMsg);
    if (rc != SQLITE_OK)
    {
        MessageBox(hwnd, zErrMsg, "SQL Error adding soundings data", MB_OK);
        exit(1);
    }

is commented out. With the code in place, it reads around 10 records/sec (but does it correctly), with masses of disk I/O but very little CPU activity. I have tried using the SQLite subroutines as static libraries and as DLLs, and on the last attempt I just added the sqlite3 sources and headers to the project and compiled the lot together. All work, but slowly!

Initially I suspected that I'd somehow done something wrong with the compiler/linker settings, but now I think it's probable that I should be writing the records inside a transaction. I'm not sure how I would code this to speed things up; any tips would be greatly appreciated.

Many thanks
Regards
Re: [sqlite] Transaction?
On Wed, Dec 31, 2008 at 11:36 AM, John Hawley wrote:
> the following block of code executes 2 or 3 orders of magnitude slower
> than I think it should (the db was opened in earlier code)

Yup, you're committing each insert to disk (a VERY slow process) rather than waiting until you've inserted all the entries before committing. Insert code at the following marked places for dramatically increased performance:

>    RecOffset=sizeof(rec);
>    fseek(infile,126,SEEK_SET);  // data starts at byte 126

/* Begin a transaction */
rc = sqlite3_exec(db, "BEGIN;", NULL, 0, &zErrMsg);
if (rc != SQLITE_OK) { /* do error handling */ }

>    while (fread(&rec,sizeof(rec),1,infile))
>    {
>        iSeq++;
>        GridY= (long) rec.lat*LATGRIDFACTOR + 0.5;
>        GridX= (long) rec.longt*LongGridFactor(rec.lat) + 0.5;
>        iGS =GridX*1000 + GridY;
>        sprintf_s(SqlStr,200,"INSERT into soundings
>            VALUES(%lf,%lf,%f,%s,%d,%ld);",rec.lat,
>            rec.longt,rec.depth,DateStr,iSeq,iGS);
>        rc= sqlite3_exec(db,SqlStr,NULL,0,&zErrMsg);
>        if(rc!=SQLITE_OK)
>        {
>            MessageBox(hwnd,zErrMsg,"SQL Error adding soundings
>                data",MB_OK);

/* Roll back the transaction. No need for error checking, as the
   journal will recover next time anyway. */
(void) sqlite3_exec(db, "ROLLBACK;", NULL, 0, NULL);

>            exit(1);
>        }
>    }

/* Commit the transaction. This will be your single slow operation
   rather than one per insert */
rc = sqlite3_exec(db, "COMMIT;", NULL, 0, &zErrMsg);
if (rc != SQLITE_OK) { /* do error handling */ }

Derrell
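The same pattern can be demonstrated self-containedly in Python's `sqlite3` module (a sketch, not the original C program; the records here are fabricated stand-ins for the binary-file reads, and parameter binding replaces the `sprintf_s` string building, which also avoids quoting problems):

```python
import sqlite3

# isolation_level=None gives autocommit, mirroring raw sqlite3_exec in C:
# without an explicit BEGIN, every INSERT would be its own disk commit.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE soundings "
            "(lat REAL, longt REAL, depth REAL, date TEXT, "
            " seq INTEGER, gs INTEGER)")

# Fabricated records standing in for the fread() loop over the binary file.
records = [(10.0 + i * 1e-4, 20.0 + i * 1e-4, 5.0, "2008-12-31", i, i)
           for i in range(10000)]

con.execute("BEGIN")          # one transaction around the whole loop
for rec in records:
    # Bound parameters instead of building SQL text per record.
    con.execute("INSERT INTO soundings VALUES (?, ?, ?, ?, ?, ?)", rec)
con.execute("COMMIT")         # the single (slow) synchronous write

count = con.execute("SELECT COUNT(*) FROM soundings").fetchone()[0]
print(count)  # 10000
```

On an on-disk database the difference between this and per-insert commits is typically the same 2-3 orders of magnitude reported in the original post.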
Re: [sqlite] SQLITE_MAX_VARIABLE_NUMBER and .import for very wide file
Regarding:

>> I am sure there is a better way to deal with 12K rows by 2500
>> columns, but I can't figure it out

I wonder if you might want to use sed or awk or perl to preprocess the data before import. A "master" table could contain the unique person id, plus the fields that you intend to index and that you are likely to filter on most often. Other tables could hold the remaining data, joined on the person id as needed. This might:

-- let you avoid a customized version of sqlite
-- allow your most-used queries to run faster
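One way that split could look (a sketch only; the table layout and the `age` field are hypothetical, with `W0072400` borrowed from the earlier post; a key/value "detail" table is just one possible shape for the remaining columns, and it also sidesteps any column-count limit entirely):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Narrow master table: person id plus the most-filtered fields.
con.execute("CREATE TABLE master (id INTEGER PRIMARY KEY, age INTEGER)")
# Key/value detail table for the other ~2500 survey columns.
con.execute("CREATE TABLE detail (id INTEGER, field TEXT, value INTEGER)")
con.execute("CREATE INDEX detail_id ON detail(id, field)")

con.execute("INSERT INTO master VALUES (1, 22)")
con.executemany("INSERT INTO detail VALUES (?, ?, ?)",
                [(1, "W0072400", 0), (1, "W0072401", -5)])

# Join back on the person id only when a rarely-used field is needed.
row = con.execute("""
    SELECT m.age, d.value
    FROM master m JOIN detail d ON d.id = m.id
    WHERE d.field = 'W0072400'
""").fetchone()
print(row)  # (22, 0)
```

The preprocessing step (sed/awk/perl, or a short script like this one) would emit the master and detail rows from the wide export, so the stock sqlite shell's `.import` never sees 2500 columns at once.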
[sqlite] Recording history of changes to schema
Hi all,

I want to set up a history of changes to a database schema. I hoped I could simply set up triggers for changes to the SQLite_Master table, like this:

create table "BF SQLite_Master History"  -- record changes to SQLite_Master
(   ID integer primary key
,   Date date    -- julianday utc of the date & time of the change
,   Event text   -- insert, delete, or update
,   SQL text     -- SQL prior to change for delete and update, after change for insert
);

create trigger "BF SQLite_Master Insert"
before insert on SQLite_Master
begin
    insert into "BF SQLite_Master History" (Date, Event, SQL)
    select julianday('now', 'utc'), 'insert', new.SQL;
end;

But I get an error:

SQL error near line 1: cannot create trigger on system table

Is it possible to enable this functionality?

Thanks,
Tom BareFeet
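SQLite rejects triggers on system tables, but a common workaround is to record schema changes from the application side by diffing snapshots of sqlite_master around each DDL statement. A minimal Python sketch of that idea (the `schema_history` table and both function names are my own, not from the thread; updates via ALTER TABLE could be detected the same way by comparing sql text for names present in both snapshots):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE schema_history "
            "(id INTEGER PRIMARY KEY, date REAL, event TEXT, sql TEXT)")

def snapshot(con):
    """Map object name -> CREATE statement, read from sqlite_master."""
    return dict(con.execute(
        "SELECT name, sql FROM sqlite_master WHERE sql IS NOT NULL"))

def record_changes(con, before, after):
    """Log created and dropped objects by diffing two snapshots."""
    for name in after.keys() - before.keys():
        con.execute("INSERT INTO schema_history (date, event, sql) "
                    "VALUES (julianday('now'), 'insert', ?)", (after[name],))
    for name in before.keys() - after.keys():
        con.execute("INSERT INTO schema_history (date, event, sql) "
                    "VALUES (julianday('now'), 'delete', ?)", (before[name],))

before = snapshot(con)
con.execute("CREATE TABLE widgets (id INTEGER)")   # the DDL being audited
record_changes(con, before, snapshot(con))

history = con.execute("SELECT event, sql FROM schema_history").fetchall()
print(history)  # [('insert', 'CREATE TABLE widgets (id INTEGER)')]
```

This only works when all DDL flows through code you control, which is the usual trade-off for not being able to hook sqlite_master directly.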
Re: [sqlite] Creating Histogram fast and efficiently :)
Thanks Russell :) It works great.

J

On Wed, Dec 31, 2008 at 5:52 AM, Russell Leighton wrote:
> [full solution quoted; snipped]