Re: [sqlite] Help with SQL and index (or schema?)
> In the sub-SELECT you use * so you get all columns. That's
> not necessary.

Ah, thanks!

> It might be better to rewrite as a self-join using LEFT
> OUTER JOIN on filename and filepath, using a NULL istarget
> from one or the other alias as an indication that the target
> instance of the file is missing.
>
> Something like:
>
> SELECT source.filepath, source.filename
> FROM files AS source
> LEFT OUTER JOIN files AS target
> USING (filepath,filename)
> WHERE source.istarget = 0
> AND target.istarget IS NULL
> ORDER BY whatever you like;
>
> (untested)

But with this solution I have to iterate through the whole result set, and won't this iteration over 137,000 files be kinda slow?

> You could normalise some more, by creating a second table
>
> CREATE TABLE Pathnames (
> pathid INTEGER PRIMARY KEY,
> pathname UNIQUE
> );
>
> and referring to it with a foreign key in the file table.
>
> CREATE TABLE Files (
> pathid INTEGER
> CONSTRAINT fk_path REFERENCES Pathnames (pathid)
> ON DELETE RESTRICT
> ON UPDATE RESTRICT,
> filename TEXT,
> istarget INTEGER,
> PRIMARY KEY (pathid,filename,istarget)
> );
>
> This reduces the overall database size, improves the amount
> of unique data in a database page, and above that comparing
> integers is faster than strings.

Yeah, I know I could normalize, but I thought it added more complexity than was required... but perhaps not.

> I can't help noticing that comparing two directories in the
> same system would cause the source and target paths to be
> different by definition (assuming the filepath column
> represents the absolute path), but I guess you are aware of
> that :)

Heh, yes. The paths are stored as relative, so c:\my documents\folder1\file1 and d:\backup\folder1\file1 are both stored as folder1\file1 in the database.

In any case, I'll try your suggestions with primary key and normalization and see how it goes. Thanks!
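A runnable sketch of the anti-join idea, using Python's built-in sqlite3 module (the schema and sample rows are illustrative, not from the thread). Note one correction to the untested query above: with USING (filepath,filename), a source row joins to *itself* (istarget = 0), so target.istarget is never NULL; pinning target.istarget = 1 in the join condition restores the intended "no target copy exists" test:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE files (filepath TEXT, filename TEXT, istarget INTEGER)")
con.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("folder1", "a.txt", 0),  # source copy, target copy exists below
        ("folder1", "a.txt", 1),
        ("folder1", "b.txt", 0),  # source copy only -> missing from target
    ],
)

# Anti-join: source rows for which no matching target row was found.
missing = con.execute(
    """
    SELECT source.filepath, source.filename
    FROM files AS source
    LEFT OUTER JOIN files AS target
      ON target.filepath = source.filepath
     AND target.filename = source.filename
     AND target.istarget = 1
    WHERE source.istarget = 0
      AND target.istarget IS NULL
    """
).fetchall()
print(missing)  # [('folder1', 'b.txt')]
```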
/Jonas
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Using the static lib of sqlite3 under 64 bit ubuntu
On Mar 29, 2009, at 10:01 PM, Danny De Keuleneire wrote:

> Settings:
>
> Gcc 4.3.3
> Added usr/lib64/libsqlite3.a
>
> Error:
>
> Undefined reference to 'pthread_mutex_trylock'
>
> What am I missing?

-lpthread -ldl
Re: [sqlite] creating unique data takes many hours, help
Hi Michael,

Two thoughts -- and I hope others will chime in if I'm off-base here:

1) Build just one index after the import

From the page http://www.sqlite.org/lang_createtable.html :

"The UNIQUE constraint causes an unique index to be created on the specified columns."

I think that if you then create your own index, it will be redundant, and, with millions of records, time-consuming. However, you may well want to KEEP your own index (and add the UNIQUE constraint to it), and *remove* the UNIQUE constraint on the table column definition. I hear that creating the index after the data has been imported is faster (and less fragmented) than creating an implicit one on-the-fly during import.

2) Hashing

If Vinnie [thev...@yahoo.com] was correct in guessing your goal (to determine if a given string has been examined before), and if performance remains a problem, I wondered if the design might benefit from using hashing to "pre-qualify" a string and/or substitute for a direct match.

A short hash (perhaps a 64-bit integer?) could be stored in a separate table or even a separate database from the strings themselves. If the hash of a new string does not match any hash in the database, then you know there is no need to search for the string itself. If the entire hash index can be kept in RAM, you might get a big benefit.

You could decide to take this a step further by using a longer hash as a proxy for the string itself. The hash would need to be long enough to make collisions extremely unlikely, yet not so long as to negate the value of using a proxy for the original strings. In practice, you'd probably want to compute a single long hash for each string, then use the first X bytes as the "pre-qualifying" hash and the remainder as the "confirming" hash. If your average string length is short, then using *two* hashes may not be much of a gain.
(Of course, since this is not a cryptographic application, you don't need to worry about whether your chosen hash algorithm is "secure" or not.)

Regards,
Donald

-Original Message-
From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Vinnie
Sent: Sunday, March 29, 2009 6:14 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] creating unique data takes many hours, help

Michael:

While my answer isn't strictly limited to sqlite, the performance nut in me can't help myself. You have a table with only one column, a string. I could be wrong here, but it seems like you just want to keep a list of values that you have already tried. After you insert a bunch of strings into the table, you want to be able to quickly look up a string to see if it exists, so that you can tell whether you already probed that sequence (taking a guess here).

If my guess is right, and the only thing you are doing is looking up sorted single-column elements, you can probably get away with your own quick disk-based binary tree implementation and avoid sqlite for this particular circumstance altogether. The result would be several orders of magnitude faster, even after you have followed the suggestions others have given.

> Hi,
>
> I am new with sqlite, and I created a program that reads several million
> records and puts them into a sqlite db.
>
> The table has one column ONLY, indexed and unique, but it takes many
> hours.
>
> Is it necessary to pre-allocate the space, or is there anything that I
> can do to reduce the time it takes?
>
> this is how I create the db, table and index.
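A minimal sketch of the "pre-qualifying hash" idea, using Python's built-in sqlite3 and hashlib (the table name, column names, and sample strings are assumptions for illustration). The 64-bit hash is stored and indexed as an integer; a lookup first probes the cheap integer index and only compares the full string on a hash hit:

```python
import hashlib
import sqlite3

def h64(s: str) -> int:
    # First 8 bytes of SHA-256, as a signed int so it fits SQLite's INTEGER.
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big", signed=True)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE probes (hash INTEGER, probe TEXT)")
con.execute("CREATE INDEX probes_hash ON probes (hash)")

seen = ["ACGT", "TTGA"]
con.executemany("INSERT INTO probes VALUES (?, ?)", [(h64(s), s) for s in seen])

def already_probed(s: str) -> bool:
    # Integer index lookup first; string compare only on the (rare) hash hit.
    rows = con.execute("SELECT probe FROM probes WHERE hash = ?", (h64(s),)).fetchall()
    return any(r[0] == s for r in rows)

print(already_probed("ACGT"))  # True
print(already_probed("GGGG"))  # False
```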
Re: [sqlite] creating unique data takes many hours, help
Michael:

While my answer isn't strictly limited to sqlite, the performance nut in me can't help myself. You have a table with only one column, a string. I could be wrong here, but it seems like you just want to keep a list of values that you have already tried. After you insert a bunch of strings into the table, you want to be able to quickly look up a string to see if it exists, so that you can tell whether you already probed that sequence (taking a guess here).

If my guess is right, and the only thing you are doing is looking up sorted single-column elements, you can probably get away with your own quick disk-based binary tree implementation and avoid sqlite for this particular circumstance altogether. The result would be several orders of magnitude faster, even after you have followed the suggestions others have given.

> Hi,
>
> I am new with sqlite, and I created a program that reads several million
> records and puts them into a sqlite db.
>
> The table has one column ONLY, indexed and unique, but it takes many
> hours.
>
> Is it necessary to pre-allocate the space, or is there anything that I
> can do to reduce the time it takes?
>
> this is how I create the db, table and index.
Re: [sqlite] creating unique data takes many hours, help
Yes, you're right, but if the data is already in index order, you'll do less I/O when creating the index. Whether the sort + create DB time is less than the "create DB from random input" time is another question.

Jim

On 3/29/09, mrobi...@cs.fiu.edu wrote:
> question:
> When you say
>> 2) Sort your input file on the PRIMARY KEY
>> or on some other INDEX
>
> I thought that while sqlite inserts the data it is creating the indexes,
> therefore sorting the data by way of the index. Is this correct?
> I have declared one column ONLY, unique and indexed.

--
Software first. Software lasts!
Re: [sqlite] creating unique data takes many hours, help
Thanks very much. I will try different values for the loops until I get an optimal number, and I will try the "PRAGMA cache_size = <#pages>" command.

question:
When you say

> 2) Sort your input file on the PRIMARY KEY
> or on some other INDEX

I thought that while sqlite inserts the data it is creating the indexes, therefore sorting the data by way of the index. Is this correct?
I have declared one column ONLY, unique and indexed.

Thanks again,

Michael

> On Sun, 29 Mar 2009 15:19:00 -0400 (EDT), mrobi...@cs.fiu.edu wrote:
>
> [original message and code snipped]
>
> Two common optimizations:
>
> 1) Wrap the INSERT statements in a transaction
>
>    while not EOF on input file
>      BEGIN
>      loop 1 times or EOF
>        read input record
>        INSERT
>      endloop
>      COMMIT
>    endwhile
>
> 2) Sort your input file on the PRIMARY KEY
>    or on some other INDEX
>
> --
> ( Kees Nuyt
> )
> c[_]
Re: [sqlite] creating unique data takes many hours, help
On Sun, 29 Mar 2009 15:19:00 -0400 (EDT), mrobi...@cs.fiu.edu wrote:

> Hi,
>
> I am new with sqlite, and I created a program that reads several million
> records and puts them into a sqlite db.
>
> The table has one column ONLY, indexed and unique, but it takes many hours.
>
> [code snipped]

Two common optimizations:

1) Wrap the INSERT statements in a transaction

   while not EOF on input file
     BEGIN
     loop 1 times or EOF
       read input record
       INSERT
     endloop
     COMMIT
   endwhile

2) Sort your input file on the PRIMARY KEY
   or on some other INDEX

--
( Kees Nuyt
)
c[_]
Re: [sqlite] creating unique data takes many hours, help
On Sun, Mar 29, 2009 at 04:21:26PM -0500, Stephen Woodbridge scratched on the wall:

> Try wrapping all the inserts in a transaction:
>
> BEGIN;
> INSERT ...
> INSERT ...
> COMMIT;
>
> This should be MUCH faster.

> > I am new with sqlite, and I created a program that reads several million
> > records and puts them into a sqlite db.

If you're doing several million records, I wouldn't wrap them *all* in a single transaction. You might try batches of 10,000 or so. For example:

BEGIN;
INSERT ...
... x10,000
COMMIT;
BEGIN;
INSERT ...
... x10,000
COMMIT;
...

Also, if possible, things will go much faster if you pre-order the data. If the data is not pre-ordered, bump up the page cache to give the index some working room. The default is 2000 pages, which consumes about 3MB of real memory using the default 1K page size. Consider bumping that up by a factor of 10x, or maybe even 100x, depending on how much physical RAM you have available. You can change the size of the page cache with the command "PRAGMA cache_size = <#pages>". You don't need the extra space for queries, but it helps when doing a large number of random inserts into the index.

-j

--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs. We have a protractor." "I'll go home and see if I can scrounge up a ruler and a piece of string." --from Anathem by Neal Stephenson
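A runnable sketch of the batched-transaction pattern described above, using Python's built-in sqlite3 (row contents, batch size, and row count are illustrative; the connection uses isolation_level=None so BEGIN/COMMIT are issued explicitly):

```python
import sqlite3

# isolation_level=None disables implicit transactions; we manage them ourselves.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("PRAGMA cache_size = 20000")  # ~10x the default, per the advice above
con.execute("CREATE TABLE probes (probe TEXT UNIQUE)")

def rows():
    # Stand-in for reading the input file; 2,500 synthetic records.
    for i in range(2_500):
        yield (f"probe-{i:08d}",)

BATCH = 1_000  # the thread suggests ~10,000 for millions of rows
batch = []
con.execute("BEGIN")
for row in rows():
    batch.append(row)
    if len(batch) == BATCH:
        con.executemany("INSERT INTO probes VALUES (?)", batch)
        con.execute("COMMIT")
        con.execute("BEGIN")
        batch.clear()
con.executemany("INSERT INTO probes VALUES (?)", batch)  # leftover partial batch
con.execute("COMMIT")

print(con.execute("SELECT count(*) FROM probes").fetchone()[0])  # 2500
```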
Re: [sqlite] creating unique data takes many hours, help
Try wrapping all the inserts in a transaction:

BEGIN;
INSERT ...
INSERT ...
COMMIT;

This should be MUCH faster.

-Steve W

mrobi...@cs.fiu.edu wrote:
> Hi,
>
> I am new with sqlite, and I created a program that reads several million
> records and puts them into a sqlite db.
>
> The table has one column ONLY, indexed and unique, but it takes many hours.
>
> Is it necessary to pre-allocate the space, or is there anything that I can
> do to reduce the time it takes?
>
> [code snipped]
>
> Thanks very much
>
> Michael
[sqlite] creating unique data takes many hours, help
Hi,

I am new with sqlite, and I created a program that reads several million records and puts them into a sqlite db.

The table has one column ONLY, indexed and unique, but it takes many hours.

Is it necessary to pre-allocate the space, or is there anything that I can do to reduce the time it takes?

This is how I create the db, table and index:

void openSqliteFile()
{
  rc = sqlite3_open(genome_probesDB, &db);  // if it doesn't exist, creates it

  if (rc == SQLITE_OK) {
    printf("RC=%d database=%s was opened\n", rc, genome_probesDB);
  }
  else {
    printf("RC=%d database=%s COULD NOT OPEN\n", rc, genome_probesDB);
  }

  rc = sqlite3_exec(db, "CREATE TABLE probes (probe STRING unique);",
                    NULL, NULL, &errmsg);
  if (rc == SQLITE_OK) {
    printf("RC=%d table probes with field probe was created\n", rc);
  }
  else {
    printf("RC=%d table %s already exists, so it was NOT created\n",
           rc, genome_probesDB);
  }

  rc = sqlite3_exec(db, "CREATE INDEX probe on probes (probe);",
                    NULL, NULL, &errmsg);
  if (rc == SQLITE_OK) {
    printf("RC=%d INDEX probe on table probes for field probe was created\n", rc);
  }
  else {
    printf("RC=%d INDEX probe on table %s already exists, so it was NOT created\n",
           rc, genome_probesDB);
  }
}  // end void openSqliteFile()

And this is how I add the data:

char *zSQL = sqlite3_mprintf("INSERT INTO probes VALUES(%Q)", probe);
sqlite3_exec(db, zSQL, 0, 0, 0);
sqlite3_free(zSQL);

Thanks very much

Michael
[sqlite] Using the static lib of sqlite3 under 64 bit ubuntu
Settings:

Gcc 4.3.3
Added usr/lib64/libsqlite3.a

Error:

Undefined reference to 'pthread_mutex_trylock'

What am I missing? If I use the .so shared lib, no problems at all.

Tx,
Danny
Re: [sqlite] select by min di
"baxy77bax" wrote in message news:22767988.p...@talk.nabble.com
> hi,
> i need help with this one: i have a table with 2 columns C1 & C2:
>
> C1 C2
> a  1
> b  1
> b  2
> b  4
> c  3
> v  2
> c  5
> d  5
>
> and i need to get this:
>
> C1 C2
> a  1
> b  1
> c  3
> d  5
> v  2
>
> so a unique set of data in column C1 where C2 is the min id for that
> data?

select C1, min(C2) from mytable group by C1;

Igor Tandetnik
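Igor's one-liner reproduces the asker's expected output exactly; a quick check using Python's built-in sqlite3 with the data from the question (the table name mytable is Igor's placeholder):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (C1 TEXT, C2 INTEGER)")
con.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("a", 1), ("b", 1), ("b", 2), ("b", 4), ("c", 3), ("v", 2), ("c", 5), ("d", 5)],
)

# One row per distinct C1, paired with the smallest C2 seen for it.
result = con.execute(
    "SELECT C1, min(C2) FROM mytable GROUP BY C1 ORDER BY C1"
).fetchall()
print(result)  # [('a', 1), ('b', 1), ('c', 3), ('d', 5), ('v', 2)]
```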
[sqlite] select by min di
hi,
i need help with this one: i have a table with 2 columns C1 & C2:

C1 C2
a  1
b  1
b  2
b  4
c  3
v  2
c  5
d  5

and i need to get this:

C1 C2
a  1
b  1
c  3
d  5
v  2

so a unique set of data in column C1 where C2 is the min id for that data?

thank you!
Re: [sqlite] sequential row numbers from query
Possible limitations of temporary tables vs. views:

1) Temporary tables are static, views are dynamic -- i.e. any time a change is made to the database, the temporary tables might need to be recreated.

2) Memory: temporary tables are retained until dropped. Views only use memory when used.

Eli Adler

-Original Message-
From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Robert Citek
Sent: Saturday, March 28, 2009 12:03 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] sequential row numbers from query

That would work. In fact, my current solution, which actually pipes to perl, works pretty well. It's just that I have to then import the data back into the database. So, I'd prefer to do the process entirely in SQL.

I was thinking maybe a view, but that didn't work. Apparently, there is no rowid with views (or is there something equivalent that I'm overlooking?).

$ sqlite3 db 'create view bar as select * from foo order by field desc;
select rowid, * from bar;'
|c
|b
|a

Substituting a temporary table for the view works:

$ sqlite3 db 'create temporary table bat as select * from foo order by field desc;
select rowid, * from bat;'
1|c
2|b
3|a

Any limitations to consider when using a temporary table? Any other ideas?

Regards,
- Robert

On Fri, Mar 27, 2009 at 3:37 PM, Thomas Briggs wrote:
> Holy cow that feels inefficient.
>
> It's a bit clunky, but why not insert into a temporary table,
> ordered as desired, and then use the rowid from the temp table?
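The temp-table trick in the thread can be sketched with Python's built-in sqlite3 (the foo/bat names and sample rows follow Robert's shell session; assume a fresh database). Rows are inserted into the temp table in the SELECT's order, so its rowid numbers them 1..N:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE foo (field TEXT)")
con.executemany("INSERT INTO foo VALUES (?)", [("a",), ("b",), ("c",)])

# Views have no rowid, so materialize the ordered result in a temp table;
# the temp table's rowid then serves as a sequential row number.
con.execute("CREATE TEMPORARY TABLE bat AS SELECT * FROM foo ORDER BY field DESC")
numbered = con.execute("SELECT rowid, field FROM bat").fetchall()
print(numbered)  # [(1, 'c'), (2, 'b'), (3, 'a')]
```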