Re: [sqlite] creating unique data takes many hours, help

2009-03-30 Thread Dennis Cote
mrobi...@cs.fiu.edu wrote:
> this is how I create the db, table and index.
>
>
> rc = sqlite3_exec(db, "CREATE TABLE probes (probe STRING unique);", 
>
> rc = sqlite3_exec(db, "CREATE INDEX probe on probes (probe);
> ",  NULL,
>   
In addition to what the others have said, you should also see a 
substantial speedup, and a reduction in your database size, if you 
eliminate the redundant probe index. The unique constraint on the probe 
column in your table causes sqlite to automatically create an index 
nearly identical to the one you are adding manually. It uses that index 
to efficiently enforce the uniqueness constraint.

As it is, your database contains one table and two indexes, each of 
which contains a copy of every probe string in your table (i.e. three 
copies of several million strings).

If you know your data doesn't contain any duplicate strings, the fastest 
approach would be to remove the unique constraint from the table. Then 
add all your data to the table (inside one or more transactions), and 
finally create a unique index on the strings after all the inserts.

CREATE TABLE probes (probe STRING);

BEGIN;
INSERT ...
INSERT ...
COMMIT;

CREATE UNIQUE INDEX probe on probes (probe);


Of course this won't work if you are relying on the unique constraint to 
eliminate duplicate strings in your data.
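
If duplicates may be present, here is a minimal sketch of one common 
alternative (assuming an already-open sqlite3 *db handle): keep the 
unique index from the start and let INSERT OR IGNORE silently skip the 
duplicates during the load.

sqlite3_exec(db, "CREATE TABLE probes (probe STRING);", NULL, NULL, NULL);
sqlite3_exec(db, "CREATE UNIQUE INDEX probe ON probes (probe);", NULL, NULL, NULL);
sqlite3_exec(db, "BEGIN;", NULL, NULL, NULL);
/* one of these per input string: */
sqlite3_exec(db, "INSERT OR IGNORE INTO probes VALUES ('ACGT');", NULL, NULL, NULL);
sqlite3_exec(db, "COMMIT;", NULL, NULL, NULL);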
 
HTH
Dennis Cote
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating unique data takes many hours, help

2009-03-30 Thread Ken



As others have indicated: 
   1. Create the index at the end, after the data is loaded. 
   2. Wrap the inserts in a transaction, committing every N inserts.

Also:
   3. Instead of sqlite3_mprintf/sqlite3_exec, you should prepare, bind,
and step (see the sketch below).
   4. If this is single threaded, recompiling the sqlite library with
threading disabled may help somewhat.
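
A minimal sketch of points 1-3, assuming the single-column probes table
from earlier in the thread and an already-open handle; the batch size
and names are illustrative only:

#include <sqlite3.h>

/* Bulk-load strings with one prepared statement (point 3), committing
   every BATCH inserts (point 2), and building the unique index only
   after the load (point 1). Returns an SQLite result code. */
static int load_probes(sqlite3 *db, const char **strings, int n)
{
    const int BATCH = 100000;          /* illustrative; tune to taste */
    sqlite3_stmt *ins;
    int i, rc;

    sqlite3_exec(db, "CREATE TABLE probes (probe STRING);", NULL, NULL, NULL);

    rc = sqlite3_prepare_v2(db, "INSERT INTO probes VALUES (?)", -1, &ins, NULL);
    if (rc != SQLITE_OK) return rc;

    sqlite3_exec(db, "BEGIN;", NULL, NULL, NULL);
    for (i = 0; i < n; i++) {
        sqlite3_bind_text(ins, 1, strings[i], -1, SQLITE_STATIC);
        if (sqlite3_step(ins) != SQLITE_DONE) {
            sqlite3_finalize(ins);
            return SQLITE_ERROR;
        }
        sqlite3_reset(ins);
        if ((i + 1) % BATCH == 0)      /* commit every N inserts */
            sqlite3_exec(db, "COMMIT; BEGIN;", NULL, NULL, NULL);
    }
    sqlite3_exec(db, "COMMIT;", NULL, NULL, NULL);
    sqlite3_finalize(ins);

    /* Create the index at the end, after the data is loaded. */
    return sqlite3_exec(db, "CREATE UNIQUE INDEX probe ON probes (probe);",
                        NULL, NULL, NULL);
}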

  
   

--- On Mon, 3/30/09, mrobi...@cs.fiu.edu  wrote:

> From: mrobi...@cs.fiu.edu 
> Subject: Re: [sqlite] creating unique data takes many hours, help
> To: "General Discussion of SQLite Database" 
> Date: Monday, March 30, 2009, 7:19 AM
> Hi,
> 
> Yes, I am dealing with DNA sequences. The data is huge: the
> human genome is about 3.2 billion records, and others are of
> similar size.
> 
> Before I found sqlite, using native C, I extracted the
> strings, testing for uniqueness. With sqlite I did the same
> and it took about the same time; once I added the index to
> sqlite the time was reduced.
> 
> I don't need this data to be sorted, just unique. Adding the
> index made it faster, but I am always looking for ways of
> reducing processing times.
> 
> I will try your suggestions and report back.
> 
> More suggestions are very welcome
> 
> Thanks very much for your help
> 
> Michael
> 
> > [snip]

Re: [sqlite] ScrollingCursor with unique constraint on multiple columns

2009-03-30 Thread Igor Tandetnik
"Dieter Guthmann" 
wrote in message news:49d0ac6b.60...@bup-nbg.de
> -- Statement for whole list:
> SELECT manufacturer, model_name FROM device ORDER BY manufacturer;
>
> -- Statement sent by the application for the second scrolling step
> -- (should show rows 4 to 6):
> SELECT manufacturer, model_name FROM device WHERE manufacturer > 'M2'
> ORDER BY manufacturer LIMIT 3;
>
> The first problem is that the second page should start with ( M2 |
> X500 ), but the WHERE clause causes all rows with manufacturer='M2'
> to be omitted.
> If I use two columns in the where-clause, the output gets even worse:
> ...WHERE manufacturer > 'M2' AND model_name > 'X1000'...
> also omits the row ( M3, X1000 ).

You want something like this:

SELECT manufacturer, model_name FROM device
WHERE manufacturer >= 'M2' AND (manufacturer > 'M2' OR model_name > 'X1000')
ORDER BY manufacturer, model_name LIMIT 3;

That is, order by both columns and resume strictly after the last row of
the previous page, ('M2', 'X1000'), in that composite order.
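
A minimal sketch of the same query with bound parameters, assuming an
already-open handle; the function name is illustrative only:

#include <stdio.h>
#include <sqlite3.h>

/* Print the next page, resuming strictly after the last row of the
   previous page in (manufacturer, model_name) order. */
static int next_page(sqlite3 *db, const char *last_man, const char *last_mod)
{
    const char *sql =
        "SELECT manufacturer, model_name FROM device"
        " WHERE manufacturer >= ?1 AND (manufacturer > ?1 OR model_name > ?2)"
        " ORDER BY manufacturer, model_name LIMIT 3;";
    sqlite3_stmt *st;
    int rc = sqlite3_prepare_v2(db, sql, -1, &st, NULL);
    if (rc != SQLITE_OK) return rc;
    sqlite3_bind_text(st, 1, last_man, -1, SQLITE_STATIC);
    sqlite3_bind_text(st, 2, last_mod, -1, SQLITE_STATIC);
    while ((rc = sqlite3_step(st)) == SQLITE_ROW)
        printf("%s | %s\n", (const char *)sqlite3_column_text(st, 0),
                            (const char *)sqlite3_column_text(st, 1));
    sqlite3_finalize(st);
    return rc == SQLITE_DONE ? SQLITE_OK : rc;
}

For example, next_page(db, "M2", "X1000") prints the second page.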

Igor Tandetnik 



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] ScrollingCursor with unique constraint on multiple columns

2009-03-30 Thread Dieter Guthmann
Hello there,

I've found the implementation recommendations for scrolling windows in
sqlite [1].
We've currently implemented it with LIMIT and OFFSET, as described in
the section "What Not To Do" ;-)

The solution described in "Work Arounds" seems much faster than our
implementation... but can I also implement it when there's a unique
constraint on multiple columns?

Here is an example:
create table device( id integer primary key, manufacturer VARCHAR(200)
NOT NULL, model_name VARCHAR(200) NOT NULL, CONSTRAINT U_device UNIQUE (
manufacturer, model_name ) ) ;
INSERT INTO device (id,manufacturer,model_name) VALUES (NULL,"M2","X1000");
INSERT INTO device (id,manufacturer,model_name) VALUES (NULL,"M2","X500");
INSERT INTO device (id,manufacturer,model_name) VALUES (NULL,"M1","X1000");
INSERT INTO device (id,manufacturer,model_name) VALUES (NULL,"M2","A5");
INSERT INTO device (id,manufacturer,model_name) VALUES (NULL,"M3","TX200");
INSERT INTO device (id,manufacturer,model_name) VALUES (NULL,"M3","A1");
INSERT INTO device (id,manufacturer,model_name) VALUES (NULL,"M3","X1000");

-- Statement for whole list:
SELECT manufacturer, model_name FROM device ORDER BY manufacturer;

-- Statement sent by the application for the second scrolling step
-- (should show rows 4 to 6):
SELECT manufacturer, model_name FROM device WHERE manufacturer > 'M2'
ORDER BY manufacturer LIMIT 3;

The first problem is that the second page should start with ( M2 | X500
), but the WHERE clause causes all rows with manufacturer='M2' to be
omitted.
If I use two columns in the where-clause, the output gets even worse:
...WHERE manufacturer > 'M2' AND model_name > 'X1000'...
also omits the row ( M3, X1000 ).

Is there a way to speed up my scrolling-window queries for tables of
this kind?
Thanks in advance.

Best Regards,
Dieter Guthmann

[1] 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating unique data takes many hours, help

2009-03-30 Thread mrobi002
Hi,

Yes, I am dealing with DNA sequences. The data is huge: the human genome is
about 3.2 billion records, and others are of similar size.

Before I found sqlite, using native C, I extracted the strings, testing
for uniqueness. With sqlite I did the same and it took about the same
time; once I added the index to sqlite the time was reduced.

I don't need this data to be sorted, just unique. Adding the index made it
faster, but I am always looking for ways of reducing processing times.

I will try your suggestions and report back.

More suggestions are very welcome

Thanks very much for your help

Michael

> Hi Michael,
>
>
> Two thoughts -- and I hope others will chime in if I'm off-base here:
>
>
> 1) Build just one index after the import:
>
> From the page:
> http://www.sqlite.org/lang_createtable.html
> "The UNIQUE constraint causes an unique index to be created on the
> specified columns."
>
> I think that if you then create your own index, it will be redundant,
> and, with millions of records, time-consuming.
>
> However, you may well want to KEEP your own index (and add the UNIQUE
> constraint to it), and *remove* the UNIQUE constraint on the table column
> definition.   I hear that creating the index after the data has been
> imported is faster (and less fragmented) than creating an implicit one
> on-the-fly during import.
>
>
>
> 2) Hashing
>
> If Vinnie [thev...@yahoo.com] was correct in guessing your goal (to
> determine if a given string has been examined before), and if
> performance remains a problem, I wondered if the design might benefit
> from using hashing to "pre-qualify" a string and/or substitute for a
> direct match.
>
> A short hash (perhaps a 64-bit integer?) could be stored in a separate
> table or even a separate database from the strings themselves.  If the
> hash of a new string does not match any hash in the database, then you
> know there is no need to search for the string itself.  If the entire
> hash index can be kept in RAM, you might get a big benefit.
>
> You could decide to take this a step further by using a longer hash as a
> proxy for the string itself.  The hash would need to be long enough to
> make collisions extremely unlikely, yet not so long as to negate the
> value of using a proxy for the original strings.  In practice, you'd
> probably want to compute a single long hash for each string, then use
> the first X bytes as the "pre-qualifying" hash, and the remainder as the
> "confirming hash." If your average string length is short, then using
> *two* hashes may not be much of a gain.
>
> (Of course, since this is not a cryptographic application, you don't
> need to worry about whether your chosen hash algorithm is "secure" or
> not.)
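> 
> A minimal sketch of the pre-qualifying idea, using 64-bit FNV-1a as one
> arbitrary choice of fast, non-cryptographic hash; set_contains() below
> is a hypothetical in-RAM hash-set lookup, not an sqlite or libc call:
> 
> #include <stdint.h>
> 
> /* 64-bit FNV-1a over a NUL-terminated string. */
> static uint64_t fnv1a64(const char *s)
> {
>     uint64_t h = 14695981039346656037ULL;    /* FNV offset basis */
>     while (*s) {
>         h ^= (unsigned char)*s++;
>         h *= 1099511628211ULL;               /* FNV prime */
>     }
>     return h;
> }
> 
> /* Usage sketch: a miss in the in-RAM hash set proves the string was
>    never seen, so only hits fall through to the slower string lookup,
>    which also resolves the rare collision:
> 
>        if (!set_contains(seen, fnv1a64(probe)))
>            ...definitely new: record it, add its hash to the set...
>        else
>            ...possibly seen: confirm against the stored strings...
> */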
>
> Regards,
>Donald
>
>
>
> -Original Message-
> From: sqlite-users-boun...@sqlite.org
> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Vinnie
> Sent: Sunday, March 29, 2009 6:14 PM
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] creating unique data takes many hours, help
>
>
> Michael:
>
> While my answer isn't strictly limited to sqlite, the performance nut in
> me can't help myself. You have a table with only one column, a string.
> I could be wrong here but it seems like you just want to keep a list of
> values that you have already tried. After you insert a bunch of strings
> into the table you want to be able to quickly look up a string to see if
> it exists, so that you can tell if you already probed that sequence
> (taking a guess here).
>
> If my guess is right, and the only thing you are doing is looking up
> sorted single-column elements, you probably can get away with your own
> quick disk-based binary tree implementation and avoid sqlite for this
> particular circumstance altogether. The result would be several orders
> of magnitude faster, even after you have followed the suggestions others
> have given.
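> 
> A minimal sketch of that idea, under one added assumption -- fixed-length
> keys kept sorted in a flat file; the key length and names are
> illustrative only:
> 
> #include <stdio.h>
> #include <string.h>
> 
> #define KEY_LEN 25   /* assumed fixed probe length */
> 
> /* Binary search over a file of sorted, fixed-length keys.
>    Returns 1 if found, 0 if absent, -1 on I/O error. */
> static int probe_exists(FILE *f, const char *key)
> {
>     long lo = 0, hi, mid;
>     int c;
>     char buf[KEY_LEN];
> 
>     if (fseek(f, 0, SEEK_END) != 0) return -1;
>     hi = ftell(f) / KEY_LEN;              /* record count */
>     while (lo < hi) {
>         mid = lo + (hi - lo) / 2;
>         if (fseek(f, mid * KEY_LEN, SEEK_SET) != 0) return -1;
>         if (fread(buf, 1, KEY_LEN, f) != KEY_LEN) return -1;
>         c = memcmp(key, buf, KEY_LEN);
>         if (c == 0) return 1;
>         if (c < 0) hi = mid; else lo = mid + 1;
>     }
>     return 0;
> }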
>
>
>
>
>> Hi,
>>
>> I am new to sqlite, and I created a program that reads several million
>
>> records and puts them into a sqlite db.
>>
>> The table has only one column, indexed and unique, but it takes many
>> hours.
>>
>> Is it necessary to pre-allocate the space, or is there anything that I
>> can do to reduce the time it takes.
>>
>> this is how I create the db, table and index.
>
>



Yes, you're right, but if the data is already in index order, you'll
do less I/O when creating the index.  Whether the sort + create DB
time is less than "create DB from random input" time is another
question.

Jim


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Unable to open database file, uclinux

2009-03-30 Thread Stephen Kyne

Hi all,
I'm currently trying to get a file database working on Uclinux/Petalinux. I've
been using the shell program to test whether it is working, but with no
success. The filesystem on the board is read-only (/bin, /usr) except for the
"/var" folder, which is a ram disk. I have sqlite3 in the bin folder with full
777 permissions, and libsqlite3.a in "/usr/lib", also with full permissions.

I try to create a database in the "/var" folder and seemingly it works at
first. When I try to enter a command, though, I get the error "unable to open
database file". If I check the /var folder after running the shell program, I
can see the database file was created, but it is 0 bytes in size. All the
folders have full permissions, so it shouldn't be that. Sqlite can also write
text files to that directory just fine.
I built the program from source using the following parameters:

CFLAGS="-Os -DSQLITE_DEBUG=1 -DSQLITE_OMIT_LOAD_EXTENSION=1 
-DSQLITE_TEMP_STORE=2";
./configure --host="i386-pc-linux" --disable-shared --enable-static

I guess my question is: how can I find out the root cause of the problem? From
reading various forums, this error is quite vague. I've tried most things with
no luck; are there any other options?
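
A minimal diagnostic sketch, assuming a 3.6.5-or-newer build (that is when
sqlite3_extended_errcode() appeared): enabling extended result codes often
narrows the failure down to a specific OS call. The path and schema are
illustrative only.

#include <stdio.h>
#include <sqlite3.h>

int main(void)
{
    sqlite3 *db = NULL;
    int rc = sqlite3_open("/var/test.db", &db);   /* the writable ram disk */
    sqlite3_extended_result_codes(db, 1);         /* e.g. SQLITE_IOERR_* subcodes */

    if (rc == SQLITE_OK)
        rc = sqlite3_exec(db, "CREATE TABLE t (x);", NULL, NULL, NULL);

    if (rc != SQLITE_OK)
        fprintf(stderr, "rc=%d extended=%d msg=%s\n",
                rc, sqlite3_extended_errcode(db), sqlite3_errmsg(db));

    sqlite3_close(db);
    return rc;
}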
Thanks

_
Get 30 Free Emoticons for your Windows Live Messenger
http://www.livemessenger-emoticons.com/funfamily/en-ie/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users