Re: [sqlite] [sqlite-dev] Poor SQLite indexing performance

2010-03-16 Thread Tim Romano
I have not done this, but if you have enough RAM available, you might 
try putting your primary keys in a table in an in-memory database, and 
test for existence there.  That would allow you to enforce uniqueness 
while postponing creation of the PK index on the disk table until after 
the initial population has completed.

The only other way to enforce uniqueness without an index is a hashed 
table (a feature not available in SQLite). On a table with very many 
rows, finding the key using a hash can be much quicker than scanning a 
b-tree for it, and inserts moreover do not slow down as no unique index 
is being created/reorganized during batch population of the table.

Regards
Tim Romano

On 3/15/2010 10:31 AM, Pavel Ivanov wrote:
> 
>
>> Is there any way to have a UNIQUE
>> field but disable indexing till the end?
>>  
> How do you expect your uniqueness to be enforced? SQLite does that by
> looking into index - if value is there then it is repeated, if value
> is not there then it's unique and should be inserted into index for
> further check.
>
>

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [sqlite-dev] Poor SQLite indexing performance

2010-03-15 Thread Max Vlasov
> Also it's quite known that
> creating index after inserting all rows is much faster than creating
> index before that. So it can be even beneficial in inserting huge
> amount of rows somewhere in the middle of the work: first delete all
> indexes, then insert rows, then create indexes once more.
>

Pavel, please add some reference (for example from docs or drh), my tests
showed that for large data there almost no difference in speed. And at the
first place I thought that the algorithm is really different, now I almost
sure that building index from scratch is just enumerating records building
the tree. CMIIW

Max
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [sqlite-dev] Poor SQLite indexing performance

2010-03-15 Thread Pavel Ivanov
First of all you should write to sqlite-users mailing list, not sqlite-dev.

> Why is the indexing so slow and bogs down as we proceed with insertions ?
> Any suggestions ?
> Also, how could I improve performance ?

I can't say exactly why performance with index degrades so
significantly with the size of the table. Probably it's something to
do with cache - try to increase its size. Also it's quite known that
creating index after inserting all rows is much faster than creating
index before that. So it can be even beneficial in inserting huge
amount of rows somewhere in the middle of the work: first delete all
indexes, then insert rows, then create indexes once more.

> Is there any way to have a UNIQUE
> field but disable indexing till the end?

How do you expect your uniqueness to be enforced? SQLite does that by
looking into index - if value is there then it is repeated, if value
is not there then it's unique and should be inserted into index for
further check.

> However, as I am using hash values instead of filenames I need to deal with
> collisions. Hence, removing the UNIQUE field is risky as I cant detect
> collisions. But I am also unable to use it because of the poor indexing
> performance.

What's wrong with creating unique index after all rows are inserted?
It's the same as declaring UNIQUE field although checking for
uniqueness is postponed till index creation and you will unable to
create index if there's some not unique values. But I don't know if
all this matters to you.


Pavel

On Fri, Mar 12, 2010 at 7:51 AM, rohan a  wrote:
> Hello,
>
> I am using an SQLite database for storing certain file properties. Initially
> I used a pathname (ex: D:\\1\2\file.txt) as a PRIMARY KEY. But as the
> database created was very large, I decided to use hashes of the pathname
> instead. This brought down the size occupied by the database on disk. I am
> using the C language interface provided.
>
> The table looks like this:
>
> CREATE TABLE IF NOT EXISTS EITable (Crc1 INTEGER, Crc2 INTEGER, Ctime
> INTEGER, Checksum INTEGER, UNIQUE(Crc1,Crc2));
>
> I generate 2 64-bit hashes of the pathname and store them into the database.
> With a combination of (Crc1,Crc2) being UNIQUE.
>
> The size of the database definitely came down using this method. However,
> the time taken for insertions becomes abnormally large and un-acceptable. It
> is quite fast initially but slows down drastically as the database gets
> filled with records.
>
> When I remove the UNIQUE field and create an INDEX after all INSERTions are
> completed the insertions are fast and complete quickly. I use the CREATE
> INDEX when all insertions are completed.
> However, as I am using hash values instead of filenames I need to deal with
> collisions. Hence, removing the UNIQUE field is risky as I cant detect
> collisions. But I am also unable to use it because of the poor indexing
> performance.
>
> Why is the indexing so slow and bogs down as we proceed with insertions ?
> Any suggestions ?
> Also, how could I improve performance ?Is there any way to have a UNIQUE
> field but disable indexing till the end?
>
> I am using Transactions and compiled statements for INSERT, SELECT etc. I
> timed each of the activities like generation of the hash, insertion, time
> taken for COMMIT. This indicated that the indexing is the culprit. Also
> moving indexing to the end improved performance.
>
> Any help/suggestions on this ?
>
> Thanks
>
> ___
> sqlite-dev mailing list
> sqlite-...@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-dev
>
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users