Hi Kefah,

On Sat, 15 Sep 2007 04:43:46 +0300, you wrote:

>Thanks Kees,
>
>In fact, using an integer instead of a string gives very similar results.
>
>Dropping the unique index constraint and then creating it again when needed 
>sounds interesting, as insertion without the unique index is satisfactory and 
>constant over time.

Thanks for the feedback.

>I tried this, but ran into a trivial question:
>When creating the unique index, sqlite gives me:
>SQL error: indexed columns are not unique
>
>What should be done here?

Apparently the data in the text column is not unique.
That is the disadvantage of building the index after the
insertions: the database can't enforce the constraints on your
data, so you would have to do that yourself, for example with a
sort --unique step. My second suggestion for speeding things up
was to insert the data in sorted order, so now you have two
reasons for a sort.
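
If duplicates have already slipped into the table, they can also
be cleaned up from within the database before building the index,
by keeping only the lowest id per url. A minimal JDBC sketch (my
illustration, not code from this thread; the class name and
database file name are placeholders):

    import java.sql.*;

    public class CleanThenIndex {
        public static void main(String[] args) throws Exception {
            Class.forName("org.sqlite.JDBC");   // SqliteJDBC driver
            Connection conn =
                DriverManager.getConnection("jdbc:sqlite:test.db");
            Statement st = conn.createStatement();
            // Keep the first occurrence of each url, drop the rest
            // (assumes id is filled in for every row).
            st.executeUpdate("DELETE FROM my WHERE id NOT IN "
                    + "(SELECT MIN(id) FROM my GROUP BY url)");
            // With the duplicates gone, the index builds cleanly.
            st.executeUpdate("CREATE UNIQUE INDEX myurl ON my(url)");
            conn.close();
        }
    }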

Such a data cleaning step will take considerable time, so time
gained in inserting may be lost again in preprocessing.
It might be better to use the database constraints, and live
with the slow insertions. Your benchmarks will tell you what's
best.

>Thanks again,
>- Kefah.

Good luck.

>On Saturday 15 September 2007 00:25:03 Kees Nuyt wrote:
>> On Fri, 14 Sep 2007 23:20:53 +0300, you wrote:
>> >Dear All,
>> >
>> >I have been struggling with the performance of insertion in sqlite.
>> >
>> >Here we have a very simple case:
>> >
>> >A table with an integer autoincrement primary key and a text field that is
>> >unique.
>> >
>> >CREATE TABLE my (id INTEGER PRIMARY KEY, url TEXT);
>> >
>> >CREATE UNIQUE INDEX myurl ON my(url);
>> >
>> >
>> >My application requires inserting up to 10 million records in batches of
>> > 20 thousand records.
>> >
>> >I use:
>> >    A. PRAGMA SYNCHRONOUS = OFF
>> >    B. Prepared Statement
>> >    C. setAutoCommit(false), then to true at the end of the batch.
>> >
>> >Using the above, the insertion starts off at a good speed, then
>> > drastically slows down as more records are inserted.
>> >
>> >It goes like this:
>> >
>> >The first four insertions (first 4*20K -> 80K records; times in ms)
>> >0        took    786
>> >1        took    944
>> >2        took    1001
>> >3        took    1303
>> >
>> >After the first 1 million records
>> >50        took    2560
>> >51        took    1921
>> >55        took    1900
>> >53        took    3990
>> >
>> >After the 2nd million records
>> >2        took    22393
>> >3        took    16789
>> >4        took    29419
>> >5        took    13993
>> >
>> >By the end of the 3rd million records the insertion time goes up to 30
>> >seconds per 20K records.
>> >
>> >My app is written in Java and uses SqliteJDBC 0.37 (the latest) on Sun
>> > JDK 1.6.0_02, running on a CentOS 5.0 server with 4 GB RAM, two
>> > dual-core CPUs and a RAID10 SATA-II hard disk.
>> >
>> >
>> >I know I might be stretching sqlite far beyond its limits; I just want to
>> >verify that there is nothing more that can be done to make a case for
>> > sqlite in this scenario. I am not sure what's the best next thing to do
>> > either.
>> >
>> >Your feedback and input will be highly appreciated,
>> >
>> >- Kefah.
>>
>> Most probably the UNIQUE INDEX on the TEXT column is the
>> culprit.
>>
>> My first try would be to create and fill the table first, and
>> create the UNIQUE INDEX on the TEXT column afterwards.
>>
>> The second suggestion would be to INSERT the rows in sorted
>> order, the sort key being the TEXT column.
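
To make those two suggestions concrete, a rough JDBC sketch of one
batch (again my illustration, not code from this thread; the class
name, database file and sample urls are placeholders, and
SqliteJDBC is assumed to be on the classpath):

    import java.sql.*;

    public class BulkLoad {
        public static void main(String[] args) throws Exception {
            Class.forName("org.sqlite.JDBC");   // SqliteJDBC driver
            Connection conn =
                DriverManager.getConnection("jdbc:sqlite:bulk.db");
            Statement st = conn.createStatement();
            st.executeUpdate("PRAGMA synchronous = OFF");
            st.executeUpdate("CREATE TABLE IF NOT EXISTS my "
                    + "(id INTEGER PRIMARY KEY, url TEXT)");

            // Stand-in for one 20K batch, already sorted and
            // deduplicated on the url column.
            String[] batch = { "http://a.example/", "http://b.example/" };

            conn.setAutoCommit(false);  // one transaction per batch
            PreparedStatement ps =
                conn.prepareStatement("INSERT INTO my (url) VALUES (?)");
            for (String url : batch) {
                ps.setString(1, url);
                ps.executeUpdate();
            }
            conn.commit();
            conn.setAutoCommit(true);

            // Only after the last batch: build the unique index once.
            st.executeUpdate("CREATE UNIQUE INDEX myurl ON my(url)");
            conn.close();
        }
    }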
-- 
  (  Kees Nuyt
  )
c[_]
