Re: [sqlite] Sqlite insertion performance

2007-09-15 Thread Kefah T. Issa

Dear Kees and Richard,

Much appreciated.


I tried the ordered-urls insert; the results were better, but it is still 
taking progressively longer as the number of records increases.

A fundamental question to be asked here:

Shouldn't the time complexity (big-O) of the insert operation be constant?

I even did a third test where the integer primary key is not auto-increment; 
the same problem is observed.

Time complexity is clearly constant when there are no indexes at all (except 
for the implicit auto-increment integer primary key).

But otherwise, the time taken grows steadily with the number of existing 
records.

Acknowledging my ignorance on the subject, I really don't see a reason why 
this is happening, other than a potential bug or a performance-improvement 
opportunity.

Regards,
- Kefah.
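[Editorial note: the slowdown described above is consistent with each insert paying an O(log n) B-tree probe on the url index, plus random page I/O once the index outgrows the cache. A minimal sketch to reproduce the test, using Python's built-in sqlite3 instead of SqliteJDBC (the schema is from the thread; the random url values are an assumption, chosen because random keys hit random index pages):]

```python
import os
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA synchronous = OFF")
conn.execute("CREATE TABLE my (id INTEGER PRIMARY KEY, url)")
conn.execute("CREATE UNIQUE INDEX myurl ON my(url)")

batch = 20000
for i in range(5):
    start = time.time()
    with conn:  # one transaction per batch, as in the original test
        conn.executemany(
            "INSERT INTO my (url) VALUES (?)",
            ((os.urandom(8).hex(),) for _ in range(batch)))
    print(i, "took", round((time.time() - start) * 1000))
```

At 100K rows the per-batch times are still roughly flat; in the thread the growth only shows up in the millions of rows, once the index no longer fits in the page cache.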



On Saturday 15 September 2007 12:51:02 Kees Nuyt wrote:
> Hi Kefah,
>
> On Sat, 15 Sep 2007 04:43:46 +0300, you wrote:
> >Thanks Kees,
> >
> >In fact using integer instead of string gives very similar results.
> >
> >Dropping the unique index constraint and then creating it again when
> > needed sounds interesting, as insertion without the unique index is
> > satisfactory and constant over time.
>
> Thanks for the feedback.
>
> >I tried this, but got a trivial question :
> >When creating the unique index, sqlite gives me :
> >SQL error: indexed columns are not unique
> >
> >What should be done here?
>
> Apparently the data in the text column is not unique.
> That is the disadvantage of building the index after the
> insertions: the database can't exercise the constraints on your
> data, so you would have to do that yourself, for example by a
> sort --unique step. My second suggestion for speeding things was
> sorting the input data, so now you have two reasons for a sort.
>
> Such a data cleaning step will take considerable time, so time
> gained in inserting may be lost again in preprocessing.
> It might be better to use the database constraints, and live
> with the slow insertions. Your benchmarks will tell you what's
> best.
>
> >Thanks again,
> >- Kefah.
>
> Good luck.
>
> >On Saturday 15 September 2007 00:25:03 Kees Nuyt wrote:
> >> On Fri, 14 Sep 2007 23:20:53 +0300, you wrote:
> >> >Dear All,
> >> >
> >> >I have been struggling with the performance of insertion in sqlite.
> >> >
> >> >Here we have a very simple case :
> >> >
> >> >A table with an integer autoincrement primary key and a text field that
> >> > is unique.
> >> >
> >> >CREATE TABLE my (id PRIMARY KEY, url);
> >> >
> >> >CREATE UNIQUE INDEX myurl ON my(url);
> >> >
> >> >
> >> >My application requires inserting up to 10 million records in batches
> >> > of 20 thousand records.
> >> >
> >> >I use :
> >> >A. PRAGMA SYNCHRONOUS = OFF
> >> >B. Prepared Statement
> >> >C. setAutoCommit(false), then to true at the end of the batch.
> >> >
> >> >Using the above, the insertion starts off at a good speed, then
> >> > drastically slows down as more records are inserted.
> >> >
> >> >It goes like this :
> >> >
> >> >The first four insertions (first 4*20K -> 80K records)
> >> >0 took 786
> >> >1 took 944
> >> >2 took 1001
> >> >3 took 1303
> >> >
> >> >After the first 1 Million records
> >> >50 took 2560
> >> >51 took 1921
> >> >55 took 1900
> >> >53 took 3990
> >> >
> >> >After the 2nd million records
> >> >2 took 22393
> >> >3 took 16789
> >> >4 took 29419
> >> >5 took 13993
> >> >
> >> >By the end of the 3rd million records, the insertion time goes up to
> >> > 30 seconds per 20K records.
> >> >
> >> >My app runs from Java code, and I am using SqliteJDBC 0.37 (the
> >> > latest) on Sun JDK 1.6.0_02, on a CentOS 5.0 server with 4 GB RAM,
> >> > two dual-core CPUs, and a RAID 10 SATA-II hard disk.
> >> >
> >> >
> >> >I know I might be stretching sqlite far beyond its limits; I just want
> >> > to verify that there is nothing more that can be done to make a case
> >> > for sqlite in this scenario. I am not sure what the best next thing
> >> > to do is, either.
> >> >
> >> >Your feedback and input will be highly appreciated,
> >> >
> >> >- Kefah.
> >>
> >> Most probably the UNIQUE INDEX on the TEXT column is the
> >> culprit.
> >>
> >> My first try would be to create and fill the table first, and
> >> create the UNIQUE INDEX on the TEXT column afterwards.
> >>
> >> The second suggestion would be to INSERT the rows in sorted
> >> order, the sort key being the TEXT column.



-- 
Kefah T. Issa
Manager

>/. freesoft technologies llc
freesoft technologies, LLC.
Cell : +962 777 80 90 50
Office : +962 6 55 23 967
Fax : +962 6 55 61 967
Jabber IM (XMPP) : [EMAIL PROTECTED] 

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Sqlite insertion performance

2007-09-14 Thread Kefah T. Issa
Thanks Kees,

In fact using integer instead of string gives very similar results.

Dropping the unique index constraint and then creating it again when needed 
sounds interesting, as insertion without the unique index is satisfactory and 
constant over time.

I tried this, but got a trivial question:
When creating the unique index, sqlite gives me:
SQL error: indexed columns are not unique

What should be done here?

Thanks again,
- Kefah.
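[Editorial note: that error means duplicate url values made it into the table while the index was absent. One way to handle it, before recreating the index, is to delete the duplicates in SQL. A sketch with Python's built-in sqlite3 rather than SqliteJDBC; the table and index names are from the thread, the sample values are made up:]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my (id INTEGER PRIMARY KEY, url)")
# bulk load with no unique index in place; duplicates slip in
conn.executemany("INSERT INTO my (url) VALUES (?)",
                 [("a",), ("b",), ("a",), ("c",), ("b",)])

# keep the lowest rowid for each url, drop the rest
conn.execute("""DELETE FROM my
                WHERE id NOT IN (SELECT MIN(id) FROM my GROUP BY url)""")

conn.execute("CREATE UNIQUE INDEX myurl ON my(url)")  # now succeeds
print(conn.execute("SELECT COUNT(*) FROM my").fetchone()[0])  # → 3
```

Alternatively, keep the index during the load and use INSERT OR IGNORE, trading the deduplication pass for slower inserts.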


On Saturday 15 September 2007 00:25:03 Kees Nuyt wrote:
> On Fri, 14 Sep 2007 23:20:53 +0300, you wrote:
> >Dear All,
> >
> >I have been struggling with the performance of insertion in sqlite.
> >
> >Here we have a very simple case :
> >
> >A table with an integer autoincrement primary key and a text field that is
> >unique.
> >
> >CREATE TABLE my (id PRIMARY KEY, url);
> >
> >CREATE UNIQUE INDEX myurl ON my(url);
> >
> >
> >My application requires inserting up to 10 million records in batches of
> > 20 thousand records.
> >
> >I use :
> >A. PRAGMA SYNCHRONOUS = OFF
> >B. Prepared Statement
> >C. setAutoCommit(false), then to true at the end of the batch.
> >
> >Using the above, the insertion starts off at a good speed, then
> > drastically slows down as more records are inserted.
> >
> >It goes like this :
> >
> >The first four insertions (first 4*20K -> 80K records)
> >0 took 786
> >1 took 944
> >2 took 1001
> >3 took 1303
> >
> >After the first 1 Million records
> >50 took 2560
> >51 took 1921
> >55 took 1900
> >53 took 3990
> >
> >After the 2nd million records
> >2 took 22393
> >3 took 16789
> >4 took 29419
> >5 took 13993
> >
> >By the end of the 3rd million records, the insertion time goes up to 30
> >seconds per 20K records.
> >
> >My app runs from Java code, and I am using SqliteJDBC 0.37 (the
> > latest) on Sun JDK 1.6.0_02, on a CentOS 5.0 server with 4 GB RAM,
> > two dual-core CPUs, and a RAID 10 SATA-II hard disk.
> >
> >
> >I know I might be stretching sqlite far beyond its limits; I just want to
> >verify that there is nothing more that can be done to make a case for
> > sqlite in this scenario. I am not sure what the best next thing to do
> > is, either.
> >
> >Your feedback and input will be highly appreciated,
> >
> >- Kefah.
>
> Most probably the UNIQUE INDEX on the TEXT column is the
> culprit.
>
> My first try would be to create and fill the table first, and
> create the UNIQUE INDEX on the TEXT column afterwards.
>
> The second suggestion would be to INSERT the rows in sorted
> order, the sort key being the TEXT column.





[sqlite] Sqlite insertion performance

2007-09-14 Thread Kefah T. Issa

Dear All,

I have been struggling with the performance of insertion in sqlite.

Here we have a very simple case:

A table with an integer autoincrement primary key and a text field that is 
unique.

CREATE TABLE my (id PRIMARY KEY, url);

CREATE UNIQUE INDEX myurl ON my(url);


My application requires inserting up to 10 million records in batches of 20 
thousand records.

I use:
A. PRAGMA SYNCHRONOUS = OFF
B. Prepared Statement
C. setAutoCommit(false), then to true at the end of the batch.
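[Editorial note: those three settings map directly onto the SQLite API. A minimal sketch with Python's built-in sqlite3 rather than SqliteJDBC; the in-memory database and example urls are assumptions for a self-contained run:]

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # the post uses ./my.db on disk
conn.execute("PRAGMA synchronous = OFF")   # A: skip fsync on commit
conn.execute("CREATE TABLE my (id INTEGER PRIMARY KEY, url)")
conn.execute("CREATE UNIQUE INDEX myurl ON my(url)")

with conn:                                 # C: one transaction per batch
    conn.executemany(                      # B: one prepared statement, reused
        "INSERT INTO my (url) VALUES (?)",
        (("http://example.com/%d" % n,) for n in range(20000)))
print(conn.execute("SELECT COUNT(*) FROM my").fetchone()[0])  # → 20000
```

With SYNCHRONOUS = OFF a power failure can corrupt the database, which is usually acceptable only for a rebuildable bulk load like this one.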

Using the above, the insertion starts off at a good speed, then drastically 
slows down as more records are inserted.

It goes like this : 

The first four insertions (first 4*20K -> 80K records)
0 took 786
1 took 944
2 took 1001
3 took 1303

After the first 1 Million records 
50 took 2560
51 took 1921
55 took 1900
53 took 3990

After the 2nd million records
2 took 22393
3 took 16789
4 took 29419
5 took 13993

By the end of the 3rd million records, the insertion time goes up to 30 
seconds per 20K records.

My app runs from Java code, and I am using SqliteJDBC 0.37 (the latest) on 
Sun JDK 1.6.0_02, on a CentOS 5.0 server with 4 GB RAM, two dual-core CPUs, 
and a RAID 10 SATA-II hard disk.


I know I might be stretching sqlite far beyond its limits; I just want to 
verify that there is nothing more that can be done to make a case for sqlite 
in this scenario. I am not sure what the best next thing to do is, either.

Your feedback and input will be highly appreciated,

- Kefah.





P.S. Here is a snippet of the Java code:

import java.sql.*;

Class.forName("org.sqlite.JDBC");
Connection connection = 
    DriverManager.getConnection("jdbc:sqlite:./my.db");

long total_time = 0;
int iterations = 50;
int records_per_iteration = 2;

// (loop body truncated in the archive; reconstructed from the steps described above)
for (int i = 0; i < iterations; i++) {
    long start = System.currentTimeMillis();
    connection.setAutoCommit(false);
    PreparedStatement ps = connection.prepareStatement("INSERT INTO my (url) VALUES (?)");
    for (int j = 0; j < records_per_iteration; j++) {
        ps.setString(1, "url-" + (i * records_per_iteration + j));
        ps.executeUpdate();
    }
    connection.setAutoCommit(true);  // switching back to auto-commit commits the batch
    System.out.println(i + "took" + (System.currentTimeMillis() - start));
}