Re: [sqlite] serious performance problems with indexes

2006-11-03 Thread Christian Smith

Peter De Rijk uttered:


--On Friday 27 October 2006 15:31, [EMAIL PROTECTED] wrote:

When a table is indexed, INSERT performance is logorithmic in the
number of rows in the table and linear in the number of indices.
This is because entries have to be inserted into the index in
sorted order (otherwise it wouldn't be an index).  And each
insert thus requires a binary search.

If your index becomes very large so that it no longer fits
in your disk cache, then the binary search can involve a lot
of disk I/O which can make things really slow.  The usual
work-around here is to keep a much smaller staging table into
which you do your inserts and then periodically merge the
staging table into the main table.  This makes your queries
more complex (and slower) since you are now having to
search multiple tables.  But it does speed up inserts.


I have checked more fine grained (results and code at the end of the
mail), and it seems indeed that the extreme growth of time needed
for inserts only starts at around 16 rows (on my system). Before
that, it also rises, but much much more slowly. As this indeed
looked like a cache effect, I tried setting the default_cache_size
to something big (2). This was enough to remove the bend (at
least till 200 rows), so that the increase in time kept going
at the same slower pace as for the smaller data sizes.



You might want to try increasing the page size as well as the size of the
page cache. For index pages, this'll increase the btree fanout, reducing 
the btree depth for a given number of entries, and reducing the pressure 
on the cache when inserting new entries.


Christian


--
/"\
\ /ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
 X   - AGAINST MS ATTACHMENTS
/ \

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] serious performance problems with indexes

2006-10-31 Thread Scott Hess

On 10/30/06, Joe Wilson <[EMAIL PROTECTED]> wrote:

If you have only one index, then pre-sorting your large datasets prior to
inserting with the default sqlite cache will yield twice as good insert
performance as not pre-sorting your data and using a huge cache.
This stands to reason since you're basically appending pages to the end
of the database file. (I'm assuming it's just an append since the pre-sorted
inserts' timing is constant regardless of the number of rows).

But if you wish to maintain more than one index and still have good insert
performance with huge datasets, then there's not much else you can
do other than using some combination of the techniques already discussed,
or use another database or some other form of data storage altogether.


You can also take the (somewhat crazy) step of rearranging your tables
so that all indices are sorted.  If you have:

CREATE TABLE x (
 name text primary key,
 office text,
 key(office)
);

You can't keep the name and office both sorted.  But if you had:

CREATE TABLE x (
 name text primary key
);
CREATE TABLE y (
 office text primary key,
 xid integer
);

you could insert into both tables in sorted order in a single
transaction.  At the cost of taking up a bit more space, and making
all your SQL uglier, etc.  Really a hack, though.

-scott

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] serious performance problems with indexes

2006-10-30 Thread Joe Wilson
If you have only one index, then pre-sorting your large datasets prior to 
inserting with the default sqlite cache will yield twice as good insert 
performance as not pre-sorting your data and using a huge cache. 
This stands to reason since you're basically appending pages to the end
of the database file. (I'm assuming it's just an append since the pre-sorted
inserts' timing is constant regardless of the number of rows).

But if you wish to maintain more than one index and still have good insert 
performance with huge datasets, then there's not much else you can 
do other than using some combination of the techniques already discussed, 
or use another database or some other form of data storage altogether.

SQLite's virtual table mechanism might be useful in your case.
You can create custom code and a custom file format that is optimal for
your dataset and indexes and still have the benefit of an SQL interface:

 http://www.sqlite.org/cvstrac/wiki?p=VirtualTables

This paper may give you some ideas as well:

 http://labs.google.com/papers/sawzall.html

- Original Message 
From: Peter De Rijk <[EMAIL PROTECTED]>
To: sqlite-users@sqlite.org
Sent: Monday, October 30, 2006 1:00:25 PM
Subject: Re: [sqlite] serious performance problems with indexes

Presorting is unfortunately not always possible, e.g. when 
there are several indexes (with different order). But increasing 
the cache seems a good solution for this anyway (and is less 
cumbersome than presorting or staging). Your ideas set me thinking
on why even with the larger cache, we still do worse than mysql on large 
datasets, and did some further experimenting (code at the end of the mail, 
all tests with larger cache size). 

I guess that it is not only the number of entries that is 
important, but that there are at least two different factors 
affecting the performance: the number of different values actually present 
in the index, and the number of rows that have one value

The number of values has an important influence. If there is 
only a limited number of possible values, performance stays nearly 
identical regardless of the data size (and thus faster than mysql).

For larger numbers of possible values behaviour gets a bit 
unexpected (to me at least): the time needed for random inserts rises 
about linear with the data size, until it reaches a certain point, 
after which the insert time remains the same regardless of dataset 
size. The height of this plateau seems to related to the number of 
possible values in a linear fashion.







-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] serious performance problems with indexes

2006-10-30 Thread Peter De Rijk
On Sunday 29 October 2006 18:47, Joe Wilson wrote:
> The last test that simulated pre-sorting 2 million rows
> in index order prior to insert may show optimal insert speed
> since it's only appending pages to the table and the index,
> but it may not be realistic, since you may not have sufficient
> memory to pre-sort all data in memory prior to insert.
> (Although if you have the memory to spare, this is the way to go).

Presorting is unfortunately not always possible, e.g. when 
there are several indexes (with different order). But increasing 
the cache seems a good solution for this anyway (and is less 
cumbersome than presorting or staging). Your ideas set me thinking
on why even with the larger cache, we still do worse than mysql on large 
datasets, and did some further experimenting (code at the end of the mail, 
all tests with larger cache size). 

I guess that it is not only the number of entries that is 
important, but that there are at least two different factors 
affecting the performance: the number of different values actually present 
in the index, and the number of rows that have one value

The number of values has an important influence. If there is 
only a limited number of possible values, performance stays nearly 
identical regardless of the data size (and thus faster than mysql).

For larger numbers of possible values behaviour gets a bit 
unexpected (to me at least): the time needed for random inserts rises 
about linear with the data size, until it reaches a certain point, 
after which the insert time remains the same regardless of dataset 
size. The height of this plateau seems to related to the number of 
possible values in a linear fashion.

Any ideas?

-- code --

proc runtest {sqlfile} {
  set result {}
  set delay 1000
  exec sync; after $delay;
  set t [time "exec sqlite3 lc.db <$sqlfile" 1]
  set t [expr {[lindex $t 0]/100.0}]
  lappend result [format %.3f $t]
  return $result
}

set numvlist {2 10 100 200 300 400 500 600 700 800 1000 2000 3000 5000 7000 
1}
# Initialize the environment
#
expr srand(1)
catch {exec /bin/sh -c {rm -f *.db}}
set fd [open clear.sql w]
foreach numv $numvlist {
puts $fd "drop table t$numv;"
}
close $fd
catch {exec mysql drh 

Re: [sqlite] serious performance problems with indexes

2006-10-29 Thread Joe Wilson
The last test that simulated pre-sorting 2 million rows 
in index order prior to insert may show optimal insert speed
since it's only appending pages to the table and the index, 
but it may not be realistic, since you may not have sufficient 
memory to pre-sort all data in memory prior to insert.
(Although if you have the memory to spare, this is the way to go).

Here's another test on the same machine that uses 
random inputs but inserts batches of 2 pre-sorted 
rows at time. You can see that insert time increases with 
the size of the table, but it is not exponential as was the
case with the original test program.

You can also see that having a large sqlite cache does
not significantly effect the insert timings under 2M rows 
as compared to the default sqlite cache.

Tcl code change follows the data.

tablesize   sqlite  sqlite-largecache
0   1.909   1.909
2   2.019   2.048
4   2.064   2.079
6   2.111   2.079
8   2.142   2.158
10  2.157   2.157
12  2.282   2.235
14  2.298   2.267
16  2.392   2.298
18  2.439   2.314
20  2.361   2.345
22  2.501   2.439
24  2.689   3.439
26  2.704   3.423
28  2.673   3.564
30  2.767   3.533
32  2.829   3.564
34  2.876   2.657
36  2.955   2.626
38  3.017   3.673
40  3.001   3.704
42  3.048   2.767
44  2.970   3.783
46  3.048   2.845
48  3.095   3.861
50  3.157   3.861
52  3.173   3.923
54  3.142   3.970
56  3.236   3.986
58  3.220   4.002
60  3.408   4.970
62  3.267   4.142
64  3.423   5.017
66  3.454   4.189
68  3.611   4.220
70  3.611   5.142
72  3.704   5.251
74  3.829   4.330
76  3.657   4.330
78  3.970   4.455
80  4.001   4.470
82  4.046   4.454
84  4.173   4.470
86  4.142   4.611
88  4.126   4.548
90  4.298   4.501
92  4.517   4.642
94  4.641   4.689
96  4.517   5.579
98  4.673   4.736
100 4.736   4.751
102 4.689   4.845
104 4.923   4.798
106 5.111   4.939
108 5.204   4.314
110 5.048   4.876
112 5.064   5.814
114 5.205   5.876
116 5.314   5.876
118 5.501   5.142
120 6.517   5.032
122 5.736   5.048
124 5.861   6.986
126 5.892   5.986
128 5.767   5.298
130 6.095   6.095
132 6.080   5.329
134 5.954   5.283
136 6.548   5.986
138 6.157   5.423
140 6.642   6.330
142 6.439   5.643
144 6.705   5.579
146 6.626   5.611
148 6.814   6.517
150 6.720   6.533
152 7.111   6.579
154 7.142   5.845
156 7.533   6.173
158 7.345   6.064
160 7.689   6.142
162 7.798   6.283
164 7.876   6.486
166 7.704   6.408
168 8.283   6.470
170 8.455   6.517
172 8.126   8.049
174 8.314   7.720
176 8.564   7.017
178 8.845   7.095
180 8.814   7.079
182 9.126   7.002
184 8.798   7.174
186 9.189   7.236
188 9.111   7.184
190 9.986   11.830
192 9.658   7.673
194 9.876   9.126
196 9.892   7.908
198 9.626   7.580

set o [open results.csv w]
puts $o "tablesize\tsqlite\tsqlite-largecache\t mysql"
set step 2
for {set num 0} {$num < 200} {incr num $step} {
  puts $num
  set fd [open test.sql w]
  set rows [list]
  for {set i 1} {$i<=$step} {incr i} {
lappend rows [list [expr {$num+$i}] [expr {int(rand()*50)}]]
  }
  puts $fd "BEGIN;"
  foreach i [lsort -integer -index 1 $rows] {
puts $fd "INSERT INTO t1 VALUES([lindex $i 0], [lindex $i 1]);"
  }
  puts $fd "COMMIT;"
  close $fd
  puts $o $num\t[join [runtest test.sql] \t]
  flush $o
}
close $o


- Original Message 
From: Joe Wilson <[EMAIL PROTECTED]>
To: sqlite-users@sqlite.org
Sent: Saturday, October 28, 2006 1:08:20 PM
Subject: Re: [sqlite] serious performance problems with indexes

If you pre-sort the data prior to inserting into sqlite you will
see much better timings for both the default cache size and
the larger cache size.

When I run your script (unmodified, except for the removal of 
MySQL) up to 40:

tablesize   sqlite  sqlite-largecache
0   1.956   1.956
2   2.064   2.079
4   2.095   2.111
6   2.173   2.173
8   2.189   2.283
10  2.345   2.298
12  2.345   2.360
14  2.407   2.361
16  2.470   2.423
18  2.548   2.501
20  2.564   2.439
22  2.830   2.439
24  2.876   2.907
26  2.829   2.564
28  3.423   3.533
30  4.939   3.564
32  7.236   3.736
34  7.283   3.751
36  10.611  3.767
38  11.142  3.845
40  13.736  3.798

When I make the following change to your script to simulate an ordered data set
(okay, it's cheating slightly - no pre-sort need be performed here):

   # set r [expr {int(rand()*50)}]
   set r [expr {($num+$i)*5}]

tablesize   sqlite  sqlite-largecache
0   1.878   1.894
2   1.925   1.923
4   1.923   1.923
6   1.923   1.954
8

Re: [sqlite] serious performance problems with indexes

2006-10-28 Thread Joe Wilson
If you pre-sort the data prior to inserting into sqlite you will
see much better timings for both the default cache size and
the larger cache size.

When I run your script (unmodified, except for the removal of 
MySQL) up to 40:

tablesize   sqlite  sqlite-largecache
0   1.956   1.956
2   2.064   2.079
4   2.095   2.111
6   2.173   2.173
8   2.189   2.283
10  2.345   2.298
12  2.345   2.360
14  2.407   2.361
16  2.470   2.423
18  2.548   2.501
20  2.564   2.439
22  2.830   2.439
24  2.876   2.907
26  2.829   2.564
28  3.423   3.533
30  4.939   3.564
32  7.236   3.736
34  7.283   3.751
36  10.611  3.767
38  11.142  3.845
40  13.736  3.798

When I make the following change to your script to simulate an ordered data set
(okay, it's cheating slightly - no pre-sort need be performed here):

   # set r [expr {int(rand()*50)}]
   set r [expr {($num+$i)*5}]

tablesize   sqlite  sqlite-largecache
0   1.878   1.894
2   1.925   1.923
4   1.923   1.923
6   1.923   1.954
8   1.970   1.923
10  1.923   1.908
12  1.923   1.970
14  1.940   2.383
16  2.048   1.908
18  1.923   2.002
20  1.923   1.923
22  1.939   1.923
24  1.923   1.923
26  1.939   1.923
28  1.939   1.939
30  1.954   1.939
32  1.939   1.923
34  1.970   1.954
36  1.939   1.954
38  1.923   1.939
40  1.970   1.939

I wonder if other databases pre-sort their batch inserts by 
index order to achieve more consistant insert speed with 
datasets that exceed cache size.


- Original Message 
From: Peter De Rijk <[EMAIL PROTECTED]>
To: sqlite-users@sqlite.org
Sent: Saturday, October 28, 2006 10:40:03 AM
Subject: Re: [sqlite] serious performance problems with indexes

I have checked more fine grained (results and code at the end of the  
mail), and it seems indeed that the extreme growth of time needed 
for inserts only starts at around 16 rows (on my system). Before 
that, it also rises, but much much more slowly. As this indeed 
looked like a cache effect, I tried setting the default_cache_size 
to something big (2). This was enough to remove the bend (at 
least till 200 rows), so that the increase in time kept going 
at the same slower pace as for the smaller data sizes.

Curiously, mysql which I included as a reference, manages to 
keep nearly the same insert time even for the largest data sets.
Using the large cache size, sqlite is faster than mysql until about 
70 rows. At 200 rows mysql is about 3 times faster than sqlite
with large cache (vs more than 20 times faster than sqlite with 
default cache size).

--- 
tablesizesqlitesqlite-largecache mysql
01.121.042.86
21.091.172.89
41.141.153.27
61.231.23.18
81.471.253.01
101.31.322.94
121.41.342.94
142.061.383
162.411.462.99
183.091.543.03
203.311.623.04
223.841.723.06
244.221.73.03
264.231.763.06
284.831.942.98
306.272.083.07
329.022.063.11
349.481.942.88
36102.133.04
3810.62.213.13
4010.722.353.05
4211.352.243.11
4411.552.493.05
4611.752.53.07
4816.892.393.1
5017.812.593.09
5219.222.43.03
5419.352.773.05
5619.442.593.1
5819.872.673.08
6021.472.883.09
6220.822.743.08
6421.5533.11
6625.173.093.12
6829.892.843.16
7028.483.253.09
7228.623.313.2
7428.693.233.07
7630.343.593.1
7830.273.643.13
8030.573.373.14
8231.413.573.06
8431.893.893.09
8633.773.573.05
8836.013.873.1
9037.584.043.07
9240.593.683.04
9440.93.723.09
9641.454.243.1
9842.054.283.13
10041.114.093.03
10242.294.633.11
10442.094.813.17
10642.84.343.14
10843.824.433.02
11043.014.53.12
11244.084.843.1
11445.45.083.16
11647.514.663.13
11847.465.023.17
12048.215.423.17
12249.095.333.21
12449.655.123.1
12650.555.33.11
12850.565.83.17
13052.355.73.17
13253.675.433.14
13452.385.793.26
13651.446.033.18
13852.696.16  

Re: [sqlite] serious performance problems with indexes

2006-10-28 Thread Peter De Rijk
--On Friday 27 October 2006 15:31, [EMAIL PROTECTED] wrote:
> When a table is indexed, INSERT performance is logorithmic in the
> number of rows in the table and linear in the number of indices.
> This is because entries have to be inserted into the index in
> sorted order (otherwise it wouldn't be an index).  And each
> insert thus requires a binary search.
>
> If your index becomes very large so that it no longer fits
> in your disk cache, then the binary search can involve a lot
> of disk I/O which can make things really slow.  The usual
> work-around here is to keep a much smaller staging table into
> which you do your inserts and then periodically merge the
> staging table into the main table.  This makes your queries
> more complex (and slower) since you are now having to
> search multiple tables.  But it does speed up inserts.
> --
> D. Richard Hipp  <[EMAIL PROTECTED]>
>

I have checked more fine grained (results and code at the end of the  
mail), and it seems indeed that the extreme growth of time needed 
for inserts only starts at around 16 rows (on my system). Before 
that, it also rises, but much much more slowly. As this indeed 
looked like a cache effect, I tried setting the default_cache_size 
to something big (2). This was enough to remove the bend (at 
least till 200 rows), so that the increase in time kept going 
at the same slower pace as for the smaller data sizes.

Curiously, mysql which I included as a reference, manages to 
keep nearly the same insert time even for the largest data sets.
Using the large cache size, sqlite is faster than mysql until about 
70 rows. At 200 rows mysql is about 3 times faster than sqlite
with large cache (vs more than 20 times faster than sqlite with 
default cache size).

--- 
tablesize   sqlite  sqlite-largecachemysql
0   1.121.042.86
2   1.091.172.89
4   1.141.153.27
6   1.231.2 3.18
8   1.471.253.01
10  1.3 1.322.94
12  1.4 1.342.94
14  2.061.383
16  2.411.462.99
18  3.091.543.03
20  3.311.623.04
22  3.841.723.06
24  4.221.7 3.03
26  4.231.763.06
28  4.831.942.98
30  6.272.083.07
32  9.022.063.11
34  9.481.942.88
36  10  2.133.04
38  10.62.213.13
40  10.72   2.353.05
42  11.35   2.243.11
44  11.55   2.493.05
46  11.75   2.5 3.07
48  16.89   2.393.1
50  17.81   2.593.09
52  19.22   2.4 3.03
54  19.35   2.773.05
56  19.44   2.593.1
58  19.87   2.673.08
60  21.47   2.883.09
62  20.82   2.743.08
64  21.55   3   3.11
66  25.17   3.093.12
68  29.89   2.843.16
70  28.48   3.253.09
72  28.62   3.313.2
74  28.69   3.233.07
76  30.34   3.593.1
78  30.27   3.643.13
80  30.57   3.373.14
82  31.41   3.573.06
84  31.89   3.893.09
86  33.77   3.573.05
88  36.01   3.873.1
90  37.58   4.043.07
92  40.59   3.683.04
94  40.93.723.09
96  41.45   4.243.1
98  42.05   4.283.13
100 41.11   4.093.03
102 42.29   4.633.11
104 42.09   4.813.17
106 42.84.343.14
108 43.82   4.433.02
110 43.01   4.5 3.12
112 44.08   4.843.1
114 45.45.083.16
116 47.51   4.663.13
118 47.46   5.023.17
120 48.21   5.423.17
122 49.09   5.333.21
124 49.65   5.123.1
126 50.55   5.3 3.11
128 50.56   5.8 3.17
130 52.35   5.7 3.17
132 53.67   5.433.14
134 52.38   5.793.26
136 51.44   6.033.18
138 52.69   6.163.08
140 53.88   5.673.04
142 52.61   5.863.2
144 53.61   5.923.01
146 52.98   5.983.08
148 54.74   6.273.18
150 55.46.813.18
152 58.45   6.543.12
154 57.14   6.653.12
156 58.92   6.973.17
158 58.17   6.863.22
160 58.29   6.833.15
162 58.19   6.6 3.18
164 58.77   7.3 3.16
166 60.59   7.443.22
168 61.12   7.143.18
170 61.25   7.413.33
172 62.27.183.21
174 61.48   7.953.02
176 62.63   7.913.17
178 62.58   7.523.19
180 62.28   7.563.14
182 64.27.623.33
184 64.18   8.6 3.17
186 64.34   8.713.33
188 63.93   8.443.22
190 62.98   8.133.3
192 63.98.453.29
194 64.85   8.363.21
196 68.26   10.75   4.81
198 67.81   9.463.17

-- code --
proc runtest {sqlfile} {
  set result {}
  set delay 1000
  exec sync; after $delay;
  set t [time "exec sqlite3 def.db <$sqlfile" 1]
  set t [expr {[lindex $t 0]/100.0}]
  lappend result [format %.3f $t]
  exec sync; after $delay;

Re: [sqlite] serious performance problems with indexes

2006-10-27 Thread Joe Wilson
Hi DRH,

A mailing list post by you outlines a similar problem that I am seeing:

  http://www.mail-archive.com/sqlite-users@sqlite.org/msg15913.html

Have you given any thought to this index page locality matter?

Perhaps something like:

  PRAGMA reserve_pages_for_indexes = 5

Whereby a chunk of file space can be exclusively reserved for index use 
to prevent index fragmentation?

I realize that this goes against the zero-admin principle of SQLite, but 
such a feature might lead to dramatic improvements in bulk insert and 
cold-cache queries.

thanks.

- Original Message 
From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
To: sqlite-users@sqlite.org
Sent: Friday, October 27, 2006 9:31:51 AM
Subject: Re: [sqlite] serious performance problems with indexes

Peter De Rijk <[EMAIL PROTECTED]> wrote:
> I have run into a serious performance problem with tables with many rows.
> The problem only occurs on tables with an index
> The time needed for an insert into a table with an index is dependend on the 
> number of rows. I have not formally checked, but from my tests it looks like 
> an exponential dependence. This of course means that while sqlite is very 
> fast on smaller datasets, it is slow on larger data sets and becomes unusable 
> on large datasets (million of rows). The insert behaviour is normal on non 
> indexed tables, but obviously queries are a problem then.
> Is this index behaviour normal/expected for sqlite, or can this be solved?
> 

When a table is indexed, INSERT performance is logorithmic in the 
number of rows in the table and linear in the number of indices.  
This is because entries have to be inserted into the index in 
sorted order (otherwise it wouldn't be an index).  And each 
insert thus requires a binary search.

If your index becomes very large so that it no longer fits
in your disk cache, then the binary search can involve a lot
of disk I/O which can make things really slow.  The usual
work-around here is to keep a much smaller staging table into
which you do your inserts and then periodically merge the
staging table into the main table.  This makes your queries
more complex (and slower) since you are now having to
search multiple tables.  But it does speed up inserts.
--
D. Richard Hipp  <[EMAIL PROTECTED]>



-
To unsubscribe, send email to [EMAIL PROTECTED]
-







-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] serious performance problems with indexes

2006-10-27 Thread drh
Peter De Rijk <[EMAIL PROTECTED]> wrote:
> I have run into a serious performance problem with tables with many rows.
> The problem only occurs on tables with an index
> The time needed for an insert into a table with an index is dependend on the 
> number of rows. I have not formally checked, but from my tests it looks like 
> an exponential dependence. This of course means that while sqlite is very 
> fast on smaller datasets, it is slow on larger data sets and becomes unusable 
> on large datasets (million of rows). The insert behaviour is normal on non 
> indexed tables, but obviously queries are a problem then.
> Is this index behaviour normal/expected for sqlite, or can this be solved?
> 

When a table is indexed, INSERT performance is logorithmic in the 
number of rows in the table and linear in the number of indices.  
This is because entries have to be inserted into the index in 
sorted order (otherwise it wouldn't be an index).  And each 
insert thus requires a binary search.

If your index becomes very large so that it no longer fits
in your disk cache, then the binary search can involve a lot
of disk I/O which can make things really slow.  The usual
work-around here is to keep a much smaller staging table into
which you do your inserts and then periodically merge the
staging table into the main table.  This makes your queries
more complex (and slower) since you are now having to
search multiple tables.  But it does speed up inserts.
--
D. Richard Hipp  <[EMAIL PROTECTED]>



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] serious performance problems with indexes

2006-10-27 Thread Arjen Markus

Peter De Rijk wrote:


I have run into a serious performance problem with tables with many rows.
The problem only occurs on tables with an index
The time needed for an insert into a table with an index is dependend on the 
number of rows. I have not formally checked, but from my tests it looks like 
an exponential dependence. This of course means that while sqlite is very 
fast on smaller datasets, it is slow on larger data sets and becomes unusable 
on large datasets (million of rows). The insert behaviour is normal on non 
indexed tables, but obviously queries are a problem then.

Is this index behaviour normal/expected for sqlite, or can this be solved?
 


Peter,
unless I am mistaken, inserting into a table that has an index requires 
the index
to be rebuilt for each insert. Are you inserting multiple rows in a 
single transaction
or not? If you could put them in a single transaction, I think the 
performance

will remain adequate, even for very large tables.

Regards,

Arjen



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] serious performance problems with indexes

2006-10-27 Thread Peter De Rijk

I have run into a serious performance problem with tables with many rows.
The problem only occurs on tables with an index
The time needed for an insert into a table with an index is dependend on the 
number of rows. I have not formally checked, but from my tests it looks like 
an exponential dependence. This of course means that while sqlite is very 
fast on smaller datasets, it is slow on larger data sets and becomes unusable 
on large datasets (million of rows). The insert behaviour is normal on non 
indexed tables, but obviously queries are a problem then.
Is this index behaviour normal/expected for sqlite, or can this be solved?


-- 
Dr Peter De Rijk E-mail: [EMAIL PROTECTED]
Bioinformatics Unit  Tel. +32 3 265 10 08
Department of Molecular Genetics VIB8Fax. +32 3 265 10 12
University of Antwerphttp://www.molgen.ua.ac.be/
Universiteitsplein 1 
B-2610 Antwerpen, Belgium

The glass is not half full, nor half empty. The glass is just too big.

-
To unsubscribe, send email to [EMAIL PROTECTED]
-