Jay A. Kreibich wrote: > On Wed, Jun 17, 2009 at 11:52:45AM +1000, John Machin scratched on the wall: > >> On 17/06/2009 6:17 AM, Hoover, Jeffrey wrote: >> >> >>> One other note, if you have a primary key whose value is continually >>> increasing your pk index can become imbalanced and therefore >>> inefficient. >>> >> A B-tree becomes imbalanced? How so? >> >> http://www.sqlite.org/fileformat.html#btree_structures says: "The tree >> is always of uniform height, meaning the number of intermediate levels >> between each leaf node page and the root page is the same." >> >> Do you have any evidence to the contrary? >> > > It won't become imbalanced, but if you're inserting rows with an > explicit INTEGER PRIMARY KEY value in a non-increasing order, the > tree will require sorting and re-balancing. That takes time and > requires additional disk writes (and, as others have pointed out, > disk writes are VERY expensive due to their transactional nature). > > Also, depending on just how mixed up the pattern is, you can get into > situations where a very large index will over-flow the default 1500 > page cache-size. It is well known that if you want to build an index > on a large table, increasing the cache size will help make that > process faster. It might be true here as well. Try setting the page > cache to something nice and huge, like 10x or 100x the default, and > see if that helps. > > -j > > You are correct that a split in a B-Tree is expensive, more so if there are many levels. Also the B-Tree algorithm keeps the index balanced but it does not prevent fragmentation. An addition to the B-Tree logic to perform node merges where possible will limit fragmentation and limit the creation of unnecessary levels.
B-Trees which are expected to grow substantially can achieve a speed increase by partially filling the nodes so that insertions can occur without forcing a split. My guess is if you have a large number of Sqlite insertions you might find that to drop the index and re-raise it when the insertions are complete will be faster. Note that the Sqlite rowids are organized as a B-Tree, but because they are consecutive numbers they make a well organized index. _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users