[HACKERS] Constant time insertion into highly non-unique indexes

2005-04-14 Thread Simon Riggs
Recent discussions on PERFORM have made me look into some aspects of the B-tree index code, especially with regard to bulk loading high volumes of data. I now have cause for concern about the way that the B-tree index code currently works when inserting large volumes of data into a table with non-unique …

Re: [HACKERS] Constant time insertion into highly non-unique indexes

2005-04-14 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: Recent discussions on PERFORM have made me look into some aspects of B-tree index code, especially with regard to bulk loading high volumes of data. Have you read the archived discussions that led up to the current algorithm? I don't think it's nearly as …

Re: [HACKERS] Constant time insertion into highly non-unique indexes

2005-04-14 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: The move right only occurs when the page is full, so the chance of moving right is not 0.99^250, but 0.99, since the previous 249 inserts would not cause a page split. Sure, but given that we have a full page, the probability that 250 successive insertions …
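The two probabilities being contrasted in this truncated exchange can be checked numerically. This is only a sketch using the figures quoted above (a 0.99 chance of moving right past a full page, and roughly 250 index entries fitting on a page); it is not PostgreSQL's actual code:

```python
# Figures quoted in the thread: a full page is passed over ("move
# right") with probability 0.99, and roughly 250 entries fit on a page.
p_move = 0.99
entries_per_page = 250

# Probability that 250 successive independent decisions all move right:
print(p_move ** entries_per_page)    # about 0.081

# Expected number of consecutive right-moves before stopping, treating
# each decision as independent (a geometric distribution) -- about 99,
# which is why the expected walk is bounded by a constant:
print(p_move / (1 - p_move))
```

The point of the constant bound is that the expected walk length depends only on the move-right probability, not on the total number of duplicate pages in the index.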

Re: [HACKERS] Constant time insertion into highly non-unique indexes

2005-04-14 Thread Tom Lane
Just to check if I was nuts or not, I made up a test case:

create table foo (f1 text);
create index fooi on foo(f1);

then

truncate foo;
copy foo from stdin;
99
98
97
... one million rows ...
02
01
00
\.

versus

truncate foo;
copy foo from stdin; …
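The COPY data in the quoted test is one million rows of two-digit values loaded in descending order, so every key value is highly non-unique. A small generator can reproduce input of that shape; note that the even 10,000-rows-per-value distribution is an inference from the preview, not stated in the message:

```python
# Generate COPY input like the quoted test: one million rows of the
# two-digit values 99 down to 00.  Assuming the rows are evenly
# distributed, each value repeats 10,000 times, so an index on f1
# is highly non-unique.
def descending_rows(total=1_000_000, distinct=100):
    per_value = total // distinct
    for value in range(distinct - 1, -1, -1):    # 99, 98, ..., 00
        for _ in range(per_value):
            yield f"{value:02d}"

rows = list(descending_rows())
print(rows[0], rows[-1], len(rows))    # 99 00 1000000
```

The "versus" case is truncated in the preview, so only the descending-order load shown above is reconstructed here.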

Re: [HACKERS] Constant time insertion into highly non-unique indexes

2005-04-14 Thread Tom Lane
I wrote: So the theory does work, at least for small index entries. Currently repeating with wider ones ... I tried the same test with the row width extended to 100 characters and then 500 characters. The runtime and number of _bt_compare calls is still about the same for the …