Re: [HACKERS] Max size of a btree index entry

2006-07-19 Thread Jim C. Nasby
On Tue, Jul 11, 2006 at 10:02:49AM -0400, Tom Lane wrote:
 Currently, we restrict btree index tuples to a size that ensures three of
 them will fit on a page.  The motivation for this is the following two
 considerations:
 
 1. In a non-rightmost page, we need to include a high key, or page
 boundary key, that isn't one of the useful data keys.
 
Why does a leaf page need a boundary key? ISTM if that wasn't the case,
we could actually allow keys to be nearly 8K, constrained by a non-leaf
page needing to include two pointers.

I guess I must be missing something here (and nbtree/README isn't
helping).

 2. In a non-leaf page, there had better be at least two child pages
 (downlink entries), else we have failed to subdivide the page's key
 range at all, and thus there would be a nonterminating recursion.
 
 However: a non-leaf page actually has one more pointer than key,
 eg a page with three children needs only two data keys:
 
  entire key range assigned to page --
 
 -- range 1 --  boundary key -- range 2 --  boundary key -- range 3 --
  |   |   |
  v   v   v
 child page 1   child page 2 child page 3
 
 We implement this by having the first data tuple on a non-leaf page
 contain only a downlink TID and no key data, ie it's just the header.
 
 So it appears to me that we could allow the maximum size of a btree
 entry to be just less than half a page, rather than just less than
 a third of a page --- the worst-case requirement for a non-leaf page
 is not three real tuples, but one tuple header and two real tuples.
 On a leaf page we might manage to fit only one real data item, but
 AFAICS that doesn't pose any correctness problems.
 
 Obviously a tree containing many such pages would be awfully inefficient
 to search, but I think a more common case is that there are a few wide
 entries in an index of mostly short entries, and so pushing the hard
 limit up a little would add some flexibility with little performance
 cost in real-world cases.
 
 Have I missed something?  Is this worth changing?
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
 

-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Max size of a btree index entry

2006-07-19 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 On Tue, Jul 11, 2006 at 10:02:49AM -0400, Tom Lane wrote:
 1. In a non-rightmost page, we need to include a high key, or page
 boundary key, that isn't one of the useful data keys.
 
 Why does a leaf page need a boundary key?

So you can tell whether a proposed insertion ought to go into this page,
or the one to its right.  The tree descent logic doesn't guarantee that
you descend to exactly the correct page --- if concurrent page splits
are going on, you might have to move right one or more times after
reaching the leaf level.  You need the boundary key to make this test
correctly.

And of course, the reason there's no high key on the rightmost page is
exactly that it has no right-hand neighbor, hence no upper limit on its
delegated key space.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Max size of a btree index entry

2006-07-19 Thread Jim C. Nasby
On Wed, Jul 19, 2006 at 06:23:44PM -0400, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
  On Tue, Jul 11, 2006 at 10:02:49AM -0400, Tom Lane wrote:
  1. In a non-rightmost page, we need to include a high key, or page
  boundary key, that isn't one of the useful data keys.
  
  Why does a leaf page need a boundary key?
 
 So you can tell whether a proposed insertion ought to go into this page,
 or the one to its right.  The tree descent logic doesn't guarantee that
 you descend to exactly the correct page --- if concurrent page splits
 are going on, you might have to move right one or more times after
 reaching the leaf level.  You need the boundary key to make this test
 correctly.
 
 And of course, the reason there's no high key on the rightmost page is
 exactly that it has no right-hand neighbor, hence no upper limit on its
 delegated key space.

Could you not just scan right and see what the first key was? Thought
granted, that means there's a chance of a wasted page scan, but I think
that'd be somewhat of a corner case, so it might not be bad.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Max size of a btree index entry

2006-07-19 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 Could you not just scan right and see what the first key was? Thought
 granted, that means there's a chance of a wasted page scan, but I think
 that'd be somewhat of a corner case, so it might not be bad.

No, because (a) that confuses the first key that happens to be on a page
with its keyspace boundary --- what happens when you need to delete that
data key? and (b) because of locking considerations, you don't want to
move right and then have to back up.  You'd have to hold lock on the
first page while reading in the second, which makes for a nontrivial
performance hit.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] Max size of a btree index entry

2006-07-11 Thread Tom Lane
Currently, we restrict btree index tuples to a size that ensures three of
them will fit on a page.  The motivation for this is the following two
considerations:

1. In a non-rightmost page, we need to include a high key, or page
boundary key, that isn't one of the useful data keys.

2. In a non-leaf page, there had better be at least two child pages
(downlink entries), else we have failed to subdivide the page's key
range at all, and thus there would be a nonterminating recursion.

However: a non-leaf page actually has one more pointer than key,
eg a page with three children needs only two data keys:

 entire key range assigned to page --

-- range 1 --  boundary key -- range 2 --  boundary key -- range 3 --
 |   |   |
 v   v   v
child page 1   child page 2 child page 3

We implement this by having the first data tuple on a non-leaf page
contain only a downlink TID and no key data, ie it's just the header.

So it appears to me that we could allow the maximum size of a btree
entry to be just less than half a page, rather than just less than
a third of a page --- the worst-case requirement for a non-leaf page
is not three real tuples, but one tuple header and two real tuples.
On a leaf page we might manage to fit only one real data item, but
AFAICS that doesn't pose any correctness problems.

Obviously a tree containing many such pages would be awfully inefficient
to search, but I think a more common case is that there are a few wide
entries in an index of mostly short entries, and so pushing the hard
limit up a little would add some flexibility with little performance
cost in real-world cases.

Have I missed something?  Is this worth changing?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Max size of a btree index entry

2006-07-11 Thread Josh Berkus

Tom,


Obviously a tree containing many such pages would be awfully inefficient
to search, but I think a more common case is that there are a few wide
entries in an index of mostly short entries, and so pushing the hard
limit up a little would add some flexibility with little performance
cost in real-world cases.

Have I missed something?  Is this worth changing?


Not sure.  I don't know that the difference between 2.7K and 3.9K would 
have ever made a difference to me in any real-world case.


If we're going to tinker with this code, it would be far more valuable 
to automatically truncate b-tree entries at, say, 1K so that they could 
be efficiently indexed.


Of course, a quick archives search of -SQL, -Newbie and -General would 
indicate how popular of an issue this is.


--Josh Berkus

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Max size of a btree index entry

2006-07-11 Thread Hannu Krosing
Ühel kenal päeval, T, 2006-07-11 kell 10:46, kirjutas Josh Berkus:
 Tom,
 
  Obviously a tree containing many such pages would be awfully inefficient
  to search, but I think a more common case is that there are a few wide
  entries in an index of mostly short entries, and so pushing the hard
  limit up a little would add some flexibility with little performance
  cost in real-world cases.
  
  Have I missed something?  Is this worth changing?
 
 Not sure.  I don't know that the difference between 2.7K and 3.9K would 
 have ever made a difference to me in any real-world case.

One (hopefully) soon-to-be real-world case is index-only queries.

We discussed one approach with Luke and he expressed interest in getting
actually done in not too distant future.

 If we're going to tinker with this code, it would be far more valuable 
 to automatically truncate b-tree entries at, say, 1K so that they could 
 be efficiently indexed.

That would not work, if we want to get all data from indexes.

Maybe compressing the keys (like we do for TOAST) would be a better
solution.

 Of course, a quick archives search of -SQL, -Newbie and -General would 
 indicate how popular of an issue this is.

It may become populat again, when we will be able to do index-only
scans. 

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org