Re: [HACKERS] Red-black tree for GIN

2010-01-24 Thread Teodor Sigaev

1. I think rb_free_recursive is missing a pfree().


Fixed, thank you


2. We already have a definition of NIL in the PG source base.  I think
this one needs to be named something else.  RBNIL, maybe.

fixed



3. This code could really use some more comments, and maybe some of
the variable names could be better chosen, too.  It's far from obvious
what is going on here.  I studied rbtrees in college and I still
remember more or less how they work, but, boy, this is hard to follow.
  The names of the iterator states are truly horrible, at least IMO.

I changed names and slightly improve comments.



4. It would be nice if you could do a better job conforming to project
indentation style.

Did pgindent.
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


rbtree-0.6.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Red-black tree for GIN

2010-01-24 Thread Robert Haas
2010/1/24 Teodor Sigaev teo...@sigaev.ru:
 3. This code could really use some more comments, and maybe some of
 the variable names could be better chosen, too.  It's far from obvious
 what is going on here.  I studied rbtrees in college and I still
 remember more or less how they work, but, boy, this is hard to follow.
  The names of the iterator states are truly horrible, at least IMO.

 I changed names and slightly improve comments.

Would it be OK if I went through here and hacked on the comments and
sent this back to you?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Red-black tree for GIN

2010-01-24 Thread Teodor Sigaev



Would it be OK if I went through here and hacked on the comments and
sent this back to you?

Feel free to edit the patch.
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Red-black tree for GIN

2010-01-20 Thread Robert Haas
On Mon, Jan 11, 2010 at 1:18 PM, Robert Haas robertmh...@gmail.com wrote:
 2010/1/11 Teodor Sigaev teo...@sigaev.ru:
 knngist uses that implementation of rb-tree. One more candidate is a ts_stat
 which now uses unbalanced binary tree.

 Ah, OK.  That's great if we can reuse that code in 2 or 3 places.

Some preliminary thoughts on this patch:

1. I think rb_free_recursive is missing a pfree().

2. We already have a definition of NIL in the PG source base.  I think
this one needs to be named something else.  RBNIL, maybe.

3. This code could really use some more comments, and maybe some of
the variable names could be better chosen, too.  It's far from obvious
what is going on here.  I studied rbtrees in college and I still
remember more or less how they work, but, boy, this is hard to follow.
 The names of the iterator states are truly horrible, at least IMO.

4. It would be nice if you could do a better job conforming to project
indentation style.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Red-black tree for GIN

2010-01-11 Thread Teodor Sigaev

...and I would say the same logic applies to this patch, maybe even
moreso.  Tom has already applied a partial workaround for this
problem, and I'm feeling like it won't be trivial to figure out what


That partial workaround is not work for some corner cases:
http://www.sai.msu.su/~megera/wiki/2009-07-27 (8.4 already has that hask!)


The coding pattern that this patch uses also merits some discussion.
Basically, rbtree.c is a generic implementation of red-black trees -
from a textbook - which ginbulk.c then uses for GIN.  One possible
advantage of this implementation is that it might make it possible for
us to use the rbtree.c logic in other places, if we have other data
structures that need similar treatment.  But I'm not sure if that's
the way we want to go.  The other alternative is to drop the
generalized implementation and incorporate the logic directly into
ginbulk.c.  I really don't know which is better, but I'd like to hear
some other opinions...


knngist uses that implementation of rb-tree. One more candidate is a ts_stat 
which now uses unbalanced binary tree.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Red-black tree for GIN

2010-01-11 Thread Robert Haas
2010/1/11 Teodor Sigaev teo...@sigaev.ru:
 knngist uses that implementation of rb-tree. One more candidate is a ts_stat
 which now uses unbalanced binary tree.

Ah, OK.  That's great if we can reuse that code in 2 or 3 places.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Red-black tree for GIN

2010-01-10 Thread Robert Haas
On Thu, Dec 31, 2009 at 4:19 PM, Robert Haas robertmh...@gmail.com wrote:
 My other question is as related to performance.  Can you provide a
 test case that shows the performance improvement with this patch?

So, we still don't have a test case for this patch.  During the
November CommitFest, Greg Smith griped a bit about the lack of a
reproducible performance benchmark for the XLogInsert patch:

http://archives.postgresql.org/pgsql-hackers/2009-12/msg00816.php

...and I would say the same logic applies to this patch, maybe even
moreso.  Tom has already applied a partial workaround for this
problem, and I'm feeling like it won't be trivial to figure out what
to measure to see the remaining issue and measure how much this new
implementation helps.

The coding pattern that this patch uses also merits some discussion.
Basically, rbtree.c is a generic implementation of red-black trees -
from a textbook - which ginbulk.c then uses for GIN.  One possible
advantage of this implementation is that it might make it possible for
us to use the rbtree.c logic in other places, if we have other data
structures that need similar treatment.  But I'm not sure if that's
the way we want to go.  The other alternative is to drop the
generalized implementation and incorporate the logic directly into
ginbulk.c.  I really don't know which is better, but I'd like to hear
some other opinions...

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Red-black tree for GIN

2010-01-10 Thread Greg Stark
On Mon, Jan 11, 2010 at 2:42 AM, Robert Haas robertmh...@gmail.com wrote:
 The coding pattern that this patch uses also merits some discussion.
 Basically, rbtree.c is a generic implementation of red-black trees -
 from a textbook - which ginbulk.c then uses for GIN.  One possible
 advantage of this implementation is that it might make it possible for
 us to use the rbtree.c logic in other places, if we have other data
 structures that need similar treatment.  But I'm not sure if that's
 the way we want to go.  The other alternative is to drop the
 generalized implementation and incorporate the logic directly into
 ginbulk.c.  I really don't know which is better, but I'd like to hear
 some other opinions...


So, code reuse is not the only advantage of abstraction. It's also
just plain easier to understand and test code written with clear
abstract interfaces. The way you describe it someone with no knowledge
could look at rbtree.c and see if it's done correctly and maybe
improve it. And someone reading ginbulk only has to understand the
operations red-black trees support and no understand how they're
implemented to follow the code.

Is there any advantage of integrating the code with ginbulk.c? Does it
let us get away with finer grained locks or do any tricks doing
gin-specific changes while we're in the middle of rebalancing or
anything like that?

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Red-black tree for GIN

2010-01-10 Thread Oleg Bartunov

Robert,

we have benchmark for rbtree
http://www.sai.msu.su/~megera/wiki/2009-07-27
rbtree, actually, fix corner cases with ordered input, with little
overhead.

As you may see from knngist patch, rbtree used in gist code, so, please,
leave rbtree code as is.

Oleg
On Sun, 10 Jan 2010, Robert Haas wrote:


On Thu, Dec 31, 2009 at 4:19 PM, Robert Haas robertmh...@gmail.com wrote:
 My other question is as related to performance. =A0Can you provide a
 test case that shows the performance improvement with this patch?

So, we still don't have a test case for this patch.  During the
November CommitFest, Greg Smith griped a bit about the lack of a
reproducible performance benchmark for the XLogInsert patch:

http://archives.postgresql.org/pgsql-hackers/2009-12/msg00816.php

...and I would say the same logic applies to this patch, maybe even
moreso.  Tom has already applied a partial workaround for this
problem, and I'm feeling like it won't be trivial to figure out what
to measure to see the remaining issue and measure how much this new
implementation helps.

The coding pattern that this patch uses also merits some discussion.
Basically, rbtree.c is a generic implementation of red-black trees -
from a textbook - which ginbulk.c then uses for GIN.  One possible
advantage of this implementation is that it might make it possible for
us to use the rbtree.c logic in other places, if we have other data
structures that need similar treatment.  But I'm not sure if that's
the way we want to go.  The other alternative is to drop the
generalized implementation and incorporate the logic directly into
ginbulk.c.  I really don't know which is better, but I'd like to hear
some other opinions...

...Robert

--=20
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Red-black tree for GIN

2010-01-04 Thread Alvaro Herrera
Robert Haas escribió:

 I did a quick read-through of this, and one question that immediately
 occurred to me is that rbtree.c says that it is adopted from
 http://algolist.manual.ru/ds/rbtree.php.  But I'm not sure what
 license that code is under, so I'm not sure whether it's OK for us to
 use it.

This code comes from Thomas Niemann's Sorting and Searching Algorithms:
A Cookbook,
http://www.cs.auckland.ac.nz/software/AlgAnim/niemann/s_man.htm

specifically
http://www.cs.auckland.ac.nz/software/AlgAnim/niemann/s_rbt.htm

The code is in the public domain; that web page says
Source code, when part of a software project, may be used
freely without reference to the author.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Red-black tree for GIN

2010-01-04 Thread Robert Haas
On Mon, Jan 4, 2010 at 8:12 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Robert Haas escribió:
 I did a quick read-through of this, and one question that immediately
 occurred to me is that rbtree.c says that it is adopted from
 http://algolist.manual.ru/ds/rbtree.php.  But I'm not sure what
 license that code is under, so I'm not sure whether it's OK for us to
 use it.

 This code comes from Thomas Niemann's Sorting and Searching Algorithms:
 A Cookbook,
 http://www.cs.auckland.ac.nz/software/AlgAnim/niemann/s_man.htm

 specifically
 http://www.cs.auckland.ac.nz/software/AlgAnim/niemann/s_rbt.htm

 The code is in the public domain; that web page says
        Source code, when part of a software project, may be used
        freely without reference to the author.

That is excellent.  Perhaps we should document that information in the
code comments where the URL is currently mentioned.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Red-black tree for GIN

2009-12-31 Thread Robert Haas
2009/11/23 Teodor Sigaev teo...@sigaev.ru:
 Hi there,

 attached is a patch, which contains implementation of a  red-black
 tree,  a self-balanced implementation of binary tree.  The main goal of
 this patch is to improve creation time of GIN index in the corner cases.
 While creation, GIN collects data in memory in binary tree until it
 reach some limit and then it flush tree to disk. Some data could
 produces unbalanced binary tree,  for example, sorted data, so the tree
 degenerates to the list with O(N^2) processing time (see thread
 http://archives.postgresql.org/pgsql-performance/2009-03/msg00340.php)
 ), which cause very slow index creation.  Tom has fixed that by limiting
 depth of tree  (committed to 8.3 and 8.4),  but we found it's not enough
 and propose to use red-black tree, which is very good for skewed data and
 has almost the same performance for unsorted data, see
 http://www.sai.msu.su/~megera/wiki/2009-07-27 and
 http://www.sai.msu.su/~megera/wiki/2009-04-03 for more information.

 Implementation of red-black tree has several currently unused  methods,
 but they will be used in next patches.

I did a quick read-through of this, and one question that immediately
occurred to me is that rbtree.c says that it is adopted from
http://algolist.manual.ru/ds/rbtree.php.  But I'm not sure what
license that code is under, so I'm not sure whether it's OK for us to
use it.

My other question is as related to performance.  Can you provide a
test case that shows the performance improvement with this patch?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Red-black tree for GIN

2009-11-23 Thread Teodor Sigaev

Hi there,

attached is a patch, which contains implementation of a  red-black
tree,  a self-balanced implementation of binary tree.  The main goal of
this patch is to improve creation time of GIN index in the corner cases.
While creation, GIN collects data in memory in binary tree until it
reach some limit and then it flush tree to disk. Some data could
produces unbalanced binary tree,  for example, sorted data, so the tree
degenerates to the list with O(N^2) processing time (see thread
http://archives.postgresql.org/pgsql-performance/2009-03/msg00340.php)
), which cause very slow index creation.  Tom has fixed that by limiting
depth of tree  (committed to 8.3 and 8.4),  but we found it's not enough
and propose to use red-black tree, which is very good for skewed data and has 
almost the same performance for unsorted data, see 
http://www.sai.msu.su/~megera/wiki/2009-07-27 and

http://www.sai.msu.su/~megera/wiki/2009-04-03 for more information.

Implementation of red-black tree has several currently unused  methods,
but they will be used in next patches.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


rbtree-0.5.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers