Re: [HACKERS] Red-black tree for GIN
1. I think rb_free_recursive is missing a pfree(). Fixed, thank you 2. We already have a definition of NIL in the PG source base. I think this one needs to be named something else. RBNIL, maybe. fixed 3. This code could really use some more comments, and maybe some of the variable names could be better chosen, too. It's far from obvious what is going on here. I studied rbtrees in college and I still remember more or less how they work, but, boy, this is hard to follow. The names of the iterator states are truly horrible, at least IMO. I changed names and slightly improve comments. 4. It would be nice if you could do a better job conforming to project indentation style. Did pgindent. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ rbtree-0.6.gz Description: Unix tar archive -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Red-black tree for GIN
2010/1/24 Teodor Sigaev teo...@sigaev.ru: 3. This code could really use some more comments, and maybe some of the variable names could be better chosen, too. It's far from obvious what is going on here. I studied rbtrees in college and I still remember more or less how they work, but, boy, this is hard to follow. The names of the iterator states are truly horrible, at least IMO. I changed names and slightly improve comments. Would it be OK if I went through here and hacked on the comments and sent this back to you? ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Red-black tree for GIN
Would it be OK if I went through here and hacked on the comments and sent this back to you? Feel free to edit the patch. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Red-black tree for GIN
On Mon, Jan 11, 2010 at 1:18 PM, Robert Haas robertmh...@gmail.com wrote: 2010/1/11 Teodor Sigaev teo...@sigaev.ru: knngist uses that implementation of rb-tree. One more candidate is a ts_stat which now uses unbalanced binary tree. Ah, OK. That's great if we can reuse that code in 2 or 3 places. Some preliminary thoughts on this patch: 1. I think rb_free_recursive is missing a pfree(). 2. We already have a definition of NIL in the PG source base. I think this one needs to be named something else. RBNIL, maybe. 3. This code could really use some more comments, and maybe some of the variable names could be better chosen, too. It's far from obvious what is going on here. I studied rbtrees in college and I still remember more or less how they work, but, boy, this is hard to follow. The names of the iterator states are truly horrible, at least IMO. 4. It would be nice if you could do a better job conforming to project indentation style. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Red-black tree for GIN
...and I would say the same logic applies to this patch, maybe even moreso. Tom has already applied a partial workaround for this problem, and I'm feeling like it won't be trivial to figure out what That partial workaround is not work for some corner cases: http://www.sai.msu.su/~megera/wiki/2009-07-27 (8.4 already has that hask!) The coding pattern that this patch uses also merits some discussion. Basically, rbtree.c is a generic implementation of red-black trees - from a textbook - which ginbulk.c then uses for GIN. One possible advantage of this implementation is that it might make it possible for us to use the rbtree.c logic in other places, if we have other data structures that need similar treatment. But I'm not sure if that's the way we want to go. The other alternative is to drop the generalized implementation and incorporate the logic directly into ginbulk.c. I really don't know which is better, but I'd like to hear some other opinions... knngist uses that implementation of rb-tree. One more candidate is a ts_stat which now uses unbalanced binary tree. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Red-black tree for GIN
2010/1/11 Teodor Sigaev teo...@sigaev.ru: knngist uses that implementation of rb-tree. One more candidate is a ts_stat which now uses unbalanced binary tree. Ah, OK. That's great if we can reuse that code in 2 or 3 places. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Red-black tree for GIN
On Thu, Dec 31, 2009 at 4:19 PM, Robert Haas robertmh...@gmail.com wrote: My other question is as related to performance. Can you provide a test case that shows the performance improvement with this patch? So, we still don't have a test case for this patch. During the November CommitFest, Greg Smith griped a bit about the lack of a reproducible performance benchmark for the XLogInsert patch: http://archives.postgresql.org/pgsql-hackers/2009-12/msg00816.php ...and I would say the same logic applies to this patch, maybe even moreso. Tom has already applied a partial workaround for this problem, and I'm feeling like it won't be trivial to figure out what to measure to see the remaining issue and measure how much this new implementation helps. The coding pattern that this patch uses also merits some discussion. Basically, rbtree.c is a generic implementation of red-black trees - from a textbook - which ginbulk.c then uses for GIN. One possible advantage of this implementation is that it might make it possible for us to use the rbtree.c logic in other places, if we have other data structures that need similar treatment. But I'm not sure if that's the way we want to go. The other alternative is to drop the generalized implementation and incorporate the logic directly into ginbulk.c. I really don't know which is better, but I'd like to hear some other opinions... ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Red-black tree for GIN
On Mon, Jan 11, 2010 at 2:42 AM, Robert Haas robertmh...@gmail.com wrote: The coding pattern that this patch uses also merits some discussion. Basically, rbtree.c is a generic implementation of red-black trees - from a textbook - which ginbulk.c then uses for GIN. One possible advantage of this implementation is that it might make it possible for us to use the rbtree.c logic in other places, if we have other data structures that need similar treatment. But I'm not sure if that's the way we want to go. The other alternative is to drop the generalized implementation and incorporate the logic directly into ginbulk.c. I really don't know which is better, but I'd like to hear some other opinions... So, code reuse is not the only advantage of abstraction. It's also just plain easier to understand and test code written with clear abstract interfaces. The way you describe it someone with no knowledge could look at rbtree.c and see if it's done correctly and maybe improve it. And someone reading ginbulk only has to understand the operations red-black trees support and no understand how they're implemented to follow the code. Is there any advantage of integrating the code with ginbulk.c? Does it let us get away with finer grained locks or do any tricks doing gin-specific changes while we're in the middle of rebalancing or anything like that? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Red-black tree for GIN
Robert, we have benchmark for rbtree http://www.sai.msu.su/~megera/wiki/2009-07-27 rbtree, actually, fix corner cases with ordered input, with little overhead. As you may see from knngist patch, rbtree used in gist code, so, please, leave rbtree code as is. Oleg On Sun, 10 Jan 2010, Robert Haas wrote: On Thu, Dec 31, 2009 at 4:19 PM, Robert Haas robertmh...@gmail.com wrote: My other question is as related to performance. =A0Can you provide a test case that shows the performance improvement with this patch? So, we still don't have a test case for this patch. During the November CommitFest, Greg Smith griped a bit about the lack of a reproducible performance benchmark for the XLogInsert patch: http://archives.postgresql.org/pgsql-hackers/2009-12/msg00816.php ...and I would say the same logic applies to this patch, maybe even moreso. Tom has already applied a partial workaround for this problem, and I'm feeling like it won't be trivial to figure out what to measure to see the remaining issue and measure how much this new implementation helps. The coding pattern that this patch uses also merits some discussion. Basically, rbtree.c is a generic implementation of red-black trees - from a textbook - which ginbulk.c then uses for GIN. One possible advantage of this implementation is that it might make it possible for us to use the rbtree.c logic in other places, if we have other data structures that need similar treatment. But I'm not sure if that's the way we want to go. The other alternative is to drop the generalized implementation and incorporate the logic directly into ginbulk.c. I really don't know which is better, but I'd like to hear some other opinions... ...Robert --=20 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Red-black tree for GIN
Robert Haas escribió: I did a quick read-through of this, and one question that immediately occurred to me is that rbtree.c says that it is adopted from http://algolist.manual.ru/ds/rbtree.php. But I'm not sure what license that code is under, so I'm not sure whether it's OK for us to use it. This code comes from Thomas Niemann's Sorting and Searching Algorithms: A Cookbook, http://www.cs.auckland.ac.nz/software/AlgAnim/niemann/s_man.htm specifically http://www.cs.auckland.ac.nz/software/AlgAnim/niemann/s_rbt.htm The code is in the public domain; that web page says Source code, when part of a software project, may be used freely without reference to the author. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Red-black tree for GIN
On Mon, Jan 4, 2010 at 8:12 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Robert Haas escribió: I did a quick read-through of this, and one question that immediately occurred to me is that rbtree.c says that it is adopted from http://algolist.manual.ru/ds/rbtree.php. But I'm not sure what license that code is under, so I'm not sure whether it's OK for us to use it. This code comes from Thomas Niemann's Sorting and Searching Algorithms: A Cookbook, http://www.cs.auckland.ac.nz/software/AlgAnim/niemann/s_man.htm specifically http://www.cs.auckland.ac.nz/software/AlgAnim/niemann/s_rbt.htm The code is in the public domain; that web page says Source code, when part of a software project, may be used freely without reference to the author. That is excellent. Perhaps we should document that information in the code comments where the URL is currently mentioned. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Red-black tree for GIN
2009/11/23 Teodor Sigaev teo...@sigaev.ru: Hi there, attached is a patch, which contains implementation of a red-black tree, a self-balanced implementation of binary tree. The main goal of this patch is to improve creation time of GIN index in the corner cases. While creation, GIN collects data in memory in binary tree until it reach some limit and then it flush tree to disk. Some data could produces unbalanced binary tree, for example, sorted data, so the tree degenerates to the list with O(N^2) processing time (see thread http://archives.postgresql.org/pgsql-performance/2009-03/msg00340.php) ), which cause very slow index creation. Tom has fixed that by limiting depth of tree (committed to 8.3 and 8.4), but we found it's not enough and propose to use red-black tree, which is very good for skewed data and has almost the same performance for unsorted data, see http://www.sai.msu.su/~megera/wiki/2009-07-27 and http://www.sai.msu.su/~megera/wiki/2009-04-03 for more information. Implementation of red-black tree has several currently unused methods, but they will be used in next patches. I did a quick read-through of this, and one question that immediately occurred to me is that rbtree.c says that it is adopted from http://algolist.manual.ru/ds/rbtree.php. But I'm not sure what license that code is under, so I'm not sure whether it's OK for us to use it. My other question is as related to performance. Can you provide a test case that shows the performance improvement with this patch? ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Red-black tree for GIN
Hi there, attached is a patch, which contains implementation of a red-black tree, a self-balanced implementation of binary tree. The main goal of this patch is to improve creation time of GIN index in the corner cases. While creation, GIN collects data in memory in binary tree until it reach some limit and then it flush tree to disk. Some data could produces unbalanced binary tree, for example, sorted data, so the tree degenerates to the list with O(N^2) processing time (see thread http://archives.postgresql.org/pgsql-performance/2009-03/msg00340.php) ), which cause very slow index creation. Tom has fixed that by limiting depth of tree (committed to 8.3 and 8.4), but we found it's not enough and propose to use red-black tree, which is very good for skewed data and has almost the same performance for unsorted data, see http://www.sai.msu.su/~megera/wiki/2009-07-27 and http://www.sai.msu.su/~megera/wiki/2009-04-03 for more information. Implementation of red-black tree has several currently unused methods, but they will be used in next patches. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ rbtree-0.5.gz Description: Unix tar archive -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers