Re: I: [HACKERS] About Our CLUSTER implementation is pessimal patch
I reviewed your patch. It seems to be in good shape, and worked as expected. I suppressed a compiler warning in the patch and cleaned up whitespaces in it. Patch attached. Thanks for the review! I saw that you also changed the writing: LogicalTapeWrite(state-tapeset, tapenum, tuple, HEAPTUPLESIZE); LogicalTapeWrite(state-tapeset, tapenum, tuple-t_data, tuple-t_len-HEAPTUPLESIZE); and the reading: tuple-t_len = tuplen - HEAPTUPLESIZE; if (LogicalTapeRead(state-tapeset, tapenum, tuple-t_self, tuplen-sizeof(tuplen)) != tuplen-sizeof(tuplen)) elog(ERROR, unexpected end of data); into (writing): LogicalTapeWrite(state-tapeset, tapenum, tuple, tuple-t_len); (reading): if (LogicalTapeRead(state-tapeset, tapenum, tuple-t_self, tuplen-sizeof(tuplen)) != tuplen-sizeof(tuplen)) Are we sure it's 100% equivalent? I remember I had issues with the fact that tuple-t_data wasn't at HEAPTUPLESIZE distance from tuple: http://osdir.com/ml/pgsql-hackers/2010-02/msg00744.html I think we need some documentation for the change. The only downside of the feature is that sorted cluster requires twice disk spaces of the target table (temp space for disk sort and the result table). Could I ask you to write documentation about the new behavior? Also, code comments can be improved; especially we need better description than copypaste from FormIndexDatum. I'll try to improve the comments and add doc changes (but my English will have to be double checked...) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] bitmap indexes - performance
Using as a starting point the old bitmap patch in: http://archives.postgresql.org/message-id/20081101000154.go27...@fune I re-applied and re-worked the patch to see what kind of improvements over btrees bitmaps actually provided. Using a 20M rows table of 10/100/1000 random values, I've found that: 1) bulk index creation time is roughly 6 times better 2) index size is 6-15 times smaller (depending on column cardinality) 3) there's almost no difference in query times (but I have to make more tests) 4) I can't say anything about the insertion performance, but I guess bitmap will perform way worse than btree Are these improvements (index creation time, index size) worth enough to keep on working on this? I mean: given that bitmaps don't give any benefits in query times, but only benefits related to disk size and bulk index creation times, and will have horrible performance for insertions/deletions: would this job be worthed? In case it is: I will try to clean up the patch and post it... As a side note: I guess that most of the bitmap indexes performance improvements in the SELECT area are already implemented in postgres in the bitmapand/or and bitmap scan stuff? I couldn't find any docs that say that bitmap indexes are faster for selects, unless of course they are ANDed/ORed together (which is something postgres already does for regular btree indexes) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bitmap indexes - performance
In principle a bitmap index scan should be significantly faster if the index can return the bitmap more or less natively rather than having to construct it. The problem I'm seeing is that even on a 20M rows table, doing a select * from t where c1=10 and c2=1 where c1 and c2 are low cardinality columns, leads to a *very* fast bitmap index scan, even with btree indexes (200ms per index on my PC). The rest of the time is spent in actually retrieving heap rows; and of course no index type is going to help with that. Now: if an index search on such a big table takes so little time, what kind of improvement are we trying to get? The btree indexes on c1 and c2 are about 340MB eaxh: maybe I'm experiencing some caching weirdness? Or it's normal that an index search on such a big table is that fast (again, not counting the heap scan step, which will be required no matter the index type)? I'll try to re-test it... In particular, I recall some discussions about developing a streaming API whereby an index AM could return a bitmap page-by-page or so, rather than having to construct the whole thing in-memory before anything could happen. This would be a huge win for AND/OR cases, and even for a simple indexscan it would eliminate the existing startup cost penalty for a bitmap scan. Streaming like this would also eliminate the problem of having to lossify large bitmaps in order to not overrun memory. One of the improvements I was going to try was to avoid calling tid_set_bit (or whatever is the function, I don't remember now) for every row, and call something like tid_set_bits_in_page where a whole page was passed in: this would remove a lot of the hash_* calls that are made in each and every tid_set_bit call (now that's something btree can't do, but bitmap indexes can do easily). But I stopped before implementing it, because, as I said, I don't think the improvement would still be worth it (even calling tid_set_bit 1/20th of the needed times didn't help that much; we're still talking about going from 200ms to 180ms on a query that takes seconds to execute). But I'm going to give more tested numbers... Talking about bitmap indexes I don't think we should mention memory... I mean: bitmap indexes are supposed to be used on huge tables, and I don't think that 100MB (which holds a lot of rows in a tbm...) to spare as work_mem would be a big problem... As for the startup cost: again, I wouldn't see that as a big improvement, as we're talking mostly OLAP scenarios, where most likely there will be some other blocking operator (group by, sort, sub select etc) that will remove any improvements in startup time... To sum up: IMHO nor improvements in memory usage nor in startup time would be good reasons to switch to bitmap indexes... but bulk index creation time (10 minutes to index what it takes 60 minutes with btree... and maybe more if tables are bigger) and (maybe) index disk space might be... but I'm not 100% convinced... I'm trying to find more docs that explain the improvements of bitmap indexes in other products... but most of what I've found talks about bitmapAND/OR which is something that is very cool, but that postgres already does even with btree indexes... or index creation time/size, which are, for the moment, the only things that I'm pretty confident the patch would actually provide. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] access method: are disk pages mandatory?
in bufpage.h: all blocks written out by an access method must be disk pages but in http://www.postgresql.org/docs/8.4/static/storage-page-layout.html Actually, index access methods need not use this page format. All the existing index methods do use this basic format, but the data kept on index metapages usually doesn't follow the item layout rules. I'm not getting it: am I supposed to use the disk page format when writing an index access method, or it's just a good practice because it makes the handling easier? Given the docs it looks recommended, but the comment on the code sounds more mandatory. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Error with GIT Repository
Why are you cloning over http? Me too I've used http, since I'm behind a proxy and I couldn't find a simple way of having the git:// method working behind a proxy... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Git: Unable to get pack file
I've re-run git repack on it, please try again. At least that file is accessible from here now.. It worked, thank you very much -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Git: Unable to get pack file
Hi, I tried getting the source using: git clone http://git.postgresql.org/git/postgresql.git postgresql-git but after a while (252MB) I always get: [...] Getting pack 61e1395a5bdacda95de5432123a0f8124fff05e6 which contains 476418893d3a2f366f47dbe4ce6d7522cc427545 error: Unable to get pack file http://git.postgresql.org/git/postgresql.git/objects/pack/pack-61e1395a5bdacda95de5432123a0f8124fff05e6.pack couldn't connect to host error: Unable to find 476418893d3a2f366f47dbe4ce6d7522cc427545 under http://git.postgresql.org/git/postgresql.git Cannot obtain needed blob 476418893d3a2f366f47dbe4ce6d7522cc427545 while processing commit ee6ddc7575ab1ce16d8f4eb2cdbcc525d43a6437. fatal: Fetch failed. I'm behind proxy/firewall, so I used the http://; protocol. wget http://git.postgresql.org/git/postgresql.git/objects/pack/pack-61e1395a5bdacda95de5432123a0f8124fff05e6.pack works... What am I doing wrong? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [FWD] About Our CLUSTER implementation is pessimal patch
I really thought this would have caused some interest, since - this item is in the TODO list - the improvement for CLUSTER in some scenarios is 800%, and maybe more (if I didn't do anything wrong, of course...) Could at least the message: http://archives.postgresql.org/pgsql-hackers/2010-02/msg00766.php be added to the TODO page, under Improve CLUSTER performance by sorting to reduce random I/O ? It would be sad if the patch got lost... Leonardo Attached the updated patch (should solve a bug) and a script. The sql scripts generates a 2M rows table (orig); then the table is copied and the copy clustered using seq + sort (since set enable_seqscan=false;). Then the table orig is copied again, and the copy clustered using regular index scan (set enable_indexscan=true; set enable_seqscan=false). Then the same thing is done on a 5M rows table, and on a 10M rows table. On my system (Sol10 on a dual Opteron 2.8) single disc: 2M: seq+sort 11secs; regular index scan: 33secs 5M: seq+sort 39secs; regular index scan: 105secs 10M:seq+sort 83secs; regular index scan: 646secs Maybe someone could suggest a better/different test? Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [FWD] About Our CLUSTER implementation is pessimal patch
As outlined in the Submission timing section, you're asking about something during the wrong time to be doing so--that's why you're not getting any real feedback. Add your patch to the next CommitFest by linking to your message at https://commitfest.postgresql.org/ Ok! But there's something I don't understand: I didn't add the patch to the next CommitFest because I thought it could never be added in 9.0 (because it adds a new feature which has never been discussed). Hence I thought it should have been discussed (not properly reviewed) out of a CommitFest. The Submission timing section talks about beta phase, not alpha phase, so I'm stll confused... In other words: should patches that won't be included in the next release (because it's too late) still added to the next CommitFest? I thought a very rough discussion was the way to go in these cases, but I'm not familiar at all with the process... I'll wait for an answer before adding the patch to the CommitFest (and in case, I'll add more comments and docs to it) Thank you very much! Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [FWD] About Our CLUSTER implementation is pessimal patch
Yes. There's not going to be any more commitfests for this release, so the next commitfest is for 9.1. Perfect! Where could I find such information? I mean: how could I know it? (don't worry about the lack of enthusiasm for the patch, people are just very busy with 9.0 and don't have the energy to think about 9.1 material at this point) I understand! Since it's going to be posted in a CommitFest, I'll try to clean up the patch a little. Thank you Greg and Heikki. Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: I: [HACKERS] About Our CLUSTER implementation is pessimal patch
Hi all, while testing the seq scan + sort CLUSTER implementation, I've found a bug in write/readtup_rawheap. The functions are needed by the sort part. The code in http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php didn't copy the whole tuple, but only the HeapTuple header: the t_data part wasn't copied (at least, that's my understanding), The original code is at the bottom of the email. It didn't work (the table wasn't fully clustered at the end of the CLUSTER command). So I modified write/readtup_rawheap: they are supposed to store/retrieve both the fixed part of HeapTupleData, plus the variable part t_data. But a get a lot of: WARNING: problem in alloc set TupleSort: detected write past chunk end in block 0x96853f0, chunk 0x968723c WARNING: problem in alloc set TupleSort: detected write past chunk end in block 0x96853f0, chunk 0x96872c8 warnings when calling tuplesort_end and some of the data gets garbled after the CLUSTER command. What am I doing wrong? Using the debugger, data coming out of readtup_rawheap seems fine... I *really* need your help here... static void writetup_rawheap(Tuplesortstate *state, int tapenum, SortTuple *stup) { HeapTupletuple = (HeapTuple) stup-tuple; tuple-t_len += sizeof(HeapTupleData); /* write out the header as well */ LogicalTapeWrite(state-tapeset, tapenum, tuple, sizeof(HeapTupleData)); LogicalTapeWrite(state-tapeset, tapenum, tuple-t_data, tuple-t_len-sizeof(HeapTupleData)); if (state-randomAccess)/* need trailing length word? */ LogicalTapeWrite(state-tapeset, tapenum, tuple, sizeof(tuple-t_len)); FREEMEM(state, GetMemoryChunkSpace(tuple)); heap_freetuple(tuple); } static void readtup_rawheap(Tuplesortstate *state, SortTuple *stup, int tapenum, unsigned int tuplen) { HeapTupletuple = (HeapTuple) palloc(sizeof(HeapTupleData)); tuple-t_data = (HeapTupleHeader) palloc(tuplen-sizeof(HeapTupleData)); USEMEM(state, GetMemoryChunkSpace(tuple)); tuple-t_len = tuplen - sizeof(HeapTupleData); if (LogicalTapeRead(state-tapeset, tapenum, tuple-t_self, sizeof(HeapTupleData)-sizeof(tuplen)) != sizeof(HeapTupleData)-sizeof(tuplen)) elog(ERROR, unexpected end of data); if (LogicalTapeRead(state-tapeset, tapenum, tuple-t_data, tuple-t_len) != tuple-t_len) elog(ERROR, unexpected end of data); if (state-randomAccess)/* need trailing length word? */ if (LogicalTapeRead(state-tapeset, tapenum, tuplen, sizeof(tuplen)) != sizeof(tuplen)) elog(ERROR, unexpected end of data); stup-tuple = tuple; /* set up first-column key value */ if (state-indexInfo-ii_Expressions == NULL) { /* no expression index, just get the key datum value */ stup-datum1 = heap_getattr((HeapTuple) stup-tuple, state-indexInfo-ii_KeyAttrNumbers[0], state-tupDesc, stup-isnull1); } else { [...] expression index part, removed for clarity } } Original code: static void writetup_rawheap(Tuplesortstate *state, int tapenum, SortTuple *stup) { HeapTuple tuple = (HeapTuple) stup-tuple; tuple-t_len += HEAPTUPLESIZE; /* write out the header as well */ LogicalTapeWrite(state-tapeset, tapenum, tuple, tuple-t_len); if (state-randomAccess)/* need trailing length word? */ LogicalTapeWrite(state-tapeset, tapenum, tuple, sizeof(tuple-t_len)); FREEMEM(state, GetMemoryChunkSpace(tuple)); heap_freetuple(tuple); } static void readtup_rawheap(Tuplesortstate *state, SortTuple *stup, int tapenum, unsigned int tuplen) { HeapTuple tuple = (HeapTuple) palloc(tuplen); USEMEM(state, GetMemoryChunkSpace(tuple)); tuple-t_len = tuplen - HEAPTUPLESIZE; if (LogicalTapeRead(state-tapeset, tapenum, tuple-t_self, tuplen-sizeof(tuplen)) != tuplen-sizeof(tuplen)) elog(ERROR, unexpected end of data); if (state-randomAccess)/* need trailing length word? */ if (LogicalTapeRead(state-tapeset, tapenum, tuplen, sizeof(tuplen)) != sizeof(tuplen)) elog(ERROR, unexpected end of data); stup-tuple = tuple; /* set up first-column key value */ stup-datum1 = heap_getattr(tuple, state-scanKeys[0].sk_attno, state-tupDesc, stup-isnull1); } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: I: [HACKERS] About Our CLUSTER implementation is pessimal patch
I think you're confusing HeapTuple and HeapTupleHeader. SortTuple-tuple field should point to a HeapTupleHeader, not a HeapTuple. Mmh, I don't get it: that would mean I might be using more info than required, but I still don't understand why it's not working... at the end, CLUSTER is going to need full HeapTuple(s) (if I'm not mistaken). SortTuple-tuple can be any of MinimalTuple, IndexTuple, even a Datum. So I added my own read/write_rawtuple (I'm not that good, they were in the original patch and I copypasted them). I can write and read those tuples (using some Asserts I think everything goes as expected in write/readtup_rawheap). But when calling tuplesort_getrawtuple, after some right tuples, the call to tuplesort_gettuple_common fails because the lines: /* pull next preread tuple from list, insert in heap */ newtup = state-memtuples[tupIndex]; give a wrong initialized newtup (in other words, newtup is random memory). Since write/readtup_rawheap seems fine, I can't understand what's going on... I write the HeapTuples with: (in writetup_rawheap): HeapTupletuple = (HeapTuple) stup-tuple; tuple-t_len += HEAPTUPLESIZE; /* write out the header as well */ LogicalTapeWrite(state-tapeset, tapenum, tuple, HEAPTUPLESIZE); LogicalTapeWrite(state-tapeset, tapenum, tuple-t_data, tuple-t_len-HEAPTUPLESIZE); then I read them with: (in readtup_rawheap): HeapTupletuple = (HeapTuple) palloc(tuplen); USEMEM(state, GetMemoryChunkSpace(tuple)); tuple-t_len = tuplen - HEAPTUPLESIZE; if (LogicalTapeRead(state-tapeset, tapenum, tuple-t_self, HEAPTUPLESIZE-sizeof(tuplen)) != HEAPTUPLESIZE-sizeof(tuplen)) elog(ERROR, unexpected end of data); if (LogicalTapeRead(state-tapeset, tapenum, tuple-t_data, tuple-t_len) != tuple-t_len) elog(ERROR, unexpected end of data); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: I: [HACKERS] About Our CLUSTER implementation is pessimal patch
I think I've found the problem: tuple-t_data wasn't at HEAPTUPLESIZE distance from tuple. I guess the code makes that assumption somewhere, so I had to do: tuple-t_data = (HeapTupleHeader) ((char *) tuple + HEAPTUPLESIZE); Now that test works! Hope I don't find any more problems... Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: I: [HACKERS] About Our CLUSTER implementation is pessimal patch
Perhaps you could supply a .sql file containing a testcase illustrating the performance benefits you tested with your patch Sure. Attached the updated patch (should solve a bug) and a script. The sql scripts generates a 2M rows table (orig); then the table is copied and the copy clustered using seq + sort (since set enable_seqscan=false;). Then the table orig is copied again, and the copy clustered using regular index scan (set enable_indexscan=true; set enable_seqscan=false). Then the same thing is done on a 5M rows table, and on a 10M rows table. On my system (Sol10 on a dual Opteron 2.8) single disc: 2M: seq+sort 11secs; regular index scan: 33secs 5M: seq+sort 39secs; regular index scan: 105secs 10M:seq+sort 83secs; regular index scan: 646secs Maybe someone could suggest a better/different test? Leonardo sorted_cluster20100210.patch Description: Binary data cluster_tests.sql Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
I: [HACKERS] About Our CLUSTER implementation is pessimal patch
Not even a comment? As I said, performance results on my system were very good I know you're all very busy getting 9.0 out, but I think the results in heap scanning + sort instead of index scanning for CLUSTER are very good... I would like to know if I did something wrong/I should improve something in the patch... I haven't tested it with index expressions yet (but the tests in make check work). Thanks Leonardo Hi all, attached a patch to do seq scan + sorting instead of index scan on CLUSTER (when that's supposed to be faster). As I've already said, the patch is based on: http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php Of course, the code isn't supposed to be ready to be merged: I would like to write more comments and add some test cases to cluster.sql (plus change all the things you are going to tell me I have to change...) I would like your opinions on code correctness and the decisions I took, especially: 1) function names (cost_index_scan_vs_seqscansort I guess it's awful...) 2) the fact that I put in Tuplesortstate an EState variable, so that MakeSingleTupleTableSlot wouldn't have to be called for every row in the expression indexes case 3) the expression index case is not optimized: I preferred to call FormIndexDatum once for the first key value in copytup_rawheap and another time to get all the remaining values in comparetup_rawheap. I liked the idea of re-using FormIndexDatum in that case, instead of copyingpasting only the relevant code: but FormIndexDatum returns all the values, even when I might need only the first one 4) I refactored the code to deform and rewrite tuple into the function deform_and_rewrite_tuple, because now that part can be called by the regular index scan or by the new seq-scan + sort (but I could copypaste those lines instead of refactoring them into a new function) Suggestions and comments are not just welcome, but needed! sorted_cluster.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] IndexBuildHeapScan and RIDs order
Hi, I was looking at the code for bitmap index: http://archives.postgresql.org/pgsql-hackers/2008-10/msg01691.php and I couldn't understand why during bmbuild (the ambuild call for bitmap indexes) the code checks for not-ordered ItemPointerData(s). In which cases the ItemPointerData(s) given by IndexBuildHeapScan are not in order (when allow_sync=false)? Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About Our CLUSTER implementation is pessimal patch
I know you're all very busy getting 9.0 out, but I think the results in heap scanning + sort instead of index scanning for CLUSTER are very good... I would like to know if I did something wrong/I should improve something in the patch... I haven't tested it with index expressions yet (but the tests in make check work). Thanks Leonardo Hi all, attached a patch to do seq scan + sorting instead of index scan on CLUSTER (when that's supposed to be faster). As I've already said, the patch is based on: http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php Of course, the code isn't supposed to be ready to be merged: I would like to write more comments and add some test cases to cluster.sql (plus change all the things you are going to tell me I have to change...) I would like your opinions on code correctness and the decisions I took, especially: 1) function names (cost_index_scan_vs_seqscansort I guess it's awful...) 2) the fact that I put in Tuplesortstate an EState variable, so that MakeSingleTupleTableSlot wouldn't have to be called for every row in the expression indexes case 3) the expression index case is not optimized: I preferred to call FormIndexDatum once for the first key value in copytup_rawheap and another time to get all the remaining values in comparetup_rawheap. I liked the idea of re-using FormIndexDatum in that case, instead of copyingpasting only the relevant code: but FormIndexDatum returns all the values, even when I might need only the first one 4) I refactored the code to deform and rewrite tuple into the function deform_and_rewrite_tuple, because now that part can be called by the regular index scan or by the new seq-scan + sort (but I could copypaste those lines instead of refactoring them into a new function) Suggestions and comments are not just welcome, but needed! sorted_cluster.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About Our CLUSTER implementation is pessimal patch
Hi all, attached a patch to do seq scan + sorting instead of index scan on CLUSTER (when that's supposed to be faster). As I've already said, the patch is based on: http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php Of course, the code isn't supposed to be ready to be merged: I would like to write more comments and add some test cases to cluster.sql (plus change all the things you are going to tell me I have to change...) I would like your opinions on code correctness and the decisions I took, especially: 1) function names (cost_index_scan_vs_seqscansort I guess it's awful...) 2) the fact that I put in Tuplesortstate an EState variable, so that MakeSingleTupleTableSlot wouldn't have to be called for every row in the expression indexes case 3) the expression index case is not optimized: I preferred to call FormIndexDatum once for the first key value in copytup_rawheap and another time to get all the remaining values in comparetup_rawheap. I liked the idea of re-using FormIndexDatum in that case, instead of copyingpasting only the relevant code: but FormIndexDatum returns all the values, even when I might need only the first one 4) I refactored the code to deform and rewrite tuple into the function deform_and_rewrite_tuple, because now that part can be called by the regular index scan or by the new seq-scan + sort (but I could copypaste those lines instead of refactoring them into a new function) Suggestions and comments are not just welcome, but needed! Leonardo sorted_cluster.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About Our CLUSTER implementation is pessimal patch
Consider multi-column indexes, ie: CREATE INDEX i_foo ON foo (length(a), length(b)); Ok, I've never thought of expression indexes that way (in the (expr1,expr2,exprN) form): that is a good example. Maybe you're confusing expression indexes with partial indexes? No no, that was exactly what I needed to know. Thank you very much Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About Our CLUSTER implementation is pessimal patch
If we ever get another index type that supports ordered scans, it'll be time enough to worry about cases like this. Ok BTW, I think you could use tuplesort_begin_index_btree() rather than touching _bt_mkscankey_nodata directly. well I created my own tuplesort_begin_rawheap method (copied from: http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php ). From there I call _bt_mkscankey_nodata (as tuplesort_begin_index_btree() does), plus I set up everything else I'm going to need in tuplesort. Another question: why is IndexInfo.ii_Expressions a list? How can an index have more than one expression? Sorry if it's a stupid question, but I'm not familiar with index expressions. I think I'm almost there (some very stupid tests pass). I'll try to submit a patch soon to understand if I'm going in the right direction. Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About Our CLUSTER implementation is pessimal patch
Rule it out. Note you should be looking at pg_am.amcanorder, not hardwiring knowledge of particular index types. Sorry, I replied ok too fast... I can look at pg_am.amcanorder, but I would still need the ScanKey to be used by tuplesort; and I can't find any other way of doing it than calling _bt_mkscankey_nodata, which is btree-specific. I guess either: - add another function to the list of Index Access Method Functions, something that returns the ScanKey in case pg_am.amcanorder is true or - hardwiring the fact that the only way to seq scan + sort in CLUSTER is using a btree... hence the call to _bt_mkscankey_nodata But maybe there's another way of doing it, I don't know the code enough Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About Our CLUSTER implementation is pessimal patch
So my proposal would be: do the test seq_scan vs sort/index_scan only for regular btree index, and integrate that test in the planner. Keep in mind that this patch was after the deadline for 9.0, so there is probably not a huge rush to get this done. That's true; I'll try to get the whole thing done then... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About Our CLUSTER implementation is pessimal patch
So, if I'm not mistaken: hash indexes - can't be used in CLUSTER gin indexes - can't be used in CLUSTER that leaves: btree - ok expression btree - I have to find a way to compute the expression for each tuple: hints? gist - how can I get something comparable by tuplesort? Or should I rule it out from the seq scan + sort path? Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About Our CLUSTER implementation is pessimal patch
Note you should be looking at pg_am.amcanorder, not hardwiring knowledge of particular index types. Ok. Would it make sense to use FormIndexDatum to get the index value to be used by tuplesort? I'm having trouble avoiding the call to _bt_mkscankey_nodata to get the scanKeys... that relates to btree only, I can't find the general function... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About Our CLUSTER implementation is pessimal patch
Anyone? I'd like some feedback before moving on to do the seq scan + sort in those CLUSTER cases where use_index_scan returns false... - Messaggio originale - Da: Leonardo F m_li...@yahoo.it A: pgsql-hackers@postgresql.org Inviato: Mer 20 gennaio 2010, 18:48:00 Oggetto: Re: [HACKERS] About Our CLUSTER implementation is pessimal patch I read the thread Our CLUSTER implementation is pessimal http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php . I was going to try to add the proper cost_index/cost_sort calls to decide which path should be executed, as in: http://archives.postgresql.org/pgsql-hackers/2008-09/msg00517.php I think I got something up and running to check if a table scan + sort is supposed to be faster than an index scan for a certain CLUSTER operation. The way I did it is (I guess...) wrong: I created the elements needed by get_relation_info, create_seqscan_path, create_index_path, cost_sort. It has been, obviously, a trial and error approach: I added the member values as soon as one function call crashed... and I bet I didn't get all the corner cases. Is there any better way of doing it? Leonardo (this is called in copy_heap_data to decide which path to choose:) static bool use_index_scan(Oid tableOid, Oid indexOid) { RelOptInfo *rel; PlannerInfo *root; Query *query; PlannerGlobal *glob; Path *seqAndSortPath; IndexPath *indexPath; RangeTblEntry *rte; rel = makeNode(RelOptInfo); rel-reloptkind = RELOPT_BASEREL; rel-relid = 1; rel-rtekind = RTE_RELATION; /* needed by get_relation_info */ glob = makeNode(PlannerGlobal); /* needed by get_relation_info: */ query = makeNode(Query); query-resultRelation = 0; root = makeNode(PlannerInfo); root-parse = query; root-glob = glob; get_relation_info(root, tableOid, false, rel); seqAndSortPath = create_seqscan_path(NULL, rel); rel-rows = rel-tuples; rte = makeNode(RangeTblEntry); rte-rtekind = RTE_RELATION; rte-relid = tableOid; root-simple_rel_array_size = 2; root-simple_rte_array = (RangeTblEntry **) palloc0(root-simple_rel_array_size * sizeof(RangeTblEntry *)); root-simple_rte_array[1] = rte; root-total_table_pages = rel-pages; indexPath = create_index_path(root, (IndexOptInfo*)(list_head(rel-indexlist)-data.ptr_value), NULL, NULL, ForwardScanDirection, NULL); cost_sort(seqAndSortPath, root, NULL, seqAndSortPath-total_cost, rel-tuples, rel-width, -1); return indexPath-path.total_cost seqAndSortPath-total_cost; } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About Our CLUSTER implementation is pessimal patch
* Do we need to disable sort-path for tables clustered on a gist index? Yes; as I said in a previous mail, only plain btree indexes (that is, not custom expression indexes) would have that option (at least in a first version...) * I'd prefer to separate cost calculation routines from create_index_path() and cost_sort(), rather than using a dummy planner. How? Copypaste (removing unnecessary code) of the existing create_index_path() and cost_sort()? Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About Our CLUSTER implementation is pessimal patch
one idea could be to actually prepare a query using SPI for select * from table order by cols and then peek inside to see which plan was generated. I like that!!! Here's a first attempt, it looks like it's working... (I still have to skip non-btree indexes and expression indexes, plus add a ASC/DESC to the select) static bool use_index_scan(Relation OldHeap, Oid indexOid) { HeapTupleindexTuple; Form_pg_index indexForm; struct _SPI_plan *spiPlan; char st[(NAMEDATALEN+1)*INDEX_MAX_KEYS+NAMEDATALEN+100]; int i; TupleDescoldTupDesc; bool retval = true; PlannedStmt *plan; CachedPlanSource *cps; oldTupDesc = RelationGetDescr(OldHeap); sprintf(st, select * from %s order by , OldHeap-rd_rel-relname.data); indexTuple = SearchSysCache(INDEXRELID, ObjectIdGetDatum(indexOid),0, 0, 0); if (!HeapTupleIsValid(indexTuple)) elog(ERROR, cache lookup failed for index %u, indexOid); indexForm = (Form_pg_index) GETSTRUCT(indexTuple); for (i = 0; i indexForm-indnatts; i++) { strcat(st, oldTupDesc-attrs[indexForm-indkey.values[i]-1]-attname.data); if (i+1 indexForm-indnatts) { strcat(st, ,); } } ReleaseSysCache(indexTuple); if (SPI_connect() != SPI_OK_CONNECT) return false; spiPlan = SPI_prepare(st, 0, NULL); if (spiPlan == NULL) { SPI_finish(); return false; } cps = (CachedPlanSource*)(list_head(spiPlan-plancache_list)-data.ptr_value); plan = (PlannedStmt*)(list_head(cps-plan-stmt_list)-data.ptr_value); if (IsA(plan-planTree, Sort)) { retval = false; } SPI_freeplan(spiPlan); SPI_finish(); return retval; } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About Our CLUSTER implementation is pessimal patch
By the time you make this actually work in all cases, it's probably going to be more of a mess than the other way; I meant to add only ASC/DESC; I would leave all other cases (non-btrees, custom expression btrees) to use the old index-scan method. not to mention that it doesn't work *at all* without violating SPI internals. You lost me there... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About Our CLUSTER implementation is pessimal patch
I meant to add only ASC/DESC; I would leave all other cases (non-btrees, custom expression btrees) to use the old index-scan method. That hardly seems acceptable. Well I brought up that in an earlier post: http://old.nabble.com/Re%3A-About-%22Our-CLUSTER-implementation-is-pessimal%22-patch-p27179107.html I hoped that since people mostly (95%?) use plain btree indexes, a patch that helped CLUSTER with using such indexes would be fine (at least at first...). I guess that a patch that deals with all other types of indexes would be way more complicated (not at the planning stage, but in the scan+sort phase)? I hardly think that keeping yourself at arm's length from the planner by getting cozy with SPI internals is a net improvement in modularity. So you think that code snippet that I sent earlier (the function that uses create_index_path etc) could be put in planner.c (almost) as it is? It looked clumsy to me (I liked the SPI code better) Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About Our CLUSTER implementation is pessimal patch
Well, the expression cases would be more likely to cost more if implemented as a sort, but that doesn't mean that a sort couldn't be a win. Besides, even if you blow off the expression case, what about nulls first/last, nondefault opclasses, etc? Ok, let's split the problem in 2 parts: a) choosing the right path b) using seq scan + sort in case the planner (somehow) said it's faster You're right: for a) nondefault opclasses and other things would make the SPI version wrong. So let's stick for the moment with the cost_sort create_index_path etc calls for a). I think that the calls to create_index_path would also deal with every other possible index type (something that's impossible with the SPI version). For b): I'm using the code in http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php That doesn't deal with expression indexes, nor with anything that is not a btree index. But it's much faster on what people use the most (and not slower on anything else). So my proposal would be: do the test seq_scan vs sort/index_scan only for regular btree index, and integrate that test in the planner. That doesn't mean that sometime in the future we're not going to have full support for all index types in seq scan + sort CLUSTER... but I would like to have something that works faster on what people use, and not slower in the other cases without waiting ages to have the whole thing... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values
New version of the patch, let me know if I can fix/change something else. Leonardo getsetbit.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values
All issues addressed, with one tiny nit-pick -- the get_bit and set_bit methods are not part of the SQL standard. Damn! I completely forgot to mention that I had no idea if what I wrote in the docs made any sense... Well thank you for your thorough review. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About Our CLUSTER implementation is pessimal patch
I read the thread Our CLUSTER implementation is pessimal http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php . I would like to try/integrate that patch as we use CLUSTER a lot on our system. I was going to try to add the proper cost_index/cost_sort calls to decide which path should be executed, as in: http://archives.postgresql.org/pgsql-hackers/2008-09/msg00517.php I think I got something up and running to check if a table scan + sort is supposed to be faster than an index scan for a certain CLUSTER operation. The way I did it is (I guess...) wrong: I created the elements needed by get_relation_info, create_seqscan_path, create_index_path, cost_sort. It has been, obviously, a trial and error approach: I added the member values as soon as one function call crashed... and I bet I didn't get all the corner cases. Is there any better way of doing it? Leonardo (this is called in copy_heap_data to decide which path to choose:) static bool use_index_scan(Oid tableOid, Oid indexOid) { RelOptInfo *rel; PlannerInfo *root; Query *query; PlannerGlobal *glob; Path *seqAndSortPath; IndexPath *indexPath; RangeTblEntry *rte; rel = makeNode(RelOptInfo); rel-reloptkind = RELOPT_BASEREL; rel-relid = 1; rel-rtekind = RTE_RELATION; /* needed by get_relation_info */ glob = makeNode(PlannerGlobal); /* needed by get_relation_info: */ query = makeNode(Query); query-resultRelation = 0; root = makeNode(PlannerInfo); root-parse = query; root-glob = glob; get_relation_info(root, tableOid, false, rel); seqAndSortPath = create_seqscan_path(NULL, rel); rel-rows = rel-tuples; rte = makeNode(RangeTblEntry); rte-rtekind = RTE_RELATION; rte-relid = tableOid; root-simple_rel_array_size = 2; root-simple_rte_array = (RangeTblEntry **) palloc0(root-simple_rel_array_size * sizeof(RangeTblEntry *)); root-simple_rte_array[1] = rte; root-total_table_pages = rel-pages; indexPath = create_index_path(root, (IndexOptInfo*)(list_head(rel-indexlist)-data.ptr_value), NULL, NULL, ForwardScanDirection, NULL); cost_sort(seqAndSortPath, root, NULL, seqAndSortPath-total_cost, rel-tuples, rel-width, -1); return indexPath-path.total_cost seqAndSortPath-total_cost; } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values
In the documentation, the get_bit and set_bit methods are added to a list where we state The following SQL-standard functions work on bit strings as well as character strings; however they are not SQL-standard functions and are implemented on binary strings, not character strings. Ok, I didn't change that, but I guess I can fix it. The tests don't include any examples where the bit string lines up with a byte boundary or is operating on the first or last bit of a byte, and the overlay function is not tested with values which change the length of the bit string. (All of these cases seem to work in the current version, but seem like the sort of thing which could get broken in revision.) Ok, I'll add some corner-cases tests. There is a peculiar omission from the patch -- it adds supportfor the overlay function to bit strings but not binary strings. Sincethe standard drops the bit string as a standard type in the 2003standard, in favor of boolean type and binary string types, it seemsodd to omit binary string support (which is defined in the standard)but include bit string support (which isn't). What the patch adds seemsuseful, and I don't think the omission should be considered fatal, butit would be nice to see us also cover binary strings if we can. Mmh, don't know how complicated that would be, but I can give it a try: I guess it could be a simple SQL replacement (as for char strings and bit strings?) Well, there was one thing which is beyond my ken in this regard: I'm not sure that the cases from bits8 to (unsigned char *) are needed or appropriate, although on my machine they don't seem to hurt. Perhaps someone with a better grasp of such issues can comment. That's some kind of copypaste from varlena.c byteaGetBit: I don't know if they can be avoided. I'm not sure the leading whitespace in pg_proc.h is as it should be. Do you mean the tab in: _null_(tab)select ... ? Yes, I think I should remove it. I'd prefer to see the comments for get_bit and set_bit mention that the location is specified left-to-right in a zero-based fashion consistent with the other get_bit and set_bit functions, but inconsistent with the standard substring, position, overlay functions. Personally, I find it unfortunate that our extensions introduced this inconsistency, but let's keep it consistent within the similarly named functions. Ok; I found them bizarre too, but I kept it consistent. It seems odd to me that you specify the bit to set as a 32 bit integer, and that that is what get_bit returns, but again, this is consistent with what we do for the bytea functions, so it's probably the right thing to match that. Same as above: I tried to be consistent. Using a ERRCODE_ARRAY_SUBSCRIPT_ERROR for a bit position which is out-of-bounds seems somewhat bizarre to me -- I'd probably have chosen ERRCODE_INVALID_PARAMETER_VALUE in a green field -- but again, it matches the bytea implementation. Again, same as above: consistency; but I can change it if you want This last point is probably more a matter of C coding style than anything, and I'm way to rusty in C to want to be the final arbiter of something like this, but it seems to me that oldByte and newByte local variables are just cluttering things up rather than clarifying: intoldByte, newByte; [...] oldByte = ((unsigned char *) r)[byteNo]; if (newBit == 0) newByte = oldByte (~(1 bitNo)); else newByte = oldByte | (1 bitNo); ((unsigned char *) r)[byteNo] = newByte; vs if (newBit == 0) ((unsigned char *) r)[byteNo] = (~(1 bitNo)); else ((unsigned char *) r)[byteNo] |= (1 bitNo); I agree, you version is cleaner. Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values
This patch no longer applies. Could you rebase it? Done (I think). Added a couple of simple tests for bit overlay. I didn't include the catversion.h changes: obviously the CATALOG_VERSION_NO has to be changed. Leonardo Index: src/backend/utils/adt/varbit.c === RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/varbit.c,v retrieving revision 1.63 diff -c -r1.63 varbit.c *** src/backend/utils/adt/varbit.c 7 Jan 2010 20:17:43 - 1.63 --- src/backend/utils/adt/varbit.c 18 Jan 2010 09:05:50 - *** *** 1606,1608 --- 1606,1704 } PG_RETURN_INT32(0); } + + + /* bitsetbit + * Given an instance of type 'bit' creates a new one with + * the Nth bit set to the given value. + */ + Datum + bitsetbit(PG_FUNCTION_ARGS) + { + VarBit *arg1 = PG_GETARG_VARBIT_P(0); + int32 n = PG_GETARG_INT32(1); + int32 newBit = PG_GETARG_INT32(2); + VarBit *result; + int len, + bitlen; + bits8 *r, + *p; + int byteNo, + bitNo; + int oldByte, + newByte; + + bitlen = VARBITLEN(arg1); + if (n 0 || n = bitlen) + ereport(ERROR, + (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR), +errmsg(index %d out of valid range, 0..%d, + n, bitlen - 1))); + /* +* sanity check! +*/ + if (newBit != 0 newBit != 1) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), +errmsg(new bit must be 0 or 1))); + + len = VARSIZE(arg1); + result = (VarBit *) palloc(len); + SET_VARSIZE(result, len); + VARBITLEN(result) = bitlen; + + p = VARBITS(arg1); + r = VARBITS(result); + + memcpy(r, p, VARBITBYTES(arg1)); + + byteNo = n / BITS_PER_BYTE; + bitNo = BITS_PER_BYTE - 1 - (n % BITS_PER_BYTE); + + /* +* Update the byte. +*/ + oldByte = ((unsigned char *) r)[byteNo]; + + if (newBit == 0) + newByte = oldByte (~(1 bitNo)); + else + newByte = oldByte | (1 bitNo); + + ((unsigned char *) r)[byteNo] = newByte; + + PG_RETURN_VARBIT_P(result); + } + + /* bitgetbit + * returns the value of the Nth bit (0 or 1). + */ + Datum + bitgetbit(PG_FUNCTION_ARGS) + { + VarBit *arg1 = PG_GETARG_VARBIT_P(0); + int32 n = PG_GETARG_INT32(1); + int bitlen; + int byteNo, + bitNo; + int byte; + + bitlen = VARBITLEN(arg1); + + if (n 0 || n = bitlen) + ereport(ERROR, + (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR), +errmsg(index %d out of valid range, 0..%d, + n, bitlen - 1))); + + byteNo = n / BITS_PER_BYTE; + bitNo = BITS_PER_BYTE - 1 - n % BITS_PER_BYTE; + + byte = ((unsigned char *) VARBITS(arg1))[byteNo]; + + if (byte (1 bitNo)) + PG_RETURN_INT32(1); + else + PG_RETURN_INT32(0); + } + Index: src/include/catalog/pg_proc.h === RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_proc.h,v retrieving revision 1.562 diff -c -r1.562 pg_proc.h *** src/include/catalog/pg_proc.h 15 Jan 2010 09:19:07 - 1.562 --- src/include/catalog/pg_proc.h 18 Jan 2010 09:05:58 - *** *** 2405,2410 --- 2405,2418 DATA(insert OID = 1699 ( substring PGNSP PGUID 12 1 0 0 f f f t f i 2 0 1560 1560 23 _null_ _null_ _null_ _null_ bitsubstr_no_len _null_ _null_ _null_ )); DESCR(return portion of bitstring); + DATA(insert OID = 3030 ( overlay PGNSP PGUID 14 1 0 0 f f f t f i 4 0 1560 1560 1560 23 23 _null_ _null_ _null_ _null_ select pg_catalog.substring($1, 1, ($3 - 1)) || $2 || pg_catalog.substring($1, ($3 + $4)) _null_ _null_ _null_ )); + DESCR(substitute portion of bit string); + DATA(insert OID = 3031 ( overlay PGNSP PGUID 14 1 0 0 f f f t f i 3 0 1560 1560 1560 23 _null_ _null_ _null_ _null_select pg_catalog.substring($1, 1, ($3 - 1)) || $2 || pg_catalog.substring($1, ($3 + pg_catalog.length($2))) _null_ _null_ _null_ )); + DESCR(substitute portion of bit string); + DATA(insert OID = 3032 ( get_bitPGNSP PGUID 12 1 0 0 f f f t f i 2 0 23 1560 23 _null_ _null_ _null_ _null_ bitgetbit _null_ _null_ _null_ )); + DESCR(get bit); + DATA(insert
[HACKERS] About Our CLUSTER implementation is pessimal patch
Hi, I read the thread Our CLUSTER implementation is pessimal http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php . I would like to try/integrate that patch as we use CLUSTER a lot on our system. I was going to try to add the proper cost_index/cost_sort calls to decide which path should be executed, as in: http://archives.postgresql.org/pgsql-hackers/2008-09/msg00517.php I don't think it will be easy without help... I'll ask here a lot I'm afraid... About that patch: 1) would it be possible to use the tuplesort_*tupleslot set of functions instead of writing new ones for HeapTuple? That is: is it that difficult/impossible/nonsense to construct TupleTableSlot from HeapTuple and use those? 2) The patch doesn't check HeapTupleSatisfiesVacuum before passing it to tuplesort_putrawtuple: would it be reasonable to check the isdead flag before calling tuplesort_putrawtuple for each tuple? Sorry if I talked nonsense... Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] About Our CLUSTER implementation is pessimal patch
Yeah, I think you could do that, I agree it feels better that way. You'll still need new copytup and comparetup functions, though, to deal with HeapTupleHeaders instead of MinimalTuples, or modify the existing ones to handle both. You meant HeapTuple, not HeapTupleHeaders, right? Mmh, didn't think of those two functions; I might as well start with Gregory Stark's patch (that is: using HeapTuple) And some way to indicate that you want to preserve the visibility information when you create the tuplesort, maybe a new parameter to tuplesort_begin_heap(). I guess that using Gregory Stark's patch there's no need for it, since it uses HeapTuples, right? A patch that: 1) uses always the old CLUSTER method for non-btree indexes and for expression indexes 2) add a whole set of new functions to tuplesort (as in Gregory Stark's patch) would be rejected for sure? Or can be thought as a better than nothing, works in 90% cases patch? Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Allow substring/replace() to get/set bit values
What we can do in the back branches is make the code treat any negative value as meaning two-arg form. To throw an error we'd need to refactor the pg_proc representation ... I was going to fix that myself, but I think it has just been done. How can I keep up with who's doing what? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: I: [HACKERS] TODO: Allow substring/replace() to get/set bit values
Is anybody interested? Otherwise the entry could be removed from the TODO list... Even if not, you can still submit a patch. There are a lot more users of PG than there are people who read -hackers. Ok, I'll try and submit a patch. Thank you very much. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Patch: Allow substring/replace() to get/set bit values
Hi all, attached a patch that adds the following functions for bit string: - overlay - get_bit - set_bit Some info: 1) overlay is implemented as calls to substring; given the different way substring behaves when used with strings vs bit strings: test=# SELECT substring(B'0001' from 1 for -1); substring -- 0001 (1 row) test=# SELECT substring('0001' from 1 for -1); ERROR: negative substring length not allowed I don't think that this overlay implementation is what we want? Example: test=# SELECT overlay(B'' placing B'01' from 1 for 2); overlay - 0111 (1 row) (looks ok) test=# SELECT overlay(B'' placing B'01' from 0 for 2); overlay --- 0 (1 row) This happens because substring(bit, pos, -1) means substring(bit, pos, length(bit string)), and -1 values for bit substring parameters are allowed: is this a bug in bit substring??? 2) I tried implementing bit_get and bit_set as calls to overlay/substring: DATA(insert OID = 3032 ( get_bit PGNSP PGUID 14 1 0 0 f f f t f i 2 0 23 1560 23 _null_ _null_ _null_ _null_ select (pg_catalog.substring($1, $2, 1))::int4 _null_ _null_ _null_ )); DESCR(get bit); DATA(insert OID = 3033 ( set_bit PGNSP PGUID 14 1 0 0 f f f t f i 3 0 1560 1560 23 23 _null_ _null_ _null_ _null_ select pg_catalog.overlay($1, $3::bit, $2, 1) _null_ _null_ _null_ )); DESCR(set bit); but this doesn't give any check on the values provided: that the bit looked for is in fact in the right range, and that the bit in set_bit is in fact a bit (I don't like the idea of writing select set_bit(B'01010111', B'1') instead of select set_bit(B'01010111', 1) ). So I coded them in proper C internal functions. Leonardo patch_bit.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Allow substring/replace() to get/set bit values
Thanks! Please add your patch here: https://commitfest.postgresql.org/action/commitfest_view/open Ok; but what about what I said about the difference between bit/string substring? That affects overlay behaviour for bit... I've even got ERROR: invalid memory alloc request size 4244635647 with: SELECT substring(B'0001' from 5 for -2); (this is 8.4.2, windows version, not modified...) test=# SELECT substring(B'0001' from 1 for -1); substring -- 0001 (1 row) test=# SELECT substring('0001' from 1 for -1); ERROR: negative substring length not allowed -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: I: [HACKERS] TODO: Allow substring/replace() to get/set bit values
To sum up: 1) a new function, get_bit, that calls substring 2) a new function, overlay, that replaces bits (starting at a certain position) 3) a new function, set_bit, that calls overlay That seems reasonable to me. Not sure what others think. Is anybody interested? Otherwise the entry could be removed from the TODO list... Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: I: [HACKERS] TODO: Allow substring/replace() to get/set bit values
You might want to search the archives (or the wiki history, or the CVS history if it's been there since before we moved the TODO list to the wiki) for discussion of why that item was added to the TODO in the first place. I read the thread: http://archives.postgresql.org/pgsql-hackers/2004-02/msg00478.php 1) it is true that getbit sounds a lot like what substring() does, but the same could be said for binary string substring/get_byte; so IMHO get/set_bit should be present for bit string 2) it is not very clear to me how setbit could actually be handled by replace() (maybe overlay style?) 3) since I'm looking at byte string get/set_bit to understand how that works, I'm having a hard time understanding why the bit indexes in get/set_bit are low-first based: select get_bit(E'\\376\\376'::bytea, s) as b,s from generate_series(0,15,1) as s b s 0 0 1 1 1 2 1 3 1 4 1 5 1 6 1 7 0 8 1 9 1 10 1 11 1 12 1 13 1 14 1 15 I understand this is the internal representation, but still: if someone asked me what the 8th bit in 11101110 is, I would have said 1, not 0 (assuming the first bit has index '0'). Actually, David Helgason's patch (http://archives.postgresql.org/pgsql-hackers/2004-01/msg00498.php) goes in this direction: note the bitNo = 7 - (n % 8); part. Using that algorithm would mean get/set_bit in bit string would behave differently from what they do in binary string (IMHO it's the binary string implementation that is wrong). Leonardo -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: I: [HACKERS] TODO: Allow substring/replace() to get/set bit values
As you say, there's really no point in changing the internal representation, and if you don't find replace() useful either, then why are you even working on this at all? I would like a get_bit / set_bit for bit strings, as I find them useful. get_bit could be a simple call to substring, but there's no way of doing a set_bit on a bit string as far as I know. I don't like the replace syntax for bit strings since it won't give you the same functionality of set_bit, plus I don't really see how someone would want to look for a bit string and replace it with another bit string. But I see that someone might want to overlay a bit string with another (this is different from replace since you have to tell the position where the replacing would start, instead of looking for a bit string). To sum up: 1) a new function, get_bit, that calls substring 2) a new function, overlay, that replaces bits (starting at a certain position) 3) a new function, set_bit, that calls overlay Since the latest discussion of this is more than five years old, it's unclear that anyone even cares any more. It seems to me that making replace overlay a substring of bits could be a reasonable thing to do, but if nobody actually wants it, then the simplest thing to do is remove this from the TODO and call it good. I understand: it would be both a useful feature to me and a way to start coding postgres. But, of course, if there's no interest, I'll pass... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
I: [HACKERS] TODO: Allow substring/replace() to get/set bit values
Re-reading the docs it looks like the only thing missing is get/set_bit for bit string. Substring is already implemented for bit string, and I don't really know if replace is useful at all. (sorry if the other mail came with a different sender name) Leonardo I would like to work on Allow substring/replace() to get/set bit values, since it looks like a simple task. The item is not marked as easy on the TODO though. Before proceding to a discussion on how this functions should be implemented (I got from the messages on the mailing list that bit substring/replace functions should do it) I would like to know if it's a complicated task. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers