Re: I: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-07-06 Thread Leonardo F
 I reviewed 
 your patch. It seems to be in good shape, and worked as
 expected. I 
 suppressed a compiler warning in the patch and cleaned up
 whitespaces in it. 
 Patch attached.


Thanks for the review!

I saw that you also changed the writing:

 LogicalTapeWrite(state-tapeset, tapenum,
 tuple, HEAPTUPLESIZE);
 LogicalTapeWrite(state-tapeset, tapenum, tuple-t_data, 
tuple-t_len-HEAPTUPLESIZE);


and the reading:

 tuple-t_len = tuplen - HEAPTUPLESIZE;
 if (LogicalTapeRead(state-tapeset, tapenum, tuple-t_self, 
tuplen-sizeof(tuplen)) != tuplen-sizeof(tuplen))
  elog(ERROR, unexpected end of data);



into (writing):

 LogicalTapeWrite(state-tapeset, tapenum,
tuple, tuple-t_len);


(reading):

if (LogicalTapeRead(state-tapeset, tapenum, tuple-t_self, 
tuplen-sizeof(tuplen)) != tuplen-sizeof(tuplen))



Are we sure it's 100% equivalent?

I remember I had issues with the fact that tuple-t_data wasn't
at HEAPTUPLESIZE distance from tuple:

http://osdir.com/ml/pgsql-hackers/2010-02/msg00744.html


 I think we need some documentation for the change. The 
 only downside
 of the feature is that sorted cluster requires twice disk 
 spaces of
 the target table (temp space for disk sort and the result 
 table).
 Could I ask you to write documentation about the new 
 behavior?
 Also, code comments can be improved; especially we need 
 better
 description than copypaste from 
 FormIndexDatum.


I'll try to improve the comments and add doc changes (but my English
will have to be double checked...)





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] bitmap indexes - performance

2010-07-01 Thread Leonardo F
Using as a starting point the old bitmap patch in:

http://archives.postgresql.org/message-id/20081101000154.go27...@fune


I re-applied and re-worked the patch to see what kind of improvements over
btrees bitmaps actually provided.

Using a 20M rows table of 10/100/1000 random values, I've found that:

1) bulk index creation time is roughly 6 times better
2) index size is 6-15 times smaller (depending on column cardinality)
3) there's almost no difference in query times (but I have to make more
tests)
4) I can't say anything about the insertion performance, but I guess
bitmap will perform way worse than btree

Are these improvements (index creation time, index size) worth enough
to keep on working on this?

I mean: given that bitmaps don't give any benefits in query times, but
only benefits related to disk size and bulk index creation times, and
will have horrible performance for insertions/deletions: would this job be
worthed?

In case it is: I will try to clean up the patch and post it...


As a side note: I guess that most of the bitmap indexes performance 
improvements in the SELECT area are already implemented in postgres
in the bitmapand/or and bitmap scan stuff? I couldn't find any docs that
say that bitmap indexes are faster for selects, unless of course they
are ANDed/ORed together (which is something postgres already does
for regular btree indexes)




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bitmap indexes - performance

2010-07-01 Thread Leonardo F
 In 
 principle a bitmap index scan should be significantly faster if the
 index can 
 return the bitmap more or less natively rather than having
 to construct 
 it.  

The problem I'm seeing is that even on a 20M rows table, doing a

select * from t where c1=10 and c2=1

where c1 and c2 are low cardinality columns, leads to a *very*
fast bitmap index scan, even with btree indexes (200ms per index
on my PC).
The rest of the time is spent in actually retrieving heap rows; and
of course no index type is going to help with that.

Now: if an index search on such a big table takes so little time,
what kind of improvement are we trying to get?
The btree indexes on c1 and c2 are about 340MB eaxh: maybe
I'm experiencing some caching weirdness? Or it's normal that an
index search on such a big table is that fast (again, not counting
the heap scan step, which will be required no matter the index
type)? I'll try to re-test it...

 In particular, I recall some discussions about developing 
 a streaming
 API whereby an index AM could return a bitmap page-by-page or 
 so,
 rather than having to construct the whole thing in-memory 
 before
 anything could happen.  This would be a huge win for AND/OR 
 cases,
 and even for a simple indexscan it would eliminate the existing 
 startup
 cost penalty for a bitmap scan.  Streaming like this would 
 also
 eliminate the problem of having to lossify large bitmaps in order 
 to
 not overrun memory.

One of the improvements I was going to try was to avoid calling

tid_set_bit (or whatever is the function, I don't remember now) for
every row, and call something like tid_set_bits_in_page where
a whole page was passed in: this would remove a lot of the hash_*
calls that are made in each and every tid_set_bit call (now that's
something btree can't do, but bitmap indexes can do easily).
But I stopped before implementing it, because, as I said, I don't
think the improvement would still be worth it (even calling 
tid_set_bit 1/20th of the needed times didn't help that much; we're
still talking about going from 200ms to 180ms on a query that
takes seconds to execute). But I'm going to give more tested
numbers...

Talking about bitmap indexes I don't think we should mention
memory... I mean: bitmap indexes are supposed to be used on
huge tables, and I don't think that 100MB (which holds a lot of
rows in a tbm...) to spare as work_mem would be a big problem... 
As for the startup cost: again, I wouldn't see that as a big
improvement, as we're talking mostly OLAP scenarios, where
most likely there will be some other blocking operator (group by,
sort, sub select etc) that will remove any improvements in
startup time...

To sum up: IMHO nor improvements in memory usage nor 
in startup time would be good reasons to switch to bitmap
indexes... but bulk index creation time (10 minutes to index
what it takes 60 minutes with btree... and maybe more if tables
are bigger) and (maybe) index disk space might be...
but I'm not 100% convinced...

I'm trying to find more docs that explain the improvements of
bitmap indexes in other products... but most of what I've found
talks about bitmapAND/OR which is something that is very
cool, but that postgres already does even with btree indexes...
or index creation time/size, which are, for the moment, the only
things that I'm pretty confident the patch would actually provide.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] access method: are disk pages mandatory?

2010-06-23 Thread Leonardo F
in bufpage.h:

all blocks written out by an access method must be disk pages

but in 
http://www.postgresql.org/docs/8.4/static/storage-page-layout.html

Actually, index access methods need not use this page format. All the 
existing index methods do use this basic format, but the data kept on
index metapages usually doesn't follow the item layout rules.

I'm not getting it: am I supposed to use the disk page format when
writing an index access method, or it's just a good practice because
it makes the handling easier? Given the docs it looks recommended,
but the comment on the code sounds more mandatory.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Error with GIT Repository

2010-06-11 Thread Leonardo F
 Why are you cloning over http? 

Me too I've used http, since I'm behind a proxy and I couldn't
find a simple way of having the git:// method working behind
a proxy...




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Git: Unable to get pack file

2010-06-09 Thread Leonardo F
 I've re-run git repack on 
 it, please try again. At least that file is
 accessible from here 
 now..


It worked, thank you very much





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Git: Unable to get pack file

2010-06-08 Thread Leonardo F
Hi,


I tried getting the source using:

git clone http://git.postgresql.org/git/postgresql.git postgresql-git

but after a while (252MB) I always get:

[...]
Getting pack 61e1395a5bdacda95de5432123a0f8124fff05e6
which contains 476418893d3a2f366f47dbe4ce6d7522cc427545
error: Unable to get pack file 
http://git.postgresql.org/git/postgresql.git/objects/pack/pack-61e1395a5bdacda95de5432123a0f8124fff05e6.pack
couldn't connect to host
error: Unable to find 476418893d3a2f366f47dbe4ce6d7522cc427545 under 
http://git.postgresql.org/git/postgresql.git
Cannot obtain needed blob 476418893d3a2f366f47dbe4ce6d7522cc427545
while processing commit ee6ddc7575ab1ce16d8f4eb2cdbcc525d43a6437.
fatal: Fetch failed.



I'm behind proxy/firewall, so I used the http://; protocol.

wget 
http://git.postgresql.org/git/postgresql.git/objects/pack/pack-61e1395a5bdacda95de5432123a0f8124fff05e6.pack

works... 

What am I doing wrong?




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [FWD] About Our CLUSTER implementation is pessimal patch

2010-02-15 Thread Leonardo F
I really thought this would have caused some interest, since

- this item is in the TODO list
- the improvement for CLUSTER in some scenarios is 800%,
and maybe more (if I didn't do anything wrong, of course...)

Could at least the message:
http://archives.postgresql.org/pgsql-hackers/2010-02/msg00766.php
be added to the TODO page, under 
Improve CLUSTER performance by sorting to reduce
random I/O ?
It would be sad if the patch got lost... 


Leonardo


 Attached the updated patch (should solve a bug) and a script.
 The sql scripts generates a 2M rows table (orig); then the
 table is copied and the copy clustered using seq + sort (since 
 set enable_seqscan=false;).
 Then the table orig is copied again, and the copy clustered
 using regular index scan (set enable_indexscan=true; set 
 enable_seqscan=false).
 Then the same thing is done on a 5M rows table, and on a 10M
 rows table.
 
 On my system (Sol10 on a dual Opteron 2.8) single disc:
 
 
 2M:  seq+sort 11secs; regular index scan: 33secs
 5M:  seq+sort 39secs; regular index scan: 105secs
 10M:seq+sort 83secs; regular index scan: 646secs
 
 
 Maybe someone could suggest a better/different test?
 
 
 Leonardo





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [FWD] About Our CLUSTER implementation is pessimal patch

2010-02-15 Thread Leonardo F
 As outlined in the Submission timing section, you're 
 asking about something during the wrong time to be doing so--that's why 
 you're 
 not getting any real feedback.  Add your patch to the next CommitFest by 
 linking 
 to your message at https://commitfest.postgresql.org/ 


Ok!
But there's something I don't understand: I didn't add the patch to the next
CommitFest because I thought it could never be added in 9.0 (because it adds a
new feature which has never been discussed). Hence I thought it should have
been discussed (not properly reviewed) out of a CommitFest.
The Submission timing section talks about beta phase, not alpha phase, so
I'm stll confused...
In other words: should patches that won't be included in the next release
(because it's too late) still added to the next CommitFest? I thought a very 
rough
discussion was the way to go in these cases, but I'm not familiar at all with 
the
process... I'll wait for an answer before adding the patch to the CommitFest 
(and
in case, I'll add more comments and docs to it)

Thank you very much!


Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [FWD] About Our CLUSTER implementation is pessimal patch

2010-02-15 Thread Leonardo F
 Yes. There's not going to be any more commitfests for this release, so
 the next commitfest is for 9.1.


Perfect! Where could I find such information? I mean: how could I know
it?

 (don't worry about the lack of enthusiasm for the patch, people are just
 very busy with 9.0 and don't have the energy to think about 9.1 material
 at this point)


I understand! Since it's going to be posted in a CommitFest, I'll try to clean
up the patch a little.

Thank you Greg and Heikki.


Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: I: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-02-10 Thread Leonardo F
Hi all,


while testing the seq scan + sort CLUSTER implementation, I've found a
bug in write/readtup_rawheap.

The functions are needed by the sort part.
The code in 

http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php

didn't copy the whole tuple, but only the HeapTuple header: the t_data
part wasn't copied (at least, that's my understanding), The original code
is at the bottom of the email. It didn't work (the table wasn't fully clustered
at the end of the CLUSTER command).

So I modified write/readtup_rawheap: they are supposed to store/retrieve
both the fixed part of HeapTupleData, plus the variable part t_data.
But a get a lot of:
WARNING:  problem in alloc set TupleSort: detected write past chunk end
  in block 0x96853f0, chunk 0x968723c
WARNING:  problem in alloc set TupleSort: detected write past chunk end 
  in block 0x96853f0, chunk 0x96872c8
warnings when calling tuplesort_end  and some of the data gets garbled
after the CLUSTER command.


What am I doing wrong? Using the debugger, data coming out of
readtup_rawheap seems fine...

I *really* need your help here...


static void
writetup_rawheap(Tuplesortstate *state, int tapenum, SortTuple *stup)
{
HeapTupletuple = (HeapTuple) stup-tuple;
tuple-t_len += sizeof(HeapTupleData); /* write out the header as well */

LogicalTapeWrite(state-tapeset, tapenum,
tuple, sizeof(HeapTupleData));
LogicalTapeWrite(state-tapeset, tapenum, tuple-t_data, 
 tuple-t_len-sizeof(HeapTupleData));
if (state-randomAccess)/* need trailing length word? */
  LogicalTapeWrite(state-tapeset, tapenum,
 tuple, sizeof(tuple-t_len));

FREEMEM(state, GetMemoryChunkSpace(tuple));
heap_freetuple(tuple);
}

static void
readtup_rawheap(Tuplesortstate *state, SortTuple *stup,
int tapenum, unsigned int tuplen)
{
HeapTupletuple = (HeapTuple) palloc(sizeof(HeapTupleData));
tuple-t_data = (HeapTupleHeader) palloc(tuplen-sizeof(HeapTupleData));

USEMEM(state, GetMemoryChunkSpace(tuple));

tuple-t_len = tuplen - sizeof(HeapTupleData);
if (LogicalTapeRead(state-tapeset, tapenum, tuple-t_self, 
sizeof(HeapTupleData)-sizeof(tuplen))
   != sizeof(HeapTupleData)-sizeof(tuplen))
  elog(ERROR, unexpected end of data);
if (LogicalTapeRead(state-tapeset, tapenum, tuple-t_data, tuple-t_len) != 
tuple-t_len)
  elog(ERROR, unexpected end of data);
if (state-randomAccess)/* need trailing length word? */
 if (LogicalTapeRead(state-tapeset, tapenum, tuplen,
   sizeof(tuplen)) != sizeof(tuplen))
  elog(ERROR, unexpected end of data);

stup-tuple = tuple;
/* set up first-column key value */
if (state-indexInfo-ii_Expressions == NULL)
{
/* no expression index, just get the key datum value */
stup-datum1 = heap_getattr((HeapTuple) stup-tuple,
state-indexInfo-ii_KeyAttrNumbers[0],
state-tupDesc,
stup-isnull1);
}
else
{
  [...] expression index part, removed for clarity 
}
}




Original code:

static void
writetup_rawheap(Tuplesortstate *state, int tapenum, SortTuple *stup)
{
HeapTuple   tuple = (HeapTuple) stup-tuple;
tuple-t_len += HEAPTUPLESIZE; /* write out the header as well */

LogicalTapeWrite(state-tapeset, tapenum,
tuple, tuple-t_len);

if (state-randomAccess)/* need trailing length word? */
 LogicalTapeWrite(state-tapeset, tapenum,
   tuple, sizeof(tuple-t_len));

FREEMEM(state, GetMemoryChunkSpace(tuple));
heap_freetuple(tuple);
}

static void
readtup_rawheap(Tuplesortstate *state, SortTuple *stup,
int tapenum, unsigned int tuplen)
{
HeapTuple   tuple = (HeapTuple) palloc(tuplen);

USEMEM(state, GetMemoryChunkSpace(tuple));

tuple-t_len = tuplen - HEAPTUPLESIZE;
if (LogicalTapeRead(state-tapeset, tapenum, tuple-t_self, 
  tuplen-sizeof(tuplen)) != tuplen-sizeof(tuplen))
  elog(ERROR, unexpected end of data);
if (state-randomAccess)/* need trailing length word? */
if (LogicalTapeRead(state-tapeset, tapenum, tuplen,
sizeof(tuplen)) != sizeof(tuplen))
elog(ERROR, unexpected end of data);

stup-tuple = tuple;
/* set up first-column key value */
stup-datum1 = heap_getattr(tuple,
state-scanKeys[0].sk_attno,
state-tupDesc,
stup-isnull1);
}





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: I: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-02-10 Thread Leonardo F
 I think you're confusing HeapTuple and HeapTupleHeader. SortTuple-tuple
 field should point to a HeapTupleHeader, not a HeapTuple.


Mmh, I don't get it: that would mean I might be using more info than required,
but I still don't understand why it's not working... at the end, CLUSTER is 
going
to need full HeapTuple(s) (if I'm not mistaken).
SortTuple-tuple can be any of MinimalTuple, IndexTuple, even a Datum.
So I added my own read/write_rawtuple (I'm not that good, they were in the
original patch and I copypasted them).

I can write and read those tuples (using some Asserts I think everything goes

as expected in write/readtup_rawheap). But when calling tuplesort_getrawtuple,
after some right tuples, the call to tuplesort_gettuple_common fails because
the lines:

/* pull next preread tuple from list, insert in heap */
newtup = state-memtuples[tupIndex];


give a wrong initialized newtup (in other words, newtup is random memory).
Since write/readtup_rawheap seems fine, I can't understand what's going on...

I write the HeapTuples with:

(in writetup_rawheap):
HeapTupletuple = (HeapTuple) stup-tuple;
tuple-t_len += HEAPTUPLESIZE; /* write out the header as well */

LogicalTapeWrite(state-tapeset, tapenum,
  tuple, HEAPTUPLESIZE);
LogicalTapeWrite(state-tapeset, tapenum, tuple-t_data, 
   tuple-t_len-HEAPTUPLESIZE);



then I read them with:
(in readtup_rawheap):
HeapTupletuple = (HeapTuple) palloc(tuplen);
USEMEM(state, GetMemoryChunkSpace(tuple));

tuple-t_len = tuplen - HEAPTUPLESIZE;
if (LogicalTapeRead(state-tapeset, tapenum, tuple-t_self, 
  HEAPTUPLESIZE-sizeof(tuplen)) != HEAPTUPLESIZE-sizeof(tuplen))
  elog(ERROR, unexpected end of data);
if (LogicalTapeRead(state-tapeset, tapenum, tuple-t_data, tuple-t_len) != 
tuple-t_len)
  elog(ERROR, unexpected end of data);




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: I: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-02-10 Thread Leonardo F
I think I've found the problem:

tuple-t_data wasn't at HEAPTUPLESIZE distance from tuple.
I guess the code makes that assumption somewhere, so I had
to do:

tuple-t_data = (HeapTupleHeader) ((char *) tuple + 
 HEAPTUPLESIZE);

Now that test works! Hope I don't find any more problems...



Leonardo





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: I: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-02-10 Thread Leonardo F
Perhaps you could supply a .sql file containing a testcase 
 illustrating the performance benefits you tested with your patch

Sure.


Attached the updated patch (should solve a bug) and a script.
The sql scripts generates a 2M rows table (orig); then the
table is copied and the copy clustered using seq + sort (since 
set enable_seqscan=false;).
Then the table orig is copied again, and the copy clustered
using regular index scan (set enable_indexscan=true; set 
enable_seqscan=false).
Then the same thing is done on a 5M rows table, and on a 10M
rows table.

On my system (Sol10 on a dual Opteron 2.8) single disc:


2M:  seq+sort 11secs; regular index scan: 33secs
5M:  seq+sort 39secs; regular index scan: 105secs
10M:seq+sort 83secs; regular index scan: 646secs


Maybe someone could suggest a better/different test?


Leonardo



  

sorted_cluster20100210.patch
Description: Binary data


cluster_tests.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


I: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-02-09 Thread Leonardo F
Not even a comment? As I said, performance results on my system
were very good

 I know you're all very busy getting 9.0 out, but I think the results in
 heap scanning + sort instead of index scanning for CLUSTER are
 very good... I would like to know if I did something wrong/I should
 improve something in the patch... I haven't tested it with index
 expressions yet (but the tests in make check work).
 
 Thanks
 
 Leonardo
 
 
  Hi all,
  
  attached a patch to do seq scan + sorting instead of index scan 
  
  on CLUSTER (when that's supposed to be faster).
  
  As I've already said, the patch is based on: 
  http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php
  
  Of course, the code isn't supposed to be ready to be merged: I
  would like to write more comments and add some test cases to
  cluster.sql (plus change all the things you are going to tell me I
  have to change...)
  
  I would like your opinions on code correctness and the decisions
  I took, especially:
  
  1) function names (cost_index_scan_vs_seqscansort I guess
  it's awful...)
  
  2) the fact that I put in Tuplesortstate an EState variable, so that 
  MakeSingleTupleTableSlot wouldn't have to be called for every
  row in the expression indexes case
  
  3) the expression index case is not optimized: I preferred to
  call FormIndexDatum once for the first key value in 
  copytup_rawheap and another time to get all the remaining values
  in comparetup_rawheap. I liked the idea of re-using
  FormIndexDatum in that case, instead of copyingpasting only
  the relevant code: but FormIndexDatum returns all the values,
  
  even when I might need only the first one
  
  
  4) I refactored the code to deform and rewrite tuple into the function
  deform_and_rewrite_tuple, because now that part can be called
  by the regular index scan or by the new seq-scan + sort (but I
  could copypaste those lines instead of refactoring them into a new
  function)
  
  Suggestions and comments are not just welcome, but needed!



  

sorted_cluster.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] IndexBuildHeapScan and RIDs order

2010-02-08 Thread Leonardo F
Hi,

I was looking at the code for bitmap index:

http://archives.postgresql.org/pgsql-hackers/2008-10/msg01691.php

and I couldn't understand why during bmbuild (the ambuild call for
bitmap indexes) the code checks for not-ordered ItemPointerData(s).

In which cases the ItemPointerData(s) given by IndexBuildHeapScan

are not in order (when allow_sync=false)?



Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-02-01 Thread Leonardo F
I know you're all very busy getting 9.0 out, but I think the results in
heap scanning + sort instead of index scanning for CLUSTER are
very good... I would like to know if I did something wrong/I should
improve something in the patch... I haven't tested it with index
expressions yet (but the tests in make check work).

Thanks

Leonardo


 Hi all,
 
 attached a patch to do seq scan + sorting instead of index scan 
 
 on CLUSTER (when that's supposed to be faster).
 
 As I've already said, the patch is based on: 
 http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php
 
 Of course, the code isn't supposed to be ready to be merged: I
 would like to write more comments and add some test cases to
 cluster.sql (plus change all the things you are going to tell me I
 have to change...)
 
 I would like your opinions on code correctness and the decisions
 I took, especially:
 
 1) function names (cost_index_scan_vs_seqscansort I guess
 it's awful...)
 
 2) the fact that I put in Tuplesortstate an EState variable, so that 
 MakeSingleTupleTableSlot wouldn't have to be called for every
 row in the expression indexes case
 
 3) the expression index case is not optimized: I preferred to
 call FormIndexDatum once for the first key value in 
 copytup_rawheap and another time to get all the remaining values
 in comparetup_rawheap. I liked the idea of re-using
 FormIndexDatum in that case, instead of copyingpasting only
 the relevant code: but FormIndexDatum returns all the values,
 
 even when I might need only the first one
 
 
 4) I refactored the code to deform and rewrite tuple into the function
 deform_and_rewrite_tuple, because now that part can be called
 by the regular index scan or by the new seq-scan + sort (but I
 could copypaste those lines instead of refactoring them into a new
 function)
 
 Suggestions and comments are not just welcome, but needed!


  

sorted_cluster.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-01-28 Thread Leonardo F
Hi all,

attached a patch to do seq scan + sorting instead of index scan 

on CLUSTER (when that's supposed to be faster).

As I've already said, the patch is based on: 
http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php

Of course, the code isn't supposed to be ready to be merged: I
would like to write more comments and add some test cases to
cluster.sql (plus change all the things you are going to tell me I
have to change...)

I would like your opinions on code correctness and the decisions
I took, especially:

1) function names (cost_index_scan_vs_seqscansort I guess
it's awful...)

2) the fact that I put in Tuplesortstate an EState variable, so that 
MakeSingleTupleTableSlot wouldn't have to be called for every
row in the expression indexes case

3) the expression index case is not optimized: I preferred to
call FormIndexDatum once for the first key value in 
copytup_rawheap and another time to get all the remaining values
in comparetup_rawheap. I liked the idea of re-using
FormIndexDatum in that case, instead of copyingpasting only
the relevant code: but FormIndexDatum returns all the values,

even when I might need only the first one


4) I refactored the code to deform and rewrite tuple into the function
deform_and_rewrite_tuple, because now that part can be called
by the regular index scan or by the new seq-scan + sort (but I
could copypaste those lines instead of refactoring them into a new
function)

Suggestions and comments are not just welcome, but needed!


Leonardo


  

sorted_cluster.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-01-27 Thread Leonardo F
 Consider multi-column indexes, ie:
 CREATE INDEX i_foo ON foo (length(a), length(b));


Ok, I've never thought of expression indexes that way 
(in the (expr1,expr2,exprN) form): that is a good example.

 Maybe you're confusing expression indexes with partial indexes? 

No no, that was exactly what I needed to know.


Thank you very much

Leonardo





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-01-26 Thread Leonardo F
 If we ever get another index type that supports ordered
 scans, it'll be time enough to worry about cases like this.


Ok

 BTW, I think you could use tuplesort_begin_index_btree() rather than
 touching _bt_mkscankey_nodata directly.  


well I created my own tuplesort_begin_rawheap method (copied from:
http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php ).
From there I call _bt_mkscankey_nodata (as tuplesort_begin_index_btree()
does), plus I set up everything else I'm going to need in tuplesort.

Another question: 

why is IndexInfo.ii_Expressions a list? How can an index have more than
one expression? Sorry if it's a stupid question, but I'm not familiar with
index expressions.

I think I'm almost there (some very stupid tests pass). I'll try to submit a
patch soon to understand if I'm going in the right direction.



Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-01-25 Thread Leonardo F
 Rule it out.  Note you should be looking at pg_am.amcanorder, not
 hardwiring knowledge of particular index types. 

Sorry, I replied ok too fast...


I can look at pg_am.amcanorder, but I would still need the ScanKey to be used

by tuplesort; and I can't find any other way of doing it than calling 
_bt_mkscankey_nodata, which is btree-specific.

I guess either:

- add another function to the list of Index Access Method Functions, something
that returns the ScanKey in case pg_am.amcanorder is true

or

- hardwiring the fact that the only way to seq scan + sort in CLUSTER is using
a btree... hence the call to _bt_mkscankey_nodata

But maybe there's another way of doing it, I don't know the code enough


Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-01-22 Thread Leonardo F
  So my proposal would be: do the test seq_scan vs sort/index_scan only for
  regular btree index, and integrate that test in the planner.
 
 Keep in mind that this patch was after the deadline for 9.0, so there
 is probably not a huge rush to get this done.


That's true; I'll try to get the whole thing done then... 




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-01-22 Thread Leonardo F
So, if I'm not mistaken:


hash indexes - can't be used in CLUSTER
gin indexes - can't be used in CLUSTER

that leaves:

btree - ok
expression btree - I have to find a way to compute the expression for
  each tuple: hints?
gist - how can I get something comparable by tuplesort? Or should I rule it
   out from the seq scan + sort path?



Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-01-22 Thread Leonardo F
 Note you should be looking at pg_am.amcanorder, not
 hardwiring knowledge of particular index types.


Ok. 

Would it make sense to use

FormIndexDatum

to get the index value to be used by tuplesort?

I'm having trouble avoiding the call to _bt_mkscankey_nodata
to get the scanKeys... that relates to btree only, I can't find 
the general function...




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-01-21 Thread Leonardo F
Anyone? I'd like some feedback before moving on to do the seq scan + sort in 
those
CLUSTER cases where use_index_scan returns false...



- Messaggio originale -
 Da: Leonardo F m_li...@yahoo.it
 A: pgsql-hackers@postgresql.org
 Inviato: Mer 20 gennaio 2010, 18:48:00
 Oggetto: Re: [HACKERS] About Our CLUSTER implementation is pessimal patch
 
  I read the thread Our CLUSTER implementation is pessimal 
  http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php .
  
  I was going to try to add the proper cost_index/cost_sort calls to decide 
 which 
  path should be executed, as in:
  
  http://archives.postgresql.org/pgsql-hackers/2008-09/msg00517.php
 
 I think I got something up and running to check if a table scan + sort is 
 supposed
 to be faster than an index scan for a certain CLUSTER operation.
 
 The way I did it is (I guess...) wrong: I created the elements needed by
 get_relation_info, create_seqscan_path, create_index_path, cost_sort.
 
 It has been, obviously, a trial and error approach: I added the member values 
 as
 soon as one function call crashed... and I bet I didn't get all the corner 
 cases.
 Is there any better way of doing it?
 
 Leonardo
 
 (this is called in copy_heap_data to decide which path to choose:)
 
 static bool use_index_scan(Oid tableOid, Oid indexOid)
 {
 RelOptInfo *rel;
 PlannerInfo *root;
 Query *query;
 PlannerGlobal *glob;
 Path *seqAndSortPath;
 IndexPath *indexPath;
 RangeTblEntry *rte;
 
 rel = makeNode(RelOptInfo);
 rel-reloptkind = RELOPT_BASEREL;
 rel-relid = 1;
 rel-rtekind = RTE_RELATION;
 
 /* needed by get_relation_info */
 glob = makeNode(PlannerGlobal);
 
 /* needed by get_relation_info: */
 query = makeNode(Query);
 query-resultRelation = 0;
 
 root = makeNode(PlannerInfo);
 
 root-parse = query;
 root-glob = glob;
 
 get_relation_info(root, tableOid, false, rel);
 seqAndSortPath = create_seqscan_path(NULL, rel);
 
 rel-rows = rel-tuples;
 
 rte = makeNode(RangeTblEntry);
 rte-rtekind = RTE_RELATION;
 rte-relid = tableOid;
 
 root-simple_rel_array_size = 2;
 root-simple_rte_array = (RangeTblEntry **)
 palloc0(root-simple_rel_array_size * sizeof(RangeTblEntry *));
 root-simple_rte_array[1] = rte;
 
 root-total_table_pages = rel-pages;
 
 indexPath = create_index_path(root, 
 (IndexOptInfo*)(list_head(rel-indexlist)-data.ptr_value), NULL, NULL, 
 ForwardScanDirection, NULL);
 cost_sort(seqAndSortPath, root, NULL, seqAndSortPath-total_cost, 
 rel-tuples, 
 rel-width, -1);
 
 return indexPath-path.total_cost  seqAndSortPath-total_cost;
 }




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-01-21 Thread Leonardo F
 * Do we need to disable sort-path for tables clustered on a gist index?

Yes; as I said in a previous mail, only plain btree indexes (that is, not
custom expression indexes) would have that option (at least in a first
version...)

 * I'd prefer to separate cost calculation routines from create_index_path()
and cost_sort(), rather than using a dummy planner.


How? Copypaste (removing unnecessary code) of the existing 
create_index_path() and cost_sort()?


Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-01-21 Thread Leonardo F
one idea could be to actually prepare a query using SPI for select * from 
table order by cols and then peek inside
 to see which plan was generated. 

I like that!!!
Here's a first attempt, it looks like it's working...
(I still have to skip non-btree indexes and expression indexes, plus
add a ASC/DESC to the select)

static bool use_index_scan(Relation OldHeap, Oid indexOid)
{
HeapTupleindexTuple;
Form_pg_index indexForm;
struct _SPI_plan *spiPlan;
char st[(NAMEDATALEN+1)*INDEX_MAX_KEYS+NAMEDATALEN+100];
int i;
TupleDescoldTupDesc;
bool retval = true;
PlannedStmt *plan;
CachedPlanSource *cps;

oldTupDesc = RelationGetDescr(OldHeap);

sprintf(st, select * from %s order by , OldHeap-rd_rel-relname.data);
indexTuple = SearchSysCache(INDEXRELID, ObjectIdGetDatum(indexOid),0, 0, 0);

if (!HeapTupleIsValid(indexTuple))
   elog(ERROR, cache lookup failed for index %u, indexOid);

indexForm = (Form_pg_index) GETSTRUCT(indexTuple);

for (i = 0; i  indexForm-indnatts; i++)
{
  strcat(st, oldTupDesc-attrs[indexForm-indkey.values[i]-1]-attname.data);
  if (i+1  indexForm-indnatts)
  {
   strcat(st, ,);
   }
}

ReleaseSysCache(indexTuple);

if (SPI_connect() != SPI_OK_CONNECT)
   return false;

spiPlan = SPI_prepare(st, 0, NULL);
if (spiPlan == NULL)
{
  SPI_finish(); 
  return false;
}

cps = (CachedPlanSource*)(list_head(spiPlan-plancache_list)-data.ptr_value);
plan = (PlannedStmt*)(list_head(cps-plan-stmt_list)-data.ptr_value);
if (IsA(plan-planTree, Sort))
{
   retval = false;
}

SPI_freeplan(spiPlan);
SPI_finish();

return retval;
}





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-01-21 Thread Leonardo F
 By the time you make this actually work in all cases, it's probably
 going to be more of a mess than the other way; 

I meant to add only ASC/DESC; I would leave all other cases
(non-btrees, custom expression btrees) to use the old index-scan method.

 not to mention that it
 doesn't work *at all* without violating SPI internals.


You lost me there...




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-01-21 Thread Leonardo F
  I meant to add only ASC/DESC; I would leave all other cases
  (non-btrees, custom expression btrees) to use the old index-scan method.
 
 That hardly seems acceptable.


Well I brought up that in an earlier post:

http://old.nabble.com/Re%3A-About-%22Our-CLUSTER-implementation-is-pessimal%22-patch-p27179107.html

I hoped that since people mostly (95%?) use plain btree indexes,
a patch that helped CLUSTER with using such indexes would be fine
(at least at first...). I guess that a patch that deals with all other types of
indexes would be way more complicated (not at the planning stage,
but in the scan+sort phase)?

 I hardly think that keeping yourself at arm's length from the planner
 by getting cozy with SPI internals is a net improvement in modularity.


So you think that code snippet that I sent earlier (the function that uses
create_index_path etc) could be put in planner.c (almost) as it is? It looked
clumsy to me (I liked the SPI code better)


Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-01-21 Thread Leonardo F
 Well, the expression cases would be more likely to cost more if
 implemented as a sort, but that doesn't mean that a sort couldn't be a
 win.  Besides, even if you blow off the expression case, what about
 nulls first/last, nondefault opclasses, etc?


Ok, let's split the problem in 2 parts: 

a) choosing the right path
b) using seq scan + sort in case the planner (somehow) said it's faster

You're right: for a) nondefault opclasses and other things would make the
SPI version wrong. So let's stick for the moment with the cost_sort 
create_index_path etc calls for a). I think that the calls to create_index_path
would also deal with every other possible index type (something that's
impossible with the SPI version).

For b):

I'm using the code in 

http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php

That doesn't deal with expression indexes, nor with anything that is not
a btree index. But it's much faster on what people use the most (and not
slower on anything else).

So my proposal would be: do the test seq_scan vs sort/index_scan only for
regular btree index, and integrate that test in the planner.

That doesn't mean that sometime in the future we're not going to have full
support for all index types in seq scan + sort CLUSTER... but I would like
to have something that works faster on what people use, and not slower
in the other cases without waiting ages to have the whole thing...




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values

2010-01-20 Thread Leonardo F
New version of the patch, let me know if I can fix/change something else.



Leonardo



  

getsetbit.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values

2010-01-20 Thread Leonardo F
 All issues addressed, with one tiny nit-pick -- the get_bit and
 set_bit methods are not part of the SQL standard. 


Damn! I completely forgot to mention that I had no idea if what I wrote
in the docs made any sense...

Well thank you for your thorough review.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-01-20 Thread Leonardo F
 I read the thread Our CLUSTER implementation is pessimal 
 http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php .
 
 I would like to try/integrate that patch as we use CLUSTER a lot on our 
 system.
 
 I was going to try to add the proper cost_index/cost_sort calls to decide 
 which 
 path should be executed, as in:
 
 http://archives.postgresql.org/pgsql-hackers/2008-09/msg00517.php

I think I got something up and running to check if a table scan + sort is 
supposed
to be faster than an index scan for a certain CLUSTER operation.

The way I did it is (I guess...) wrong: I created the elements needed by
get_relation_info, create_seqscan_path, create_index_path, cost_sort.

It has been, obviously, a trial and error approach: I added the member values as
soon as one function call crashed... and I bet I didn't get all the corner 
cases.
Is there any better way of doing it?

Leonardo

(this is called in copy_heap_data to decide which path to choose:)

static bool use_index_scan(Oid tableOid, Oid indexOid)
{
RelOptInfo *rel;
PlannerInfo *root;
Query *query;
PlannerGlobal *glob;
Path *seqAndSortPath;
IndexPath *indexPath;
RangeTblEntry *rte;

rel = makeNode(RelOptInfo);
rel-reloptkind = RELOPT_BASEREL;
rel-relid = 1;
rel-rtekind = RTE_RELATION;

/* needed by get_relation_info */
glob = makeNode(PlannerGlobal);

/* needed by get_relation_info: */
query = makeNode(Query);
query-resultRelation = 0;

root = makeNode(PlannerInfo);

root-parse = query;
root-glob = glob;

get_relation_info(root, tableOid, false, rel);
seqAndSortPath = create_seqscan_path(NULL, rel);

rel-rows = rel-tuples;

rte = makeNode(RangeTblEntry);
rte-rtekind = RTE_RELATION;
rte-relid = tableOid;

root-simple_rel_array_size = 2;
root-simple_rte_array = (RangeTblEntry **)
palloc0(root-simple_rel_array_size * sizeof(RangeTblEntry *));
root-simple_rte_array[1] = rte;

root-total_table_pages = rel-pages;

indexPath = create_index_path(root, 
(IndexOptInfo*)(list_head(rel-indexlist)-data.ptr_value), NULL, NULL, 
ForwardScanDirection, NULL);
cost_sort(seqAndSortPath, root, NULL, seqAndSortPath-total_cost, rel-tuples, 
rel-width, -1);

return indexPath-path.total_cost  seqAndSortPath-total_cost;
}





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values

2010-01-19 Thread Leonardo F
 In the
 documentation, the get_bit and set_bit methods are added to a list
 where we state The following SQL-standard functions work on bit
 strings as well as character strings; however they are not
 SQL-standard functions and are implemented on binary strings, not
 character strings. 

Ok, I didn't change that, but I guess I can fix it.


 The tests don't include any examples where the bit
 string lines up with a byte boundary or is operating on the first or
 last bit of a byte, and the overlay function is not tested with
 values which change the length of the bit string. (All of these
 cases seem to work in the current version, but seem like the sort of
 thing which could get broken in revision.)


Ok, I'll add some corner-cases tests.

 There is a peculiar omission from the patch -- it adds supportfor the
 overlay function to bit strings but not binary strings. Sincethe
 standard drops the bit string as a standard type in the 2003standard,
 in favor of boolean type and binary string types, it seemsodd to omit
 binary string support (which is defined in the standard)but include
 bit string support (which isn't). What the patch adds seemsuseful,
 and I don't think the omission should be considered fatal, butit
 would be nice to see us also cover binary strings if we can.


Mmh, don't know how complicated that would be, but I can give it a try:
I guess it could be a simple SQL replacement (as for char strings and bit
strings?)

 Well, there was one thing which is beyond my ken
 in this regard: I'm not sure that the cases from bits8 to (unsigned
 char *) are needed or appropriate, although on my machine they don't
 seem to hurt.  Perhaps someone with a better grasp of such issues can
 comment.


That's some kind of copypaste from varlena.c byteaGetBit: I don't know
if they can be avoided.

 I'm not sure the leading whitespace in pg_proc.h is as it should be.


Do you mean the tab in:
 
_null_(tab)select ...

?

Yes, I think I should remove it.

 I'd prefer to see the comments for get_bit and set_bit mention that
 the location is specified left-to-right in a zero-based fashion
 consistent with the other get_bit and set_bit functions, but
 inconsistent with the standard substring, position, overlay
 functions.  Personally, I find it unfortunate that our extensions
 introduced this inconsistency, but let's keep it consistent within
 the similarly named functions.

Ok; I found them bizarre too, but I kept it consistent.

 It seems odd to me that you specify the bit to set as a 32 bit
 integer, and that that is what get_bit returns, but again, this is
 consistent with what we do for the bytea functions, so it's probably
 the right thing to match that.


Same as above: I tried to be consistent.

 Using a ERRCODE_ARRAY_SUBSCRIPT_ERROR for a bit position which is
 out-of-bounds seems somewhat bizarre to me -- I'd probably have
 chosen ERRCODE_INVALID_PARAMETER_VALUE in a green field -- but again,
 it matches the bytea implementation.


Again, same as above: consistency; but I can change it if you want

 This last point is probably more a matter of C coding style than
 anything, and I'm way to rusty in C to want to be the final arbiter
 of something like this, but it seems to me that oldByte and newByte
 local variables are just cluttering things up rather than clarifying:
 
 intoldByte,
 newByte;
 [...]
 oldByte = ((unsigned char *) r)[byteNo];
 
 if (newBit == 0)
 newByte = oldByte  (~(1  bitNo));
 else
 newByte = oldByte | (1  bitNo);
 
 ((unsigned char *) r)[byteNo] = newByte;
 
 vs
 
 if (newBit == 0)
 ((unsigned char *) r)[byteNo] = (~(1  bitNo));
 else
 ((unsigned char *) r)[byteNo] |= (1  bitNo);
 


I agree, you version is cleaner.


Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Patch: Allow substring/replace() to get/set bit values

2010-01-18 Thread Leonardo F
 This patch no longer applies.  Could you rebase it?


Done (I think). Added a couple of simple tests for bit overlay.

I didn't include the catversion.h changes: obviously the CATALOG_VERSION_NO has 
to be changed.


Leonardo


  Index: src/backend/utils/adt/varbit.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/varbit.c,v
retrieving revision 1.63
diff -c -r1.63 varbit.c
*** src/backend/utils/adt/varbit.c  7 Jan 2010 20:17:43 -   1.63
--- src/backend/utils/adt/varbit.c  18 Jan 2010 09:05:50 -
***
*** 1606,1608 
--- 1606,1704 
}
PG_RETURN_INT32(0);
  }
+ 
+ 
+ /* bitsetbit
+  * Given an instance of type 'bit' creates a new one with
+  * the Nth bit set to the given value.
+  */
+ Datum
+ bitsetbit(PG_FUNCTION_ARGS)
+ {
+   VarBit *arg1 = PG_GETARG_VARBIT_P(0);
+   int32   n = PG_GETARG_INT32(1);
+   int32   newBit = PG_GETARG_INT32(2);
+   VarBit *result;
+   int len,
+   bitlen;
+   bits8  *r,
+  *p;
+   int byteNo,
+   bitNo;
+   int oldByte,
+   newByte;
+ 
+   bitlen = VARBITLEN(arg1);
+   if (n  0 || n = bitlen)
+   ereport(ERROR,
+   (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+errmsg(index %d out of valid range, 0..%d,
+   n, bitlen - 1)));
+   /*
+* sanity check!
+*/
+   if (newBit != 0  newBit != 1)
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg(new bit must be 0 or 1)));
+ 
+   len = VARSIZE(arg1);
+   result = (VarBit *) palloc(len);
+   SET_VARSIZE(result, len);
+   VARBITLEN(result) = bitlen;
+ 
+   p = VARBITS(arg1);
+   r = VARBITS(result);
+ 
+   memcpy(r, p, VARBITBYTES(arg1));
+ 
+   byteNo = n / BITS_PER_BYTE;
+   bitNo = BITS_PER_BYTE - 1 - (n % BITS_PER_BYTE);
+ 
+   /*
+* Update the byte.
+*/
+   oldByte = ((unsigned char *) r)[byteNo];
+ 
+   if (newBit == 0)
+   newByte = oldByte  (~(1  bitNo));
+   else
+   newByte = oldByte | (1  bitNo);
+ 
+   ((unsigned char *) r)[byteNo] = newByte;
+ 
+   PG_RETURN_VARBIT_P(result);
+ }
+ 
+ /* bitgetbit
+  * returns the value of the Nth bit (0 or 1).
+  */
+ Datum
+ bitgetbit(PG_FUNCTION_ARGS)
+ {
+   VarBit *arg1 = PG_GETARG_VARBIT_P(0);
+   int32   n = PG_GETARG_INT32(1);
+   int bitlen;
+   int byteNo,
+   bitNo;
+   int byte;
+ 
+   bitlen = VARBITLEN(arg1);
+ 
+   if (n  0 || n = bitlen)
+   ereport(ERROR,
+   (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+errmsg(index %d out of valid range, 0..%d,
+   n, bitlen - 1)));
+ 
+   byteNo = n / BITS_PER_BYTE;
+   bitNo = BITS_PER_BYTE - 1 - n % BITS_PER_BYTE;
+ 
+   byte = ((unsigned char *) VARBITS(arg1))[byteNo];
+ 
+   if (byte (1  bitNo))
+   PG_RETURN_INT32(1);
+   else
+   PG_RETURN_INT32(0);
+ }
+ 
Index: src/include/catalog/pg_proc.h
===
RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_proc.h,v
retrieving revision 1.562
diff -c -r1.562 pg_proc.h
*** src/include/catalog/pg_proc.h   15 Jan 2010 09:19:07 -  1.562
--- src/include/catalog/pg_proc.h   18 Jan 2010 09:05:58 -
***
*** 2405,2410 
--- 2405,2418 
  DATA(insert OID = 1699 (  substring   PGNSP PGUID 12 1 0 0 f 
f f t f i 2 0 1560 1560 23 _null_ _null_ _null_ _null_ bitsubstr_no_len 
_null_ _null_ _null_ ));
  DESCR(return portion of bitstring);
  
+ DATA(insert OID = 3030 (  overlay PGNSP PGUID 14 1 0 0 f 
f f t f i 4 0 1560 1560 1560 23 23 _null_ _null_ _null_ _null_ select 
pg_catalog.substring($1, 1, ($3 - 1)) || $2 || pg_catalog.substring($1, ($3 + 
$4)) _null_ _null_ _null_ ));
+ DESCR(substitute portion of bit string);
+ DATA(insert OID = 3031 (  overlay PGNSP PGUID 14 1 0 0 f 
f f t f i 3 0 1560 1560 1560 23 _null_ _null_ _null_ _null_select 
pg_catalog.substring($1, 1, ($3 - 1)) || $2 || pg_catalog.substring($1, ($3 + 
pg_catalog.length($2))) _null_ _null_ _null_ ));
+ DESCR(substitute portion of bit string);
+ DATA(insert OID = 3032 (  get_bitPGNSP PGUID 12 1 0 0 f f f t 
f i 2 0 23 1560 23 _null_ _null_ _null_ _null_ bitgetbit _null_ _null_ _null_ 
));
+ DESCR(get bit);
+ DATA(insert 

[HACKERS] About Our CLUSTER implementation is pessimal patch

2010-01-15 Thread Leonardo F
Hi,

I read the thread Our CLUSTER implementation is pessimal 
http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php .

I would like to try/integrate that patch as we use CLUSTER a lot on our system.

I was going to try to add the proper cost_index/cost_sort calls to decide which 
path should be executed, as in:

http://archives.postgresql.org/pgsql-hackers/2008-09/msg00517.php

I don't think it will be easy without help... I'll ask here a lot I'm afraid...

About that patch:

1) would it be possible to use the tuplesort_*tupleslot set of functions 
instead of writing new ones for HeapTuple? That is: is it that 
difficult/impossible/nonsense to construct TupleTableSlot from HeapTuple and 
use those?

2) The patch doesn't check HeapTupleSatisfiesVacuum before passing it to 
tuplesort_putrawtuple: would it be reasonable to check the isdead flag before 
calling tuplesort_putrawtuple for each tuple?


Sorry if I talked nonsense...



Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] About Our CLUSTER implementation is pessimal patch

2010-01-15 Thread Leonardo F
 Yeah, I think you could do that, I agree it feels better that way.
 You'll still need new copytup and comparetup functions, though, to deal
 with HeapTupleHeaders instead of MinimalTuples, or modify the existing
 ones to handle both. 

You meant HeapTuple, not HeapTupleHeaders, right?

Mmh, didn't think of those two functions; I might as well start with Gregory
Stark's patch (that is: using HeapTuple)

 And some way to indicate that you want to preserve
 the visibility information when you create the tuplesort, maybe a new
 parameter to tuplesort_begin_heap().

I guess that using Gregory Stark's patch there's no need for it, since it uses
HeapTuples, right?

A patch that:

1) uses always the old CLUSTER method for non-btree indexes and for
expression indexes
2) add a whole set of new functions to tuplesort (as in Gregory Stark's patch)

would be rejected for sure? Or can be thought as a better than nothing,
works in 90% cases patch?


Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Allow substring/replace() to get/set bit values

2010-01-08 Thread Leonardo F
 What we can do in the back branches is make the code treat any
 negative value as meaning two-arg form.  To throw an error we'd
 need to refactor the pg_proc representation ...


I was going to fix that myself, but I think it has just been done.

How can I keep up with who's doing what?




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: I: [HACKERS] TODO: Allow substring/replace() to get/set bit values

2010-01-07 Thread Leonardo F


  Is anybody interested? Otherwise the entry could be removed from the TODO 
 list...
 
 Even if not, you can still submit a patch.  There are a lot more users
 of PG than there are people who read -hackers.



Ok, I'll try and submit a patch. Thank you very much.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Patch: Allow substring/replace() to get/set bit values

2010-01-07 Thread Leonardo F
Hi all,


attached a patch that adds the following functions for bit string:

- overlay
- get_bit
- set_bit


Some info:

1) overlay is implemented as calls to substring; given the different way 
substring behaves when used with strings vs bit strings:

test=# SELECT substring(B'0001' from 1 for -1);
substring 
--
0001
(1 row)

test=# SELECT substring('0001' from 1 for -1);
ERROR:  negative substring length not allowed



I don't think that this overlay implementation is what we want?

Example:

test=# SELECT overlay(B'' placing B'01' from 1 for 2);
overlay 
-
0111
(1 row)


(looks ok)

test=# SELECT overlay(B'' placing B'01' from 0 for 2);
overlay 
---
0
(1 row)



This happens because substring(bit, pos, -1) means substring(bit, pos, 
length(bit string)),
and  -1 values for bit substring parameters are allowed: is this a bug in bit 
substring???

2) I tried implementing bit_get and bit_set as calls to overlay/substring:

DATA(insert OID = 3032 (  get_bit  PGNSP PGUID 14 1 0 0 f f f t 
f i 2 0 23 1560 23 _null_ _null_ _null_ _null_ select 
(pg_catalog.substring($1, $2, 1))::int4 _null_ _null_ _null_ ));
DESCR(get bit);
DATA(insert OID = 3033 (  set_bit  PGNSP PGUID 14 1 0 0 f f f t 
f i 3 0 1560 1560 23 23 _null_ _null_ _null_ _null_ select 
pg_catalog.overlay($1, $3::bit, $2, 1) _null_ _null_ _null_ ));
DESCR(set bit);


but this doesn't give any check on the values provided:
that the bit looked for is in fact in the right range, and that the bit in 
set_bit is in fact a bit
(I don't like the idea of writing select set_bit(B'01010111', B'1') instead 
of 
select set_bit(B'01010111', 1) ).
So I coded them in proper C internal functions.



Leonardo


  

patch_bit.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Allow substring/replace() to get/set bit values

2010-01-07 Thread Leonardo F
 Thanks!  Please add your patch here:
 
 https://commitfest.postgresql.org/action/commitfest_view/open
 


Ok; but what about what I said about the difference between bit/string 
substring?
That affects overlay behaviour for bit...


I've even got 

ERROR:  invalid memory alloc request size 4244635647 

with:


SELECT substring(B'0001' from 5 for -2);

(this is 8.4.2, windows version, not modified...)



test=# SELECT substring(B'0001' from 1 for -1);
substring 
--
0001
(1 row)


test=# SELECT substring('0001' from 1 for -1);
ERROR:  negative substring length not allowed




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: I: [HACKERS] TODO: Allow substring/replace() to get/set bit values

2010-01-06 Thread Leonardo F
  To sum up:
 
  1) a new function, get_bit, that calls substring
  2) a new function, overlay, that replaces bits (starting at a certain 
 position)
  3) a new function, set_bit, that calls overlay
 
 That seems reasonable to me.  Not sure what others think.


Is anybody interested? Otherwise the entry could be removed from the TODO 
list...



Leonardo





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: I: [HACKERS] TODO: Allow substring/replace() to get/set bit values

2010-01-05 Thread Leonardo F
 You might want to search the archives (or the wiki history, or the CVS
 history if it's been there since before we moved the TODO list to the
 wiki) for discussion of why that item was added to the TODO in the
 first place.



I read the thread:

http://archives.postgresql.org/pgsql-hackers/2004-02/msg00478.php

1) it is true that getbit sounds a lot like what substring() does, but the 
same could be said for binary string substring/get_byte; so IMHO get/set_bit 
should be present for bit string
2) it is not very clear to me how setbit could actually be handled by 
replace() (maybe overlay style?)
3) since I'm looking at byte string get/set_bit to understand how that works, 
I'm having a hard time understanding why the bit indexes in get/set_bit are 
low-first based:

select get_bit(E'\\376\\376'::bytea, s) as b,s from generate_series(0,15,1) as s
b s
0 0
1 1
1 2
1 3
1 4
1 5
1 6
1 7
0 8
1 9
1 10
1 11
1 12
1 13
1 14
1 15


I understand this is the internal representation, but still: if someone asked 
me what the 8th bit in 11101110 is, I would have said 1, not 0 
(assuming the first bit has index '0'). Actually, David Helgason's patch 
(http://archives.postgresql.org/pgsql-hackers/2004-01/msg00498.php) goes in 
this direction: note the 

bitNo = 7 - (n % 8);

part. Using that algorithm would mean get/set_bit in bit string would behave 
differently from what they do in binary string (IMHO it's the binary string 
implementation that is wrong).



Leonardo




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: I: [HACKERS] TODO: Allow substring/replace() to get/set bit values

2010-01-05 Thread Leonardo F
 As you say, there's really no point in changing the internal
 representation, and if you don't find replace() useful either, then
 why are you even working on this at all?  

I would like a get_bit / set_bit for bit strings, as I find them useful.
get_bit could be a simple call to substring, but there's no way of doing a 
set_bit on a bit string as far as I know.

I don't like the replace syntax for bit strings since it won't give you the 
same functionality of set_bit, 
plus I don't really see how someone would want to look for a bit string and 
replace it with another bit string.
But I see that someone might want to overlay a bit string with another (this is 
different from replace since you
have to tell the position where the replacing would start, instead of looking 
for a bit string).

To sum up:

1) a new function, get_bit, that calls substring
2) a new function, overlay, that replaces bits (starting at a certain 
position)
3) a new function, set_bit, that calls overlay


 Since the latest discussion
 of this is more than five years old, it's unclear that anyone even
 cares any more.  It seems to me that making replace overlay a
 substring of bits could be a reasonable thing to do, but if nobody
 actually wants it, then the simplest thing to do is remove this from
 the TODO and call it good.

I understand: it would be both a useful feature to me and a way to start coding 
postgres.

But, of course, if there's no interest, I'll pass...





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


I: [HACKERS] TODO: Allow substring/replace() to get/set bit values

2010-01-04 Thread Leonardo F
Re-reading the docs it looks like the only thing missing is get/set_bit for bit 
string.


Substring is already implemented for bit string, and I don't really know if 
replace is useful at all.


(sorry if the other mail came with a different sender name)


Leonardo



 I would like to work on Allow substring/replace() to get/set bit values, 
 since 
 it looks like a simple task.
 
 The item is not marked as easy on the TODO though. Before proceding to a 
 discussion on how this functions should be implemented (I got from the 
 messages 
 on the mailing list that bit substring/replace functions should do it) I 
 would 
 like to know if it's a complicated task.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers