Re: [HACKERS] Review: B-Tree emulation for GIN

2009-04-04 Thread Teodor Sigaev




* After this new look at the code I think that matchPartialInPendingList
is completely broken.  Surely it's an infinite loop if the
comparePartial function returns -1.  I also don't understand why it

fixed


stops short of examining the tuple at maxoff, or for that matter why
it's doing any searching at all when the code that's calling it seems
to be implementing a binary search.  I think that part of the code
needs more testing and commenting.

Tuples of rows are stored in pending list in the same order as in entry tree
(order by (attrnum, Datum)). So, it's possible to use binary search. By test's
binary search presents about 5-15% of performance.

But heap's row could be one page or on several pages. If page contains
several rows, each row is checked separately and tuples of one heap row are
placed from pendingPosition-firstOffset up to pendingPosition-lastOffset.
The page is marked by GIN_LIST_FULLROW flag if it contains all tuples of row(s) 
or it's a last page in page's chain containing single heap row.


scanGetCandidate() always returns offset range for tuples of one heap row.
Although it may return NOT all tuples of row, if so then collectDatumForItem()
will ask scanGetCandidate() about next portion of tuples of that heap row. It's
safe because tuple's of one heap row are continuosly stored in pending list.


* In keyGetItem(), it's not at all clear what the assumptions are for
the contents of the entryRes[] array and for the interaction with a
LossyPage entry.  Is it intentional that you run through the array
comparing to a LossyPage key before deciding to exit the loop?  It
seems like that might be affecting the behavior at the next call,
but if so you're making some undocumented assumptions about the way
comparison of a lossy-page pointer is going to behave.  I thought it'd
be cleaner to move the ItemPointerIsLossyPage check up to before that
loop (right after the ItemPointerIsMax test) but the more I look at it
the less sure I am if that would break things.  That code needs more
commenting too.

entryRes[] array is used for two purposes:
- as an argument for consistentFn
- to store information about entries which was a minimal on previous
  call/loop. So, only that entries should be renewed by entryGetItem()
  call. Lossy page is encoded as ItemPointer with greatest offset (0x),
  so  keyGetItem() could return items from page and then return the same
  page as lossy. This is not a problem because of usage of TIDBitmap to
  callect all results.

So, algorithm of  keyGetItem could be described as:
1 renew all entries which is equal to previous result (entryGetItems returns 
them in increasing order)

2 find minimal entries
3 fill up  entryRes[] to new values and check lossy or call consistentFn



* I'd also like to come to some agreement about getting rid of the
fail-on-NULL-scankey problem in newScanKey().  As I noted in the
comment there, we could make that work cleanly if we are willing to
assume that all GIN-indexable operators are strict.  We already assume
the same for hash and btree operators, so it doesn't seem like a big
problem to do this, but I wonder if there are any objections.
Agree.  I changed the GIN code, but don't know where is other places to change 
to fixate this agreement.




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



gin.patch.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-04-04 Thread Tom Lane
Teodor Sigaev teo...@sigaev.ru writes:
 [ fixes for the GIN stuff I complained about before ]

This all looks good to me, please apply.  One little suggestion:
  
!   /* 
!* entryRes array is used for:
!* - as an argument for consistentFn
!* - entry-curItem with corresponding key-entryRes[i] == 
false are greater 
!*   than key-curItem, so next loop/call they should be renewed
!*   by entryGetItem(). So, we need to set up an array before
!*   checking of lossy page.
!*/

pgindent will reflow this comment block, since it's not at the left
margin.  To keep the formatting looking good you'll need to add /*

 * I'd also like to come to some agreement about getting rid of the
 fail-on-NULL-scankey problem in newScanKey().  As I noted in the
 comment there, we could make that work cleanly if we are willing to
 assume that all GIN-indexable operators are strict.  We already assume
 the same for hash and btree operators, so it doesn't seem like a big
 problem to do this, but I wonder if there are any objections.

 Agree.  I changed the GIN code, but don't know where is other places
 to change to fixate this agreement.

I don't think there is anything else that needs to be done for that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-03-26 Thread Jeff Davis
On Wed, 2009-03-25 at 19:31 -0400, Tom Lane wrote:
 * I'd also like to come to some agreement about getting rid of the
 fail-on-NULL-scankey problem in newScanKey().  As I noted in the
 comment there, we could make that work cleanly if we are willing to
 assume that all GIN-indexable operators are strict.  We already assume
 the same for hash and btree operators, so it doesn't seem like a big
 problem to do this, but I wonder if there are any objections.

IS NULL is indexable in a btree and non-strict, so there is at least
some precedent.

Also, if extractQuery is non-strict, shouldn't we call it and see if it
returns some useful keys? If so, I don't see a reason to assume that
nothing matches.

If the opclass author wants a search against NULL to mean matches
nothing, they can just make extractQuery non-strict and return -1.

However, if extractQuery is strict or returns NULL, I'm fine with either
an error or assuming nothing matches. I don't see a functionality
difference either way, so we should just document whatever seems to make
the most sense.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-03-26 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 Also, if extractQuery is non-strict, shouldn't we call it and see if it
 returns some useful keys?

Perhaps.  One risk factor for approaching it that way is that there are
probably a lot of opclasses out there that haven't bothered to mark
these functions strict, since it's never mattered before.  (A handy
example is that the brand new btree_gin opclasses did not bother, as
submitted; though in a fit of paranoia I made them do so before
committing.)  If the extractQuery function isn't actually guarding
against this, you'll get a crash.

That's not a showstopper reason not to change, of course, but it does
mean that I'd like to see an actual use case for a non-strict GIN index
operator before taking any risk.  Note that IS NULL isn't an operator,
so even if we were to try to support it in GIN, that would be a
different code path (just as it is in btree).

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-03-25 Thread Tom Lane
[ back to this patch... ]

Teodor Sigaev teo...@sigaev.ru writes:
 Looked at this a bit ... do you think it's really a good idea to remove
 the strategy number argument of comparePartial?  The argument given in
 the docs for it is that it might be needed to determine when to end the
 scan, and that still seems plausible to me.

 Strategy number is not removed, it's replaced by pointer to opclass-specific 
 data on per key basis. Actually, partial match feature is new for 8.4 and 
 there 
 is no any compatibility problem. None of existing opclasses (except 
 contrib/btree_gin) uses strategy number in comparePartial.

I'm still not real happy about omitting the strategy number from
comparePartial's arguments.  Yes, the extractQuery function can make
up for that if it has to, but why should it have to?  It seems to me
to be inconsistent that the consistent function gets the strategy
number but comparePartial doesn't.

I know that there's not a backwards compatibility issue here, but it
still seems to me that you're forcing opclasses to do things the hard
way (allocating an extra_data entry) when they might only need
information that's already part of the call signature for existing
functions.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-03-25 Thread Tom Lane
BTW ... while I'm thinking about it: it seems to me to be a serious
error that the consistent() function isn't given nkeys so that it can
know the length of the arrays it's being handed.  I suppose it's
possible for it to re-deduce nkeys by examining the query datum, but
that could be quite expensive; and it's certainly error-prone to have
to keep extractQuery() and consistent() in sync on this.  Since we are
adding arguments to consistent() anyway for 8.4, I propose that its
signature ought to be

bool consistent(bool check[], StrategyNumber n, Datum query,
int32 nkeys, bool *recheck, Pointer extra_data[])

where the first three arguments are what existed in 8.3.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-03-25 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 Tom BTW ... while I'm thinking about it: it seems to me to be a
 Tom serious error that the consistent() function isn't given nkeys
 Tom so that it can know the length of the arrays it's being handed.
 Tom I suppose it's possible for it to re-deduce nkeys by examining
 Tom the query datum, but that could be quite expensive; and it's
 Tom certainly error-prone to have to keep extractQuery() and
 Tom consistent() in sync on this.

I ran into exactly this problem in hstore; the omission of nkeys there
is seriously annoying.

-- 
Andrew (irc:RhodiumToad)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-03-25 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom BTW ... while I'm thinking about it: it seems to me to be a
  Tom serious error that the consistent() function isn't given nkeys
  Tom so that it can know the length of the arrays it's being handed.

 I ran into exactly this problem in hstore; the omission of nkeys there
 is seriously annoying.

It's de-omitted ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-03-25 Thread Tom Lane
Teodor Sigaev teo...@sigaev.ru writes:
 [ btree_gin 0.12 ]

Committed with some editorializations.  There are still a few loose
ends:

* the question about zero-key queries that I mentioned before

* After this new look at the code I think that matchPartialInPendingList
is completely broken.  Surely it's an infinite loop if the
comparePartial function returns -1.  I also don't understand why it
stops short of examining the tuple at maxoff, or for that matter why
it's doing any searching at all when the code that's calling it seems
to be implementing a binary search.  I think that part of the code
needs more testing and commenting.

* In keyGetItem(), it's not at all clear what the assumptions are for
the contents of the entryRes[] array and for the interaction with a
LossyPage entry.  Is it intentional that you run through the array
comparing to a LossyPage key before deciding to exit the loop?  It
seems like that might be affecting the behavior at the next call,
but if so you're making some undocumented assumptions about the way
comparison of a lossy-page pointer is going to behave.  I thought it'd
be cleaner to move the ItemPointerIsLossyPage check up to before that
loop (right after the ItemPointerIsMax test) but the more I look at it
the less sure I am if that would break things.  That code needs more
commenting too.

* I'd also like to come to some agreement about getting rid of the
fail-on-NULL-scankey problem in newScanKey().  As I noted in the
comment there, we could make that work cleanly if we are willing to
assume that all GIN-indexable operators are strict.  We already assume
the same for hash and btree operators, so it doesn't seem like a big
problem to do this, but I wonder if there are any objections.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-03-18 Thread Teodor Sigaev


It's easy to un-dirty that hack, but before I'd like to see your 
comments about thoughts above.


Yeah, please revert that hack.
Done. Also, I changed void *extra_data to Pointer extra_data and corresponding 
**extra_data and ***extra_data to simplify understanding.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


btree_gin-0.12.gz
Description: Unix tar archive
-
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-03-16 Thread Heikki Linnakangas

Teodor Sigaev wrote:
You might want to declare extra_data as just void *, instead of an 
array of pointers. The data type implementation might want to store 
something there that's not per-key, but applies to the whole query. I 
see that you're passing it to comparePartial, but that seems to be 
just future-proofing. What kind of a data type are you envisioning 
that would 


wildspeed module (http://www.sigaev.ru/cvsweb/cvsweb.cgi/wildspeed/) - 
for each key from it's needed to store some info. Right now it's coded 
directly in Datum, but it looks ugly (at least for me).


Ok, I guess it's best to leave it as you had in the patch then.

make use of it? It seems that you could pass the same information in 
the partial key Datum itself that extractQuery returns. You're 
currently using it as a way to avoid some palloc's in 
gin_tsquery_consistent(). That seems like a pretty dirty hack. I doubt 
there's any meaningful performance advantage from that, but if there 
is, I think you could use a statically allocated array instead.


It's easy to un-dirty that hack, but before I'd like to see your 
comments about thoughts above.


Yeah, please revert that hack.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-03-05 Thread Teodor Sigaev
The GIN_EXTRACT_VALUE macro returns a pointer to a static 'entries' 
variable. That doesn't seem safe. Is it really never possible to have to 
 two GIN searches in a plan, both calling and using the value returned 
by extractValue simultaneously? In any case that seems like a pretty 
weak assumption.

Fixed.


You might want to declare extra_data as just void *, instead of an 
array of pointers. The data type implementation might want to store 
something there that's not per-key, but applies to the whole query. I 
see that you're passing it to comparePartial, but that seems to be just 
future-proofing. What kind of a data type are you envisioning that would 


wildspeed module (http://www.sigaev.ru/cvsweb/cvsweb.cgi/wildspeed/) - for each 
key from it's needed to store some info. Right now it's coded directly in Datum, 
but it looks ugly (at least for me).


It's possible to clarify interface by introducing new type:

typedef void* OpaquePtr;

Then, prototypes will be:
extractQuery(..., OpaquePtr* extra_data[])
consistent(, OpaquePtr extra_data[])
comparePartial(..., OpaquePtr extra_data)

Or another option: partial match feature is new for 8.4, so we could change 
interface:

typedef struct KeyData {
bool pmatch,
void *extra_data;
} KeyData;

Datum *extractQuery(Datum query, int32 *nkeys, StrategyNumber n, KeyData* 
data[])
bool consistent(bool check[], StrategyNumber n, Datum query, bool *recheck, 
KeyData data[])

comparePartial(Datum partial_key, Datum key, KeyData *data);

make use of it? It seems that you could pass the same information in the 
partial key Datum itself that extractQuery returns. You're currently 
using it as a way to avoid some palloc's in gin_tsquery_consistent(). 
That seems like a pretty dirty hack. I doubt there's any meaningful 
performance advantage from that, but if there is, I think you could use 
a statically allocated array instead.


It's easy to un-dirty that hack, but before I'd like to see your comments about 
thoughts above.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


btree_gin-0.11.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-03-04 Thread Heikki Linnakangas
The GIN_EXTRACT_VALUE macro returns a pointer to a static 'entries' 
variable. That doesn't seem safe. Is it really never possible to have to 
 two GIN searches in a plan, both calling and using the value returned 
by extractValue simultaneously? In any case that seems like a pretty 
weak assumption.


You might want to declare extra_data as just void *, instead of an 
array of pointers. The data type implementation might want to store 
something there that's not per-key, but applies to the whole query. I 
see that you're passing it to comparePartial, but that seems to be just 
future-proofing. What kind of a data type are you envisioning that would 
make use of it? It seems that you could pass the same information in the 
partial key Datum itself that extractQuery returns. You're currently 
using it as a way to avoid some palloc's in gin_tsquery_consistent(). 
That seems like a pretty dirty hack. I doubt there's any meaningful 
performance advantage from that, but if there is, I think you could use 
a statically allocated array instead.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-02-11 Thread Teodor Sigaev

Looking through the code again, gin_compare_prefix_##type looks a little
confusing.

Is there a reason for using:
  (data-strategy == BTLessStrategyNumber ||
   data-strategy == BTLessEqualStrategyNumber ) ? 
 PointerGetDatum(data-datum) : a

rather than just using:
  PointerGetDatum(data-datum)


Added comments:
/*
 * Datum a is a value from extract_query method and for BTLess*
 * strategy it is a left-most value. So, use original datum from
 * QueryInfo to decide stop scanning on not. Datum b is always
 * from index.
 */



Also, it might be a little less confusing if you used two separate
variables rather than using res for two purposes.

done
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


btree_gin-0.10.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-02-06 Thread Jeff Davis
On Wed, 2009-02-04 at 20:22 +0300, Teodor Sigaev wrote:
  The description of extractQuery's extra_data parameter seems confusing
  too.  AFAICS it is incorrect, or at least misleading, to describe it as
  void ** extra_data[]; it is really void ***extra_data, because there is
  only one object there not an array.
 
 It's really array, see fillScanKey() in ginscan.c, line 61 of patched file. 
 extractQuery could provide data for each returned key.
 

If I understand correctly, the extra_data parameter to extractQuery()
should mean a pointer to an array of void*. void **extra_data[]
might be misinterpreted as an array of void**.

Would void *(*extra_data)[] be more descriptive? I'm not an expert on
style questions like this, but it seems to match the parameter lists for
comparePartial() and consistent().

void ***extra_data seems reasonable, too.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-02-06 Thread Jeff Davis
On Mon, 2009-01-19 at 21:41 +0300, Teodor Sigaev wrote:
  gin_numeric_cmp() can be called from regular SQL. I missed this before,
  but that function will segfault if you call gin_numeric_cmp(NULL, 1) (in
  v0.7 at least).
 
 Fixed, gin_numeric_cmp is marked as strict.
 
  And how does GIN handle SQL NULL values in the column? Does it index
  them at all, or just ignore them?
 SQL NULL: GIN doesn't support it (amindexnulls/amsearchnulls == false)
 C NULL: NULL-numeric could be returned only by gin_extract_query_numeric 
 which 
 cannot be called by user directly because of internal type of argument.
 GIN doesn't do anything with values returned by gin_extract_query_numeric 
 except 
 providing they as an argument for comparing functions.

Looking through the code again, gin_compare_prefix_##type looks a little
confusing.

Is there a reason for using:
  (data-strategy == BTLessStrategyNumber ||
   data-strategy == BTLessEqualStrategyNumber ) ? 
 PointerGetDatum(data-datum) : a
rather than just using:
  PointerGetDatum(data-datum)

Also, it might be a little less confusing if you used two separate
variables rather than using res for two purposes.

Regards,
Jeff Davis




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-02-04 Thread Teodor Sigaev

Looked at this a bit ... do you think it's really a good idea to remove
the strategy number argument of comparePartial?  The argument given in
the docs for it is that it might be needed to determine when to end the
scan, and that still seems plausible to me.
Strategy number is not removed, it's replaced by pointer to opclass-specific 
data on per key basis. Actually, partial match feature is new for 8.4 and there 
is no any compatibility problem. None of existing opclasses (except 
contrib/btree_gin) uses strategy number in comparePartial.





The description of extractQuery's extra_data parameter seems confusing
too.  AFAICS it is incorrect, or at least misleading, to describe it as
void ** extra_data[]; it is really void ***extra_data, because there is
only one object there not an array.


It's really array, see fillScanKey() in ginscan.c, line 61 of patched file. 
extractQuery could provide data for each returned key.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-02-02 Thread Tom Lane
Looked at this a bit ... do you think it's really a good idea to remove
the strategy number argument of comparePartial?  The argument given in
the docs for it is that it might be needed to determine when to end the
scan, and that still seems plausible to me.

The description of extractQuery's extra_data parameter seems confusing
too.  AFAICS it is incorrect, or at least misleading, to describe it as
void ** extra_data[]; it is really void ***extra_data, because there is
only one object there not an array.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-01-19 Thread Alvaro Herrera
Jeff Davis escribió:

 I like the fact that this patch does not modify the numeric ADT. It
 still relies on the fact that the numeric type will never make use of
 the minimal varlena struct, however. I bring this up in case someone
 sees it as a problem.

Greg Stark was working on a patch to make certain values use the short
representation.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-01-19 Thread Teodor Sigaev

still relies on the fact that the numeric type will never make use of
the minimal varlena struct, however. I bring this up in case someone
sees it as a problem.


Greg Stark was working on a patch to make certain values use the short
representation.


Fake value contains only VARHDRSZ bytes.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-01-19 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 I like the fact that this patch does not modify the numeric ADT. It
 still relies on the fact that the numeric type will never make use of
 the minimal varlena struct, however. I bring this up in case someone
 sees it as a problem.

I'm pretty certain I recall Greg Stark recommending that we adopt
something like that as the standard numeric representation of zero.
It didn't get done yet, but it might happen someday.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-01-19 Thread Jeff Davis
On Mon, 2009-01-19 at 11:35 -0500, Tom Lane wrote:
 Jeff Davis pg...@j-davis.com writes:
  I like the fact that this patch does not modify the numeric ADT. It
  still relies on the fact that the numeric type will never make use of
  the minimal varlena struct, however. I bring this up in case someone
  sees it as a problem.
 
 I'm pretty certain I recall Greg Stark recommending that we adopt
 something like that as the standard numeric representation of zero.
 It didn't get done yet, but it might happen someday.
 

Then we should use the previous version of the patch here:

http://archives.postgresql.org/message-id/1231709713.25019.129.ca...@jdavis

Was there any talk of supporting a +/- infinity in numeric? If we did
that, it would allow numeric to be supported for btree-gin.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-01-19 Thread Teodor Sigaev

I'm pretty certain I recall Greg Stark recommending that we adopt
something like that as the standard numeric representation of zero.
It didn't get done yet, but it might happen someday.


Changes:
- use NULL as left-most value. It's safe because NULL numeric value
  cannot be an argument for any function except gin_numeric_cmp and it
  cannot be returned in regular SQL query.
- fix uninstall script for support numeric type.
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


btree_gin-0.8.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-01-19 Thread Jeff Davis
On Mon, 2009-01-19 at 20:15 +0300, Teodor Sigaev wrote:
 Changes:
 - use NULL as left-most value. It's safe because NULL numeric value
cannot be an argument for any function except gin_numeric_cmp and it
cannot be returned in regular SQL query.

gin_numeric_cmp() can be called from regular SQL. I missed this before,
but that function will segfault if you call gin_numeric_cmp(NULL, 1) (in
v0.7 at least).

I know you mean a C NULL, not a SQL NULL, but it reminded me to test SQL
NULL.

And how does GIN handle SQL NULL values in the column? Does it index
them at all, or just ignore them?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-01-19 Thread Teodor Sigaev

gin_numeric_cmp() can be called from regular SQL. I missed this before,
but that function will segfault if you call gin_numeric_cmp(NULL, 1) (in
v0.7 at least).


Fixed, gin_numeric_cmp is marked as strict.


And how does GIN handle SQL NULL values in the column? Does it index
them at all, or just ignore them?

SQL NULL: GIN doesn't support it (amindexnulls/amsearchnulls == false)
C NULL: NULL-numeric could be returned only by gin_extract_query_numeric which 
cannot be called by user directly because of internal type of argument.
GIN doesn't do anything with values returned by gin_extract_query_numeric except 
providing they as an argument for comparing functions.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


btree_gin-0.9.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-01-19 Thread Jeff Davis
On Mon, 2009-01-19 at 21:41 +0300, Teodor Sigaev wrote:
  And how does GIN handle SQL NULL values in the column? Does it index
  them at all, or just ignore them?
 SQL NULL: GIN doesn't support it (amindexnulls/amsearchnulls == false)
 C NULL: NULL-numeric could be returned only by gin_extract_query_numeric 
 which 
 cannot be called by user directly because of internal type of argument.
 GIN doesn't do anything with values returned by gin_extract_query_numeric 
 except 
 providing they as an argument for comparing functions.

Ok, I understand now. I will look at this later.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-01-19 Thread Jeff Davis
On Mon, 2009-01-19 at 21:41 +0300, Teodor Sigaev wrote:
  gin_numeric_cmp() can be called from regular SQL. I missed this before,
  but that function will segfault if you call gin_numeric_cmp(NULL, 1) (in
  v0.7 at least).
 
 Fixed, gin_numeric_cmp is marked as strict.
 
  And how does GIN handle SQL NULL values in the column? Does it index
  them at all, or just ignore them?
 SQL NULL: GIN doesn't support it (amindexnulls/amsearchnulls == false)
 C NULL: NULL-numeric could be returned only by gin_extract_query_numeric 
 which 
 cannot be called by user directly because of internal type of argument.
 GIN doesn't do anything with values returned by gin_extract_query_numeric 
 except 
 providing they as an argument for comparing functions.

Ok, looks good. I updated the wiki to show this as the latest version of
the patch.

Thanks,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-01-18 Thread Jeff Davis
On Fri, 2009-01-16 at 17:42 +0300, Teodor Sigaev wrote:
  Unfortunately, that means numeric is not supportable for btree-gin
  because there is no left-most value from which to start the scan. Do you
  see an easy workaround to support numeric?
 Hmm, let we use minimal varlena struct (with size equal to VARHDRSZ) as 
 left-most (and fake) value. It is never used for any actual argument except 
 compare function, so, new compare function is provided. New version of patch 
 is 
 attached and it based on on your patch (btree-gin.20090111.patch.gz).
 

Looks good to me. I updated the CommitFest wiki to point to this patch.

I like the fact that this patch does not modify the numeric ADT. It
still relies on the fact that the numeric type will never make use of
the minimal varlena struct, however. I bring this up in case someone
sees it as a problem.

Thanks,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-01-16 Thread Teodor Sigaev

I have merged this with HEAD, written a brief document (which is mostly
a copy of the btree-gist page), added the module to the contrib
Makefile, and made some very minor changes. Patch attached.

Thank you



A couple comments:

1. Right now, to implement less than you need to start at the
beginning of the index and scan until you reach the supplied query
datum. This is because GIN doesn't support backwards scans
( http://archives.postgresql.org/pgsql-hackers/2008-07/msg01146.php ).

Unfortunately, that means numeric is not supportable for btree-gin
because there is no left-most value from which to start the scan. Do you
see an easy workaround to support numeric?
Hmm, let we use minimal varlena struct (with size equal to VARHDRSZ) as 
left-most (and fake) value. It is never used for any actual argument except 
compare function, so, new compare function is provided. New version of patch is 
attached and it based on on your patch (btree-gin.20090111.patch.gz).


 2. Why do you create a shell type int32 from btree_gin.sql? I tried
 doing a search-and-replace to use int4 instead, and it seems to work
 fine (and eliminates the warnings). I left it with int32 in my version
 of the patch because I thought you may have some reason for using it.

My mistake, thank you for fix
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


btree_gin-0.7.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2009-01-11 Thread Jeff Davis
On Sun, 2008-12-28 at 23:41 -0800, Jeff Davis wrote:
 2. Why do you create a shell type int32 from btree_gin.sql? I tried
 doing a search-and-replace to use int4 instead, and it seems to work
 fine (and eliminates the warnings). I left it with int32 in my version
 of the patch because I thought you may have some reason for using it.
 

Attached an updated patch with this change.

Regards,
Jeff Davis


btree-gin.20090111.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2008-12-28 Thread Jeff Davis
On Fri, 2008-12-19 at 13:26 +0300, Teodor Sigaev wrote:
 Updated patch.

I have merged this with HEAD, written a brief document (which is mostly
a copy of the btree-gist page), added the module to the contrib
Makefile, and made some very minor changes. Patch attached.

A couple comments:

1. Right now, to implement less than you need to start at the
beginning of the index and scan until you reach the supplied query
datum. This is because GIN doesn't support backwards scans
( http://archives.postgresql.org/pgsql-hackers/2008-07/msg01146.php ).

Unfortunately, that means numeric is not supportable for btree-gin
because there is no left-most value from which to start the scan. Do you
see an easy workaround to support numeric?

2. Why do you create a shell type int32 from btree_gin.sql? I tried
doing a search-and-replace to use int4 instead, and it seems to work
fine (and eliminates the warnings). I left it with int32 in my version
of the patch because I thought you may have some reason for using it.

Regards,
Jeff Davis


btree-gin.20081228.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2008-12-19 Thread Teodor Sigaev

Updated patch.

Ibrar Ahmed wrote:

Hi Teodor Sigaev,

I am getting server crash in contrib regression. May be I am doing 
something wrong here. Regression diff is attached.


BTW these queries work fine outside the regression.

 
--

  Ibrar Ahmed
  EnterpriseDB   http://www.enterprisedb.com







--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


gin_extra-0.2.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2008-12-19 Thread Ibrar Ahmed
Thanks,


On Fri, Dec 19, 2008 at 3:26 PM, Teodor Sigaev teo...@sigaev.ru wrote:

 Updated patch.

 Ibrar Ahmed wrote:

 Hi Teodor Sigaev,

 I am getting server crash in contrib regression. May be I am doing
 something wrong here. Regression diff is attached.

 BTW these queries work fine outside the regression.


 --
  Ibrar Ahmed
  EnterpriseDB   http://www.enterprisedb.com


 



 --
 Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW:
 http://www.sigaev.ru/




-- 
  Ibrar Ahmed
  EnterpriseDB   http://www.enterprisedb.com


[HACKERS] Review: B-Tree emulation for GIN

2008-12-17 Thread Ibrar Ahmed
Hi Teodor Sigaev,

I am getting server crash in contrib regression. May be I am doing something
wrong here. Regression diff is attached.

BTW these queries work fine outside the regression.


-- 
  Ibrar Ahmed
  EnterpriseDB   http://www.enterprisedb.com


regression.diffs
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: B-Tree emulation for GIN

2008-12-17 Thread Teodor Sigaev

will see, may be it's needed to update the patch

Ibrar Ahmed wrote:

Hi Teodor Sigaev,

I am getting server crash in contrib regression. May be I am doing 
something wrong here. Regression diff is attached.


BTW these queries work fine outside the regression.

 
--

  Ibrar Ahmed
  EnterpriseDB   http://www.enterprisedb.com







--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers