Re: [PATCHES] hash index work

2005-05-28 Thread Neil Conway

Bruce Momjian wrote:

Neil, I have added these item to the TODO list.  Do you plan on applying
this?


No, I don't have any immediate plans to apply it, as unfortunately I 
didn't see a performance win :-( It's also possible I'm just not 
measuring the right workload, although I don't have time to rerun the 
benchmarks at the moment.


The patch does two things: (1) change hash indexes to only store the 
key's hash value, not the entire key (2) store index elements within a 
hash bucket in order of hash key and search for matches via binary 
search. #1 is definitely a win in some in some circumstances (e.g. 
indexing large fields or types with expensive equality operators), but 
those aren't the common case. I'm surprised that #2 is not a more 
noticeable improvement...


One possibility would be to provide an optional implementation of #1, 
perhaps via an alternate index operator class. That way people could 
select it in those situations in which it is worth using.


-Neil

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PATCHES] psql backslash consistency

2005-05-28 Thread Robert Treat
On Friday 27 May 2005 20:45, Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > On Fri, May 27, 2005 at 04:16:15PM -0400, Tom Lane wrote:
> >> There seems to be a distinct lack of unanimity about that judgment ;-)
> >
> > Well, yes, _across Postgres hackers_.  But if we were to ask
> > pgsql-general I have a feeling we would measure more weight on one side.
>
> Yeah, but which side ;-) ?  I think the pg-general population would have
> a very much higher fraction of people who have no user-defined functions
> and therefore would see no value in \df not showing system functions.
>

Given that a good majority of the system functions aren't even documented, I 
think you'd find it more likely people would sway toward not having the few 
functions they have written not be totally hidden within the vast list of 
system functions that a majority of people will never make use of.  As a 
point of reference, both pgadmin and phppgadmin default to the "hide system 
functions" method and I haven't seen too many complaints.

> If we put in a config variable, that at least lowers the stakes for the
> losing side in the argument about what the default should be.  Without
> that, I think there will be some serious flamewars ahead...
>

I'm not against the idea of a config variable, but this is what, the third or 
fourth go around on this?  It seems rather unfair to put this burden upon the 
current patch writer at this stage of the game  if someone wants to code 
the config option let them, put it shouldn't be a barrier to having the 
current patch be applied.  

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PATCHES] psql backslash consistency

2005-05-28 Thread Tom Lane
Robert Treat <[EMAIL PROTECTED]> writes:
> I'm not against the idea of a config variable, but this is what, the
> third or fourth go around on this?  It seems rather unfair to put this
> burden upon the current patch writer at this stage of the game...

The fact that objections keep being raised should suggest to you that
the idea is not uncontroversial.  I think it's necessary to look for a
compromise that everyone can live with.  You're really wasting your
breath to repeat the same arguments over and over and expect that
anyone's mind will change.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PATCHES] hash index work

2005-05-28 Thread Tom Lane
Neil Conway <[EMAIL PROTECTED]> writes:
> The patch does two things: (1) change hash indexes to only store the 
> key's hash value, not the entire key (2) store index elements within a 
> hash bucket in order of hash key and search for matches via binary 
> search. #1 is definitely a win in some in some circumstances (e.g. 
> indexing large fields or types with expensive equality operators), but 
> those aren't the common case. I'm surprised that #2 is not a more 
> noticeable improvement...

It occurs to me that change #1 makes it cheaper to skip over index
entries that are in the same bucket but don't have the exact same hash
code; but it makes it significantly more expensive to skip over entries
that have the same hash code but aren't really equal to the key being
sought (since you have to visit the heap to find out they aren't equal).
Maybe your test workload had enough occurrences of the latter case to
balance out the win from the former.

I think it would be worth investigating a variant in which the index
stores both the hash code and the key value.  This allows cheap
elimination of non-matching hash codes, and binary sorting of index
entries, without adding any extra trips to the heap.  The downside is
that it makes the index larger so you incur more I/O there --- so this
might not be a win either.

> One possibility would be to provide an optional implementation of #1, 
> perhaps via an alternate index operator class. That way people could 
> select it in those situations in which it is worth using.

I think it would definitely be a good idea to make the lossy behavior
optional.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PATCHES] psql backslash consistency

2005-05-28 Thread Robert Treat
On Saturday 28 May 2005 11:12, Tom Lane wrote:
> Robert Treat <[EMAIL PROTECTED]> writes:
> > I'm not against the idea of a config variable, but this is what, the
> > third or fourth go around on this?  It seems rather unfair to put this
> > burden upon the current patch writer at this stage of the game...
>
> The fact that objections keep being raised should suggest to you that
> the idea is not uncontroversial.  I think it's necessary to look for a
> compromise that everyone can live with.  You're really wasting your
> breath to repeat the same arguments over and over and expect that
> anyone's mind will change.
>

I haven't heard a new objection yet that was discussed the previous several go 
arounds, and yet here we are adding yet another precondition...

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PATCHES] [HACKERS] patches for items from TODO list

2005-05-28 Thread Bruce Momjian
Here is an updated version of the COPY \x patch.  It is the first patch
attached.

Also, I realized that if we support \x in COPY, we should also support
\x in strings to the backend.  This is the second patch.

Third, I found out that psql has some unusual handling of escaped
numbers.  Instead of using \ddd as octal, it has \ddd is decimal, \0ddd
is octal, and \0xddd is decimal.  It is basically following the strtol()
rules for an escaped value.  This seems confusing and contradicts how
the rest of our system works. I looked at 'bash' and found that it
supports the \000 and \x00 just like C, so I am confused why we have
the current behavior.  This patch makes psql consistent with the rest of
our system for back slashes.  It does break backward compatibility.  It
wouldn't be a big deal to fix, except we document this in the psql
manual page, and that adds confusion.

FYI, here is the current psql behavior:

test=> \set x '\42'
test=> \echo :x
*
test=> \set x '\042'
test=> \echo :x
"
test=> \set x '\0x42'
test=> \echo :x
B

The new behavior is:

test=> \set x '\42'
test=> \echo :x
"
test=> \set x '\042'
test=> \echo :x
"
test=> \set x '\x42'
test=> \echo :x
B

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073
Index: doc/src/sgml/ref/copy.sgml
===
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.65
diff -c -c -r1.65 copy.sgml
*** doc/src/sgml/ref/copy.sgml  7 May 2005 02:22:45 -   1.65
--- doc/src/sgml/ref/copy.sgml  28 May 2005 14:02:59 -
***
*** 424,436 
 Backslash followed by one to three octal digits specifies
 the character with that numeric code

   
  
 
  
! Presently, COPY TO will never emit an octal-digits
! backslash sequence, but it does use the other sequences listed above
! for those control characters.
 
  
 
--- 424,441 
 Backslash followed by one to three octal digits specifies
 the character with that numeric code

+   
+\xdigits
+Backslash x followed by one or two hex digits 
specifies
+the character with that numeric code
+   
   
  
 
  
! Presently, COPY TO will never emit an octal or 
! hex-digits backslash sequence, but it does use the other sequences
! listed above for those control characters.
 
  
 
Index: src/backend/commands/copy.c
===
RCS file: /cvsroot/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.244
diff -c -c -r1.244 copy.c
*** src/backend/commands/copy.c 7 May 2005 02:22:46 -   1.244
--- src/backend/commands/copy.c 28 May 2005 14:03:07 -
***
*** 2274,2279 
--- 2274,2291 
return result;
  }
  
+ /*
+  *Return decimal value for a hexadecimal digit
+  */
+ static
+ int GetDecimalFromHex(char hex)
+ {
+   if (isdigit(hex))
+   return hex - '0';
+   else
+   return tolower(hex) - 'a' + 10;
+ }
+ 
  /*--
   * Read the value of a single attribute, performing de-escaping as needed.
   *
***
*** 2335,2340 
--- 2347,2353 
case '5':
case '6':
case '7':
+   /* handle \013 */
{
int val;
  
***
*** 2360,2365 
--- 2373,2402 
c = val & 0377;
}
break;
+   case 'x':
+   /* Handle \x3F */
+   if (line_buf.cursor < line_buf.len)
+   {
+   char hexchar = 
line_buf.data[line_buf.cursor];
+ 
+   if (isxdigit(hexchar))
+   {
+   int val = 
GetDecimalFromHex(hexchar);
+ 
+   line_buf.cursor++;
+   if (line_buf.cursor < 
line_buf.len)
+   {
+   hexchar = 
line_buf.data[line_buf.cursor];
+  

Re: [PATCHES] [HACKERS] patches for items from TODO list

2005-05-28 Thread Tom Lane
Bruce Momjian  writes:
> Here is an updated version of the COPY \x patch.  It is the first patch
> attached.
> Also, I realized that if we support \x in COPY, we should also support
> \x in strings to the backend.  This is the second patch.

Do we really want to do any of these things?  We've been getting beaten
up recently about the fact that we have non-SQL-spec string escapes
(ie, all the backslash stuff) so I'm a bit dubious about adding more,
especially when there's little if any demand for it.

I don't object too much to the COPY addition, since that's outside any
spec anyway, but I do think we ought to think twice about adding this
to SQL literal handling.

> Third, I found out that psql has some unusual handling of escaped
> numbers.  Instead of using \ddd as octal, it has \ddd is decimal, \0ddd
> is octal, and \0xddd is decimal.  It is basically following the strtol()
> rules for an escaped value.  This seems confusing and contradicts how
> the rest of our system works.

I agree, that's just going to confuse people.

> ! xqescape[\\][^0-7x]

If you are going to insist on this, at least make it case-insensitive.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly