Re: [PATCHES] hash index improving v3

2008-09-06 Thread Tom Lane
"Alex Hunsaker" <[EMAIL PROTECTED]> writes:
> - tbm_add_tuples(tbm, &scan->xs_ctup.t_self, 1, false);
> + tbm_add_tuples(tbm, &scan->xs_ctup.t_self, 1, true);

Hah, I bet that explains Jonah's complaint that recheck didn't seem to
be happening in his tests.  Nice catch.

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] hash index improving v3

2008-09-06 Thread Tom Lane
"Alex Hunsaker" <[EMAIL PROTECTED]> writes:
> On Fri, Sep 5, 2008 at 2:21 PM, Alex Hunsaker <[EMAIL PROTECTED]> wrote:
>> Ok now that I made it so it actually *test* collisions, with the patch
>> it always returns all rows that matched the hashed "key".

> And here is the fix, we just forget to set the recheck flag for bitmap scans.

For the convenience of anyone intending to test, here is an updated
patch against CVS HEAD that incorporates Alex's fix.

regards, tom lane



binwQERo3PM5e.bin
Description: hash-v5.patch.gz

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] still alive?

2008-09-06 Thread Bruce Momjian
Abhijit Menon-Sen wrote:
> I thought -patches was supposed to die. What happened?

I was wondering the same thing.  Peter?

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] to_date() validation

2008-09-06 Thread Alex Hunsaker
On Fri, Aug 29, 2008 at 7:39 PM, Brendan Jurd <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> Well, it's been a long time coming, but I've finally got a patch to
> improve the error handling of to_date() and to_timestamp(), as
> previously discussed on -hackers. [1][2]

BTW -patches is obsolete just submit pathces to -hackers.

Im just following this:
http://wiki.postgresql.org/wiki/Reviewing_a_Patch so lets get started.

Submission review: Everything looks good. Tests seem reasonable patch
applies etc.

Usability review: I think both linked threads in the parent mail give
sufficient justification.

Feature test: Everything seems to work as advertised. However before
via sscanf() most limited the max length of the input.  Before they
were just silently ignored now they get added to the result:

patched:
# SELECT to_timestamp('1', 'HHMM');
 to_timestamp
--
 0009-03-19 11:00:00-06:59:56

8.3.3:
# SELECT to_timestamp('1', 'HHMM');
   to_timestamp
---
 0001-11-01 11:00:00-07 BC

Performance review: simple pgbench of to_timestamp did not show any
noticeable differences

Coding review: seemed fine to me, code matched surrounding style,
comments made sense etc

Code review: one minor nit
*** a/src/backend/utils/adt/formatting.c
--- b/src/backend/utils/adt/formatting.c
***
*** 781,787  static const KeyWord DCH_keywords[] = {
{"y", 1, DCH_Y, TRUE, FROM_CHAR_DATE_GREGORIAN},

/* last */
!   {NULL, 0, 0, 0}
  };

  /* --
--- 781,787 
{"y", 1, DCH_Y, TRUE, FROM_CHAR_DATE_GREGORIAN},

/* last */
!   {NULL, 0, 0, 0, 0}
  };

  /* --
***

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]

2008-09-06 Thread Jaime Casanova
On Sun, Aug 31, 2008 at 3:35 PM, Ryan Bradetich <[EMAIL PROTECTED]> wrote:
> Hello all,
>

a few comments.

- i think you have to add some more comments in uint.c file and maybe
a header indicating this is part of the postgresql project or that is
intended to use with postgres or something of the like

- what is uint1? i know int, int2, int4, int8 so i think we should
have uint, uint2, uint4 (maybe uint8?)

>uint-base.tar.bz2  -- The core of the unsigned integer type.

seems there is something wrong in the unlikely macro (i'm using GCC
4.2.3 in Ubuntu 4.2.3-2ubuntu7 with amd64)

postgres=# select -256::uint1;
ERROR:  uint1 out of range
STATEMENT:  select -256::uint1;
ERROR:  uint1 out of range
postgres=# select -255::uint1;
 ?column?
--
 -255
(1 row)

postgres=# select -2::uint1;
 ?column?
--
   -2
(1 row)

postgres=# select -5::uint1 + 30::uint1;
 ?column?
--
   25
(1 row)

>uint-tests.tar.bz2  -- The regression tests.
>

here failed two regression tests but that is because the path

>* Converted build system to use PGXS (more portable).

the Makefile doesn't work here... i have installed postgres 8.3.3 from
ubuntu package and the test env i compile manually (the uint module
tried to install in the ubuntu location while it should in the env
location)

attached a Makefile that fix that

i still have to make some more test...

-- 
regards,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. (593) 87171157


Makefile
Description: Binary data


regression.diffs
Description: Binary data

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]

2008-09-06 Thread Tom Lane
"Jaime Casanova" <[EMAIL PROTECTED]> writes:
> seems there is something wrong in the unlikely macro (i'm using GCC
> 4.2.3 in Ubuntu 4.2.3-2ubuntu7 with amd64)

> postgres=# select -256::uint1;
> ERROR:  uint1 out of range

No, that's just because this is parsed as -(256::uint1)

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)

2008-09-06 Thread Tom Lane
"David Rowley" <[EMAIL PROTECTED]> writes:
> I've made and attached the changes Heikki recommended.

I looked this over a bit and was immediately confused by one thing:
the introductory comment says that the skip table size ought to be based
on the length of the haystack, which makes sense to me, but the code is
actually initializing it on the basis of len2, ie, the length of the
needle.  Isn't that a bug?  Was the same bug present in the tests you
made to determine the best table sizes?

Stylistic issues:

I really dislike having two copies of the heuristic size-setting code.
I think it's worth introducing a second test of use_wchar in order to
arrange text_position_setup like this:

... existing code ...

choose skiptablesize

initialize skip table (this loop doesn't depend on char width)

if (!state->use_wchar)
process char needle
else
process wide-char needle

Also it might be worth having local-variable copies of skiptablesize and
len2 for this initialization loop:

for (ai = 0; ai <= state->skiptablesize; ai++)
state->skiptable[ai] = state->len2;

I'm not at all sure whether a C compiler will think that it's allowed to
avoid fetching these values again on each iteration, seeing that the
loop is storing into other integer members of *state.  Given that this
loop is exactly the overhead we're trying to control by adjusting
skiptablesize, making it as tight as possible seems worthwhile.

> Now that the skip table is a member of TextPositionState, I was not quite
> sure if I should #define a size for it. It would certainly look neater, only
> the code that defines the skip table size in text_position_start assumes 256
> elements. 

I tend to agree that a #define is pointless there, since there's no easy
way to make the relevant code do anything with it.  It would be
worthwhile to point out adjacent to the code what the max length is,
though.  I had gotten as far as rewriting your introductory comment like
this

/*
 * Prepare the skip table for Boyer-Moore-Horspool searching.  First we
 * must determine how big a skip table to use.  The declaration of
 * TextPositionState allows up to 256 elements, but with haystack lengths
 * of only a few characters we don't really want to have to initialize so
 * many elements --- it would take too long in comparison to the actual
 * search time.  So we choose a useful skip table size based on the
 * haystack length.

before noticing that what I was writing wasn't what the code actually
did.  Another point that doesn't seem to be included in your comments is
that the skip table size *must* be a power of 2 because we use bit
masking.  (In fact state->skiptablesize might better be named skiptablemask
since it's really 1 less than the table size.)

BTW, try to avoid introducing so much random vertical space.  IMHO it
does little for readability and much to make routines too long to fit in
one screenful.  Generally the PG code doesn't use double blank lines
anywhere inside a function, nor blank lines before a right brace line.
pg_indent will clean this up to some extent, but it would be better if
the code looked more like the project style in the first place.

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)

2008-09-06 Thread Tom Lane
I wrote:
> I looked this over a bit and was immediately confused by one thing:
> the introductory comment says that the skip table size ought to be based
> on the length of the haystack, which makes sense to me, but the code is
> actually initializing it on the basis of len2, ie, the length of the
> needle.  Isn't that a bug?  Was the same bug present in the tests you
> made to determine the best table sizes?

BTW, to the extent that you feel like testing a different idea,
I would suggest:

* don't bother initializing the skiptable when len1 < len2

* otherwise, choose its size based on len1 - len2, not just len1 or
len2.  This is (one less than) the maximum number of search loop
consultations of the skip table that can happen, so it seems like a
plausible number, and better than either length alone.

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]

2008-09-06 Thread Jaime Casanova
On Sat, Sep 6, 2008 at 3:57 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Jaime Casanova" <[EMAIL PROTECTED]> writes:
>> seems there is something wrong in the unlikely macro (i'm using GCC
>> 4.2.3 in Ubuntu 4.2.3-2ubuntu7 with amd64)
>
>> postgres=# select -256::uint1;
>> ERROR:  uint1 out of range
>
> No, that's just because this is parsed as -(256::uint1)
>

actually, i thought that case is right but the -255::uint1 returning a
negative number (aka -255) is what bothers me


-- 
regards,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. (593) 87171157

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]

2008-09-06 Thread Jaime Casanova
On Sat, Sep 6, 2008 at 3:57 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Jaime Casanova" <[EMAIL PROTECTED]> writes:
>> seems there is something wrong in the unlikely macro (i'm using GCC
>> 4.2.3 in Ubuntu 4.2.3-2ubuntu7 with amd64)
>
>> postgres=# select -256::uint1;
>> ERROR:  uint1 out of range
>
> No, that's just because this is parsed as -(256::uint1)
>

ah! ok, i see the point...

postgres=# select 256::uint1;
ERROR:  uint1 out of range

but is right that way of parsing? so i get a negative number instead
of an error?

-- 
regards,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. (593) 87171157

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)

2008-09-06 Thread David Rowley
Tom Lane wrote:
> I looked this over a bit and was immediately confused by one thing:
> the introductory comment says that the skip table size ought to be based
> on the length of the haystack, which makes sense to me, but the code is
> actually initializing it on the basis of len2, ie, the length of the
> needle.  Isn't that a bug?  Was the same bug present in the tests you
> made to determine the best table sizes?

Yes Bug. That's what I get for making last minute changes. It didn't affect
the benchmark, that was done before moving the code into postgresql for
testing. The function I tested with had an extra parameter to set the skip
table size. Each possible size was tested and the best time was saved along
with the best size.

> BTW, to the extent that you feel like testing a different idea,
> I would suggest:

> * don't bother initializing the skiptable when len1 < len2

Good plan.

> * otherwise, choose its size based on len1 - len2, not just len1 or
> len2.  This is (one less than) the maximum number of search loop
> consultations of the skip table that can happen, so it seems like a
> plausible number, and better than either length alone.

That seems like a better idea. I had considered len1 * len2, giving that's
the worst case for BMH. Of course the lengths would need to be raised quite
a bit. I'll go with len1 - len2

 I'll make the above changes and then run my benchmark again to see if the
skip table size logic should be updated.

I'll also benchmark and update the benchmark spreadsheet to see what's
changed.

David.



-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]

2008-09-06 Thread Tom Lane
"Jaime Casanova" <[EMAIL PROTECTED]> writes:
> On Sat, Sep 6, 2008 at 3:57 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>>> postgres=# select -256::uint1;
>>> ERROR:  uint1 out of range
>> 
>> No, that's just because this is parsed as -(256::uint1)

> actually, i thought that case is right but the -255::uint1 returning a
> negative number (aka -255) is what bothers me

Well, again, that's -(255::uint1).  I suppose uint1 hasn't got a
negation operator (what would it do??), so probably the sequence of
events is to form 255::uint1, then implicitly promote it to some signed
type or other (most likely int4), then negate.  Not much to be done
about this unless you want to get rid of the implicit coercion to signed
types, which would probably defeat most of the purpose.

Now, if (-255)::uint1 fails to throw error, that would be a bug IMHO.
Casting any negative value to uint ought to fail, no?

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] hash index improving v3

2008-09-06 Thread Alex Hunsaker
On Sat, Sep 6, 2008 at 1:09 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>For the convenience of anyone intending to test, here is an updated
>patch against CVS HEAD that incorporates Alex's fix.

Here are the results for a table containing 1 million entries that
will generate hash collisions.  It paints a bad picture for the patch
but then again im not sure how relevant the issue is.  For example
yesterday I imported a table with 10 million collisions and the create
index is still running (now at about ~18 hours).  Maybe we should warn
if there are lots of collisions when creating the index and suggest
you use a btree? Anyway here are the results.

./pgbench -c1 -n -t10 -f bench_createindex.sql
cvs head: tps = 0.002169
v5  : tps = 0.002196

pgbench -c1 -n -t1000 -f bench_bitmap.sql
cvs head: tps = 24.011871
v5:   tps = 2.543123

pgbench -c1 -n -t1000 -f bench_index.sql
cvs head: tps = 51.614502
v5:   tps = 3.205542

pgbench -c1 -n -t1000 -f bench_seqscan.sql
cvs head: tps = 8.553318
v5:   tps = 9.836091

Table created via:
create table test_hash (num int8);
./hash | psql -c 'copy test_hash from stdin;'


bench_create.sql
Description: Binary data


bench_index.sql
Description: Binary data


bench_seqscan.sql
Description: Binary data


int8collide.patch
Description: Binary data
#include 
#include 
#include 

int main(void)
{
	unsigned long y = 0;
	unsigned cnt = 0;

	while(cnt < 100)
	{
		y += UINT_MAX;
		y += 1;

		printf("%ld\n", y);

		cnt++;
	}
}

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] hash index improving v3

2008-09-06 Thread Alex Hunsaker
2008/9/6 Alex Hunsaker <[EMAIL PROTECTED]>:
> pgbench -c1 -n -t1000 -f bench_bitmap.sql
> cvs head: tps = 24.011871
> v5:   tps = 2.543123

oops forgot to attach bench_bitmap.sql


bench_bitmap.sql
Description: Binary data

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]

2008-09-06 Thread Jaime Casanova
On Sat, Sep 6, 2008 at 7:10 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>
> Now, if (-255)::uint1 fails to throw error, that would be a bug IMHO.
> Casting any negative value to uint ought to fail, no?
>

then the patch is right but it seems to me like that is broking the
law of less surprise i expected -2::uint1 to be equivalent to
(-2)::uint1 that should be at least documented, no?

-- 
regards,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. (593) 87171157

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]

2008-09-06 Thread Tom Lane
"Jaime Casanova" <[EMAIL PROTECTED]> writes:
> then the patch is right but it seems to me like that is broking the
> law of less surprise i expected -2::uint1 to be equivalent to
> (-2)::uint1 that should be at least documented, no?

See the precedence table here:
http://www.postgresql.org/docs/8.3/static/sql-syntax-lexical.html#SQL-PRECEDENCE

:: binds more tightly than -, and always has.

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]

2008-09-06 Thread Jaime Casanova
On Sat, Sep 6, 2008 at 3:41 PM, Jaime Casanova
<[EMAIL PROTECTED]> wrote:
>
> i still have to make some more test...
>

why i need the cast in this case? even if the cast is really necesary
(the message seems realy ugly)

contrib_regression=# select * from t1 where f1 > 35;
ERROR:  unsupported type: 16486

contrib_regression=# select * from t1 where f1 > 35::uint4;
 f1
-
 36
 37
 38


-- 
regards,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. (593) 87171157

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches