Re: [HACKERS] Can postgres create a file with physically continuous blocks.

2010-12-21 Thread Heikki Linnakangas

On 22.12.2010 09:25, Rob Wultsch wrote:

On Wed, Dec 22, 2010 at 12:15 AM, Heikki Linnakangas
  wrote:

Hmm, innodb_autoextend_increment seems more like what we're discussing here
(http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#sysvar_innodb_autoextend_increment).
If I'm reading that correctly, InnoDB defaults to extending files in 8MB
chunks.


This is not a pure apples-to-apples comparison, as InnoDB does direct I/O;
however, doesn't the checkpoint completion target code call fsync repeatedly
in order to achieve the checkpoint completion target?


It only fsync's each file once. If there's a lot of files, it needs to 
issue a lot of fsync's, but for different files.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Can postgres create a file with physically continuous blocks.

2010-12-21 Thread Rob Wultsch
On Wed, Dec 22, 2010 at 12:15 AM, Heikki Linnakangas
 wrote:
> On 22.12.2010 03:45, Rob Wultsch wrote:
>>
>> On Tue, Dec 21, 2010 at 4:49 AM, Robert Haas
>>  wrote:
>>>
>>> On Sun, Dec 19, 2010 at 1:10 PM, Jim Nasby  wrote:

 On Dec 19, 2010, at 1:10 AM, flyusa2010 fly wrote:
>
> Does postgres make an effort to create a file with physically
> continuous blocks?

 AFAIK all files are expanded as needed. I don't think there are any flags
 you can pass to the filesystem to tell it "this file will eventually be 1GB
 in size". So, we're basically at the mercy of the FS to try and keep things
 contiguous.
>>>
>>> There have been some reports that we would do better on some
>>> filesystems if we extended files by more than one block at a time,
>>> rather than a single block as we do today.  However, AFAIK, no one is
>>> pursuing this ATM.
>>
>> This has been found to be the case in the MySQL world, particularly
>> when ext3 is in use:
>> http://forge.mysql.com/worklog/task.php?id=4925
>> http://www.facebook.com/note.php?note_id=194501560932
>
> These seem to be about extending the transaction log, and we already
> pre-allocate the WAL. The WAL is repeatedly fsync'd, so I can understand
> that extending that in small chunks would hurt performance a lot, as the
> filesystem needs to flush the metadata changes to disk at every commit.
> However, that's not an issue with extending data files, they are only
> fsync'd at checkpoints.
>
> It might well be advantageous to extend data files in larger chunks too, but
> it's probably nowhere near as important as with the WAL.

Agree.

>> Also, InnoDB has an option for how much data should be allocated at
>> the end of a tablespace when it needs to grow:
>>
>> http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#sysvar_innodb_data_file_path
>
> Hmm, innodb_autoextend_increment seems more like what we're discussing here
> (http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#sysvar_innodb_autoextend_increment).
> If I'm reading that correctly, InnoDB defaults to extending files in 8MB
> chunks.

This is not a pure apples-to-apples comparison, as InnoDB does direct I/O;
however, doesn't the checkpoint completion target code call fsync repeatedly
in order to achieve the checkpoint completion target? And for that matter,
haven't there been recent discussions on hackers about calling fsync more
often?

Sorry for the loopy email. I have not been getting anywhere near
enough sleep recently :(
-- 
Rob Wultsch
wult...@gmail.com



Re: [HACKERS] How much do the hint bits help?

2010-12-21 Thread Heikki Linnakangas

On 22.12.2010 02:56, Merlin Moncure wrote:

On Tue, Dec 21, 2010 at 7:45 PM, Tom Lane  wrote:

Merlin Moncure  writes:

Attached is an incomplete patch disabling hint bits based on compile
switch. ...
So far, at least doing pgbench runs and another test designed to
exercise clog lookups, the performance loss of always doing full
lookup hasn't materialized.


The standard pgbench test would be just about 100% useless for stressing
this, because its net database activity is only about one row
touched/updated per query.  You need a test case that hits lots of rows
per query, else you're just measuring parse+plan+network overhead.


right -- see the attached clog_stress.sql above.  It creates a script
that inserts records in blocks of 1, deletes half of them, and
vacuums.  Neither the execution of the script nor a seq scan following
its execution showed an interesting performance difference (which I am
arbitrarily calling 5% in either direction).  Like I said though, I
don't trust the patch or the results yet.


Make sure you have a good mix of different xids in the table;
TransactionLogFetch has a one-item cache, so repeatedly checking the same
xid is much faster than the general case.


Perhaps run pgbench for a while, and then do "SELECT COUNT(*)" on the 
resulting tables.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Can postgres create a file with physically continuous blocks.

2010-12-21 Thread Heikki Linnakangas

On 22.12.2010 03:45, Rob Wultsch wrote:

On Tue, Dec 21, 2010 at 4:49 AM, Robert Haas  wrote:

On Sun, Dec 19, 2010 at 1:10 PM, Jim Nasby  wrote:

On Dec 19, 2010, at 1:10 AM, flyusa2010 fly wrote:

Does postgres make an effort to create a file with physically continuous blocks?


AFAIK all files are expanded as needed. I don't think there are any flags you can pass to 
the filesystem to tell it "this file will eventually be 1GB in size". So, we're 
basically at the mercy of the FS to try and keep things contiguous.


There have been some reports that we would do better on some
filesystems if we extended files by more than one block at a time,
rather than a single block as we do today.  However, AFAIK, no one is
pursuing this ATM.


This has been found to be the case in the MySQL world, particularly
when ext3 is in use:
http://forge.mysql.com/worklog/task.php?id=4925
http://www.facebook.com/note.php?note_id=194501560932


These seem to be about extending the transaction log, and we already 
pre-allocate the WAL. The WAL is repeatedly fsync'd, so I can understand 
that extending that in small chunks would hurt performance a lot, as the 
filesystem needs to flush the metadata changes to disk at every commit. 
However, that's not an issue with extending data files, they are only 
fsync'd at checkpoints.


It might well be advantageous to extend data files in larger chunks too, 
but it's probably nowhere near as important as with the WAL.



Also, InnoDB has an option for how much data should be allocated at
the end of a tablespace when it needs to grow:
http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#sysvar_innodb_data_file_path


Hmm, innodb_autoextend_increment seems more like what we're discussing 
here 
(http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#sysvar_innodb_autoextend_increment). 
If I'm reading that correctly, InnoDB defaults to extending files in 8MB 
chunks.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] bug in SignalSomeChildren

2010-12-21 Thread Fujii Masao
On Wed, Dec 22, 2010 at 12:14 PM, Tom Lane  wrote:
> Fujii Masao  writes:
>> How about doing target != ALL test at the head for the most common case
>> (target == ALL)?
>
> That's an idea, but the test you propose implements it incorrectly.

Thanks! I revised the patch.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


signal-some-children-v3.patch
Description: Binary data



Re: [HACKERS] The cost of visibillity testing? (gin-search)

2010-12-21 Thread Jesper Krogh

On 2010-12-21 21:28, Andres Freund wrote:

On Tuesday 21 December 2010 20:25:16 Jesper Krogh wrote:
   

What have I missed in the logic?
 

A reproducible testcase ;-)
   

Yes, I did a complete dump/restore of the dataset and the numbers
looked as expected. So table bloat seems to be the problem/challenge.

I must have hit a strange situation where my table bloat was
proportionally higher than my GIN index bloat.

Jesper

--
Jesper



Re: [HACKERS] plperlu problem with utf8

2010-12-21 Thread Alex Hunsaker
On Mon, Dec 20, 2010 at 00:39, Alex Hunsaker  wrote:

> In further review over caffeine this morning I noticed there are a few
> places I missed: plperl_build_tuple_result(), plperl_modify_tuple()
> and Util.XS.

And here is v3, which fixes the above and also makes sure to properly
encode/decode SPI arguments.  Tested on a latin1 database with latin1
columns and on utf8 with utf8 columns.  Also passes make installcheck (of
course) and changes one or two things to make plperl.c warning-free.


plperl_enc_v3.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] strncmp->memcmp when we know the shorter length

2010-12-21 Thread Tom Lane
Robert Haas  writes:
> On Tue, Dec 21, 2010 at 10:24 PM, Tom Lane  wrote:
>> I'm fairly uncomfortable about the broad swath and low return of this
>> patch.  Noah is assuming that none of these places are relying on
>> strncmp to stop short upon finding a null, and I don't believe that
>> that's a safe assumption in every single place.  Nor do I believe that
>> it's worth the effort of trying to prove it safe in most of those
>> places.

> Eh, I already committed somewhat more than that.  I did think about
> the concern which you raise.

Okay ... I was arguing for not bothering to expend that effort, but
since you already did, it's a moot point.

regards, tom lane



Re: [HACKERS] bug in ts_rank_cd

2010-12-21 Thread Tom Lane
Sushant Sinha  writes:
> There is a bug in ts_rank_cd. It does not correctly give rank when the
> query lexeme is the first one in the tsvector.

Hmm ... I cannot reproduce the behavior you're complaining of.
You say

> select ts_rank_cd(to_tsvector('english', 'abc sdd'),
> plainto_tsquery('english', 'abc'));   
>  ts_rank_cd 
> ------------
>           0

but I get

regression=# select ts_rank_cd(to_tsvector('english', 'abc sdd'),
regression(# plainto_tsquery('english', 'abc'));   
 ts_rank_cd 
------------
        0.1
(1 row)

> The problem is that the Cover finding algorithm ignores the lexeme at
> the 0th position,

As far as I can tell, there is no "0th position" --- tsvector counts
positions from one.  The only way to see pos == 0 in the input to
Cover() is if the tsvector has been stripped of position information.
ts_rank_cd is documented to return 0 in that situation.  Your patch
would have the effect of causing it to return some nonzero, but quite
bogus, ranking.

regards, tom lane



Re: [HACKERS] CommitFest wrap-up

2010-12-21 Thread Robert Haas
On Tue, Dec 21, 2010 at 11:12 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, Dec 15, 2010 at 11:29 AM, Tom Lane  wrote:
 - Writeable CTEs - I think we need Tom to pick this one up.
 - Fix snapshot taking inconsistencies - Ready for committer. Can any
 committer pick this up?
>
>>> Will take a look at these two also.
>
>> Tom, what is your time frame on this?  I think we should wrap up the
>> CF without these and bundle 9.1alpha3 unless you plan to get to this
>> in the next day or two.
>
> We probably shouldn't hold up the alpha for these, if there are no
> other items outstanding.

OK.  I've moved them to the next CommitFest and marked this one closed.

*bangs gavel*

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] strncmp->memcmp when we know the shorter length

2010-12-21 Thread Robert Haas
On Tue, Dec 21, 2010 at 10:24 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> If it's done properly, I don't see how this would be a risk.
>
> I'm fairly uncomfortable about the broad swath and low return of this
> patch.  Noah is assuming that none of these places are relying on
> strncmp to stop short upon finding a null, and I don't believe that
> that's a safe assumption in every single place.  Nor do I believe that
> it's worth the effort of trying to prove it safe in most of those
> places.
>
> I think this might be a good idea in the varchar.c and varlena.c calls,
> but I'd be inclined to leave the rest of the calls alone.

Eh, I already committed somewhat more than that.  I did think about
the concern which you raise.  It seems pretty clear that's not a
danger in readfuncs.c.  In the hstore and ltree cases, at least at
first blush, it appears to me that it would be downright broken for
someone to be counting on a null to terminate the comparison.  The
intent of these bits of code appears to be to do an equality comparison on
a string stored as a byte count + a byte string, rather than a
null-terminated cstring, so unless I'm misunderstanding something it's
more likely that the use of strncmp() would lead to a bug; the prior
coding doesn't look like it would be correct if NUL bytes were
possible.  The tsearch cases also appear to be safe in this regard,
but since I decided against committing those on other grounds I
haven't looked at them as carefully.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] strncmp->memcmp when we know the shorter length

2010-12-21 Thread Tom Lane
Robert Haas  writes:
> If it's done properly, I don't see how this would be a risk.

I'm fairly uncomfortable about the broad swath and low return of this
patch.  Noah is assuming that none of these places are relying on
strncmp to stop short upon finding a null, and I don't believe that
that's a safe assumption in every single place.  Nor do I believe that
it's worth the effort of trying to prove it safe in most of those
places.

I think this might be a good idea in the varchar.c and varlena.c calls,
but I'd be inclined to leave the rest of the calls alone.

regards, tom lane



Re: [HACKERS] strncmp->memcmp when we know the shorter length

2010-12-21 Thread Robert Haas
On Tue, Dec 21, 2010 at 6:24 PM, Robert Haas  wrote:
> On Mon, Dec 20, 2010 at 1:10 PM, Noah Misch  wrote:
>> When the caller knows the smaller string length, memcmp and strncmp are
>> functionally equivalent.  Since memcmp need not watch each byte for a NULL
>> terminator, it often compares a CPU word at a time for better performance.  
>> The
>> attached patch changes use of strncmp to memcmp where we have the length of 
>> the
>> shorter string.  I was most interested in the varlena.c instances, but I 
>> tried
>> to find all applicable call sites.  To benchmark it, I used the attached
>> "bench-texteq.sql".  This patch improved my 5-run average timing of the 
>> SELECT
>> from 65.8s to 56.9s, a 13% improvement.  I can't think of a case where the
>> change should be pessimal.
>
> This is a good idea.  I will check this over and commit it.

A little benchmarking reveals that on my system (MacOS X 10.6.5) it
appears that strncmp() is faster for a 4 character string, but
memcmp() is faster for a 5+ character string.  So I think most of
these are pretty clear wins, but I have reverted the changes to
src/backend/tsearch because I'm not entirely confident that lexemes
and affixes will be long enough on average for this to be a win there.
 Please feel free to resubmit that part with performance results
showing that it works out to a win.  Some of the ltree changes
produced compiler warnings, so I omitted those also.  Committed the
rest.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] bug in SignalSomeChildren

2010-12-21 Thread Tom Lane
Fujii Masao  writes:
> How about doing target != ALL test at the head for the most common case
> (target == ALL)?

That's an idea, but the test you propose implements it incorrectly.

regards, tom lane



Re: [HACKERS] How much do the hint bits help?

2010-12-21 Thread Mark Kirkwood

On 22/12/10 13:56, Merlin Moncure wrote:

On Tue, Dec 21, 2010 at 7:45 PM, Tom Lane  wrote:

@Mark: apparently the cvs server is behind git and there are some
recent changes to heapam.c that need more attention.  I need to get
git going on my box, but try changing this:

if ((tuple->t_infomask&  HEAP_XMIN_COMMITTED) ||
(!(tuple->t_infomask&  HEAP_XMIN_COMMITTED)&&
!(tuple->t_infomask&  HEAP_XMIN_INVALID)&&
TransactionIdDidCommit(xmin)))

to this:

if (TransactionIdDidCommit(xmin))

also, isn't the extra check vs HEAP_XMIN_COMMITTED redundant, and if
you do have to look up clog, why not set the hint bit?



That gets it compiling.




Re: [HACKERS] strncmp->memcmp when we know the shorter length

2010-12-21 Thread Gurjeet Singh
On Tue, Dec 21, 2010 at 9:01 PM, Robert Haas  wrote:

> On Tue, Dec 21, 2010 at 8:29 PM, Gurjeet Singh 
> wrote:
> > On Tue, Dec 21, 2010 at 6:24 PM, Robert Haas 
> wrote:
> >>
> >> On Mon, Dec 20, 2010 at 1:10 PM, Noah Misch  wrote:
> >> > When the caller knows the smaller string length, memcmp and strncmp
> are
> >> > functionally equivalent.  Since memcmp need not watch each byte for a
> >> > NULL
> >> > terminator, it often compares a CPU word at a time for better
> >> > performance.  The
> >> > attached patch changes use of strncmp to memcmp where we have the
> length
> >> > of the
> >> > shorter string.  I was most interested in the varlena.c instances, but
> I
> >> > tried
> >> > to find all applicable call sites.  To benchmark it, I used the
> attached
> >> > "bench-texteq.sql".  This patch improved my 5-run average timing of
> the
> >> > SELECT
> >> > from 65.8s to 56.9s, a 13% improvement.  I can't think of a case where
> >> > the
> >> > change should be pessimal.
> >>
> >> This is a good idea.  I will check this over and commit it.
> >
> > Doesn't this risk accessing bytes beyond the shorter string?
>
> If it's done properly, I don't see how this would be a risk.
>
> > Look at the
> > warning above the StrNCpy(), for example.
>
> If you're talking about this comment:
>
>  *  BTW: when you need to copy a non-null-terminated string (like a
> text
>  *  datum) and add a null, do not do it with StrNCpy(..., len+1).  That
>  *  might seem to work, but it fetches one byte more than there is in
> the
>  *  text object.
>
> ...then that's not applicable here.  It's perfectly safe to compare two
> strings of length n using an n-byte memcmp().  The bytes being
> compared are 0 through n - 1; the terminating null is in byte n, or
> else it isn't, but memcmp() certainly isn't going to look at it.
>
>
I missed the part where Noah said "... where we have the length of the
*shorter* string". I agree we are safe here.

Regards,
-- 
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh.gurj...@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device


Re: [HACKERS] strncmp->memcmp when we know the shorter length

2010-12-21 Thread Robert Haas
On Tue, Dec 21, 2010 at 8:29 PM, Gurjeet Singh  wrote:
> On Tue, Dec 21, 2010 at 6:24 PM, Robert Haas  wrote:
>>
>> On Mon, Dec 20, 2010 at 1:10 PM, Noah Misch  wrote:
>> > When the caller knows the smaller string length, memcmp and strncmp are
>> > functionally equivalent.  Since memcmp need not watch each byte for a
>> > NULL
>> > terminator, it often compares a CPU word at a time for better
>> > performance.  The
>> > attached patch changes use of strncmp to memcmp where we have the length
>> > of the
>> > shorter string.  I was most interested in the varlena.c instances, but I
>> > tried
>> > to find all applicable call sites.  To benchmark it, I used the attached
>> > "bench-texteq.sql".  This patch improved my 5-run average timing of the
>> > SELECT
>> > from 65.8s to 56.9s, a 13% improvement.  I can't think of a case where
>> > the
>> > change should be pessimal.
>>
>> This is a good idea.  I will check this over and commit it.
>
> Doesn't this risk accessing bytes beyond the shorter string?

If it's done properly, I don't see how this would be a risk.

> Look at the
> warning above the StrNCpy(), for example.

If you're talking about this comment:

 *  BTW: when you need to copy a non-null-terminated string (like a text
 *  datum) and add a null, do not do it with StrNCpy(..., len+1).  That
 *  might seem to work, but it fetches one byte more than there is in the
 *  text object.

...then that's not applicable here.  It's perfectly safe to compare two
strings of length n using an n-byte memcmp().  The bytes being
compared are 0 through n - 1; the terminating null is in byte n, or
else it isn't, but memcmp() certainly isn't going to look at it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Can postgres create a file with physically continuous blocks.

2010-12-21 Thread Rob Wultsch
On Tue, Dec 21, 2010 at 4:49 AM, Robert Haas  wrote:
> On Sun, Dec 19, 2010 at 1:10 PM, Jim Nasby  wrote:
>> On Dec 19, 2010, at 1:10 AM, flyusa2010 fly wrote:
>>> Does postgres make an effort to create a file with physically continuous 
>>> blocks?
>>
>> AFAIK all files are expanded as needed. I don't think there are any flags you 
>> can pass to the filesystem to tell it "this file will eventually be 1GB in 
>> size". So, we're basically at the mercy of the FS to try and keep things 
>> contiguous.
>
> There have been some reports that we would do better on some
> filesystems if we extended files by more than one block at a time,
> rather than a single block as we do today.  However, AFAIK, no one is
> pursuing this ATM.
>


This has been found to be the case in the MySQL world, particularly
when ext3 is in use:
http://forge.mysql.com/worklog/task.php?id=4925
http://www.facebook.com/note.php?note_id=194501560932


Also, InnoDB has an option for how much data should be allocated at
the end of a tablespace when it needs to grow:
http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#sysvar_innodb_data_file_path

-- 
Rob Wultsch
wult...@gmail.com



Re: [HACKERS] strncmp->memcmp when we know the shorter length

2010-12-21 Thread Gurjeet Singh
On Tue, Dec 21, 2010 at 6:24 PM, Robert Haas  wrote:

> On Mon, Dec 20, 2010 at 1:10 PM, Noah Misch  wrote:
> > When the caller knows the smaller string length, memcmp and strncmp are
> > functionally equivalent.  Since memcmp need not watch each byte for a
> NULL
> > terminator, it often compares a CPU word at a time for better
> performance.  The
> > attached patch changes use of strncmp to memcmp where we have the length
> of the
> > shorter string.  I was most interested in the varlena.c instances, but I
> tried
> > to find all applicable call sites.  To benchmark it, I used the attached
> > "bench-texteq.sql".  This patch improved my 5-run average timing of the
> SELECT
> > from 65.8s to 56.9s, a 13% improvement.  I can't think of a case where
> the
> > change should be pessimal.
>
> This is a good idea.  I will check this over and commit it.
>

Doesn't this risk accessing bytes beyond the shorter string? Look at the
warning above the StrNCpy(), for example.

Regards,
-- 
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh.gurj...@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device


Re: [HACKERS] Patch BUG #5103: "pg_ctl -w (re)start" fails with custom unix_socket_directory

2010-12-21 Thread Alvaro Herrera
Excerpts from Quan Zongliang's message of Tue Dec 21 18:36:11 -0300 2010:
> On Mon, 29 Nov 2010 10:29:17 -0300
> Alvaro Herrera  wrote:
> 

> > I think the way this should work is that you call postmaster with a new
> > switch and it prints out its configuration, after reading the
> > appropriate config file(s).  That way it handles all the little details
> > such as figuring out the correct config file, handling include files, etc.
> > This output would presumably be easier to parse and more trustworthy.
> 
> Sorry for my late reply.
> 
> I will check the source of postmaster.

Actually Bruce Momjian is now working on a different fix:
unix_socket_directory would be added to postmaster.pid, allowing pg_ctl
to find it.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] bug in SignalSomeChildren

2010-12-21 Thread Fujii Masao
On Sat, Dec 18, 2010 at 1:00 AM, Robert Haas  wrote:
> On Fri, Dec 17, 2010 at 10:27 AM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> I think the attached might be a little tidier.  Thoughts?
>>
>> I'm not really thrilled at the idea of calling
>> IsPostmasterChildWalSender for every child whether or not it will have
>> any impact on the decision.  That involves touching shared memory which
>> can be rather expensive (see previous discussions about shared cache
>> lines and so forth).
>
> The existing code already does that, unless I'm missing something.  We
> could improve on my proposed patch a bit by doing the is_autovacuum
> test first and the walsender test second.  I'm not sure how to improve
> on it beyond that.

How about doing target != ALL test at the head for the most common case
(target == ALL)? I added that test into your patch and changed it so that the
is_autovacuum test is done first.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


signal-some-children-v2.patch
Description: Binary data



Re: [HACKERS] How much do the hint bits help?

2010-12-21 Thread Merlin Moncure
On Tue, Dec 21, 2010 at 7:45 PM, Tom Lane  wrote:
> Merlin Moncure  writes:
>> Attached is an incomplete patch disabling hint bits based on compile
>> switch. ...
>> So far, at least doing pgbench runs and another test designed to
>> exercise clog lookups, the performance loss of always doing full
>> lookup hasn't materialized.
>
> The standard pgbench test would be just about 100% useless for stressing
> this, because its net database activity is only about one row
> touched/updated per query.  You need a test case that hits lots of rows
> per query, else you're just measuring parse+plan+network overhead.

right -- see the attached clog_stress.sql above.  It creates a script
that inserts records in blocks of 1, deletes half of them, and
vacuums.  Neither the execution of the script nor a seq scan following
its execution showed an interesting performance difference (which I am
arbitrarily calling 5% in either direction).  Like I said though, I
don't trust the patch or the results yet.

@Mark: apparently the cvs server is behind git and there are some
recent changes to heapam.c that need more attention.  I need to get
git going on my box, but try changing this:

if ((tuple->t_infomask & HEAP_XMIN_COMMITTED) ||
(!(tuple->t_infomask & HEAP_XMIN_COMMITTED) &&
 !(tuple->t_infomask & HEAP_XMIN_INVALID) &&
 TransactionIdDidCommit(xmin)))

to this:

if (TransactionIdDidCommit(xmin))

also, isn't the extra check vs HEAP_XMIN_COMMITTED redundant, and if
you do have to look up clog, why not set the hint bit?

merlin



Re: [HACKERS] How much do the hint bits help?

2010-12-21 Thread Tom Lane
Merlin Moncure  writes:
> Attached is an incomplete patch disabling hint bits based on compile
> switch. ...
> So far, at least doing pgbench runs and another test designed to
> exercise clog lookups, the performance loss of always doing full
> lookup hasn't materialized.

The standard pgbench test would be just about 100% useless for stressing
this, because its net database activity is only about one row
touched/updated per query.  You need a test case that hits lots of rows
per query, else you're just measuring parse+plan+network overhead.

regards, tom lane



Re: [HACKERS] How much do the hint bits help?

2010-12-21 Thread Merlin Moncure
On Tue, Dec 21, 2010 at 7:20 PM, Merlin Moncure  wrote:
> On Tue, Dec 21, 2010 at 7:06 PM, Mark Kirkwood
>  wrote:
>> On 22/12/10 13:05, Mark Kirkwood wrote:
>>>
>>> On 22/12/10 11:42, Merlin Moncure wrote:

 Attached is an incomplete patch disabling hint bits based on compile
 switch.  It's not complete, for example it's not reconciling some
 assumptions in heapam.c that hint bits have been set in various
 routines.  However, it mostly passes regression and I deemed it good
 enough to run some preliminary benchmarks and fool around.  Obviously,
 hint bits are an annoying impediment to a couple of other cool pending
 features, and it certainly would be nice to operate without them.
 Also, for particular workloads, the extra i/o hint bits can cause a
 fair amount of pain.
>>>
>>> Looks like a great idea to test, however I don't seem to be able to
>>> compile with it applied: (set #define DISABLE_HINT_BITS 1 at the end of
>>> src/include/pg_config_manual.h)
>>>
>>> gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
>>> -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g
>>> -I../../../../src/include -D_GNU_SOURCE -c -o heapam.o heapam.c
>>> heapam.c: In function ‘HeapTupleHeaderAdvanceLatestRemovedXid’:
>>> heapam.c:3867: error: ‘HEAP_XMIN_COMMITTED’ undeclared (first use in this
>>> function)
>>> heapam.c:3867: error: (Each undeclared identifier is reported only once
>>> heapam.c:3867: error: for each function it appears in.)
>>> heapam.c:3869: error: ‘HEAP_XMIN_INVALID’ undeclared (first use in this
>>> function)
>>> make[4]: *** [heapam.o] Error 1
>>>
>>
>> Arrg, sorry - against git head on Ubuntu 10.04 (gcc 4.4.3)
>
> did you check to see if the patch applied clean? btw I was working
> against postgresql-9.0.1...

ah, this is the problem (9.0.1 vs head).  to work vs head it prob
needs a few more tweaks.  you can also try removing it yourself --
most of the changes follow a similar pattern.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-21 Thread Merlin Moncure
On Tue, Dec 21, 2010 at 7:06 PM, Mark Kirkwood
 wrote:
> On 22/12/10 13:05, Mark Kirkwood wrote:
>>
>> On 22/12/10 11:42, Merlin Moncure wrote:
>>>
>>> Attached is an incomplete patch disabling hint bits based on compile
>>> switch.  It's not complete, for example it's not reconciling some
>>> assumptions in heapam.c that hint bits have been set in various
>>> routines.  However, it mostly passes regression and I deemed it good
>>> enough to run some preliminary benchmarks and fool around.  Obviously,
>>> hint bits are an annoying impediment to a couple of other cool pending
>>> features, and it certainly would be nice to operate without them.
>>> Also, for particular workloads, the extra i/o from hint bits can cause a
>>> fair amount of pain.
>>
>> Looks like a great idea to test, however I don't seem to be able to
>> compile with it applied: (set #define DISABLE_HINT_BITS 1 at the end of
>> src/include/pg_config_manual.h)
>>
>> gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
>> -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g
>> -I../../../../src/include -D_GNU_SOURCE -c -o heapam.o heapam.c
>> heapam.c: In function ‘HeapTupleHeaderAdvanceLatestRemovedXid’:
>> heapam.c:3867: error: ‘HEAP_XMIN_COMMITTED’ undeclared (first use in this
>> function)
>> heapam.c:3867: error: (Each undeclared identifier is reported only once
>> heapam.c:3867: error: for each function it appears in.)
>> heapam.c:3869: error: ‘HEAP_XMIN_INVALID’ undeclared (first use in this
>> function)
>> make[4]: *** [heapam.o] Error 1
>>
>
> Arrg, sorry - against git head on Ubuntu 10.04 (gcc 4.4.3)

did you check to see if the patch applied clean? btw I was working
against postgresql-9.0.1...

it looks like you are missing at least some of the changes to htup.h:

../postgresql-9.0.1_hb2/src/include/access/htup.h

#ifndef DISABLE_HINT_BITS
#define HEAP_XMIN_COMMITTED 0x0100  /* t_xmin committed */
#define HEAP_XMIN_INVALID   0x0200  /* t_xmin invalid/aborted */
#define HEAP_XMAX_COMMITTED 0x0400  /* t_xmax committed */
#define HEAP_XMAX_INVALID   0x0800  /* t_xmax invalid/aborted */
#endif

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-21 Thread Mark Kirkwood

On 22/12/10 13:05, Mark Kirkwood wrote:

On 22/12/10 11:42, Merlin Moncure wrote:

Attached is an incomplete patch disabling hint bits based on compile
switch.  It's not complete, for example it's not reconciling some
assumptions in heapam.c that hint bits have been set in various
routines.  However, it mostly passes regression and I deemed it good
enough to run some preliminary benchmarks and fool around.  Obviously,
hint bits are an annoying impediment to a couple of other cool pending
features, and it certainly would be nice to operate without them.
Also, for particular workloads, the extra i/o from hint bits can cause a
fair amount of pain.


Looks like a great idea to test, however I don't seem to be able to 
compile with it applied: (set #define DISABLE_HINT_BITS 1 at the end of 
src/include/pg_config_manual.h)


gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing 
-fwrapv -g -I../../../../src/include -D_GNU_SOURCE -c -o heapam.o 
heapam.c

heapam.c: In function ‘HeapTupleHeaderAdvanceLatestRemovedXid’:
heapam.c:3867: error: ‘HEAP_XMIN_COMMITTED’ undeclared (first use in 
this function)

heapam.c:3867: error: (Each undeclared identifier is reported only once
heapam.c:3867: error: for each function it appears in.)
heapam.c:3869: error: ‘HEAP_XMIN_INVALID’ undeclared (first use in 
this function)

make[4]: *** [heapam.o] Error 1



Arrg, sorry - against git head on Ubuntu 10.04 (gcc 4.4.3)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-21 Thread Mark Kirkwood

On 22/12/10 11:42, Merlin Moncure wrote:

Attached is an incomplete patch disabling hint bits based on compile
switch.  It's not complete, for example it's not reconciling some
assumptions in heapam.c that hint bits have been set in various
routines.  However, it mostly passes regression and I deemed it good
enough to run some preliminary benchmarks and fool around.  Obviously,
hint bits are an annoying impediment to a couple of other cool pending
features, and it certainly would be nice to operate without them.
Also, for particular workloads, the extra i/o from hint bits can cause a
fair amount of pain.


Looks like a great idea to test, however I don't seem to be able to 
compile with it applied: (set #define DISABLE_HINT_BITS 1 at the end of 
src/include/pg_config_manual.h)


gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing 
-fwrapv -g -I../../../../src/include -D_GNU_SOURCE -c -o heapam.o heapam.c

heapam.c: In function ‘HeapTupleHeaderAdvanceLatestRemovedXid’:
heapam.c:3867: error: ‘HEAP_XMIN_COMMITTED’ undeclared (first use in 
this function)

heapam.c:3867: error: (Each undeclared identifier is reported only once
heapam.c:3867: error: for each function it appears in.)
heapam.c:3869: error: ‘HEAP_XMIN_INVALID’ undeclared (first use in this 
function)

make[4]: *** [heapam.o] Error 1


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How much do the hint bits help?

2010-12-21 Thread Kevin Grittner
Merlin Moncure  wrote:
 
> *) what's a good way to stress the clog severely? I'd like to pick
> a degenerate case to get a better idea of the way things stand
> without them.
 
The worst I can think of is a large database with a 90/10 mix of
reads to writes -- all short transactions.  Maybe someone else can
do better.  In particular, I'm not sure how savepoints might play
into a degenerate case.
 
Since we're always talking about how to do better with hint bits
during an unlogged bulk load, it would be interesting to benchmark
one of those followed by a `select count(*) from newtable;` with and
without the patch, on a data set too big to fit in RAM.
 
> *) is there community interest in a full patch that fills in the
> missing details not implemented here?
 
I'm certainly curious to see real numbers.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The cost of visibility testing? (gin-search)

2010-12-21 Thread Tom Lane
Heikki Linnakangas  writes:
> On 21.12.2010 21:25, Jesper Krogh wrote:
>> Or is a Bitmap Heap Scan simply 3 times faster than a Seq-scan for
>> visibility-testing?

> It certainly shouldn't be.

>> What have I missed in the logic?

> Perhaps you have a lot of empty space or dead tuples that don't match 
> the query in the table, which the sequential scan has to grovel through, 
> but the bitmap scan skips? What does EXPLAIN ANALYZE of both queries say?

Another possibility is that the seqscan is slowed by trying to operate
in a limited number of buffers (the buffer strategy stuff).

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] strncmp->memcmp when we know the shorter length

2010-12-21 Thread Robert Haas
On Mon, Dec 20, 2010 at 1:10 PM, Noah Misch  wrote:
> When the caller knows the smaller string length, memcmp and strncmp are
> functionally equivalent.  Since memcmp need not watch each byte for a NULL
> terminator, it often compares a CPU word at a time for better performance.  
> The
> attached patch changes use of strncmp to memcmp where we have the length of 
> the
> shorter string.  I was most interested in the varlena.c instances, but I tried
> to find all applicable call sites.  To benchmark it, I used the attached
> "bench-texteq.sql".  This patch improved my 5-run average timing of the SELECT
> from 65.8s to 56.9s, a 13% improvement.  I can't think of a case where the
> change should be pessimal.

This is a good idea.  I will check this over and commit it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Comment typo in nodeWindowAgg.c

2010-12-21 Thread Robert Haas
On Tue, Dec 21, 2010 at 4:17 PM, Andreas Karlsson  wrote:
> Found a couple of small typos in the comments of nodeWindowAgg.c when
> they refer to functions in nodeAgg.c. The pluralities of the function
> names (initialize_aggregates and advance_aggregates) are wrong. The
> reference to "finalize_aggregate" is correct though.

Committed, thanks.  But please attach patches rather than including them inline.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch BUG #5103: "pg_ctl -w (re)start" fails with custom unix_socket_directory

2010-12-21 Thread Quan Zongliang
On Mon, 29 Nov 2010 10:29:17 -0300
Alvaro Herrera  wrote:

> Excerpts from Quan Zongliang's message of sáb nov 27 06:03:12 -0300 2010:
> > Hi, all
> > 
> > I created a pg_ctl patch to fix:
> > * BUG #5103: "pg_ctl -w (re)start" fails with custom unix_socket_directory 
> > Allow pg_ctl to work properly with configuration files located outside the 
> > PGDATA directory
> 
> I think the way this should work is that you call postmaster with a new
> switch and it prints out its configuration, after reading the
> appropriate config file(s).  That way it handles all the little details
> such as figuring out the correct config file, handling include files, etc.
> This output would be presumably easier to parse and more trustworthy.
> 
> Right now we have --describe-config, which is missing the values for
> each config option.
> 

Sorry for my late reply.

I will check the source of postmaster.


-- 
Quan Zongliang 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [FeatureRequest] Base Convert Function

2010-12-21 Thread Robert Haas
On Tue, Dec 21, 2010 at 4:57 PM, Pavel Golub  wrote:
> Anyway, I find such a function useful even though I haven't yet hit a
> situation where it might be needed.

Yeah, I agree.  I'm not sure we should add it to core, but it's
certainly just as useful as many things we have in contrib.  I'll bet
it would get at least as much use as the six argument form of
levenshtein_less_equal().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [FeatureRequest] Base Convert Function

2010-12-21 Thread Pavel Golub
Hello.

Guys, guys! It was only a joke! :)

Please accept my apologies.

Anyway, I find such a function useful even though I haven't yet hit a
situation where it might be needed.
You wrote:



AD> On 12/21/2010 04:28 PM, Pavel Golub wrote:
>>
>> PS>  * It isn't a typical or frequent request,
>> PS>  * There are no hard barriers to a custom implementation,
>> PS>  * You can use plperl or plpython based solutions,
>> PS>  * It's not part of ANSI SQL
>>
>> But MySQL has such function. What's wrong with us? ;)
>>

AD> Our aim is not to duplicate everything in MySQL.

AD> cheers

AD> andrew



-- 
With best wishes,
 Pavel  mailto:pa...@gf.microolap.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [FeatureRequest] Base Convert Function

2010-12-21 Thread Andrew Dunstan



On 12/21/2010 04:28 PM, Pavel Golub wrote:


PS>  * It isn't a typical or frequent request,
PS>  * There are no hard barriers to a custom implementation,
PS>  * You can use plperl or plpython based solutions,
PS>  * It's not part of ANSI SQL

But MySQL has such function. What's wrong with us? ;)



Our aim is not to duplicate everything in MySQL.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [FeatureRequest] Base Convert Function

2010-12-21 Thread Kenneth Marshall
On Tue, Dec 21, 2010 at 11:28:17PM +0200, Pavel Golub wrote:
> Hello, Pavel.
> 
> You wrote:
> 
> PS> Hello
> 
> PS> On 21 December 2010 21:11 Tomáš Mudruňka  wrote:
> >>
> >> Thx for your answers :-)
> >> Well... i know that i can write my own plugin and i am familiar with C so
> >> this is not the problem, but i think that such feature should be
> >> implemented directly in PgSQL because there are already functions for
> >> converting to/from base 16 so why not make this more flexible and
> >> generalize it to any other radix? It's quite simple to do and i don't see
> >> any reason why 16 should be there and 8, 32 or 36 shouldn't :-)
> >>
> 
> PS> * It isn't a typical or frequent request,
> PS> * There are no hard barriers to a custom implementation,
> PS> * You can use plperl or plpython based solutions,
> PS> * It's not part of ANSI SQL
> 
> But MySQL has such function. What's wrong with us? ;)
> 

You are not really helping to make a good case... :)

Ken

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [FeatureRequest] Base Convert Function

2010-12-21 Thread Pavel Golub
Hello, Pavel.

You wrote:

PS> Hello

PS> On 21 December 2010 21:11 Tomáš Mudruňka  wrote:
>>
>> Thx for your answers :-)
>> Well... i know that i can write my own plugin and i am familiar with C so
>> this is not the problem, but i think that such feature should be
>> implemented directly in PgSQL because there are already functions for
>> converting to/from base 16 so why not make this more flexible and
>> generalize it to any other radix? It's quite simple to do and i don't see
>> any reason why 16 should be there and 8, 32 or 36 shouldn't :-)
>>

PS> * It isn't a typical or frequent request,
PS> * There are no hard barriers to a custom implementation,
PS> * You can use plperl or plpython based solutions,
PS> * It's not part of ANSI SQL

But MySQL has such function. What's wrong with us? ;)

PS> Regards

PS> Pavel Stehule

>> peace
>>
>> On Tue, 21 Dec 2010 15:04:03 +0100, Florian Pflug  wrote:
>>> On Dec21, 2010, at 12:48 , Robert Haas wrote:
 2010/12/21 Tomáš Mudruňka :
> Is there possibility of having internal base converting function in
> PgSQL?
> There are already functions for converting between decimal and
> hexadecimal
> notations i think pgsql can be able to convert between number with
> radixes
> from 1 to 36 (actually fast (de)encoding base36 is what i need)...

 It should be pretty easy to write such a function in C, perhaps using
 strtol() or strtoul().
>>>
>>> If you're not comfortable doing this in C, you might also want to
>> consider
>>> one of procedural languages pl/pgsql, pl/perl, pl/python. pl/pgsql is
>>> probably
>>> only viable if you just need this for ints and bigints, unless you don't
>>> care about performance.
>>>
>>> best regards,
>>> Florian Pflug
>>
>> --
>> S pozdravem
>> Best regards
>>   Tomáš Mudruňka - Spoje.net / Arachne Labs
>>
>> XMPP/Jabber: har...@jabbim.cz, ICQ: 283782978
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>




-- 
With best wishes,
 Pavel  mailto:pa...@gf.microolap.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE behaviour

2010-12-21 Thread David Fetter
On Tue, Dec 21, 2010 at 11:14:31PM +0200, Peter Eisentraut wrote:
> On sön, 2010-11-14 at 04:45 +0200, Marko Tiikkaja wrote:
> > On 2010-11-12 8:25 PM +0200, I wrote:
> > > I'm going to take some time off this weekend to get a patch with this
> > > behaviour to the next commitfest.
> > 
> > .. and a wild patch appears.
> > 
> > This is almost exactly the patch from 2010-02 without 
> > CommandCounterIncrement()s.  It's still a bit rough around the edges and 
> > needs some more comments, but I'm posting it here anyway.
> 
> To pick up an earlier thread again, has any serious thought been given
> to adapting the SQL2001/DB2 syntax instead of our own?

Yes, and it's a good deal more limited and less intuitive than ours.

This is one place where we got it right and the standard just got
pushed into doing whatever IBM did.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Comment typo in nodeWindowAgg.c

2010-12-21 Thread Andreas Karlsson
Hi,

Found a couple of small typos in the comments of nodeWindowAgg.c when
they refer to functions in nodeAgg.c. The pluralities of the function
names (initialize_aggregates and advance_aggregates) are wrong. The
reference to "finalize_aggregate" is correct though.


diff --git a/src/backend/executor/nodeWindowAgg.c
b/src/backend/executor/nodeWindowAgg.c
index c3efe12..51f98c1 100644
*** a/src/backend/executor/nodeWindowAgg.c
--- b/src/backend/executor/nodeWindowAgg.c
*** static bool window_gettupleslot(WindowOb
*** 181,187 
  
  /*
   * initialize_windowaggregate
!  * parallel to initialize_aggregate in nodeAgg.c
   */
  static void
  initialize_windowaggregate(WindowAggState *winstate,
--- 181,187 
  
  /*
   * initialize_windowaggregate
!  * parallel to initialize_aggregates in nodeAgg.c
   */
  static void
  initialize_windowaggregate(WindowAggState *winstate,
*** initialize_windowaggregate(WindowAggStat
*** 207,213 
  
  /*
   * advance_windowaggregate
!  * parallel to advance_aggregate in nodeAgg.c
   */
  static void
  advance_windowaggregate(WindowAggState *winstate,
--- 207,213 
  
  /*
   * advance_windowaggregate
!  * parallel to advance_aggregates in nodeAgg.c
   */
  static void
  advance_windowaggregate(WindowAggState *winstate,

Regards,
Andreas Karlsson



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE behaviour

2010-12-21 Thread Peter Eisentraut
On sön, 2010-11-14 at 04:45 +0200, Marko Tiikkaja wrote:
> On 2010-11-12 8:25 PM +0200, I wrote:
> > I'm going to take some time off this weekend to get a patch with this
> > behaviour to the next commitfest.
> 
> .. and a wild patch appears.
> 
> This is almost exactly the patch from 2010-02 without 
> CommandCounterIncrement()s.  It's still a bit rough around the edges and 
> needs some more comments, but I'm posting it here anyway.

To pick up an earlier thread again, has any serious thought been given
to adapting the SQL2001/DB2 syntax instead of our own?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [FeatureRequest] Base Convert Function

2010-12-21 Thread Pavel Stehule
Hello

On 21 December 2010 21:11 Tomáš Mudruňka  wrote:
>
> Thx for your answers :-)
> Well... i know that i can write my own plugin and i am familiar with C so
> this is not the problem, but i think that such feature should be
> implemented directly in PgSQL because there are already functions for
> converting to/from base 16 so why not make this more flexible and
> generalize it to any other radix? It's quite simple to do and i don't see
> any reason why 16 should be there and 8, 32 or 36 shouldn't :-)
>

* It isn't a typical or frequent request,
* There are no hard barriers to a custom implementation,
* You can use plperl or plpython based solutions,
* It's not part of ANSI SQL

Regards

Pavel Stehule

> peace
>
> On Tue, 21 Dec 2010 15:04:03 +0100, Florian Pflug  wrote:
>> On Dec21, 2010, at 12:48 , Robert Haas wrote:
>>> 2010/12/21 Tomáš Mudruňka :
 Is there possibility of having internal base converting function in
 PgSQL?
 There are already functions for converting between decimal and
 hexadecimal
 notations i think pgsql can be able to convert between number with
 radixes
 from 1 to 36 (actually fast (de)encoding base36 is what i need)...
>>>
>>> It should be pretty easy to write such a function in C, perhaps using
>>> strtol() or strtoul().
>>
>> If you're not comfortable doing this in C, you might also want to
> consider
>> one of procedural languages pl/pgsql, pl/perl, pl/python. pl/pgsql is
>> probably
>> only viable if you just need this for ints and bigints, unless you don't
>> care about performance.
>>
>> best regards,
>> Florian Pflug
>
> --
> S pozdravem
> Best regards
>   Tomáš Mudruňka - Spoje.net / Arachne Labs
>
> XMPP/Jabber: har...@jabbim.cz, ICQ: 283782978
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The cost of visibility testing? (gin-search)

2010-12-21 Thread Andres Freund
On Tuesday 21 December 2010 20:25:16 Jesper Krogh wrote:
> What have I missed in the logic?
A reproducible testcase ;-)

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [FeatureRequest] Base Convert Function

2010-12-21 Thread Tomáš Mudruňka

Thx for your answers :-)
Well... i know that i can write my own plugin and i am familiar with C so
this is not the problem, but i think that such feature should be
implemented directly in PgSQL because there are already functions for
converting to/from base 16 so why not make this more flexible and
generalize it to any other radix? It's quite simple to do and i don't see
any reason why 16 should be there and 8, 32 or 36 shouldn't :-)

peace

On Tue, 21 Dec 2010 15:04:03 +0100, Florian Pflug  wrote:
> On Dec21, 2010, at 12:48 , Robert Haas wrote:
>> 2010/12/21 Tomáš Mudruňka :
>>> Is there possibility of having internal base converting function in
>>> PgSQL?
>>> There are already functions for converting between decimal and
>>> hexadecimal
>>> notations i think pgsql can be able to convert between number with
>>> radixes
>>> from 1 to 36 (actually fast (de)encoding base36 is what i need)...
>> 
>> It should be pretty easy to write such a function in C, perhaps using
>> strtol() or strtoul().
> 
> If you're not comfortable doing this in C, you might also want to
consider
> one of procedural languages pl/pgsql, pl/perl, pl/python. pl/pgsql is
> probably
> only viable if you just need this for ints and bigints, unless you don't
> care about performance.
> 
> best regards,
> Florian Pflug

-- 
S pozdravem
Best regards
   Tomáš Mudruňka - Spoje.net / Arachne Labs

XMPP/Jabber: har...@jabbim.cz, ICQ: 283782978

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bug in SignalSomeChildren

2010-12-21 Thread Alvaro Herrera
Excerpts from Robert Haas's message of mar dic 21 08:40:49 -0300 2010:

> > Well, non-developers don't tend to attach gdb very often.  Alvaro
> > mentioned a problem installation upthread, thus the question.
> 
> Hearing no cries of "please-oh-please-backpatch-this", I've committed
> it just to master.

Please-oh-please backpatch this ... at least to 8.4.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The cost of visibility testing? (gin-search)

2010-12-21 Thread Heikki Linnakangas

On 21.12.2010 21:25, Jesper Krogh wrote:

The first query should have the cost of the GIN-search +
visibility-test of 158K tuples,
the latter should have the cost of visibility-testing 168K tuples. If
we set the cost
of actually searching GIN to 0 then the gin-search - visibility costs:
95/158000 0.000373ms/tuple
where the seq-scan case costs close to 0.001ms/tuple (close to 3 times
as much).

So I have a strong feeling that GIN is cheating on the visibility tests,
otherwise I have problems imagining how it can ever become faster to
execute
than the seq_scan of the table.

Or is a Bitmap Heap Scan simply 3 times faster than a Seq-scan for
visibility-testing?


It certainly shouldn't be.


What have I missed in the logic?


Perhaps you have a lot of empty space or dead tuples that don't match 
the query in the table, which the sequential scan has to grovel through, 
but the bitmap scan skips? What does EXPLAIN ANALYZE of both queries say?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bug in SignalSomeChildren

2010-12-21 Thread Eric Ridge
On Tue, Dec 21, 2010 at 2:33 PM, Robert Haas  wrote:
> The point of the patch was to improve cases where attaching gdb
> *didn't* work well.  Any cases where it was already working for you
> aren't going to be made worse by this.

Okay, great.  Thanks for the clarification.

eric

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bug in SignalSomeChildren

2010-12-21 Thread Robert Haas
On Tue, Dec 21, 2010 at 1:45 PM, Eric Ridge  wrote:
> On Mon, Dec 20, 2010 at 3:36 PM, Martijn van Oosterhout
>  wrote:
>> On Mon, Dec 20, 2010 at 03:08:02PM -0500, Robert Haas wrote:
>>> The attached patch appears to work correctly on MacOS X.  I did check,
>>> BTW: getppid() in the attached process returns gdb's pid.  Poor!
>>
>> This appears to be a BSDism at least. On Linux and BSD derivatives the
>> man pages specifically mention the reparenting (needed for catching
>> signals) but on Linux getppid() is specifically documented to return
>> the correct value anyway.
>
> I'm just a random lurker here, and happened to catch the last bit of
> this thread.  Could one of you that understand this issue straighten
> something out for me?
>
> Every now and again we've been known to attach gdb to a production
> Postgres backend to troubleshoot problems.  Ya know, just trying to
> get an idea of what Postgres is actually doing via a backtrace.  This
> is always on Linux, BTW.
>
> Does this thread mean that the above no longer works with v9?  Or is
> this only on non-Linux systems, or did the patch Robert Haas committed
> "fix" fix it?  We're still using 8.1 (slowly moving to 8.4) in
> production, but have plans of picking up 9.x later in '11.  Just
> wondering if we need to actually be a bit more careful in the future?

The point of the patch was to improve cases where attaching gdb
*didn't* work well.  Any cases where it was already working for you
aren't going to be made worse by this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL/MED - core functionality

2010-12-21 Thread Simon Riggs
On Wed, 2010-12-15 at 22:25 +0900, Shigeru HANADA wrote:

> Attached are revised version of SQL/MED core functionality patches.

Looks very interesting new feature, well done.

Can I ask some questions about how this will work?
No particular order, just numbered for reference.

1. The docs don't actually say what a foreign table is. Is it a local
representation of foreign data? Or a local copy of foreign data? Or is
it a table created on a remote node?

2. Will CREATE FOREIGN TABLE require a transactionid? It seems a good
replacement for temp tables on Hot Standby to be able to run a CREATE
FOREIGN TABLE using the file_fdw, then reuse the file again later.

3. Do we support CREATE TEMP FOREIGN TABLE? It seems desirable to be
able to move data around temporarily, as we do with normal tables.

4. In Hot Standby, we are creating many copies of the data tables on
different servers. That seems to break the concept that data is in only
one place, when we assume that a foreign table is on only one foreign
server. How will we represent the concept that data is potentially
available identically from more than one place? Any other comments about
how this will work with Hot Standby?

5. In PL/Proxy, we have the concept that a table is sharded across
multiple nodes. Is that possible here? Again, we seem to have the
concept that a table is only ever in a single place.

6. Can we do CREATE FOREIGN TABLE  AS SELECT ...
I guess the answer depends on (1)

7. Why does ANALYZE skip foreign tables? Surely it's really important we
know things about a foreign table, otherwise we are going to optimize
things very badly.

8. Is the WHERE clause passed down into a ForeignScan?

9. The docs for CHECK constraints imply that the CHECK is executed
against any rows returned from FDW. Are we really going to execute that
as an additional filter on each row retrieved?

10. Can a foreign table be referenced by a FK?

11. Can you create a DO INSTEAD trigger on a FOREIGN TABLE?

12. I think it would be useful for both review and afterwards to write
the documentation section now, so we can begin to understand this. Will
there be a documentation section on writing a FDW also? There are enough
open questions here that I think we need docs and a review guide,
otherwise we'll end up with some weird missing feature, which would be a
great shame.

13. How does this relate to dblink? Is that going to be replaced by this
feature?

14. How do we do scrollable cursors with foreign tables? Do we
materialize them always? Or...

15. In terms of planning queries, do we have a concept of additional
cost per row on a foreign server? How does the planner decide how costly
retrieving data is from the FDW?

16. If we cancel a query, is there an API call to send query cancel to
the FDW and so on to the foreign server? Does that still work if we hit
other kinds of ERROR, or FATAL?

17. Can we request different types of transaction isolation on the
foreign server, or do certain states get passed through from our
session? e.g. if we are running a serializable transaction, does that
state get passed through to the FDW, so it knows to request that on the
foreign server? That seems essential if we are going to make pg_dump
work correctly.

18. Does pg_dump dump the data in the FDW or just of the definition of
the data? Can we have an option for either?

19. If we PREPARE a statement, are there API calls to pass thru the
PREPARE to the FDW? Or are calls always dynamic?

20. If default privileges include INSERT, UPDATE or DELETE, does this
cause error, or does it silently get ignored for foreign tables? I think
I would want the latter.

21. Can we LOCK a foreign table? I guess so. Presumably no LOCK is
passed through to the FDW?

22. Can we build a local index on a foreign table?

Not too sure what the right answers are to these questions, but I think
we need to know the answers to understand what we are getting.

Thanks

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] The cost of visibillity testing? (gin-search)

2010-12-21 Thread Jesper Krogh

Hi Hackers.

I have a feeling that GIN is "cheating" on the visibility checks:

test=# set enable_seqscan = off;
SET
Time: 0.129 ms
test=# select count(id) from fts_test where fts @@ to_tsquery('core');
 count

 158827
(1 row)

Time: 95.530 ms
test=# explain select count(id) from fts_test where fts @@ to_tsquery('core');
  QUERY PLAN
-----------------------------------------------------------------------------------------
 Aggregate  (cost=211571.52..211571.53 rows=1 width=4)
   ->  Bitmap Heap Scan on fts_test  (cost=134925.95..211174.01 rows=159004 width=4)
         Recheck Cond: (fts @@ to_tsquery('core'::text))
         ->  Bitmap Index Scan on fts_idx  (cost=0.00..134886.20 rows=159004 width=0)
               Index Cond: (fts @@ to_tsquery('core'::text))
(5 rows)

Time: 0.609 ms

test=# select count(id) from fts_test;
 count

 168556
(1 row)

Time: 164.655 ms

test=# explain select count(id) from fts_test;
   QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=1075969.95..1075969.96 rows=1 width=4)
   ->  Seq Scan on fts_test  (cost=100.00..1075548.56 rows=168556 width=4)
(2 rows)

Time: 0.338 ms

This was run multiple times for both queries, and the seq scan of the
table is consistently about 1.8 times more expensive than the fts scan.
This is all on a fully memory-cached dataset.

The first query should have the cost of the GIN search + visibility
test of 158K tuples; the latter should have the cost of visibility-testing
168K tuples. If we set the cost of actually searching GIN to 0, then the
gin-search visibility test costs (95/158000) 0.000373 ms/tuple, where the
seq-scan case costs close to 0.001 ms/tuple (close to 3 times as much).


So I have a strong feeling that GIN is cheating on the visibility tests;
otherwise I have problems imagining how it can ever become faster to
execute than a seq scan of the table.

Or is a Bitmap Heap Scan simply 3 times faster than a Seq Scan for
visibility-testing?


What have I missed in the logic?

Thanks.

--
Jesper


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bug in SignalSomeChildren

2010-12-21 Thread Eric Ridge
On Mon, Dec 20, 2010 at 3:36 PM, Martijn van Oosterhout
 wrote:
> On Mon, Dec 20, 2010 at 03:08:02PM -0500, Robert Haas wrote:
>> The attached patch appears to work correctly on MacOS X.  I did check,
>> BTW: getppid() in the attached process returns gdb's pid.  Poor!
>
> This appears to be a BSDism at least. On Linux and BSD derivatives the
> man pages specifically mention the reparenting (needed for catching
> signals) but on Linux getppid() is specifically documented to return
> the correct value anyway.

I'm just a random lurker here, and happened to catch the last bit of
this thread.  Could one of you that understand this issue straighten
something out for me?

Every now and again we've been known to attach gdb to a production
Postgres backend to troubleshoot problems.  Ya know, just trying to
get an idea of what Postgres is actually doing via a backtrace.  This
is always on Linux, BTW.

Does this thread mean that the above no longer works with v9?  Or is
this only on non-Linux systems, or did the patch Robert Haas committed
"fix" it?  We're still using 8.1 (slowly moving to 8.4) in production,
but have plans of picking up 9.x later in '11.  Just wondering if we
need to actually be a bit more careful in the future?

Thanks!

eric

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal : cross-column stats

2010-12-21 Thread Tomas Vondra
Dne 21.12.2010 16:54, Florian Pflug napsal(a):
>> Hmmm, maybe we could give this possibility (to identify two separate
>> groups of columns) to the user. So instead of 'buid stats for (A,B,C)' the
>> user would say 'build stats for (A,B) and (C)' - this actually represents
>> apriori knowledge of dependencies supplied by the user.
>>
>> In that case we could search for 'implicativeness' between those two
>> groups (and not within the groups), and we could restrict ourselves to 2D
>> (and thus use a more sophisticated formula).
> 
> Hm, I hated this idea at first, but I'm starting to like it more and more.
> It *does* seem rather unrealistic that a user would know that a bunch of
> columns are correlated, but have no idea in what way... 

Yes, that's true. Although sometimes the dependency may be very
complicated - but let's restrict to 2D for now, build something that
solves this simplified case and then we can discuss higher dimensions.

> Any examples where this'd be the case would be very much appreciated - maybe
> we should ask around on -general about this?

Well, I think the ZIP code example is a typical case of this - the users
know about the dependency between ZIP codes and cities. A natural
workaround would be to omit the dependent column from the query, but
that's not always possible (e.g. when an ORM is involved, building the
queries automatically).

>> But we should be able to do some basic analysis even when the user
>> supplies a list of columns without such apriori knowledge.
> 
> That, I think, overcomplicates things, at least for a first cut.
> 
> To summarize, I think you've shown quite nicely that the uniform bayesian
> approach is a very sensible first step towards better estimates in the case
> of correlated columns. It's statistically sound, and the dist(A,B) estimates
> it requires are probably a necessary ingredient of any solution to the 
> problem.
> If we can make it degrade more gracefully if the columns are uncorrelated we
> should do that, but if we can't that's still no reason to drop the whole idea.

Agreed. IMHO the uncorrelated case is not a big concern, as the users
usually know something's wrong with the columns. We should introduce
some 'autodetect' eventually, but let's leave that for the future.

> So I guess we should turn our attention to how we'd obtain reasonably
> good estimates of dist(A,B), and return to the current discussion once
> the other pieces are in place.
> 
> I think that maybe it'd be acceptable to scan a large portion of the
> table to estimate dist(A,B), *if* we must do so only once in a while.
> But even with a full scan, how would we arrive at an estimate for
> dist(A,B)? Keeping all values in memory, and spilling them into, say,
> an on-disk hash table adds even more overhead to the already expensive
> full scan. Maybe using a bloom filter instead of a hash table could
> avoid the spilling to disk, in exchange for a slightly less precise
> result...

I have no idea what a Bloom filter is (shame on me). I was not thinking
about collecting the stats; I was interested primarily in what data we
actually need. And my knowledge of the algorithms currently used is
very limited :-(
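Since the term came up: a Bloom filter is a fixed-size bit array with k hash functions. Membership tests yield no false negatives but a tunable false-positive rate, which is why it could replace an exact (possibly disk-spilling) hash table when a slightly imprecise dist(A,B) is acceptable. A toy sketch in Python, purely illustrative (the ZIP/city values are made up, and this is not tied to any PostgreSQL internals):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: an m-bit array plus k hash functions.

    add() sets k bits; a membership test checks those k bits, so the
    filter answers "definitely not seen" or "probably seen" - it never
    produces a false negative, only (rare) false positives.
    """

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _positions(self, item):
        # Derive k independent-ish bit positions from SHA-256.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
for value in ("11111:Praha", "60200:Brno", "77900:Olomouc"):
    bf.add(value)
```

For estimating dist(A,B), each (A,B) pair would be inserted and counted only when the filter says it was not seen before; rare false positives make the count a slight underestimate.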

But I agree we should at least discuss the possible solutions. Until now
I've done something like this

   SELECT COUNT(DISTINCT a) AS dist_a,
  COUNT(DISTINCT b) AS dist_b,
  COUNT(DISTINCT a || ':' || b) AS dist_ab FROM my_table;

but that's not very efficient.

My plan for the near future (a few weeks) is to build a simple 'module'
with the ability to estimate the number of rows for a given condition.
This could actually be quite useful as a stand-alone contrib module, as
the users often ask how to get a number of rows fast (usually for paging).

That may be quite slow when the query matches too many rows, even when
there is an index. It may even be much slower than the actual query (as
it usually contains a small LIMIT).

An estimate is often sufficient, but 'pg_class.reltuples' does not
really work with conditions. So this might be an improvement ...
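One cheap trick such a module could start from is reading the planner's own row estimate out of EXPLAIN output instead of executing the query. A sketch (the plan line and its numbers are illustrative, and the estimate is only as good as the statistics behind it):

```python
import re

def planner_row_estimate(explain_text):
    """Extract the first (top-level) row estimate from EXPLAIN text output."""
    match = re.search(r"rows=(\d+)", explain_text)
    return int(match.group(1)) if match else None

# A plan line in the shape PostgreSQL prints (values are illustrative):
plan = "Seq Scan on fts_test  (cost=0.00..10754.56 rows=168556 width=4)"
```

In practice one would run `EXPLAIN <query>` over a database connection and feed the first output line to this function; it returns in planner time rather than execution time, at the cost of accuracy.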

regards
Tomas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] optimization histograms

2010-12-21 Thread Kevin Grittner
amit sehas  wrote:
 
> for the histograms for cost based optimization, is there a rule of
> thumb on how often to rebuild them?
 
In recent major versions, autovacuum should normally keep you in
good shape.  The exception is when you make major changes to the
contents of a table (such as in a bulk data load) and then
immediately try to use the table before autovacuum has had time to
notice the activity and generate fresh statistics; for these cases
you probably want to do a manual run.
 
For more information, see:
 
http://www.postgresql.org/docs/current/interactive/routine-vacuuming.html#VACUUM-FOR-STATISTICS
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] optimization histograms

2010-12-21 Thread amit sehas
HI,

for the histograms used for cost-based optimization, is there a rule of thumb
on how often to rebuild them? They are obviously not being continuously
updated... What is the state of the art in this area? Do all the other
databases also end up with stale statistics every now and then and have to
keep rebuilding the stats?

thanks
-Amit


  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal : cross-column stats

2010-12-21 Thread tv
> On Dec21, 2010, at 15:51 , t...@fuzzy.cz wrote:
 This is the reason why they choose to always combine the values (with
 varying weights).
>>>
>>> There are no varying weights involved there. What they do is to express
>>> P(A=x,B=y) once as
>>>
>>> ...
>>>
>>>  P(A=x,B=y) ~= P(B=y|A=x)*P(A=x)/2 + P(A=x|B=y)*P(B=y)/2
>>>  = dist(A)*P(A=x)/(2*dist(A,B)) + dist(B)*P(B=y)/(2*dist(A,B))
>>>  = (dist(A)*P(A=x) + dist(B)*P(B=y)) / (2*dist(A,B))
>>>
>>> That averaging step adds *no* further data-dependent weights.
>>
>> Sorry, by 'varying weights' I didn't mean that the weights are different
>> for each value of A or B. What I meant is that they combine the values
>> with different weights (just as you explained).
>
> I'm still not sure we're on the same page here. The resulting formula
> is *not* a weighted average of P(A=x) and P(B=y), since in general
> dist(A) + dist(B) = 2*dist(A,B) does *not* hold. It may look like one
> syntactically, but that's about it.

OK, another crazy usage or 'weights' on my side :-(

What I meant is that in the end you have two equations of P(A,B):

  P(A=x|B=y)*P(B=y) = dist(B)*P(B=y)/dist(A,B)
  P(B=y|A=x)*P(A=x) = dist(A)*P(A=x)/dist(A,B)

and you need to combine those two estimates. They did that by averaging,
as they don't know which of the estimates is better.

Generally I think that is a good solution, unless you know one of the
estimates is much more reliable (although I'm not sure we should
completely omit the other estimate).

> The resulting formula instead is an *unweighted* (weights 1) average of
> the two estimates P(B=y|A=x)*P(A=x) and P(A=x|B=y)*P(B=y). You might just
> as well estimate P(A=x,B=y) with
>
>   P(B=y|A=x)*P(A=x) = dist(A)*P(A=x)/dist(A,B)
>
> and it'd *still* be the very same uniform bayesian approach, just no
> longer symmetric in A and B. Which may easily be preferable if you
> have reasons to believe that this estimate is more correct than the
> one obtained by swapping A and B. The original paper doesn't deal with
> that case simply because they don't mention how P(A=x) and P(B=y)
> are obtained at all. The postgres estimator, on the other hand,
> knows quite well how it derived P(A=x) and P(B=y) and may have much
> higher confidence in one value than in the other.

OK, good point. I haven't realized that one of the estimates may be much
more reliable.

But let's assume both estimates are about the same (regarding reliability)
and let's see the following example

 A | B
 =
 1 | 1
 1 | 1
 1 | 1
 1 | 2
 2 | 1
 2 | 2
 2 | 2
 2 | 2

Thus dist(A)=dist(B)=2, dist(A,B)=4 and

  P(A=1)=P(A=2)=P(B=1)=P(B=2)=1/2
  P(A=1,B=1)=P(A=2,B=2)=3/8
  P(A=1,B=2)=P(A=2,B=1)=1/8

According to the formula presented in the paper, the partial estimates for
P(A=1,B=2) are

  P(A=1|B=2)*P(B=2) = dist(B)/dist(A,B) * P(B=2) = 2/4 * 1/2 = 1/4
  P(B=2|A=1)*P(A=1) = dist(A)/dist(A,B) * P(A=1) = 2/4 * 1/2 = 1/4

Thus P(A=1,B=2) = (1/4 + 1/4)/2 = 1/4, so it's overestimated (2x)

 A | B
 =
 1 | 1
 1 | 2
 1 | 2
 1 | 2
 2 | 1
 2 | 1
 2 | 1
 2 | 2

This obviously has exactly the same features (regarding number of distinct
values), and the produced estimate is exactly the same. But in this case

  P(A=1,B=2)=P(A=2,B=1)=3/8
  P(A=1,B=1)=P(A=2,B=2)=1/8

and thus the 1/4 is an underestimate (compared to 3/8).
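The arithmetic above is easy to check mechanically. A small Python sketch of the paper's estimator (toy code written for this check, not anything in PostgreSQL) confirms that both 8-row example tables yield the same estimate of 1/4 for P(A=1,B=2), even though the true frequencies are 1/8 and 3/8:

```python
def estimate(rows, x, y):
    """Uniform bayesian estimate from the paper:
    P(A=x,B=y) ~= (dist(A)*P(A=x) + dist(B)*P(B=y)) / (2*dist(A,B))
    """
    n = len(rows)
    dist_a = len({a for a, _ in rows})
    dist_b = len({b for _, b in rows})
    dist_ab = len(set(rows))
    p_a = sum(1 for a, _ in rows if a == x) / n
    p_b = sum(1 for _, b in rows if b == y) / n
    return (dist_a * p_a + dist_b * p_b) / (2 * dist_ab)

# The two 8-row example tables from the discussion above:
t1 = [(1, 1)] * 3 + [(1, 2)] + [(2, 1)] + [(2, 2)] * 3
t2 = [(1, 1)] + [(1, 2)] * 3 + [(2, 1)] * 3 + [(2, 2)]
```

Both calls return 0.25, while t1.count((1, 2))/8 is 0.125 and t2.count((1, 2))/8 is 0.375 - the estimator cannot tell the two tables apart, exactly as argued above.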

The problem is that F(A,B) does not change at all. It's very simple to
construct examples (just use more rows) where F(A,B) returns exactly the
same value, but the estimates are off. The averaging somehow smooths
this out ...

But I think I'm missing something about how to use the F(?,?) to derive
the final estimate. So maybe the resulting estimate would be better.

Say there are two tables

   A | B | number of such rows
  ==
   1 | 1 | 1000
   1 | 2 | 1000
   2 | 1 | 1000
   2 | 2 | 1000

   A | B | number of such rows
  ==
   1 | 1 | 1
   1 | 2 | 1999
   2 | 1 | 1999
   2 | 2 | 1

How would you estimate the P(A=1,B=1) in those cases? Assume that both
estimates are equally reliable - i.e. deduced from a histogram or MCV.

>
> Assume for example that you're preparing the statement
>
>   SELECT * FROM T WHERE A = ? AND B = 1
>
> We'll then estimate P(A=?) as 1/dist(A), since we cannot do any better
> without an actual value for the parameter "?". The estimate for P(B=1),
> on the other hand, can use the histogram, and will thus very likely be
> much more precise. The two estimates for P(A=?,B=1) in this case are
>
>   P(A=?|B=1)*P(B=1) = dist(B)*P(B=1)/dist(A,B), and
>   P(B=1|A=?)*P(A=?) = dist(A)*P(A=?)/dist(A,B).
>
> There's a good chance that the former beats the latter, and thus also
> the average, then.

OK, good point. I was not thinking about prepared statements. In this case
it makes sense to use only one of the estimates ...

regards
Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/

Re: [HACKERS] Owner inheritance

2010-12-21 Thread Andrew Dunstan



On 12/21/2010 07:04 AM, gsdfg gdfg wrote:
Would be great if owner can be inherited from parent object (owner 
table ==> schema owner ==> database owner).

CREATE statement could add OWNER TO PARENT to cover this feature.



That syntax would violate POLA in the case of inherited tables (OWNER TO 
CONTAINER, or just OWNER TO SCHEMA etc might be clearer). And I think 
we'd have to restrict it to superusers anyway, which would seriously 
limit its usefulness.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Owner inheritance

2010-12-21 Thread Tom Lane
gsdfg gdfg  writes:
> Would be great if owner can be inherited from parent object (owner table ==>
> schema owner ==> database owner).
> CREATE statement could add OWNER TO PARENT to cover this feature.

What it would be is a great security hole --- exactly analogous to
allowing Unix "chown" to non-superusers.  Read up on the security
pitfalls of being able to give away ownership of an object.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, patch 22 (cleanup, review, cleanup)

2010-12-21 Thread Dimitri Fontaine
"Erik Rijkers"  writes:
>>   http://pgsql.tapoueh.org/extensions/doc/html/sql-alterextension.html
[...]
> Two changes to sql-alterextension.sgml:

Fixed and uploaded on the URL above, will be in the next patch revision,
thanks!

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CommitFest wrap-up

2010-12-21 Thread Tom Lane
Robert Haas  writes:
> On Wed, Dec 15, 2010 at 11:29 AM, Tom Lane  wrote:
>>> - Writeable CTEs - I think we need Tom to pick this one up.
>>> - Fix snapshot taking inconsistencies - Ready for committer. Can any
>>> committer pick this up?

>> Will take a look at these two also.

> Tom, what is your time frame on this?  I think we should wrap up the
> CF without these and bundle 9.1alpha3 unless you plan to get to this
> in the next day or two.

We probably shouldn't hold up the alpha for these, if there are no
other items outstanding.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Owner inheritance

2010-12-21 Thread gsdfg gdfg
Would be great if the owner could be inherited from the parent object (table
owner ==> schema owner ==> database owner).
CREATE statement could add OWNER TO PARENT to cover this feature.

Michel


Re: [HACKERS] CommitFest wrap-up

2010-12-21 Thread Robert Haas
On Wed, Dec 15, 2010 at 11:29 AM, Tom Lane  wrote:
>>> - Writeable CTEs - I think we need Tom to pick this one up.
>>> - Fix snapshot taking inconsistencies - Ready for committer. Can any
>>> committer pick this up?
>
> Will take a look at these two also.

Tom, what is your time frame on this?  I think we should wrap up the
CF without these and bundle 9.1alpha3 unless you plan to get to this
in the next day or two.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal : cross-column stats

2010-12-21 Thread Florian Pflug
On Dec21, 2010, at 13:25 , t...@fuzzy.cz wrote:
> And there's one additional - IMHO very important - requirement. The whole
> thing should easily extend to more than two columns. This "IF (F(A,B) >
> F(B,A)) THEN ..." probably is not a good solution regarding this.
> 
> For example given 3 columns A,B,C, would you do that comparison for each
> pair of columns, or would you do that for A vs (B,C)? Or maybe a
> completely different approach? Because that would require to collect a lot
> more data (number of distinct values in each combination) etc.

That's certainly a valid concern. The uniform bayesian approach avoids that
quite beautifully...

> Hmmm, maybe we could give this possibility (to identify two separate
> groups of columns) to the user. So instead of 'build stats for (A,B,C)' the
> user would say 'build stats for (A,B) and (C)' - this actually represents
> apriori knowledge of dependencies supplied by the user.
> 
> In that case we could search for 'implicativeness' between those two
> groups (and not within the groups), and we could restrict ourselves to 2D
> (and thus use a more sophisticated formula).

Hm, I hated this idea at first, but I'm starting to like it more and more.
It *does* seem rather unrealistic that a user would know that a bunch of
columns are correlated, but have no idea in what way... 

Any examples where this'd be the case would be very much appreciated - maybe
we should ask around on -general about this?

> But we should be able to do some basic analysis even when the user
> supplies a list of columns without such apriori knowledge.

That, I think, overcomplicates things, at least for a first cut.

To summarize, I think you've shown quite nicely that the uniform bayesian
approach is a very sensible first step towards better estimates in the case
of correlated columns. It's statistically sound, and the dist(A,B) estimates
it requires are probably a necessary ingredient of any solution to the problem.
If we can make it degrade more gracefully if the columns are uncorrelated we
should do that, but if we can't that's still no reason to drop the whole idea.

So I guess we should turn our attention to how we'd obtain reasonably
good estimates of dist(A,B), and return to the current discussion once
the other pieces are in place.

I think that maybe it'd be acceptable to scan a large portion of the
table to estimate dist(A,B), *if* we must do so only once in a while.
But even with a full scan, how would we arrive at an estimate for
dist(A,B)? Keeping all values in memory, and spilling them into, say,
an on-disk hash table adds even more overhead to the already expensive
full scan. Maybe using a bloom filter instead of a hash table could
avoid the spilling to disk, in exchange for a slightly less precise
result...

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extensions, patch 22 (cleanup, review, cleanup)

2010-12-21 Thread Erik Rijkers
On Tue, December 21, 2010 09:57, Dimitri Fontaine wrote:
> "Erik Rijkers"  writes:
>> I might be mistaken but it looks like a 
>> doc/src/sgml/ref/alter_extension.sgml is missing?
>
> Mmm, it seems that git was agreeing with you, so here's it:
>
>   git ls-files doc/src/sgml/ref/alter_extension.sgml
>   
> http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=commitdiff;h=9371a9763651df2636cb6c20dced7cd67398c477
>
> It was already online for readers of the HTML version of the docs:
>
>   http://pgsql.tapoueh.org/extensions/doc/html/sql-alterextension.html
>
> And it will appear in next revision of the patch. Thanks!
> --
> Dimitri Fontaine
> http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
>

Two changes to sql-alterextension.sgml:

  ALTER EXTENSION name SET EXTENSION new_schema

should be:

  ALTER EXTENSION name SET SCHEMA new_schema




And in the 'Description' there are (I think) old copy/paste remnants:

  ALTER EXTENSION changes the definition of an existing type. There are only 
one subforms:
  SET SCHEMA

it should be (something like):

  ALTER EXTENSION changes an existing extension. There is only one form:
  ALTER EXTENSION set schema new_schema




Erik Rijkers





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal : cross-column stats

2010-12-21 Thread Florian Pflug
On Dec21, 2010, at 15:51 , t...@fuzzy.cz wrote:
>>> This is the reason why they choose to always combine the values (with
>>> varying weights).
>> 
>> There are no varying weights involved there. What they do is to express
>> P(A=x,B=y) once as
>> 
>> ...
>> 
>>  P(A=x,B=y) ~= P(B=y|A=x)*P(A=x)/2 + P(A=x|B=y)*P(B=y)/2
>>  = dist(A)*P(A=x)/(2*dist(A,B)) + dist(B)*P(B=y)/(2*dist(A,B))
>>  = (dist(A)*P(A=x) + dist(B)*P(B=y)) / (2*dist(A,B))
>> 
>> That averaging step adds *no* further data-dependent weights.
> 
> Sorry, by 'varying weights' I didn't mean that the weights are different
> for each value of A or B. What I meant is that they combine the values
> with different weights (just as you explained).

I'm still not sure we're on the same page here. The resulting formula
is *not* a weighted average of P(A=x) and P(B=y), since in general
dist(A) + dist(B) = 2*dist(A,B) does *not* hold. It may look like one
syntactically, but that's about it.

The resulting formula instead is an *unweighted* (weights 1) average of
the two estimates P(B=y|A=x)*P(A=x) and P(A=x|B=y)*P(B=y). You might just
as well estimate P(A=x,B=y) with

  P(B=y|A=x)*P(A=x) = dist(A)*P(A=x)/dist(A,B)

and it'd *still* be the very same uniform bayesian approach, just no
longer symmetric in A and B. Which may easily be preferable if you
have reasons to believe that this estimate is more correct than the
one obtained by swapping A and B. The original paper doesn't deal with
that case simply because they don't mention how P(A=x) and P(B=y)
are obtained at all. The postgres estimator, on the other hand,
knows quite well how it derived P(A=x) and P(B=y) and may have much
higher confidence in one value than in the other.

Assume for example that you're preparing the statement

  SELECT * FROM T WHERE A = ? AND B = 1

We'll then estimate P(A=?) as 1/dist(A), since we cannot do any better
without an actual value for the parameter "?". The estimate for P(B=1),
on the other hand, can use the histogram, and will thus very likely be
much more precise. The two estimates for P(A=?,B=1) in this case are

  P(A=?|B=1)*P(B=1) = dist(B)*P(B=1)/dist(A,B), and
  P(B=1|A=?)*P(A=?) = dist(A)*P(A=?)/dist(A,B).

There's a good chance that the former beats the latter, and thus also
the average, then.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal : cross-column stats

2010-12-21 Thread tv
> On Dec21, 2010, at 11:37 , t...@fuzzy.cz wrote:
>> I doubt there is a way to make this decision with just dist(A), dist(B) and
>> dist(A,B) values. Well, we could go with a rule
>>
>>  if [dist(A) == dist(A,B)] then [A => B]
>>
>> but that's very fragile. Think about estimates (we're not going to work
>> with exact values of dist(?)), and then about data errors (e.g. a city
>> matched to an incorrect ZIP code or something like that).
>
> Huh? The whole point of the F(A,B)-exercise is to avoid precisely this
> kind of fragility without penalizing the non-correlated case...

Yes, I understand the intention, but I'm not sure how exactly you want
to use the F(?,?) function to compute P(A,B) - which is the value
we're looking for.

If I understand it correctly, you proposed something like this

  IF (F(A,B) > F(B,A)) THEN
P(A,B) := c*P(A);
  ELSE
P(A,B) := d*P(B);
  END IF;

or something like that (I guess c=dist(A)/dist(A,B) and
d=dist(B)/dist(A,B)). But what if F(A,B)=0.6 and F(B,A)=0.59? This may
easily happen due to data errors / imprecise estimate.

And this actually matters, because P(A) and P(B) may be actually
significantly different. So this would be really vulnerable to slight
changes in the estimates etc.

>> This is the reason why they choose to always combine the values (with
>> varying weights).
>
> There are no varying weights involved there. What they do is to express
> P(A=x,B=y) once as
>
> ...
>
>   P(A=x,B=y) ~= P(B=y|A=x)*P(A=x)/2 + P(A=x|B=y)*P(B=y)/2
>   = dist(A)*P(A=x)/(2*dist(A,B)) + dist(B)*P(B=y)/(2*dist(A,B))
>   = (dist(A)*P(A=x) + dist(B)*P(B=y)) / (2*dist(A,B))
>
> That averaging step adds *no* further data-dependent weights.

Sorry, by 'varying weights' I didn't mean that the weights are different
for each value of A or B. What I meant is that they combine the values
with different weights (just as you explained).

regards
Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal : cross-column stats

2010-12-21 Thread Florian Pflug
On Dec21, 2010, at 11:37 , t...@fuzzy.cz wrote:
> I doubt there is a way to make this decision with just dist(A), dist(B) and
> dist(A,B) values. Well, we could go with a rule
> 
>  if [dist(A) == dist(A,B)] then [A => B]
> 
> but that's very fragile. Think about estimates (we're not going to work
> with exact values of dist(?)), and then about data errors (e.g. a city
> matched to an incorrect ZIP code or something like that).

Huh? The whole point of the F(A,B)-exercise is to avoid precisely this
kind of fragility without penalizing the non-correlated case...

> This is the reason why they choose to always combine the values (with
> varying weights).

There are no varying weights involved there. What they do is to express
P(A=x,B=y) once as

  P(A=x,B=y) = P(B=y|A=x)*P(A=x) and then as
  P(A=x,B=y) = P(A=x|B=y)*P(B=y).

Then they assume

  P(B=y|A=x) ~= dist(A)/dist(A,B) and
  P(A=x|B=y) ~= dist(B)/dist(A,B),

and go on to average the two different ways of computing P(A=x,B=y), which
finally gives

  P(A=x,B=y) ~= P(B=y|A=x)*P(A=x)/2 + P(A=x|B=y)*P(B=y)/2
  = dist(A)*P(A=x)/(2*dist(A,B)) + dist(B)*P(B=y)/(2*dist(A,B))
  = (dist(A)*P(A=x) + dist(B)*P(B=y)) / (2*dist(A,B))

That averaging step adds *no* further data-dependent weights.

>> I'd like to find a statistical explanation for that definition of
>> F(A,B), but so far I couldn't come up with any. I created a Maple 14
>> worksheet while playing around with this - if you happen to have a
>> copy of Maple available I'd be happy to send it to you..
> 
> No, I don't have Maple. Have you tried Maxima
> (http://maxima.sourceforge.net) or Sage (http://www.sagemath.org/). Sage
> even has an online notebook - that seems like a very comfortable way to
> exchange this kind of data.

I haven't tried them, but I will. That Java-based GUI of Maple is driving
me nuts anyway... Thanks for the pointers!

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [FeatureRequest] Base Convert Function

2010-12-21 Thread Florian Pflug
On Dec21, 2010, at 12:48 , Robert Haas wrote:
> 2010/12/21 Tomáš Mudruňka :
>> Is there a possibility of having an internal base-converting function in PgSQL?
>> There are already functions for converting between decimal and hexadecimal
>> notations; I think pgsql should be able to convert between numbers with radixes
>> from 1 to 36 (actually, fast base36 (de)encoding is what I need)...
> 
> It should be pretty easy to write such a function in C, perhaps using
> strtol() or strtoul().

If you're not comfortable doing this in C, you might also want to consider
one of procedural languages pl/pgsql, pl/perl, pl/python. pl/pgsql is probably
only viable if you just need this for ints and bigints, unless you don't
care about performance.
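For the base36 case specifically, decoding already exists in several standard libraries (C's strtol and Python's built-in int() both accept a radix up to 36); only encoding needs an explicit loop. A sketch of both directions in Python rather than C, purely to illustrate the algorithm such a function would follow:

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base36(n):
    """Encode an integer in base 36 (lowercase digits)."""
    if n == 0:
        return "0"
    sign = "-" if n < 0 else ""
    n = abs(n)
    out = []
    while n:
        n, rem = divmod(n, 36)   # peel off the least significant digit
        out.append(DIGITS[rem])
    return sign + "".join(reversed(out))

def from_base36(s):
    # Python's int() already decodes any radix from 2 to 36.
    return int(s, 36)
```

A C version would be the same loop over `"0123456789abcdefghijklmnopqrstuvwxyz"` for encoding, with strtol(s, NULL, 36) for decoding.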

best regards,
Florian Pflug




[HACKERS] bug in ts_rank_cd

2010-12-21 Thread Sushant Sinha
MY PREV EMAIL HAD A PROBLEM. Please reply to this one
==

There is a bug in ts_rank_cd. It does not correctly give rank when the
query lexeme is the first one in the tsvector.

Example:

select ts_rank_cd(to_tsvector('english', 'abc sdd'),
plainto_tsquery('english', 'abc'));   
 ts_rank_cd 
------------
          0

select ts_rank_cd(to_tsvector('english', 'bcg abc sdd'),
plainto_tsquery('english', 'abc'));
 ts_rank_cd 
------------
        0.1

The problem is that the Cover finding algorithm ignores the lexeme at
the 0th position, I have attached a patch which fixes it. After the
patch the result is fine.

select ts_rank_cd(to_tsvector('english', 'abc sdd'), plainto_tsquery(
'english', 'abc'));
 ts_rank_cd 
------------
        0.1

--- postgresql-9.0.0/src/backend/utils/adt/tsrank.c	2010-01-02 22:27:55.0 +0530
+++ postgres-9.0.0-tsrankbugfix/src/backend/utils/adt/tsrank.c	2010-12-21 18:39:57.0 +0530
@@ -551,7 +551,7 @@
 	memset(qr->operandexist, 0, sizeof(bool) * qr->query->size);
 
 	ext->p = 0x7fffffff;
-	ext->q = 0;
+	ext->q = -1;
 	ptr = doc + ext->pos;
 
 	/* find upper bound of cover from current position, move up */
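
To illustrate the off-by-one, here is a simplified Python model of the cover
search (my own sketch, not the real C code): a candidate position is accepted
only if it lies strictly above the current lower bound q, so initializing q
to 0 makes a match at position 0 invisible, while -1 admits it.

```python
# Simplified model of the cover search: accept only positions strictly
# greater than the current lower bound q.

def first_match_above(positions, q):
    """Return the first position strictly above q, or None."""
    for pos in positions:
        if pos > q:
            return pos
    return None

doc_positions = [0, 5, 9]   # the query lexeme occurs at position 0

buggy = first_match_above(doc_positions, 0)    # q initialized to 0
fixed = first_match_above(doc_positions, -1)   # q initialized to -1
print(buggy, fixed)
```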




Re: [HACKERS] SQL/MED - file_fdw

2010-12-21 Thread Robert Haas
On Mon, Dec 20, 2010 at 6:42 AM, Itagaki Takahiro
 wrote:
> On Sun, Dec 19, 2010 at 12:45, Robert Haas  wrote:
>> I'm not questioning any of that.  But I'd like the resulting code to
>> be as maintainable as we can make it.
>
> I added comments and moved some setup codes for COPY TO to BeginCopyTo()
> for maintainability. CopyTo() still contains parts of initialization,
> but I've not touched it yet because we don't need the arrangement for now.

I haven't analyzed this enough to know whether I agree with it, but as
a trivial matter you should certainly revert this hunk:

/* field raw data pointers found by COPY FROM */
-
-   int max_fields;
-   char ** raw_fields;
+   int max_fields;
+   char  **raw_fields;


-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] SQL/MED - file_fdw

2010-12-21 Thread Itagaki Takahiro
On Tue, Dec 21, 2010 at 20:14, Shigeru HANADA  wrote:
> Attached is the revised version of file_fdw patch.  This patch is
> based on Itagaki-san's copy_export-20101220.diff patch.

#1. Don't you have a per-tuple memory leak? I added GetCopyExecutorState()
because the caller needs to reset the per-tuple context periodically.

Or, if you eventually make a HeapTuple from values and nulls arrays,
you could modify NextCopyFrom() to return a HeapTuple instead of values,
nulls, and tupleOid. The reason I didn't use HeapTuple is that I've
seen arrays used in the proposed FDW APIs. But we don't have to
use such arrays if you use HeapTuple-based APIs.

IMHO, I prefer HeapTuple because we can simplify NextCopyFrom and
keep EState private in copy.c.

#2. Can you avoid building the EXPLAIN text in fplan->explainInfo in
non-EXPLAIN cases? It's a waste of CPU cycles in normal execution.
I doubt whether FdwPlan.explainInfo field is the best design.
How do we use the EXPLAIN text for XML or JSON explain formats?
Instead, we could have an additional routine for EXPLAIN.

#3. Why do you re-open a foreign table in estimate_costs() ?
Since the caller seems to have the options for them, you can
pass them directly, no?

In addition, passing a half-initialized fplan to estimate_costs()
is a bad idea. If you think it is an OUT parameter, the OUT params
should be *startup_cost and *total_cost.

#4. It's a minor cosmetic point, but our coding convention is that
we don't need (void *) when we cast a pointer to void *, but we do
need (Type *) when we cast a void pointer to another type.

-- 
Itagaki Takahiro



Re: [HACKERS] proposal : cross-column stats

2010-12-21 Thread tv
> On Mon, Dec 20, 2010 at 9:29 PM, Florian Pflug  wrote:
>> You might use that to decide if either A->B or B->A looks function-like
>> enough to use the uniform bayesian approach. Or you might even go
>> further,
>> and decide *which* bayesian formula to use - the paper you cited always
>> averages
>>
>>  P(A=x|B=y)*P(B=y) and
>>  P(B=y|A=x)*P(A=x)
>>
>> but they offer no convincing reason for that other than "We don't know
>> which to pick".
>
> Ideally you want to somehow make this a continuous transition between
> the available formulas rather than a discrete transition, I think.  If
> F(A,B) = 1 then the selectivity of A = x AND B = y is just P(A=x), and
> if it's 0, then it's P(A=x)*P(B=y).  But suppose F(A,B)=0.5.  Then
> what?  A naive approach would be to estimate P(A=x && B=y) = P(A=x) *
> (1 - (1 - F(A,B))*(1 - P(B = y))), so that if, say, P(A=x) = 0.1 and
> P(B=y) = 0.1, then when F(A,B) = 0 we estimate 0.01, when F(A,B) = 1
> we estimate 0.1, and when F(A,B) = 0.5 we estimate (0.1)(1 - 0.5*0.9)
> = 0.055.  Of course I'm just hand-waving here, and this is without any
> mathematical basis, being just the simplest formula I could think of
> that gets the endpoints right and plots some sort of smooth curve
> between them in the middle.  A similar formula with a believable
> argument to back it up seems like it would be a big step forward for
> this method.

This somehow reminds me of how the various t-norms in fuzzy logic evolved.
I'm not saying we should use fuzzy logic here, but the requirements are
very similar so it might be an interesting inspiration. See for example
this http://plato.stanford.edu/entries/logic-fuzzy (chapter 4).
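
To make the analogy concrete (a throwaway sketch, with my own function
names): the two endpoint estimates already in play are themselves classical
t-norms - product is the independence assumption, and minimum is the upper
bound that a perfect functional dependency can reach.

```python
# The two endpoint selectivity estimates correspond to classical t-norms:
# product is the independence assumption, and minimum is the upper bound
# that a perfect functional dependency can reach.

def t_product(pa, pb):
    return pa * pb              # P(A=x) * P(B=y), independence

def t_min(pa, pb):
    return min(pa, pb)          # the joint probability can never exceed this

pa, pb = 0.1, 0.2
print(t_product(pa, pb), t_min(pa, pb))
```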

And there's one additional - IMHO very important - requirement. The whole
thing should easily extend to more than two columns. The "IF (F(A,B) >
F(B,A)) THEN ..." approach is probably not a good solution in that respect.

For example given 3 columns A,B,C, would you do that comparison for each
pair of columns, or would you do that for A vs (B,C)? Or maybe a
completely different approach? Because that would require collecting a lot
more data (the number of distinct values in each combination) etc.

Let's say, for example, there is a table with (C=A+B):

  A | B | C
 ===
  1 | 1 | 2
  1 | 2 | 3
  1 | 3 | 4
  2 | 1 | 3
  2 | 2 | 4
  2 | 3 | 5
  3 | 1 | 4
  3 | 2 | 5
  3 | 3 | 6

So that dist(A)=dist(B)=3, dist(C)=5 and dist(A,B,C)=dist(A,B)=9. Given
the paper, you get something like

 P(A,B,C) = [dist(A)*P(A) +  dist(B)*P(B) + dist(C)*P(C)] / [3*dist(A,B,C)]
  = [3*P(A) + 3*P(B) + 5*P(C)] / 27

so for example

 P(A=3,B=2,C=5) = [3/3 + 3/3 + 5*(2/9)]/27 = (28/9)/27 = 28/243

which is almost correct (the exact value is 1/9 = 27/243, off by 1/243).
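
A quick Python script to check the distinct-value counts and the averaged
three-column estimate directly from the table (names are mine, purely for
the check):

```python
from fractions import Fraction

# The C = A + B table from the example above (9 rows).
rows = [(a, b, a + b) for a in (1, 2, 3) for b in (1, 2, 3)]
n = len(rows)

def dist(*cols):
    """Number of distinct value combinations in the given columns."""
    return len({tuple(r[c] for c in cols) for r in rows})

def p(col, val):
    return Fraction(sum(1 for r in rows if r[col] == val), n)

# Three-column generalization of the paper's averaged estimate.
def estimate(a, b, c):
    return (dist(0) * p(0, a) + dist(1) * p(1, b) + dist(2) * p(2, c)) \
        / (3 * dist(0, 1, 2))

true = Fraction(sum(1 for r in rows if r == (3, 2, 5)), n)
print(dist(0), dist(1), dist(2), dist(0, 1, 2))
print(estimate(3, 2, 5), true)
```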

Don't get me wrong - I'm not a fanatic who thinks this particular formula
is the best formula possible. I'm just saying we could end up with a
formula that works beautifully in 2D, but once we get to 3 columns it
fails miserably.

Hmmm, maybe we could give this possibility (to identify two separate
groups of columns) to the user. So instead of 'build stats for (A,B,C)' the
user would say 'build stats for (A,B) and (C)' - this actually represents
apriori knowledge of dependencies supplied by the user.

In that case we could search for 'implicativeness' between those two
groups (and not within the groups), and we could restrict ourselves to 2D
(and thus use a more sophisticated formula).

But we should be able to do some basic analysis even when the user
supplies a list of columns without such apriori knowledge.

regards
Tomas




Re: [HACKERS] [FeatureRequest] Base Convert Function

2010-12-21 Thread Pavel Stehule
Dne 21. prosince 2010 12:48 Robert Haas  napsal(a):
> 2010/12/21 Tomáš Mudruňka :
>> Is there possibility of having internal base converting function in PgSQL?
>> There are already functions for converting between decimal and hexadecimal
>> notations i think pgsql can be able to convert between number with radixes
>> from 1 to 36 (actually fast (de)encoding base36 is what i need)...
>
> It should be pretty easy to write such a function in C, perhaps using
> strtol() or strtoul().  Because PostgreSQL uses an extensible
> architecture, you could load such a function into your copy of
> PostgreSQL and use it in your environment even if it weren't part of
> the core distribution.  There are a number of existing "contrib"
> modules that you can look at for examples of how to do this.
>
> Whether or not we'd accept a patch to add such a function to core or
> contrib, I'm not sure.  Nobody's written one yet...

The most-used transformations are available from core now - we just need
a wrapper function.

This function isn't clean, though - should it be based on int, long int or bytea?

Pavel


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



Re: [HACKERS] Can postgres create a file with physically continuous blocks.

2010-12-21 Thread Robert Haas
On Sun, Dec 19, 2010 at 1:10 PM, Jim Nasby  wrote:
> On Dec 19, 2010, at 1:10 AM, flyusa2010 fly wrote:
>> Does postgres make an effort to create a file with physically continuous 
>> blocks?
>
> AFAIK all files are expanded as needed. I don't think there's any flags you 
> can pass to the filesystem to tell it "this file will eventually be 1GB in 
> size". So, we're basically at the mercy of the FS to try and keep things 
> contiguous.

There have been some reports that we would do better on some
filesystems if we extended the file more than one block at a time,
rather than a single block at a time as we do today.  However, AFAIK,
no one is pursuing this ATM.
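
For what that might look like (a hedged sketch, not PostgreSQL code): ask
the OS to reserve a large chunk in one call, e.g. via posix_fallocate,
instead of growing the file one block at a time. The 8 MB chunk size here
is an arbitrary example increment.

```python
import os
import tempfile

CHUNK = 8 * 1024 * 1024     # 8 MB extension increment (arbitrary example)
BLOCK = 8192                # PostgreSQL's default block size

fd, path = tempfile.mkstemp()
try:
    if hasattr(os, "posix_fallocate"):    # Linux/POSIX only
        os.posix_fallocate(fd, 0, CHUNK)  # ask the FS to allocate real blocks
    else:
        os.ftruncate(fd, CHUNK)           # fallback: just set the file size
    blocks = os.fstat(fd).st_size // BLOCK
    print("extended by", blocks, "blocks in one request")
finally:
    os.close(fd)
    os.unlink(path)
```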

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [FeatureRequest] Base Convert Function

2010-12-21 Thread Robert Haas
2010/12/21 Tomáš Mudruňka :
> Is there possibility of having internal base converting function in PgSQL?
> There are already functions for converting between decimal and hexadecimal
> notations i think pgsql can be able to convert between number with radixes
> from 1 to 36 (actually fast (de)encoding base36 is what i need)...

It should be pretty easy to write such a function in C, perhaps using
strtol() or strtoul().  Because PostgreSQL uses an extensible
architecture, you could load such a function into your copy of
PostgreSQL and use it in your environment even if it weren't part of
the core distribution.  There are a number of existing "contrib"
modules that you can look at for examples of how to do this.

Whether or not we'd accept a patch to add such a function to core or
contrib, I'm not sure.  Nobody's written one yet...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] bug in SignalSomeChildren

2010-12-21 Thread Robert Haas
On Mon, Dec 20, 2010 at 3:14 PM, Robert Haas  wrote:
> On Mon, Dec 20, 2010 at 3:11 PM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> On Mon, Dec 20, 2010 at 2:23 PM, Tom Lane  wrote:
 I like that better actually ... one less thing for developers to get wrong.
>>
>>> The attached patch appears to work correctly on MacOS X.  I did check,
>>> BTW: getppid() in the attached process returns gdb's pid.  Poor!
>>
>> Looks good to me.
>>
>>> For my own purposes, I would be just as happy to apply this only to
>>> master.  But I wonder if anyone wants to argue for back-patching, to
>>> help debug existing installations?
>>
>> Given the lack of non-developer complaints, I see no need to backpatch.
>
> Well, non-developers don't tend to attach gdb very often.  Alvaro
> mentioned a problem installation upthread, thus the question.

Hearing no cries of "please-oh-please-backpatch-this", I've committed
it just to master.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] proposal : cross-column stats

2010-12-21 Thread Robert Haas
On Mon, Dec 20, 2010 at 9:29 PM, Florian Pflug  wrote:
> I tried to pick up Robert's idea of quantifying "Implicativeness" -
> i.e., finding a number between 0 and 1 that describes how close the
> (A,B) are to representing a function A -> B.

Actually Heikki's idea...

> Observe that dist(A),dist(B) <= dist(A,B) <= dist(A)*dist(B) if the
> estimates of dist(?) are consistent. From that you easily get
>
>  dist(A,B)/dist(B) <= dist(A) <= dist(A,B) and
>  dist(A,B)/dist(A) <= dist(B) <= dist(A,B)
>
> If dist(A) == dist(A,B), then there is a functional dependency
> A -> B, and conversely if dist(B) == dist(A,B) there is a functional
> dependency B -> A. Note that you can have both at the same time!
>
> On the other hand, if dist(B) = dist(A,B)/dist(A), then B has the
> smallest number of distinct values possible for a given combination
> of dist(A,B) and dist(A). This is the anti-function case.
>
> This motivates the definition
>
>  F(A,B) = [ dist(A)*dist(B) - dist(A,B) ] / [ dist(A,B) * ( dist(B) - 1) ]
>
> (You can probably drop the "-1", it doesn't make much of a difference
> for larger values of dist(B).)
>
> F(A,B) specifies where dist(A) lies relative to dist(A,B)/dist(B) and
> dist(A,B) - a value of 0 indicates dist(A) = dist(A,B)/dist(B) while
> a value of 1 indicates that dist(A) == dist(A,B).
>
> So F(A,B) is a suitable measure of "Implicativeness" - it's higher
> if the table (A,B) looks more like a function A -> B.
>
> You might use that to decide if either A->B or B->A looks function-like
> enough to use the uniform bayesian approach. Or you might even go further,
> and decide *which* bayesian formula to use - the paper you cited always
> averages
>
>  P(A=x|B=y)*P(B=y) and
>  P(B=y|A=x)*P(A=x)
>
> but they offer no convincing reason for that other than "We don't know
> which to pick".

Ideally you want to somehow make this a continuous transition between
the available formulas rather than a discrete transition, I think.  If
F(A,B) = 1 then the selectivity of A = x AND B = y is just P(A=x), and
if it's 0, then it's P(A=x)*P(B=y).  But suppose F(A,B)=0.5.  Then
what?  A naive approach would be to estimate P(A=x && B=y) = P(A=x) *
(1 - (1 - F(A,B))*(1 - P(B = y))), so that if, say, P(A=x) = 0.1 and
P(B=y) = 0.1, then when F(A,B) = 0 we estimate 0.01, when F(A,B) = 1
we estimate 0.1, and when F(A,B) = 0.5 we estimate (0.1)(1 - 0.5*0.9)
= 0.055.  Of course I'm just hand-waving here, and this is without any
mathematical basis, being just the simplest formula I could think of
that gets the endpoints right and plots some sort of smooth curve
between them in the middle.  A similar formula with a believable
argument to back it up seems like it would be a big step forward for
this method.
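
The hand-waved interpolation is easy to play with numerically; a throwaway
Python sketch (the function name is mine, not anything in PostgreSQL):

```python
# Sketch of the proposed interpolation between the two endpoint estimates:
#   P(A=x AND B=y) ~ P(A=x) * (1 - (1 - F(A,B)) * (1 - P(B=y)))
# F=0 reduces to independence, F=1 to full functional dependency.

def selectivity(p_a, p_b, f):
    return p_a * (1 - (1 - f) * (1 - p_b))

p_a = p_b = 0.1
for f in (0.0, 0.5, 1.0):
    print(f, selectivity(p_a, p_b, f))   # 0.01, 0.055, 0.1 as in the text
```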

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] SQL/MED - file_fdw

2010-12-21 Thread Shigeru HANADA
On Mon, 20 Dec 2010 20:42:38 +0900
Itagaki Takahiro  wrote:
> On Sun, Dec 19, 2010 at 12:45, Robert Haas  wrote:
> > I'm not questioning any of that.  But I'd like the resulting code to
> > be as maintainable as we can make it.
> 
> I added comments and moved some setup codes for COPY TO to BeginCopyTo()
> for maintainability. CopyTo() still contains parts of initialization,
> but I've not touched it yet because we don't need the arrangement for now.

Attached is the revised version of file_fdw patch.  This patch is
based on Itagaki-san's copy_export-20101220.diff patch.

Changes from previous version are:

* file_fdw uses CopyErrorCallback() as error context callback routine
in fileIterate() to report error context.  "CONTEXT" line in the
example below is added by the callback.

postgres=# select * From csv_tellers_bad;
ERROR:  missing data for column "bid"
CONTEXT:  COPY csv_tellers_bad, line 10: "10"
postgres=#

* Only superusers can change table-level file_fdw options.  A normal
user can't change the options even if the user is the owner of the
table.  This is for security reasons.

Regards,
--
Shigeru Hanada


file_fdw-20101221.patch.gz
Description: Binary data



Re: [HACKERS] proposal : cross-column stats

2010-12-21 Thread tv
> On Dec18, 2010, at 17:59 , Tomas Vondra wrote:
>> It seems to me you're missing one very important thing - this was not
>> meant as a new default way to do estimates. It was meant as an option
>> when the user (DBA, developer, ...) realizes the current solution gives
>> really bad estimates (due to correlation). In that case he could create
>> 'cross-column' statistics on those columns, and the optimizer would use
>> that info to do the estimates.
>
> I do understand that. I just have the nagging feeling that there is a
> way to judge from dist(A), dist(B) and dist(A,B) whether it makes sense
> to apply the uniform bayesian approach or to assume the columns are
> unrelated.

I doubt there is a way to make this decision with just dist(A), dist(B) and
dist(A,B) values. Well, we could go with a rule

  if [dist(A) == dist(A,B)] then [A => B]

but that's very fragile. Think about estimates (we're not going to work
with exact values of dist(?)), and then about data errors (e.g. a city
matched to an incorrect ZIP code or something like that).

So for a real-world dataset, the condition [dist(A)==dist(A,B)] will
almost never hold. And about the same holds for the "uniform correlation"
assumption which is the basis for the formula I posted.

So actually we're looking for a formula that produces reasonable estimates
and is robust even in cases where the correlation is not uniform or the
estimates are reasonably imprecise.

> This motivates the definition
>
> F(A,B) = [ dist(A)*dist(B) - dist(A,B) ] / [dist(A,B) * ( dist(B) - 1)]
>
> (You can probably drop the "-1", it doesn't make much of a difference
> for larger values of dist(B).)
>
> F(A,B) specifies where dist(A) lies relative to dist(A,B)/dist(B) and
> dist(A,B) - a value of 0 indicates dist(A) = dist(A,B)/dist(B) while
> a value of 1 indicates that dist(A) == dist(A,B).
>
> So F(A,B) is a suitable measure of "Implicativeness" - it's higher
> if the table (A,B) looks more like a function A -> B.
>
> You might use that to decide if either A->B or B->A looks function-like
> enough to use the uniform bayesian approach. Or you might even go further,
> and decide *which* bayesian formula to use - the paper you cited always
> averages
>
>   P(A=x|B=y)*P(B=y) and
>   P(B=y|A=x)*P(A=x)
>
> but they offer no convincing reason for that other than "We don't know
> which to pick".

Well, the reason why they chose the sum/2 approach is they were unable to
infer which of the values is 'better' and the sum/2 limits the errors.

I haven't studied this thoroughly, but my impression is that you are going
in the same direction as they did, i.e. while they've done

   P(A,B) = (P(B|A)*P(A) + P(A|B)*P(B)) / 2

with P(B|A) = dist(A) / dist(A,B), you've chosen P(A|B) ~ F(B,A) or
something like that.

You'll probably object that you could compute F(A,B) and F(B,A) and then
use only the part corresponding to the larger value, but what if the
F(A,B) and F(B,A) are about the same?

This is the reason why they choose to always combine the values (with
varying weights).

> I'd like to find a statistical explanation for that definition of
> F(A,B), but so far I couldn't come up with any. I created a Maple 14
> worksheet while playing around with this - if you happen to have a
> copy of Maple available I'd be happy to send it to you..

No, I don't have Maple. Have you tried Maxima
(http://maxima.sourceforge.net) or Sage (http://www.sagemath.org/). Sage
even has an online notebook - that seems like a very comfortable way to
exchange this kind of data.

regards
Tomas




Re: [HACKERS] Extensions, patch 22 (cleanup, review, cleanup)

2010-12-21 Thread Dimitri Fontaine
Alvaro Herrera  writes:
> <link linkend="functions-extension"><function>pg_extension_flag_dump</function></link>
[...]
> So presumably this shouldn't be documented because it cannot be called
> anyway?

It can be called but only from an extension's script.

> To be honest I don't understand the purpose of this part of the patch.

So the problem we're offering a solution for here is the extension-with-
user-data problem: the extension infrastructure is there so that pg_dump
knows to filter OUT the SQL objects from its dump, preferring a single
CREATE EXTENSION command.  Some extensions allow users to control the
data in some of their objects: now you want to have those in the backup
again.

From the docs:

  
http://pgsql.tapoueh.org/extensions/doc/html/functions-admin.html#FUNCTIONS-EXTENSION

  pg_extension_with_user_data allows an extension's author to prepare
  installation scripts that will work well for initial installation and
  when restoring from a pg_dump backup, which issues CREATE EXTENSION
  foo WITH NO USER DATA;. See CREATE EXTENSION for details. One way to
  use it is as following:

  DO $$
   BEGIN
IF pg_extension_with_user_data() THEN
  create schema foo;
  create table foo.bar(id serial primary key);
  perform pg_extension_flag_dump('foo.bar_id_seq'::regclass);
  perform pg_extension_flag_dump('foo.bar'::regclass);
END IF;
   END;
  $$;

I don't really know how to improve the docs, you seem to have been
surprised by reading the CREATE EXTENSION docs but you didn't follow the
link to the function's doc with the above details, did you?

I'm open to improving things here, but I'm not seeing how yet.

> I attach some minor fixes while reading it over.  I compiled but didn't
> run it :-)

Thanks a lot, that's applied in my git repo, and I did run it
successfully! It will be part of the next patch revision.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Extensions, patch 22 (cleanup, review, cleanup)

2010-12-21 Thread Dimitri Fontaine
"Erik Rijkers"  writes:
> I might be mistaken but it looks like a doc/src/sgml/ref/alter_extension.sgml 
> is missing?

Mmm, it seems that git was agreeing with you, so here it is:

  git ls-files doc/src/sgml/ref/alter_extension.sgml 
  
http://git.postgresql.org/gitweb?p=postgresql-extension.git;a=commitdiff;h=9371a9763651df2636cb6c20dced7cd67398c477

It was already online for readers of the HTML version of the docs:

  http://pgsql.tapoueh.org/extensions/doc/html/sql-alterextension.html

And it will appear in next revision of the patch. Thanks!
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
