date:20051019

Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL]

2005-10-19 Thread Tino Wildenhain

Am Mittwoch, den 19.10.2005, 22:04 +0200 schrieb Tino Wildenhain:
> Am Mittwoch, den 19.10.2005, 16:29 -0300 schrieb Marc G. Fournier:
> > I'm CC'ng this over to -hackers ... Tom?  Comments?
> > 
> ...

> > >> Then we are broken too :)
> > >>
> > >> # select 'a ' = 'a  ';
> > >>   ?column?
> > >> --
> > >>   f
> > >> (1 row)
> 
> 
> experiment=# SELECT 'a '::char = 'a  '::char;
>  ?column?
> --
>  t
> 
Sorry, copied wrong line :)

experiment=# SELECT 'a '::char(10) = 'a  '::char(10);
 ?column?
--
 t

and:

SELECT '|' || 'foo  '::char(10) || '|';
 ?column?
--
 |foo|


vs.


SELECT '|' || 'foo  ' || '|';
 ?column?
--
 |foo  |




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [GENERAL] [HACKERS] 'a' == 'a '

2005-10-19 Thread Greg Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> Chris Travers <[EMAIL PROTECTED]> writes:
> > If I understand the spec correctly, it seems to indicate that this is 
> > specific to the locale/character set.
> 
> The spec associates padding behavior with collations, which per spec are
> separate from the datatypes --- that is, you should be able to able to
> specify a collation for each string-type table column (whether char(N)
> or varchar(N)) and even for each literal string constant.  We do not
> currently have that capability, and accordingly fall back to binding
> PAD SPACE behavior to char(N) and NO PAD behavior to varchar(N).
> 
> AFAICS this choice is allowed by the spec since the default collation is
> implementation-defined.

Does it even make sense for char(N) to not be space padded? I had the
impression char(N) was always N characters long, not more or less. I can't
picture any other character being used for padding, then you would need a more
flexible rtrim function.

And I can understand the collation order determining whether 'a' and 'a '
compare equal. But surely if you store 'a' in a varchar(N) you have to get 'a'
back out, not some other string! Does the spec really allow varchar to
actually be padded and not just compare ignoring trailing space?

(I can't believe anyone really wants varchar to be space padded. Space padding
always seemed like a legacy feature for databases with fixed record length
data types. Why would anyone want a string data type that can't represent all
strings?)

-- 
greg

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [GENERAL] [HACKERS] 'a' == 'a '

2005-10-19 Thread Tom Lane

Chris Travers <[EMAIL PROTECTED]> writes:
> If I understand the spec correctly, it seems to indicate that this is 
> specific to the locale/character set.

The spec associates padding behavior with collations, which per spec are
separate from the datatypes --- that is, you should be able to able to
specify a collation for each string-type table column (whether char(N)
or varchar(N)) and even for each literal string constant.  We do not
currently have that capability, and accordingly fall back to binding
PAD SPACE behavior to char(N) and NO PAD behavior to varchar(N).

AFAICS this choice is allowed by the spec since the default collation is
implementation-defined.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] [GENERAL] Call for translators

2005-10-19 Thread Alvaro Herrera

Lyubomir Rusanov wrote:

> Hi,
> I am also interested in helping building Bulgarian translation for
> PostgreSQL. 
> I think that we will not have enough time for 8.1 but maybe for 8.2.

The best you could do is submit your incremental improvements for 8.1,
so there is at least _some_ translation.  Then you complete it for 8.2.
I advise you start with the client-side tools (which you can reasonably
expect to finish for 8.1), and leave the backend messages for last,
because of the size.

-- 
Alvaro Herrerahttp://www.PlanetPostgreSQL.org
Voy a acabar con todos los humanos / con los humanos yo acabaré
voy a acabar con todos / con todos los humanos acabaré (Bender)

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [GENERAL] [HACKERS] 'a' == 'a '

2005-10-19 Thread Chris Travers


Josh Berkus wrote:


Dann,

 


I think that whatever is done ought to be whatever the standard says.
If I misinterpret the standard and PostgreSQL is doing it right, then
that is fine.  It is just that PostgreSQL is very counter-intuitive
compared to other database systems that I have used in this one
particular area.  When I read the standard, it looked to me like
PostgreSQL was not performing correctly.  It is not unlikely that I read
it wrong.
   



AFAIT, the standard says "implementation-specific".   So we're standard.

The main cost for comparing trimmed values is performance; factoring an 
rtrim into every comparison will add significant overhead to the already 
CPU-locked process of, for example, creating indexes.  We're looking for 
ways to make the comparison operators lighter-weight, not heavier.
 

If I understand the spec correctly, it seems to indicate that this is 
specific to the locale/character set.  Assuming that the standard 
doesn't have anything to do with any character sets, it should be 
possible to make this available for those who want it as an initdb 
option.  Whether or not this is important enough to offer or not is 
another matter.


Personally my questions are:

1)  How many people have been bitten by this badly?
2)  How many people have been bitten by joins that depend on padding?

Personally, unlike case folding, this seems to be an area where a bit of 
documentation (i.e. all collation sets have are assumed to have the NO 
PAD option in the SQL standard) would be sufficient to answer to 
questions of standards-compliance.


My general perspective on this is that if trailing blanks are a significant 
hazard for your application, then trim them on data input.  That requires 
a *lot* less peformance overhead than doing it every time you compare 
something.  
 

In general I agree.  But I am not willing to jump to the conclusion that 
it will never be warranted to add this as an initdb option.  I am more 
interested in what cases people see where this would be required.  But I 
agree that the bar is much higher than it is in many other cases.


Best Wishes,
Chris Travers
Metatron Technology Consulting

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

[HACKERS] RSS feeds of CVS revision logs

2005-10-19 Thread Joshua D. Drake

Hello,

Some of you may have noted the new project I am playing with the
PostgreSQL Source browser.
It is a Subversion->Trac interface to the PostgreSQL CVS repository. The
entire repository
is represented. It is currently updated daily but I am hoping to get
this down to hourly in

the near future.

The site can be viewed here:

http://www-new.commandprompt.com/projects/public/pgsql

The URL will change in the future, probably to pgcode.org or maybe just
projects.commandprompt.com.

I was speaking with JoshB and he mentioned that one particular facet of
this may be interesting
to you folks, which is the RSS feeds. JoshB was thinking this would be
useful for replacing
pgsql-patches@ for some people as their mailboxes get filled up. You
have the ability using any RSS

reader to subscribe to any portion of the revision log.

For example, for the docs project, they can subscribe to Trunk->docs and
thus only see the revision
logs within their feed for the docs project. At your leisure you may
either choose to see only

a brief synopsis for the complete commit message.

Here is an example of the short feed:

http://www-new.commandprompt.com/projects/public/pgsql/log/trunk?limit=100&mode=stop_on_copy&stop_rev=&format=rss

Here is an example of the long feed:

http://www-new.commandprompt.com/projects/public/pgsql/log/trunk?verbose=on&format=rss&stop_rev=&limit=100&mode=stop_on_copy

Notice the only real difference is the verbose=on

The power of this at least from my perspective, is that I use
Thunderbird as my RSS reader. I can view a current
revision, click forward, and send a question to the list about a
specific change.

Sincerely,

Joshua D. Drake
Command Prompt, Inc.

--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] 'a' == 'a '

2005-10-19 Thread Dann Corbit

If there is a significant performance benefit to not expanding text columns in 
comparison operations, then it seems it should be OK.

I probably read the standard wrong, but it seems to me that varchar, char, and 
bpchar columns should all behave the same (e.g. if you do not expand with 
 or the PAD character (whatever that is) then all char type columns 
should behave the same.  I guess that there could be different default 
collations for different column types though (that is clearly allowed in the 
standard).  Perhaps it just needs to be documented in such a way that even a 
blockhead like me can comprehend it easily.

> -Original Message-
> From: Josh Berkus [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 19, 2005 5:06 PM
> To: pgsql-hackers@postgresql.org
> Cc: Dann Corbit; Stephan Szabo; Terry Fielder; Tino Wildenhain; Marc G.
> Fournier; [EMAIL PROTECTED]; pgsql-general@postgresql.org
> Subject: Re: [HACKERS] 'a' == 'a '
> 
> Dann,
> 
> > I think that whatever is done ought to be whatever the standard says.
> > If I misinterpret the standard and PostgreSQL is doing it right, then
> > that is fine.  It is just that PostgreSQL is very counter-intuitive
> > compared to other database systems that I have used in this one
> > particular area.  When I read the standard, it looked to me like
> > PostgreSQL was not performing correctly.  It is not unlikely that I read
> > it wrong.
> 
> AFAIT, the standard says "implementation-specific".   So we're standard.
> 
> The main cost for comparing trimmed values is performance; factoring an
> rtrim into every comparison will add significant overhead to the already
> CPU-locked process of, for example, creating indexes.  We're looking for
> ways to make the comparison operators lighter-weight, not heavier.
> 
> My general perspective on this is that if trailing blanks are a
> significant
> hazard for your application, then trim them on data input.  That requires
> a *lot* less peformance overhead than doing it every time you compare
> something.
> 
> Changing the behaviour would break backwards compatibility for some users.
> For that matter, I've been subscribed to 8 PostgreSQL mailing lists since
> 1999, and this is the first time I can recall someone complaining about
> this comparison behavior.  So it's obviously not a widespread issue.
> 
> --
> --Josh
> 
> Josh Berkus
> Aglio Database Solutions
> San Francisco

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] 'a' == 'a '

2005-10-19 Thread Josh Berkus

Dann,

> I think that whatever is done ought to be whatever the standard says.
> If I misinterpret the standard and PostgreSQL is doing it right, then
> that is fine.  It is just that PostgreSQL is very counter-intuitive
> compared to other database systems that I have used in this one
> particular area.  When I read the standard, it looked to me like
> PostgreSQL was not performing correctly.  It is not unlikely that I read
> it wrong.

AFAIT, the standard says "implementation-specific".   So we're standard.

The main cost for comparing trimmed values is performance; factoring an 
rtrim into every comparison will add significant overhead to the already 
CPU-locked process of, for example, creating indexes.  We're looking for 
ways to make the comparison operators lighter-weight, not heavier.

My general perspective on this is that if trailing blanks are a significant 
hazard for your application, then trim them on data input.  That requires 
a *lot* less peformance overhead than doing it every time you compare 
something.  

Changing the behaviour would break backwards compatibility for some users.  
For that matter, I've been subscribed to 8 PostgreSQL mailing lists since 
1999, and this is the first time I can recall someone complaining about 
this comparison behavior.  So it's obviously not a widespread issue.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)

2005-10-19 Thread Dann Corbit

> -Original Message-
> From: Martijn van Oosterhout [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 19, 2005 2:46 PM
> To: Dann Corbit
> Cc: Terry Fielder; Tino Wildenhain; Marc G. Fournier;
> [EMAIL PROTECTED]; pgsql-hackers@postgresql.org; pgsql-
> [EMAIL PROTECTED]
> Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle
> buysInnobase)
> 
> On Wed, Oct 19, 2005 at 02:05:20PM -0700, Dann Corbit wrote:
> > > When the compared datatypes are VARCHAR: YES
> >
> > What is the value of doing that?
> >
> > I can see plenty of harm and absolutely no return.  We are talking
about
> > blank padding before comparison.  Do you really want 'Danniel '
> > considered distinct from 'Danniel  ' in a comparison?  In real life,
> > what does that buy you?
> 
> Well, looking from the point of view of using indexes, indexes can
only
> really match on things that are equal. Which means the system is going
> to have to trim them anyway. I'm of the opinion that strings are
> strings and spaces are no different from other characters.
> 
> That bit of the standard quoted earlier, if you read the PAD character
> that is different from any other character as being the NUL character,
> then 'a' is clearly different from 'a'. This
> whacky space behaviour is something I associate with the char(N) and
is
> the main reason I never use it.
> 
> > Perhaps this is old hat to the long-timers around here and there is
a
> > good explanation as to why varchar should have non-blank padding
when
> > comparisons are performed.  Can someone point me to documentation
that
> > explains it?
> 
> The way I understood it:
> 
> char(N) is blank padding
> varchar(N) is not
> 
> If you make varchar(n) do blank padding, then what's the difference
> between the two types? 

Storage.  The blank padding is only for comparison purposes.

>You may as well get rid of one...

All the other database systems seem to handle it in the way that I
expect.
Which is not to say that it is the right way or that it agrees with the
standard.
But it is how it appears to me, so far.

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)

2005-10-19 Thread Dann Corbit

> -Original Message-
> From: Stephan Szabo [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 19, 2005 2:34 PM
> To: Dann Corbit
> Cc: Terry Fielder; Tino Wildenhain; Marc G. Fournier;
> [EMAIL PROTECTED]; pgsql-hackers@postgresql.org; pgsql-
> [EMAIL PROTECTED]
> Subject: Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy]
[GENERAL]
> Oracle buysInnobase)
> 
> On Wed, 19 Oct 2005, Dann Corbit wrote:
> 
> > > -Original Message-
> > > From: Terry Fielder [mailto:[EMAIL PROTECTED]
> > > Sent: Wednesday, October 19, 2005 2:05 PM
> > > To: Dann Corbit
> > > Cc: Tino Wildenhain; Marc G. Fournier;
[EMAIL PROTECTED];
> > > pgsql-hackers@postgresql.org; pgsql-general@postgresql.org
> > > Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL]
Oracle
> > > buysInnobase)
> > >
> > >
> > >
> > > Dann Corbit wrote:
> > > > Try this query in Oracle, SQL*Server, DB/2, Informix, etc.:
> > > >
> > > > connxdatasync=# select 1 where cast('a' as varchar(30)) =
cast('a '
> > as
> > > > varchar(30));
> > > >  ?column?
> > > > --
> > > > (0 rows)
> > > >
> > > > I see how you can interpret the SQL Standard to make the above
> > response
> > > > a correct one.  But is it the response that you would like?
> > >
> > > When the compared datatypes are VARCHAR: YES
> >
> > What is the value of doing that?
> >
> > I can see plenty of harm and absolutely no return.  We are talking
about
> > blank padding before comparison.  Do you really want 'Danniel '
> > considered distinct from 'Danniel  ' in a comparison?  In real life,
> > what does that buy you?
> 
> It buys you the ability to store things where trailing spaces are
> signficant (for example passwords) within the existing limitations of
not
> having a full set of the collation behavior.

I suppose that there may be some value in having 'J&^% ' be different
from 'J&^%   '.

I would expect to insert a password with trailing blanks to be the same
as inserting a password with no trailing blanks.

I think that whatever is done ought to be whatever the standard says.
If I misinterpret the standard and PostgreSQL is doing it right, then
that is fine.  It is just that PostgreSQL is very counter-intuitive
compared to other database systems that I have used in this one
particular area.  When I read the standard, it looked to me like
PostgreSQL was not performing correctly.  It is not unlikely that I read
it wrong.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)

2005-10-19 Thread Martijn van Oosterhout

On Wed, Oct 19, 2005 at 02:05:20PM -0700, Dann Corbit wrote:
> > When the compared datatypes are VARCHAR: YES
> 
> What is the value of doing that?
> 
> I can see plenty of harm and absolutely no return.  We are talking about
> blank padding before comparison.  Do you really want 'Danniel '
> considered distinct from 'Danniel  ' in a comparison?  In real life,
> what does that buy you?

Well, looking from the point of view of using indexes, indexes can only
really match on things that are equal. Which means the system is going
to have to trim them anyway. I'm of the opinion that strings are
strings and spaces are no different from other characters.

That bit of the standard quoted earlier, if you read the PAD character
that is different from any other character as being the NUL character,
then 'a' is clearly different from 'a'. This
whacky space behaviour is something I associate with the char(N) and is
the main reason I never use it.

> Perhaps this is old hat to the long-timers around here and there is a
> good explanation as to why varchar should have non-blank padding when
> comparisons are performed.  Can someone point me to documentation that
> explains it?

The way I understood it:

char(N) is blank padding
varchar(N) is not

If you make varchar(n) do blank padding, then what's the difference
between the two types? You may as well get rid of one...

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

pgpl2oFftSN1C.pgp
Description: PGP signature

Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle

2005-10-19 Thread Stephan Szabo

On Wed, 19 Oct 2005, Dann Corbit wrote:

> > -Original Message-
> > From: Terry Fielder [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, October 19, 2005 2:05 PM
> > To: Dann Corbit
> > Cc: Tino Wildenhain; Marc G. Fournier; [EMAIL PROTECTED];
> > pgsql-hackers@postgresql.org; pgsql-general@postgresql.org
> > Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle
> > buysInnobase)
> >
> >
> >
> > Dann Corbit wrote:
> > > Try this query in Oracle, SQL*Server, DB/2, Informix, etc.:
> > >
> > > connxdatasync=# select 1 where cast('a' as varchar(30)) = cast('a '
> as
> > > varchar(30));
> > >  ?column?
> > > --
> > > (0 rows)
> > >
> > > I see how you can interpret the SQL Standard to make the above
> response
> > > a correct one.  But is it the response that you would like?
> >
> > When the compared datatypes are VARCHAR: YES
>
> What is the value of doing that?
>
> I can see plenty of harm and absolutely no return.  We are talking about
> blank padding before comparison.  Do you really want 'Danniel '
> considered distinct from 'Danniel  ' in a comparison?  In real life,
> what does that buy you?

It buys you the ability to store things where trailing spaces are
signficant (for example passwords) within the existing limitations of not
having a full set of the collation behavior.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)

2005-10-19 Thread Dann Corbit

> -Original Message-
> From: Terry Fielder [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 19, 2005 2:05 PM
> To: Dann Corbit
> Cc: Tino Wildenhain; Marc G. Fournier; [EMAIL PROTECTED];
> pgsql-hackers@postgresql.org; pgsql-general@postgresql.org
> Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle
> buysInnobase)
> 
> 
> 
> Dann Corbit wrote:
> > Try this query in Oracle, SQL*Server, DB/2, Informix, etc.:
> >
> > connxdatasync=# select 1 where cast('a' as varchar(30)) = cast('a '
as
> > varchar(30));
> >  ?column?
> > --
> > (0 rows)
> >
> > I see how you can interpret the SQL Standard to make the above
response
> > a correct one.  But is it the response that you would like?
> 
> When the compared datatypes are VARCHAR: YES

What is the value of doing that?

I can see plenty of harm and absolutely no return.  We are talking about
blank padding before comparison.  Do you really want 'Danniel '
considered distinct from 'Danniel  ' in a comparison?  In real life,
what does that buy you?

And even if people think it is a good idea, it seems to be at odds with
the way that I read the standard (though I am easily confused by simple
enough language at times).

It could well be that I am jaded from years of doing it the wrong way (I
expect two character strings with all leading non-blanks in agreement to
compare equal).

Perhaps this is old hat to the long-timers around here and there is a
good explanation as to why varchar should have non-blank padding when
comparisons are performed.  Can someone point me to documentation that
explains it?

[snip]

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buys

2005-10-19 Thread Greg Stark

"Guy Rouillier" <[EMAIL PROTECTED]> writes:

> Tino Wildenhain wrote:
> >
> > experiment=# SELECT 'a '::char = 'a  '::char;
> >  ?column?
> > --
> >  t
> > 
> 
> This does't show anything useful, because the ::char casting simply
> takes the first char of any string:
> 
> select 'abc'::char = 'axy'::char
> 
> Also results in 'true'.  Hopefully no one in this discussion would want
> those two strings to be equal.

A better experiment:

db=> select 'a '::char(3) = 'a  '::char(3);
 ?column? 
--
 t
(1 row)

-- 
greg


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buysInnobase)

2005-10-19 Thread Dann Corbit

Try this query in Oracle, SQL*Server, DB/2, Informix, etc.:

connxdatasync=# select 1 where cast('a' as varchar(30)) = cast('a ' as
varchar(30));
 ?column?
--
(0 rows)

I see how you can interpret the SQL Standard to make the above response
a correct one.  But is it the response that you would like?

Suppose, for instance, that you have an employee table in your database.
You have another table called benefits.  Perhaps it is even in another
database.  Conceivably even in a database other than PostgreSQL.

Anyway, you want to match information between the two systems so you
join on some redundant columns like a.First_Name = b.fname AND
a.Last_name = b.lname AND a.Middle_Initial = b.mi AND a.City = b.city
AND a.Street_name = b.street

If the columns are not the same length (or one is fixed length and the
other variable), then 'Danniel' won't match 'Danniel' unless you trim
it.  If you trim it, then the indexes go out the window.  If the indexes
go out the window, then we table scan.

I don't like that behavior.  Perhaps others who know more than me can
say why not blank padding comparisons is a good idea.

Clearly, one could argue that having redundant data is bad and that
every attribute in a database intended to match should be exactly the
same type.  But there are lots of database systems badly designed.  And
of well designed systems, it is not uncommon to have more than one
database in your organization, and a need to perform federated joins as
well because of it.

> -Original Message-
> From: Tino Wildenhain [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 19, 2005 1:05 PM
> To: Marc G. Fournier
> Cc: Dann Corbit; [EMAIL PROTECTED]; pgsql-
> [EMAIL PROTECTED]; pgsql-general@postgresql.org
> Subject: Re: 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle
> buysInnobase)
> 
> Am Mittwoch, den 19.10.2005, 16:29 -0300 schrieb Marc G. Fournier:
> > I'm CC'ng this over to -hackers ... Tom?  Comments?
> >
> > On Wed, 19 Oct 2005, Dann Corbit wrote:
> >
> > > Yes, clearly that is the wrong result according to the SQL
standard.
> > >
> > > Here is a SQL*Server query:
> > > select 1 where 'a' = 'a ' AND 'a' = 'a  ' AND 'a ' = 'a '
> > >
> > > It returns (correctly): 1
> > >
> > >> -Original Message-
> > >> From: [EMAIL PROTECTED] [mailto:pgsql-general-
> > >> [EMAIL PROTECTED] On Behalf Of Marc G. Fournier
> > >> Sent: Wednesday, October 19, 2005 11:41 AM
> > >> To: [EMAIL PROTECTED]
> > >> Cc: pgsql-general@postgresql.org
> > >> Subject: Re: [pgsql-advocacy] [GENERAL] Oracle buys Innobase
> > >>
> > >> On Wed, 19 Oct 2005, [EMAIL PROTECTED] wrote:
> > >>
> > >>> I was referring to trailing blanks, but did not explicitly say
it,
> > >>> though showed it in the examples.  I am pretty sure that the SQL
> > >>> standard says that trailing whitespace is insignificant in
string
> > >>> comparison.
> > >>
> > >> Then we are broken too :)
> > >>
> > >> # select 'a ' = 'a  ';
> > >>   ?column?
> > >> --
> > >>   f
> > >> (1 row)
> 
> 
> experiment=# SELECT 'a '::char = 'a  '::char;
>  ?column?
> --
>  t
> 

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buys

2005-10-19 Thread Tino Wildenhain

Am Mittwoch, den 19.10.2005, 16:29 -0300 schrieb Marc G. Fournier:
> I'm CC'ng this over to -hackers ... Tom?  Comments?
> 
> On Wed, 19 Oct 2005, Dann Corbit wrote:
> 
> > Yes, clearly that is the wrong result according to the SQL standard.
> >
> > Here is a SQL*Server query:
> > select 1 where 'a' = 'a ' AND 'a' = 'a  ' AND 'a ' = 'a '
> >
> > It returns (correctly): 1
> >
> >> -Original Message-
> >> From: [EMAIL PROTECTED] [mailto:pgsql-general-
> >> [EMAIL PROTECTED] On Behalf Of Marc G. Fournier
> >> Sent: Wednesday, October 19, 2005 11:41 AM
> >> To: [EMAIL PROTECTED]
> >> Cc: pgsql-general@postgresql.org
> >> Subject: Re: [pgsql-advocacy] [GENERAL] Oracle buys Innobase
> >>
> >> On Wed, 19 Oct 2005, [EMAIL PROTECTED] wrote:
> >>
> >>> I was referring to trailing blanks, but did not explicitly say it,
> >>> though showed it in the examples.  I am pretty sure that the SQL
> >>> standard says that trailing whitespace is insignificant in string
> >>> comparison.
> >>
> >> Then we are broken too :)
> >>
> >> # select 'a ' = 'a  ';
> >>   ?column?
> >> --
> >>   f
> >> (1 row)


experiment=# SELECT 'a '::char = 'a  '::char;
 ?column?
--
 t



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] A costing analysis tool

2005-10-19 Thread Josh Berkus

Kevin,

> This would require capture of information beyond what I was thinking
> about in terms of schema.  Do you think we need to capture just index
> type, or something more?  Do you propose that we capture the pg_*
> metadata related to every object referenced in the plan, every object
> related to every table in the query, or everything in the entire
> database?

Not right away.  However, having collected the info in the first place 
would give us the option of duping a copy of the system tables so that we 
could mine them for more information.

Also, for our test harness, we will have pre-programmed object information 
for the objects we use.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] [pgsql-advocacy] [GENERAL] Oracle buys Innobase

2005-10-19 Thread Dann Corbit

Given this part of that same rule applied to the strings:
"b) If the length in characters of X is not equal to the length in
characters of Y, then the shorter string is effectively replaced, for
the purposes of comparison, with a copy of itself that has been extended
to the length of the longer string by concatenation on the right of one
or more pad characters, where the pad character is chosen based on CS.
If CS has the NO PAD characteristic, then the pad character is an
implementation-dependent character different from any character in the
character set of X and Y that collates less than any string under CS.
Otherwise, the pad character is a ."

I do not see how two strings which are otherwise equal (except for
length or blank padding) can possibly compare unequal unless the NO PAD
property is applied and the implementation defined pad character is also
something other than a .  Is that the case for PostgreSQL?  Even
if it is, is seems truly bizarre that the NO PAD attribute would be
applied to string constants.

> -Original Message-
> From: Marc G. Fournier [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 19, 2005 12:53 PM
> To: Dann Corbit
> Cc: Stephan Szabo; Marc G. Fournier; [EMAIL PROTECTED];
pgsql-
> [EMAIL PROTECTED]; pgsql-general@postgresql.org
> Subject: RE: [pgsql-advocacy] [GENERAL] Oracle buys Innobase
> 
> On Wed, 19 Oct 2005, Dann Corbit wrote:
> 
> >> -Original Message-
> >> From: Stephan Szabo [mailto:[EMAIL PROTECTED]
> >> Sent: Wednesday, October 19, 2005 12:39 PM
> >> To: Dann Corbit
> >> Cc: Marc G. Fournier; [EMAIL PROTECTED]; pgsql-
> >> [EMAIL PROTECTED]
> >> Subject: Re: [pgsql-advocacy] [GENERAL] Oracle buys Innobase
> >>
> >> On Wed, 19 Oct 2005, Dann Corbit wrote:
> >>
> >>> Yes, clearly that is the wrong result according to the SQL
standard.
> >>>
> >>> Here is a SQL*Server query:
> >>> select 1 where 'a' = 'a ' AND 'a' = 'a  ' AND 'a ' = 'a '
> >>>
> >>> It returns (correctly): 1
> >>
> >> Doesn't that depend on the collating sequence in use, or is a NO
PAD
> >> collating sequence not allowed here?
> >
> > If the implementation defines constants as NO PAD and the
implementation
> > defined pad character is something other than space, then they could
> > compare unequal.
> >
> > I would find that implementation disturbing.  But I am easily bent
out
> > of shape.
> >
> > The attached HTML file in my earlier post is the official quote from
the
> > SQL 99 standard.  That is the formal and correct definition, far
> > superior to my off the cuff approximations.
> 
> 'k, if I'm reading the right section (you say its bolded, but I'm
using
> pine which doesn't seem to do a good job of reading HTML):
> 
> ===
> d) Depending on the collating sequence, two strings may compare as
> equal even if they are of different lengths or contain different
> sequences of characters. When any of the operations MAX, MIN, and
> DISTINCT reference a grouping column, and the UNION, EXCEPT, and
> INTERSECT operators refer to character strings, the specific value
> selected by these operations from a set of such equal values is
> implementation-dependent.
> ===
> 
> I think the key part of that 'clause' is "two strings *may* compare as
> equal" ... sounds implementation dependent to me, depending on how the
> implementor interprets it ... or am I reading the wrong section?
> 
> 
> Marc G. Fournier   Hub.Org Networking Services
> (http://www.hub.org)
> Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ:
> 7615664

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] [pgsql-advocacy] [GENERAL] Oracle buys Innobase

2005-10-19 Thread Marc G. Fournier

On Wed, 19 Oct 2005, Dann Corbit wrote:

-Original Message-
From: Stephan Szabo [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 19, 2005 12:39 PM
To: Dann Corbit
Cc: Marc G. Fournier; [EMAIL PROTECTED]; pgsql-
[EMAIL PROTECTED]
Subject: Re: [pgsql-advocacy] [GENERAL] Oracle buys Innobase

On Wed, 19 Oct 2005, Dann Corbit wrote:

Yes, clearly that is the wrong result according to the SQL standard.

Here is a SQL*Server query:
select 1 where 'a' = 'a ' AND 'a' = 'a  ' AND 'a ' = 'a '

It returns (correctly): 1

Doesn't that depend on the collating sequence in use, or is a NO PAD
collating sequence not allowed here?

If the implementation defines constants as NO PAD and the implementation
defined pad character is something other than space, then they could
compare unequal.

I would find that implementation disturbing.  But I am easily bent out
of shape.

The attached HTML file in my earlier post is the official quote from the
SQL 99 standard.  That is the formal and correct definition, far
superior to my off the cuff approximations.

'k, if I'm reading the right section (you say its bolded, but I'm using 
pine which doesn't seem to do a good job of reading HTML):

===
d) Depending on the collating sequence, two strings may compare as
equal even if they are of different lengths or contain different
sequences of characters. When any of the operations MAX, MIN, and
DISTINCT reference a grouping column, and the UNION, EXCEPT, and
INTERSECT operators refer to character strings, the specific value
selected by these operations from a set of such equal values is
implementation-dependent.
===

I think the key part of that 'clause' is "two strings *may* compare as 
equal" ... sounds implementation dependent to me, depending on how the 
implementor interprets it ... or am I reading the wrong section?

Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] A costing analysis tool

2005-10-19 Thread Kevin Grittner

This would require capture of information beyond what I was thinking
about in terms of schema.  Do you think we need to capture just index
type, or something more?  Do you propose that we capture the pg_*
metadata related to every object referenced in the plan, every object
related to every table in the query, or everything in the entire
database?


>>> Josh Berkus   >>>

[smip]

the only way you're going analyze index costing by type 
of index is if the index name is stored somewhere (db_object) so that it

can be matched to its characteristics.

[snip]



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

[HACKERS] 'a' == 'a ' (Was: RE: [pgsql-advocacy] [GENERAL] Oracle buys Innobase)

2005-10-19 Thread Marc G. Fournier



I'm CC'ng this over to -hackers ... Tom?  Comments?

On Wed, 19 Oct 2005, Dann Corbit wrote:


Yes, clearly that is the wrong result according to the SQL standard.

Here is a SQL*Server query:
select 1 where 'a' = 'a ' AND 'a' = 'a  ' AND 'a ' = 'a '

It returns (correctly): 1


-Original Message-
From: [EMAIL PROTECTED] [mailto:pgsql-general-
[EMAIL PROTECTED] On Behalf Of Marc G. Fournier
Sent: Wednesday, October 19, 2005 11:41 AM
To: [EMAIL PROTECTED]
Cc: pgsql-general@postgresql.org
Subject: Re: [pgsql-advocacy] [GENERAL] Oracle buys Innobase

On Wed, 19 Oct 2005, [EMAIL PROTECTED] wrote:


I was referring to trailing blanks, but did not explicitly say it,
though showed it in the examples.  I am pretty sure that the SQL
standard says that trailing whitespace is insignificant in string
comparison.


Then we are broken too :)

# select 'a ' = 'a  ';
  ?column?
--
  f
(1 row)


Marc G. Fournier   Hub.Org Networking Services
(http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ:
7615664

---(end of

broadcast)---

TIP 4: Have you searched our list archives?

   http://archives.postgresql.org





Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] A costing analysis tool

2005-10-19 Thread Andrew Dunstan




Robert Treat wrote:


On Tuesday 18 October 2005 18:05, Kevin Grittner wrote:
 


Regarding the idea of a site where results could be posted
and loaded into a database which would be available for
public access -- I agree that would be great; however, my
client is not willing to take that on.  If anyone wants to
volunteer, that wuold be fantastic.

   



Josh, does the setup on the foundry homepages give you enough to do this?

 



pgFoundry currently does not provide either a database or web programs 
for groups, for security reasons. All you get is the ability to make 
static pages. The new machine will provide both, we hope.


cheers

andrew

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] A costing analysis tool

2005-10-19 Thread Kevin Grittner

Maybe we could associate a set of defaults to runtime_environment,
and you would associate any overrides with the runtime_options.

Does this address both your concerns?

>>> Josh Berkus   >>>
Kevin,

> When it gets downt to the detail, it may make sense to combine
> or split some of these.  For example, runtime_options should
> probably not have a column for each currently known option,
> but a child table which maps to all non-default option values.

I'm a little cautious about storing only non-defaults; the defaults have

changed from version to version, after all.  If we did that, we'd need
to 
have a "defaults" table in the db as a reference list.

Also, we'll need to store runtime options both on the "machine" level
and 
on the "query" level, in order to allow testing of changing an enable_*
or 
other query cost option at runtime.   Not sure how to capture this, 
though.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] A costing analysis tool

2005-10-19 Thread Robert Treat

On Tuesday 18 October 2005 18:05, Kevin Grittner wrote:
> Regarding the idea of a site where results could be posted
> and loaded into a database which would be available for
> public access -- I agree that would be great; however, my
> client is not willing to take that on.  If anyone wants to
> volunteer, that wuold be fantastic.
>

Josh, does the setup on the foundry homepages give you enough to do this?

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] A costing analysis tool

2005-10-19 Thread Kevin Grittner

I'm not interested in storing less information.  I'm trying to make sure
that all redundant information is justified.  Since I plan to store the
actual query text and the full EXPLAIN ANALYZE output, every
column I pull out and put in another table is redundant data.  The
questions are, why do we have it, is it going to be used heavily
enough in such a way that it should be stored redundantly?  If the
goal is to be able to get at the number of filters, etc., perhaps
another
table which digests this to a form which can be used in summary
queries is what's needed.  If this is a rare type of query which few
people will run, maybe they can go to the EXPLAIN ANALYZE,
using the line number info, and grab it from there.

-Kevin


>>> Josh Berkus   >>>

I'm also really unclear on why you're so focused on storing less 
information rather than more.  In an "investigation" tool like this,
it's 
important to collect as much data as possible because we don't know
what's 
going to be valuable until we analyze it.   You seem to be starting out 
with the idea that you *already* know exactly where the problems are 
located, in which case why develop a tool at all?  Just fix the problem.


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] A costing analysis tool

2005-10-19 Thread Josh Berkus

Kevin,

> When it gets downt to the detail, it may make sense to combine
> or split some of these.  For example, runtime_options should
> probably not have a column for each currently known option,
> but a child table which maps to all non-default option values.

I'm a little cautious about storing only non-defaults; the defaults have 
changed from version to version, after all.  If we did that, we'd need to 
have a "defaults" table in the db as a reference list.

Also, we'll need to store runtime options both on the "machine" level and 
on the "query" level, in order to allow testing of changing an enable_* or 
other query cost option at runtime.   Not sure how to capture this, 
though.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] A costing analysis tool

2005-10-19 Thread Kevin Grittner

Summary of schema I'm considering.  Comments welcome.

When it gets downt to the detail, it may make sense to combine
or split some of these.  For example, runtime_options should
probably not have a column for each currently known option,
but a child table which maps to all non-default option values.

submitter
  Identifies who submitted test results.
 Name, optional info such as email address and organization

runtime_environment
  Identifies the overall test environment.
  OS with distribution & version, CPU number, type speed, RAM,
background load, static configuration, etc.
  This provides context for a series of tests, to see how the numbers
look in a given environment

dataset_characteristics
  Identifies data metrics which may affect costing accuracy
  Table counts, row counts, column counts, disk space used,
level of fragmentation.
  Maybe some of the "standard" tests will share common dataset
characteristics across multiple environments.

cache_state
  Identifies the level of initial caching for a test, and the degree to
which
the referenced data can be cached during execution

runtime_options
  Identifies the runtime choices in effect for a query run.
  The state of EXPLAIN, ANALYZE, enable_xxx, and dynamic configuration
settings, such as random_page_cost.

query
  Identifies a test query, possibly run by many people in many
environments
against various datasets with different cache states and runtime
options

test_result_summary
  Ties a query to details about a run, with a summary of results.
  Run time from the client perspective, rows returned.

test_result_step_detail
  Shows EXPLAIN ANALYZE information (if any) for each step.





---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] 8.04 and RedHat/CentOS init script issue

2005-10-19 Thread Devrim GUNDUZ


Hi,

On Tue, 18 Oct 2005, Tony Caduto wrote:

I installed 8.04 via RPM on Centos 4.2 which is the same as RedHat 4.2 and 
while booting the init script reports that the daemon [FAILED], but after I 
logon it shows the postmaster running and I am able to connect from any 
client remotely.


I made not modifcations to the script and there is nothing out of the 
ordinary in the log.


Hmm. In 8.0.4 RPM init scripts, we were using a 1 second of sleep time 
(see sleep 1 line in the init script). On some cases where the system is 
slow, you are prompted about the startup failure; however this is not the 
real case.


In 8.1 RPMs, the sleep time was increased to 2 seconds; which we believe 
that won't have the problem you've reported:


http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgsqlrpms/patches/8.1/postgresql.init?rev=1.2&content-type=text/x-cvsweb-markup

So please increase this sleep time and give another try.

Regards,
--
Devrim GUNDUZ
Kivi Bilişim Teknolojileri - http://www.kivi.com.tr
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
  http://www.gunduz.org
---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [HACKERS] A costing analysis tool

2005-10-19 Thread Josh Berkus

Kevin,

> If we stored the actual queries and the EXPLAIN ANALYZE results (when
> generated) in the database, what would be the purpose of the node_name,
> db_object, and condition_detail columns?  They don't seem like they
> would be useful for statistical analysis, and it seems like the
> information would be more useful in context.  Are these column really
> needed?

Yes.  For example, the only way you're going analyze index costing by type 
of index is if the index name is stored somewhere (db_object) so that it 
can be matched to its characteristics.  For condition_detail, again we 
could determine that (for example) we have costing problems when filters 
involve more than 2 columns or complex expressions.  

Node_name is as actually duplicative of some of the other columns, so I 
suppose it could be dropped.

> For a given node_type, are there mutiple valid condition_type values?
> If so, I need to modify my python script to capture this.  If not, I
> don't see a need to store it.

I'm not sure.  Even if there aren't now, there could be in the future. I'm 
more focused on supporting cross-node-type conditions.  For example, 
"Filter" conditions can apply to a variety of node types (Index Scan, 
Merge Join, Subquery Scan, Seq Scan, aggregates).   If we were costing 
Filters, we'd want to be able to aggregate their stats regardless of the 
node in which they occurred.

I'm also really unclear on why you're so focused on storing less 
information rather than more.  In an "investigation" tool like this, it's 
important to collect as much data as possible because we don't know what's 
going to be valuable until we analyze it.   You seem to be starting out 
with the idea that you *already* know exactly where the problems are 
located, in which case why develop a tool at all?  Just fix the problem.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] A costing analysis tool

2005-10-19 Thread Kevin Grittner

If we stored the actual queries and the EXPLAIN ANALYZE results (when
generated) in the database, what would be the purpose of the node_name,
db_object, and condition_detail columns?  They don't seem like they
would be useful for statistical analysis, and it seems like the
information would be more useful in context.  Are these column really
needed?

For a given node_type, are there mutiple valid condition_type values? 
If so, I need to modify my python script to capture this.  If not, I
don't see a need to store it.

-Kevin


>>> Josh Berkus   >>>

[snip]

so, for example, the query step:
"->  Seq Scan on detail0009  (cost=0.00..20500.11
rows=26 width=1005) (actual time=453.000..5983.000 rows=53588 loops=1)"
"  Filter: ((txcontenttype ~~* '%html%'::text)
AND ((vchost)::text ~~* '%www.%'::text))"

Could be displayed as:

query_instance  12008
step_id 14701
parent_step 14698
node_name   Seq Scan on detail0009
node_type   Seq Scan
cost_start  0
cost_end20500.11
est_rows26
time_start  453.0
time_end5983.0
actual_rows 53588
loops   1
db_object   detail009
condition_type  Filter
condition_detail((txcontenttype ~~* '%html%'::text) AND
((vchost)::text ~~* 
'%www.%'::text))

By collecting all of this data, you make it possible to perform other
sorts of 
analysis on the cost estimates.   For example, statistical analysis
might 
tell us that 3-or-more-condition filters take significantly longer to
execute 
than single-condition filters, which would be important to know for the
cost 
model.  Limiting it to collecting only 3 of the 13 bits of node data
produced 
by EA would very much limit the usefulness of the tool and the
reliability of 
its statistics.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Optimization system

2005-10-19 Thread Dann Corbit

http://lca2005.linux.org.au/Papers/Neil%20Conway/Inside%20the%20PostgreSQL%20Query%20Optimizer/pg_query_optimizer.pdf

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
> [EMAIL PROTECTED] On Behalf Of Sebastián Ávila
> Sent: Wednesday, October 19, 2005 9:04 AM
> To: pgsql-hackers@postgresql.org
> Subject: [HACKERS] Optimization system
> 
> Hi people,
> 
> Please I need help about this, where I could find information about
> the query optimization system in postgresql?
> Thanks!
> 
> ---(end of broadcast)---
> TIP 5: don't forget to increase your free space map settings

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Optimization system

2005-10-19 Thread Jonah H. Harris

see src/backend/optimizer/README
On 10/19/05, Sebastián Ávila <[EMAIL PROTECTED]> wrote:
Hi people,Please I need help about this, where I could find information aboutthe query optimization system in postgresql?Thanks!---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings-- Respectfully,Jonah H. Harris, Database Internals ArchitectEnterpriseDB Corporation
http://www.enterprisedb.com/

[HACKERS] Optimization system

2005-10-19 Thread Sebastián Ávila

Hi people,

Please I need help about this, where I could find information about
the query optimization system in postgresql?
Thanks!

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Segv in pg_autovacuum in 8.0.x

2005-10-19 Thread Matthew T. O'Connor


daveg wrote:

Apologies if this is old news, but pg_autovacuum in 8.0.x has the bad habit
of SEGVing and exiting when a table gets dropped out from under it. This
creates problems if you rely on pg_autovacuum for the bulk of your vacuuming
as it forgets it's statistics when it is restarted and so will skip some
desireable vacuums.

I have a patch for this that should apply against any 8.0.x if there is any
interest.
  


Please post the patch, it might get committed into 8.0.x.  I personally 
haven't heard any other reports of it dieing when tables are dropped, 
but I don't doubt there are some lingering issues like this.



I looked at the new autovacuum in 8.1 and it appears from casual inspection
not to have the same problem.


The autovacuum in 8.1 is a very different critter.  While its design is 
based on the contrib autovacuum, it is mostly a total rewrite, and 
probably a vast improvement :-)



Matt


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] [GENERAL] Call for translators

2005-10-19 Thread Lyubomir Rusanov





Hi,
I am also interested in helping building Bulgarian translation for
PostgreSQL. 
I think that we will not have enough time for 8.1 but maybe for 8.2.

Regards
Lyubomir Rusanov

Ivan Pavlov wrote:

  Peter Eisentraut wrote:
  
  
As the release of PostgreSQL 8.1 draws near, it is once again time to update 
the translations of the program messages.  To check the status of your 
language, check out this web site:

http://developer.postgresql.org/~petere/nlsstatus/

To participate in the translation effort, please follow the instructions set 
out on that page, or contact me directly if you have further questions.


  
  
Is there any particular reason why there is no translation in Bulgarian?
If it's only the lack of contributors, you can count me as the first one.

Ivan Pavlov

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] 2nd try @NetBSD/2.0 Alpha

2005-10-19 Thread Larry Rosenman

Martijn van Oosterhout wrote:
> On Tue, Oct 18, 2005 at 10:07:26PM -0500, Larry Rosenman wrote:
>>> ...or attach with a debugger like gdb.
>> 
>> d'oh.  I go stupid occasionally :)
>> 
>> If someone wants me to, I can try that.
> 
> Yes, actually. See, its dying in the seg test already with:
> 
>   -- Open intervals
>   SELECT '0..'::seg AS seg;
> ! ERROR:  floating-point exception
> ! DETAIL:  An invalid floating-point operation was signaled. This
>   probably means an out-of-range result or an invalid operation, s
> uch as division by zero. SELECT '0...'::seg AS seg; ! ERROR: 
> floating-point exception ! DETAIL:  An invalid floating-point
> operation was signaled. This probably means an out-of-range result or
> an invalid operation, s uch as division by zero.  
> 
> You need to attach a debugger to find out where that error is
> actually happenening. Just startup the backend, connect to it and
> connect gdb to the newly spawned backend and just run that query by
> hand. Then you should get the backtrace at SIGFPE.   
> 
I don't have the time today (need to do some paying work).  However,
If someone wants, I can pass an ID/PW along so that they 
May find it.  (or it can wait till the weekend).

LER


-- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: ler@lerctr.org
US Mail: 3535 Gaspar Drive, Dallas, TX 75220-3611 US


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Question about Ctrl-C and less

2005-10-19 Thread Martijn van Oosterhout

On Wed, Oct 19, 2005 at 04:24:21AM -0700, Kevin Brown wrote:
> It does?  It looked to me like it was setting the signal handler to
> SIG_IGN before doing the popen(), and resetting it to the internal
> signal handler after doing the pclose().

Oops, you're right. It works because less sets it's own signal handler,
as does more. Sure, the status is inherited but I doubt there's a pager
out there that doesn't set its own signal handlers. Still, best do it
properly.

> If I were to take that approach, I'd do the check immediately after
> writing to the pipe for sure, and possibly after reading from the
> database handle.  I'd have to look at the code to see what's going on
> and thus where the best places to stick the checks are.  But in
> general, you want to put a check anywhere within the loop that is
> immediately after a blocking operation such as a read or write.
> However, that said...

Well, the pager is only run after all the data has been collected from
the server so none of this is an issue while the query is being
processed. Send the cancel request and toss the data. Once we start the
pager just check after each line is printed.

> The semantics for handling SIGINT here should be pretty much the same
> as they are when there's no pager at all (which I know works because I

You can't do a pclose in a signal handler, it's not one of the
"reeentrant safe" functions and could lead to deadlocks. The signal
manpage documents the ones you can use. Just set a flag. Setting the
descriptor to NULL is worse because then we have check before every
output function. fprintf(NULL, ...) will segfault on most
architechtures I wager. 

Also, the signal handler can't free the memory, there's way too many
arrays. A flag allows the output functions to exit sanely.

> If you're hitting SIGQUIT a lot then perhaps you need to assign it to
> a different key.  :-)

I use SIGQUIT to get out of the situation you get currently when you
press Ctrl-C to interrupt less. Somehow all the "q"s go to psql and all
the Ctrl-Ds to less. SIGQUIT just kills them both from where I can
clean the terminal up. So I will press SIGQUIT less once we fix up psql
to not do stupid things on SIGINT. :)

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

pgp3AL1cKVSLD.pgp
Description: PGP signature

Re: [HACKERS] [GENERAL] Call for translators

2005-10-19 Thread Ivan Pavlov

Peter Eisentraut wrote:
> As the release of PostgreSQL 8.1 draws near, it is once again time to update 
> the translations of the program messages.  To check the status of your 
> language, check out this web site:
> 
> http://developer.postgresql.org/~petere/nlsstatus/
> 
> To participate in the translation effort, please follow the instructions set 
> out on that page, or contact me directly if you have further questions.
> 

Is there any particular reason why there is no translation in Bulgarian?
If it's only the lack of contributors, you can count me as the first one.

Ivan Pavlov

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Question about Ctrl-C and less

2005-10-19 Thread Andrew - Supernews

On 2005-10-19, Kevin Brown <[EMAIL PROTECTED]> wrote:
> Making assumptions about what the pager will do upon receipt of SIGINT
> is folly as well.
>
> Setting up SIGINT to be ignored may be the right answer (I don't
> believe it is -- see below), but if so then it needs to be done
> properly.  If it gets ignored prior to the popen(), then the child
> will also end up ignoring it by default, because signal disposition is
> inherited by child processes.  If we ignore SIGINT, it should be after
> the popen(), not before.

I do not believe it is possible to do the signal disposition correctly
and still use popen() to run the pager. (You would need to reimplement
popen using raw syscalls.)

> So I think the right answer here is for psql to handle SIGINT
> internally by doing a pclose() first

The chances that psql can do this safely approach zero. pclose() is not a
signal-safe function, so it can only be called from a signal handler if
you _know_ that the signal did not interrupt any non-signal-safe function.
(Nor can the signal handler longjmp out in such a case, unless the code is
never again going to call any unsafe function.)

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] pg_dump permissions

2005-10-19 Thread Andrew - Supernews

On 2005-10-19, Christopher Kings-Lynne <[EMAIL PROTECTED]> wrote:
> Suppose I have a database with a table in it.  I try to dump that 
> database.  The user I dump as is not the owner of that table nor does it 
> have the SELECT permission on that table.
>
> Why can't I do a schema-only dump of that table?  I find that I need 
> SELECT permission on the table, even though my user can happily query 
> the catalogs.

pg_dump takes an access share lock on everything it touches, to avoid
having the schema change out from under it.

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Question about Ctrl-C and less

2005-10-19 Thread Kevin Brown

Martijn van Oosterhout wrote:
> On Tue, Oct 18, 2005 at 09:32:25PM -0700, Kevin Brown wrote:
> > So I think the right answer here is for psql to handle SIGINT
> > internally by doing a pclose() first (and at this point, it probably
> > should ignore SIGINT altogether), then returning to the main loop
> > (and, of course, cleaning up anything that needs it along the way).
> > If the child hasn't exited then pclose() will block until it has.  The
> > end result should be the semantics you want: if psql is in the middle
> > of sending a bunch of rows of output to the pager, this will interrupt
> > that process.  If the pager remains running then it will hopefully
> > give the user the ability to scroll through whatever results were sent
> > to it.
> 
> That's what I meant by "more comprehensive patch". Basically, the
> longjmp has to go because it leaks memory (and file descriptors) and
> doesn't allow you to control things at the right level. My little patch
> basically set the signal handler *after* the popen so everything works
> as expected.

It does?  It looked to me like it was setting the signal handler to
SIG_IGN before doing the popen(), and resetting it to the internal
signal handler after doing the pclose().

Am I examining the wrong patch?

> My plan is to have the interrupt handler set a flag "control_c_pressed"
> and check it at strategic points. Then memory can be deallocated and
> returned properly. It's a lot more invasive and the corners cases will
> be "interesting".

If I were to take that approach, I'd do the check immediately after
writing to the pipe for sure, and possibly after reading from the
database handle.  I'd have to look at the code to see what's going on
and thus where the best places to stick the checks are.  But in
general, you want to put a check anywhere within the loop that is
immediately after a blocking operation such as a read or write.
However, that said...

The semantics for handling SIGINT here should be pretty much the same
as they are when there's no pager at all (which I know works because I
can interrupt the output anytime and get the psql prompt back).  In
fact, this suggests that the proper course of action would be to store
the pipe's file descriptor someplace accessible to the normal SIGINT
handler, and have the SIGINT handler check its value upon receipt of
SIGINT.  If the value is nonzero then pclose() it and then proceed as
before (and zero the descriptor after any pclose() so that the handler
always does the right thing).  If there's any additional memory that
has to be freed, then do that too, of course.

That'll take care of the descriptor leak and it'll get you the right
semantics as a nice bonus.

> Your point about SIGQUIT is valid. I didn't include it in my patch
> since I wasn't sure what the expected behaviour would be. If I got core
> files everytime I pressed SIGQUIT, I'd have a lot of core files
> scattered around my disk; one of the reasons I disable core files by
> default.

If you're hitting SIGQUIT a lot then perhaps you need to assign it to
a different key.  :-)

Seriously, you shouldn't be using SIGQUIT unless you want a core file.
Otherwise, use SIGINT.  If the application is unresponsive to SIGINT,
my normal course of action is to suspend it via job control signals
(i.e., send SIGTSTP) and then send it a SIGTERM via the kill command.
If that doesn't do it, then it gets a SIGKILL.  And if it doesn't stop
upon receipt of SIGTSTP I kill it externally and complain to its
developers (assuming it isn't hung in some uninterruptible sleep in
the kernel or something).

> OTOH, if I wanted to trap psql, I'd run it under a debugger or attach
> one which would catch SIGQUIT even if the program ignores it.

Suppose it's hung on a really hard to reproduce condition?  Point
being that you don't necessarily know ahead of time that you're going
to want a core file.  Otherwise SIGQUIT wouldn't have the default
semantics it has, and it wouldn't have a terminal key associated with
it.  The guys that designed this stuff knew what they were doing.

-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

[HACKERS] collector/autovacuum after crash (8.1beta3)

2005-10-19 Thread Szima Gábor



Hi All,

At system crash or poweroff the autovacuum statistics will be lost,
because this statistics only stored in RAM and saved/restored at
service shutdown/startup.
I think it should be saved periodically and not to be deleted after
crash.



-Sygma

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] 2nd try @NetBSD/2.0 Alpha

2005-10-19 Thread Martijn van Oosterhout

On Tue, Oct 18, 2005 at 10:07:26PM -0500, Larry Rosenman wrote:
> >...or attach with a debugger like gdb.
> 
> d'oh.  I go stupid occasionally :)
> 
> If someone wants me to, I can try that.

Yes, actually. See, its dying in the seg test already with:

  -- Open intervals
  SELECT '0..'::seg AS seg;
! ERROR:  floating-point exception
! DETAIL:  An invalid floating-point operation was signaled. This
probably means an out-of-range result or an invalid operation, s
uch as division by zero.
  SELECT '0...'::seg AS seg;
! ERROR:  floating-point exception
! DETAIL:  An invalid floating-point operation was signaled. This
probably means an out-of-range result or an invalid operation, s
uch as division by zero.

You need to attach a debugger to find out where that error is actually
happenening. Just startup the backend, connect to it and connect gdb to
the newly spawned backend and just run that query by hand. Then you
should get the backtrace at SIGFPE.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


pgp437RUDPiCT.pgp
Description: PGP signature

Re: [HACKERS] Question about Ctrl-C and less

2005-10-19 Thread Martijn van Oosterhout

On Tue, Oct 18, 2005 at 09:32:25PM -0700, Kevin Brown wrote:
> So I think the right answer here is for psql to handle SIGINT
> internally by doing a pclose() first (and at this point, it probably
> should ignore SIGINT altogether), then returning to the main loop
> (and, of course, cleaning up anything that needs it along the way).
> If the child hasn't exited then pclose() will block until it has.  The
> end result should be the semantics you want: if psql is in the middle
> of sending a bunch of rows of output to the pager, this will interrupt
> that process.  If the pager remains running then it will hopefully
> give the user the ability to scroll through whatever results were sent
> to it.

That's what I meant by "more comprehensive patch". Basically, the
longjmp has to go because it leaks memory (and file descriptors) and
doesn't allow you to control things at the right level. My little patch
basically set the signal handler *after* the popen so everything works
as expected.

My plan is to have the interrupt handler set a flag "control_c_pressed"
and check it at strategic points. Then memory can be deallocated and
returned properly. It's a lot more invasive and the corners cases will
be "interesting".

Your point about SIGQUIT is valid. I didn't include it in my patch
since I wasn't sure what the expected behaviour would be. If I got core
files everytime I pressed SIGQUIT, I'd have a lot of core files
scattered around my disk; one of the reasons I disable core files by
default.

OTOH, if I wanted to trap psql, I'd run it under a debugger or attach
one which would catch SIGQUIT even if the program ignores it.

Anyway, thanks for the response, hopefully we can get this sorted out
in a later release...
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

pgpZo9xP9YYFB.pgp
Description: PGP signature

[HACKERS] Segv in pg_autovacuum in 8.0.x

2005-10-19 Thread daveg


Apologies if this is old news, but pg_autovacuum in 8.0.x has the bad habit
of SEGVing and exiting when a table gets dropped out from under it. This
creates problems if you rely on pg_autovacuum for the bulk of your vacuuming
as it forgets it's statistics when it is restarted and so will skip some
desireable vacuums.

I have a patch for this that should apply against any 8.0.x if there is any
interest.

I looked at the new autovacuum in 8.1 and it appears from casual inspection
not to have the same problem.

-dg

-- 
David Gould  [EMAIL PROTECTED]
If simplicity worked, the world would be overrun with insects.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

[HACKERS] pg_dump permissions

2005-10-19 Thread Christopher Kings-Lynne

Suppose I have a database with a table in it.  I try to dump that 
database.  The user I dump as is not the owner of that table nor does it 
have the SELECT permission on that table.


Why can't I do a schema-only dump of that table?  I find that I need 
SELECT permission on the table, even though my user can happily query 
the catalogs.


PostgreSQL version: 7.4

Chris


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] SIGSEGV in Postgres 8.0.3 (libpq4)

2005-10-19 Thread Richard Huxton


Anand Kumria wrote:

Hi,

I have a set of perl scripts which invoke each other (via system());
eventually I found that they were crashing and ultimately causing Perl
to SIGSEGV.

I am using Debian testing and have recompiled postgres-8.0.3-15 to
include debugging symbols.  This is a dual-CPU dual-stacked IPv4/IPv6
host running Linux 2.4.22.


From what I have been able to determine the problem is in libpq4:


Starting program: /usr/bin/perl ./add_address 265 2001:502:d399:0:0:0:0:44
[Thread debugging using libthread_db enabled]
[New Thread 16384 (LWP 8523)]
(no debugging symbols found)
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 16384 (LWP 8523)]


The first thing the developers will ask for is a reproducible test-case. 
 Could you reduce "add_address" to the minimum?


The next thing they'll ask is whether you built with 
--enable-thread-safety turned on.

http://www.postgresql.org/docs/8.0/static/libpq-threading.html

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 6: explain analyze is your friend

47 matches

Mail list logo