[HACKERS] Why isn't stats_temp_directory automatically created?

2009-04-14 Thread Fujii Masao
Hi,

log_directory is automatically created if not present when starting
the database server, but stats_temp_directory is not. Why?
ISTM that the current behavior is undesirable.

Is it worth making a patch that creates stats_temp_directory
if it's not present?
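
For reference, both settings can be checked from SQL (a quick illustration;
the values shown are just the 8.4 defaults):

    SHOW log_directory;          -- 'pg_log'; created at startup if missing
    SHOW stats_temp_directory;   -- 'pg_stat_tmp'; currently not auto-created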

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] New trigger option of pg_standby

2009-04-14 Thread Simon Riggs
On Mon, 2009-04-13 at 14:52 +0900, Fujii Masao wrote:

 A lookahead (the +1) may have pg_standby get stuck as follows.
 Am I missing something?
 
 1. the trigger file containing "smart" is created.
 2. pg_standby is executed.
 2-1. nextWALfile is restored.
 2-2. the trigger file is deleted because nextWALfile+1 doesn't exist.
 3. the restored nextWALfile is applied.
 4. pg_standby is executed again to restore nextWALfile+1.

This can't happen. (4) will never occur when (2-2) has occurred. A
non-zero error code means "file not available", which will cause recovery
to end, and hence no requests for further WAL files are made.

It does *seem* as if there is a race condition there in that another WAL
file may arrive after we have taken the decision there are no more WAL
files, but it's not a problem. That could happen if we issue the trigger
while the master is still up, which is a mistake - why would we do that?
If we only issue the trigger once we are happy the master is down then
we don't get a problem.

So let's do it the next+1 way, when triggered.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] New trigger option of pg_standby

2009-04-14 Thread Fujii Masao
Hi,

On Tue, Apr 14, 2009 at 6:35 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, 2009-04-13 at 14:52 +0900, Fujii Masao wrote:

 A lookahead (the +1) may have pg_standby get stuck as follows.
 Am I missing something?

 1. the trigger file containing "smart" is created.
 2. pg_standby is executed.
     2-1. nextWALfile is restored.
     2-2. the trigger file is deleted because nextWALfile+1 doesn't exist.
 3. the restored nextWALfile is applied.
 4. pg_standby is executed again to restore nextWALfile+1.

 This can't happen. (4) will never occur when (2-2) has occurred. A
 non-zero error code means "file not available", which will cause recovery
 to end, and hence no requests for further WAL files are made.

When pg_standby exits with a non-zero code, (3) and (4) will never
occur, and the transactions in nextWALfile will be lost. So, in (2-2),
pg_standby has to call exit(0), I think.

On the other hand, if exit(0) is called in (2-2), the above scenario
happens.

 It does *seem* as if there is a race condition there in that another WAL
 file may arrive after we have taken the decision there are no more WAL
 files, but it's not a problem. That could happen if we issue the trigger
 while the master is still up, which is a mistake - why would we do that?
 If we only issue the trigger once we are happy the master is down then
 we don't get a problem.

Yeah, I agree that such a race condition is not a problem. The
trigger file has to be created after all the WAL files have arrived at
the standby server.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Peter Eisentraut
On Saturday 11 April 2009 00:54:25 Tom Lane wrote:
 It gets worse though: I have seldom seen such a badly designed piece of
 syntax as the Unicode string syntax --- see
 http://developer.postgresql.org/pgdocs/postgres/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE

 You scan the string, and then after that they tell you what the escape
 character is!?  Not to mention the obvious ambiguity with & as an
 operator.

 If we let this go into 8.4, our previous rounds with security holes
 caused by careless string parsing will look like a day at the beach.
 No frontend that isn't fully cognizant of the Unicode string syntax is
 going to parse such things correctly --- it's going to be trivial for
 a bad guy to confuse a quoting mechanism as to what's an escape and what
 isn't.

Note that the escape character marks the Unicode escapes; it doesn't affect the 
quote characters that delimit the string.  So offhand I can't see any potential 
for quote confusion/SQL injection type problems.  Please elaborate if you see 
a problem.

If there are problems, we could consider getting rid of the UESCAPE clause.
Without it, the U&'' strings would behave much like the E'' strings.  But I'd
like to understand the problem first.
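
For reference, the syntax under discussion looks like this (a sketch matching
the examples in the 8.4 documentation; it assumes standard_conforming_strings
is on):

    SELECT U&'d\0061t\+000061';              -- 'data', default escape is backslash
    SELECT U&'d!0061t!+000061' UESCAPE '!';  -- same string, explicit escape character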



Re: [HACKERS] New trigger option of pg_standby

2009-04-14 Thread Fujii Masao
Hi,

On Mon, Apr 13, 2009 at 2:52 PM, Fujii Masao masao.fu...@gmail.com wrote:
 But, a lookahead nextWALfile seems to work fine.

 if (triggered)
 {
     if (smartMode && nextWALfile exists)
        exit(0)
    else
    {
        delete trigger file
        exit(1)
    }
 }

Umm... in this algorithm, the trigger file remains after failover
if nextWALfile contains the invalid record that marks the end
of the WAL files.

I'd like to propose another simple idea; pg_standby deletes the
trigger file *whenever* the nextWALfile is a timeline history file.
A timeline history file is restored at the end of recovery, so it's
guaranteed that the trigger file is deleted whether nextWALfile
exists or not.

A timeline history file is also restored at the beginning of
recovery, so a trigger file that was accidentally left behind is
deleted early in warm standby as a side effect of this idea.

How does that sound?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Peter Eisentraut
On Saturday 11 April 2009 21:50:29 Josh Berkus wrote:
 On 4/11/09 11:47 AM, Marko Kreen wrote:
  On 4/11/09, Tom Lane t...@sss.pgh.pa.us wrote:
It gets worse though: I have seldom seen such a badly designed piece
  of syntax as the Unicode string syntax --- see
   
  http://developer.postgresql.org/pgdocs/postgres/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE

 WTF?  Whose feature is this?  What's the use case?

The use case is approximately the same as &#nnnn; references in HTML: entering
Unicode characters that your screen or keyboard cannot easily produce.  It's a
desperately needed feature for me.




Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Marko Kreen
On 4/14/09, Peter Eisentraut pete...@gmx.net wrote:
 On Saturday 11 April 2009 00:54:25 Tom Lane wrote:
   It gets worse though: I have seldom seen such a badly designed piece of
   syntax as the Unicode string syntax --- see
   http://developer.postgresql.org/pgdocs/postgres/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE
  
   You scan the string, and then after that they tell you what the escape
   character is!?  Not to mention the obvious ambiguity with & as an
   operator.
  
   If we let this go into 8.4, our previous rounds with security holes
   caused by careless string parsing will look like a day at the beach.
   No frontend that isn't fully cognizant of the Unicode string syntax is
   going to parse such things correctly --- it's going to be trivial for
   a bad guy to confuse a quoting mechanism as to what's an escape and what
   isn't.


  Note that the escape character marks the Unicode escapes; it doesn't affect
  the quote characters that delimit the string.  So offhand I can't see any
  potential for quote confusion/SQL injection type problems.  Please elaborate
  if you see a problem.

  If there are problems, we could consider getting rid of the UESCAPE clause.
  Without it, the U&'' strings would behave much like the E'' strings.  But I'd
  like to understand the problem first.

I think the problem is that they should not act like E'' strings, but they
should act like plain '' strings - they should follow the stdstr setting.

That way existing tools that may (or may not..) understand E'' and the stdstr
setting, but definitely have not heard about U&'' strings, can still
parse the SQL without new surprises.
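
For reference, the stdstr difference in question (a minimal illustration;
plain '' literals change meaning with the setting, E'' literals do not):

    SET standard_conforming_strings = off;
    SELECT length('a\nb');   -- 3: the backslash acts as an escape
    SET standard_conforming_strings = on;
    SELECT length('a\nb');   -- 4: the backslash is an ordinary character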

If they already act that way then keeping U& should be fine.

And if UESCAPE does not affect the main string parsing, but is handled in
a second pass going over the parsed string - like bytea \ - then that should
also be fine and should not cause any new surprises.

But if not, it must go.

I would prefer that such quoting extensions wait until the
stdstr=on setting is the only mode Postgres will operate in.
Fitting new quoting ways into an environment with a flippable stdstr setting
will be rather painful for everyone.

I still stand on my proposal, how about extending E'' strings with
unicode escapes (eg. \u)?  The E'' strings are already more
clearly defined than '' and they are our own, we don't need to
consider random standards, but can consider our sanity.

-- 
marko



[HACKERS] libpq is not thread safe

2009-04-14 Thread Zdenek Kotala
When PostgreSQL is compiled with --enable-thread-safety, libpq should be
thread safe. But that is not true when somebody calls fork(). The problem
is that fork() duplicates only the calling thread, and a mutex can stay
locked by another thread. We use the ssl_config mutex, which is global.

We need to implement atfork handlers to fix this. See
http://www.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html

We should add pthread_atfork handlers in libpq's initialization code.

Another problem with fork() is that the new process inherits connections and
so on. That is not good either, but it also happens in single-threaded
applications, and the developer can fix it in his own code. Maybe a note in
the documentation about what an application should do after fork() would help.

Comments?

Zdenek




[HACKERS] Windowing functions vs aggregates

2009-04-14 Thread Teodor Sigaev

A cast of the aggregate's result type works:
# select avg(s)::int4 from foo;

but that doesn't work with the new windowing functions interface:
# select avg(s)::int4 OVER () from foo;
ERROR:  syntax error at or near "OVER"
LINE 1: select avg(s)::int4 OVER () from foo;

Is that intentional?

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: [HACKERS] Windowing functions vs aggregates

2009-04-14 Thread Greg Stark
2009/4/14 Teodor Sigaev teo...@sigaev.ru:
 select avg(s)::int4 OVER () from foo;

You can put the cast outside the window expression such as:

postgres=# select s,(avg(s) OVER (range between unbounded preceding
and current row))::int4 from foo;
 s | avg 
---+-----
 1 |   2
 2 |   2
 3 |   2
(3 rows)


However, I'm kind of confused by that result. Why does the "range
between unbounded preceding and current row" seem to be doing the
average of the whole result set? This is not related to the cast:

postgres=# select s,avg(s) OVER (range between unbounded preceding and
current row) from foo;
 s |        avg         
---+--------------------
 1 | 2.0000000000000000
 2 | 2.0000000000000000
 3 | 2.0000000000000000
(3 rows)


I haven't recompiled recently and I do recall some bug fixes a while
back. Was this that? I'm recompiling now.






-- 
greg



Re: [HACKERS] Why isn't stats_temp_directory automatically created?

2009-04-14 Thread Euler Taveira de Oliveira

Fujii Masao wrote:

Is it worth making the patch which creates stats_temp_directory
if not present?


+1.


--
  Euler Taveira de Oliveira
  http://www.timbira.com/



Re: [HACKERS] Windowing functions vs aggregates

2009-04-14 Thread Tom Lane
Teodor Sigaev teo...@sigaev.ru writes:
 Cast of aggregate's type works:
 # select avg(s)::int4 from foo;

 but that doesn't work with the new windowing functions interface:
 # select avg(s)::int4 OVER () from foo;
 ERROR:  syntax error at or near "OVER"
 LINE 1: select avg(s)::int4 OVER () from foo;

 Is that intentional?

You would need to put the cast around the whole foo() over ()
construct.  That's not a divisible unit.
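
For example (a sketch, assuming an integer column s in table foo), the cast
goes around the whole window expression:

    select (avg(s) OVER ())::int4 from foo;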

regards, tom lane



Re: [HACKERS] Windowing functions vs aggregates

2009-04-14 Thread Tom Lane
Greg Stark st...@enterprisedb.com writes:
 However, I'm kind of confused by that result. Why does the "range
 between unbounded preceding and current row" seem to be doing the
 average of the whole result set?

That's what it's supposed to do. "Current row" really includes all peers
of the current row in the window frame ordering, and since you didn't
specify any ORDER BY clause, all the rows are peers.  If you put in
"order by s" you'll get the result you were expecting:

regression=# select s,(avg(s) OVER (range between unbounded preceding
and current row)) from foo;
 s |        avg         
---+--------------------
 1 | 2.5000000000000000
 2 | 2.5000000000000000
 3 | 2.5000000000000000
 4 | 2.5000000000000000
(4 rows)

regression=# select s,(avg(s) OVER (order by s range between unbounded preceding
regression(# and current row)) from foo;
 s |        avg         
---+--------------------
 1 | 1.0000000000000000
 2 | 1.5000000000000000
 3 | 2.0000000000000000
 4 | 2.5000000000000000
(4 rows)


I suppose the SQL committee defined it like that to try to reduce the
implementation dependency of the results.

regards, tom lane



Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Marko Kreen
On 4/14/09, Peter Eisentraut pete...@gmx.net wrote:
 On Tuesday 14 April 2009 14:38:38 Marko Kreen wrote:
   I think the problem is that they should not act like E'' strings, but they
   should act like plain '' strings - they should follow the stdstr setting.

   That way existing tools that may (or may not..) understand E'' and the stdstr
   setting, but definitely have not heard about U&'' strings can still
   parse the SQL without new surprises.


 Can you be more specific in what surprises you expect?  What algorithms do
  you suppose those existing tools use and what expectations do they have?

If the parsing does not happen in 2 passes and it does not take account
of the stdstr setting, then the default breakage would be:

   stdstr=off, U&' \' UESCAPE '!'.

And anything whose security or functionality depends on parsing SQL
can be broken that way.

Broken functionality would be e.g. Slony (or another replication solution)
distributing developer-written SQL code to a bunch of nodes.  It needs to
parse a text file into SQL statements and execute them separately.

There are probably other solutions that expect to understand SQL at
least at the token level to function correctly.  (pgpool, Java probably
has something depending on it, etc.)

   I still stand on my proposal, how about extending E'' strings with
   unicode escapes (eg. \u)?  The E'' strings are already more
   clearly defined than '' and they are our own, we don't need to
   consider random standards, but can consider our sanity.


 This doesn't excite me.  I think the tendency should be to get rid of E''
  usage, because its definition of escape sequences is single-byte and ASCII
  centric and thus overall a legacy construct.

Why are you concentrating only on the \0xx escapes?  The \\, \n, etc.
seem standard and forward-looking enough.  Yes, unicode escapes are
missing, but we can add them without breaking anything.

  Certainly, we will want to keep
  around E'' for a long time or forever, but it is a legitimate goal for
  application writers to not use it, which is after all the reason behind this
  whole standards-conforming strings project.  I wouldn't want to have a
  forward-looking feature such as the Unicode escapes be burdened with that 
 kind
  of legacy behavior.

  Also note that Unicode escapes are also available for identifiers, for which
  there is no existing E that you can add it to.

Well, I was not rejecting the standard quoting, but suggesting
postponing it until the stdstr mess is sorted out.  We can use \uXX
in the meantime, and I think most Postgres users would prefer to keep
using it...

-- 
marko



Re: [HACKERS] proposal: add columns created and altered to pg_proc and pg_class

2009-04-14 Thread Kevin Grittner
Pavel Stehule pavel.steh...@gmail.com wrote: 
 I thought about it too. But I am not sure if this isn't too
 complicated a solution for a simple task. If I think about it a little
 more, the most important thing is the timestamp of the last change.
 
Yeah, if it would be too heavy to add a timestamp column or two to
pg_class and maybe one or two others, why is it better to add a whole
new table to maintain in parallel -- with its own primary key,
foreign keys (or similar integrity enforcement mechanism), etc. 
Others apparently see a bigger advantage to this than I, but if it's
not something I can just eyeball while I'm looking at the object
definition, it isn't likely to save me much over going to other
sources.
 
Let's not over-engineer this.
 
-Kevin



Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On Saturday 11 April 2009 00:54:25 Tom Lane wrote:
 If we let this go into 8.4, our previous rounds with security holes
 caused by careless string parsing will look like a day at the beach.

 Note that the escape character marks the Unicode escapes; it doesn't
 affect the quote characters that delimit the string.  So offhand I
 can't see any potential for quote confusion/SQL injection type
 problems.  Please elaborate if you see a problem.

The problem is the interaction with non-standard-conforming strings.

Case 1:

select u&'foo\' uescape ',' ...

The backend will see the backslash as just a data character, and
will think that "..." is live SQL text.  A non-Unicode-literal-aware
frontend will think that the backslash escapes the second quote, the
comma is live SQL text, and the "..." is quoted material.  Construction
of an actual SQL injection attack is left as an exercise for the reader,
but certainly the raw material is here.

Case 2:

select u&'foo' uescape '\' ...

Again, any existing frontend code will think that the backslash quotes
the final quote and the "..." is quoted material.  This one is
particularly nasty because we allow arbitrary amounts of whitespace and
commenting on either side of uescape:


select u&'foo' 
/* hello joe, do you /* understand nested comments today?
*/
-- yes, this one too */
uescape
-- but not this one /*
'\' ...

I suspect that it's actually impossible to parse such a thing correctly
without a full-fledged flex lexer or something of equivalent complexity.
Certainly it's a couple of orders of magnitude harder than it is for
either standard-conforming or E'' literals.

Case 3:

select u&'foo\' uescape ',' ...

select u& 'foo\' uescape ',' ...

In the first form the "..." is live SQL, in the second form it is quoted
material.  This means that you might correctly validate a query and then
have your results invalidated by later processing that innocently adds
or removes whitespace.  (This is particularly nasty in a standard that
demands we parse x/-1 and x / -1 the same ...)

So what we've got here is a whole new set of potential SQL injection
attacks by confusing frontend literal-syntax checking, plus a truly
staggering increase in the minimum *required* complexity of such
checking.  I understand the usefulness of being able to write Unicode
code points, but they're not useful enough to justify this syntax.

This thread has already mentioned a couple of ways we could add the
same facility without making any lexical-level changes, at least for
data values.  I admit that the SQL:2008 way also covers Unicode code
points in identifiers, which we can't emulate without a lexical change;
but frankly I think the use-case for that is so thin as to be almost
nonexistent.  Who is going to choose identifiers that they cannot easily
type on their keyboards?

regards, tom lane



Re: [HACKERS] proposal: add columns created and altered to pg_proc and pg_class

2009-04-14 Thread Robert Haas
On Tue, Apr 14, 2009 at 10:27 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Pavel Stehule pavel.steh...@gmail.com wrote:
 I thought about it too. But I am not sure if this isn't too
 complicated a solution for a simple task. If I think about it a little
 more, the most important thing is the timestamp of the last change.

 Yeah, if it would be too heavy to add a timestamp column or two to
 pg_class and maybe one or two others, why is it better to add a whole
 new table to maintain in parallel -- with its own primary key,
 foreign keys (or similar integrity enforcement mechanism), etc.

Making pg_class and pg_proc tables larger hurts run-time performance,
potentially.  Making a separate table only slows down DDL operations,
which are much less frequent.

...Robert



Re: [HACKERS] Regression failure on RHEL 4 w/ PostgreSQL 8.4 beta1

2009-04-14 Thread Devrim GÜNDÜZ
On Mon, 2009-04-13 at 15:31 -0400, Tom Lane wrote:
  I'm getting the following failure on RHEL 4:
 
  http://www.gunduz.org/temp/regression.out
  http://www.gunduz.org/temp/regression.diffs
 
 This test is checking whether you have working 64-bit-tzdata support.
 It seems you don't.
 
 If you built with --with-system-tzdata, and RHEL4 doesn't include
 64-bit tzdata files, then this failure would be expected.  (I'm not
 totally sure about the second premise, but some quick digging in
 the specfile's changelog suggests that Red Hat only started to
 support 64-bit tzdata in RHEL5.)

So what changed between 8.3 and 8.4? Same box can build 8.3 with
--with-system-tzdata . 

Regards,
-- 
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
   http://www.gunduz.org




Re: [HACKERS] Regression failure on RHEL 4 w/ PostgreSQL 8.4 beta1

2009-04-14 Thread Tom Lane
Devrim GÜNDÜZ dev...@gunduz.org writes:
 On Mon, 2009-04-13 at 15:31 -0400, Tom Lane wrote:
 This test is checking whether you have working 64-bit-tzdata support.
 It seems you don't.

 So what changed between 8.3 and 8.4? Same box can build 8.3 with
 --with-system-tzdata . 

We didn't have 64-bit tzdata support before --- it's a new test covering
new functionality.

regards, tom lane



Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Andrew Dunstan



Marko Kreen wrote:

I still stand on my proposal, how about extending E'' strings with
unicode escapes (eg. \u)?  The E'' strings are already more
clearly defined than '' and they are our own, we don't need to
consider random standards, but can consider our sanity.

  


I suspect there would be lots more support in the user community, where 
\u is well understood in a number of contexts (Java and ECMAScript, 
for example). It's also tolerably sane.


cheers

andrew



Re: [HACKERS] Unicode support

2009-04-14 Thread Peter Eisentraut
On Monday 13 April 2009 22:39:58 Andrew Dunstan wrote:
 Umm, but isn't that because your encoding is using one code point?

 See the OP's explanation w.r.t. canonical equivalence.

 This isn't about the number of bytes, but about whether or not we should
 count characters encoded as two or more combined code points as a single
 char or not.

Here is a test case that shows the problem (if your terminal can display 
combining characters (xterm appears to work)):

SELECT U&'\00E9', char_length(U&'\00E9');
 ?column? | char_length 
----------+-------------
 é        |           1
(1 row)

SELECT U&'\0065\0301', char_length(U&'\0065\0301');
 ?column? | char_length 
----------+-------------
 é        |           2
(1 row)




Re: [HACKERS] Unicode support

2009-04-14 Thread Peter Eisentraut
On Tuesday 14 April 2009 07:07:27 Andrew Gierth wrote:
 FWIW, the SQL spec puts the onus of normalization squarely on the
 application; the database is allowed to assume that Unicode strings
 are already normalized, is allowed to behave in implementation-defined
 ways when presented with strings that aren't normalized, and provision
 of normalization functions and predicates is just another optional
 feature.

Can you name chapter and verse on that?

I see this, for example,

6.27 numeric value function

5) If a char length expression is specified, then
Case:
a) If the character encoding form of character value expression is not UTF8, 
UTF16, or UTF32, then let S be the string value expression.
Case:
i)
If the most specific type of S is character string, then the result is the 
number of characters in the value of S.
NOTE 134 — The number of characters in a character string is determined 
according to the semantics of the character set of that character string.
ii)
Otherwise, the result is OCTET_LENGTH(S).
b) Otherwise, the result is the number of explicit or implicit char length 
units in char length expression, counted in accordance with the definition 
of those units in the relevant normatively referenced document.

So SQL redirects the question of character length to the Unicode standard.  I
have not been able to find anything there on a quick look, but I'm sure the
Unicode standard has some very specific ideas on this.  Note that the matter 
of normalization is not mentioned here.



Re: [HACKERS] Unicode support

2009-04-14 Thread Greg Stark
On Tue, Apr 14, 2009 at 1:32 PM, Peter Eisentraut pete...@gmx.net wrote:
 On Monday 13 April 2009 22:39:58 Andrew Dunstan wrote:
 Umm, but isn't that because your encoding is using one code point?

 See the OP's explanation w.r.t. canonical equivalence.

 This isn't about the number of bytes, but about whether or not we should
 count characters encoded as two or more combined code points as a single
 char or not.

 Here is a test case that shows the problem (if your terminal can display
 combining characters (xterm appears to work)):

 SELECT U&'\00E9', char_length(U&'\00E9');
  ?column? | char_length
 ----------+-------------
  é        |           1
 (1 row)

 SELECT U&'\0065\0301', char_length(U&'\0065\0301');
  ?column? | char_length
 ----------+-------------
  é        |           2
 (1 row)

What's really at issue is "what is a string?".  That is, is it a sequence
of characters or a sequence of code points?  If it's the former then we
would also have to prohibit certain strings such as U&'\0301'
entirely.  And we have to make substr() pick out the right number of
code points, etc.
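
As an illustration of the current code-point-based behavior (using the
e-plus-combining-acute literal from upthread):

    SELECT substr(U&'\0065\0301x', 1, 1);   -- returns just 'e'; the accent is split off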



-- 
greg



Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Tom Lane
Marko Kreen mark...@gmail.com writes:
 I would prefer that such quoting extensions would wait until
 stdstr=on setting is the only mode Postgres will operate.
 Fitting new quoting ways to environment with flippable stdstr setting
 will be rather painful for everyone.

It would certainly be a lot safer to wait until non-standard-conforming
strings don't exist anymore.  The problem is that that may never happen,
and is certainly not on the roadmap to happen in the foreseeable future.

 I still stand on my proposal, how about extending E'' strings with
 unicode escapes (eg. \u)?  The E'' strings are already more
 clearly defined than '' and they are our own, we don't need to
 consider random standards, but can consider our sanity.

That's one way we could proceed.  The other proposal that seemed
attractive to me was a decode-like function:

uescape('foo\00e9bar')
uescape('foo\00e9bar', '\')

(double all the backslashes if you assume not
standard_conforming_strings).  The arguments in favor of this one
are (1) you can apply it to the result of an expression, it's not
strictly tied to literals; and (2) it's a lot lower-footprint solution
since it doesn't affect basic literal handling.  If you wish to suppose
that this is only a stopgap until someday when we can implement the SQL
standard syntax more safely, then low footprint is good.  One could
even imagine back-porting this into existing releases as a user-defined
function.
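
Just to illustrate the shape of that idea, a minimal sketch of such a
user-defined function in PL/pgSQL (hypothetical, not a worked-out proposal;
it handles only the 4-hex-digit form and assumes a UTF8 database):

CREATE OR REPLACE FUNCTION uescape(src text, esc text) RETURNS text
LANGUAGE plpgsql IMMUTABLE STRICT AS $$
DECLARE
    result text := '';
    i      int  := 1;
    c      text;
BEGIN
    WHILE i <= length(src) LOOP
        c := substr(src, i, 1);
        IF c <> esc THEN
            result := result || c;
            i := i + 1;
        ELSIF substr(src, i + 1, 1) = esc THEN
            -- a doubled escape character stands for itself
            result := result || esc;
            i := i + 2;
        ELSE
            -- escape character followed by 4 hex digits: append that code point
            result := result || chr(('x' || substr(src, i + 1, 4))::bit(16)::int);
            i := i + 5;
        END IF;
    END LOOP;
    RETURN result;
END;
$$;

-- e.g. uescape(E'foo\\00e9bar', E'\\') would yield 'fooébar'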

The solution with \u in extended literals is probably workable too.
I'm slightly worried about the possibility of issues with code that
thinks it knows what an E-literal means but doesn't really.  In
particular something might think it knows that \u just means u,
and proceed to strip the backslash.  I don't see a path for that to
become a security hole though, only a garden-variety bug.  So I could
live with that one on the grounds of being easier to use (which it
would be, because of less typing compared to uescape()).

regards, tom lane



Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Peter Eisentraut
On Tuesday 14 April 2009 14:38:38 Marko Kreen wrote:
 I think the problem is that they should not act like E'' strings, but they
 should act like plain '' strings - they should follow the stdstr setting.

 That way existing tools that may (or may not..) understand E'' and the stdstr
 setting, but definitely have not heard about U&'' strings can still
 parse the SQL without new surprises.

Can you be more specific in what surprises you expect?  What algorithms do 
you suppose those existing tools use and what expectations do they have?

 I still stand on my proposal, how about extending E'' strings with
 unicode escapes (eg. \u)?  The E'' strings are already more
 clearly defined than '' and they are our own, we don't need to
 consider random standards, but can consider our sanity.

This doesn't excite me.  I think the tendency should be to get rid of E'' 
usage, because its definition of escape sequences is single-byte and ASCII 
centric and thus overall a legacy construct.  Certainly, we will want to keep 
around E'' for a long time or forever, but it is a legitimate goal for 
application writers to not use it, which is after all the reason behind this 
whole standards-conforming strings project.  I wouldn't want to have a 
forward-looking feature such as the Unicode escapes be burdened with that kind 
of legacy behavior.

Also note that Unicode escapes are also available for identifiers, for which 
there is no existing E that you can add it to.



Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Peter Eisentraut
On Saturday 11 April 2009 18:20:47 Sam Mason wrote:
 I can't see much support in the other database engines; searched for
 Oracle, MS-SQL, DB2 and Firebird.  MySQL has it planned for 7.1, so not
 for a while.

DB2 supports it, as far as I know.



Re: [HACKERS] proposal: add columns created and altered to pg_proc and pg_class

2009-04-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Apr 14, 2009 at 10:27 AM, Kevin Grittner
 Yeah, if it would be too heavy to add a timestamp column or two to
 pg_class and maybe one or two others, why is it better to add a whole
  new table to maintain in parallel -- with its own primary key,
 foreign keys (or similar integrity enforcement mechanism), etc.

 Making pg_class and pg_proc tables larger hurts run-time performance,
 potentially.  Making a separate table only slows down DDL operations,
 which are much less frequent.

And even more to the point, adding columns to the core system tables
means you pay the performance cost *even when not using the feature*.
We normally expect that inessential features should avoid making a
performance impact on those who have no use for them.

regards, tom lane



Re: [HACKERS] psql with Function Type in \df

2009-04-14 Thread Tom Lane
David Fetter da...@fetter.org writes:
 On Mon, Apr 13, 2009 at 07:24:31PM -0400, Tom Lane wrote:
 I'd go for something like
 
 Type
 
 window
 agg
 trigger
 normal
 
 Or we could spell out aggregate, but that makes the column a
 couple of characters wider ...

 Done.

I had a second thought about that: presumably we should make the
function type names translatable.  If we do that, it might be better
to make the aggregate case be "aggregate" and take the width hit.
Otherwise translators are going to be puzzled when they come across
"agg" as a translatable phrase.

Or maybe I'm overthinking that problem.  Comments from anyone who
actually does translations?

 I've also added \df[antw], which lets people narrow their search.

Why didn't you make it work like \d[tisv], ie allow more than one
letter?  If you're going to be inventing new features at this late
date, they should at least work like the adjacent precedent.

regards, tom lane



Re: [HACKERS] Unicode support

2009-04-14 Thread Tom Lane
Greg Stark st...@enterprisedb.com writes:
  What's really at issue is "what is a string?".  That is, is it a sequence
  of characters or a sequence of code points?  If it's the former then we
  would also have to prohibit certain strings such as U&'\0301'
  entirely.  And we have to make substr() pick out the right number of
  code points, etc.

Another question is "what is the purpose of a database?"  To me it would
be quite the wrong thing for the DB to not store what is presented, as
long as it's considered legal.  Normalization of legal variant forms
seems pretty questionable.  So I'm with the camp that says this is the
application's responsibility.

regards, tom lane



Re: [HACKERS] psql with Function Type in \df

2009-04-14 Thread David Fetter
On Tue, Apr 14, 2009 at 12:35:21PM -0400, Tom Lane wrote:
 David Fetter da...@fetter.org writes:
  On Mon, Apr 13, 2009 at 07:24:31PM -0400, Tom Lane wrote:
  I'd go for something like
  
  Type
  
  window
  agg
  trigger
  normal
  
  Or we could spell out aggregate, but that makes the column a
  couple of characters wider ...
 
  Done.
 
 I had a second thought about that: presumably we should make the
 function type names translatable.  If we do that, it might be better
 to make the aggregate case be aggregate and take the width hit.
 Otherwise translators are going to be puzzled when they come across
 agg as a translatable phrase.

OK

 Or maybe I'm overthinking that problem.  Comments from anyone who
 actually does translations?
 
  I've also added \df[antw], which lets people narrow their search.
 
 Why didn't you make it work like \d[tisv], ie allow more than one
 letter?  If you're going to be inventing new features at this late
 date, they should at least work like the adjacent precedent.

Because I didn't think of it.  Lemme see about that :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Unicode support

2009-04-14 Thread David E. Wheeler

On Apr 14, 2009, at 9:26 AM, Tom Lane wrote:

Another question is "what is the purpose of a database?"  To me it would
be quite the wrong thing for the DB to not store what is presented, as
long as it's considered legal.  Normalization of legal variant forms
seems pretty questionable.  So I'm with the camp that says this is the
application's responsibility.


Can `convert()` normalize strings?

Best,

David



Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Robert Haas
On Tue, Apr 14, 2009 at 8:53 AM, Peter Eisentraut pete...@gmx.net wrote:
 This doesn't excite me.  I think the tendency should be to get rid of E''
 usage, because its definition of escape sequences is single-byte and ASCII
 centric and thus overall a legacy construct.  Certainly, we will want to keep
 around E'' for a long time or forever, but it is a legitimate goal for
 application writers to not use it, which is after all the reason behind this
 whole standards-conforming strings project.  I wouldn't want to have a
 forward-looking feature such as the Unicode escapes be burdened with that kind
 of legacy behavior.

 Also note that Unicode escapes are also available for identifiers, for which
 there is no existing E that you can add it to.

Maybe I've just got my head deeply in the sand, but I don't understand
what the alternative to E'' supposedly is.  How am I supposed to write
the equivalent of E'\t\n\f' without using E''?  The
standard_conforming_strings syntax apparently supports no escapes of
any kind, which seems so hideously inconvenient that I can't even
imagine why someone wants that behavior.

...Robert



Re: [HACKERS] Unicode support

2009-04-14 Thread Andrew Dunstan



David E. Wheeler wrote:

On Apr 14, 2009, at 9:26 AM, Tom Lane wrote:


Another question is what is the purpose of a database?  To me it would
be quite the wrong thing for the DB to not store what is presented, as
long as it's considered legal.  Normalization of legal variant forms
seems pretty questionable.  So I'm with the camp that says this is the
application's responsibility.


Can `convert()` normalize strings?




I think that's handling a quite different problem.

It certainly should not do so automatically, IMNSHO.

I think there's a good case for some functions implementing the various 
Unicode normalization functions, though.


cheers

andrew



Re: [HACKERS] Unicode support

2009-04-14 Thread Andrew Dunstan



Kevin Grittner wrote:
 
I'm curious -- can every multi-code-point character be normalized to a

single-code-point character?
 
  


I don't believe so. Those combinations used in the most common 
orthographic languages have their own code points, but I understand you 
can use the combining chars with essentially any other chars, although 
it might not always make much sense to do so. That's important when 
you're inventing symbols in things like Mathematical and Scientific papers.


cheers

andrew



Re: [HACKERS] Regression failure on RHEL 4 w/ PostgreSQL 8.4 beta1

2009-04-14 Thread Devrim GÜNDÜZ
On Tue, 2009-04-14 at 11:16 -0400, Tom Lane wrote:
  So what changed between 8.3 and 8.4? Same box can build 8.3 with
  --with-system-tzdata . 
 
 We didn't have 64-bit tzdata support before --- it's a new test
 covering new functionality.

Thanks Tom. 

Regards,
-- 
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
   http://www.gunduz.org




Re: [HACKERS] proposal: add columns created and altered to pg_proc and pg_class

2009-04-14 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote: 
 Making pg_class and pg_proc tables larger hurts run-time
performance,
 potentially.  Making a separate table only slows down DDL
operations,
 which are much less frequent.
 
Copying the pg_class table, with oids and indexes, with and without
the addition of one timestamp column, the timestamp column caused the
copy to be about 11.3% larger; so I see your point.
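
The kind of comparison described can be sketched like this (a rough sketch;
the copy-table names are made up, and the oids and indexes mentioned above
are left out for brevity):

    CREATE TABLE pg_class_copy    AS SELECT * FROM pg_class;
    CREATE TABLE pg_class_copy_ts AS SELECT c.*, now() AS last_altered FROM pg_class c;
    SELECT pg_total_relation_size('pg_class_copy')    AS without_ts,
           pg_total_relation_size('pg_class_copy_ts') AS with_ts;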
 
I guess I didn't realize just how tight the pg_class table was.
 
Given all that, I'm going to say that from my perspective I don't
think the convenience of saving the information is worth the cost,
with either approach.  I understand it might mean more to others.
 
-Kevin



Re: [HACKERS] Unicode support

2009-04-14 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 I think there's a good case for some functions implementing the various 
 Unicode normalization functions, though.

I have no objection to that so long as the code footprint is in line
with the utility gain (i.e. not all that much).  If we have to bring in
ICU or something similar to make it happen, the cost/benefit ratio looks
pretty bad.

regards, tom lane



Re: [HACKERS] proposal: add columns created and altered topg_proc and pg_class

2009-04-14 Thread Kevin Grittner
Kevin Grittner kevin.gritt...@wicourts.gov wrote: 
 the timestamp column caused the copy to be about 11.3% larger
 
Grabbed the wrong numbers.  It's really 2.5%, but still
 
-Kevin



Re: [HACKERS] Unicode support

2009-04-14 Thread Kevin Grittner
Greg Stark st...@enterprisedb.com wrote: 
 Peter Eisentraut pete...@gmx.net wrote:
  SELECT U&'\00E9', char_length(U&'\00E9');
   ?column? | char_length
  ----------+-------------
   é        |           1
  (1 row)

  SELECT U&'\0065\0301', char_length(U&'\0065\0301');
   ?column? | char_length
  ----------+-------------
   é        |           2
  (1 row)
 
 What's really at issue is what is a string?. That is, it a
 sequence of characters or a sequence of code points.
 
Doesn't the SQL standard refer to them as character string literals?
 
The function is called character_length or char_length.
 
I'm curious -- can every multi-code-point character be normalized to a
single-code-point character?
 
-Kevin



Re: [HACKERS] Unicode support

2009-04-14 Thread Peter Eisentraut
On Tuesday 14 April 2009 18:49:45 Greg Stark wrote:
 What's really at issue is what is a string?. That is, it a sequence
 of characters or a sequence of code points.

I think a sequence of codepoints would be about as silly a definition as the 
antiquated notion of a string as a sequence of bytes.

 If it's the former then we
 would also have to prohibit certain strings such as U'\0301'
 entirely. And we have to make substr() pick out the right number of
 code points, etc.

Sure enough.  That all goes along with what the original poster was saying.



Re: [HACKERS] Unicode support

2009-04-14 Thread Peter Eisentraut
On Monday 13 April 2009 20:18:31 - - wrote:
 2) PG has no support for the Unicode collation algorithm. Collation is
 offloaded to the OS, which makes this quite inflexible.

This argument is unclear.  Do you want the Unicode collation algorithm or do 
you want flexibility?  Some OS do implement the Unicode collation algorithm.



Re: [HACKERS] Unicode support

2009-04-14 Thread Peter Eisentraut
On Tuesday 14 April 2009 19:26:41 Tom Lane wrote:
 Another question is what is the purpose of a database?  To me it would
 be quite the wrong thing for the DB to not store what is presented, as
 long as it's considered legal.  Normalization of legal variant forms
 seems pretty questionable.  So I'm with the camp that says this is the
 application's responsibility.

I think automatically normalizing or otherwise fiddling with Unicode strings 
with combining characters is not acceptable.  But the point is that we should 
process equivalent forms in a consistent way.



Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Maybe I've just got my head deeply in the sand, but I don't understand
 what the alternative to E'' supposedly is.  How am I supposed to write
 the equivalent of E'\t\n\f' without using E''?  The
 standard_conforming_strings syntax apparently supports no escapes of
 any kind, which seems so hideously inconvenient that I can't even
 imagine why someone wants that behavior.

Well, quite aside from issues of compatibility with standards and other
databases, I'm sure there are lots of Windows users who are more
interested in being able to store a Windows pathname without doubling
their backslashes than they are in being able to type readable names
for ASCII control characters.  After all, in most cases you can get
those characters into a string just by typing them (especially if you
aren't using readline or something like it).
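
A small illustration of that point (sketch):

    SET standard_conforming_strings = on;
    SELECT 'C:\Program Files\PostgreSQL';    -- backslashes are plain data
    SET standard_conforming_strings = off;
    SELECT 'C:\\Program Files\\PostgreSQL';  -- here they have to be doubled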

BTW, does anyone know whether Unicode includes the ASCII control
characters ... ie, is \u0009 a name for tab?  If so, maybe this
syntax is in part an attempt to cover that use-case in the standard.

regards, tom lane



Re: [HACKERS] Unicode support

2009-04-14 Thread - -
 I don't believe that the standard forbids the use of combining chars at all.
 RFC 3629 says:

  ... This issue is amenable to solutions based on Unicode Normalization
  Forms, see [UAX15].

 This is the relevant part. Tom was claiming that the UTF8 encoding required
 normalizing the string of unicode codepoints before encoding. I'm not sure
 that's true though, is it?

No. I think Tom has mistaken this for the fact that the UTF8 encoding
can have multiple byte representations for one and the same code
point. The standard requires the
shortest byte representation to be used. (Please see
http://www.dwheeler.com/secure-programs/Secure-Programs-HOWTO/character-encoding.html
for more information). However, this has nothing to do with *code
point* normalization. The encoding does not require a code point
sequence to be normalized. In fact, UTF-8 could hold any of the 4
different normalized forms, 2 of which are completely decomposed
forms, that is, every accent takes up its own code point. Also, UTF-8
could hold non-normalized strings. Encodings just deal with how code
points are represented in memory or over wires.

 Another question is what is the purpose of a database?  To me it would
 be quite the wrong thing for the DB to not store what is presented, as
 long as it's considered legal.  Normalization of legal variant forms
 seems pretty questionable.  So I'm with the camp that says this is the
 application's responsibility.

What I did not mean is automatic normalization. I meant something like
PG providing a function to normalize strings which can be explicitly
called by the user in case it is needed. For example:

SELECT * FROM table1 WHERE normalize(a, 'NFC') = normalize($1, 'NFC');
-- NFC is one of the 4 mentioned normalization forms and the one that
should probably be used, since it combines code points rather than
decomposing them.

I completely agree that the database should never just normalize by
itself, because it might be the user's intention to store
non-normalized strings. An exception might be an explicit
configuration setting which tells PG to normalize automatically. In
case of the above SELECT query, the problem of offloading the
normalization to the app means that every single application that is
ever used with this database has to a) normalize the string, b) use
the same normalization form. If just one application at one point in
time fails to do so, string comparison is no longer safe (which
could be a security problem, as the quoted RFC text says). But with a
callable function like normalize() above, the user himself can choose
whether it is important or not. That is, does he want code points to
match (do not use normalize() then), or does he want characters to
match (use normalize() then). The user can normalize the string
exactly where it is needed (for comparison).

I've searched PG's source code and it appeared to me that the 'text'
type is just a typedef for 'varlena', the same type 'bytea' is based
on. Given that the client and database encoding is the same, does this
mean that text is internally stored in exactly the same binary
representation the client has sent it in? So that if the client has
sent it in any of the 4 normalized forms, PG guarantees to store and
retrieve it (in case of a later SELECT) exactly as it was sent (store
what is presented)? In other words: does PG guarantee that the code point
sequence remains the same? Because if it does not, you cannot
offload the normalization work to the app anyway, since PG would be
allowed to un-normalize it internally.

Also, what happens if the client has a different encoding than the
database, and PG has to internally convert client strings to UTF-8.
Does it only generate code points in the same normalized form that it
expects the user input to be in?



[HACKERS] Warm Standby restore_command documentation (was: New trigger option of pg_standby)

2009-04-14 Thread Andreas Pflug
I've been following the thread with a growing lack of understanding of why
this is being so hotly discussed, and I went back to the documentation of
what the restore_command should do (
http://www.postgresql.org/docs/8.3/static/warm-standby.html )

While the algorithm presented in the pseudocode doesn't deal too well
with a situation where the trigger is set while the restore_command is
sleeping (this should be handled better in a real implementation), the
code says:

Restore all WAL files. If no more WAL files are present, stop restoring
if the trigger is set; otherwise wait for a new WAL file.

Since pg_standby is meant as an implementation of restore_command, it has
to follow the directive stated above; *anything else is a bug*.
pg_standby currently does *not* obey this directive, and has that
documented, but a documented bug is still a bug.

Conclusion: There's no new trigger option needed, instead pg_standby
has to be fixed so it does what the warm standby option of postgres
needs. The trigger is only to be examined if no more files are
restorable, and only once.

Regards,
Andreas



Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Peter Eisentraut
On Tuesday 14 April 2009 20:35:21 Robert Haas wrote:
 Maybe I've just got my head deeply in the sand, but I don't understand
 what the alternative to E'' supposedly is.  How am I supposed to write
 the equivalent of E'\t\n\f' without using E''?

Well, the first alternative is to type those characters in literally.  The 
second alternative is the U&'' syntax. ;-)  The third alternative is to design 
applications that don't need this, because the processing behavior of those 
characters is quite unportable.  But of course in some cases using the E'' 
syntax is the most convenient.
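
Concretely, for the E'\t\n\f' example (a sketch; the U& form assumes
standard_conforming_strings is on):

    SELECT E'a\tb';        -- escape string syntax
    SELECT U&'a\0009b';    -- Unicode escape syntax, \0009 is TAB
    -- or simply type a literal tab between the quotes of a plain '' string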




Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread David E. Wheeler

On Apr 14, 2009, at 11:22 AM, Tom Lane wrote:


BTW, does anyone know whether Unicode includes the ASCII control
characters ... ie, is \u0009 a name for tab?  If so, maybe this
syntax is in part an attempt to cover that use-case in the standard.


Yes, you can use, e.g., &#x0009; in HTML to represent a tab character.

Best,

David



Re: [HACKERS] Unicode support

2009-04-14 Thread David E. Wheeler

On Apr 14, 2009, at 11:10 AM, Tom Lane wrote:


Andrew Dunstan and...@dunslane.net writes:
I think there's a good case for some functions implementing the various
Unicode normalization functions, though.


I have no objection to that so long as the code footprint is in line
with the utility gain (i.e. not all that much).  If we have to bring in
ICU or something similar to make it happen, the cost/benefit ratio looks
pretty bad.


I've no idea what it would require, but the mapping table must be  
pretty substantial. Still, I'd love to have this functionality in the  
database.


Best,

David




Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Peter Eisentraut
On Tuesday 14 April 2009 17:13:00 Marko Kreen wrote:
 If the parsing does not happen in 2 passes and it does not take account
 of stdstr setting then the  default breakage would be:

    stdstr=off, U&' \' UESCAPE '!'.

I think we can handle that and the cases Tom presents by erroring out when the 
U& syntax is used with stdstr off.
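
Concretely, the proposed behaviour would be (a sketch, not anything
implemented today):

  SET standard_conforming_strings = off;
  SELECT U&'\0041';   -- rejected outright, instead of risking mis-parsing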


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Peter Eisentraut
On Tuesday 14 April 2009 21:22:29 Tom Lane wrote:
 BTW, does anyone know whether Unicode includes the ASCII control
 characters ... ie, is \u0009 a name for tab?  If so, maybe this
 syntax is in part an attempt to cover that use-case in the standard.

Yes on both.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: add columns created and altered topg_proc and pg_class

2009-04-14 Thread Robert Haas
On Tue, Apr 14, 2009 at 2:13 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 the timestamp column caused the copy to be about 11.3% larger

 Grabbed the wrong numbers.  It's really 2.5%, but still

Well, that's why Tom doesn't want to add it to pg_class.  But putting
it in a separate table will have no impact on the speed of anything
except DDL statements, and even then it won't require copying the
whole table, so the performance impact will be pretty minimal; I
think it should be all right.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Peter Eisentraut
On Tuesday 14 April 2009 18:54:33 Tom Lane wrote:
 The other proposal that seemed
 attractive to me was a decode-like function:

 uescape('foo\00e9bar')
 uescape('foo\00e9bar', '\')

This was discussed previously, but rejected with the following argument:

There are some other disadvantages for making a function call.  You 
couldn't use that kind of literal in any other place where the parser 
calls for a string constant: role names, tablespace locations, 
passwords, copy delimiters, enum values, function body, file names.

Some of those appear to be very plausible use cases.
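
For instance, the COPY delimiter has to be a string constant, so a
function call could not be used there (a sketch; the table and file
names are made up):

  COPY tab FROM '/tmp/data.txt' WITH DELIMITER E'\t';              -- a literal works
  COPY tab FROM '/tmp/data.txt' WITH DELIMITER uescape('\0009');   -- not accepted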

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Peter Eisentraut
On Tuesday 14 April 2009 17:32:00 Tom Lane wrote:
  I admit that the SQL:2008 way also covers Unicode code
 points in identifiers, which we can't emulate without a lexical change;
 but frankly I think the use-case for that is so thin as to be almost
 nonexistent.  Who is going to choose identifiers that they cannot easily
 type on their keyboards?

For example, table names are automatically generated, or you write a test case 
for weird looking names, or you want to add special characters in an 
identifier that will later be displayed somewhere, or in general you are 
writing an application for a foreign language.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 I think we can handle that and the cases Tom presents by erroring out
 when the U& syntax is used with stdstr off.

I think you're missing the point --- this is not about whether the
syntax is unambiguous (it is already) but about whether a frontend that
doesn't understand it 100% will be secure against subversion.  I have no
confidence in the latter assumption.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Robert Haas
On Tue, Apr 14, 2009 at 2:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Maybe I've just got my head deeply in the sand, but I don't understand
 what the alternative to E'' supposedly is.  How am I supposed to write
 the equivalent of E'\t\n\f' without using E''?  The
 standard_conforming_strings syntax apparently supports no escapes of
 any kind, which seems so hideously inconvenient that I can't even
 imagine why someone wants that behavior.

 Well, quite aside from issues of compatibility with standards and other
 databases, I'm sure there are lots of Windows users who are more
 interested in being able to store a Windows pathname without doubling
 their backslashes than they are in being able to type readable names
 for ASCII control characters.  After all, in most cases you can get
 those characters into a string just by typing them (especially if you
 aren't using readline or something like it).

Well, that's fine, but that's a long way from Peter's statement that
"I think the tendency should be to get rid of E'' usage."  It's only
been a minor inconvenience for me to change my applications to use
E'', but I certainly don't see how I could get by without it, and it's
far more like other programming languages that I use (e.g. C) than the
standard syntax is.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On Tuesday 14 April 2009 18:54:33 Tom Lane wrote:
 The other proposal that seemed
 attractive to me was a decode-like function:
 
 uescape('foo\00e9bar')
 uescape('foo\00e9bar', '\')

 This was discussed previously, but rejected with the following argument:

 There are some other disadvantages for making a function call.  You 
 couldn't use that kind of literal in any other place where the parser 
 calls for a string constant: role names, tablespace locations, 
 passwords, copy delimiters, enum values, function body, file names.

I'm less than convinced that those are really plausible use-cases for
characters that one is unable to type directly.  However, I'll grant the
point.  So that narrows us down to considering the \u extension to E''
strings as a saner and safer alternative to the spec's syntax.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql with Function Type in \df

2009-04-14 Thread Alvaro Herrera
Tom Lane wrote:

 I had a second thought about that: presumably we should make the
 function type names translatable.  If we do that, it might be better
 to make the aggregate case be aggregate and take the width hit.
 Otherwise translators are going to be puzzled when they come across
 agg as a translatable phrase.

I think it's good to have them translatable.  As for using aggregate
instead of agg I don't think it's that great an idea.  If you need to
notify translators that agg stands for aggregate, add a
/* translator: */ comment.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql with Function Type in \df

2009-04-14 Thread David Fetter
On Tue, Apr 14, 2009 at 02:52:32PM -0400, Alvaro Herrera wrote:
 Tom Lane wrote:
 
  I had a second thought about that: presumably we should make the
  function type names translatable.  If we do that, it might be
  better to make the aggregate case be aggregate and take the
  width hit.  Otherwise translators are going to be puzzled when
  they come across agg as a translatable phrase.
 
 I think it's good to have them translatable.  As for using
 aggregate instead of agg I don't think it's that great an idea.
 If you need to notify translators that agg stands for aggregate,
 add a /* translator: */ comment.

Where would I add that?  First mention, each time, or...?

Cheers,
David (reworking patch per suggestions)
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql with Function Type in \df

2009-04-14 Thread Alvaro Herrera
David Fetter wrote:
 On Tue, Apr 14, 2009 at 02:52:32PM -0400, Alvaro Herrera wrote:

  I think it's good to have them translatable.  As for using
  aggregate instead of agg I don't think it's that great an idea.
  If you need to notify translators that agg stands for aggregate,
  add a /* translator: */ comment.
 
 Where would I add that?  First mention, each time, or...?

Is there more than one mention of agg?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql with Function Type in \df

2009-04-14 Thread David Fetter
On Tue, Apr 14, 2009 at 03:04:55PM -0400, Alvaro Herrera wrote:
 David Fetter wrote:
  On Tue, Apr 14, 2009 at 02:52:32PM -0400, Alvaro Herrera wrote:
 
   I think it's good to have them translatable.  As for using
   aggregate instead of agg I don't think it's that great an
   idea.  If you need to notify translators that agg stands for
   aggregate, add a /* translator: */ comment.
  
  Where would I add that?  First mention, each time, or...?
 
 Is there more than one mention of agg?

It's in 3 branches in describe.c.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Peter Eisentraut
On Tuesday 14 April 2009 21:48:12 Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
  I think we can handle that and the cases Tom presents by erroring out
  when the U& syntax is used with stdstr off.

 I think you're missing the point --- this is not about whether the
 syntax is unambiguous (it is already) but about whether a frontend that
 doesn't understand it 100% will be secure against subversion.  I have no
 confidence in the latter assumption.

I think I am getting the point quite well.  Do you have an example of how
this can be subverted?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Tom Lane
Meredith L. Patterson m...@osogato.com writes:
 Tom Lane wrote:
 I suspect that it's actually impossible to parse such a thing correctly
 without a full-fledged flex lexer or something of equivalent complexity.

 Is there a reason not to use a full-fledged flex lexer?

The point is that that's a pretty large imposition on client code that
we don't control or maintain, in order to get a feature that could be
gotten in much less dangerous ways that don't impact any code outside
PG.

 I'd be willing to take a crack at such a thing, but I'm working 80-hour
 weeks through the end of June and likely wouldn't be able to put in any
 time on it till then. So I definitely couldn't promise anything for 8.4,
 but if putting it off till 8.5 works, sign me up.

Shall we pass your name on to every package using Postgres, then?  This
is *not* about code within Postgres.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Meredith L. Patterson
Tom Lane wrote:
 I suspect that it's actually impossible to parse such a thing correctly
 without a full-fledged flex lexer or something of equivalent complexity.
 Certainly it's a couple of orders of magnitude harder than it is for
 either standard-conforming or E'' literals.

Is there a reason not to use a full-fledged flex lexer?

I'd be willing to take a crack at such a thing, but I'm working 80-hour
weeks through the end of June and likely wouldn't be able to put in any
time on it till then. So I definitely couldn't promise anything for 8.4,
but if putting it off till 8.5 works, sign me up.

Cheers,
--mlp
_
Meredith L. Patterson
Founder and CTO
Osogato, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Meredith L. Patterson
Tom Lane wrote:
 This is *not* about code within Postgres.

One typically provides libraries for this sort of thing, but your point
is taken; suggestion withdrawn.

--mlp
_
Meredith L. Patterson
Founder and CTO
Osogato, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Marko Kreen
On 4/14/09, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Eisentraut pete...@gmx.net writes:
   On Tuesday 14 April 2009 18:54:33 Tom Lane wrote:
   The other proposal that seemed
   attractive to me was a decode-like function:
  
   uescape('foo\00e9bar')
   uescape('foo\00e9bar', '\')

   This was discussed previously, but rejected with the following argument:

   There are some other disadvantages for making a function call.  You
   couldn't use that kind of literal in any other place where the parser
   calls for a string constant: role names, tablespace locations,
   passwords, copy delimiters, enum values, function body, file names.


 I'm less than convinced that those are really plausible use-cases for
  characters that one is unable to type directly.  However, I'll grant the
  point.  So that narrows us down to considering the \u extension to E''
  strings as a saner and safer alternative to the spec's syntax.

My vote would go to \u.  The U& may be SQL standard but it's different
from any established practical standard.


Alternative would be to make U& follow stdstr setting:

stdstr=on - you get fully standard-conforming syntax:

  U&'\xxx' UESCAPE '\'

stdstr=off - you need to follow old quoting rules:

  U&'\\xxx' UESCAPE '\\'

This would result in safe, and when stdstr=on, fully standard compliant
syntax.  Only downside would be that in practice - stdstr=off - it would
be unusable.


Third alternative would be to do both of them - \u as a usable method
and safe-U& to mark the checkbox for SQL-standard compliance.
If we do want U&, I would prefer that to a U&-only syntax.

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Replacing plpgsql's lexer

2009-04-14 Thread Tom Lane
Whichever way the current discussion about Unicode literals turns out,
it's clear that plpgsql is not up to speed on matching the core lexer's
behavior --- it's wrong anyway with respect to
standard_conforming_strings.

I had earlier speculated semi-facetiously about ripping out the plpgsql
lexer altogether, but the more I think about it the less silly the idea
looks.  Suppose that we change the core lexer so that the keyword lookup
table it's supposed to use is passed to scanner_init() rather than being
hard-wired in.  Then make plpgsql call the core lexer using its own
keyword table.  Everything else would match core lexical behavior
automatically.  The special behavior that we do want, such as being
able to construct a string representing a desired subrange of the input,
could all be handled in plpgsql-specific wrapper code.

I've just spent a few minutes looking for trouble spots in this theory,
and so far the only real ugliness I can see is that plpgsql treats
:= and .. as single tokens whereas the core would parse them as two
tokens.  We could hack the core lexer to have an additional switch that
controls that.  Or maybe just make it always return them as single
tokens --- AFAICS, neither combination is legal in core SQL anyway,
so this would only result in a small change in the exact syntax error
you get if you write such a thing in core SQL.

Another trouble spot is the #option syntax, but that could be handled
by a special-purpose prescan, or just dropped altogether; it's not like
we've ever used that for anything but debugging.

It looks like this might take about a day's worth of work (IOW two
or three days real time) to get done.

Normally I'd only consider doing such a thing during development phase,
but since we're staring at at least one and maybe two bugs that are
going to be hard to fix in any materially-less-intrusive way, I'm
thinking about doing it now.  Theoretically this change shouldn't break
any working code, so letting it hit the streets in 8.4beta2 doesn't seem
totally unreasonable.

Comments, objections, better ideas?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Solution of the file name problem of copy on windows.

2009-04-14 Thread Sergey Burladyan
Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp writes:

 There are some issues:
 * Is it possible to determine the platform encoding?

There is no single platform encoding on Linux. File name encoding depends on
the user's locale, so different users can have different encodings for file names.

-- 
Sergey Burladyan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode string literals versus the world

2009-04-14 Thread Robert Haas
On Tue, Apr 14, 2009 at 2:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Well, that's fine, but that's a long way from Peter's statement that
 I think the tendency should be to get rid of E'' usage.

 Bear in mind that that's Peter's opinion; it's not necessarily shared
 by anyone else.  I was just responding to your assertion of the
 diametrically opposed position that non-E strings are useless (which
 I don't share either).

Useless might be an overstatement, but I certainly have gotten no
benefit out of them.  It seems decidedly odd to me to propose that
users embed literal control characters in their code/SQL scripts in
lieu of using escape sequences.  If that were a normal and reasonable
thing to do then I expect C, C++, Perl, Python, Ruby, Javascript, and
countless other languages wouldn't provide this functionality.  In
reality, most of them do provide it, sometimes as the only option.
And as to your point about Windows pathnames, if you're trying to
avoid escaping lots of backslashes, dollar-quoting is a perfectly
adequate solution.

The real motivation for standard_conforming_strings is the one
embedded in the name.  And I'm fine with that.  But I don't see E''
disappearing from my code any time soon.  I periodically have a need
for things like foo || E'\n' || bar and it screws up the formatting of
the code to write foo || '
' || bar.  Writing it that way, it's also entirely non-obvious whether
there are any spaces or tabs after the opening quote and before the
newline, and if so whether or not they are intentional.  But I don't
mind if someone ELSE likes the new, standard strings - it just isn't
me.  :-)

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replacing plpgsql's lexer

2009-04-14 Thread Robert Haas
On Tue, Apr 14, 2009 at 4:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Whichever way the current discussion about Unicode literals turns out,
 it's clear that plpgsql is not up to speed on matching the core lexer's
 behavior --- it's wrong anyway with respect to
 standard_conforming_strings.

 I had earlier speculated semi-facetiously about ripping out the plpgsql
 lexer altogether, but the more I think about it the less silly the idea
 looks.  Suppose that we change the core lexer so that the keyword lookup
 table it's supposed to use is passed to scanner_init() rather than being
 hard-wired in.  Then make plpgsql call the core lexer using its own
 keyword table.  Everything else would match core lexical behavior
 automatically.  The special behavior that we do want, such as being
 able to construct a string representing a desired subrange of the input,
 could all be handled in plpgsql-specific wrapper code.

 I've just spent a few minutes looking for trouble spots in this theory,
 and so far the only real ugliness I can see is that plpgsql treats
 := and .. as single tokens whereas the core would parse them as two
 tokens.  We could hack the core lexer to have an additional switch that
 controls that.  Or maybe just make it always return them as single
 tokens --- AFAICS, neither combination is legal in core SQL anyway,
 so this would only result in a small change in the exact syntax error
 you get if you write such a thing in core SQL.

 Another trouble spot is the #option syntax, but that could be handled
 by a special-purpose prescan, or just dropped altogether; it's not like
 we've ever used that for anything but debugging.

 It looks like this might take about a day's worth of work (IOW two
 or three days real time) to get done.

 Normally I'd only consider doing such a thing during development phase,
 but since we're staring at at least one and maybe two bugs that are
 going to be hard to fix in any materially-less-intrusive way, I'm
 thinking about doing it now.  Theoretically this change shouldn't break
 any working code, so letting it hit the streets in 8.4beta2 doesn't seem
 totally unreasonable.

 Comments, objections, better ideas?

All this sounds good.  As for how to handle := and .., I think making
them lex the same way in PL/pgsql and core SQL would be a good thing.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Clean shutdown and warm standby

2009-04-14 Thread Guillaume Smet
Hi,

On Wed, Apr 8, 2009 at 9:11 PM, I wrote:
 Following the discussion here
 http://archives.postgresql.org/message-id/49d9e986.8010...@pse-consulting.de
 , I wrote a small patch which rotates the last XLog file on shutdown
 [snip]

Any comment or advice on how I can fix it with a different method if
this one is considered wrong?

Original message and patch here:
http://archives.postgresql.org/message-id/1d4e0c10904081211p2c0f1cdepe620c11d1271c...@mail.gmail.com

Thanks.

-- 
Guillaume

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replacing plpgsql's lexer

2009-04-14 Thread Simon Riggs

On Tue, 2009-04-14 at 16:37 -0400, Tom Lane wrote:
 Comments, objections, better ideas?

Please, if you do this, make it optional.

Potentially changing the behaviour of thousands of functions just to fix
a rare bug will not endear us to our users. The bug may be something
that people are relying on in some subtle way, ugly as that sounds.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replacing plpgsql's lexer

2009-04-14 Thread Andrew Dunstan



Robert Haas wrote:

All this sounds good.  As for how to handle := and .., I think making
them lex the same way in PL/pgsql and core SQL would be a good thing.


  


They don't have any significance in core SQL. What would we do with the 
lexeme?


ISTR we've used some hacks in the past to split lexemes into pieces, and 
presumably we'd have to do something similar with these.


The only thing that makes me nervous about this is that we're very close 
to Beta. OTOH, this is one area the regression suite should give a 
fairly good workout to.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replacing plpgsql's lexer

2009-04-14 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Robert Haas wrote:
 All this sounds good.  As for how to handle := and .., I think making
 them lex the same way in PL/pgsql and core SQL would be a good thing.

 They don't have any significance in core SQL. What would we do with the 
 lexeme?

It would just fail --- the core grammar will have no production that can
accept it.  Right offhand I think the only difference is that instead of

regression=# select a .. 2;
ERROR:  syntax error at or near "."
LINE 1: select a .. 2;
                 ^

you'd see

regression=# select a .. 2;
ERROR:  syntax error at or near ".."
LINE 1: select a .. 2;
                 ^

ie it acts like one token not two in the error message.

This solution would become problematic if the core grammar ever had a
meaning for := or .. that required treating them as two tokens (eg,
the grammar allowed this sequence with whitespace between).  I don't
think that's very likely though; and if it did happen we could fix it
with the aforementioned control switch.

 The only thing that makes me nervous about this is that we're very close 
 to Beta. OTOH, this is one area the regression suite should give a 
 fairly good workout to.

Yeah, I'd rather have done it before beta1, but too late.  The other
solution still entails massive changes to the plpgsql lexer, so it
doesn't really look like much lower risk.  AFAICS the practical
alternatives are a reimplementation in beta2, or no fix until 8.5.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Yet another regression issue with Fedora-10 + PG 8.4 beta1

2009-04-14 Thread Devrim GÜNDÜZ
Hi,

I built PG 8.4 beta1 on 2 different Fedora-10 boxes. One of them was a
local box, which runs under QEMU. The other one was one of Fedora's build
servers. (I did a scratch build on the Fedora build servers.)

On the first machine, I got a regression failure:

http://www.gunduz.org/temp/regression.out
http://www.gunduz.org/temp/regression.diffs

I could not reproduce this on the 2nd one (the Fedora build servers).

Any ideas what might have caused it?

Regards,

-- 
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
   http://www.gunduz.org




Re: [HACKERS] psql with Function Type in \df

2009-04-14 Thread David Fetter
On Tue, Apr 14, 2009 at 02:52:32PM -0400, Alvaro Herrera wrote:
 Tom Lane wrote:
 
  I had a second thought about that: presumably we should make the
  function type names translatable.  If we do that, it might be better
  to make the aggregate case be aggregate and take the width hit.
  Otherwise translators are going to be puzzled when they come across
  agg as a translatable phrase.
 
 I think it's good to have them translatable.  As for using aggregate
 instead of agg I don't think it's that great an idea.  If you need to
 notify translators that agg stands for aggregate, add a
 /* translator: */ comment.

Here's the next revision :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 10d42ca..5224440 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1043,14 +1043,19 @@ testdb=gt;
 listitem
 para
 Lists available functions, together with their argument and
-return types. If replaceable
-class=parameterpattern/replaceable
-is specified, only functions whose names match the pattern are shown.
-If the form literal\df+/literal is used, additional information 
about
-each function, including volatility, language, source code and 
description, is shown.
-By default, only user-created objects are shown;  supply a
-pattern or the literalS/literal modifier to include system
-objects.
+return types and their function type: 'normal', 'agg',
+'trigger', and 'window'.  If replaceable
+class=parameterpattern/replaceable is specified, only
+functions whose names match the pattern are shown.  If the
+form literal\df+/literal is used, additional information
+about each function, including volatility, language, source
+code and description, is shown.  By default, only user-created
+objects are shown;  supply a pattern or the
+literalS/literal modifier to include system objects.  To
+include aggregates in the result set, use \dfa, normal
+functions, \dfn, trigger functions, \dft, windowing functions,
+\dfw.  You may freely mix and match the +, S, a, n, t and w
+options.
 /para
 
 note
@@ -1064,7 +1069,6 @@ testdb=gt;
 /listitem
   /varlistentry
 
-
   varlistentry
 termliteral\dF[+] [ replaceable 
class=parameterpattern/replaceable ]/literal/term
 listitem
diff --git a/doc/src/sgml/release.sgml b/doc/src/sgml/release.sgml
index fc56c3d..09ba686 100644
--- a/doc/src/sgml/release.sgml
+++ b/doc/src/sgml/release.sgml
@@ -183,6 +183,15 @@ do it for earlier branch release files.
  /para
 /listitem
 
+listitem
+ para
+  In psql, \df now shows which type of function (normal,
+  aggregate, trigger, or window) it is.  You can also specify
+  mix-and-match options.  To get aggregates and windowing
+  functions, including system ones, for example, invoke \dfwaS+
+ /para
+/listitem
+
/itemizedlist
 
para
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index b39466d..1dc3cc3 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -365,8 +365,22 @@ exec_command(const char *cmd,
case 'D':
success = listDomains(pattern, show_system);
break;
-   case 'f':
-   success = describeFunctions(pattern, 
show_verbose, show_system);
+   case 'f':   /* function subsystem */
+   switch (cmd[2])
+   {
+   case '\0':
+   case '+':
+   case 'S':
+   case 'a':
+   case 'n':
+   case 't':
+   case 'w':
+   success =  
describeFunctions(cmd[2], pattern, show_verbose, show_system);
+   break;
+   default:
+   status = PSQL_CMD_UNKNOWN;
+   break;
+   }
break;
case 'g':
/* no longer distinct from \du */
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 731baf8..ad5dcbe 100644
--- a/src/bin/psql/describe.c

Re: [HACKERS] Replacing plpgsql's lexer

2009-04-14 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Tue, 2009-04-14 at 16:37 -0400, Tom Lane wrote:
 Comments, objections, better ideas?

 Please, if you do this, make it optional.

I don't think making the plpgsql lexer pluggable is realistic.

 Potentially changing the behaviour of thousands of functions just to fix
 a rare bug will not endear us to our users. The bug may be something
 that people are relying on in some subtle way, ugly as that sounds.

That's why I don't want to change it in a minor release.  In a major
release, however, it's fair game.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Yet another regression issue with Fedora-10 + PG 8.4 beta1

2009-04-14 Thread Tom Lane
Devrim GÜNDÜZ dev...@gunduz.org writes:
 I built PG 8.4 beta1 on 2 different Fedora-10 boxes. One of them was a
 local box, which runs under QEMU. The other one is Fedora's build
 servers. (I did a scratch build on Fedora build server)

 On the first machine, I got a regression failure:

 http://www.gunduz.org/temp/regression.out
 http://www.gunduz.org/temp/regression.diffs

 I could not repeat this with 2nd one (Fedora build servers)

 Any ideas what might have caused it?

QEMU screwing up the handling of floating-point traps, perhaps?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: add columns created and altered to pg_proc and pg_class

2009-04-14 Thread Bruce Momjian
Tom Lane wrote:
 Pavel Stehule pavel.steh...@gmail.com writes:
  this my proposal is very simple. It help to people who have to manage
  large or complex database system. Important data are date of creating
  and date of altering tables and stored procedures. These data cannot
  be modified by user, so implementation doesn't need any new
  statements.
 
 ISTM anyone who thinks they need this actually need a full DDL log;
 or at least, if we give them this, they will be back next week asking
 for a full log.  So it'd save a lot of work to tell them to just log
 their DDL to start with.
 
 Some obvious objections to the simple approach:
 - what if I want to know *who* made the change
 - what if I need to know about the change before last
 - what if I need to know about a DROP
 - what if I need to know about operators, operator classes, schemas, etc
   etc

How do you handle dump/restore?  Is it preserved?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode support

2009-04-14 Thread Andrew Gierth
 Peter == Peter Eisentraut pete...@gmx.net writes:

  On Tuesday 14 April 2009 07:07:27 Andrew Gierth wrote:
  FWIW, the SQL spec puts the onus of normalization squarely on the
  application; the database is allowed to assume that Unicode
  strings are already normalized, is allowed to behave in
  implementation-defined ways when presented with strings that
  aren't normalized, and provision of normalization functions and
  predicates is just another optional feature.

 Peter Can you name chapter and verse on that?

4.2.8 Universal character sets

  A UCS string is a character string whose character repertoire is UCS
  and whose character encoding form is one of UTF8, UTF16, or
  UTF32. Any two UCS strings are comparable.

  An SQL-implementation may assume that all UCS strings are normalized
  in one of Normalization Form C (NFC), Normalization Form D (NFD),
  Normalization Form KC (NFKC), or Normalization Form KD (NFKD), as
  specified by [Unicode15]. <normalized predicate> may be used to
  verify the normalization form to which a particular UCS string
  conforms. Applications may also use <normalize function> to enforce
  a particular normal form. With the exception of <normalize function>
  and <normalized predicate>, the result of any operation on an
  unnormalized UCS string is implementation-defined.

  Conversion of UCS strings from one character set to another is
  automatic.

  Detection of a noncharacter in a UCS-string causes an exception
  condition to be raised. The detection of an unassigned code point
  does not.

[Obviously there are things here that we don't conform to anyway (we
don't raise exceptions for noncharacters, for example). We don't claim
conformance to T061.]

<normalized predicate> ::=
  <row value predicand> <normalized predicate part 2>
<normalized predicate part 2> ::=
  IS [ NOT ] [ <normal form> ] NORMALIZED

1) Without Feature T061, "UCS support", conforming SQL language shall
   not contain a <normalized predicate>.

2) Without Feature F394, "Optional normal form specification",
   conforming SQL language shall not contain <normal form>.

<normalize function> ::=
  NORMALIZE <left paren> <character value expression>
  [ <comma> <normal form> [ <comma> <normalize function result length> ] ]
  <right paren>

<normal form> ::=
    NFC
  | NFD
  | NFKC
  | NFKD

7) Without Feature T061, "UCS support", conforming SQL language shall
   not contain a <normalize function>.

9) Without Feature F394, "Optional normal form specification",
   conforming SQL language shall not contain <normal form>.
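
(To make the shape of that concrete -- a sketch only, since we don't
claim Feature T061, with made-up column and table names:)

  SELECT name IS NFC NORMALIZED FROM t;   -- <normalized predicate>
  SELECT NORMALIZE(name, NFC) FROM t;     -- <normalize function>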

 Peter I see this, for example,

 Peter 6.27 <numeric value function>
 [...]
 Peter So SQL redirects the question of character length to the Unicode
 Peter standard.  I have not been able to find anything there on a
 Peter quick look, but I'm sure the Unicode standard has some very
 Peter specific ideas on this.  Note that the matter of normalization
 Peter is not mentioned here.

I've taken a not-so-quick look at the Unicode standard (though I don't
claim to be any sort of expert on it), and I certainly can't see any
definitive indication what the length is supposed to be; however, the
use of terminology such as combining character sequence (meaning a
series of codepoints that combine to make a single glyph) certainly
seems to strongly imply that our interpretation is correct and that
the OP's is not.

Other indications: the units used by length() must be the same as the
units used by position() and substring() (in the spec, when USING
CHARACTERS is specified), and it would not make sense to use a
definition of character that did not allow you to look inside a
combining sequence.

I've also failed so far to find any examples of other programming
languages in which a combining character sequence is taken to be a
single character for purposes of length or position specification.

-- 
Andrew (irc:RhodiumToad)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: add columns created and altered to pg_proc and pg_class

2009-04-14 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 How do you handle dump/restore?  Is it preserved?

I would think not.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replacing plpgsql's lexer

2009-04-14 Thread Bruce Momjian
Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Tue, 2009-04-14 at 16:37 -0400, Tom Lane wrote:
  Comments, objections, better ideas?
 
  Please, if you do this, make it optional.
 
 I don't think making the plpgsql lexer pluggable is realistic.
 
  Potentially changing the behaviour of thousands of functions just to fix
  a rare bug will not endear us to our users. The bug may be something
  that people are relying on in some subtle way, ugly as that sounds.
 
 That's why I don't want to change it in a minor release.  In a major
 release, however, it's fair game.

Well, this bug has existed since long before 8.4, so we could just leave it for
8.5, and it is not like we have had tons of complaints;  the only
complaint I saw was one from March, 2008.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replacing plpgsql's lexer

2009-04-14 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Well, this bug has existed long before 8.4 so we could just leave it for
 8.5, and it is not like we have had tons of complaints;  the only
 complaint I saw was one from March, 2008.

We had one last week, which is what prompted me to start looking at the
plpgsql lexer situation in the first place.  Also, if the unicode
literal situation doesn't change, that's going to be problematic as
well.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replacing plpgsql's lexer

2009-04-14 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 Well, this bug has existed long before 8.4 so we could just leave it for
 8.5, and it is not like we have had tons of complaints;  the only
 complaint I saw was one from March, 2008.

I think it's a good thing to do in general.  I'm also concerned about
whether it will impact the plpgsql functions we have (which are pretty
numerous..) but in the end I'd rather have it fixed in 8.4 than possibly
delayed indefinitely (after all, if it's in 8.4, why fix it for 8.5?).

Thanks,

Stephen




[HACKERS] Patch for server-side encoding issues

2009-04-14 Thread Itagaki Takahiro
Here is a WIP patch to solve server-side encoding issues.
It includes the "Solution of the file name problem of copy on windows" patch.

http://archives.postgresql.org/message-id/20090413184335.39be.52131...@oss.ntt.co.jp

It could solve the following issues. They are not Windows-only or
Japan-specific problems. They could also occur if you use databases
with multiple encodings, or a database with a non-platform-native encoding,
even on POSIX platforms.

1. Non-ASCII file paths for a database whose encoding is different from
the platform encoding (which comes from $LANG or the Windows codepage),
especially for COPY TO/FROM.

2. Use the appropriate encoding for the non-text server log (console, syslog
and eventlog). The encoding is the same as in 1.

3. Use the appropriate encoding for the text server log (stderr and csvlog),
especially when the database cluster has databases with a variety of encodings.
A new GUC parameter 'log_encoding' specifies the encoding used in the server log.

4. (incomplete) Avoid encoding conversion errors when printing the server log
and messages for the client. Instead of raising an error, print '?' if there
is no equivalent character in the target encoding.

For 4, I use PG_TRY and PG_CATCH for now, but that is probably bad style.
Instead, I'm thinking that the conversion procedures could take an optional
argument specifying whether they should raise an error or not. However, we
would need to modify all of the conversion functions to do so.

More research is needed for the following situations:
  - NLS messages
  - Module path for LOAD
  - Arguments for system(), including archive_command and restore_command
  - Query texts for other database in pg_stat_activity and pg_stat_statements

Comments welcome. Please notify me if I'm missing something.


Here is a sample code to test the patch.

(client_encoding = sjis / system encoding = sjis)

C:\home\>createdb utfdb --encoding=utf8 --locale=C
C:\home\>createdb eucdb --encoding=eucjp --locale=C

C:\home\>psql utfdb -c "COPY (SELECT 1) TO 'C:/home/日本語ファイル.txt'"
C:\home\>psql utfdb -c "SELECT '日本語' WITH ERROR"
ERROR:  syntax error at or near WITH ERROR
LINE 1: SELECT '日本語' WITH ERROR
^

C:\home\>psql eucdb -c "COPY (SELECT 1) TO 'C:/home/日本語ファイル.txt'"
C:\home\>psql eucdb -c "SELECT '日本語' WITH ERROR"
ERROR:  syntax error at or near WITH ERROR
LINE 1: SELECT '日本語' WITH ERROR
^


Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


server-side_encoding_issues_20090415.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Memory exhaustion during bulk insert

2009-04-14 Thread Xin Wang
Hi all,

I'm doing an experimental project using Postgres as the prototype.
I want to store attribute values of xml type in an internal XML table
which is created for every XML column. One XML node (element,
attribute or text) is stored as a tuple in the XML table. While
a 127MB XML document 'dblp.xml' (which has about 4 million XML nodes and
thus 4 million tuples) is being stored, 2GB of memory is exhausted rapidly
and then my computer hangs. I guess the reason is that the memory
runs out before the transaction is committed because the number of
tuples being inserted is too large.

The flow of tuple insertion and functions called are as follows:

while (get the next XML node != NULL)
{
/* fill in values and isnull array */
...
tup = heap_form_tuple(tupleDesc, values, isnull);
simple_heap_insert(xmlTable, tup);
...
heap_freetuple(tup);
}

I searched the mailing list archive and noticed that a patch to improve
bulk insert performance was committed in Nov 2008. The log message said

(the patch) keeps the current target buffer pinned and makes it work
in a small ring of buffers to avoid having bulk inserts trash the whole
buffer arena.

However, I do not know much about the code below the heapam layer. Can that
patch solve my problem (the version I use is 8.3.5)? Or could you give me
some suggestions about how to avoid memory exhaustion during bulk inserts
(while still cleaning up nicely after a transaction abort)?

Thanks in advance.
Regards,

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers