date:20120202

On Thu, Feb 2, 2012 at 4:26 PM, Jaime Casanova ja...@2ndquadrant.com wrote:
 On Wed, Feb 1, 2012 at 9:18 PM, Fujii Masao masao.fu...@gmail.com wrote:

 --On 15. Januar 2012 02:50:00 -0500 Jaime Casanova ja...@2ndquadrant.com
 wrote:

 Attached is a patch thats implements a pg_stat_recovery view that
 keeps counters about processed wal records. I just notice that it
 still lacks documentation but i will add it during the week.



 Before reviewing the patch, I'd like to know: what's the purpose of this 
 view?
 It's only debug purpose? ISTM that most users don't care about this view at 
 all.


 yeah! you're right. most users won't care about it... did i tell that
 i added a track_recovery GUC so only users that wanted pay for it? i
 probably did not tell that :D

If only core developer is interested in this view, ISTM that short
description for
each WAL record is not required because he or she can know the meaning of each
WAL record by reading the source code. No? Adding short descriptions for every
WAL records seems to be overkill.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stats_recovery view

On Thu, Feb 2, 2012 at 4:32 PM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Feb 2, 2012 at 08:26, Jaime Casanova ja...@2ndquadrant.com wrote:
 On Wed, Feb 1, 2012 at 9:18 PM, Fujii Masao masao.fu...@gmail.com wrote:

 --On 15. Januar 2012 02:50:00 -0500 Jaime Casanova ja...@2ndquadrant.com
 wrote:

 Attached is a patch thats implements a pg_stat_recovery view that
 keeps counters about processed wal records. I just notice that it
 still lacks documentation but i will add it during the week.



 Before reviewing the patch, I'd like to know: what's the purpose of this 
 view?
 It's only debug purpose? ISTM that most users don't care about this view at 
 all.


 yeah! you're right. most users won't care about it... did i tell that
 i added a track_recovery GUC so only users that wanted pay for it? i
 probably did not tell that :D

 I haven't looked through the code in detail, but one direct comment:
 do we really need/want to send this through the stats collector? It
 will only ever have one sender - perhaps we should just either store
 it in shared memory or store it locally and only send it on demand?

Agreed. I think we should either.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2012-02-02 Thread Marko Kreen

On Tue, Jan 31, 2012 at 8:56 PM, Robert Haas robertmh...@gmail.com wrote:
 2012/1/29 Kohei KaiGai kai...@kaigai.gr.jp:
     Remote SQL: DECLARE pgsql_fdw_cursor_0 SCROLL CURSOR FOR SELECT
 a, b FROM public.t1 WHERE (a  2)
  (3 rows)

 Shouldn't we be using protocol-level cursors rather than SQL-level cursors?

I think you want this instead:

  https://commitfest.postgresql.org/action/patch_view?id=769

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Refactoring log_newpage

2012-02-02 Thread Simon Riggs

On Thu, Feb 2, 2012 at 7:26 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

 Well, you can obviously check the catalogs for that, but you must be
 assuming that you don't have access to the catalogs or this would be a
 non-issue.

 You can also identify the kind of page by looking at the special area of the
 stored page. See:
 http://archives.postgresql.org/pgsql-hackers/2007-04/msg00392.php

How does that work with different forks?

I think its very ugly to mark all sorts of different pages as if they
were heap pages when they clearly aren't. I don't recall anything so
ugly being allowed anywhere else in the system. Why is it *needed*
here?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] JSON output functions.

2012-02-02 Thread Abhijit Menon-Sen

At 2012-02-01 18:48:28 -0500, andrew.duns...@pgexperts.com wrote:

 For now I'm inclined not to proceed with that, and leave it as an
 optimization to be considered later if necessary. Thoughts?

I agree, there doesn't seem to be a pressing need to do it now.

-- ams

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Index-only scan performance regression

2012-02-02 Thread Dean Rasheed

On 2 February 2012 01:40, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Feb 1, 2012 at 7:44 PM, Robert Haas robertmh...@gmail.com wrote:
 So, I guess the trade-off here is that, since sinval messages aren't
 processed immediately, we often won't notice the VM extension until
 the next statement starts, whereas with the current implementation, we
 notice it right away.  On the other hand, noticing it right away is
 costing us a system call or two per tuple.  So on further thought, I
 think we should do this.


Yes, that's a nice summary.

 Patch committed.  I moved the smgr inval to after the actual extension
 is done, which seems superior, and adjusted the comments slightly.


Thanks.

Regards,
Dean

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Refactoring log_newpage

2012-02-02 Thread Heikki Linnakangas


On 02.02.2012 11:35, Simon Riggs wrote:

On Thu, Feb 2, 2012 at 7:26 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:


Well, you can obviously check the catalogs for that, but you must be
assuming that you don't have access to the catalogs or this would be a
non-issue.

You can also identify the kind of page by looking at the special area of the
stored page. See:
http://archives.postgresql.org/pgsql-hackers/2007-04/msg00392.php


How does that work with different forks?


You have the fork number explicitly in the newpage record already.


I think its very ugly to mark all sorts of different pages as if they
were heap pages when they clearly aren't. I don't recall anything so
ugly being allowed anywhere else in the system. Why is it *needed*
here?


It's not needed. Beauty is in the eye of the beholder, but I don't find 
it all that ugly, and the comment in log_newpage explains it well.


I don't see much value in adding a new field to the record. Any tools 
that read the WAL format will need to be taught about that change. Not a 
huge issue, but I also don't see much gain. On the whole I'd be inclined 
to just leave it all alone, but whatever.


I don't think it's a good idea to rename XLOG_HEAP_NEWPAGE to 
XLOG_NEWPAGE. The WAL record is still part of the heapam rmgr, even if 
it's used by other access methods, too.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] JSON for PG 9.2

2012-02-02 Thread Abhijit Menon-Sen

At 2012-02-01 11:28:50 -0500, robertmh...@gmail.com wrote:

 It's also pretty clear that JSON
 string - PG text data type is going to admit of a number of error
 conditions (transcoding errors and perhaps invalid surrogate pairs) so
 throwing one more on the pile doesn't cost much.

Hi Robert.

I'm sorry for being slow, but I don't understand what you're proposing
to do here (if anything). Could I ask you to explain, please?

Are you talking about allowing the six literal bytes \u to be
present in the JSON? If so, I agree, there seems to be no reason to
disallow it.

Are you also saying we should allow any \u sequence, without
checking for errors (e.g. invalid surrogate pairs or parts thereof)?

And what transcoding errors are you referring to?

-- ams

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] Fix float8 parsing of denormal values (on some platforms?)

2012-02-02 Thread Marti Raudsepp

On Wed, Feb 1, 2012 at 20:17, Tom Lane t...@sss.pgh.pa.us wrote:
 Applied with minor revisions.

Thanks! :)

We're already seeing first buildfarm failures, on system narwhal
using an msys/mingw compiler.
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=narwhaldt=2012-02-02%2005%3A00%3A02

No idea which libc it uses though. The MSVC libc looks fine because
currawong passes these tests.

PS: it would be useful to have the output of 'cc -v' in buildfarm
reports since the Compiler field may be out of date.

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch pg_is_in_backup()

2012-02-02 Thread Marti Raudsepp

On Mon, Jan 30, 2012 at 20:33, Gilles Darold gilles.dar...@dalibo.com wrote:
 After some time searching for a Pg system administration function like
 pg_is_in_recovery(), let's say pg_is_in_backup(), I couldn't find one.
 The minor patch attached here adds this administrative function that can
 be used with others backup control functions. This is a very little
 patch outside documentation because the function is only a wrapper to
 the internal BackupInProgress() function, just like pg_is_in_recovery()
 is calling RecoveryInProgress().

I think it would be more useful to have a function that returns a
timestamp when the backup started. That way, it's possible to write a
generic monitoring script that alerts the sysadmin only when a backup
has been running for too long, but is silent for well-behaved backups.

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!

2012-02-02 Thread hubert depesz lubaczewski

On Wed, Feb 01, 2012 at 10:02:14PM +0100, Dimitri Fontaine wrote:
 The case for a table that is partly user data and partly extension data
 is very thin, I think that if I had this need I would use inheritance
 and a CHECK(user_data is true/false) constraint to filter the data.

definitely agree. i.e. i don't really see a case when we'd have data
from both extension, and normal usage, in the same table.
and the overhead of tracking source of data seems to be excessive.

Best regards,

depesz

-- 
The best thing about modern society is how easy it is to avoid contact with it.
 http://depesz.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch pg_is_in_backup()

2012-02-02 Thread Pavel Stehule

2012/2/2 Marti Raudsepp ma...@juffo.org:
 On Mon, Jan 30, 2012 at 20:33, Gilles Darold gilles.dar...@dalibo.com wrote:
 After some time searching for a Pg system administration function like
 pg_is_in_recovery(), let's say pg_is_in_backup(), I couldn't find one.
 The minor patch attached here adds this administrative function that can
 be used with others backup control functions. This is a very little
 patch outside documentation because the function is only a wrapper to
 the internal BackupInProgress() function, just like pg_is_in_recovery()
 is calling RecoveryInProgress().

 I think it would be more useful to have a function that returns a
 timestamp when the backup started. That way, it's possible to write a
 generic monitoring script that alerts the sysadmin only when a backup
 has been running for too long, but is silent for well-behaved backups.

+1

Pavel


 Regards,
 Marti

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] spgist text_ops and LIKE

On Wed, Feb 1, 2012 at 10:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Is spgist intended to support prefix searches with LIKE?

 Too lazy to look at the code right now, but I think indxpath.c contains
 hardwired assumptions that LIKE prefix optimizations are only possible
 with btree indexes.  Possibly it would be worth relaxing that.  (The
 whole special index operator mechanism is undesirably special-purpose,
 but I currently have no ideas about how to make it more flexible.)

Is it as simple as the attached?

Given the set of operators that are available for spgist, one would
assume that it's intended to handle both pattern-matching and regular
comparison operators.  On the flip side, spg_text_inner_consistent()
seems to fall back to a full index-scan for the regular comparison
operators, which makes me wonder why anyone would bother having the
operator at all.  Indeed, in my testing, it's slightly slower than a
sequential scan:

select sum(1) from person where last_name  'WAR';
 - btree: 113-114ms
 - btree with text_pattern_ops: can't do it
 - spgist: 920-935ms
 - sequential scan: 895-910ms

With the attached patch, it seems to work as well as text_pattern_ops
for pattern matching:

select sum(1) from person where last_name LIKE 'WAR%';
 - btree: can't do it
 - btree with text_pattern_ops: 6-7ms
 - spgist: 7-9ms
 - sequential scan: 170-180ms

It's fine for equality, too:

select sum(1) from person where last_name = 'WARNER';
 - btree: 1.7-1.9ms
 - btree with text_pattern_ops: 1.7-1.9ms
 - spgist: 1.7-1.9ms
 - sequential scan: 160-165ms

It seems that there's at least the potential for spgist indexes to be
extremely efficient, given adequate knowledge about the collation
under which comparisons are being done.   If for example we're trying
to find strings where last_name  'WAR', we certainly need to chase
down the subtrees where the first character is 'W', 'X', 'Y', or 'Z'.
If the collation is case-insensitive, we might also need to chase 'w',
'x', 'y', and 'z'.  And there might be other characters that need to
be chased as well, depending on the collation: some collations might
sort numbers before letters, while others might do the reverse.  But
it's likely that there are few if any collations where we need to
chase down the subtree for 'Q' on this query, because anything
beginning with 'Q' is likely to sort before anything beginning with
'WAR' pretty much anywhere.  I'm not sure there's a good way to get
that information without a lot of grotty hard-coding, but it seems
like the potential is there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


spgist-pattern-match.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] JSON for PG 9.2

On Thu, Feb 2, 2012 at 4:54 AM, Abhijit Menon-Sen a...@toroid.org wrote:
 At 2012-02-01 11:28:50 -0500, robertmh...@gmail.com wrote:
 It's also pretty clear that JSON
 string - PG text data type is going to admit of a number of error
 conditions (transcoding errors and perhaps invalid surrogate pairs) so
 throwing one more on the pile doesn't cost much.

 I'm sorry for being slow, but I don't understand what you're proposing
 to do here (if anything). Could I ask you to explain, please?

 Are you talking about allowing the six literal bytes \u to be
 present in the JSON? If so, I agree, there seems to be no reason to
 disallow it.

 Are you also saying we should allow any \u sequence, without
 checking for errors (e.g. invalid surrogate pairs or parts thereof)?

 And what transcoding errors are you referring to?

Consider the following JSON object:

abc

This is a JSON string.  Someone is eventually going to propose a
function with a  name like json_to_string() which, when given this
JSON object, returns a three-character string with the PostgreSQL text
type.  That's useful and I support it.  But now suppose we pass this
JSON object to that same function:

a\u0062c

The user will quite rightly expect that since this JSON object
represents the same value as the first JSON object, they're going to
get the same answer back from json_to_string(), namely abc.  So far,
so good.  But now suppose we pass this JSON object to that same
function:

a\uc

This is going to have to be an error condition, because PostgreSQL
does not allow values of type text to contain embedded NUL characters.
 Now consider this:

a\uABCDc

Suppose that \uABCD represents a character that exists in Unicode, but
the server-encoding is SHIFT-JIS or EUC-JP or some other system which
has no equivalent for the character represented by \uABCD.  Again,
when json_to_string() is applied to a value of this type, it must
fail.

In other words, we're knowingly allowing JSON strings to contain
characters which might not be representable as PostgreSQL strings,
because JSON allows any Unicode character, and the server encoding
might not be Unicode, and the server doesn't allow embedded NULs in
any encoding.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-02-02 Thread k...@rice.edu

On Wed, Feb 01, 2012 at 04:12:58PM -0600, Jim Nasby wrote:
 On Jan 26, 2012, at 9:32 PM, Robert Haas wrote:
  But if we want to put it on a diet, the first thing I'd probably be
  inclined to lose is the float4 specialization.  Some members of the
  audience will recall that I take dim view of floating point arithmetic
  generally, but I'll concede that there are valid reasons for using
  float8.  I have a harder time coming up with a good reason to use
  float4 - ever, for anything you care about.  So I would be inclined to
  think that if we want to trim this back a bit, maybe that's the one to
  let go.  If we want to be even more aggressive, the next thing I'd
  probably lose is the optimization of multiple sortkey cases, on the
  theory that single sort keys are probably by far the most common
  practical case.
 
 I do find float4 to be useful, though it's possible that my understanding is 
 flawed…
 
 We end up using float to represent ratios in our database; things that 
 really, honest to God do NOT need to be exact.
 
 In most cases, 7 digits of precision (which AFAIK is what you're guaranteed 
 with float4) is plenty, so we use float4 rather than bloat the database 
 (though, since we're on 64bit hardware I guess that distinction is somewhat 
 moot…).
 
 Is there something I'm missing that would make float4 useless as compared to 
 float8?
 --
 Jim C. Nasby, Database Architect   j...@nasby.net
 512.569.9461 (cell) http://jim.nasby.net
 
If the values stored are float4, it would be nice to have that fast-path
sort available too. The cases where I have used float4 values in the past,
I absolutely did not need any of the float8 baggage and in my case, using
the actual float4 comparison operator resulted in a significant time savings
over the normal float8. This could be processor specific, but it would be
worth testing before throwing it out.

Regards,
Ken

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] freezing multixacts

On Wed, Feb 1, 2012 at 11:33 PM, Alvaro Herrera alvhe...@alvh.no-ip.org wrote:
 So freezing multixacts is not all that easy.  I mean, you just scan the
 page looking for multis lesser than the cutoff; for those that are dead,
 they can just be removed completely, but what about ones that still have
 members running?  This is pretty unlikely but not impossible.

Right.

 If there's only one remaining member, the problem is easy: replace it
 with that transaction's xid, and set the appropriate hint bits.  But if
 there's more than one, the only way out is to create a new multi.  This
 increases multixactid consumption, but I don't see any other option.

Why do we need to freeze anything if the transactions are still
running?  We certainly don't freeze regular transaction IDs while the
transactions are still running; it would give wrong answers.  It's
probably possible to do it for mxids, but why would you need to?
Suppose you have a tuple A which is locked by a series of transactions
T0, T1, T2, ...; AIUI, each new locker is going to have to create a
new mxid with all the existing entries plus a new one for itself.
But, unless I'm confused, as it's doing so, it can discard any entries
for locks taken by transactions which are no longer running.  So given
an mxid with living members, any dead member in that mxid must have
been living at the time the newest member was added.  Surely we can't
be consuming mxids anywhere near fast enough for that to be a problem.
 There could be an updating transaction involved as well, but if
that's not running any more then it has either committed (in which
case the tuple will be dead once the global-xmin advances past it) or
aborted (in which case we can forget about it).

 However, there are cases where not even that is possible -- consider
 tuple freezing during WAL recovery.  Recovery is going to need to
 replace those multis with other multis, but it cannot create new multis
 itself.  The only solution here appears to be that when multis are
 frozen in the master, replacement multis have to be logged too.  So the
 heap_freeze_tuple Xlog record will have a map of old multi to new.  That
 way, recovery can just determine the new multi to use for any particular
 old multi; since multixact creation is also logged, we're certain that
 the replacement value has already been defined.

This doesn't sound right.  Why would recovery need to create a multi
that didn't exist on the master?  Any multi it applies to a record
should be one that it was told to apply by the master; and the master
should have already WAL-logged the creation of that multi.  I don't
think that replacement mxids have to be logged; I think that *all*
mxids have to be logged.  Am I all wet?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] heap_tuple_needs_freeze false positive

On Wed, Feb 1, 2012 at 8:01 PM, Alvaro Herrera alvhe...@alvh.no-ip.org wrote:
 Suggested patch attached.  I'd backpatch this as far as it applies
 cleanly.

This is new code in 9.2, but it's modelled on heap_freeze_tuple(), which is old.

I'm not convinced that it's a bug.  Suppose that xmax is set but is
hinted as invalid.  We process the table and advanced relfrozenxid;
then, we crash.  After recovery, it's possible that the hint bit is
gone (after all, setting hint bits isn't WAL-logged).  Now we're in
big trouble, because the next CLOG lookup on that xmax value might not
happen until it's been reused, and we might get a different answer
than before.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-02-02 Thread Marko Kreen

On Thu, Feb 02, 2012 at 04:51:37PM +0900, Kyotaro HORIGUCHI wrote:
 Hello, This is new version of dblink.c
 
 - Memory is properly freed when realloc returns NULL in storeHandler().
 
 - The bug that free() in finishStoreInfo() will be fed with
   garbage pointer when malloc for sinfo-valbuflen fails is
   fixed.

Thanks, merged.  I also did some minor coding style cleanups.

Tagging it Ready For Committer.  I don't see any notable
issues with the patch anymore.

There is some potential for experimenting with more aggressive
optimizations on dblink side, but I'd like to get a nod from
a committer for libpq changes first.

-- 
marko

*** a/src/interfaces/libpq/exports.txt
--- b/src/interfaces/libpq/exports.txt
***
*** 160,162  PQconnectStartParams  157
--- 160,164 
  PQping158
  PQpingParams  159
  PQlibVersion  160
+ PQsetRowProcessor	  161
+ PQsetRowProcessorErrMsg	  162
*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
***
*** 2693,2698  makeEmptyPGconn(void)
--- 2693,2701 
  	conn-wait_ssl_try = false;
  #endif
  
+ 	/* set default row processor */
+ 	PQsetRowProcessor(conn, NULL, NULL);
+ 
  	/*
  	 * We try to send at least 8K at a time, which is the usual size of pipe
  	 * buffers on Unix systems.  That way, when we are sending a large amount
***
*** 2711,2718  makeEmptyPGconn(void)
--- 2714,2726 
  	initPQExpBuffer(conn-errorMessage);
  	initPQExpBuffer(conn-workBuffer);
  
+ 	/* set up initial row buffer */
+ 	conn-rowBufLen = 32;
+ 	conn-rowBuf = (PGrowValue *)malloc(conn-rowBufLen * sizeof(PGrowValue));
+ 
  	if (conn-inBuffer == NULL ||
  		conn-outBuffer == NULL ||
+ 		conn-rowBuf == NULL ||
  		PQExpBufferBroken(conn-errorMessage) ||
  		PQExpBufferBroken(conn-workBuffer))
  	{
***
*** 2812,2817  freePGconn(PGconn *conn)
--- 2820,2827 
  		free(conn-inBuffer);
  	if (conn-outBuffer)
  		free(conn-outBuffer);
+ 	if (conn-rowBuf)
+ 		free(conn-rowBuf);
  	termPQExpBuffer(conn-errorMessage);
  	termPQExpBuffer(conn-workBuffer);
  
***
*** 5076,5078  PQregisterThreadLock(pgthreadlock_t newhandler)
--- 5086,5089 
  
  	return prev;
  }
+ 
*** a/src/interfaces/libpq/fe-exec.c
--- b/src/interfaces/libpq/fe-exec.c
***
*** 66,71  static PGresult *PQexecFinish(PGconn *conn);
--- 66,72 
  static int PQsendDescribe(PGconn *conn, char desc_type,
  			   const char *desc_target);
  static int	check_field_number(const PGresult *res, int field_num);
+ static int	pqAddRow(PGresult *res, void *param, PGrowValue *columns);
  
  
  /* 
***
*** 160,165  PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
--- 161,167 
  	result-curBlock = NULL;
  	result-curOffset = 0;
  	result-spaceLeft = 0;
+ 	result-rowProcessorErrMsg = NULL;
  
  	if (conn)
  	{
***
*** 701,707  pqClearAsyncResult(PGconn *conn)
  	if (conn-result)
  		PQclear(conn-result);
  	conn-result = NULL;
- 	conn-curTuple = NULL;
  }
  
  /*
--- 703,708 
***
*** 756,762  pqPrepareAsyncResult(PGconn *conn)
  	 */
  	res = conn-result;
  	conn-result = NULL;		/* handing over ownership to caller */
- 	conn-curTuple = NULL;		/* just in case */
  	if (!res)
  		res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
  	else
--- 757,762 
***
*** 828,833  pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
--- 828,900 
  }
  
  /*
+  * PQsetRowProcessor
+  *   Set function that copies column data out from network buffer.
+  */
+ void
+ PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param)
+ {
+ 	conn-rowProcessor = (func ? func : pqAddRow);
+ 	conn-rowProcessorParam = param;
+ }
+ 
+ /*
+  * PQsetRowProcessorErrMsg
+  *Set the error message pass back to the caller of RowProcessor.
+  *
+  *You can replace the previous message by alternative mes, or clear
+  *it with NULL.
+  */
+ void
+ PQsetRowProcessorErrMsg(PGresult *res, char *msg)
+ {
+ 	if (msg)
+ 		res-rowProcessorErrMsg = pqResultStrdup(res, msg);
+ 	else
+ 		res-rowProcessorErrMsg = NULL;
+ }
+ 
+ /*
+  * pqAddRow
+  *	  add a row to the PGresult structure, growing it if necessary
+  *	  Returns TRUE if OK, FALSE if not enough memory to add the row.
+  */
+ static int
+ pqAddRow(PGresult *res, void *param, PGrowValue *columns)
+ {
+ 	PGresAttValue *tup;
+ 	int			nfields = res-numAttributes;
+ 	int			i;
+ 
+ 	tup = (PGresAttValue *)
+ 		pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
+ 	if (tup == NULL)
+ 		return FALSE;
+ 
+ 	for (i = 0 ; i  nfields ; i++)
+ 	{
+ 		tup[i].len = columns[i].len;
+ 		if (tup[i].len == NULL_LEN)
+ 		{
+ 			tup[i].value = res-null_field;
+ 		}
+ 		else
+ 		{
+ 			bool isbinary = (res-attDescs[i].format != 0);
+ 			tup[i].value = (char *)pqResultAlloc(res, tup[i].len + 1, isbinary);
+ 			if (tup[i].value == NULL)
+ return FALSE;
+

Re: [HACKERS] keywords in initdb are case-sensitive?

On Wed, Feb 1, 2012 at 11:45 PM, Alvaro Herrera alvhe...@alvh.no-ip.org wrote:
 In miracee's review of Peter's patch for new -A options in initdb (in
 commitfest app only), it is noted that pg_hba.conf keyword parsing is
 done in a case sensitive manner.  So if you write Trust rather than
 trust, it's not recognized.

 This seemed pretty nonsensical to me, and it's not documented, so I came
 up with the trivial attached patch.

 Comparisons to user and database names and the like are unchanged and
 thus require matching case.

 Thoughts?

We have lots of things that are case-sensitive; I don't particularly
see why this one should be different.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] patch for implementing SPI_gettypemod()

On Wed, Feb 1, 2012 at 5:23 AM, Chetan Suttraway
chetan.suttra...@enterprisedb.com wrote:
 Hi All,

 This is regarding the TODO item :
 Add SPI_gettypmod() to return a field's typemod from a TupleDesc

 The related message is:
 http://archives.postgresql.org/pgsql-hackers/2005-11/msg00250.php

 This basically talks about having an SPI_gettypemod() which returns the
 typmod of a field of tupdesc

 Please refer the attached patch based on the suggested implementation.

Please add this to the next CommitFest:

https://commitfest.postgresql.org/action/commitfest_view/open

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] JSON for PG 9.2

2012-02-02 Thread Abhijit Menon-Sen

At 2012-02-02 08:54:32 -0500, robertmh...@gmail.com wrote:

 Someone is eventually going to propose a function with a  name like
 json_to_string() which, when given this JSON object, returns a
 three-character string with the PostgreSQL text type. 

Ah, that's the bit I was missing. I thought you were talking about an
immediate error condition.

 That's useful and I support it.

Agreed. Also, now I understand that you are saying that json_to_string()
(json_string_to_text?) would fail if the result couldn't be represented
as a text in the current encoding, and that's sensible as well. I had
misunderstood is going to admit of a number of error… in your mail.

As for surrogate pairs, just to be clear, what I was proposing earlier
in the thread was to change json.c:json_lex_string() to detect errors
(e.g. only one half of a surrogate pair, which is the commonest error
I've encountered in the wild) and reject such strings.

Thanks for the explanation.

-- ams

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] keywords in initdb are case-sensitive?

2012-02-02 Thread Alvaro Herrera


Excerpts from Robert Haas's message of jue feb 02 11:39:29 -0300 2012:
 On Wed, Feb 1, 2012 at 11:45 PM, Alvaro Herrera alvhe...@alvh.no-ip.org 
 wrote:
  In miracee's review of Peter's patch for new -A options in initdb (in
  commitfest app only), it is noted that pg_hba.conf keyword parsing is
  done in a case sensitive manner.  So if you write Trust rather than
  trust, it's not recognized.
 
  This seemed pretty nonsensical to me, and it's not documented, so I came
  up with the trivial attached patch.
 
  Comparisons to user and database names and the like are unchanged and
  thus require matching case.
 
 We have lots of things that are case-sensitive; I don't particularly
 see why this one should be different.

Err, postgresql.conf processing is case insensitive, which is the most
closely related example.  Are you saying we should make that case
sensitive as well?  What I'm saying is that I see no good reason for
keyword comparison to be case sensitive here.  We don't compare case on
SQL keywords either.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] JSON output functions.

2012-02-02 Thread Andrew Dunstan




On 02/02/2012 04:35 AM, Abhijit Menon-Sen wrote:

At 2012-02-01 18:48:28 -0500, andrew.duns...@pgexperts.com wrote:

For now I'm inclined not to proceed with that, and leave it as an
optimization to be considered later if necessary. Thoughts?

I agree, there doesn't seem to be a pressing need to do it now.




OK, here's my final version of the patch for constructor functions. If 
there's no further comment I'll go with this.


cheers

andrew

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ec14004..22adcb8 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9617,6 +9617,65 @@ table2-mapping
   /sect2
  /sect1
 
+ sect1 id=functions-json
+  titleJSON functions/title
+
+  indexterm zone=datatype-json
+	primaryJSON/primary
+	secondaryFunctions and operators/secondary
+  /indexterm
+
+  para
+This section descripbes the functions that are available for creating
+JSON (see xref linkend=datatype-json) data.
+  /para
+
+  table id=functions-json-table
+titleJSON Support Functions/title
+tgroup cols=4
+ thead
+  row
+   entryFunction/entry
+   entryDescription/entry
+   entryExample/entry
+   entryExample Result/entry
+  /row
+ /thead
+ tbody
+  row
+   entry
+ indexterm
+  primaryarray_to_json/primary
+ /indexterm
+ literalarray_to_json(anyarray [, pretty_bool])/literal
+   /entry
+   entry
+ Returns the array as JSON. A Postgres multi-dimensional array 
+ becomes a JSON array of arrays. Line feeds will be added between 
+ dimension 1 elements if pretty_bool is true.
+   /entry
+   entryliteralarray_to_json('{{1,5},{99,100}}'::int[])/literal/entry
+   entryliteral[[1,5],[99,100]]/literal/entry
+  /row
+  row
+   entry
+ indexterm
+  primaryrow_to_json/primary
+ /indexterm
+ literalrow_to_json(record [, pretty_bool])/literal
+   /entry
+   entry
+ Returns the row as JSON. Line feeds will be added between level 
+ 1 elements if pretty_bool is true.
+   /entry
+   entryliteralrow_to_json(row(1,'foo'))/literal/entry
+   entryliteral{f1:1,f2:foo}/literal/entry
+  /row
+ /tbody
+/tgroup
+   /table
+
+ /sect1
 
  sect1 id=functions-sequence
   titleSequence Manipulation Functions/title
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e35ac59..e57580e 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -24,6 +24,7 @@
 #include rewrite/rewriteHandler.h
 #include tcop/tcopprot.h
 #include utils/builtins.h
+#include utils/json.h
 #include utils/lsyscache.h
 #include utils/rel.h
 #include utils/snapmgr.h
@@ -99,7 +100,6 @@ static void ExplainDummyGroup(const char *objtype, const char *labelname,
 static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
 static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
-static void escape_json(StringInfo buf, const char *str);
 static void escape_yaml(StringInfo buf, const char *str);
 
 
@@ -2319,51 +2319,6 @@ ExplainYAMLLineStarting(ExplainState *es)
 }
 
 /*
- * Produce a JSON string literal, properly escaping characters in the text.
- */
-static void
-escape_json(StringInfo buf, const char *str)
-{
-	const char *p;
-
-	appendStringInfoCharMacro(buf, '\');
-	for (p = str; *p; p++)
-	{
-		switch (*p)
-		{
-			case '\b':
-appendStringInfoString(buf, \\b);
-break;
-			case '\f':
-appendStringInfoString(buf, \\f);
-break;
-			case '\n':
-appendStringInfoString(buf, \\n);
-break;
-			case '\r':
-appendStringInfoString(buf, \\r);
-break;
-			case '\t':
-appendStringInfoString(buf, \\t);
-break;
-			case '':
-appendStringInfoString(buf, \\\);
-break;
-			case '\\':
-appendStringInfoString(buf, );
-break;
-			default:
-if ((unsigned char) *p  ' ')
-	appendStringInfo(buf, \\u%04x, (int) *p);
-else
-	appendStringInfoCharMacro(buf, *p);
-break;
-		}
-	}
-	appendStringInfoCharMacro(buf, '\');
-}
-
-/*
  * YAML is a superset of JSON; unfortuantely, the YAML quoting rules are
  * ridiculously complicated -- as documented in sections 5.3 and 7.3.3 of
  * http://yaml.org/spec/1.2/spec.html -- so we chose to just quote everything.
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index cbb81d1..60addf2 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -13,11 +13,17 @@
  */
 #include postgres.h
 
+#include catalog/pg_type.h
+#include executor/spi.h
 #include lib/stringinfo.h
 #include libpq/pqformat.h
 #include mb/pg_wchar.h
+#include parser/parse_coerce.h
+#include utils/array.h
 #include utils/builtins.h
+#include utils/lsyscache.h
 #include utils/json.h
+#include utils/typcache.h
 
 typedef enum
 {
@@ -72,8 +78,11 @@ static void json_lex_number(JsonLexContext *lex,

Re: [HACKERS] double writes using double-write buffer approach [WIP]

On Fri, Jan 27, 2012 at 5:31 PM, Dan Scales sca...@vmware.com wrote:
 I've been prototyping the double-write buffer idea that Heikki and Simon
 had proposed (as an alternative to a previous patch that only batched up
 writes by the checkpointer).  I think it is a good idea, and can help
 double-writes perform better in the case of lots of backend evictions.
 It also centralizes most of the code change in smgr.c.  However, it is
 trickier to reason about.

This doesn't compile on MacOS X, because there's no writev().

I don't understand how you can possibly get away with such small
buffers.  AIUI, you must retained every page in the double-write
buffer until it's been written and fsync'd to disk.  That means the
most dirty data you'll ever be able to have in the operating system
cache with this implementation is (128 + 64) * 8kB = 1.5MB.  Granted,
we currently have occasional problems with the OS caching too *much*
dirty data, but that seems like it's going way, way too far in the
opposite direction.  That's barely enough for the system to do any
write reordering at all.

I am particularly worried about what happens when a ring buffer is in
use.  I tried running pgbench -i -s 10 with this patch applied,
full_page_writes=off, double_writes=on.  It took 41.2 seconds to
complete.  The same test with the stock code takes 14.3 seconds; and
the actual situation is worse for double-writes than those numbers
might imply, because the index build time doesn't seem to be much
affected, while the COPY takes a small eternity with the patch
compared to the usual way of doing things.  I think the slowdown on
COPY once the double-write buffer fills is on the order of 10x.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Refactoring log_newpage

Simon Riggs si...@2ndquadrant.com writes:
 On Wed, Feb 1, 2012 at 10:42 PM, Jim Nasby j...@nasby.net wrote:
 But we already had RelFileNode; wouldn't that be enough to tell what rmgr 
 was responsible for the new page? Can 2 different rmgrs write to the same 
 file node?

 No, but which one? No way to tell unless you have full list of
 relfilenodes and check each one. So btree changes look like heap
 changes etc..

What I'm not following is why we should care.  There are no cases at
present where this matters, and no proposed patches (that I know of)
that would make it matter.  If an individual rmgr needs control of
what happens during a given operation, it wouldn't use XLOG_NEWPAGE
to represent the operation.  End of problem.  You haven't provided
any convincing argument why this needs to be changed globally.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] heap_tuple_needs_freeze false positive

Robert Haas robertmh...@gmail.com writes:
 I'm not convinced that it's a bug.  Suppose that xmax is set but is
 hinted as invalid.

XMAX_INVALID is not a hint.  When it's set, the contents of the field
must be presumed to be garbage.  Any code failing to adhere to that rule
is broken.

 We process the table and advanced relfrozenxid;
 then, we crash.  After recovery, it's possible that the hint bit is
 gone (after all, setting hint bits isn't WAL-logged).  Now we're in
 big trouble, because the next CLOG lookup on that xmax value might not
 happen until it's been reused, and we might get a different answer
 than before.

I believe we have adequate defenses against that, and even if we did
not, that doesn't make the code in question less wrong.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Vacuum rate limit in KBps

2012-02-02 Thread Jeff Janes

On Sun, Jan 15, 2012 at 12:24 AM, Greg Smith g...@2ndquadrant.com wrote:
 So far the reaction I've gotten from my recent submission to make autovacuum
 log its read/write in MB/s has been rather positive.  I've been surprised at
 the unprecedented (to me at least) amount of backporting onto big production
 systems it's gotten.  There is a whole lot of pent up frustration among
 larger installs over not having good visibility into how changing cost-based
 vacuum parameters turns into real-world units.

 That got me thinking:  if MB/s is what everyone wants to monitor, can we
 provide a UI to set these parameters that way too?  The attached patch is a
 bit rough still, but it does that.  The key was recognizing that the cost
 delay plus cost limit can be converted into an upper limit on cost units per
 second, presuming the writes themselves are free.  If you then also assume
 the worst case--that everything will end up dirty--by throwing in the block
 size, too,  you compute a maximum rate in MB/s.  That represents the fastest
 you can possibly write.

Since this is mostly a usability patch, I was looking at the
documentation, trying to pretend I'm a end user who hasn't seen the
sausage being made.

I think the doc changes are too conservative.  When using cost-based
vacuuming should be something like When using rate-limited
vacuuming.

What does the cost mean in the variable vacuum_cost_rate_limit?
It seems to be genuflection to the past--the past we think is too
confusing.  Since this is the primary knob the end user is expected to
use, the fact that we use these cost things to implement
rate-limited vacuuming is a detail that should not be reflected in the
variable name, so vacuum_rate_limit seems better.  Leave the cost
stuff to the advanced users who want to read beyond the primary knob.

Whether I want to rate-limit the vacuum at all should be determined by
vacuum_rate_limit instead of by setting vacuum_cost_delay.
vacuum_rate_limit=0 should mean unlimited.  I think it is pretty
intuitive that, in cases where a literal 0 makes no sense, then 0
really means infinity, and that convention is used in other places.

I think it is confusing to have more variables than there are degrees
of freedom.  If we want one knob which is largely writes but mixes in
some reads and simple page visits as well, then I think
vacuum_cost_page_dirty should go away (effectively be fixed at 1.0),
and the vacuum_cost_page_miss would default to 0.5 and
vacuum_cost_page_hit default to 0.05.


Also, in the current patch, in addition to the overflow at high rate
limits, there is an rounding-to-zero at low rate limits that leads to
floating point exceptions.

PST:LOG:  cost limit=0 based on rate limit=10 KB/s delay=20 dirty cost=20
PST:STATEMENT:  VACUUM VERBOSE t;
PST:ERROR:  floating-point exception
PST:DETAIL:  An invalid floating-point operation was signaled. This
probably means an out-of-range result or an invalid operation, such as
division by zero.
PST:STATEMENT:  VACUUM VERBOSE t;

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] keywords in pg_hba.conf are case-sensitive?

[ adjusting thread title to have something to do with reality ]

Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Robert Haas's message of jue feb 02 11:39:29 -0300 2012:
 We have lots of things that are case-sensitive; I don't particularly
 see why this one should be different.

 Err, postgresql.conf processing is case insensitive, which is the most
 closely related example.  Are you saying we should make that case
 sensitive as well?  What I'm saying is that I see no good reason for
 keyword comparison to be case sensitive here.  We don't compare case on
 SQL keywords either.

One thing I'm concerned about is that there are places in pg_hba.conf
where a token might be either a keyword or a user/group/database name.
If you start recognizing keywords case-insensitively, you could break
files that used to work, ie what was meant to be a name will now be
read as a keyword.  Admittedly, the odds of that are not very large, but
they're not zero either.  Given the entire lack of complaints about this
from the field, I'm inclined to think it's better to leave well enough
alone.  We could add a documentation note if you feel a need for that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] spgist text_ops and LIKE

Robert Haas robertmh...@gmail.com writes:
 On Wed, Feb 1, 2012 at 10:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Too lazy to look at the code right now, but I think indxpath.c contains
 hardwired assumptions that LIKE prefix optimizations are only possible
 with btree indexes.  Possibly it would be worth relaxing that.

 Is it as simple as the attached?

Yeah, that looks about right, with the single quibble that the other
places where we have #defines in pg_opfamily.h put them after the
corresponding DATA line not before.  Please fix and commit.

 On the flip side, spg_text_inner_consistent()
 seems to fall back to a full index-scan for the regular comparison
 operators, which makes me wonder why anyone would bother having the
 operator at all.

MHO is that that code path is only useful when you are in C locale;
but I think that's a useful enough case to be worth the rather small
amount of incremental code to handle it.  I'm not sure I believe your
speculations about making spgist handle inequality searches in other
locales.  Non-C sort orders are too bizarre and we have too little
visibility into the precise rules the current locale is following.

... however ... could it be reasonable to apply strxfrm and store the
result of that in the index?  Then searches would just need byte-by-byte
comparisons.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] JSON output functions.

2012-02-02 Thread Pavel Stehule

2012/2/2 Andrew Dunstan and...@dunslane.net:


 On 02/02/2012 04:35 AM, Abhijit Menon-Sen wrote:

 At 2012-02-01 18:48:28 -0500, andrew.duns...@pgexperts.com wrote:

 For now I'm inclined not to proceed with that, and leave it as an
 optimization to be considered later if necessary. Thoughts?

 I agree, there doesn't seem to be a pressing need to do it now.



 OK, here's my final version of the patch for constructor functions. If
 there's no further comment I'll go with this.

These function are super, Thank you

Do you plan to fix a issue with row attribute names in 9.2?

Regards

Pavel


 cheers

 andrew



 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] heap_tuple_needs_freeze false positive

On Thu, Feb 2, 2012 at 11:27 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I'm not convinced that it's a bug.  Suppose that xmax is set but is
 hinted as invalid.

 XMAX_INVALID is not a hint.  When it's set, the contents of the field
 must be presumed to be garbage.  Any code failing to adhere to that rule
 is broken.

 We process the table and advanced relfrozenxid;
 then, we crash.  After recovery, it's possible that the hint bit is
 gone (after all, setting hint bits isn't WAL-logged).  Now we're in
 big trouble, because the next CLOG lookup on that xmax value might not
 happen until it's been reused, and we might get a different answer
 than before.

 I believe we have adequate defenses against that, and even if we did
 not, that doesn't make the code in question less wrong.

I believe the adequate defense that we have is precisely the logic you
are proposing to change.  Regardless of whether you want to call
XMAX_INVALID a hint or, say, a giant tortoise, I am fairly sure that
we don't WAL-log setting it.  That means that a bit set before a crash
won't necessarily still be set after a crash.  But the corresponding
relfrozenxid advancement will be WAL-logged, leading to the problem
scenario I described.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch pg_is_in_backup()

2012-02-02 Thread Gilles Darold

Le 02/02/2012 12:23, Marti Raudsepp a écrit :
 On Mon, Jan 30, 2012 at 20:33, Gilles Darold gilles.dar...@dalibo.com wrote:
 After some time searching for a Pg system administration function like
 pg_is_in_recovery(), let's say pg_is_in_backup(), I couldn't find one.
 The minor patch attached here adds this administrative function that can
 be used with others backup control functions. This is a very little
 patch outside documentation because the function is only a wrapper to
 the internal BackupInProgress() function, just like pg_is_in_recovery()
 is calling RecoveryInProgress().
 I think it would be more useful to have a function that returns a
 timestamp when the backup started. That way, it's possible to write a
 generic monitoring script that alerts the sysadmin only when a backup
 has been running for too long, but is silent for well-behaved backups.

For now the internal function BackupInProgress() only return a boolean.
I like the idea that pg_is_in_backup() just works as pg_is_in_recovery()
do. I mean it doesn't return the timestamp of the recovery starting time
but only true or false.

Maybe we can have an other function that will return all information
available in the backup_label file ?

Regards,

-- 
Gilles Darold
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] ecpglib use PQconnectdbParams

2012-02-02 Thread Peter Eisentraut

I noticed ecpglib still uses PQconnectdb() with a craftily assembled
connection string.  Here is a patch to use PQconnectdbParams() instead.
diff --git i/src/interfaces/ecpg/ecpglib/connect.c w/src/interfaces/ecpg/ecpglib/connect.c
index 909ba70..ea69e15 100644
--- i/src/interfaces/ecpg/ecpglib/connect.c
+++ w/src/interfaces/ecpg/ecpglib/connect.c
@@ -260,14 +260,6 @@ ECPGnoticeReceiver(void *arg, const PGresult *result)
 	ecpg_log(raising sqlcode %d\n, sqlcode);
 }
 
-static int
-strlen_or_null(const char *string)
-{
-	if (!string)
-		return 0;
-	return (strlen(string));
-}
-
 /* this contains some quick hacks, needs to be cleaned up, but it works */
 bool
 ECPGconnect(int lineno, int c, const char *name, const char *user, const char *passwd, const char *connection_name, int autocommit)
@@ -281,8 +273,9 @@ ECPGconnect(int lineno, int c, const char *name, const char *user, const char *p
 			   *tmp,
 			   *port = NULL,
 			   *realname = NULL,
-			   *options = NULL,
-			   *connect_string = NULL;
+			   *options = NULL;
+	const char *conn_keywords[6];
+	const char *conn_values[6];
 
 	ecpg_init_sqlca(sqlca);
 
@@ -482,34 +475,52 @@ ECPGconnect(int lineno, int c, const char *name, const char *user, const char *p
 			 options ? with options  : , options ? options : ,
 			 (user  strlen(user)  0) ? for user  : , user ? user : );
 
-	connect_string = ecpg_alloc(strlen_or_null(host)
-+ strlen_or_null(port)
-+ strlen_or_null(options)
-+ strlen_or_null(realname)
-+ strlen_or_null(user)
-+ strlen_or_null(passwd)
-			  + sizeof( host = port = dbname = user = password =), lineno);
-
-	if (options)/* replace '' if tehre are any */
+	if (options)/* replace '' if there are any */
 		for (i = 0; options[i]; i++)
 			if (options[i] == '')
 options[i] = ' ';
 
-	sprintf(connect_string, %s%s %s%s %s%s %s%s %s%s %s,
-			realname ? dbname= : , realname ? realname : ,
-			host ? host= : , host ? host : ,
-			port ? port= : , port ? port : ,
-			(user  strlen(user)  0) ? user= : , user ? user : ,
-	 (passwd  strlen(passwd)  0) ? password= : , passwd ? passwd : ,
-			options ? options : );
+	i = 0;
+	if (realname)
+	{
+		conn_keywords[i] = dbname;
+		conn_values[i] = realname;
+		i++;
+	}
+	if (host)
+	{
+		conn_keywords[i] = host;
+		conn_values[i] = host;
+		i++;
+	}
+	if (port)
+	{
+		conn_keywords[i] = port;
+		conn_values[i] = port;
+		i++;
+	}
+	if (user  strlen(user)  0)
+	{
+		conn_keywords[i] = user;
+		conn_values[i] = user;
+		i++;
+	}
+	if (passwd  strlen(passwd)  0)
+	{
+		conn_keywords[i] = password;
+		conn_values[i] = passwd;
+		i++;
+	}
+	if (options)
+	{
+		conn_keywords[i] = options;
+		conn_values[i] = options;
+		i++;
+	}
+	conn_keywords[i] = NULL;	/* terminator */
 
-	/*
-	 * this is deprecated this-connection = PQsetdbLogin(host, port, options,
-	 * NULL, realname, user, passwd);
-	 */
-	this-connection = PQconnectdb(connect_string);
+	this-connection = PQconnectdbParams(conn_keywords, conn_values, 0);
 
-	ecpg_free(connect_string);
 	if (host)
 		ecpg_free(host);
 	if (port)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] heap_tuple_needs_freeze false positive

On Thu, Feb 2, 2012 at 12:45 PM, Robert Haas robertmh...@gmail.com wrote:
 I believe the adequate defense that we have is precisely the logic you
 are proposing to change.  Regardless of whether you want to call
 XMAX_INVALID a hint or, say, a giant tortoise, I am fairly sure that
 we don't WAL-log setting it.  That means that a bit set before a crash
 won't necessarily still be set after a crash.  But the corresponding
 relfrozenxid advancement will be WAL-logged, leading to the problem
 scenario I described.

To put that another way, the problem isn't that we might have code
somewhere in the system that ignores HEAP_XMAX_INVALID.  The problem
is that HEAP_XMAX_INVALID might not still be set on that tuple the
next time somebody looks at it, if a database crash intervenes after
that bit is set and before it is flushed to disk.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] keywords in pg_hba.conf are case-sensitive?

On Thu, Feb 2, 2012 at 10:10 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Err, postgresql.conf processing is case insensitive, which is the most
 closely related example.  Are you saying we should make that case
 sensitive as well?  What I'm saying is that I see no good reason for
 keyword comparison to be case sensitive here.  We don't compare case on
 SQL keywords either.

In some sense this is just a religious question: UNIX commands are
case-sensitive (except on MacOS X, where I keep typing pg_Ctl start by
mistake and it lets me get away with it) and any UNIX purist worth his
salt will tell you that's the one true way.  On the other hand,
ignoring case for comparison purposes has a long and distinguished
history, too, especially outside of computing.  In this particular
case, I am mostly in favor of leaving it alone because I can't see any
real upside to changing it.  Having multiple ways to spell the same
configuration setting, even when they differ only in case, complicates
the job of (for example) anyone who wants to write a pg_hba.conf file
parser: some files will be valid on 9.2 that weren't valid on 9.1, or
(as Tom points out) they might both be valid but mean subtly different
things in corner cases.  If it were already case-sensitive, I would be
in favor of leaving that alone, too; there's just not enough upside to
justifying tinkering with it, IMV.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] spgist text_ops and LIKE

On Thu, Feb 2, 2012 at 12:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Feb 1, 2012 at 10:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Too lazy to look at the code right now, but I think indxpath.c contains
 hardwired assumptions that LIKE prefix optimizations are only possible
 with btree indexes.  Possibly it would be worth relaxing that.

 Is it as simple as the attached?

 Yeah, that looks about right, with the single quibble that the other
 places where we have #defines in pg_opfamily.h put them after the
 corresponding DATA line not before.  Please fix and commit.

Done.

 On the flip side, spg_text_inner_consistent()
 seems to fall back to a full index-scan for the regular comparison
 operators, which makes me wonder why anyone would bother having the
 operator at all.

 MHO is that that code path is only useful when you are in C locale;
 but I think that's a useful enough case to be worth the rather small
 amount of incremental code to handle it.  I'm not sure I believe your
 speculations about making spgist handle inequality searches in other
 locales.  Non-C sort orders are too bizarre and we have too little
 visibility into the precise rules the current locale is following.

Unfortunately, I fear that you are right, unless we were to switch
over to using a locale machinery of our own devising in place of that
provided by the OS.  And no, I'm not proposing that.

 ... however ... could it be reasonable to apply strxfrm and store the
 result of that in the index?  Then searches would just need byte-by-byte
 comparisons.

Yeah, you could, but I don't see the point.  I was hoping that spgist
would give us the ability to have a single index that allows both
inequality comparisons and pattern matching.  If you're going to have
one index where the strings are stored straight and another where
strxfrm() is applied, then you could just build two btree indexes, one
the regular way and a second with text_pattern_ops.

I'm having trouble figuring out under what set of circumstances spgist
is expected to be the best available alternative.  It only supports a
small subset of the data types that GiST does, so I suppose the point
is that it should be faster for the cases that it does handle.  And,
considering that this is all brand-new code, the fact that it's almost
keeping pace with btree on both pattern-matching and equality
comparisons is certainly respectable -- but I so far haven't found any
cases where it's a clear win.  There's limited glory in being the
almost-fastest way of indexing for a certain class of queries.
Admittedly, I haven't tried the point-in-box stuff yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] spgist text_ops and LIKE

Robert Haas robertmh...@gmail.com writes:
 I'm having trouble figuring out under what set of circumstances spgist
 is expected to be the best available alternative.  It only supports a
 small subset of the data types that GiST does, so I suppose the point
 is that it should be faster for the cases that it does handle.  And,
 considering that this is all brand-new code, the fact that it's almost
 keeping pace with btree on both pattern-matching and equality
 comparisons is certainly respectable -- but I so far haven't found any
 cases where it's a clear win.  There's limited glory in being the
 almost-fastest way of indexing for a certain class of queries.
 Admittedly, I haven't tried the point-in-box stuff yet.

I think the text opclass is mostly meant as sample code/proof-of-concept.
What spgist is likely to be good for is queries that btree can't cope
with at all.  But it's cute that we can make it handle LIKE for a few
more lines of code, so I'm fine with adding that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] spgist text_ops and LIKE

2012-02-02 Thread Alexander Korotkov

On Thu, Feb 2, 2012 at 10:21 PM, Robert Haas robertmh...@gmail.com wrote:

 I'm having trouble figuring out under what set of circumstances spgist
 is expected to be the best available alternative.  It only supports a
 small subset of the data types that GiST does, so I suppose the point
 is that it should be faster for the cases that it does handle.  And,
 considering that this is all brand-new code, the fact that it's almost
 keeping pace with btree on both pattern-matching and equality
 comparisons is certainly respectable -- but I so far haven't found any
 cases where it's a clear win.  There's limited glory in being the
 almost-fastest way of indexing for a certain class of queries.


I think that current implementation of suffix tree (in particular
implementation of consistent methods) in spgist is far from reveal full
potential of suffix tree. I see at least two additional search types which
can be efficiently evaluated using suffix tree:
1) Search by prefix regular expression. For example, search for
/^a(bc|d)e[fgh]/ in single index scan.
2) Search by levenshtein distance, i.e. SELECT * FROM tbl WHERE
levenshtein(col, 'some constant')  k; using index (surely actual syntax
for such index scan would be anouther).
This is only two additional applications which first comes to my mind,
there could be other. Unfortunately, I don't have enough of time for it
now. But I like to put my hands on this in future.

Admittedly, I haven't tried the point-in-box stuff yet.


In short, I believe spgist for geometrical data-types is more CPU-effective
and less IO-effective than gist. Gist have to call consistent method for
all index tuples of the page it uses in scan, while spgist scans smaller
nodes. Sure, comprehensive testing is required for doing final conclusion
about that.

-
With best regards,
Alexander Korotkov.

Re: [HACKERS] keywords in pg_hba.conf are case-sensitive?

2012-02-02 Thread Magnus Hagander

On Feb 2, 2012 5:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 [ adjusting thread title to have something to do with reality ]

 Alvaro Herrera alvhe...@commandprompt.com writes:
  Excerpts from Robert Haas's message of jue feb 02 11:39:29 -0300 2012:
  We have lots of things that are case-sensitive; I don't particularly
  see why this one should be different.

  Err, postgresql.conf processing is case insensitive, which is the most
  closely related example.  Are you saying we should make that case
  sensitive as well?  What I'm saying is that I see no good reason for
  keyword comparison to be case sensitive here.  We don't compare case on
  SQL keywords either.

 One thing I'm concerned about is that there are places in pg_hba.conf
 where a token might be either a keyword or a user/group/database name.
 If you start recognizing keywords case-insensitively, you could break
 files that used to work, ie what was meant to be a name will now be
 read as a keyword.  Admittedly, the odds of that are not very large, but
 they're not zero either.  Given the entire lack of complaints about this
 from the field, I'm inclined to think it's better to leave well enough
 alone.  We could add a documentation note if you feel a need for that.

+1. I don't think I've heard a single complaint about this ever...

/Magnus

Re: [HACKERS] Fast GiST index build - further improvements

2012-02-02 Thread Alexander Korotkov

On Thu, Sep 8, 2011 at 8:35 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

Now that the main GiST index build patch has been committed, there's a few
further improvements that could make it much faster still:

Better management of the buffer pages on disk. At the moment, the
temporary file is used as a heap of pages belonging to all the buffers in
random order. I think we could speed up the algorithm considerably by
reading/writing the buffer pages sequentially. For example, when an
internal page is split, and all the tuples in its buffer are relocated,
that would be a great chance to write the new pages of the new buffers in
sequential order, instead of writing them back to the pages freed up by the
original buffer, which can be spread all around the temp file. I wonder if
we could use a separate file for each buffer? Or at least, a separate file
for all buffers that are larger than, say 100 MB in size.

Better management of in-memory buffer pages. When we start emptying a
buffer, we currently flush all the buffer pages in memory to the temporary
file, to make room for new buffer pages. But that's a waste of time, if
some of the pages we had in memory belonged to the buffer we're about to
empty next, or that we empty tuples to. Also, if while emptying a buffer,
all the tuples go to just one or two lower level buffers, it would be
beneficial to keep more than one page in-memory for those buffers.

Buffering leaf pages. This I posted on a separate thread already:
http://archives.postgresql.**org/message-id/4E5350DB.**
3060...@enterprisedb.comhttp://archives.postgresql.org/message-id/4e5350db.3060...@enterprisedb.com

Also, at the moment there is one issue with the algorithm that we have
glossed over this far: For each buffer, we keep some information in memory,
in the hash table, and in the auxiliary lists. That means that the amount
of memory needed for the build scales with the size of the index. If you're
dealing with very large indexes, hopefully you also have a lot of RAM in
your system, so I don't think this is a problem in practice. Still, it
would be nice to do something about that. A straightforward idea would be
to swap some of the information to disk. Another idea that, simpler to
implement, would be to completely destroy a buffer, freeing all the memory
it uses, when it becomes completely empty. Then, if you're about to run out
of memory (as defined by maintenance_work_mem), you can empty some low
level buffers to disk to free up some.

Unfortunately, I hadn't enough of time to implement something of this
before 9.2 release. Work on my Phd. thesis and web-projects takes too much
time.

But, I think there is one thing we should fix before 9.2 release. We assume
that gist index build have at least effective_cache_size/4 of cache. This
assumption could easily be false on high concurrency systems. I don't see
the way for convincing estimate here, but we could document this behaviour.
So, users could just tune effective_cache_size for gist index build on high
concurrency.

--
With best regards,
Alexander Korotkov.

Re: [HACKERS] pg_stats_recovery view

2012-02-02 Thread Bernd Helmle




--On 2. Februar 2012 17:12:11 +0900 Fujii Masao masao.fu...@gmail.com wrote:


If only core developer is interested in this view, ISTM that short
description for
each WAL record is not required because he or she can know the meaning of each
WAL record by reading the source code. No? Adding short descriptions for every
WAL records seems to be overkill.


Yes, for a developer option alone adding all these *_short_desc functions looks
too much code for too less benefit. However, if someone with less code affinity
is interested to debug his server during recovery, it might be easier for him 
to interpret the statistic counters.


Unfortunately i didn't manage to do it this week, but what i'm also interested 
in is how large the performance regression is if the track_recovery variable is 
activated. Not sure wether it really makes a big difference, but maybe 
debugging recovery from a large archive could slow down to a degree, where you 
want the GUC but can't afford it?


And, for display purposes, when this is intended for developers only, shouldn't 
it be treated like all the other debug options as a DEVELOPER_OPTION, too?


--
Thanks

Bernd

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] NULL's support in SP-GiST

2012-02-02 Thread Oleg Bartunov


Hi there,

attached patch introduces NULLs indexing for SP-GiST. With this patch
Sp-GiST supports IS NULL, IS NOT NULL clauses, as well as full index scan.

We added boolean satisfyAll field in spgInnerConsistentIn and spgLeafConsistentIn 
structures, which informs the  user-defined methods, that all search results 
satisfy a query and should be returned. Calls of consistent methods are

needed because they know how to reconstruct an original value.

Unlike BTree we can't introduce a rule like NULL is greater than
anything else, because Sp-GiST  doesn't know  semantics of indexed data.
Also, Sp-GiST is essentially single-column index, so we can
store null values outside the main structure of index and use separate code 
path to work with NULLs. Actually, storing just ItemPointer 
(instead of the whole index tuple) is enough for NULLs, so we can reuse 
data tree storage from GIN, which used for storing 
ItemPointers for each indexed values. For that purpose, GinPageOpaqueData 
and SpGistPageOpaqueData should be compatible, at least at flag's
positions and values. In fact, it's needed only for vacuum code, 
which produces full index scan.





Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

spgist_null-0.8.gz
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch pg_is_in_backup()

2012-02-02 Thread Bernd Helmle




--On 2. Februar 2012 18:47:35 +0100 Gilles Darold gilles.dar...@dalibo.com 
wrote:



For now the internal function BackupInProgress() only return a boolean.
I like the idea that pg_is_in_backup() just works as pg_is_in_recovery()
do. I mean it doesn't return the timestamp of the recovery starting time
but only true or false.

Maybe we can have an other function that will return all information
available in the backup_label file ?


I've seen customers using pg_read_file() to do exactly this. E.g. they are 
searching for the START TIME value, if backup_label is present, to report the 
backup start.


--
Thanks

Bernd

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch pg_is_in_backup()

2012-02-02 Thread Magnus Hagander

On Thu, Feb 2, 2012 at 12:23, Marti Raudsepp ma...@juffo.org wrote:
 On Mon, Jan 30, 2012 at 20:33, Gilles Darold gilles.dar...@dalibo.com wrote:
 After some time searching for a Pg system administration function like
 pg_is_in_recovery(), let's say pg_is_in_backup(), I couldn't find one.
 The minor patch attached here adds this administrative function that can
 be used with others backup control functions. This is a very little
 patch outside documentation because the function is only a wrapper to
 the internal BackupInProgress() function, just like pg_is_in_recovery()
 is calling RecoveryInProgress().

 I think it would be more useful to have a function that returns a
 timestamp when the backup started. That way, it's possible to write a
 generic monitoring script that alerts the sysadmin only when a backup
 has been running for too long, but is silent for well-behaved backups.

If there is more than one concurrent backup running, which one do you
return? The first one or the latest one? Or perhaps you need an
interface thta can return them all...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch: Allow SQL-language functions to reference parameters by parameter name

[ working on this patch now ... ]

Matthew Draper matt...@trebex.net writes:
 On 25/01/12 18:37, Hitoshi Harada wrote:
 Should we throw an error in such ambiguity? Or did you make it happen
 intentionally? If latter, we should also mention the rule in the
 manual.

 I did consider it, and felt it was the most consistent:

I believe the issue here is that a two-part name A.B has two possible
interpretations (once we have eliminated table references supplied by
the current SQL command inside the function): per the comment,

 * A.BA = record-typed parameter name, B = field name
 *OR: A = function name, B = parameter name

If both interpretations are feasible, what should we do?  The patch
tries them in the above order, but I think the other order would be
better.  My argument is this: the current behavior doesn't provide any
out other than changing the function or parameter name.  Now
presumably, if someone is silly enough to use a parameter name the same
as the function's name, he's got a good reason to do so and would not
like to be forced to change it.  If we prefer the function.parameter
interpretation, then he still has a way to get to a field name: he just
has to use a three-part name function.parameter.field.  If we prefer the
parameter.field interpretation, but he needs to refer to a scalar
parameter, the only way to do it is to use an unqualified reference,
which might be infeasible (eg, if it also matches a column name exposed
in the SQL command).

Another problem with the current implementation is that if A matches a
parameter name, but ParseFuncOrColumn fails (ie, the parameter is not of
composite type or doesn't contain a field named B), then the hook just
fails to resolve anything; it doesn't fall back to consider the
function-name-first alternative.  So that's another usability black
mark.  We could probably complicate the code enough so it did consider
the function.parameter case at that point, but I don't think that there
is a strong enough argument for this precedence order to justify such
complication.

In short, I propose swapping the order in which these cases are tried.

(BTW, my reading of the SQL spec is that it thinks we should throw
an error for such ambiguity.  So that would be another possible answer,
but I'm not sure it's greatly helpful.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Hot standby fails if any backend crashes

I'm currently working with Duncan Rance's test case for bug #6425, and
I am observing a very nasty behavior in HEAD: once one of the
hot-standby query backends crashes, the standby postmaster SIGQUIT's
all its children and then just quits itself, with no log message and
apparently no effort to restart.  Surely this is not intended?  The
log shows

TRAP: FailedAssertion(!(((lpp)-lp_flags == 1)), File: heapam.c, Line: 735)
2012-02-02 18:02:39.985 EST 29363 LOG:  server process (PID 15238) was 
terminated by signal 6: Aborted
2012-02-02 18:02:39.985 EST 29363 DETAIL:  Failed process was running: SELECT * 
FROM repro_02_ref;
2012-02-02 18:02:39.985 EST 29363 LOG:  terminating any other active server 
processes
2012-02-02 18:02:39.985 EST 15214 WARNING:  terminating connection because of 
crash of another server process
2012-02-02 18:02:39.985 EST 15214 DETAIL:  The postmaster has commanded this 
server process to roll back the current transaction and exit, because another 
server process exited abnormally and possibly corrupted shared memory.
2012-02-02 18:02:39.985 EST 15214 HINT:  In a moment you should be able to 
reconnect to the database and repeat your command.
2012-02-02 18:02:39.985 EST 15213 WARNING:  terminating connection because of 
crash of another server process
2012-02-02 18:02:39.985 EST 15213 DETAIL:  The postmaster has commanded this 
server process to roll back the current transaction and exit, because another 
server process exited abnormally and possibly corrupted shared memory.
2012-02-02 18:02:39.985 EST 15213 HINT:  In a moment you should be able to 
reconnect to the database and repeat your command.
[ repeat the above for what I assume are all the child processes ]

... and then nothing.  The standby postmaster is no longer running and
there are no log messages from it after the terminating any other
active server processes one.  No core dump from it, either.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] JSON output functions.

2012-02-02 Thread Andrew Dunstan




On 02/02/2012 12:20 PM, Pavel Stehule wrote:

2012/2/2 Andrew Dunstanand...@dunslane.net:


On 02/02/2012 04:35 AM, Abhijit Menon-Sen wrote:

At 2012-02-01 18:48:28 -0500, andrew.duns...@pgexperts.com wrote:

For now I'm inclined not to proceed with that, and leave it as an
optimization to be considered later if necessary. Thoughts?

I agree, there doesn't seem to be a pressing need to do it now.



OK, here's my final version of the patch for constructor functions. If
there's no further comment I'll go with this.

These function are super, Thank you

Do you plan to fix a issue with row attribute names in 9.2?




Yeah. Tom did some initial work which he published here: 
http://archives.postgresql.org/message-id/28413.1321500388%40sss.pgh.pa.us, 
noting:


   It's not really ideal with respect to
   the ValuesScan case, because what you get seems to always be the
   hard-wired columnN names for VALUES columns, even if you try to
   override that with an alias
   ...
   Curiously, it works just fine if the VALUES can be folded

and later he said:

   Upon further review, this patch would need some more work even for the
   RowExpr case, because there are several places that build RowExprs
   without bothering to build a valid colnames list.  It's clearly soluble
   if anyone cares to put in the work, but I'm not personally excited
   enough to pursue it ..

I'm going to look at that issue first, since the unfolded VALUES clause seems 
like something of an obscure corner case. Feel free to chime in if you can.

cheers


andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] show Heap Fetches in EXPLAIN for index-only scans

2012-02-02 Thread Noah Misch

On Wed, Jan 25, 2012 at 08:42:05PM -0500, Robert Haas wrote:
 Only the first pass of vacuum knows how to mark pages all-visible.
 After the update, the first pass of the first vacuum sees a dead tuple
 on the old page and truncates it to a dead line pointer.  When it
 comes to the new page, it observes that the page is now all-visible
 and marks it so.  It then does index vacuuming and returns to the
 first page, marking the dead line pointer unused.  But during this
 second visit to the old page, there's no possibility of marking the
 page as all-visible, because the code doesn't know how to do that.
 The next vacuum's first pass, however, can do so, because there are no
 longer any dead tuples on the page.
 
 We could fix this by modifying lazy_vacuum_page(): since we have to
 dirty the buffer anyway, we could recheck whether all the remaining
 tuples on the page are now all-visible, and if so set the visibility
 map bit.  This is probably desirable even apart from index-only scans,
 because it will frequently save the next vacuum the cost of reading,
 dirtying, and writing extra pages.  There will be some incremental CPU
 cost, but that seems likely to be more than repaid by the I/O savings.
 
 Thoughts?  Should we do this at all?  If so, should we squeeze it into
 9.2 or leave it for 9.3?

Sounds like a good idea.  It has bothered me that two consecutive VACUUMs of a
table, with no intervening activity, can dirty a page twice.  Making that less
frequent is a good thing.  I'd hold the change for 9.3, but that's probably an
unusually-conservative viewpoint.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] patch for parallel pg_dump

2012-02-02 Thread Joachim Wieland

On Wed, Feb 1, 2012 at 12:24 PM, Robert Haas robertmh...@gmail.com wrote:
 I think we're more-or-less proposing to rename Archive to
 Connection, aren't we?

 And then ArchiveHandle can store all the things that aren't related to
 a specific connection.

How about something like that:
(Hopefully you'll come up with better names...)

StateHandle {
   Connection
   Schema
   Archive
   Methods
   union {
  DumpOptions
  RestoreOptions
   }
}

Dumping would mean to do:

  Connection - Schema - Archive using DumpOptions through the
specified Methods

Restore:

   Archive - Schema - Connection using RestoreOptions through the
specified Methods

Dumping from one database and restoring into another one would be two
StateHandles with different Connections, Archive == NULL, Schema
pointing to the same Schema, Methods most likely also pointing to the
same function pointer table and each with different Options in the
union of course.

Granted, you could say that above I've only grouped the elements of
the ArchiveHandle, but I don't really see that breaking it up into
several objects makes it any better or easier. There could be some
accessor functions to hide the details of the whole object to the
different consumers.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch pg_is_in_backup()

2012-02-02 Thread Euler Taveira de Oliveira

On 02-02-2012 20:06, Magnus Hagander wrote:
 If there is more than one concurrent backup running, which one do you
 return? The first one or the latest one? Or perhaps you need an
 interface thta can return them all...
 
IMHO, pg_is_in_backup() should return true if one or more backup copies are
running. As about returning the backup timestamp, we could return an array
like array[['label_1', '2012-01-28 02:00:01 BRST'], ['label_2', '2012-01-28
03:40:34 BRST']] or NULL if none.


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-02-02 Thread Kyotaro HORIGUCHI

Hello,

 Thanks, merged.  I also did some minor coding style cleanups.

Thank you for editing many comments and some code I'd left
unchanged from my carelessness, and lack of understanding your
comments. I'll be more careful about that...

 There is some potential for experimenting with more aggressive
 optimizations on dblink side, but I'd like to get a nod from
 a committer for libpq changes first.

I'm looking forward to the aggressiveness of that:-)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch pg_is_in_backup()

On Fri, Feb 3, 2012 at 11:31 AM, Euler Taveira de Oliveira
eu...@timbira.com wrote:
 On 02-02-2012 20:06, Magnus Hagander wrote:
 If there is more than one concurrent backup running, which one do you
 return? The first one or the latest one? Or perhaps you need an
 interface thta can return them all...

 IMHO, pg_is_in_backup() should return true if one or more backup copies are
 running. As about returning the backup timestamp, we could return an array
 like array[['label_1', '2012-01-28 02:00:01 BRST'], ['label_2', '2012-01-28
 03:40:34 BRST']] or NULL if none.

It seems to be more user-friendly to introduce a view like pg_stat_backup
rather than the function returning an array.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby fails if any backend crashes

I wrote:
 I'm currently working with Duncan Rance's test case for bug #6425, and
 I am observing a very nasty behavior in HEAD: once one of the
 hot-standby query backends crashes, the standby postmaster SIGQUIT's
 all its children and then just quits itself, with no log message and
 apparently no effort to restart.  Surely this is not intended?

I looked through postmaster.c and found that the cause of this is pretty
obvious: if the startup process exits with any non-zero status, we
assume that represents an unrecoverable error condition, and set
RecoveryError which causes the postmaster to exit silently as soon as
its last child is gone.  But we do this even if the reason the startup
process did exit(1) is that we sent it SIGQUIT as a result of a crash of
some other process.  Of course this logic dates from a time where the
startup process could not have any siblings, so when it was written,
such a thing was impossible.

I think saner behavior might only require this change:

/*
 * Any unexpected exit (including FATAL exit) of the startup
 * process is treated as a crash, except that we don't want to
 * reinitialize.
 */
if (!EXIT_STATUS_0(exitstatus))
{
-   RecoveryError = true;
+   if (!FatalError)
+   RecoveryError = true;
HandleChildCrash(pid, exitstatus,
 _(startup process));
continue;
}

plus suitable comment adjustments of course.  Haven't tested this yet
though.

It's a bit disturbing that nobody has reported this from the field yet.
Seems to imply that hot standby isn't being used much.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby fails if any backend crashes