Re: [HACKERS] Query optimization problem

2010-07-28 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 In the example, we do have d1.id and d2.basedon grouped in an
 equivalence class.  So in principle you could substitute d1.id into the
 WHERE clause in place of d2.basedon, once you'd checked that it was
 being used with an operator that's compatible with the specific
 equivalence class (ie it's in one of the eclass's opfamilies, I think).
 The problem is to recognize that such a rewrite would be a win --- it
 could just as easily be a big loss.

Ok, that was my feeling too.

 Even if we understood how to direct the rewriting process, I'm really
 dubious that it would win often enough to justify the added planning
 time.  The particular problem here seems narrow enough that solving it
 on the client side is probably a whole lot easier and cheaper than
 trying to get the planner to do it.

My overly naive, uneducated idea here would be to produce both plans
and let the planner evaluate their respective costs. Maybe that's what
you mean here by "how to direct the rewriting process". Then we don't
want to generate too many useless plans when there are lots of eclasses
around.

This brings back the idea of somehow weighting the optimiser effort
pushed into solving a query plan. Like in gcc, where we can use different
effort targets and we don't know for sure beforehand whether -O3 will
produce faster code than -O2; all we know is that it will try harder.

Is it possible to imagine having a "plan_eclass_permutations" setting,
defaulting to false, that would activate the behavior discussed here? Ok,
I'm not sure what form such a setting should take, but clearly there's a
need to be able to influence the optimiser effort.

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] merge command - GSoC progress

2010-07-28 Thread Boxuan Zhai
2010/7/28 Robert Haas robertmh...@gmail.com

 On Tue, Jul 27, 2010 at 1:04 AM, Boxuan Zhai bxzhai2...@gmail.com wrote:
  I have got an edition in which the merge command can run. It accepts the
  standard merge command and can do UPDATE, INSERT and DELETE actions now.
  But we cannot put additional qualifications on actions. There are some
  bugs when we try to evaluate the quals, which make the system quit. I
  will fix them soon.

 This patch doesn't compile.  You're using zbxprint() from a bunch of
 places where it's not defined.  I get compile warnings for all of
 those files and then a link failure at the end.  You might find it
 useful to create src/Makefile.custom in your local tree and put
 COPT=-Werror in there; it tends to prevent problems of this kind.

 Undefined symbols:
  _zbxprint, referenced from:
  _transformStmt in analyze.o
  _ExecInitMergeAction in nodeModifyTable.o
  _ExecModifyTable in nodeModifyTable.o
  _ExecInitModifyTable in nodeModifyTable.o
  _merge_action_planner in planner.o

Sorry, this is a debug function defined by me. It may not be included in
the patch. I added a line of "#define zbxprint printf" somewhere in the
system.


 Not that it's as high-priority as getting this fully working, but you
 should revert the useless changes in this patch - e.g. the one-line
 change to heaptuple.c is obvious debugging leftovers, and all of the
 changes to execQual.c and execUtil.c are whitespace-only.  You should
 also try to make your code and comments conform to project style
 guidelines.  In general, you'll find it easier to keep track of your
 changes (and you'll have fewer spurious changes) if you use git diff
 master...yourbranch instead of marking comments, etc. with ZBX or
 similar.



I will clean all these in my next patch.

I am now very confused by the failure of action qualification. I have
looked through the whole process of a query, from parser to executor. In
my opinion, the qualification transformed in the analyzer will be
processed by preprocess_qual_conditions() in the planner, and then by the
ExecInitExpr() function in the executor start phase (in the InitPlan()
function). Then the qual is ready to be used in ExecQual(). Am I correct?

I have done all this for the merge action quals, but when I pass them into
ExecQual(), the server just closes abnormally. I don't know if I missed any
steps in preparing the qual expressions.

  --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise Postgres Company



Re: [HACKERS] merge command - GSoC progress

2010-07-28 Thread Robert Haas
On Wed, Jul 28, 2010 at 6:08 AM, Boxuan Zhai bxzhai2...@gmail.com wrote:
 On Tue, Jul 27, 2010 at 1:04 AM, Boxuan Zhai bxzhai2...@gmail.com wrote:
  I have got an edition in which the merge command can run. It accepts the
  standard merge command and can do UPDATE, INSERT and DELETE actions now.
  But we cannot put additional qualifications on actions. There are some
  bugs when we try to evaluate the quals, which make the system quit. I
  will fix them soon.

 This patch doesn't compile.  You're using zbxprint() from a bunch of
 places where it's not defined.  I get compile warnings for all of
 those files and then a link failure at the end.  You might find it
 useful to create src/Makefile.custom in your local tree and put
 COPT=-Werror in there; it tends to prevent problems of this kind.

 Undefined symbols:
  _zbxprint, referenced from:
      _transformStmt in analyze.o
      _ExecInitMergeAction in nodeModifyTable.o
      _ExecModifyTable in nodeModifyTable.o
      _ExecInitModifyTable in nodeModifyTable.o
      _merge_action_planner in planner.o

 Sorry, this is a debug function defined by me. It may not be included in the
 patch. I added a line of "#define zbxprint printf" somewhere in the system.

Yeah, but it's not included in all the places that are needed to make
everything compile.  You might move this to postgres.h or something.

 Not that it's as high-priority as getting this fully working, but you
 should revert the useless changes in this patch - e.g. the one-line
 change to heaptuple.c is obvious debugging leftovers, and all of the
 changes to execQual.c and execUtil.c are whitespace-only.  You should
 also try to make your code and comments conform to project style
 guidelines.  In general, you'll find it easier to keep track of your
 changes (and you'll have fewer spurious changes) if you use git diff
 master...yourbranch instead of marking comments, etc. with ZBX or
 similar.

 I will clean all these in my next patch.

 I am now very confused by the failure of action qualification. I have looked
 through the whole process of a query, from parser to executor. In my
 opinion, the qualification transformed in the analyzer will be processed by
 preprocess_qual_conditions() in the planner, and then by the ExecInitExpr()
 function in the executor start phase (in the InitPlan() function). Then the
 qual is ready to be used in ExecQual(). Am I correct?

I'm not sure, sorry.

 I have done all this for the merge action quals, but when I pass them into
 ExecQual(), the server just closes abnormally. I don't know if I missed any
 steps in preparing the qual expressions.

Have you tried attaching a debugger?  Try "SELECT pg_backend_pid()"
and then use "gdb -p <the_pid>" from another window.  Hit continue.
Then do whatever it is that's crashing.  That way you can get a stack
backtrace, and poke around in the data structures.

Using pprint() on node-type data structures, either in debugging code
or straight from the debugger via "call", is also oftentimes very
helpful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] PostGIS vs. PGXS in 9.0beta3

2010-07-28 Thread Mark Cave-Ayland

Andrew Dunstan wrote:

The real problem has nothing to do with any of the analysis, as you say. 
It is this: they have an override file for PGXS and it uses 
$(mkinstalldirs) which we got rid of about a year ago. So apparently 
they haven't been testing much against any of our alphas or betas or 
they would have seen this long ago. The correct fix is to do the 
following in the PostGIS source root:


   sed -i -e 's/mkinstalldirs/MKDIR_P/' postgis/Makefile.pgxs

cheers

andrew


Hmmm that's totally wrong - the override in Makefile.pgxs should only 
ever be loaded for PostgreSQL 8.3 and 8.4, and not PostgreSQL 9.0 since 
it already has the correct installation paths.


What I suspect is that you're actually getting bitten by this:

http://markmail.org/message/k7iolbazhrqhijfk#query:pg_config%20jun%202007+page:1+mid:rqk6ux2e7npqbrzf+state:results

Or, in other words, configure is picking up the wrong pg_config. Since
the path fix in the thread was not backported to < 8.3, the presence of
another pg_config for a PostgreSQL < 8.3 installation in PATH will break
things :(



ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs



Re: [HACKERS] page corruption on 8.3+ that makes it to standby

2010-07-28 Thread Robert Haas
On Wed, Jul 28, 2010 at 12:23 AM, Jeff Davis pg...@j-davis.com wrote:
 On Tue, 2010-07-27 at 15:23 -0700, Jeff Davis wrote:
 On Tue, 2010-07-27 at 17:18 -0400, Robert Haas wrote:
   My first concern with that idea was that it may create an inconsistency
   between the primary and the standby. The primary could have a bunch of
   zero pages that never make it to the standby.
 
  Maybe I'm slow on the uptake here, but don't the pages start out
  all-zeroes on the standby just as they do on the primary? The only way
  it seems like this would be a problem is if a page that previously
  contained data on the primary was subsequently zeroed without writing
  a WAL record - or am I confused?

 The case I was concerned about is when you have a table on the primary
 with a bunch of zero pages at the end. Then you SET TABLESPACE, and none
 of the copied pages (or even the fact that they exist) would be sent to
 the standby, but they would exist on the primary. And later pages may
 have data, so the standby may see page N but not N-1.

 Generally, most of the code is not expecting to read or write past the
 end of the file, unless it's doing an extension.

 However, I think everything is fine during recovery, because it looks
 like it's designed to create zero pages as needed. So your idea seems
 safe to me, although I do still have some doubts because of my lack of
 knowledge in this area; particularly hot standby conflict
 detection/resolution.

 My idea was different: still log the zero page, just don't set LSN or
 TLI for a zero page in log_newpage() or heap_xlog_newpage(). This isn't
 as clean as your idea, but I'm a little more confident that it is
 correct.


 Both potential fixes attached and both appear to work.

 fix1 -- Only call PageSetLSN/TLI inside log_newpage() and
 heap_xlog_newpage() if the page is not zeroed.

 fix2 -- Don't call log_newpage() at all if the page is zeroed.

 Please review. I don't have a strong opinion about which one should be
 applied.

Looks good.  I still prefer fix2; it seems cleaner to me.  It has
another advantage, too, which is that copy_relation_data() is used
ONLY by ALTER TABLE SET TABLESPACE.  So if I stick to patching that
function, that's the only thing I can possibly break, whereas
log_newpage() is used in a bunch of other places.  I don't think
either fix is going to break anything at all, but considering that
this is going to need back-patching, I'd rather be conservative.

Speaking of back-patching, the subject line describes this as an 8.3+
problem, but it looks to me like this goes all the way back to 8.0.
The code is a little different in 8.2 and prior, but ISTM it's
vulnerable to the same issue.  Don't we need a modified version of
this patch for the 8.0 - 8.2 branches?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Query optimization problem

2010-07-28 Thread Robert Haas
On Wed, Jul 28, 2010 at 3:45 AM, Dimitri Fontaine
dfonta...@hi-media.com wrote:
 Even if we understood how to direct the rewriting process, I'm really
 dubious that it would win often enough to justify the added planning
 time.  The particular problem here seems narrow enough that solving it
 on the client side is probably a whole lot easier and cheaper than
 trying to get the planner to do it.

 My overly naive uneducated idea here would be to produce both the plans
 and let the planner evaluate their respective costs. Maybe that's what
 you mean here by how to direct the rewriting process. Then we don't
 want to generate too many useless plans when you have lots of eclass
 around.

The way the planner is set up, you'd have to plan with qual A, then
repeat the entire process with qual B, and then just for good measure
repeat the process with both quals A and B.  ISTM you'd triple the
planning time if there were even just one case of this in a particular
query.  If you have different ways of generating the same output for a
given rel, you can just throw them all into a bucket and let the
planner work it out.  But here you want to have different paths for
the same relation that generate *different output*, and the planner
doesn't understand that concept.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Query optimization problem

2010-07-28 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
  But here you want to have different paths for
 the same relation that generate *different output*, and the planner
 doesn't understand that concept.

Sorry? I thought what Equivalence Class provides is the proof that
using this qualification or another will *not* affect the output.
-- 
dim



Re: [HACKERS] page corruption on 8.3+ that makes it to standby

2010-07-28 Thread Simon Riggs
On Tue, 2010-07-27 at 21:23 -0700, Jeff Davis wrote:

 Both potential fixes attached and both appear to work.
 
 fix1 -- Only call PageSetLSN/TLI inside log_newpage() and
 heap_xlog_newpage() if the page is not zeroed.
 
 fix2 -- Don't call log_newpage() at all if the page is zeroed.
 
 Please review. I don't have a strong opinion about which one should be
 applied.

ISTM we should just fix an uninitialized page first, using code from
VACUUM similar to

  if (PageIsNew(page))
  {
      ereport(WARNING,
              (errmsg("relation \"%s\" page %u is uninitialized --- fixing",
                      relname, blkno)));
      PageInit(page, BufferGetPageSize(buf), 0);
  }

then continue as before.

We definitely shouldn't do anything that leaves standby different to
primary.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] PostGIS vs. PGXS in 9.0beta3

2010-07-28 Thread Andrew Dunstan



Mark Cave-Ayland wrote:

Andrew Dunstan wrote:

The real problem has nothing to do with any of the analysis, as you 
say. It is this: they have an override file for PGXS and it uses 
$(mkinstalldirs) which we got rid of about a year ago. So apparently 
they haven't been testing much against any of our alphas or betas or 
they would have seen this long ago. The correct fix is to do the 
following in the PostGIS source root:


   sed -i -e 's/mkinstalldirs/MKDIR_P/' postgis/Makefile.pgxs

cheers

andrew


Hmmm that's totally wrong - the override in Makefile.pgxs should only 
ever be loaded for PostgreSQL 8.3 and 8.4, and not PostgreSQL 9.0 
since it already has the correct installation paths.


What I suspect is that you're actually getting bitten by this:

http://markmail.org/message/k7iolbazhrqhijfk#query:pg_config%20jun%202007+page:1+mid:rqk6ux2e7npqbrzf+state:results 



Or, in other words, configure is picking up the wrong pg_config. Since
the path fix in the thread was not backported to < 8.3, the presence of
another pg_config for a PostgreSQL < 8.3 installation in PATH will break
things :(


No, the configure test is wrong. Here's what's in configure.ac:


   dnl Temporary hack until minimum PostgreSQL version is 8.5:
   dnl If PostgreSQL < 8.5 is detected, trigger the inclusion of the
   dnl new versioned PGXS targets
   PGXSOVERRIDE=0
   if test ! $PGSQL_MINOR_VERSION -ge 5; then
       PGXSOVERRIDE=1
   fi


Of course, we don't have any such thing as 8.5.
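The failure mode is easy to reproduce outside autoconf. Here is a toy Python sketch (the function names are illustrative, not from the PostGIS build): testing only the minor version makes 9.0 (minor version 0) wrongly trigger the pre-8.5 override, while comparing (major, minor) tuples behaves as intended.

```python
# Sketch (hypothetical, not the PostGIS build code): the configure test
# above looks only at the minor version, so it misfires on 9.0.
def buggy_needs_override(minor_version):
    # mirrors: if test ! $PGSQL_MINOR_VERSION -ge 5; then PGXSOVERRIDE=1; fi
    return not (minor_version >= 5)

def fixed_needs_override(major_version, minor_version):
    # compare the full version instead: override only for < 8.5
    return (major_version, minor_version) < (8, 5)

assert buggy_needs_override(3)          # 8.3: override applied, correct
assert not buggy_needs_override(5)      # 8.5: no override, correct
assert buggy_needs_override(0)          # 9.0: override applied -- the bug
assert fixed_needs_override(8, 4)       # 8.4: override, as intended
assert not fixed_needs_override(9, 0)   # 9.0: no override, as intended
```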

cheers

andrew








Re: [HACKERS] Query optimization problem

2010-07-28 Thread Robert Haas
On Wed, Jul 28, 2010 at 6:55 AM, Dimitri Fontaine
dfonta...@hi-media.com wrote:
 Robert Haas robertmh...@gmail.com writes:
  But here you want to have different paths for
 the same relation that generate *different output*, and the planner
 doesn't understand that concept.

 Sorry? I thought what Equivalence Class provides is the proof that
 using this qualification or another will *not* affect the output.

In a query like...

 SELECT d1.ID, d2.ID
 FROM DocPrimary d1
   JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
 WHERE (d1.ID=234409763) or (d2.ID=234409763)

...you're going to scan d1, scan d2, and then join the results.  The
scan of d1 is going to produce different results depending on whether
or not you evaluate d1.ID=234409763, and the scan of d2 is going to
produce different results depending on whether or not you evaluate
d2.BasedOn=234409763.
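To make the point concrete, here is a toy Python rendering of the query (made-up rows, plain lists standing in for the table and its self-join — not the OP's data): pushing the d1.ID disjunct down into d1's scan alone changes the result, which is why the planner cannot simply substitute eclass members inside an OR.

```python
# Toy model of: SELECT d1.id, d2.id FROM doc d1
#               JOIN doc d2 ON d2.basedon = d1.id
#               WHERE d1.id = 7 OR d2.id = 7
doc = [(5, None), (7, 5), (9, 7)]  # (id, basedon); hypothetical data

def run(d1_rows, d2_rows):
    return [(d1[0], d2[0])
            for d1 in d1_rows for d2 in d2_rows
            if d2[1] == d1[0]                    # join: d2.basedon = d1.id
            and (d1[0] == 7 or d2[0] == 7)]      # WHERE disjunction

full = run(doc, doc)
assert full == [(5, 7), (7, 9)]   # each disjunct contributes one row

# Filtering d1's scan with "d1.id = 7" alone silently drops (5, 7):
pushed = run([r for r in doc if r[0] == 7], doc)
assert pushed == [(7, 9)]
```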

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Query optimization problem

2010-07-28 Thread Yeb Havinga

Robert Haas wrote:

 On Wed, Jul 28, 2010 at 6:55 AM, Dimitri Fontaine
 dfonta...@hi-media.com wrote:
  Robert Haas robertmh...@gmail.com writes:
   But here you want to have different paths for
   the same relation that generate *different output*, and the planner
   doesn't understand that concept.

  Sorry? I thought what Equivalence Class provides is the proof that
  using this qualification or another will *not* affect the output.

 In a query like...

  SELECT d1.ID, d2.ID
  FROM DocPrimary d1
    JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
  WHERE (d1.ID=234409763) or (d2.ID=234409763)

 ...you're going to scan d1, scan d2, and then join the results.  The
 scan of d1 is going to produce different results depending on whether
 or not you evaluate d1.ID=234409763, and the scan of d2 is going to
 produce different results depending on whether or not you evaluate
 d2.BasedOn=234409763.

Wouldn't it be relatively easy to rewrite the filter expression by
adding expressions, instead of replacing constants, in the disjunctive
case, so the example at hand would become:


WHERE (d1.ID=234409763) or (d2.ID=234409763)
AND (d2.BasedOnID=234409763) or (d2.ID=234409763)

regards,
Yeb Havinga




Re: [HACKERS] PostGIS vs. PGXS in 9.0beta3

2010-07-28 Thread Mark Cave-Ayland

Andrew Dunstan wrote:


No, the configure test is wrong. Here's what's in configure.ac:


   dnl Temporary hack until minimum PostgreSQL version is 8.5:
   dnl If PostgreSQL < 8.5 is detected, trigger the inclusion of the
   dnl new versioned PGXS targets
   PGXSOVERRIDE=0
   if test ! $PGSQL_MINOR_VERSION -ge 5; then
       PGXSOVERRIDE=1
   fi


Of course, we don't have any such thing as 8.5.


Ah wait - I see. The fix is already in the 1.5 branch, it just hasn't
hit a release yet :(


http://trac.osgeo.org/postgis/changeset/5421/branches/1.5/configure.ac

Looks like we need to push a new release in time for 9.0...


ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs



Re: [HACKERS] page corruption on 8.3+ that makes it to standby

2010-07-28 Thread Robert Haas
On Wed, Jul 28, 2010 at 7:02 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, 2010-07-27 at 21:23 -0700, Jeff Davis wrote:

 Both potential fixes attached and both appear to work.

 fix1 -- Only call PageSetLSN/TLI inside log_newpage() and
 heap_xlog_newpage() if the page is not zeroed.

 fix2 -- Don't call log_newpage() at all if the page is zeroed.

 Please review. I don't have a strong opinion about which one should be
 applied.

 ISTM we should just fix an uninitialized page first, using code from
 VACUUM similar to

  if (PageIsNew(page))
  {
      ereport(WARNING,
              (errmsg("relation \"%s\" page %u is uninitialized --- fixing",
                      relname, blkno)));
      PageInit(page, BufferGetPageSize(buf), 0);
  }

 then continue as before.

As Jeff Davis pointed out upthread, you don't know that 0 is the
correct special size for the relation being copied.  In the VACUUM
path, that code is only applied to heaps, but that's not necessarily
the case here.

 We definitely shouldn't do anything that leaves standby different to
 primary.

Obviously.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Query optimization problem

2010-07-28 Thread Robert Haas
On Wed, Jul 28, 2010 at 7:24 AM, Yeb Havinga yebhavi...@gmail.com wrote:
 Sorry? I thought what Equivalence Class provides is the proof that
 using this qualification or another will *not* affect the output.

 In a query like...

  SELECT d1.ID, d2.ID
  FROM DocPrimary d1
   JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
  WHERE (d1.ID=234409763) or (d2.ID=234409763)

 ...you're going to scan d1, scan d2, and then join the results.  The
 scan of d1 is going to produce different results depending on whether
 or not you evaluate d1.ID=234409763, and the scan of d2 is going to
 produce different results depending on whether or not you evaluate
 d2.BasedOn=234409763.

 Wouldn't it be relatively easy, to rewrite the filter expression by adding
 expressions, instead of replacing constants, in the disjunctive case, so the
 example at hand would become:

 WHERE (d1.ID=234409763) or (d2.ID=234409763)
 AND (d2.BasedOnID=234409763) or (d2.ID=234409763)

Yeah, that could be done, but it's not necessarily a win from a
performance standpoint.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Query optimization problem

2010-07-28 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
  SELECT d1.ID, d2.ID
  FROM DocPrimary d1
JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
  WHERE (d1.ID=234409763) or (d2.ID=234409763)

 ...you're going to scan d1, scan d2, and then join the results.  The
 scan of d1 is going to produce different results depending on whether
 you evaluate or not d1.ID=234409763, and the scan of d2 is going to
 produce different results depending on whether or not you evaluate
 d2.BasedOn=234409763.

Well, I just realised you can't use d2.BasedOn in scanning d1 here. I
don't know what exactly I had in mind previously, but in any case, sorry
for the noise.

I hope the optimiser-effort-control idea still holds water nonetheless…
-- 
dim



[HACKERS] patch saved in commitfest application isn't actual now

2010-07-28 Thread Pavel Stehule
Hello,

Can you send a current version, please? I looked at the git repository,
but you have made more changes since then.

Thank you

Pavel Stehule



Re: [HACKERS] Query optimization problem

2010-07-28 Thread Yeb Havinga

Robert Haas wrote:

 On Wed, Jul 28, 2010 at 7:24 AM, Yeb Havinga yebhavi...@gmail.com wrote:
   Sorry? I thought what Equivalence Class provides is the proof that
   using this qualification or another will *not* affect the output.

  In a query like...

   SELECT d1.ID, d2.ID
   FROM DocPrimary d1
    JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
   WHERE (d1.ID=234409763) or (d2.ID=234409763)

  ...you're going to scan d1, scan d2, and then join the results.  The
  scan of d1 is going to produce different results depending on whether
  or not you evaluate d1.ID=234409763, and the scan of d2 is going to
  produce different results depending on whether or not you evaluate
  d2.BasedOn=234409763.

  Wouldn't it be relatively easy to rewrite the filter expression by adding
  expressions, instead of replacing constants, in the disjunctive case, so the
  example at hand would become:

  WHERE (d1.ID=234409763) or (d2.ID=234409763)
  AND (d2.BasedOnID=234409763) or (d2.ID=234409763)

 Yeah, that could be done, but it's not necessarily a win from a
 performance standpoint.

Not necessarily a win, but on the test case there is no significant
increase in planning time. It somehow feels like a good idea to give the
planner as much information as possible, i.e. as many baserestrictinfos
as possible for each rel.


I earlier forgot parentheses, the correct query is

SELECT d1.ID, d2.ID
FROM DocPrimary d1
  JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
  WHERE ((d1.ID=234409763) or (d2.ID=234409763))
AND ((d2.BasedOn=234409763) or (d2.ID=234409763));
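On a toy model of the thread's query (made-up rows, not the OP's data), one can check the intuition behind the rewrite: on the join output, where d2.BasedOn = d1.ID always holds, the extra conjunct is redundant, so the expanded WHERE returns exactly the same rows while giving each scan a usable restriction. A hedged Python sketch:

```python
# Hypothetical data: (id, basedon) rows of a self-referencing table.
doc = [(5, None), (7, 5), (9, 7)]

def join_where(pred):
    # self-join on d2.basedon = d1.id, then apply the WHERE predicate
    return [(d1[0], d2[0])
            for d1 in doc for d2 in doc
            if d2[1] == d1[0] and pred(d1, d2)]

original = join_where(lambda d1, d2: d1[0] == 7 or d2[0] == 7)
expanded = join_where(lambda d1, d2: (d1[0] == 7 or d2[0] == 7)
                                 and (d2[1] == 7 or d2[0] == 7))
# d2.basedon equals d1.id on every joined row, so the added
# (d2.BasedOn=... OR d2.ID=...) conjunct filters nothing out:
assert original == expanded == [(5, 7), (7, 9)]
```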

By doing this in the rewrite step, triple planning would be avoided. I
suspect that the cost of a copyObject of the expression plus an expression
tree mutator call during rewrite is negligible compared to plan time; in
this particular case there doesn't seem to be much difference in planning
time between the three variants.


I ran the script below a number of times; the third timing in each group
is the one with the expanded expression:


Time: 0.820 ms
Time: 0.859 ms
Time: 0.877 ms
---
Time: 0.617 ms
Time: 0.662 ms
Time: 0.737 ms
---
Time: 0.817 ms
Time: 0.766 ms
Time: 0.826 ms
---
Time: 0.638 ms
Time: 0.700 ms
Time: 0.706 ms
---
Time: 0.463 ms
Time: 0.847 ms
Time: 0.793 ms
---
Time: 0.629 ms
Time: 0.671 ms
Time: 0.703 ms


This was the script (run on the relation and index supplied by the OP):

-- warm catalog
explain SELECT d1.ID, d2.ID
FROM DocPrimary d1
  JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
  WHERE (d1.ID=234409763) or (d2.ID=234409763);

\timing

explain SELECT d1.ID, d2.ID
FROM DocPrimary d1
  JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
  WHERE (d1.ID=234409763) or (d2.ID=234409763);
 
explain SELECT d1.ID, d2.ID
FROM DocPrimary d1
  JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
  WHERE (d2.BasedOn=234409763) or (d2.ID=234409763);

explain SELECT d1.ID, d2.ID
FROM DocPrimary d1
  JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
  WHERE ((d1.ID=234409763) or (d2.ID=234409763))
AND ((d2.BasedOn=234409763) or (d2.ID=234409763));





Re: [HACKERS] ALTER TABLE SET STATISTICS requires AccessExclusiveLock

2010-07-28 Thread Peter Eisentraut
On tor, 2010-07-15 at 10:24 +0100, Simon Riggs wrote:
 Patch to reduce lock levels for 
  ALTER TABLE
  CREATE TRIGGER
  CREATE RULE

Tried this out, but $subject is still the case.  The problem is that
ATRewriteCatalogs() calls AlterTableCreateToastTable() based on what it
thinks the subcommands are, and AlterTableCreateToastTable() takes an
AccessExclusiveLock.

This could possibly be addressed by moving AT_SetStatistics to
AT_PASS_MISC in order to avoid the TOAST table call.

In a related matter, assigning ShareUpdateExclusiveLock to AT_SetStorage
doesn't work either, because the TOAST table call needs to be done in
that case.

Perhaps this logic needs to be refactored a bit more so that there
aren't any inconsistencies between the lock modes and the passes,
which could lead to false expectations and deadlocks.




Re: [HACKERS] ALTER TABLE SET STATISTICS requires AccessExclusiveLock

2010-07-28 Thread Simon Riggs
On Wed, 2010-07-28 at 15:24 +0300, Peter Eisentraut wrote:
 On tor, 2010-07-15 at 10:24 +0100, Simon Riggs wrote:
  Patch to reduce lock levels for 
   ALTER TABLE
   CREATE TRIGGER
   CREATE RULE
 
 Tried this out, but $subject is still the case.  The problem is that
 ATRewriteCatalogs() calls AlterTableCreateToastTable() based on what it
 thinks the subcommands are, and AlterTableCreateToastTable() takes an
 AccessExclusiveLock.
 
 This could possibly be addressed by moving AT_SetStatistics to
 AT_PASS_MISC in order to avoid the TOAST table call.
 
 In a related matter, assigning ShareUpdateExclusiveLock to AT_SetStorage
 doesn't work either, because the TOAST table call needs to be done in
 that case.
 
 Perhaps this logic needs to be refactored a bit more so that there
 aren't any inconsistencies between the lock modes and the passes,
 which could lead to false expectations and deadlocks.

Thanks for your comments. Will reconsider and update.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Query optimization problem

2010-07-28 Thread Tom Lane
Yeb Havinga yebhavi...@gmail.com writes:
 Robert Haas wrote:
 On Wed, Jul 28, 2010 at 7:24 AM, Yeb Havinga yebhavi...@gmail.com wrote:
 Wouldn't it be relatively easy, to rewrite the filter expression by adding
 expressions, instead of replacing constants, in the disjunctive case, so the
 example at hand would become:
 
 WHERE (d1.ID=234409763) or (d2.ID=234409763)
 AND (d2.BasedOnID=234409763) or (d2.ID=234409763)

 Yeah, that could be done, but it's not necessarily a win from a
 performance standpoint.

 Not necessarily a win, but on the test case no significant increase in 
 planning time.

The problem is that it could cost you a lot in execution time, because
of the useless extra filter conditions that will be applied.  The
planner isn't going to notice that those conditions are redundant.
An even worse problem is that because it doesn't know that, it's going
to underestimate the combined selectivity of the two WHERE conditions,
resulting in drastic underestimates of the numbers of rows emitted,
possibly resulting in horribly bad plan choices that kill whatever
performance improvement you got at the bottom level.

What the EquivalenceClass machinery actually buys us is the ability to
deal with a set of partially-redundant possible filter conditions and
apply only enough of them to get a correct plan.  As an example, if the
query has A=B and B=C, we could deduce A=C, but we don't want to apply
all three equality conditions at runtime.  Instead we put all three
variables into an EC, and then there is logic to determine which of the
equality clauses implied by the EC should actually get applied where.
This avoids both the useless-checks-at-runtime problem and the problem
of wrong selectivity estimates.

To do something like this without generating stupid plans, we'd need
some sort of generalized EC mechanism that could figure out which
variants of the clause made the most sense in different contexts.
Or maybe something else entirely --- but just generating a lot of
variants of a clause and throwing them all into the existing mechanism
is not workable.

regards, tom lane



Re: [HACKERS] Toward a column reorder solution

2010-07-28 Thread Joshua D. Drake
On Tue, 27 Jul 2010 19:55:18 -0700, David E. Wheeler
da...@kineticode.com wrote:
 On Jul 27, 2010, at 3:01 PM, Joshua D. Drake wrote:
 
 Correct. We are also hoping to get some sponsorship for it.
 
 https://www.fossexperts.com/
 
 Frigging copycat.

Hah! I gave you kudos :P (you are in the FAQ)

JD

-- 
PostgreSQL - XMPP: jdrake(at)jabber(dot)postgresql(dot)org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997



Re: [HACKERS] ALTER TABLE SET STATISTICS requires AccessExclusiveLock

2010-07-28 Thread Simon Riggs
On Wed, 2010-07-28 at 15:24 +0300, Peter Eisentraut wrote:
 On tor, 2010-07-15 at 10:24 +0100, Simon Riggs wrote:
  Patch to reduce lock levels for 
   ALTER TABLE
   CREATE TRIGGER
   CREATE RULE
 
 Tried this out, but $subject is still the case.  The problem is that
 ATRewriteCatalogs() calls AlterTableCreateToastTable() based on what it
 thinks the subcommands are, and AlterTableCreateToastTable() takes an
 AccessExclusiveLock.
 
 This could possibly be addressed by moving AT_SetStatistics to
 AT_PASS_MISC in order to avoid the TOAST table call.
 
 In a related matter, assigning ShareUpdateExclusiveLock to AT_SetStorage
 doesn't work either, because the TOAST table call needs to be done in
 that case.
 
 Perhaps this logic needs to be refactored a bit more so that there
 aren't any inconsistencies between the lock modes and the passes,
 which could lead to false expectations and deadlocks.

Changes as suggested, plus tests to confirm lock levels for
ShareUpdateExclusiveLock changes. Will commit soon, if no objection.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 516dbd2..77a3c57 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -376,7 +376,7 @@ cluster_rel(Oid tableOid, Oid indexOid, bool recheck, bool verbose,
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
-		check_index_is_clusterable(OldHeap, indexOid, recheck);
+		check_index_is_clusterable(OldHeap, indexOid, recheck, AccessExclusiveLock);
 
 	/* Log what we're doing (this could use more effort) */
 	if (OidIsValid(indexOid))
@@ -405,11 +405,11 @@ cluster_rel(Oid tableOid, Oid indexOid, bool recheck, bool verbose,
  * definition can't change under us.
  */
 void
-check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck)
+check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMODE lockmode)
 {
 	Relation	OldIndex;
 
-	OldIndex = index_open(indexOid, AccessExclusiveLock);
+	OldIndex = index_open(indexOid, lockmode);
 
 	/*
 	 * Check that index is in fact an index on the given relation
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 9a7cd96..defb2e4 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -2550,6 +2550,8 @@ AlterTableGetLockLevel(List *cmds)
 			case AT_DropCluster:
 			case AT_SetRelOptions:
 			case AT_ResetRelOptions:
+			case AT_SetOptions:
+			case AT_ResetOptions:
 			case AT_SetStorage:
 cmd_lockmode = ShareUpdateExclusiveLock;
 break;
@@ -2669,19 +2671,19 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 			ATSimpleRecursion(wqueue, rel, cmd, recurse, lockmode);
 			/* Performs own permission checks */
 			ATPrepSetStatistics(rel, cmd-name, cmd-def, lockmode);
-			pass = AT_PASS_COL_ATTRS;
+			pass = AT_PASS_MISC;
 			break;
 		case AT_SetOptions:		/* ALTER COLUMN SET ( options ) */
 		case AT_ResetOptions:	/* ALTER COLUMN RESET ( options ) */
 			ATSimplePermissionsRelationOrIndex(rel);
 			/* This command never recurses */
-			pass = AT_PASS_COL_ATTRS;
+			pass = AT_PASS_MISC;
 			break;
 		case AT_SetStorage:		/* ALTER COLUMN SET STORAGE */
 			ATSimplePermissions(rel, false);
 			ATSimpleRecursion(wqueue, rel, cmd, recurse, lockmode);
 			/* No command-specific prep needed */
-			pass = AT_PASS_COL_ATTRS;
+			pass = AT_PASS_MISC;
 			break;
 		case AT_DropColumn:		/* DROP COLUMN */
 			ATSimplePermissions(rel, false);
@@ -6906,7 +6908,7 @@ ATExecClusterOn(Relation rel, const char *indexName, LOCKMODE lockmode)
 		indexName, RelationGetRelationName(rel))));
 
 	/* Check index is valid to cluster on */
-	check_index_is_clusterable(rel, indexOid, false);
+	check_index_is_clusterable(rel, indexOid, false, lockmode);
 
 	/* And do the work */
 	mark_index_clustered(rel, indexOid);
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 4f67930..74d4bd1 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -14,6 +14,7 @@
 #define CLUSTER_H
 
 #include <nodes/parsenodes.h>
+#include <storage/lock.h>
 #include <utils/relcache.h>
 
 
@@ -21,7 +22,7 @@ extern void cluster(ClusterStmt *stmt, bool isTopLevel);
 extern void cluster_rel(Oid tableOid, Oid indexOid, bool recheck,
 			bool verbose, int freeze_min_age, int freeze_table_age);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
-		   bool recheck);
+		   bool recheck, LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid);
 
 extern Oid	make_new_heap(Oid OIDOldHeap, Oid NewTableSpace);
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index 5aff44f..83e24fd 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -1473,6 +1473,127 @@ 

Re: [HACKERS] ERROR: argument to pg_get_expr() must come from system catalogs

2010-07-28 Thread Dave Page
On Wed, Jul 28, 2010 at 4:54 PM, Bruce Momjian br...@momjian.us wrote:
 Are we basically leaving pgAdmin in this state until we come up with a
 fix and need a new minor release?  We pride ourselves in not introducing
 breakage in minor releases, but it has certainly happened in this case,
 and it is making pgAdmin look bad.  Dave, is there a hack you can add to
 pgAdmin to work around the join issue until we can fix the backend?

It wouldn't make much difference if there was - the majority of people
won't get it until they upgrade their server anyway.


-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] page corruption on 8.3+ that makes it to standby

2010-07-28 Thread Jeff Davis
On Wed, 2010-07-28 at 06:40 -0400, Robert Haas wrote:
  fix1 -- Only call PageSetLSN/TLI inside log_newpage() and
  heap_xlog_newpage() if the page is not zeroed.
 
  fix2 -- Don't call log_newpage() at all if the page is not zeroed.
 
  Please review. I don't have a strong opinion about which one should be
  applied.
 
 Looks good.  I still prefer fix2; it seems cleaner to me.  It has
 another advantage, too, which is that copy_relation_data() is used
 ONLY by ALTER TABLE SET TABLESPACE.  So if I stick to patching that
 function, that's the only thing I can possibly break, whereas
 log_newpage() is used in a bunch of other places.  I don't think
 either fix is going to break anything at all, but considering that
 this is going to need back-patching, I'd rather be conservative.
 

Sounds good to me.

However, when Simon said "We definitely shouldn't do anything that
leaves standby different to primary," you said "obviously." Fix2 can
leave a difference between the two, because zeroed pages at the end of
the heap file on the primary will not be sent to the standby (the
standby will only create the zeroed pages if a higher block number is
sent; which won't be the case if the zeroed pages are at the end).

As we discussed before, that looks inconsequential, but I just want to
make sure that it's understood.

 Speaking of back-patching, the subject line describes this as an 8.3+
 problem, but it looks to me like this goes all the way back to 8.0.
 The code is a little different in 8.2 and prior, but ISTM it's
 vulnerable to the same issue.  Don't we need a modified version of
 this patch for the 8.0 - 8.2 branches?

That sounds right. I just saw that the code in question was introduced
in 8.3.

Regards,
Jeff Davis




Re: [HACKERS] Toward a column reorder solution

2010-07-28 Thread David E. Wheeler
On Jul 28, 2010, at 7:57 AM, Joshua D. Drake wrote:

 Hah! I gave you kudos :P (you are in the FAQ)

Ah, thanks. The link is missing a G: It's PGXN, not PXN.

Best,

David



Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-07-28 Thread James William Pye
On Jul 25, 2010, at 8:01 AM, Kris Jurka wrote:
 The JDBC driver reads server messages for multiple reasons.

 One of them is indeed to do early failure detection.

That's high quality. =)

 Another is to pickup NoticeResponse messages to avoid a network buffer 
 deadlock.

That's a good catch. I don't think psql/restore would often run into this as 
when COPY IN is in play, it's normally restoring a database. However, with 
JDBC, I imagine COPY would more often be used to do bulk loading into live 
tables that may very well cause a NOTICE. [Well, I reference psql/libpq because 
I don't recall it recognizing failure during COPY IN in the past, so I assume 
it's not receiving any data in that state.]

hrm, I suppose a lazy way around that problem would be to suspend all client 
messages (client_min_messages) during COPY IN. Tho, I guess one would still have
to contend with NotificationResponse, and ParameterStatus..

 So this is possible to work around driver side by peeking into the network 
 stream and delaying processing of the end of copy until the driver agrees 
 that the copy is done, but

I don't think you would have to peek in. If the interface were to always hold 
onto the last message or last n-bytes submitted to be sent, it would be able to 
send the possible CopyData(EOF) and CopyDone once the COPY operation (at the 
interface level) is closed/shutdown/terminated. Granted, this is dependent on 
CopyData(EOF) not being in the middle of regular CopyData, but I gather that 
that would end in an ErrorResponse anyways.

 I still maintain that this is a server bug. It is not OK for the server to 
 assume that the client is done and move on, the client must tell the server 
 what it wants done.


I'm a bit torn here. While it would seem to be either a bug in the spec or a 
bug in the server, I'm inclined to call it a wart in the server's 
implementation of the spec.

I don't see the fix as being dangerous, but I imagine an implementor would want 
to have the workaround in place regardless. I certainly would.

I'd be in favor of seeing this fixed in 9.x, and the documentation updated to 
warn implementors about the wart in the older versions.. That is, I don't see 
any reason why we can't get rid of this unsightly thing considering the 
workarounds would still work with a wart-free server.

cheers, jwp


Re: [HACKERS] Toward a column reorder solution

2010-07-28 Thread Joshua D. Drake
On Wed, 2010-07-28 at 09:30 -0700, David E. Wheeler wrote:
 On Jul 28, 2010, at 7:57 AM, Joshua D. Drake wrote:
 
  Hah! I gave you kudos :P (you are in the FAQ)
 
 Ah, thanks. The link is missing a G: It's PGXN, not PXN.

Yeah that is already fixed, just waiting for cache to clear (on the
server).

Joshua D. Drake


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt




Re: [HACKERS] page corruption on 8.3+ that makes it to standby

2010-07-28 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 However, when Simon said "We definitely shouldn't do anything that
 leaves standby different to primary," you said "obviously." Fix2 can
 leave a difference between the two, because zeroed pages at the end of
 the heap file on the primary will not be sent to the standby (the
 standby will only create the zeroed pages if a higher block number is
 sent; which won't be the case if the zeroed pages are at the end).

 As we discussed before, that looks inconsequential, but I just want to
 make sure that it's understood.

I understand it, and I don't like it one bit.  I haven't caught up on
this thread yet, but I think the only acceptable solution is one that
leaves the slave in the *same* state as the master.  Not a state that
we hope will behave equivalently.  I can think of too many corner cases
where it might not.  (In fact, having a zeroed page in a relation is
already a corner case in itself, so the amount of testing you'd get for
such behaviors is epsilon squared.  You don't want to take that bet.)

regards, tom lane



Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-07-28 Thread Kris Jurka



On Wed, 28 Jul 2010, James William Pye wrote:



hrm, I suppose a lazy way around that problem would be to suspend all 
client messages(client_min_messages) during COPY IN. Tho, I guess one 
would still have to contend with NotificationResponse, and 
ParameterStatus..


Technically you won't get NotificationResponse until transaction end, so 
you don't need to worry about that mid copy.


I don't think you would have to peek in. If the interface were to always 
hold onto the last message or last n-bytes submitted to be sent, it 
would be able to send the possible CopyData(EOF) and CopyDone once the 
COPY operation (at the interface level) is closed/shutdown/terminated. 
Granted, this is dependent on CopyData(EOF) not being in the middle of 
regular CopyData, but I gather that that would end in an ErrorResponse 
anyways.


One of the key points of confusion is that CopyData(EOF) does not result 
in an error.  It results in ignoring any further data.  The problem I have 
is that for text mode it waits for CopyDone, but in binary mode it ends 
the copy sequence immediately.  Additionally the interface exposed by the 
JDBC driver lets the user write arbitrary CopyData bytes to the server, so 
without parsing all of that we don't know whether they've issued 
CopyData(EOF) or not.


Kris Jurka



Re: [HACKERS] page corruption on 8.3+ that makes it to standby

2010-07-28 Thread Jeff Davis
On Wed, 2010-07-28 at 12:36 -0400, Tom Lane wrote:
 Jeff Davis pg...@j-davis.com writes:
  However, when Simon said "We definitely shouldn't do anything that
  leaves standby different to primary," you said "obviously." Fix2 can
  leave a difference between the two, because zeroed pages at the end of
  the heap file on the primary will not be sent to the standby (the
  standby will only create the zeroed pages if a higher block number is
  sent; which won't be the case if the zeroed pages are at the end).
 
  As we discussed before, that looks inconsequential, but I just want to
  make sure that it's understood.
 
 I understand it, and I don't like it one bit.  I haven't caught up on
 this thread yet, but I think the only acceptable solution is one that
 leaves the slave in the *same* state as the master.  Not a state that
 we hope will behave equivalently.  I can think of too many corner cases
 where it might not.  (In fact, having a zeroed page in a relation is
 already a corner case in itself, so the amount of testing you'd get for
 such behaviors is epsilon squared.  You don't want to take that bet.)
 

Ok, sounds like my original fix (fix1) is the way to go then. Log zero
pages, but don't set LSN/TLI if it's a zero page (in log_newpage or
heap_xlog_newpage).

Regards,
Jeff Davis




Re: [HACKERS] page corruption on 8.3+ that makes it to standby

2010-07-28 Thread Robert Haas
On Wed, Jul 28, 2010 at 12:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Jeff Davis pg...@j-davis.com writes:
 However, when Simon said "We definitely shouldn't do anything that
 leaves standby different to primary," you said "obviously." Fix2 can
 leave a difference between the two, because zeroed pages at the end of
 the heap file on the primary will not be sent to the standby (the
 standby will only create the zeroed pages if a higher block number is
 sent; which won't be the case if the zeroed pages are at the end).

 As we discussed before, that looks inconsequential, but I just want to
 make sure that it's understood.

 I understand it, and I don't like it one bit.  I haven't caught up on
 this thread yet, but I think the only acceptable solution is one that
 leaves the slave in the *same* state as the master.  Not a state that
 we hope will behave equivalently.  I can think of too many corner cases
 where it might not.  (In fact, having a zeroed page in a relation is
 already a corner case in itself, so the amount of testing you'd get for
 such behaviors is epsilon squared.  You don't want to take that bet.)

I might be missing something here, but I don't see how you're going to
manage that.  In Jeff's original example, he crashes the database
after extending the relation but before initializing and writing the
new page.  I believe that at that point no XLOG has been written yet,
so the relation has been extended but there is no WAL to be sent to
the standby.  So now you have the exact situation you're concerned
about - the relation has been extended on the master but not on the
standby.  As far as I can see, this is an unavoidable consequence of
the fact that we don't XLOG the act of extending the relation.
Worrying about it only in the specific context of ALTER TABLE .. SET
TABLESPACE seems backwards; if there are any bugs there, we're in for
it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] page corruption on 8.3+ that makes it to standby

2010-07-28 Thread Jeff Davis
On Wed, 2010-07-28 at 13:18 -0400, Robert Haas wrote:
 In Jeff's original example, he crashes the database
 after extending the relation but before initializing and writing the
 new page.  I believe that at that point no XLOG has been written yet,
 so the relation has been extended but there is no WAL to be sent to
 the standby.  So now you have the exact situation you're concerned
 about - the relation has been extended on the master but not on the
 standby.  As far as I can see, this is an unavoidable consequence of
 the fact that we don't XLOG the act of extending the relation.
 Worrying about it only in the specific context of ALTER TABLE .. SET
 TABLESPACE seems backwards; if there are any bugs there, we're in for
 it.

That's a very good point. Now I'm leaning more toward your fix.

Regards,
Jeff Davis




Re: [HACKERS] page corruption on 8.3+ that makes it to standby

2010-07-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Jul 28, 2010 at 12:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I understand it, and I don't like it one bit.  I haven't caught up on
 this thread yet, but I think the only acceptable solution is one that
 leaves the slave in the *same* state as the master.

 I might be missing something here, but I don't see how you're going to
 manage that.  In Jeff's original example, he crashes the database
 after extending the relation but before initializing and writing the
 new page.  I believe that at that point no XLOG has been written yet,
 so the relation has been extended but there is no WAL to be sent to
 the standby.  So now you have the exact situation you're concerned
 about - the relation has been extended on the master but not on the
 standby.

You're right that we cannot prevent that situation --- or at least,
the cure would be worse than the disease.  (The cure would be to
XLOG the extension action, obviously, but then out-of-disk-space
has to be a PANIC condition.)  However, it doesn't follow that it's
a good idea to make copy_relation_data *intentionally* make the slave
and master different.

I've caught up on the thread now, and I think that fix2 (skip logging
the page) is extremely dangerous and has little if anything in its
favor.  fix1 seems reasonable given the structure of the page validity
checks.

However, what about Jeff's original comment

: On second thought, why are PageSetLSN and PageSetTLI being called from
: log_newpage(), anyway?

I think it is appropriate to be setting the LSN/TLI in the case of a
page that's been constructed by the caller as part of the WAL-logged
action, but doing so in copy_relation_data seems rather questionable.
We certainly didn't change the source page so changing its LSN seems
rather wrong --- wouldn't it be better to just copy the source pages
with their original LSNs?  So perhaps the best fix is to add a bool
parameter to log_newpage telling it whether to update LSN/TLI, and
have copy_relation_data pass false while the other callers pass true.
(Although I guess we'd need to propagate that flag in the WAL record,
so maybe this is more trouble than its worth.)

regards, tom lane



Re: [HACKERS] page corruption on 8.3+ that makes it to standby

2010-07-28 Thread Robert Haas
On Wed, Jul 28, 2010 at 2:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Jul 28, 2010 at 12:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I understand it, and I don't like it one bit.  I haven't caught up on
 this thread yet, but I think the only acceptable solution is one that
 leaves the slave in the *same* state as the master.

 I might be missing something here, but I don't see how you're going to
 manage that.  In Jeff's original example, he crashes the database
 after extending the relation but before initializing and writing the
 new page.  I believe that at that point no XLOG has been written yet,
 so the relation has been extended but there is no WAL to be sent to
 the standby.  So now you have the exact situation you're concerned
 about - the relation has been extended on the master but not on the
 standby.

 You're right that we cannot prevent that situation --- or at least,
 the cure would be worse than the disease.  (The cure would be to
 XLOG the extension action, obviously, but then out-of-disk-space
 has to be a PANIC condition.)

Not to mention that performance would probably be atrocious.

 However, it doesn't follow that it's
 a good idea to make copy_relation_data *intentionally* make the slave
 and master different.

 I've caught up on the thread now, and I think that fix2 (skip logging
 the page) is extremely dangerous and has little if anything in its
 favor.

Why do you think that?  They will be different only in terms of
whether the uninitialized bytes are before or after the nominal EOF,
and we know we have to be indifferent to that case anyway.

 fix1 seems reasonable given the structure of the page validity
 checks.

 However, what about Jeff's original comment

 : On second thought, why are PageSetLSN and PageSetTLI being called from
 : log_newpage(), anyway?

 I think it is appropriate to be setting the LSN/TLI in the case of a
 page that's been constructed by the caller as part of the WAL-logged
 action, but doing so in copy_relation_data seems rather questionable.
 We certainly didn't change the source page so changing its LSN seems
 rather wrong --- wouldn't it be better to just copy the source pages
 with their original LSNs?  So perhaps the best fix is to add a bool
 parameter to log_newpage telling it whether to update LSN/TLI, and
 have copy_relation_data pass false while the other callers pass true.
 (Although I guess we'd need to propagate that flag in the WAL record,
 so maybe this is more trouble than its worth.)

It seems like if log_newpage() were to set the LSN/TLI before calling
XLogInsert() - or optionally not - then it wouldn't be necessary to
set them also in heap_xlog_newpage(); the memcpy operation would by
definition have copied the right information onto the page.  That
seems like it would be a cleaner design, but back-patching a change to
the interpretation of WAL records that might already be on someone's
disk seems dicey at best.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] reducing NUMERIC size for 9.1

2010-07-28 Thread Tom Lane
[ gradually catching up on email ]

Robert Haas robertmh...@gmail.com writes:
 On Fri, Jul 16, 2010 at 2:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I don't like the way you did that either (specifically, not the kluge
 in NUMERIC_DIGITS()).  It would probably work better if you declared
 two different structs, or a union of same, to represent the two layout
 cases.
 
 n_sign_dscale is now pretty inappropriately named, probably better to
 change the field name.  This will also help to catch anything that's
 not using the macros.  (Renaming the n_weight field, or at least burying
 it in an extra level of struct, would be helpful for the same reason.)

 I'm not sure what you have in mind here.  If we create a union of two
 structs, we'll still have to pick one of them to use to check the high
 bits of the first word, so I'm not sure we'll be adding all that much
 in terms of clarity.

No, you can do something like this:

typedef struct numeric_short
{
    uint16       word1;
    NumericDigit digits[1];
} numeric_short;

typedef struct numeric_long
{
    uint16       word1;
    int16        weight;
    NumericDigit digits[1];
} numeric_long;

typedef union numeric
{
    uint16        word1;
    numeric_short short_fmt;   /* "short" and "long" are C keywords, */
    numeric_long  long_fmt;    /* so the members need other names */
} numeric;

and then access word1 either directly or (after having identified which
format it is) via one of the sub-structs.  If you really wanted to get
pedantic you could have a third sub-struct representing the format for
NaNs, but since those are just going to be word1 it may not be worth the
trouble.

 It seems like you've handled the NAN case a bit awkwardly.  Since the
 weight is uninteresting for a NAN, it's okay to not store the weight
 field, so I think what you should do is consider that the dscale field
 is still full-width, ie the format of the first word remains old-style
 not new-style.  I don't remember whether dscale is meaningful for a NAN,
 but if it is, your approach is constraining what is possible to store,
 and is also breaking compatibility with old databases.

 There is only one NaN value.  Neither weight or dscale is meaningful.
 I think if the high two bits of the first word are 11 we never examine
 anything else - do you see somewhere that we're doing otherwise?

I hadn't actually looked.  I think though that it's a mistake to break
compatibility on both dscale and weight when you only need to break one.
Also, weight is *certainly* uninteresting for NaNs since it's not even
meaningful unless there are digits.  dscale could conceivably be worth
something.

 The sign extension code in the NUMERIC_WEIGHT() macro seems a bit
 awkward; I wonder if there's a better way.  One solution might be to
 offset the value (ie, add or subtract NUMERIC_SHORT_WEIGHT_MIN) rather
 than try to sign-extend per se.

 Hmm... so, if the weight is X we store the value
 X-NUMERIC_SHORT_WEIGHT_MIN as an unsigned integer?  That's kind of a
 funny representation - I *think* it works out to sign extension with
 the high bit flipped.  I guess we could do it that way, but it might
 make it harder/more confusing to do bit arithmetic with the weight
 sign bit later on.

Yeah, it was just an idea.  It seems like there should be an easier way
to extract the sign-extended value, though.

regards, tom lane



Re: [HACKERS] page corruption on 8.3+ that makes it to standby

2010-07-28 Thread Jeff Davis
On Wed, 2010-07-28 at 14:50 -0400, Robert Haas wrote:
 It seems like if log_newpage() were to set the LSN/TLI before calling
 XLogInsert() - or optionally not - then it wouldn't be necessary to
 set them also in heap_xlog_newpage(); the memcpy operation would by
 definition have copied the right information onto the page.  That
 seems like it would be a cleaner design, but back-patching a change to
 the interpretation of WAL records that might already be on someone's
 disk seems dicey at best.

How do you set the LSN before XLogInsert()?

Regards,
Jeff Davis




Re: [HACKERS] page corruption on 8.3+ that makes it to standby

2010-07-28 Thread Robert Haas
On Wed, Jul 28, 2010 at 3:08 PM, Jeff Davis pg...@j-davis.com wrote:
 On Wed, 2010-07-28 at 14:50 -0400, Robert Haas wrote:
 It seems like if log_newpage() were to set the LSN/TLI before calling
 XLogInsert() - or optionally not - then it wouldn't be necessary to
 set them also in heap_xlog_newpage(); the memcpy operation would by
 definition have copied the right information onto the page.  That
 seems like it would be a cleaner design, but back-patching a change to
 the interpretation of WAL records that might already be on someone's
 disk seems dicey at best.

 How do you set the LSN before XLogInsert()?

Details, details...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] page corruption on 8.3+ that makes it to standby

2010-07-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Jul 28, 2010 at 2:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I've caught up on the thread now, and I think that fix2 (skip logging
 the page) is extremely dangerous and has little if anything in its
 favor.

 Why do you think that?  They will be different only in terms of
 whether the uninitialized bytes are before or after the nominal EOF,
 and we know we have to be indifferent to that case anyway.

(1) You're assuming that the page will be zeroes on the slave without
having forced it to be so.  A really obvious case where this fails
is where we're doing crash-and-restart on the master: a later action
could have modified the page away from the all-zero state.  (In
principle that's OK but I think this might break torn-page protection.)

(2) On filesystems that support holes, the page will not have storage,
whereas it (probably) does on the master.  This could lead to a
divergence in behavior later, ie slave runs out of disk space at a
different point than the master.

(3) The position of the nominal EOF can drive choices about which page
to put new tuples in, specifically that's where RelationGetBufferForTuple
will go if FSM has no information.  This could result in unexpected
divergence in behavior after the slave goes live compared to what the
master would have done.  Maybe that's OK but it seems better to avoid
it if we can, especially when you think about crash-and-restart on the
master as opposed to a separate slave.

Now as I said earlier, these are all tiny corners of a corner case, and
they *probably* shouldn't matter.  But I see no good reason to expose
ourselves to the possibility that there's some cases where they do
matter.  Especially when your argument for fix2 is a purely aesthetic
judgment that I don't agree with anyway.

 I think it is appropriate to be setting the LSN/TLI in the case of a
 page that's been constructed by the caller as part of the WAL-logged
 action, but doing so in copy_relation_data seems rather questionable.
 We certainly didn't change the source page so changing its LSN seems
 rather wrong --- wouldn't it be better to just copy the source pages
 with their original LSNs?

 It seems like if log_newpage() were to set the LSN/TLI before calling
 XLogInsert() - or optionally not - then it wouldn't be necessary to
 set them also in heap_xlog_newpage(); the memcpy operation would by
 definition have copied the right information onto the page.

Not possible because it is only after you've done XLogInsert that you
know what LSN was assigned to the WAL record.

regards, tom lane



Re: [HACKERS] page corruption on 8.3+ that makes it to standby

2010-07-28 Thread Tom Lane
I wrote:
 I think it is appropriate to be setting the LSN/TLI in the case of a
 page that's been constructed by the caller as part of the WAL-logged
 action, but doing so in copy_relation_data seems rather questionable.

BTW, I thought of an argument that explains why that's sane: it marks
the copied page as having been recently WAL-logged.  If we do some
action on the copied relation shortly after completing the
copy_relation_data transaction, we will see that its LSN is later than
the last checkpoint and know that we don't need to emit a full-page WAL
image for it, which is correct because in case of crash+restart the
HEAP_NEWPAGE record will provide the full-page image.  If we left the
source relation's page's LSN in there, we would frequently make the
wrong decision and emit an unnecessary extra full-page image.
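
In other words the decision reduces to comparing the page LSN against the last checkpoint's redo pointer; a minimal sketch (the function name and integer LSNs here are invented for illustration):

```python
def needs_full_page_image(page_lsn, redo_ptr):
    """A page whose LSN is newer than the checkpoint redo pointer has been
    WAL-logged since the checkpoint started, so no new full-page image is
    needed; otherwise one must be emitted to guard against torn pages."""
    return page_lsn <= redo_ptr

# page stamped by a recent HEAP_NEWPAGE record: no extra image needed
assert not needs_full_page_image(page_lsn=205, redo_ptr=200)
# page carrying a stale source-relation LSN: an unnecessary image results
assert needs_full_page_image(page_lsn=150, redo_ptr=200)
```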

So never mind that distraction.  I'm back to thinking that fix1 is
the way to go.

regards, tom lane



Re: [HACKERS] page corruption on 8.3+ that makes it to standby

2010-07-28 Thread Robert Haas
On Wed, Jul 28, 2010 at 3:16 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 (1) You're assuming that the page will be zeroes on the slave without
 having forced it to be so.  A really obvious case where this fails
 is where we're doing crash-and-restart on the master: a later action
 could have modified the page away from the all-zero state.  (In
 principle that's OK but I think this might break torn-page protection.)

Hmm, yeah, that does seem like it has the potential to be bad.  I
think this is sufficient reason to go with fix #1.

 (2) On filesystems that support holes, the page will not have storage,
 whereas it (probably) does on the master.  This could lead to a
 divergence in behavior later, ie slave runs out of disk space at a
 different point than the master.

I can't get excited about this one.

 (3) The position of the nominal EOF can drive choices about which page
 to put new tuples in, specifically that's where RelationGetBufferForTuple
 will go if FSM has no information.  This could result in unexpected
 divergence in behavior after the slave goes live compared to what the
 master would have done.  Maybe that's OK but it seems better to avoid
 it if we can, especially when you think about crash-and-restart on the
 master as opposed to a separate slave.

You're still going to have that in the normal (not altering the
tablespace) extension case, which is presumably far more common.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] [JDBC] Trouble with COPY IN

2010-07-28 Thread James William Pye
On Jul 28, 2010, at 9:53 AM, Kris Jurka wrote:
 Technically you won't get NotificationResponse until transaction end, so you 
 don't need to worry about that mid copy.

Ah, thanks for noting that. It would appear my original reading of the async 
section didn't get far enough beyond "Frontends must be prepared to deal with 
these messages at any time, even when not engaged in a query." I see the note 
below clarifying NotificationResponse.

 One of the key points of confusion is that CopyData(EOF) does not result in 
 an error.
 It results in ignoring any further data.
 The problem I have is that for text mode it waits for CopyDone, but in binary 
 mode it ends the copy sequence immediately.

That is bothersome. :\

 Additionally the interface exposed by the JDBC driver lets the user write 
 arbitrary CopyData bytes to the server, so without parsing all of that we 
 don't know whether they've issued CopyData(EOF) or not.

Okay, so you can't know with absolute certainty without parsing the data, but 
the usual case would be handled by holding onto the last-N bytes or so. Enough 
to fit the EOF and perhaps a little more for paranoia's sake.
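
As a sketch of that hold-back idea, assuming the binary COPY trailer is the 16-bit -1 (i.e. b'\xff\xff'); the class name is invented, and as noted this only covers the usual case, since the marker could in principle occur inside field data:

```python
EOF_MARKER = b'\xff\xff'   # binary COPY trailer: 16-bit integer -1

class CopyDataTail:
    """Remember the last two bytes the user has written, so we can tell
    whether the stream already ended with the binary-copy trailer even
    when the marker is split across write() calls."""
    def __init__(self):
        self.tail = b''

    def feed(self, chunk):
        # keep only as many trailing bytes as the marker is long
        self.tail = (self.tail + chunk)[-len(EOF_MARKER):]

    def saw_eof(self):
        return self.tail == EOF_MARKER

t = CopyDataTail()
t.feed(b'row data \xff')   # marker split across two writes
t.feed(b'\xff')
assert t.saw_eof()
```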

That's not to say that I'm missing the problem. When (not if, when) the 
user feeds data past a CopyData(EOF), it's going to get interesting.

[Thinking about the logic necessary to handle such a case and avoid network 
buffer deadlock...]
I would think the least invasive way to handle it would be to set the 
CommandComplete and ReadyForQuery messages aside when they are received if 
CopyDone hasn't been sent, continue the COPY operation as usual until it is 
shutdown, send CopyDone and, finally, reinstate CommandComplete and RFQ as if 
they were just received.  I don't think that really accommodates CopyFail, 
as the status in RFQ will need to be adjusted to match what was actually 
done?  Well, I'm not sure you would need to worry about NoticeResponse after a 
premature CommandComplete, as INSERTs are no longer happening.  Ugh.


+1 for a fix.


Not directly regarding your patch, but while the discussion is in the general 
area.
I think it would be wise to throw an error when non-empty CopyData messages are 
received after CopyData(EOF). Chances are that the user is making a mistake and 
should be notified of it.

cheers, jwp


Re: [HACKERS] page corruption on 8.3+ that makes it to standby

2010-07-28 Thread Jeff Davis
On Wed, 2010-07-28 at 15:37 -0400, Tom Lane wrote:
 So never mind that distraction.  I'm back to thinking that fix1 is
 the way to go.

Agreed.

It's uncontroversial to have a simple guard against corrupting an
uninitialized page, and uncontroversial is good for things that will be
back-patched.

Regards,
Jeff Davis




Re: [HACKERS] Patch to show individual statement latencies in pgbench output

2010-07-28 Thread Greg Smith
Finally got around to taking a longer look at your patch, sorry about 
the delay here. Patch itself seems to work on simple tests anyway (more 
on the one suspect bit below). You didn't show what the output looks 
like, so let's start with that because it is both kind of neat and not 
what I expected from your description. Here's the sort of extra stuff 
added to the end of the standard output when you toggle this feature on:


$ pgbench -S pgbench -T 10 -c 8 -j 4 -l -r
...
tps = 28824.943954 (excluding connections establishing)
command latencies
0.001487 \set naccounts 10 * :scale
0.000677 \setrandom aid 1 :naccounts
0.273983 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

From the way you described the patch, I had thought that you were just 
putting this information into the log files or something like that. In 
fact, it's not in the log files; it just shows up in this summary at the 
end. I'm not sure if that's best or not. Some might want to see how the 
per-statement variation varies over time. Sort of torn on whether the 
summary alone is enough detail or not. Let me play with this some more 
and get back to you on that.


Here's what a standard test looks like:

tps = 292.468349 (excluding connections establishing)
command latencies
0.002120 \set nbranches 1 * :scale
0.000682 \set ntellers 10 * :scale
0.000615 \set naccounts 10 * :scale
0.000723 \setrandom aid 1 :naccounts
0.000522 \setrandom bid 1 :nbranches
0.000553 \setrandom tid 1 :ntellers
0.000525 \setrandom delta -5000 5000
0.070307 BEGIN;
1.721631 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE 
aid = :aid;

0.147854 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
11.894366 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE 
tid = :tid;
4.761715 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE 
bid = :bid;
0.643895 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) 
VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);

7.452017 END;

I'm happy to see that the END here has a significant portion of time 
assigned to it, which means that it's correctly tracking the commit that 
happens there. It's possible to ask the question why add this feature 
when pg_stat_statements will get you the same data?. I think this gives 
a different enough view of things that it's worth adding anyway. Getting 
the END statements tracked, not having to use prepared statements to 
make it work, and not having to worry about overflowing the 
pg_stat_statements.max parameter that limits what that module tracks are 
all points in favor of this patch being useful even if you know about 
pg_stat_statements.


Now onto the nitpicking. With the standard Ubuntu compiler warnings on I 
get this:


pgbench.c:1588: warning: ‘i’ may be used uninitialized in this function

If you didn't see that, you may want to double-check how verbose you 
have your compiler setup to be; maybe you just missed it (which is of 
course what reviewers are here for). Regardless, the troublesome bit is 
this:


int i;

commands = process_commands(buf[i]);

Which is obviously not a good thing. I'm not entirely sure what you're 
doing with the changes you made to process_file, but I'd suggest you 
double-check the logic and coding of that section because there's at 
least one funny thing going on with i being used without 
initialization. I didn't yet try to figure out how this error 
might lead to a bug, but there probably is one.


This looks like a good feature to me, just not sure if it's worth 
extending to produce even more output if people want to see it. Can 
always layer that on top later. I'll continue testing and try to get a 
firmer opinion. Please take a look at the problem I pointed out and 
produce a new patch when you get a chance that fixes that part, so at 
least we don't get stuck on that detail.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] do we need to postpone beta4?

2010-07-28 Thread Robert Haas
On Tue, Jul 27, 2010 at 4:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Well, that's pretty much saying we won't release before September.

 Yup, that's what I think.  In fact I think September might be
 optimistic.  This is what happens when you fork early and allow
 developers to start focusing on new development instead of testing
 the release branch.

Actually, rewind.  I see that you moved the user-mappings issue I was
concerned about to resolved after beta3; I missed the fact that
you'd committed a fix there.  You also fixed the EPQ issue, and the
heap_update_redo problem evaporated.  So now we have the following
issues remaining:

* page corruption after moving tablespace
* ExplainOnePlan handles snapshots differently than ProcessQuery
* name and comment of XLogSetAsyncCommitLSN() should be changed
* Documentation fails to build as PDF

...and I wouldn't necessarily regard any of those as forcing another
beta; the first two are ancient, the third is cosmetic, and the last
one is a build system issue rather than a code change.

Obviously, it's too early to decide anything: we may yet discover more
issues that need to be addressed.  But I think we're in much better
shape than it seemed 24 hours ago.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] string_to_array has to be stable?

2010-07-28 Thread Jeff Davis
On Tue, 2010-07-20 at 11:31 +0200, Pavel Stehule wrote:
 Hello
 
 I am working on to_array and to_string functions and I am looking at the
 string_to_array function. I am surprised that this function is marked as
 immutable
 
 postgres=# select array_to_string(array[current_date],',');
  array_to_string
 -----------------
  2010-07-20
 (1 row)
 
 postgres=# set datestyle to German ;
 SET
 postgres=# select array_to_string(array[current_date],',');
  array_to_string
 -----------------
  20.07.2010
 (1 row)
 

What's wrong with that? current_date is the part that's changing, and
it's being passed as an argument to the function. If the argument
changes, an immutable function can return a different result.

Regards,
Jeff Davis




Re: [HACKERS] documentation for committing with git

2010-07-28 Thread Daniel Farina
On Wed, Jul 21, 2010 at 9:22 AM, Robert Haas robertmh...@gmail.com wrote:
 On the other hand, if you have technical corrections, or if
 you have suggestions on how to do the same things better (rather than
 suggestions on what to do differently), that would be greatly
 appreciated.

Somewhere in that wiki page there is some musing about the size of
.git directories being somewhat dissatisfying when one feels compelled
to have multiple check-outs materialized.

There's a facility in git known as "alternates" that lets you fetch
objects from some other pool. This is highly useful if you have the
same project checked out many times, either for personal use or on a
hosting server of some sort.

Because the object pool being referenced has no knowledge of other
repositories referencing it, garbage collection (should you be doing
things that generate garbage, such as deleting branches and tags) of the
underlying pool can cause corruption in referencing repositories in
the case where they reference objects that have since been deleted.

This will never happen if the repository monotonically grows, as is
often the case for an 'authoritative' repository, such as the one at
git.postgresql.org that only has major branches and release tags that
will never go away (save the rare case of fixing up after a CVS
race condition, which has occurred a few times in the past).

In practice, one can just make a clean clone of a project for the
purposes of such an object pool and then let it sit for months or even
years, as the size of each subsequent .git, even for considerable
amounts of history, is marginal. Once in a while one can perform the
clean-up task of catching up the object pool, if they feel their .git
directories have gotten unacceptably large.

Here's a brief section about it on a git wiki:

https://git.wiki.kernel.org/index.php/GitFaq#How_to_share_objects_between_existing_repositories.3F

fdr



Re: [HACKERS] [GENERAL] Incorrect FTS result with GIN index

2010-07-28 Thread Tom Lane
Oleg Bartunov o...@sai.msu.su writes:
 you can download dump http://mira.sai.msu.su/~megera/tmp/search_tab.dump

Hmm ... I'm not sure why you're failing to reproduce it, because it's
falling over pretty easily for me.  After poking at it for awhile,
I am of the opinion that scanGetItem's handling of multiple keys is
fundamentally broken and needs to be rewritten completely.  The
particular case I'm seeing here is that one key returns this sequence of
TIDs/lossy flags:

...
1085/4 0
1086/65535 1
1087/4 0
...

while the other one returns this:

...
1083/11 0
1086/6 0
1086/10 0
1087/10 0
...

and what comes out of scanGetItem is just

...
1086/6 1
...

because after returning that, on the next call it advances both input
keystreams.  So 1086/10 should be visited and is not.

I think that depending on the previous entryRes state to determine what
to do is basically unworkable, and what should probably be done instead
is to remember the last-returned TID and advance keystreams with TIDs =
that.  I haven't quite thought through how that should interact with
lossy-page TIDs but it seems more robust than what we've got.
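
To make the failure concrete, here is a toy merge over (page, offset) pairs, with None standing in for a lossy whole-page pointer; the point is that a lossy item has to stay put while the other stream walks through every offset on its page.  This is only an illustration, not the real scanGetItem logic:

```python
LOSSY = None   # offset marker for a lossy whole-page pointer

def tid_key(tid):
    # order lossy page pointers before any exact offset on the same page
    page, off = tid
    return (page, 0 if off is LOSSY else off)

def and_merge(a, b):
    """Intersect two ascending TID streams; a lossy item matches every
    offset on its page and is not advanced past until the page changes."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        (pa, oa), (pb, ob) = a[i], b[j]
        if pa == pb and oa is LOSSY:
            out.append((pb, ob))   # match via lossy page; advance only b
            j += 1
        elif pa == pb and ob is LOSSY:
            out.append((pa, oa))
            i += 1
        elif (pa, oa) == (pb, ob):
            out.append((pa, oa))
            i += 1
            j += 1
        elif tid_key(a[i]) < tid_key(b[j]):
            i += 1
        else:
            j += 1
    return out

# The streams from the message: the buggy code returned only 1086/6,
# but both 1086/6 and 1086/10 should come out.
a = [(1085, 4), (1086, LOSSY), (1087, 4)]
b = [(1083, 11), (1086, 6), (1086, 10), (1087, 10)]
assert and_merge(a, b) == [(1086, 6), (1086, 10)]
```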

I'm also noticing that the ANDing behavior for the ee:* & dd:* query
style seems very much stupider than it needs to be --- it's returning
lossy pages that very obviously don't need to be examined because the
other keystream has no match at all on that page.  But I haven't had
time to probe into the reason why.

I'm out of time for today, do you want to work on it?

regards, tom lane



Re: [HACKERS] do we need to postpone beta4?

2010-07-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 So now we have the following issues remaining:

 * page corruption after moving tablespace
 * ExplainOnePlan handles snapshots differently than ProcessQuery
 * name and comment of XLogSetAsyncCommitLSN() should be changed
 * Documentation fails to build as PDF

 ...and I wouldn't necessarily regard any of those as forcing another
 beta; the first two are ancient, the third is cosmetic, and the last
 one is a build system issue rather than a code change.

 Obviously, it's too early to decide anything: we may yet discover more
 issues that need to be addressed.  But I think we're in much better
 shape than it seemed 24 hours ago.

Yeah.  I'm off poking at the incorrect FTS result problem, but that
is a pre-existing bug as well; it goes back at least to 8.4 and probably
further.

regards, tom lane



Re: [HACKERS] string_to_array has to be stable?

2010-07-28 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 On Tue, 2010-07-20 at 11:31 +0200, Pavel Stehule wrote:
 I am working on to_array and to_string functions and I am looking at the
 string_to_array function. I am surprised that this function is marked as
 immutable

 What's wrong with that? current_date is the part that's changing, and
 it's being passed as an argument to the function. If the argument
 changes, an immutable function can return a different result.

string_to_array() seems fine to me: it's a predictable transformation
from text to text.  However, I think that there really is an issue with
array_to_string(), because that takes an anyarray and invokes the array
element's type output function.  Type output functions are not
necessarily immutable, and if the input is of a type where that's not
true, then the array_to_string() transformation isn't immutable either.
An example is that date's output function produces different results
depending on datestyle.

I can't remember offhand whether there are any volatile type output
functions, but if there were we'd really need to mark array_to_string()
as volatile.  That would be unpleasant for performance though.   I'd
rather compromise on stable.  Thoughts?

regards, tom lane



Re: [HACKERS] string_to_array has to be stable?

2010-07-28 Thread Jeff Davis
On Wed, 2010-07-28 at 20:25 -0400, Tom Lane wrote:
 string_to_array() seems fine to me: it's a predictable transformation
 from text to text.  However, I think that there really is an issue with
 array_to_string(), because that takes an anyarray and invokes the array
 element's type output function. 

Yes, I misread the problem because he used current_date rather than a
date literal.

 I can't remember offhand whether there are any volatile type output
 functions, but if there were we'd really need to mark array_to_string()
 as volatile.  That would be unpleasant for performance though.   I'd
 rather compromise on stable.  Thoughts?

Stable seems reasonable to me.

A volatile type output function sounds like an edge case. Perhaps there
are even grounds to force a type output function to be stable, similar
to how we force the function for a functional index to be immutable.

Regards,
Jeff Davis




Re: [HACKERS] Patch to show individual statement latencies in pgbench output

2010-07-28 Thread Florian Pflug
On Jul29, 2010, at 00:48 , Greg Smith wrote:
 Finally got around to taking a longer look at your patch, sorry about the 
 delay here. Patch itself seems to work on simple tests anyway (more on the 
 one suspect bit below). You didn't show what the output looks like, so let's 
 start with that because it is both kind of neat and not what I expected from 
 your description. Here's the sort of extra stuff added to the end of the 
 standard output when you toggle this feature on:
 
 snipped output
 
 From the way you described the patch, I had thought that you were just 
 putting this information into the log files or something like that. In fact, 
 it's not in the log files; it just shows up in this summary at the end. I'm 
 not sure if that's best or not. Some might want to see how the per-statement 
 variation varies over time. Sort of torn on whether the summary alone is 
 enough detail or not. Let me play with this some more and get back to you on 
 that.

I think the two features are pretty much orthogonal, even though they'd make 
use of the same per-statement instrumentation machinery.

I created the patch to tune the wal_writer for the synchronous_commit=off case 
- the idea being that the COMMIT should be virtually instantaneous if the 
wal_writer writes old wal buffers out fast enough.

I haven't yet used pgbench's log output feature, so I can't judge how useful 
the addition of per-statement data to that log is, and what the format should 
be. However, if you think it's useful and can come up with a sensible format, 
I'd be happy to add that feature to the patch.

 Now onto the nitpicking. With the standard Ubuntu compiler warnings on I get 
 this:
 
 pgbench.c:1588: warning: ‘i’ may be used uninitialized in this function
 
 If you didn't see that, you may want to double-check how verbose you have 
 your compiler setup to be; maybe you just missed it (which is of course what 
 reviewers are here for). Regardless, the troublesome bit is this:
 
 int i;
 
 commands = process_commands(buf[i]);

Fixed. That was a leftover of the trimming and comment skipping logic, which my 
patch moves to process_command.

Updated patch is attached.

Thanks for your extensive review!
best regards,
Florian Pflug



pgbench_statementlatency.patch
Description: Binary data
