[HACKERS] Ideas for improving Concurrency Tests

2013-03-26 Thread Amit Kapila
Ideas for improving Concurrency testing   



1. Synchronization points in server code - To have better control for
concurrency testing, define synchronization points in server code which can
be used as follows: 

heap_truncate(..)
{
    ...

    SYNC_POINT(procid, 'before_heap_open')
    rel = heap_open(rid, AccessExclusiveLock);
    relations = lappend(relations, rel);
}

exec_simple_query(..)
{
    ...

    finish_xact_command();

    SYNC_POINT(procid, 'after_finish_xact_command')

    /*
     * If there were no parsetrees, return EmptyQueryResponse message.
     */
    if (!parsetree_list)
        NullCommand(dest);
    ...
}
  


When code reaches a sync point it can either emit a signal or wait for a
signal.

Signal
    A value of a shared memory variable that will be interpreted by
    different SYNC POINTS based on its value.

Emit a signal
    Assign the value (the signal) to the shared memory variable (set a
    flag) and broadcast a global condition to wake those waiting for a
    signal.

Wait for a signal
    Loop over waiting for the global condition until the global value
    matches the wait-for signal.
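
A minimal sketch of what those two primitives could look like follows.
The shared-memory struct and the SyncPointEmit/SyncPointWait/
SyncPointBroadcast/SyncPointConditionWait names are illustrative only
(they are not existing server APIs), and a real patch would presumably
wait on a latch rather than a hand-rolled condition:

#include "postgres.h"
#include "storage/spin.h"

typedef struct SyncPointShmem
{
    slock_t     mutex;                  /* protects "signal" */
    char        signal[NAMEDATALEN];    /* last signal emitted, "" if none */
} SyncPointShmem;

static SyncPointShmem *syncShmem;       /* attached at shmem init (not shown) */

/* Emit a signal: set the shared flag and wake anyone waiting on it. */
static void
SyncPointEmit(const char *signal)
{
    SpinLockAcquire(&syncShmem->mutex);
    strlcpy(syncShmem->signal, signal, NAMEDATALEN);
    SpinLockRelease(&syncShmem->mutex);
    SyncPointBroadcast();               /* e.g. set the waiters' latches */
}

/* Wait for a signal: loop until the shared value matches the wait-for signal. */
static void
SyncPointWait(const char *wait_for)
{
    for (;;)
    {
        bool        matched;

        SpinLockAcquire(&syncShmem->mutex);
        matched = (strcmp(syncShmem->signal, wait_for) == 0);
        SpinLockRelease(&syncShmem->mutex);

        if (matched)
            break;
        SyncPointConditionWait();       /* sleep until broadcast, or pg_usleep() */
    }
}

SYNC_POINT(procid, 'name') would then check whether the named point has
been activated with WAIT_FOR or SIGNAL and call the corresponding
primitive, or do nothing if the point is not active.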

   To activate synchronization points, appropriate actions can be set.
   For example,

SET SYNC_POINT = 'before_heap_open WAIT_FOR commit';
SET SYNC_POINT = 'after_finish_xact_command SIGNAL commit';

   The above commands activate the synchronization points named
   'before_heap_open' and 'after_finish_xact_command'.


   session s1
   step s11  {SET SYNC_POINT = 'before_heap_open WAIT_FOR commit';}
   step s12  {Truncate tbl;}
   session s2
   step s21  {SET SYNC_POINT = 'after_finish_xact_command SIGNAL commit';}
   step s22  {Insert into tbl values(1);}

   The first activation requests the synchronization point to wait for
   another backend to emit the signal 'commit', and the second activation
   requests the synchronization point to emit the signal 'commit' when the
   process's execution runs through the synchronization point.

   The test defined above makes the Truncate wait for the Insert to finish.

2. Enhance Isolation Framework - Currently, at most one step can be waiting
at a time. Enhance the concurrency test framework (isolation tester) so that
multiple sessions can wait and then be released serially.

  This might help in generating complex deadlock scenarios.

 

 

The above ideas could be useful for improving concurrency testing and could
also help generate test cases for some of the complicated bugs for which
there is no direct test.

This work is not a patch for 9.3; I just wanted some initial feedback.

 

Feedback/Suggestions?

 

 

Reference : http://dev.mysql.com/doc/internals/en/debug-sync-facility.html

 

 

With Regards,

Amit Kapila.



Re: [HACKERS] [COMMITTERS] pgsql: Add PF_PRINTF_ATTRIBUTE to on_exit_msg_fmt.

2013-03-26 Thread Heikki Linnakangas

On 26.03.2013 02:02, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

On 25.03.2013 15:36, Tom Lane wrote:

Heikki Linnakangas heikki.linnakan...@iki.fi writes:

Add PF_PRINTF_ATTRIBUTE to on_exit_msg_fmt.
Per warning from -Wmissing-format-attribute.



Hm, this is exactly what I removed yesterday, because it makes the build
fail outright on old gcc:



The attached seems to work. With this patch, on_exit_msg_func() is gone.
There's a different implementation of exit_horribly for pg_dumpall and
pg_dump/restore. In pg_dumpall, it just calls vwrite_msg(). In
pg_dump/restore's version, the logic from parallel_exit_msg_func() is
moved directly to exit_horribly().


Seems probably reasonable, though if we're taking exit_horribly out of
dumputils.c, meseems it ought not be declared in dumputils.h anymore.
Can we put that declaration someplace else, rather than commenting it
with an apology?


Ugh, the patch I posted doesn't actually work, because dumputils.c is 
also used in psql and some scripts, so you get a linker error in those. 
psql and scripts don't use exit_horribly or many of the other functions 
in dumputils.c, so I think we should split dumputils.c into two parts 
anyway. fmtId and the other functions that are used by psql in one file, 
and the functions that are only shared between pg_dumpall and pg_dump in 
another. Then there's also functions that are used by pg_dump and 
pg_restore, but not pg_dumpall or psql.


I'll try moving things around a bit...

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] adding support for zero-attribute unique/etc keys

2013-03-26 Thread Albe Laurenz
Darren Duncan wrote:
 The standard defines UNIQUE on the basis of the UNIQUE predicate:
 <unique predicate> ::= UNIQUE <table subquery>
 and states:
 1) Let T be the result of the table subquery.
 2) If there are no two rows in T such that the value of each column
in one row is non-null and is not distinct
from the value of the corresponding column in the other row,
then the result of the unique predicate is
*True*; otherwise, the result of the unique predicate is *False*.

 Since an imagined zero-column query would have an empty set of
 result columns, you could with equal force argue that these columns
 satisfy the condition or not, because the members of the empty
 set have all the properties you desire.

 So I see no compelling argument that such a UNIQUE constraint
 would force a single-row table.
 
 I do see that compelling argument, and it has to do with identities.
 
 The above definition of UNIQUE predicate says that the UNIQUE predicate is
 FALSE iff, for every pair of rows in T, the 2 rows of any pair are the same.

I don't understand that sentence.
I would say that it is FALSE iff there exist two rows in T
that satisfy:
a) each column in both rows is not-null
b) each column in one of the rows is not distinct from
   the corresponding column in the other row

 Further, 2 rows are the same iff, for every corresponding column, the values 
 in
 both rows are the same.  Further, 2 such values are the same iff they are both
 not null and are mutually not distinct.
 
 So, determining if 2 rows are the same involves an iteration of dyadic logical
 AND over the predicates for each column comparison.  Now logical AND has an
 identity value, which is TRUE, because TRUE AND p (and p AND TRUE) results
 in p for all p.  Therefore, any 2 rows with zero columns each are the 
 same.
 
 Since any 2 rows with zero columns are the same, the UNIQUE predicate is 
 FALSE
 any time there is more than 1 row in a table.
 
 Hence, a UNIQUE constraint over zero columns signifies a row-comparison
 predicate that unconditionally results in TRUE, and so no two rows at all 
 would
 be allowed in the table with that constraint at once, thus restricting the 
 table
 to at most one row.
 
 Does anyone agree or disagree with this logic?

Yes :^)

You could use the same kind of argument like this:

UNIQUE is true iff any two rows in T satisfy for each column:
the column in row 1 is null OR the column in row 2 is null OR
the column in row 1 is distinct from the column in row 2

Now you iterate your logical AND over this predicate
for all columns and come up with TRUE since there are none.
Consequently UNIQUE is satisfied, no matter how many rows there are.

In a nutshell:
All members of the empty set satisfy p, but also:
all members of the empty set satisfy the negation of p.

You can use this technique to make anything plausible.
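
Written out, the vacuous quantification that both readings lean on is
(a sketch in predicate notation, not a quotation of the standard):

    \mathrm{UNIQUE}(T) \iff \neg\,\exists\, r_1 \neq r_2 \in T :
        \forall c \in C :
            r_1.c \text{ is non-null} \wedge
            r_1.c \text{ is not distinct from } r_2.c

With C = \emptyset the inner \forall holds vacuously for every pair of
rows (giving the at-most-one-row reading), but it holds just as vacuously
with the column condition negated (giving the no-constraint reading); the
formula itself does not decide between them.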

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay

2013-03-26 Thread Simon Riggs
On 26 March 2013 01:35, Greg Stark st...@mit.edu wrote:
 On Tue, Mar 26, 2013 at 12:00 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I'll bet you all a beer at PgCon 2014 that this remains unresolved at
 that point.

 Are you saying you're only interested in working on it now? That after
 9.3 is released you won't be interested in working on it any more?

 As you said we've been eyeing this particular logic since 2004, why
 did it suddenly become more urgent now? Why didn't you work on it 9
 months ago at the beginning of the release cycle?

I'm not sure why your comments are so confrontational here, but I
don't think it helps much. I'm happy to buy you a beer too.

As I explained clearly in my first post, this idea came about trying
to improve on the negative aspects of the checksum patch. People were
working on ideas 9 months ago to resolve this, but they have come to
nothing. I regret that; Merlin and others have worked hard to find a
way: Respect to them.

My suggestion is to implement a feature that takes 1 day to write and
needs little testing to show it works. I'm happy to pursue that path
now, or later. Deciding we need an all-singing, all-dancing solution
that will take our best men (another) 6 months of hard arguing and
implementation is by far the best way I know of killing anything and I
won't be pursuing that route. If we did have 6 months funding for
any-feature-you-like, I wouldn't spend it all on this. My bet that
nobody else will have enough patience, time and skill, let alone
unpaid leave to follow that path, stands.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-03-26 Thread Heikki Linnakangas

On 30.12.2012 08:31, Daniel Farina wrote:

A version implementing that is attached, except I generate an
additional 64-bit session not exposed to the client to prevent even
casual de-leaking of the session state.  That may seem absurd, until
someone writes a tool that de-xors things and relies on it and then
nobody feels inclined to break it.  It also keeps the public session
number short.

I also opted to save the underestimate since I'm adding a handful of
fixed width fields to the file format anyway.


This patch needs documentation. At a minimum, the new calls_underest
field needs to be listed in the description of the pg_stat_statements view.


Pardon for not following the discussion: What exactly does the 
calls_underest field mean? I couldn't decipher it from the patch. What 
can an admin do with the value? How does it compare with just bumping up 
pg_stat_statements.max?


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay

2013-03-26 Thread Robert Haas
On Tue, Mar 26, 2013 at 5:27 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 26 March 2013 01:35, Greg Stark st...@mit.edu wrote:
 On Tue, Mar 26, 2013 at 12:00 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I'll bet you all a beer at PgCon 2014 that this remains unresolved at
 that point.

 Are you saying you're only interested in working on it now? That after
 9.3 is released you won't be interested in working on it any more?

 As you said we've been eyeing this particular logic since 2004, why
 did it suddenly become more urgent now? Why didn't you work on it 9
 months ago at the beginning of the release cycle?

 I'm not sure why your comments are so confrontational here, but I
 don't think it helps much. I'm happy to buy you a beer too.

 As I explained clearly in my first post, this idea came about trying
 to improve on the negative aspects of the checksum patch. People were
 working on ideas 9 months ago to resolve this, but they have come to
 nothing. I regret that; Merlin and others have worked hard to find a
 way: Respect to them.

 My suggestion is to implement a feature that takes 1 day to write and
 needs little testing to show it works.

Any patch in this area isn't likely to take much testing to establish
whether it improves some particular case.  The problem is what happens
to all of the other cases - and I don't believe that part needs little
testing, hence the objections (with which I agree) to doing anything
about this now.

If we want to change something in this area, we might consider
resurrecting the patch I worked on for this last year, which had, I
believe, a fairly similar mechanism of operation to what you're
proposing, and some other nice properties as well:

http://www.postgresql.org/message-id/aanlktik5qzr8wts0mqcwwmnp-qhgrdky5av5aob7w...@mail.gmail.com
http://www.postgresql.org/message-id/aanlktimgkag7wdu-x77gnv2gh6_qo5ss1u5b6q1ms...@mail.gmail.com

...but I think the main reason why that never went anywhere is because
we never really had any confidence that the upsides were worth the
downsides.  Fundamentally, postponing hint bit setting (or hint bit
I/O) increases the total amount of work done by the system.  You still
end up writing the hint bits eventually, and in the meantime you do
more CLOG lookups.  Now, as a compensating benefit, you can spread the
work of writing the hint-bit updated pages out over a longer period of
time, so that no single query carries too much of the burden of
getting the bits set.  The worst-case-latency vs. aggregate-throughput
tradeoff is one with a long history and I think it's appropriate to
view this problem through that lens also.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interesting post-mortem on a near disaster with git

2013-03-26 Thread Cédric Villemain
Le lundi 25 mars 2013 19:35:12, Daniel Farina a écrit :
 On Mon, Mar 25, 2013 at 11:07 AM, Stefan Kaltenbrunner
 
 ste...@kaltenbrunner.cc wrote:
  Back when we used CVS for quite a few years I kept 7 day rolling
  snapshots of the CVS repo, against just such a difficulty as this. But
  we seem to be much better organized with infrastructure these days so I
  haven't done that for a long time.
  
  well there is always room for improvement(and for learning from others)
  - but I agree that this proposal seems way overkill. If people think we
  should keep online delayed mirrors we certainly have the resources to
  do that on our own if we want though...
 
 What about rdiff-backup?  I set it up for personal use years ago
 (via the handy open source bash script backupninja) and it is
 a pretty nice no-frills point-in-time, self-expiring, file-based
 automatic backup program that works well with file synchronization
 like rsync (I rdiff-backup to one disk and rsync the entire
 rdiff-backup output to another disk).  I've enjoyed using it quite a
 bit during my own personal-computer emergencies and thought the
 maintenance required from me has been zero, and I have used it from
 time to time to restore, proving it even works.  Hardlinks can be used
 to tag versions of a file-directory tree recursively relatively
 compactly.
 
 It won't be as compact as a git-aware solution (since git tends to
 rewrite entire files, which will confuse file-based incremental
 differential backup), but the amount of data we are talking about is
 pretty small, and as far as a lowest-common-denominator tradeoff for
 use in emergencies, I have to give it a lot of praise.  The main
 advantage it has here is it implements point-in-time recovery
 operations that are easy to use and actually seem to work.  That said,
 I've mostly done targeted recoveries rather than trying to recover
 entire trees.

I have the same set up, and same feedback.
-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation




Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-03-26 Thread Robert Haas
On Sun, Mar 24, 2013 at 10:02 PM, Josh Berkus j...@agliodbs.com wrote:
 On 03/20/2013 04:45 PM, Brendan Jurd wrote:
 Incompatibility:
 This patch introduces an incompatible change in the behaviour of the
 aforementioned array functions -- instead of returning NULL for empty
 arrays they return meaningful values.  Applications relying on the old
 behaviour to test for emptiness may be disrupted.  One can

 As a heavy user of arrays, I support this patch as being worth the
 breakage of backwards compatibility.  However, that means it certainly
 will need to wait for 9.4 to provide adequate warning.

I expect to lose this argument, but I think this is a terrible idea.
Users really hate it when they try to upgrade and find that they, uh,
can't, because of some application-level incompatibility like this.
They hate it twice as much when the change is essentially cosmetic.
There's no functional problems with arrays as they exist today that
this change would solve.

The way to make a change like this without breaking things for users
is to introduce a new type with different behavior and gradually
deprecate the old one.  Now, maybe it doesn't seem worth doing that
for a change this small.  But if so, I think that's evidence that this
isn't worth changing in the first place, not that it's worth changing
without regard for backwards-compatibility.

Personally, I think if we're going to start whacking around the
behavior here and risk inconveniencing our users, we ought to think a
little larger.  The fundamental thing that's dictating the current
behavior is that we have arrays of between 1 and 6 dimensions all
rolled up under a single data type.  But in many cases, if not nearly
all cases, what people want is, specifically, a one-dimensional array.
 If we were going to actually bite the bullet and create separate data
types for each possible number of array dimensions... and maybe fix
some other problems at the same time... then the effort involved in
ensuring backward-compatibility might seem like time better spent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-03-26 Thread Pavel Stehule
2013/3/26 Robert Haas robertmh...@gmail.com:
 On Sun, Mar 24, 2013 at 10:02 PM, Josh Berkus j...@agliodbs.com wrote:
 On 03/20/2013 04:45 PM, Brendan Jurd wrote:
 Incompatibility:
 This patch introduces an incompatible change in the behaviour of the
 aforementioned array functions -- instead of returning NULL for empty
 arrays they return meaningful values.  Applications relying on the old
 behaviour to test for emptiness may be disrupted.  One can

 As a heavy user of arrays, I support this patch as being worth the
 breakage of backwards compatibility.  However, that means it certainly
 will need to wait for 9.4 to provide adequate warning.

 I expect to lose this argument, but I think this is a terrible idea.
 Users really hate it when they try to upgrade and find that they, uh,
 can't, because of some application-level incompatibility like this.
 They hate it twice as much when the change is essentially cosmetic.
 There's no functional problems with arrays as they exist today that
 this change would solve.

 The way to make a change like this without breaking things for users
 is to introduce a new type with different behavior and gradually
 deprecate the old one.  Now, maybe it doesn't seem worth doing that
 for a change this small.  But if so, I think that's evidence that this
 isn't worth changing in the first place, not that it's worth changing
 without regard for backwards-compatibility.

 Personally, I think if we're going to start whacking around the
 behavior here and risk inconveniencing our users, we ought to think a
 little larger.  The fundamental thing that's dictating the current
 behavior is that we have arrays of between 1 and 6 dimensions all
 rolled up under a single data type.  But in many cases, if not nearly
 all cases, what people want is, specifically, a one-dimensional array.
  If we were going to actually bite the bullet and create separate data
 types for each possible number of array dimensions... and maybe fix
 some other problems at the same time... then the effort involved in
 ensuring backward-compatibility might seem like time better spent.


I understand, but I don't agree. We have to fix the impractical design of
arrays early. ARRAY is a first-class type, so it is not possible to use
the varchar2 trick here.

If we don't want to use a GUC, what do you think about a compatibility
extension? We could overload the system functions' behaviour. That could
solve the problem with updates and migrations.

Regards

Pavel


 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay

2013-03-26 Thread Simon Riggs
On 26 March 2013 11:33, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Mar 26, 2013 at 5:27 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 26 March 2013 01:35, Greg Stark st...@mit.edu wrote:
 On Tue, Mar 26, 2013 at 12:00 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I'll bet you all a beer at PgCon 2014 that this remains unresolved at
 that point.

 Are you saying you're only interested in working on it now? That after
 9.3 is released you won't be interested in working on it any more?

 As you said we've been eyeing this particular logic since 2004, why
 did it suddenly become more urgent now? Why didn't you work on it 9
 months ago at the beginning of the release cycle?

 I'm not sure why your comments are so confrontational here, but I
 don't think it helps much. I'm happy to buy you a beer too.

 As I explained clearly in my first post, this idea came about trying
 to improve on the negative aspects of the checksum patch. People were
 working on ideas 9 months ago to resolve this, but they have come to
 nothing. I regret that; Merlin and others have worked hard to find a
 way: Respect to them.

 My suggestion is to implement a feature that takes 1 day to write and
 needs little testing to show it works.

 Any patch in this area isn't likely to take much testing to establish
 whether it improves some particular case.  The problem is what happens
 to all of the other cases - and I don't believe that part needs little
 testing, hence the objections (with which I agree) to doing anything
 about this now.

 If we want to change something in this area, we might consider
 resurrecting the patch I worked on for this last year, which had, I
 believe, a fairly similar mechanism of operation to what you're
 proposing, and some other nice properties as well:

 http://www.postgresql.org/message-id/aanlktik5qzr8wts0mqcwwmnp-qhgrdky5av5aob7w...@mail.gmail.com
 http://www.postgresql.org/message-id/aanlktimgkag7wdu-x77gnv2gh6_qo5ss1u5b6q1ms...@mail.gmail.com

 ...but I think the main reason why that never went anywhere is because
 we never really had any confidence that the upsides were worth the
 downsides.  Fundamentally, postponing hint bit setting (or hint bit
 I/O) increases the total amount of work done by the system.  You still
 end up writing the hint bits eventually, and in the meantime you do
 more CLOG lookups.  Now, as a compensating benefit, you can spread the
 work of writing the hint-bit updated pages out over a longer period of
 time, so that no single query carries too much of the burden of
 getting the bits set.  The worst-case-latency vs. aggregate-throughput
 tradeoff is one with a long history and I think it's appropriate to
 view this problem through that lens also.

I hadn't realised so many patches existed that were similar. Hackers
is bigger these days.

Reviewing the patch, I'd say the problem is that it is basically
implementing a new automatic heuristic. We simply don't have any
evidence that any new heuristic will work for all cases, so we do
nothing.

Whether we apply my patch, yours or Merlin's, my main thought now is
that we need a user parameter to control it so it can be adjusted
according to need and not touched at all if there is no problem.

My washing machine has a wonderful feature, "15 min wash", and it works
great for the times I know I need it; but in general, the auto wash
mode works fine since often you don't care that it takes 90 minutes.
It's much easier to see that the additional user option is beneficial,
but much harder to start arguing that the default wash cycle should be
85 or 92 minutes. It'd be great if the washing machine could work out
that I need my clothes quickly and that on-this-day-only I don't care
about the thoroughness of the wash, but it can't. I don't think the
washing machine engineers are idiots for not being able to work that
out, but if they only offered a single option because they thought
they knew better than me, I'd be less than impressed.

In the same way, we need some way to say "big queries shouldn't do
cleanup" even if autovacuum ends up doing more I/O over time (though
in fact I doubt this is the case, detailed argument on other post).

So please, lets go with a simple solution now that allows users to say
what they want.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-03-26 Thread Brendan Jurd
On 26 March 2013 22:57, Robert Haas robertmh...@gmail.com wrote:
 They hate it twice as much when the change is essentially cosmetic.
 There's no functional problems with arrays as they exist today that
 this change would solve.


We can't sensibly test for whether an array is empty.  I'd call that a
functional problem.

The NULL return from array_{length,lower,upper,ndims} is those
functions' way of saying their arguments failed a sanity check.  So we
cannot distinguish in a disciplined way between a valid, empty array,
and bad arguments.  If the zero-D implementation had been more
polished and say, array_ndims returned zero, we had provided an
array_empty function, or the existing functions threw errors for silly
arguments instead of returning NULL, then I'd be more inclined to see
your point.  But as it stands, the zero-D implementation has always
been half-baked and slightly broken, we just got used to working
around it.

Cheers,
BJ


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay

2013-03-26 Thread Merlin Moncure
On Tue, Mar 26, 2013 at 7:30 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 26 March 2013 11:33, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Mar 26, 2013 at 5:27 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 26 March 2013 01:35, Greg Stark st...@mit.edu wrote:
 On Tue, Mar 26, 2013 at 12:00 AM, Simon Riggs si...@2ndquadrant.com 
 wrote:
 I'll bet you all a beer at PgCon 2014 that this remains unresolved at
 that point.

 Are you saying you're only interested in working on it now? That after
 9.3 is released you won't be interested in working on it any more?

 As you said we've been eyeing this particular logic since 2004, why
 did it suddenly become more urgent now? Why didn't you work on it 9
 months ago at the beginning of the release cycle?

 I'm not sure why your comments are so confrontational here, but I
 don't think it helps much. I'm happy to buy you a beer too.

 As I explained clearly in my first post, this idea came about trying
 to improve on the negative aspects of the checksum patch. People were
 working on ideas 9 months ago to resolve this, but they have come to
 nothing. I regret that; Merlin and others have worked hard to find a
 way: Respect to them.

 My suggestion is to implement a feature that takes 1 day to write and
 needs little testing to show it works.

 Any patch in this area isn't likely to take much testing to establish
 whether it improves some particular case.  The problem is what happens
 to all of the other cases - and I don't believe that part needs little
 testing, hence the objections (with which I agree) to doing anything
 about this now.

 If we want to change something in this area, we might consider
 resurrecting the patch I worked on for this last year, which had, I
 believe, a fairly similar mechanism of operation to what you're
 proposing, and some other nice properties as well:

 http://www.postgresql.org/message-id/aanlktik5qzr8wts0mqcwwmnp-qhgrdky5av5aob7w...@mail.gmail.com
 http://www.postgresql.org/message-id/aanlktimgkag7wdu-x77gnv2gh6_qo5ss1u5b6q1ms...@mail.gmail.com

 ...but I think the main reason why that never went anywhere is because
 we never really had any confidence that the upsides were worth the
 downsides.  Fundamentally, postponing hint bit setting (or hint bit
 I/O) increases the total amount of work done by the system.  You still
 end up writing the hint bits eventually, and in the meantime you do
 more CLOG lookups.  Now, as a compensating benefit, you can spread the
 work of writing the hint-bit updated pages out over a longer period of
 time, so that no single query carries too much of the burden of
 getting the bits set.  The worst-case-latency vs. aggregate-throughput
 tradeoff is one with a long history and I think it's appropriate to
 view this problem through that lens also.

 I hadn't realised so many patches existed that were similar. Hackers
 is bigger these days.

 Reviewing the patch, I'd say the problem is that it is basically
 implementing a new automatic heuristic. We simply don't have any
 evidence that any new heuristic will work for all cases, so we do
 nothing.

 Whether we apply my patch, yours or Merlin's, my main thought now is
 that we need a user parameter to control it so it can be adjusted
 according to need and not touched at all if there is no problem.

After a night thinking about this, I'd like to make some points:

*) my patch deliberately did not 'set bits without dirty' -- with
checksums in mind as you noted (thanks for that).  I think the upside
for marking pages in that fashion anyway is overrated.

*) Any strategy that does not approximate hint bit behavior IMNSHO is
a non-starter.  By that I mean when your $condition is met so that
hint bits are not being written out, scans need to bail out of
HeapTupleSatisfiesMVCC processing with a cheap check.  If you don't do
that and rely on the transam.c guard, you've already missed the boat:
even without the clog lookup, the extra processing there will, I can
assure you, show up in profiling of repeated scans (until vacuum).

*) The case of sequential tuples with the same xid is far and away the
most important one.  In OLTP workloads hint bit i/o is minor compared
to everything else going on.  Also, OLTP workloads are probably better
handled with an hint bit check just before eviction via bgwriter vs
during scan.

*) The budget for extra work inside HeapTupleSatisfiesMVCC is
exceptionally low.  For this reason, I think your idea would be better
framed at the page level and the bailout should be measured in the
number of pages, not tuples (that way the page can send in a single
boolean to control hint bit behavior).

*) The upside of optimizing xmax processing is fairly low for most
workloads I've seen.

*) The benchmarking Amit and Hari did needs analysis.

*) For off-cycle release work that would help enable patches with
complex performance trade-offs (I'm working up another patch that has
even more compelling benefits and risks in the buffer allocator), we
desperately need a standard battery of comprehensive performance tests
and donor machines.

[HACKERS] pg_dump in current master segfaults when dumping 9.2/9.1 databases

2013-03-26 Thread Bernd Helmle

My current master segfaults with pg_dump when dumping a 9.1 or 9.2 database:

$ LC_ALL=en_US.utf8 pg_dump -s -p 5448
pg_dump: column number -1 is out of range 0..22
zsh: segmentation fault  LC_ALL=en_US.utf8 pg_dump -s -p 5448

The reason seems to be that getTables() in pg_dump.c forgets to select
relpages in the query for releases >= 90100. The error message comes from
PQgetvalue(res, i, i_relpages), which complains about i_relpages being -1, 
which will then return NULL...
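
For clarity, the failure pattern is roughly the following (a sketch, not
the actual pg_dump code):

int   i_relpages = PQfnumber(res, "relpages");      /* returns -1: column absent */
char *relpages   = PQgetvalue(res, i, i_relpages);  /* libpq complains "column
                                                     * number -1 is out of range"
                                                     * and returns NULL, which is
                                                     * presumably dereferenced
                                                     * further down -> segfault */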


--
Thanks

Bernd


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump in current master segfaults when dumping 9.2/9.1 databases

2013-03-26 Thread Heikki Linnakangas

On 26.03.2013 15:31, Bernd Helmle wrote:

My current master segfaults with pg_dump when dumping a 9.1 or 9.2
database:

$ LC_ALL=en_US.utf8 pg_dump -s -p 5448
pg_dump: column number -1 is out of range 0..22
zsh: segmentation fault LC_ALL=en_US.utf8 pg_dump -s -p 5448

The reason seems to be that getTables() in pg_dump.c forgets to select
relpages in the query for releases >= 90100. The error message comes
from PQgetvalue(res, i, i_relpages), which complains about i_relpages
being -1, which will then return NULL...


Thanks, fixed.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay

2013-03-26 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 So please, lets go with a simple solution now that allows users to say
 what they want.

Simon, this is just empty posturing, as your arguments have nothing
whatsoever to do with whether the above description applies to your
patch.

More generally, the fact that a patch has some user-frobbable knob
does not mean that it's actually a good or even usable solution.  As
everybody keeps saying, testing on a wide range of use-cases would be
needed to prove that, and we don't have enough time left for such
testing in the 9.3 timeframe.  This problem needs to be attacked in
an organized and deliberate fashion, not by hacking something up under
time pressure and shipping it with minimal testing.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Back-branch security updates coming next week

2013-03-26 Thread Tom Lane
The core team has received word of a seriously nasty security problem
in recent releases of Postgres.  We will be wrapping update releases
to fix this next week, following the new usual schedule of tarball
wrap Monday afternoon EDT, public announcement Thursday (4/4).

Committers are reminded that it's uncool to commit any potentially
destabilizing changes to back branches in the last day or two before
a release wrap.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay

2013-03-26 Thread Simon Riggs
On 26 March 2013 14:44, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 So please, lets go with a simple solution now that allows users to say
 what they want.

 Simon, this is just empty posturing, as your arguments have nothing
 whatsoever to do with whether the above description applies to your
 patch.

Waiting for an auto-tuned solution to *every* problem means we just
sit and watch bad things happen, knowing how to fix them for
particular cases yet not being able to do anything at all.

 More generally, the fact that a patch has some user-frobbable knob
 does not mean that it's actually a good or even usable solution.  As
 everybody keeps saying, testing on a wide range of use-cases would be
 needed to prove that, and we don't have enough time left for such
 testing in the 9.3 timeframe.  This problem needs to be attacked in
 an organized and deliberate fashion, not by hacking something up under
 time pressure and shipping it with minimal testing.

Well, it has been tackled like that and we've *all* got nowhere. No
worries, I can wait a year for that beer.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] odd behavior in materialized view

2013-03-26 Thread Kevin Grittner
Fujii Masao masao.fu...@gmail.com wrote:

 Ping? ISTM this problem has not been fixed in HEAD yet.

It's next on my list.  The other reports seemed more serious and
more likely to be contentious in terms of the best fix.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ideas for improving Concurrency Tests

2013-03-26 Thread Greg Stark
On Tue, Mar 26, 2013 at 7:31 AM, Amit Kapila amit.kap...@huawei.com wrote:
 Above ideas could be useful to improve concurrency testing and can also be
 helpful to generate test cases for some of the complicated bugs for which
 there is no direct test.

I wonder how much explicit sync points would help with testing though.
It seems like they suffer from the problem that you'll only put sync
points where you actually expect problems and not where you don't
expect them -- which is exactly where problems are likely to occur.

Wouldn't it be more useful to implicitly create sync points whenever
synchronization events like spinlocks being taken occur?
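
For instance, instead of hand-placed SYNC_POINT() calls, the existing
primitives themselves could be instrumented in a test build; a rough
sketch (USE_SYNC_POINTS and SYNC_POINT_HOOK are made-up names, not
existing macros):

/* test builds only: every spinlock acquisition becomes an implicit sync point */
#ifdef USE_SYNC_POINTS
#define SpinLockAcquire(lock) \
    do { \
        SYNC_POINT_HOOK("spinlock_acquire", (lock)); \
        S_LOCK(lock); \
    } while (0)
#endif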

And likewise explicitly listing the timing sequences to test seems
unconvincing. If we could arrange for two threads to execute every
possible interleaving of code by exhaustively trying every combination,
that would be far more convincing. Most bugs are likely to hang out in
combinations we don't see in practice -- for instance having a tuple
deleted and a new one inserted in the same slot in the time a
different transaction was context switched out.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-03-26 Thread Bruce Momjian
On Fri, Mar 22, 2013 at 06:06:18PM +, Greg Stark wrote:
 On Fri, Mar 22, 2013 at 2:02 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  And we definitely looked at ARC
 
 We didn't just look at it. At least one release used it. Then patent
 issues were raised (and I think the implementation had some contention
 problems).

The problem was cache line overhead between CPUs to manage the ARC
queues.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-03-26 Thread Bruce Momjian
On Fri, Mar 22, 2013 at 04:16:18PM -0400, Tom Lane wrote:
 Merlin Moncure mmonc...@gmail.com writes:
  I think there is some very low hanging optimization fruit in the clock
  sweep loop.   first and foremost, I see no good reason why when
  scanning pages we have to spin and wait on a buffer in order to
  pedantically adjust usage_count.  some simple refactoring there could
  set it up so that a simple TAS (or even a TTAS with the first test in
   front of the cache line lock, as is done automatically on x86 IIRC)
  could guard the buffer and, in the event of any lock detected, simply
  move on to the next candidate without messing around with that buffer
   at all.   This could be construed as a 'trylock' variant of a spinlock
  and might help out with cases where an especially hot buffer is
  locking up the sweep.  This is exploiting the fact that from
  StrategyGetBuffer we don't need a *particular* buffer, just *a*
  buffer.
 
 Hm.  You could argue in fact that if there's contention for the buffer
 header, that's proof that it's busy and shouldn't have its usage count
 decremented.  So this seems okay from a logical standpoint.
 
 However, I'm not real sure that it's possible to do a conditional
 spinlock acquire that doesn't create just as much hardware-level
 contention as a full acquire (ie, TAS is about as bad whether it
 gets the lock or not).  So the actual benefit is a bit less clear.

Could we view the usage count, and if it is 5, and if there is someone
holding the lock, we just ignore the buffer and move on to the next
buffer?  Seems that could be done with no locking.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-03-26 Thread Merlin Moncure
On Tue, Mar 26, 2013 at 11:40 AM, Bruce Momjian br...@momjian.us wrote:
 On Fri, Mar 22, 2013 at 04:16:18PM -0400, Tom Lane wrote:
 Merlin Moncure mmonc...@gmail.com writes:
  I think there is some very low hanging optimization fruit in the clock
  sweep loop.   first and foremost, I see no good reason why when
  scanning pages we have to spin and wait on a buffer in order to
  pedantically adjust usage_count.  some simple refactoring there could
  set it up so that a simple TAS (or even a TTAS with the first test in
  front of the cache line lock, as is done automatically on x86 IIRC)
  could guard the buffer and, in the event of any lock detected, simply
  move on to the next candidate without messing around with that buffer
  at all.   This could be construed as a 'trylock' variant of a spinlock
  and might help out with cases where an especially hot buffer is
  locking up the sweep.  This is exploiting the fact that from
  StrategyGetBuffer we don't need a *particular* buffer, just *a*
  buffer.

 Hm.  You could argue in fact that if there's contention for the buffer
 header, that's proof that it's busy and shouldn't have its usage count
 decremented.  So this seems okay from a logical standpoint.

 However, I'm not real sure that it's possible to do a conditional
 spinlock acquire that doesn't create just as much hardware-level
 contention as a full acquire (ie, TAS is about as bad whether it
 gets the lock or not).  So the actual benefit is a bit less clear.

 Could we view the usage count, and if it is 5, and if there is someone
 holding the lock, we just ignore the buffer and move on to the next
 buffer?  Seems that could be done with no locking.

The idea is that if someone is "holding" the lock, we completely ignore
the buffer regardless of usage.  Quotes there because we test the lock
without the cache line lock.  Now if the buffer is apparently unlocked but
returns locked once you *do* acquire cache line lock in anticipation
of refcounting, again immediately bail and go to next buffer.

I see no reason whatsoever to have the buffer allocator spin and wait on a
blocked buffer.  This is like jumping onto a merry-go-round being spun
by sadistic teenagers.
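
A rough sketch of that sweep (buf_header_trylock(), buf_header_unlock(),
clock_sweep_next() and get_victim_buffer_trylock() are made-up names, not
existing functions; the real loop lives in StrategyGetBuffer()):

static BufferDesc *
get_victim_buffer_trylock(void)
{
    for (;;)
    {
        BufferDesc *buf = clock_sweep_next();   /* advance the clock hand */

        if (!buf_header_trylock(buf))
            continue;           /* header contended: buffer is busy, skip it */

        if (buf->refcount == 0 && buf->usage_count == 0)
            return buf;         /* victim found; header lock still held */

        if (buf->usage_count > 0)
            buf->usage_count--; /* the usual clock-sweep decrement */

        buf_header_unlock(buf);
    }
}

The point is simply that a contended header is never spun on; it gets
treated the same as a busy buffer and skipped.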

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay

2013-03-26 Thread Bruce Momjian
On Tue, Mar 26, 2013 at 03:06:30PM +, Simon Riggs wrote:
  More generally, the fact that a patch has some user-frobbable knob
  does not mean that it's actually a good or even usable solution.  As
  everybody keeps saying, testing on a wide range of use-cases would be
  needed to prove that, and we don't have enough time left for such
  testing in the 9.3 timeframe.  This problem needs to be attacked in
  an organized and deliberate fashion, not by hacking something up under
  time pressure and shipping it with minimal testing.
 
 Well, it has been tackled like that and we've *all* got nowhere. No
 worries, I can wait a year for that beer.

This was the obvious result of this discussion --- it is a shame we had
to discuss this rather than working on more pressing 9.3 issues.  I also
think someone saying "I would like to apply this now" is more disruptive
than casual discussion about things like buffer count locking.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] adding support for zero-attribute unique/etc keys

2013-03-26 Thread Darren Duncan

On 2013.03.26 1:40 AM, Albe Laurenz wrote:

Darren Duncan wrote:

So, determining if 2 rows are the same involves an iteration of dyadic logical
AND over the predicates for each column comparison.  Now logical AND has an
identity value, which is TRUE, because TRUE AND p (and p AND TRUE) results
in p for all p.  Therefore, any 2 rows with zero columns each are the same.

Since any 2 rows with zero columns are the same, the UNIQUE predicate is FALSE
any time there is more than 1 row in a table.

Does anyone agree or disagree with this logic?


Yes :^)

You could use the same kind of argument like this:

UNIQUE is true iff any two rows in T satisfy for each column:
the column in row 1 is null OR the column in row 2 is null OR
the column in row 1 is distinct from the column in row 2

Now you iterate your logical AND over this predicate
for all columns and come up with TRUE since there are none.
Consequently UNIQUE is satisfied, no matter how many rows there are.

In a nutshell:
All members of the empty set satisfy p, but also:
all members of the empty set satisfy the negation of p.

You can use this technique to make anything plausible.


Consider the context however.  We're talking about a UNIQUE constraint and so 
what we want to do is prevent the existence of multiple tuples in a relation 
that are the same for some defined subset of their attributes.  I would argue 
that logically, and commonsensically, two tuples with no attributes are the 
same, and hence a set of distinct tuples having zero attributes could have no 
more than one member, and so a UNIQUE constraint over zero attributes would say 
the relation can't have more than one tuple.  So unless someone wants to argue 
that two tuples with no attributes are not the same, my interpretation makes 
more sense and is clearly the one to follow. -- Darren Duncan




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add PF_PRINTF_ATTRIBUTE to on_exit_msg_fmt.

2013-03-26 Thread Heikki Linnakangas

On 26.03.2013 09:51, Heikki Linnakangas wrote:

On 26.03.2013 02:02, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

On 25.03.2013 15:36, Tom Lane wrote:

Heikki Linnakangas heikki.linnakan...@iki.fi writes:

Add PF_PRINTF_ATTRIBUTE to on_exit_msg_fmt.
Per warning from -Wmissing-format-attribute.



Hm, this is exactly what I removed yesterday, because it makes the
build
fail outright on old gcc:



The attached seems to work. With this patch, on_exit_msg_func() is gone.
There's a different implementation of exit_horribly for pg_dumpall and
pg_dump/restore. In pg_dumpall, it just calls vwrite_msg(). In
pg_dump/restore's version, the logic from parallel_exit_msg_func() is
moved directly to exit_horribly().


Seems probably reasonable, though if we're taking exit_horribly out of
dumputils.c, meseems it ought not be declared in dumputils.h anymore.
Can we put that declaration someplace else, rather than commenting it
with an apology?


Ugh, the patch I posted doesn't actually work, because dumputils.c is
also used in psql and some scripts, so you get a linker error in those.
psql and scripts don't use exit_horribly or many of the other functions
in dumputils.c, so I think we should split dumputils.c into two parts
anyway. fmtId and the other functions that are used by psql in one file,
and the functions that are only shared between pg_dumpall and pg_dump in
another. Then there's also functions that are used by pg_dump and
pg_restore, but not pg_dumpall or psql.

I'll try moving things around a bit...


This is what I came up with. I created a new file, misc.c (for lack of a 
better name), for things that are shared by pg_dump and pg_restore, but 
not pg_dumpall or other programs. I moved all the parallel stuff from 
dumputils.c to parallel.c, and everything else that's not used outside 
pg_dump and pg_restore to misc.c. I moved exit_horribly() to parallel.c, 
because it needs to do things differently in parallel mode.


I still used a function pointer, not for the printf-style message 
printing routine, but for making dumputils.c independent of parallel 
mode. getThreadLocalPQExpBuffer() is now a function pointer; the default 
implementation just uses a static variable, but when pg_dump/restore 
enters parallel mode, it points the function pointer to a version that 
uses thread-local storage (on Windows).
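
Roughly like this, in simplified form (not the patch verbatim;
enter_parallel_mode_hook() is just an illustrative name, and the
thread-local version is omitted):

#include "pqexpbuffer.h"

static PQExpBuffer defaultGetLocalPQExpBuffer(void);
static PQExpBuffer parallelGetLocalPQExpBuffer(void);   /* TLS version, not shown */

/* dumputils.c always calls through this pointer, ignorant of parallelism */
PQExpBuffer (*getThreadLocalPQExpBuffer) (void) = defaultGetLocalPQExpBuffer;

/* default implementation: a single static buffer is fine single-threaded */
static PQExpBuffer
defaultGetLocalPQExpBuffer(void)
{
    static PQExpBuffer buf = NULL;

    if (buf == NULL)
        buf = createPQExpBuffer();
    else
        resetPQExpBuffer(buf);
    return buf;
}

/* called by pg_dump/pg_restore when entering parallel mode */
static void
enter_parallel_mode_hook(void)
{
    getThreadLocalPQExpBuffer = parallelGetLocalPQExpBuffer;
}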


- Heikki
*** a/src/bin/pg_dump/Makefile
--- b/src/bin/pg_dump/Makefile
***
*** 19,25  include $(top_builddir)/src/Makefile.global
  override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
  
  OBJS=	pg_backup_archiver.o pg_backup_db.o pg_backup_custom.o \
! 	pg_backup_null.o pg_backup_tar.o parallel.o \
  	pg_backup_directory.o dumputils.o compress_io.o $(WIN32RES)
  
  KEYWRDOBJS = keywords.o kwlookup.o
--- 19,25 
  override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
  
  OBJS=	pg_backup_archiver.o pg_backup_db.o pg_backup_custom.o \
! 	pg_backup_null.o pg_backup_tar.o parallel.o misc.o \
  	pg_backup_directory.o dumputils.o compress_io.o $(WIN32RES)
  
  KEYWRDOBJS = keywords.o kwlookup.o
*** a/src/bin/pg_dump/common.c
--- b/src/bin/pg_dump/common.c
***
*** 14,19 
--- 14,20 
   *-
   */
  #include pg_backup_archiver.h
+ #include misc.h
  
  #include ctype.h
  
*** a/src/bin/pg_dump/compress_io.c
--- b/src/bin/pg_dump/compress_io.c
***
*** 53,59 
   */
  
  #include compress_io.h
! #include dumputils.h
  #include parallel.h
  
  /*--
--- 53,59 
   */
  
  #include compress_io.h
! #include misc.h
  #include parallel.h
  
  /*--
*** a/src/bin/pg_dump/dumputils.c
--- b/src/bin/pg_dump/dumputils.c
***
*** 25,45 
  extern const ScanKeyword FEScanKeywords[];
  extern const int NumFEScanKeywords;
  
- /* Globals exported by this file */
- int			quote_all_identifiers = 0;
- const char *progname = NULL;
- 
- #define MAX_ON_EXIT_NICELY20
- 
- static struct
- {
- 	on_exit_nicely_callback function;
- 	void	   *arg;
- }	on_exit_nicely_list[MAX_ON_EXIT_NICELY];
- 
- static int	on_exit_nicely_index;
- void		(*on_exit_msg_func) (const char *modulename, const char *fmt, va_list ap) = vwrite_msg;
- 
  #define supports_grant_options(version) ((version) = 70400)
  
  static bool parseAclItem(const char *item, const char *type,
--- 25,30 
***
*** 49,116  static bool parseAclItem(const char *item, const char *type,
  static char *copyAclUserName(PQExpBuffer output, char *input);
  static void AddAcl(PQExpBuffer aclbuf, const char *keyword,
  	   const char *subname);
! static PQExpBuffer getThreadLocalPQExpBuffer(void);
! 
! #ifdef WIN32
! static void shutdown_parallel_dump_utils(int code, void *unused);
! static bool parallel_init_done = false;
! static DWORD tls_index;
! static DWORD mainThreadId;
  
! static void
! shutdown_parallel_dump_utils(int code, void *unused)
! {
! 	/* Call the cleanup function 

Re: Remove invalid indexes from pg_dump Was: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-26 Thread Fujii Masao
On Tue, Mar 19, 2013 at 9:19 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 If failures happen with CREATE INDEX CONCURRENTLY, the system will be left
 with invalid indexes. I don't think that the user would like to see invalid
 indexes of an existing system being recreated as valid after a restore.
 So why not remove invalid indexes from a dump with something like the
 patch attached?

+1

The patch looks good to me.

 This should perhaps be applied in pg_dump for versions down to 8.2, where
 CREATE INDEX CONCURRENTLY was implemented?

I think so.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Limiting setting of hint bits by read-only queries; vacuum_delay

2013-03-26 Thread Kevin Grittner
Merlin Moncure mmonc...@gmail.com wrote:

 *) For off-cycle release work that would help enable patches with
 complex performance trade-offs (I'm working up another patch that has
 even more compelling benefits and risks in the buffer allocator), we
 desperately need a standard battery of comprehensive performance tests
 and donor machines.

Such a thing would vastly reduce the time needed to work on
something like this with confidence that it would not be a disaster
for some unidentified workload.  Sure, something could still slip
though the cracks -- but they would *be* cracks, not a wide gaping
hole.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add PF_PRINTF_ATTRIBUTE to on_exit_msg_fmt.

2013-03-26 Thread Alvaro Herrera
Heikki Linnakangas wrote:

 This is what I came up with. I created a new file, misc.c (for lack
 of a better name), for things that are shared by pg_dump and
 pg_restore, but not pg_dumpall or other programs. I moved all the
 parallel stuff from dumputils.c to parallel.c, and everything else
 that's not used outside pg_dump and pg_restore to misc.c. I moved
 exit_horribly() to parallel.c, because it needs to do things
 differently in parallel mode.

Not happy with misc.c as a filename.  How about pg_dump_utils.c or
pg_dump_misc.c?  I think the comment at the top should explicitly say
that the file is intended not to be linked in pg_dumpall.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-26 Thread Fujii Masao
On Sun, Mar 24, 2013 at 12:37 PM, Michael Paquier
michael.paqu...@gmail.com wrote:


 On Sat, Mar 23, 2013 at 10:20 PM, Andres Freund and...@2ndquadrant.com
 wrote:

 On 2013-03-22 07:38:36 +0900, Michael Paquier wrote:
  Is someone planning to provide additional feedback about this patch at
  some
  point?

 Yes, now that I have returned from my holidays - or well, am returning
 from them, I do plan to. But it should probably get some implementation
 level review from somebody but Fujii and me...

 Yeah, it would be good to have an extra pair of fresh eyes looking at those
 patches.

I probably don't have enough time to review the patch thoroughly. It would be
quite helpful if someone else became another reviewer of this patch.

 Please find new patches realigned with HEAD. There were conflicts with 
 commits done recently.

ISTM you failed to make the patches from your repository.
20130323_1_toastindex_v7.patch contains all the changes of
20130323_2_reindex_concurrently_v25.patch

Regards,

-- 
Fujii Masao




Re: [HACKERS] adding support for zero-attribute unique/etc keys

2013-03-26 Thread Gavin Flower

On 27/03/13 06:14, Darren Duncan wrote:

On 2013.03.26 1:40 AM, Albe Laurenz wrote:

Darren Duncan wrote:
So, determining if 2 rows are the same involves an iteration of dyadic
logical AND over the predicates for each column comparison.  Now logical
AND has an identity value, which is TRUE, because TRUE AND p (and p AND
TRUE) results in p for all p.  Therefore, any 2 rows with zero columns
each are the same.

Since any 2 rows with zero columns are the same, the UNIQUE predicate is
FALSE any time there is more than 1 row in a table.

Does anyone agree or disagree with this logic?


Yes :^)

You could use the same kind of argument like this:

UNIQUE is true iff any two rows in T satisfy for each column:
the column in row 1 is null OR the column in row 2 is null OR
the column in row 1 is distinct from the column in row 2

Now you iterate your logical AND over this predicate
for all columns and come up with TRUE since there are none.
Consequently UNIQUE is satisfied, no matter how many rows there are.

In a nutshell:
All members of the empty set satisfy p, but also:
all members of the empty set satisfy the negation of p.

You can use this technique to make anything plausible.


Consider the context however.  We're talking about a UNIQUE constraint 
and so what we want to do is prevent the existence of multiple tuples 
in a relation that are the same for some defined subset of their 
attributes.  I would argue that logically, and commonsensically, two 
tuples with no attributes are the same, and hence a set of distinct 
tuples having zero attributes could have no more than one member, and 
so a UNIQUE constraint over zero attributes would say the relation 
can't have more than one tuple. So unless someone wants to argue that 
two tuples with no attributes are not the same, my interpretation 
makes more sense and is clearly the one to follow.

-- Darren Duncan




Hmm as a user, I would like at most one row with empty fields covered by 
a unique index.


Logical arguments to the contrary, remind me of the joke of the school 
boy who told his unlearned father that he had learnt logic and could 
prove that his father actually had 3 fish in his basket despite both 
seeing only 2 fish.  His unlearned father did not try to argue, and 
simply said: well, your mother can have the first fish, I'll have the 
second, and you, my learned son, can have the third...





[HACKERS] spoonbill vs. -HEAD

2013-03-26 Thread Stefan Kaltenbrunner
Hi all!


I finally started to investigate why spoonbill stopped reporting to the
buildfarm feedback about 2 months ago.
It seems that the foreign-keys locking patch (or something committed very
close to January 23rd) broke it in a fairly annoying way - running the
buildfarm script seems to consistently stall during the isolationtester
part of the regression testing, leaving the postgresql instance running
and causing all future buildfarm runs to fail...


The process listing at that time looks like:

https://www.kaltenbrunner.cc/files/process_listing.txt


pg_stats_activity of the running instance:


https://www.kaltenbrunner.cc/files/pg_stat_activity.txt


pg_locks:

https://www.kaltenbrunner.cc/files/pg_locks.txt


backtraces of the three backends:

https://www.kaltenbrunner.cc/files/bt_20467.txt
https://www.kaltenbrunner.cc/files/bt_20897.txt
https://www.kaltenbrunner.cc/files/bt_24285.txt




Stefan




Re: [HACKERS] sql_drop Event Triggerg

2013-03-26 Thread Alvaro Herrera
Robert Haas wrote:
 On Wed, Mar 20, 2013 at 5:42 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
  Here's a new version of this patch, rebased on top of the new
  pg_identify_object() stuff.  Note that the regression test doesn't work
  yet, because I didn't adjust to the new identity output definition (the
  docs need work, too).  But that's a simple change to do.  I'm leaving
  that for later.
 
 I think this is getting there.  A few things to think about:

Thanks.

 - pg_event_trigger_dropped_objects seems to assume that
 currentEventTriggerState will be pointing to the same list on every
 call.  But is that necessarily true?  I'm thinking about a case where
 someone opens a cursor in an event trigger and then tries to read from
 that cursor later in the transaction.  I think you might be able to
 crash the server that way.

Well, no, because it uses materialized return mode, so there's no next
time --- the list is constructed completely before
pg_event_trigger_dropped_objects returns.  So there's no such hole.

 - I am not wild about the idea of propagating PG_TRY/PG_CATCH blocks
 into yet more places.  On Linux-x86 they are pretty cheap because
 Linux doesn't need a system call to change the signal mask and x86 has
 few registers that must be saved-and-restored, but elsewhere this can
 be a performance problem.  Now maybe ProcessUtility is not a
 sufficiently-frequently called function for this to matter... but I'm
 not sure.  The alternative is to teach the error handling pathways
 about this in somewhat greater detail, since the point of TRY/CATCH is
 to cleanup things that the regular error handling stuff doesn't now
 about.

I tried this and it doesn't work.  The error pathways you speak about
would be the xact.c entry points to commit and abort transactions;
however, there's a problem with that because some of the commands that
ProcessUtility() deals with have their own transaction management
calls internally; so those would call CommitTransaction() and the
event trigger state would go away, and then when control gets back to
ProcessUtility there would be nothing to clean up.  I think we could
ignore the problem, or install smarts in ProcessUtility to avoid calling
event_trigger.c when one of those commands is involved -- but this seems
to me a solution worse than the problem.  So all in all I feel like the
macro on top of PG_TRY is the way to go.

Now there *is* one rather big performance problem in this patch, which
is that it turns on collection of object dropped data regardless of
there being event triggers that use the info at all.  That's a serious
drawback and we're going to get complaints about it.  So we need to do
something to fix that.

One idea that comes to mind is to add some more things to the grammar,
CREATE EVENT TRIGGER name ... WITH ('DROPPED OBJECTS');
or some such, so that when events happen for which any triggers have
that flag enabled, *then* collecting is activated, otherwise not.  This
would be stored in a new column in pg_event_trigger (say evtsupport, a
char array much like proargmodes).  

The sequence of (ahem) events goes like this:

ProcessUtility()
  EventTriggerBeginCompleteQuery()
  EventTriggerDDLCommandStart()
EventCacheLookup()
EventTriggerInvoke()
  .. run whatever command we've been handed ...
  EventTriggerDDLCommandEnd()
EventCacheLookup()
EventTriggerInvoke()
  EventTriggerEndCompleteQuery()

So EventTriggerBeginCompleteQuery() will have to peek into the event
trigger cache for any ddl_command_end triggers that might apply, and see
if any of them has the flag for dropped objects.  If it's there, then
enable dropped object collection.
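
To make that concrete, a rough sketch of that peek is below.  EventCacheLookup()
and EVT_DDLCommandEnd are the existing event-trigger cache interfaces, but
item_wants_dropped_objects() is a hypothetical helper standing in for whatever
representation of evtsupport we end up with; this is an illustration, not
working code.

#include "postgres.h"
#include "nodes/pg_list.h"
#include "utils/evtcache.h"

/*
 * Sketch only: return true if any ddl_command_end trigger has asked for
 * dropped-object information, so the caller can enable collection.
 */
static bool
any_trigger_wants_dropped_objects(void)
{
    List       *cachelist = EventCacheLookup(EVT_DDLCommandEnd);
    ListCell   *lc;

    foreach(lc, cachelist)
    {
        EventTriggerCacheItem *item = (EventTriggerCacheItem *) lfirst(lc);

        if (item_wants_dropped_objects(item))   /* hypothetical evtsupport test */
            return true;
    }
    return false;
}

EventTriggerBeginCompleteQuery() would call something like this and only switch
on dropped-object collection when it returns true.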

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] sql_drop Event Triggerg

2013-03-26 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Now there *is* one rather big performance problem in this patch, which
 is that it turns on collection of object dropped data regardless of
 there being event triggers that use the info at all.  That's a serious
 drawback and we're going to get complaints about it.  So we need to do
 something to fix that.

 One idea that comes to mind is to add some more things to the grammar,
 CREATE EVENT TRIGGER name ... WITH ('DROPPED OBJECTS');

Uh ... surely we can just notice whether there's a trigger on the
object-drop event?  I don't understand why we need *yet more*
mechanism here.

regards, tom lane




Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-03-26 Thread Josh Berkus

 I expect to lose this argument, but I think this is a terrible idea.
 Users really hate it when they try to upgrade and find that they, uh,
 can't, because of some application-level incompatibility like this.
 They hate it twice as much when the change is essentially cosmetic.
 There's no functional problems with arrays as they exist today that
 this change would solve.

Sure there is.   How do you distinguish between an array which is NULL
and an array which is empty?

Also, the whole array_dims is NULL thing trips up pretty much every
single PG user who uses arrays for the first time.  I'd expect when we
announce the fix, we'll find that many users were doing the wrong thing
in the first place and didn't know why it wasn't working.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] plpgsql_check_function - rebase for 9.3

2013-03-26 Thread Pavel Stehule
Hello all

2013/3/26 Tom Lane t...@sss.pgh.pa.us:
 Josh Berkus j...@agliodbs.com writes:
 Where are we with this patch?  I'm a bit unclear from the discussion on
 whether it passes muster or not.  Things seem to have petered out.

 I took another look at this patch tonight.  I think the thing that is
 bothering everybody (including Pavel) is that as submitted, pl_check.c
 involves a huge amount of duplication of knowledge and code from
 pl_exec.c, and to a lesser extent pl_comp.c.  It certainly looks like a
 maintenance disaster in the making.  It doesn't bother me so much that
 pl_check.c knows about each syntactic structure in plpgsql: there are
 already four or five places you have to touch when adding new syntax.
 Rather, it's that there's so much duplication of knowledge about
 semantic details, which is largely expressed by copied-and-pasted code
 from pl_exec.c.  It seems like a safe bet that we'll sometimes miss the
 need to fix pl_check.c when we fix something in pl_exec.c.  There are
 annoying duplications from pl_comp.c as well, eg knowledge of exactly
 which magic variables are defined in trigger functions.

 Having said all that, it's not clear that we can really do any better.
 The only obvious alternative is pushing support for a checking mode
 directly into pl_exec.c, which would obfuscate and slow down that code
 to an unacceptable degree if Pavel's results at
 http://www.postgresql.org/message-id/cafj8prakujmvjpjzfsrye7+ub8jf8wtz5rkxk-0ykxme-k8...@mail.gmail.com
 are any indication.  (In that message, Pavel proposes shoveling the
 problem into the compile code instead, but that seems unlikely to work
 IMO: the main problem in pl_check.c as it stands is duplication of code
 from pl_exec.c not pl_comp.c.  So I think that path could only lead to
 duplicating the same code into pl_comp.c.)

 So question 1 is do we want to accept that this is the implementation
 pathway we're going to settle for, or are we going to hold out for a
 better one?  I'd be the first in line to hold out if I had a clue how
 to move towards a better implementation, but I have none.  Are we
 prepared to reject this type of feature entirely because of the
 code-duplication problem?  That doesn't seem attractive either.

I wrote a lot of versions, and the proposed code is redundant, but it is
the most simple and clean.

I am really against pushing the check into pl_exec, because it
significantly decreases code readability and increases the bottleneck in
CPU-intensive tests. Moreover, there is too little room for future
features - performance advising, more verbose error messages, etc.

In PL/pgPSM I used a slightly different architecture - necessary for PSM
and maybe better for PL/pgSQL too. It is a two-stage compiler - parsing
into an AST, then AST compilation. It simplifies gram.y, and the
processing order depends on depth-first iteration of the AST rather than
on bison rules. It can probably have an impact on the speed of very large
procedures - I don't see other disadvantages. With this architecture I
was able to do a lot of checks at compile stage without problems.

Most of the complexity in the current code is related to detecting record
types from expressions without evaluating the expressions. Maybe this
code could live in core or in pl_comp.c. The code for the Describe (F)
message is not very reusable. It is necessary for

DECLARE r RECORD;
FOR r IN SELECT ...
LOOP
   RAISE NOTICE '%', r.xx;
END LOOP;


 But, even granting that this implementation approach is acceptable,
 the patch does not seem close to being committable as-is: there's
 a lot of fit-and-finish work yet to be done.  To make my point I will
 just quote from one of the regression test additions:

 create or replace function f1()
 returns void as $$
 declare a1 int; a2 int;
 begin
   select 10,20 into a1;
 end;
 $$ language plpgsql;
 -- raise warning
 select plpgsql_check_function('f1()');
  plpgsql_check_function
 -
  warning:0:4:SQL statement:too many attributies for target variables
  Detail: There are less target variables than output columns in query.
  Hint: Check target variables in SELECT INTO statement
 (3 rows)

 Do we like this output format?  I don't.  The unlabeled, undocumented
 fields crammed into a single line with colon separators are neither
 readable nor useful.  If we actually need these fields, why aren't we
 splitting the output into multiple columns?  (I'm also wondering why
 the patch bothers with an option to emit this same info in XML.  Surely
 there is vanishingly small use-case for mechanical examination of this
 output.)

This format can be reduced, redesigned, or changed. It is designed like
gcc output and optimized for use from the psql console.

I tested table output - in the original CHECK statement implementation -
but it is not very friendly for showing on a monitor; it is just too
wide. There are similar arguments as for using tabular output for
EXPLAIN, although there is higher complexity and nested 

Re: [HACKERS] sql_drop Event Triggerg

2013-03-26 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
  Now there *is* one rather big performance problem in this patch, which
  is that it turns on collection of object dropped data regardless of
  there being event triggers that use the info at all.  That's a serious
  drawback and we're going to get complaints about it.  So we need to do
  something to fix that.
 
  One idea that comes to mind is to add some more things to the grammar,
  CREATE EVENT TRIGGER name ... WITH ('DROPPED OBJECTS');
 
 Uh ... surely we can just notice whether there's a trigger on the
 object-drop event?  I don't understand why we need *yet more*
 mechanism here.

There's no object-drop event, only ddl_command_end.  From previous
discussion I understood we didn't want a separate event, so that's what
we've been running with.

However, I think previous discussions have conflated many different
things, and we've been slowly fixing them one by one; so perhaps at this
point it does make sense to have a new object-drop event.  Let's discuss
it -- we would define it as taking place just before ddl_command_end,
and firing any time a command (with matching tag?) has called
performDeletion or performMultipleDeletions.  Does that sound okay?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] sql_drop Event Triggerg

2013-03-26 Thread Robert Haas
On Tue, Mar 26, 2013 at 3:02 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 I tried this and it doesn't work.  The error pathways you speak about
 would be the xact.c entry points to commit and abort transactions;
 however, there's a problem with that because some of the commands that
 ProcessUtility() deals with have their own transaction management
 calls internally; so those would call CommitTransaction() and the
 event trigger state would go away, and then when control gets back to
 ProcessUtility there would be nothing to clean up.  I think we could
 ignore the problem, or install smarts in ProcessUtility to avoid calling
 event_trigger.c when one of those commands is involved -- but this seems
 to me a solution worse than the problem.  So all in all I feel like the
 macro on top of PG_TRY is the way to go.

I see.  :-(

 Now there *is* one rather big performance problem in this patch, which
 is that it turns on collection of object dropped data regardless of
 there being event triggers that use the info at all.  That's a serious
 drawback and we're going to get complaints about it.  So we need to do
 something to fix that.

Really?  Who is going to care about that?  Surely that overhead is
quite trivial.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] spoonbill vs. -HEAD

2013-03-26 Thread Tom Lane
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
 I finally started to investigate why spoonbill stopped reporting to the
 buildfarm feedback about 2 months ago.
 It seems that the foreign-keys locking patch (or something commity very
 close to January 23th) broke it in a fairly annoying way - running the
 buildfarm script seems to
 consistently stall during the isolationtester part of the regression
 testing leaving the postgresql instance running causing all future
 buildfarm runs to fail...

It looks from here like the isolationtester client is what's dropping
the ball --- the backend states are unsurprising, and two of them are
waiting for a new client command.  Can you get a stack trace from the
isolationtester process?

regards, tom lane




Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-03-26 Thread Robert Haas
On Tue, Mar 26, 2013 at 9:02 AM, Brendan Jurd dire...@gmail.com wrote:
 On 26 March 2013 22:57, Robert Haas robertmh...@gmail.com wrote:
 They hate it twice as much when the change is essentially cosmetic.
 There's no functional problems with arrays as they exist today that
 this change would solve.

 We can't sensibly test for whether an array is empty.  I'd call that a
 functional problem.

Sure you can.  Equality comparisons work just fine.

rhaas=# select '{}'::int4[] = '{}'::int4[];
 ?column?
--
 t
(1 row)

rhaas=# select '{}'::int4[] = '{1}'::int4[];
 ?column?
--
 f
(1 row)

 The NULL return from array_{length,lower,upper,ndims} is those
 functions' way of saying their arguments failed a sanity check.  So we
 cannot distinguish in a disciplined way between a valid, empty array,
 and bad arguments.  If the zero-D implementation had been more
 polished and say, array_ndims returned zero, we had provided an
 array_empty function, or the existing functions threw errors for silly
 arguments instead of returning NULL, then I'd be more inclined to see
 your point.  But as it stands, the zero-D implementation has always
 been half-baked and slightly broken, we just got used to working
 around it.

Well, you could easily change array_ndims() to error out if ARR_NDIM()
is negative or more than MAXDIM and return NULL only if it's exactly
0.  That wouldn't break backward compatibility because it would throw
an error only if fed a value that shouldn't ever exist in the first
place, short of a corrupted database.  I imagine the other functions
are amenable to similar treatment.

And if neither that nor just comparing against an empty array literal
floats your boat, adding an array_is_empty() function would let you
test for this condition without breaking backward compatibility, too.
That's overkill, I think, but it would work.
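
For illustration, a minimal sketch of that first option follows.  The function
already exists in the backend; the ereport() branch is the extra strictness being
suggested here, not current or committed behavior.

#include "postgres.h"
#include "fmgr.h"
#include "utils/array.h"

Datum
array_ndims(PG_FUNCTION_ARGS)
{
    ArrayType  *v = PG_GETARG_ARRAYTYPE_P(0);

    /* values outside 0..MAXDIM cannot occur short of corruption: complain */
    if (ARR_NDIM(v) < 0 || ARR_NDIM(v) > MAXDIM)
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg("invalid number of array dimensions: %d", ARR_NDIM(v))));

    /* a zero-D (empty) array still returns NULL, exactly as today */
    if (ARR_NDIM(v) == 0)
        PG_RETURN_NULL();

    PG_RETURN_INT32(ARR_NDIM(v));
}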

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] GSoC project : K-medoids clustering in Madlib

2013-03-26 Thread viod
Hello!

I'm an IT student, and I would like to apply for the 2013 GSoC.
I've been looking at this mailing list for a while now, and I saw a
suggestion for GSoC that particularly interested me: implementing the
K-medoids clustering in Madlib, as it is supposed to be more efficient than
the K-means algorithm.

I didn't know about these algorithms before, but I have documented myself,
and it looks quite interesting to me, and even more as I currently have
lessons (but very very simplified unfortunately).

I've got a few questions:
Won't this be a quite short project? I can't get an idea of how long it
would take me to implement this algorithm in a way that would be usable by
postgresql, but 3 months looks long for this task, doesn't it?

Someone on the IRC channel (can't remember who, sorry) told me it was used
in the KNN index. I guess this is used by pg_trgm, but are there other
modules using it currently?
And could you please give me some links explaining the internals of this
index? I've been through several articles presenting it, but none very
satisfying.

Thanks a lot in advance!


Re: [HACKERS] sql_drop Event Triggerg

2013-03-26 Thread Alvaro Herrera
Robert Haas wrote:
 On Tue, Mar 26, 2013 at 3:02 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:

  Now there *is* one rather big performance problem in this patch, which
  is that it turns on collection of object dropped data regardless of
  there being event triggers that use the info at all.  That's a serious
  drawback and we're going to get complaints about it.  So we need to do
  something to fix that.
 
 Really?  Who is going to care about that?  Surely that overhead is
 quite trivial.

I don't think it is, because it involves syscache lookups for each
object being dropped, many extra pallocs, etc.  Surely that's many times
bigger than the PG_TRY overhead you were worried about.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] spoonbill vs. -HEAD

2013-03-26 Thread Stefan Kaltenbrunner
On 03/26/2013 08:45 PM, Tom Lane wrote:
 Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
 I finally started to investigate why spoonbill stopped reporting to the
 buildfarm feedback about 2 months ago.
 It seems that the foreign-keys locking patch (or something commity very
 close to January 23th) broke it in a fairly annoying way - running the
 buildfarm script seems to
 consistently stall during the isolationtester part of the regression
 testing leaving the postgresql instance running causing all future
 buildfarm runs to fail...
 
 It looks from here like the isolationtester client is what's dropping
 the ball --- the backend states are unsurprising, and two of them are
 waiting for a new client command.  Can you get a stack trace from the
 isolationtester process?


https://www.kaltenbrunner.cc/files/isolationtester.txt


Stefan




Re: [HACKERS] [COMMITTERS] pgsql: Add PF_PRINTF_ATTRIBUTE to on_exit_msg_fmt.

2013-03-26 Thread Kevin Grittner
Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 Not happy with misc.c as a filename.

We already have two misc.c files:

src/backend/utils/adt/misc.c
src/interfaces/ecpg/ecpglib/misc.c

I much prefer not to repeat the same filename in different
directories if we can avoid it.

 How about pg_dump_utils.c or pg_dump_misc.c?

Those seem reasonable to me.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Add PF_PRINTF_ATTRIBUTE to on_exit_msg_fmt.

2013-03-26 Thread Andres Freund
On 2013-03-26 13:14:53 -0700, Kevin Grittner wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 
  Not happy with misc.c as a filename.
 
 We already have two misc.c files:
 
 src/backend/utils/adt/misc.c
 src/interfaces/ecpg/ecpglib/misc.c
 
 I much prefer not to repeat the same filename in different
 directories if we can avoid it.
 
  How about pg_dump_utils.c or pg_dump_misc.c?
 
 Those seem reasonable to me.

I vote against including pg_ in the filename for an implementation-private
file; that seems duplicative.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Drastic performance loss in assert-enabled build in HEAD

2013-03-26 Thread Tom Lane
Using HEAD's pg_dump, I see pg_dump -s regression taking 5 seconds.
On the other hand, running the same executable against the regression
database on a 9.2 postmaster takes 1.2 seconds.  Looks to me like we
broke something performance-wise.

A quick check with oprofile says it's all AllocSetCheck's fault:

samples  %image name   symbol name
883.6059  postgres AllocSetCheck
1140  1.0858  postgres base_yyparse
918   0.8744  postgres AllocSetAlloc
778   0.7410  postgres SearchCatCache
406   0.3867  postgres pg_strtok
394   0.3753  postgres hash_search_with_hash_value
387   0.3686  postgres core_yylex
373   0.3553  postgres MemoryContextCheck
256   0.2438  postgres nocachegetattr
231   0.2200  postgres ScanKeywordLookup
207   0.1972  postgres palloc

So maybe I'm nuts to care about the performance of an assert-enabled
backend, but I don't really find a 4X runtime degradation acceptable,
even for development work.  Does anyone want to fess up to having caused
this, or do I need to start tracking down what changed?

regards, tom lane




Re: [HACKERS] spoonbill vs. -HEAD

2013-03-26 Thread Tom Lane
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
 On 03/26/2013 08:45 PM, Tom Lane wrote:
 It looks from here like the isolationtester client is what's dropping
 the ball --- the backend states are unsurprising, and two of them are
 waiting for a new client command.  Can you get a stack trace from the
 isolationtester process?

 https://www.kaltenbrunner.cc/files/isolationtester.txt

Hmm ... isolationtester.c:584 is in the code that tries to cancel the
current permutation (test case) after realizing that it's constructed
an invalid permutation.  It looks like the preceding PQcancel() failed
for some reason, since the waiting backend is still waiting.  The
isolationtester code doesn't bother to check for an error result there,
which is kinda bad, not that it's clear what it could do about it.
Could you look at the contents of the local variable buf in the
run_permutation stack frame?  Or else try modifying the code along the
lines of

-PQcancel(cancel, buf, sizeof(buf));
+if (!PQcancel(cancel, buf, sizeof(buf)))
+  fprintf(stderr, "PQcancel failed: %s\n", buf);

and see if it prints anything interesting before hanging up.
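
For reference, a slightly fuller sketch of that error handling is below;
PQgetCancel(), PQcancel() and PQfreeCancel() are the standard libpq calls, but the
function and variable names here are made up for the example rather than taken
from isolationtester.c.

#include <stdio.h>
#include <libpq-fe.h>

/* attempt to cancel the query running on "conn", reporting any failure */
static void
cancel_with_report(PGconn *conn)
{
    PGcancel   *cancel = PQgetCancel(conn);
    char        errbuf[256];

    if (cancel == NULL)
    {
        fprintf(stderr, "could not allocate cancel request\n");
        return;
    }

    /* PQcancel returns 1 on success, 0 on failure with a message in errbuf */
    if (!PQcancel(cancel, errbuf, sizeof(errbuf)))
        fprintf(stderr, "PQcancel failed: %s\n", errbuf);

    PQfreeCancel(cancel);
}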

regards, tom lane




Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-03-26 Thread Brendan Jurd
On 27 March 2013 06:47, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Mar 26, 2013 at 9:02 AM, Brendan Jurd dire...@gmail.com wrote:
 We can't sensibly test for whether an array is empty.  I'd call that a
 functional problem.

 Sure you can.  Equality comparisons work just fine.

 rhaas=# select '{}'::int4[] = '{}'::int4[];

The good news is, if anybody out there is using that idiom to test for
emptiness, they will not be disrupted by the change.

Cheers,
BJ




Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-03-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Well, you could easily change array_ndims() to error out if ARR_NDIM()
 is negative or more than MAXDIM and return NULL only if it's exactly
 0.  That wouldn't break backward compatibility because it would throw
 an error only if fed a value that shouldn't ever exist in the first
 place, short of a corrupted database.  I imagine the other functions
 are amenable to similar treatment.

I haven't looked at the patch in detail, but I thought one of the key
changes was that '{}' would now be interpreted as a zero-length 1-D
array rather than a zero-D array.  If we do that it seems a bit moot
to argue about whether we should exactly preserve backwards-compatible
behavior in array_ndims(), because the input it's looking at won't be
the same anymore anyway.

In any case, the entire point of this proposal is that the current
behavior around zero-D arrays is *broken* and we don't want to be
backwards-compatible with it anymore.  So if you wish to argue against
that opinion, do so; but it seems a bit beside the point to simply
complain that backwards compatibility is being lost.

regards, tom lane




Re: [HACKERS] spoonbill vs. -HEAD

2013-03-26 Thread Andrew Dunstan


On 03/26/2013 02:50 PM, Stefan Kaltenbrunner wrote:

Hi all!


I finally started to investigate why spoonbill stopped reporting to the
buildfarm feedback about 2 months ago.
It seems that the foreign-keys locking patch (or something commity very
close to January 23th) broke it in a fairly annoying way - running the
buildfarm script seems to
consistently stall during the isolationtester part of the regression
testing leaving the postgresql instance running causing all future
buildfarm runs to fail...




There is some timeout code already in the buildfarm client. It was 
originally put there to help us when we got CVS hangs, a not infrequent 
occurrence in the early days, so it's currently only used if configured 
for the checkout phase, but it could easily be used to create a build 
timeout which would kill the whole process group if the timeout expired. 
It wouldn't work on Windows, and of course it won't solve whatever 
problem caused the hang in the first place, but it still might be worth 
doing.
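
The buildfarm client is Perl, but as a hedged illustration of the underlying
mechanism (run the build in its own process group, and kill the whole group when
the timeout expires), a POSIX C sketch might look like this:

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a command with a timeout; if it expires, kill its whole process group. */
static int
run_with_timeout(char *const argv[], unsigned int timeout_secs)
{
    pid_t       child = fork();
    unsigned int waited = 0;

    if (child < 0)
        return -1;

    if (child == 0)
    {
        setpgid(0, 0);          /* child becomes leader of its own process group */
        execvp(argv[0], argv);
        _exit(127);             /* exec failed */
    }

    setpgid(child, child);      /* set it from the parent too, to close the race */

    for (;;)
    {
        int         status;

        if (waitpid(child, &status, WNOHANG) == child)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;

        if (waited++ >= timeout_secs)
        {
            kill(-child, SIGKILL);      /* negative pid: signal the whole group */
            waitpid(child, &status, 0);
            return -1;
        }
        sleep(1);
    }
}

Signalling the group (negative pid) is what catches children of children, which a
plain kill of the top-level build process would miss.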


cheers

andrew





Re: [HACKERS] patch to add \watch to psql

2013-03-26 Thread Peter Eisentraut
On 3/24/13 3:10 PM, Tom Lane wrote:
 I also concur with the complaint here
 http://www.postgresql.org/message-id/caazkufzxyj-rt1aec6s0g7zm68tdlfbbm1r6hgrbbxnz80k...@mail.gmail.com
 that allowing a minimum sleep of 0 is rather dangerous

The original watch command apparently silently corrects a delay of 0
to 0.1 seconds.
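
If psql adopted the same behavior, the clamp itself would be trivial; a hedged
sketch (hypothetical names, not psql's actual code):

#include <stdlib.h>

/*
 * Sketch only: parse the \watch argument and silently enforce a 0.1 second
 * minimum, the way watch(1) reportedly does (watch also defaults to 2 seconds).
 */
static double
parse_watch_interval(const char *arg)
{
    double      sleep_secs = arg ? atof(arg) : 2.0;

    if (sleep_secs < 0.1)
        sleep_secs = 0.1;
    return sleep_secs;
}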

 Another minor question is whether we really need the time-of-day in the
 banner,

That's also part of the original watch and occasionally useful, I think.





Re: [HACKERS] spoonbill vs. -HEAD

2013-03-26 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 There is some timeout code already in the buildfarm client. It was 
 originally put there to help us when we got CVS hangs, a not infrequent 
 occurrence in the early days, so it's currently only used if configured 
 for the checkout phase, but it could easily be used to create a build 
 timeout which would kill the whole process group if the timeout expired. 
 It wouldn't work on Windows, and of course it won't solve whatever 
 problem caused the hang in the first place, but it still might be worth 
 doing.

+1 --- at least then we'd get reports of failures, rather than the
current behavior where the animal just stops reporting.

regards, tom lane




Re: [HACKERS] Drastic performance loss in assert-enabled build in HEAD

2013-03-26 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:

 Using HEAD's pg_dump, I see pg_dump -s regression taking 5
 seconds.
 On the other hand, running the same executable against the regression
 database on a 9.2 postmaster takes 1.2 seconds.  Looks to me like we
 broke something performance-wise.

 A quick check with oprofile says it's all AllocSetCheck's fault:

 samples  %    image name  symbol name
 8    83.6059  postgres    AllocSetCheck
 1140  1.0858  postgres    base_yyparse
 918  0.8744  postgres    AllocSetAlloc
 778  0.7410  postgres    SearchCatCache
 406  0.3867  postgres    pg_strtok
 394  0.3753  postgres    hash_search_with_hash_value
 387  0.3686  postgres    core_yylex
 373  0.3553  postgres    MemoryContextCheck
 256  0.2438  postgres    nocachegetattr
 231  0.2200  postgres    ScanKeywordLookup
 207  0.1972  postgres    palloc

 So maybe I'm nuts to care about the performance of an assert-enabled
 backend, but I don't really find a 4X runtime degradation acceptable,
 even for development work.  Does anyone want to fess up to having caused
 this, or do I need to start tracking down what changed?

I checked master HEAD for a dump of regression and got about 4
seconds.  I checked right after my initial push of matview code and
got 2.5 seconds.  I checked just before that and got 1 second. 
There was some additional pg_dump work for matviews after the
initial push which may or may not account for the rest of the time.

Investigating now.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Ignore invalid indexes in pg_dump

2013-03-26 Thread Tom Lane
Michael Paquier michael.paqu...@gmail.com writes:
 On top of checking indisvalid, I think that some additional checks on
 indislive and indisready are also necessary.

Those are not necessary, as an index that is marked indisvalid should
certainly also have those flags set.  If it didn't require making two
new version distinctions in getIndexes(), I'd be okay with the extra
checks; but as-is I think the maintenance pain this would add greatly
outweighs any likely value.

I've committed this in the simpler form that just adds indisvalid
checks to the appropriate version cases.

regards, tom lane




Re: [HACKERS] spoonbill vs. -HEAD

2013-03-26 Thread Stefan Kaltenbrunner
On 03/26/2013 09:33 PM, Tom Lane wrote:
 Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
 On 03/26/2013 08:45 PM, Tom Lane wrote:
 It looks from here like the isolationtester client is what's dropping
 the ball --- the backend states are unsurprising, and two of them are
 waiting for a new client command.  Can you get a stack trace from the
 isolationtester process?
 
 https://www.kaltenbrunner.cc/files/isolationtester.txt
 
 Hmm ... isolationtester.c:584 is in the code that tries to cancel the
 current permutation (test case) after realizing that it's constructed
 an invalid permutation.  It looks like the preceding PQcancel() failed
 for some reason, since the waiting backend is still waiting.  The
 isolationtester code doesn't bother to check for an error result there,
 which is kinda bad, not that it's clear what it could do about it.
 Could you look at the contents of the local variable buf in the
 run_permutation stack frame?  Or else try modifying the code along the
 lines of
 
 -PQcancel(cancel, buf, sizeof(buf));
 +if (!PQcancel(cancel, buf, sizeof(buf)))
 +  fprintf(stderr, "PQcancel failed: %s\n", buf);
 
 and see if it prints anything interesting before hanging up.

hmm - will look into that in a bit - but I also just noticed that on the
same day spoonbill broke there was also a commit to that file
immediately before that code adding the fflush() calls.


Stefan




Re: [HACKERS] spoonbill vs. -HEAD

2013-03-26 Thread Tom Lane
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
 hmm - will look into that in a bit - but I also just noticed that on the
 same day spoonbill broke there was also a commit to that file
 immediately before that code adding the fflush() calls.

It's hard to see how those would be related to this symptom.  My bet
is that the new fk-deadlock test exposed some pre-existing issue in
isolationtester.  Not quite clear what yet, though.

A different line of thought is that the cancel was received by the
backend but didn't succeed in cancelling the query for some reason.

regards, tom lane




Re: [HACKERS] Ignore invalid indexes in pg_dump

2013-03-26 Thread Michael Paquier
On Wed, Mar 27, 2013 at 6:47 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Michael Paquier michael.paqu...@gmail.com writes:
  On top of checking indisvalid, I think that some additional checks on
  indislive and indisready are also necessary.

 Those are not necessary, as an index that is marked indisvalid should
 certainly also have those flags set.  If it didn't require making two
 new version distinctions in getIndexes(), I'd be okay with the extra
 checks; but as-is I think the maintenance pain this would add greatly
 outweighs any likely value.

 I've committed this in the simpler form that just adds indisvalid
 checks to the appropriate version cases.

Thanks.
-- 
Michael


Re: [HACKERS] regression test failed when enabling checksum

2013-03-26 Thread Jeff Davis
On Tue, 2013-03-26 at 02:50 +0900, Fujii Masao wrote:
 Hi,
 
 I found that the regression test failed when I created the database
 cluster with the checksum and set wal_level to archive. I think that
 there are some bugs around checksum feature. Attached is the regression.diff.

Thank you for the report. This was a significant oversight, but simple
to diagnose and fix.

There were several places that were doing something like:

   PageSetChecksumInplace
   if (use_wal)
       log_newpage
   smgrextend

Which is obviously wrong, because log_newpage sets the LSN of the page,
invalidating the checksum. We need to set the checksum after
log_newpage.
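
In other words, the safe ordering is the one below (a sketch distilled from the
patch that follows, with argument lists abbreviated from the real calls):

    /* WAL-log the page first: log_newpage() sets the page LSN ... */
    if (use_wal)
        log_newpage(&dst->smgr_rnode.node, forkNum, blkno, page);

    /* ... so the checksum must be computed afterwards, just before the write */
    PageSetChecksumInplace(page, blkno);

    smgrextend(dst, forkNum, blkno, (char *) page, true);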

Also, I noticed that copy_relation_data was doing smgrread without
validating the checksum (or page header, for that matter), so I also
fixed that.

Patch attached. Only brief testing done, so I might have missed
something. I will look more closely later.

Regards,
Jeff Davis
*** a/src/backend/access/heap/rewriteheap.c
--- b/src/backend/access/heap/rewriteheap.c
***************
*** 273,286  end_heap_rewrite(RewriteState state)
  	/* Write the last page, if any */
  	if (state->rs_buffer_valid)
  	{
- 		PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
- 
  		if (state->rs_use_wal)
  			log_newpage(&state->rs_new_rel->rd_node,
  						MAIN_FORKNUM,
  						state->rs_blockno,
  						state->rs_buffer);
  		RelationOpenSmgr(state->rs_new_rel);
  		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, state->rs_blockno,
  				   (char *) state->rs_buffer, true);
  	}
--- 273,287 
  	/* Write the last page, if any */
  	if (state->rs_buffer_valid)
  	{
  		if (state->rs_use_wal)
  			log_newpage(&state->rs_new_rel->rd_node,
  						MAIN_FORKNUM,
  						state->rs_blockno,
  						state->rs_buffer);
  		RelationOpenSmgr(state->rs_new_rel);
+ 
+ 		PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
+ 
  		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, state->rs_blockno,
  				   (char *) state->rs_buffer, true);
  	}
***************
*** 616,623  raw_heap_insert(RewriteState state, HeapTuple tup)
  		{
  			/* Doesn't fit, so write out the existing page */
  
- 			PageSetChecksumInplace(page, state->rs_blockno);
- 
  			/* XLOG stuff */
  			if (state->rs_use_wal)
  				log_newpage(&state->rs_new_rel->rd_node,
--- 617,622 
***************
*** 632,637  raw_heap_insert(RewriteState state, HeapTuple tup)
--- 631,639 
  			 * end_heap_rewrite.
  			 */
  			RelationOpenSmgr(state->rs_new_rel);
+ 
+ 			PageSetChecksumInplace(page, state->rs_blockno);
+ 
  			smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM,
  					   state->rs_blockno, (char *) page, true);
  
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 51,56 
--- 51,57 
  #include "commands/tablespace.h"
  #include "commands/trigger.h"
  #include "commands/typecmds.h"
+ #include "common/relpath.h"
  #include "executor/executor.h"
  #include "foreign/foreign.h"
  #include "miscadmin.h"
***************
*** 8902,8913  copy_relation_data(SMgrRelation src, SMgrRelation dst,
  
  		smgrread(src, forkNum, blkno, buf);
  
! 		PageSetChecksumInplace(page, blkno);
  
  		/* XLOG stuff */
  		if (use_wal)
  			log_newpage(&dst->smgr_rnode.node, forkNum, blkno, page);
  
  		/*
  		 * Now write the page.	We say isTemp = true even if it's not a temp
  		 * rel, because there's no need for smgr to schedule an fsync for this
--- 8903,8923 
  
  		smgrread(src, forkNum, blkno, buf);
  
! 		if (!PageIsVerified(page, blkno))
! 			ereport(ERROR,
! 					(errcode(ERRCODE_DATA_CORRUPTED),
! 					 errmsg("invalid page in block %u of relation %s",
! 							blkno,
! 							relpathbackend(src->smgr_rnode.node,
! 										   src->smgr_rnode.backend,
! 										   forkNum))));
  
  		/* XLOG stuff */
  		if (use_wal)
  			log_newpage(&dst->smgr_rnode.node, forkNum, blkno, page);
  
+ 		PageSetChecksumInplace(page, blkno);
+ 
  		/*
  		 * Now write the page.	We say isTemp = true even if it's not a temp
  		 * rel, because there's no need for smgr to schedule an fsync for this



Re: [HACKERS] regression test failed when enabling checksum

2013-03-26 Thread Simon Riggs
On 25 March 2013 17:50, Fujii Masao masao.fu...@gmail.com wrote:

 I found that the regression test failed when I created the database
 cluster with the checksum and set wal_level to archive. I think that
 there are some bugs around checksum feature. Attached is the regression.diff.

Apologies for not responding to your original email, I must have missed that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] regression test failed when enabling checksum

2013-03-26 Thread Simon Riggs
On 26 March 2013 23:23, Jeff Davis pg...@j-davis.com wrote:

 Patch attached. Only brief testing done, so I might have missed
 something. I will look more closely later.

Thanks, I'll look at that tomorrow.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Assertion failure when promoting node by deleting recovery.conf and restart node

2013-03-26 Thread Simon Riggs
On 25 March 2013 19:14, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 Simon, can you comment on this?

Yes, will do.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] GSoC project : K-medoids clustering in Madlib

2013-03-26 Thread Atri Sharma
I suggested a couple of algorithms to be implemented in MADlib (apart
from K-medoids). You could pick some (or all) of them, which would
require 3 months to be completed.

As for more information on the index, you can refer to

http://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.1

along with the postgres wiki. The wiki is the standard for anything postgres.

pg_trgm uses KNN, but I believe it uses its own implementation of the
algorithm. The idea I proposed aims at writing an implementation in
MADlib so that any client program can use the algorithm(s) in
their code directly, using MADlib functions.

Regards,

Atri

On 3/26/13, viod viod@gmail.com wrote:
 Hello!

 I'm an IT student, and I would like to apply for the 2013 GSoC.
 I've been looking at this mailing list for a while now, and I saw a
 suggestion for GSoC that particularly interested me: implementing the
 K-medoids clustering in Madlib, as it is supposed to be more efficient than
 the K-means algorithm.

 I didn't know about these algorithms before, but I have documented myself,
 and it looks quite interesting to me, and even more as I currently have
 lessons (but very very simplified unfortunately).

 I've got a few questions:
 Won't this be a quite short project? I can't get an idea of how long it
 would take me to implement this algorithm in a way that would be usable by
 postgresql, but 3 months looks long for this task, doesn't it?

 Someone on the IRC channel (can't remember who, sorry) told me it was used
 in the KNN index. I guess this is used by pg_trgm, but are there other
 modules using it currently?
 And could you please give me some links explaining the internals of this
 index? I've been through several articles presenting of it, but none very
 satisfying.

 Thanks a lot in advance!



-- 
Regards,

Atri
*l'apprenant*




Re: [HACKERS] GSoC project : K-medoids clustering in Madlib

2013-03-26 Thread Tom Lane
Atri Sharma atri.j...@gmail.com writes:
 I suggested a couple of algorithms to be implemented in MADLib(apart
 from K Medoids). You could pick some(or all) of them, which would
 require 3 months to be completed.

 As for more information on index, you can refer

 http://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.1

 along with the postgres wiki. The wiki is the standard for anything postgres.

 pg_trgm used KNN, but I believe it uses its own implementation of the
 algorithm. The idea I proposed aims at writing an implementation in
 the MADlib so that any client program can use the algorithm(s) in
 their code directly, using MADlib functions.

I'm a bit confused as to why this is being proposed as a
Postgres-related project.  I don't even know what MADlib is, but I'm
pretty darn sure that no part of Postgres uses it.  KNNGist certainly
doesn't.

regards, tom lane




Re: [HACKERS] GSoC project : K-medoids clustering in Madlib

2013-03-26 Thread Daniel Farina
On Tue, Mar 26, 2013 at 10:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Atri Sharma atri.j...@gmail.com writes:
 I suggested a couple of algorithms to be implemented in MADLib(apart
 from K Medoids). You could pick some(or all) of them, which would
 require 3 months to be completed.

 As for more information on index, you can refer

 http://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.1

 along with the postgres wiki. The wiki is the standard for anything postgres.

 pg_trgm used KNN, but I believe it uses its own implementation of the
 algorithm. The idea I proposed aims at writing an implementation in
 the MADlib so that any client program can use the algorithm(s) in
 their code directly, using MADlib functions.

 I'm a bit confused as to why this is being proposed as a
 Postgres-related project.  I don't even know what MADlib is, but I'm
 pretty darn sure that no part of Postgres uses it.  KNNGist certainly
 doesn't.

It's a reasonably well established extension for Postgres for
statistical and machine learning methods.  Rather neat, but as you
indicate, it's not part of Postgres proper.

http://madlib.net/

https://github.com/madlib/madlib/

--
fdr

