date:20140120

On 16 January 2014 22:37, Stephen Frost sfr...@snowman.net wrote:

   allow users to more easily move their various objects from
   one tablespace to another.  Included are docs and a regression test;
   I'm happy to improve on both should folks send me suggestions.

Sounds good.

The command uses the word ALL but then less than all objects, i.e.
only moves objects that are owned by the user.

I would like to see two variants of this...

ALL ... which attempts to move all objects and fails if it doesn't own
everything
ALL OWNED ... which moves only objects that it owns, and ignores others

i.e. ALL should mean all

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [Lsf-pc] [HACKERS] Re: Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)

2014-01-20 Thread Mel Gorman

On Fri, Jan 17, 2014 at 11:01:15AM -0800, Josh Berkus wrote:
 Mel,
 

Hi,

 So we have a few interested parties.  What do we need to do to set up
 the Collab session?
 

This is great and thanks!

There are two summits of interest here -- LSF/MM which will have all the
filesystem, storage and memory managemnet people at it on March 24-25th
and Collaboration Summit which is on March 26-28th. We're interested in
both.

The LSF/MM committe are going through the first round of topic proposals at
the moment and we're aiming to send out the first set of invites soon. We're
hoping to invite two PostgreSQL people to LSF/MM itself for the dedicated
topic and your feedback on other topics and how they may help or hinder
PostgreSQL would be welcomed.

As LSF/MM is a relatively closed forum I'll be looking into having a
follow-up discussion at Collaboration Summit that is open to a wider and
more dedicated group. That hopefully will result in a small number of
concrete proposals that can be turned into patches over time.

-- 
Mel Gorman
SUSE Labs


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [Lsf-pc] [HACKERS] Re: Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)

2014-01-20 Thread Andres Freund

On 2014-01-17 18:34:25 +, Mel Gorman wrote:
  The scheme that'd allow us is the following:
  When postgres reads a data page, it will continue to first look up the
  page in its shared buffers, if it's not there, it will perform a page
  cache backed read, but instruct that read to immediately remove from the
  page cache afterwards (new API or, posix_fadvise() or whatever).
  As long
  as it's in shared_buffers, postgres will not need to issue new reads, so
  there's no no benefit keeping it in the page cache.
  If the page is dirtied, it will be written out normally telling the
  kernel to forget about the caching the page (using 3) or possibly direct
  io).
  When a page in postgres's buffers (which wouldn't be set to very large
  values) isn't needed anymore and *not* dirty, it will seed the kernel
  page cache with the current data.
  
 
 Ordinarily the initial read page could be discarded with fadvise but
 the later write would cause the data to be read back in again which is a
 waste. The details of avoiding that re-read are tricky from a core kernel
 perspective because ordinarily the kernel at that point does not know if
 the write is a full complete aligned write of an underlying filesystem
 structure or not.  It may need a different write path which potentially
 leads into needing changes to the address_space operations on a filesystem
 basis -- that would get messy and be a Linux-specific extension. I have
 not researched this properly at all, I could be way off but I have a
 feeling the details get messy.

Hm. This is surprising me a bit - and I bet it does hurt postgres
noticeably if that's the case since the most frequently modified buffers
will only be written out to the OS once every checkpoint but never be
read-in. So they are likely not to be hot enough to stay cached under
cache-pressure.
So this would be a generally beneficial feature - and I doubt it's only
postgres that'd benefit.

  Now, such a scheme wouldn't likely be zero-copy, but it would avoid
  double buffering.
 
 It wouldn't be zero copy because minimally the data needs to be handed
 over the filesystem for writing to the disk and the interface for that is
 offset,length based, not page based. Maybe sometimes it will be zero copy
 but it would be a filesystem-specific thing.

Exactly.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Compiling extensions on Windows

2014-01-20 Thread Albe Laurenz

Craig Ringer wrote:
 Out of personal interest (in pain and suffering) I was recently looking
 into how to compile extensions out-of-tree on Windows using Visual
 Studio (i.e. no PGXS).
 
 It looks like the conventional answer to this is Do a source build of
 PG, compile your ext in-tree in contrib/, and hope the result is binary
 compatible with release PostgreSQL builds for Windows. Certainly that's
 how I've been doing it to date.
 
 How about everyone else here? Does anyone actually build and distribute
 extensions out of tree at all?

I build binaries for oracle_fdw for EDB's binaries by hand,
i.e. I look at what PGXS does on Linux and try to do something
comparable with cl.  It's rather painful, and something like
PGXS for MSVC would be very welcome.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-20 Thread Dave Chinner

On Sun, Jan 19, 2014 at 03:37:37AM +0200, Marti Raudsepp wrote:
 On Wed, Jan 15, 2014 at 5:34 AM, Jim Nasby j...@nasby.net wrote:
  it's very common to create temporary file data that will never, ever, ever
  actually NEED to hit disk. Where I work being able to tell the kernel to
  avoid flushing those files unless the kernel thinks it's got better things
  to do with that memory would be EXTREMELY valuable
 
 Windows has the FILE_ATTRIBUTE_TEMPORARY flag for this purpose.
 
 ISTR that there was discussion about implementing something analogous
 in Linux when ext4 got delayed allocation support, but I don't think
 it got anywhere and I can't find the discussion now. I think the
 proposed interface was to create and then unlink the file immediately,
 which serves as a hint that the application doesn't care about
 persistence.

You're thinking about O_TMPFILE, which is for making temp files that
can't be seen in the filesystem namespace, not for preventing them
from being written to disk.

I don't really like the idea of overloading a namespace directive to
have special writeback connotations. What we are getting into the
realm of here is generic user controlled allocation and writeback
policy...

 Postgres is far from being the only application that wants this; many
 people resort to tmpfs because of this:
 https://lwn.net/Articles/499410/

Yes, we covered the possibility of using tmpfs much earlier in the
thread, and came to the conclusion that temp files can be larger
than memory so tmpfs isn't the solution here. :)

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-20 Thread Florian Pflug

On Jan19, 2014, at 20:00 , David Rowley dgrowle...@gmail.com wrote:
 I've applied that patch again and put in the sort operators.

I've push a new version to https://github.com/fgp/postgres/tree/invtrans
which includes

* A bunch of missing declaration for *_inv functions

* An assert that the frame end doesn't move backwards - I realized that
  it is after all easy to do that, if it's done after the loop which adds
  the new values, not before.

* EXPLAIN VERBOSE ANALYZE now shows the max. number of forward aggregate
  transitions per row and aggregate. It's a bit imprecise, because it doesn't
  track the count per aggregate, but it's still a good metric for how well
  the inverse transition functions work. If the number is close to one, you
  know that very few rescans are happening.

* I've also renamed INVFUNC to INVSFUNC. That's a pretty invasive change, and
  it's the last commit, so if you object to that, then you can merge up to
  eafa72330f23f7c970019156fcc26b18dd55be27 instead of
  de3d9148be9732c4870b76af96c309eaf1d613d7.

A few more things I noticed, all minor stuff

* do_numeric_discard()'s inverseTransValid flag is unnecessary. First, if the
  inverse transition function returns NULL once, we never call it again, so the
  flag won't have any practical effect. And second, assume we ever called the
  forward transition function after the inverse fail, and then retried the 
inverse.
  In the case of do_numeric_discard(), that actually *could* allow the inverse
  to suddenly succeed - if the call to the forward function increased the dscale
  beyond that of the element we tried to remove, removal would suddenly be
  possible again. We never do that, of course, and it seems unlikely we ever
  will. But it's still weird to have code which serves no other purpose than to
  pessimize a case which would otherwise just work fine.

* The state == NULL checks in all the strict inverse transition functions are
  redundant.

I haven't taken a close look at the documentation yet, I hope to be able to
do that tomorrow. Otherwise, things look good as far as I'm concerned.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Hstore 2.0 patch

2014-01-20 Thread Rami Grossman

Hi everyone,

Is it possible to add the hstore 2.0 patch (and later the json solution) as
an extension for 9.3 right now, so we'll not have to wait for 9.4

I can say that this patch is very important for my organisation's large
project, they won't approve it as a patch but as extension it will go.

Thanks for your important work!

Rami

Re: [HACKERS] improve the help message about psql -F

2014-01-20 Thread Jov

reasonable，I removed the set,patch attached.

Jov
blog: http:amutu.com/blog http://amutu.com/blog


2014/1/20 Marti Raudsepp ma...@juffo.org

 2014/1/17 Jov am...@amutu.com
  but in the psql --help,-F say:
 
  set field separator (default: |)

  if user don't read the offical doc carefully,he can use:
 
  psql -F , -c 'select ...'
 
  But can't get what he want.
  It is a bad user Experience.

 +1 from me, patch applies and is helpful.

 After patching this line in psql --help is 82 characters long; I think
 it's best to keep help screens below 80 characters wide (although
 there's already 1 other line violating this rule).

 I think the word set is pretty useless there anyway, maybe remove
 that so the message becomes field separator for unaligned output
 (default: |)

 PS: There isn't an open CommitFest to add this to. Shouldn't we always
 open a new CF when the last one goes in progress? If there's no date,
 it may be simply called next

 Regards,
 Marti




field_separator_set_help_improve-v2.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-20 Thread David Rowley

On Mon, Jan 20, 2014 at 2:45 PM, Florian Pflug f...@phlo.org wrote:

 On Jan19, 2014, at 20:00 , David Rowley dgrowle...@gmail.com wrote:
  I've applied that patch again and put in the sort operators.

 I've push a new version to https://github.com/fgp/postgres/tree/invtrans
 which includes

 * A bunch of missing declaration for *_inv functions


Thanks, I've applied that.


 * An assert that the frame end doesn't move backwards - I realized that
   it is after all easy to do that, if it's done after the loop which adds
   the new values, not before.


I've applied this too, but I'm wondering why an elog for if the head moves
back, but an assert if the tail moves back?


 * EXPLAIN VERBOSE ANALYZE now shows the max. number of forward aggregate
   transitions per row and aggregate. It's a bit imprecise, because it
 doesn't
   track the count per aggregate, but it's still a good metric for how well
   the inverse transition functions work. If the number is close to one, you
   know that very few rescans are happening.


I've not looked at this yet and I don't think I'll have time tonight, but
it sounds interesting. I guess it might be quite nice to have a way to see
this especially with the way the numeric stuff works, it might be actually
pretty hard to otherwise know how many inverse transition failures there
had been. Do you think it's also worth tracking the inverse transition
failures too?

* I've also renamed INVFUNC to INVSFUNC. That's a pretty invasive change,
 and
   it's the last commit, so if you object to that, then you can merge up to
   eafa72330f23f7c970019156fcc26b18dd55be27 instead of
   de3d9148be9732c4870b76af96c309eaf1d613d7.


Seems like sfunc really should be tfunc then we could have invtfunc. I'd
probably understand this better if I knew what the 's' was for in sfunc.
I've not applied this just yet. Do you have a reason why you think it's
better?


 A few more things I noticed, all minor stuff

 * do_numeric_discard()'s inverseTransValid flag is unnecessary. First, if
 the
   inverse transition function returns NULL once, we never call it again,
 so the
   flag won't have any practical effect. And second, assume we ever called
 the
   forward transition function after the inverse fail, and then retried the
 inverse.
   In the case of do_numeric_discard(), that actually *could* allow the
 inverse
   to suddenly succeed - if the call to the forward function increased the
 dscale
   beyond that of the element we tried to remove, removal would suddenly be
   possible again. We never do that, of course, and it seems unlikely we
 ever
   will. But it's still weird to have code which serves no other purpose
 than to
   pessimize a case which would otherwise just work fine.


hmm, yeah of course, you are right. I've removed this now.


 * The state == NULL checks in all the strict inverse transition functions
 are
   redundant.


ok, I've removed these and added comments to note that these functions
should be declared strict.


 I haven't taken a close look at the documentation yet, I hope to be able to
 do that tomorrow. Otherwise, things look good as far as I'm concerned.


Thanks, yeah those really do need a review. I've lost a bit of direction
with them and I'm not quite sure just how much detail to go in to with it.
I'd like to explain a bit that users who need to use their numeric columns
in window aggregates might want to think about having a defined scale to
the numeric rather than an undefined scale and explain that this is because
the inverse transition function for numeric bails out if it loses the
maximum seen dscale. Though it does seem generally a good idea to have a
defined scale, but then I guess you've got to have a bit of knowledge about
the numbers you're storing in that case. I'm not quite sure how to put that
into words friendly enough for the documents just yet and or exactly where
to put the words. So any ideas or patches you have around that would be
great.

Once again thanks for all your work on this.

Regards

David Rowley


 best regards,
 Florian Pflug

Re: [HACKERS] Add value to error message when size extends

2014-01-20 Thread Daniel Erez

Hi,

Many thanks for the prompt response and the suggestions!

So, regarding the issue of production quality you've mentioned,
we understand there are two remaining matters to address:
1. debug_query_string:
   As we can't rely on this flag, is there any alternative we can rely on?
2. encapsulation:
   Is there any utility file we could extract the logic to?
   Or, any other functions that are used for encapsulation mechanism?

Thanks!
Daniel 

- Original Message -
 From: Maor Lipchuk mlipc...@redhat.com
 To: Tom Lane t...@sss.pgh.pa.us, Marti Raudsepp ma...@juffo.org
 Cc: pgsql-hackers pgsql-hackers@postgresql.org, Daniel Erez 
 de...@redhat.com
 Sent: Monday, January 20, 2014 2:32:57 AM
 Subject: Re: [HACKERS] Add value to error message when size extends
 
 Hi Tom and Marti
 Thank you so much for your respond.
 
 The solution Tom proposed is much more better, and it will be a great
 solution for clarifying many issues regarding this error.
 
 Regards,
 Maor
 
 
 On 01/19/2014 10:00 PM, Tom Lane wrote:
  Marti Raudsepp ma...@juffo.org writes:
  On Sun, Jan 19, 2014 at 8:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  One thing that occurs to me just now is that perhaps we could report
  the failure as if it were a syntax error
  
  That would be cool, if it can be made to work.
  
  Just as a five-minute proof-of-concept hack, attached is a patch that
  makes varchar() report an error position if it can get one.  This is
  *very* far from being production quality --- debug_query_string is the
  wrong thing to rely on in general, and we'd certainly want to encapsulate
  the logic rather than have individual functions know about how to do it.
  But here's some testing that shows that the idea seems to have promise
  from a usability standpoint:
  
  regression=# create table test (f1 varchar(10), f2 varchar(5), f3
  varchar(10));
  CREATE TABLE
  
  regression=# insert into test values ('a', 'b', 'foobar');
  INSERT 0 1
  
  regression=# insert into test values ('foobar', 'foobar', 'foobar');
  ERROR:  value too long for type character varying(5)
  LINE 1: insert into test values ('foobar', 'foobar', 'foobar');
 ^
  
  regression=# update test set f2 = f3 where f1 = 'a';
  ERROR:  value too long for type character varying(5)
  LINE 1: update test set f2 = f3 where f1 = 'a';
   ^
  
  The first error case points out a limitation of relying on the contents of
  the string to figure out where your problem is.  The error-cursor approach
  has its own limitations, of course; for instance the second case might not
  be thought to be all that helpful.
 Yes, but in this case you will know specifically which column is the
 problematic one.
 The highlight of your approach gains much more benefit when
 updating/inserting multiple values in one update/insert command.
  
  regards, tom lane
  
 
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [bug fix] pg_ctl always uses the same event source

2014-01-20 Thread MauMau


From: Amit Kapila amit.kapil...@gmail.com
  Do you think without this the problem reported by you is resolved 
completely.
  User can hit same problem, if he tries to follow similar steps mentioned 
in

  your first mail. I had tried below steps based on description in your
  first mail:

If user register/unregister different versions of pgevent.dll blindly,
then I think
any fix in this area could not prevent the error event source not found


OK, I'll add a check to prevent duplicate registration of the same event 
source with the message:


The specified event source is already registered.

Please correct me if there's better expression in English.

Are there any other suggestions to make this patch ready for committer?  I'd 
like to make all changes in one submission.


Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] plpgsql.warn_shadow

2014-01-20 Thread Marko Tiikkaja


On 1/20/14 2:25 AM, Robert Haas wrote:

On Fri, Jan 17, 2014 at 5:45 AM, Marko Tiikkaja ma...@joh.to wrote:

I see these as being two are different things.  There *is* a need for a
full-blown static analyzer for PL/PgSQL, but I don't think it needs to be in
core.  However, there seems to be a number of pitfalls we could warn the
user about with little effort in core, and I think those are worth doing.


I don't want to be overly negative, but that sounds sort of like
you're saying that it's not worth having a good static analyzer in
core, but that you are in favor of having a bad one.


Sort of, yeah.

My experience of static analyzers is that it's not really feasible to 
try and fix all code they think might be faulty, and I don't expect a 
PL/PgSQL one to be any different.  The idea behind these warnings (to 
me, at least) has been that they're simple and uncontroversial enough 
that it's feasible to say never commit code which produces warnings 
upon compilation.  I realize that where to draw that line is a bit 
arbitrary and subjective, and I don't expect everyone to agree with me 
on the exact list of these warnings.



I personally tend to think a static analyzer is a better fit than
adding a whole laundry list of pragmas.  Most people won't remember to
turn them all on anyway , and those who do will find that it gets
pretty tedious after we have more than about two of them.


What's so hard about plpgsql.warnings='all'?  Or if the fact that it's a 
list is your concern, I'm not going to oppose to making it a boolean.



Regards,
Marko Tiikkaja


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-01-20 Thread Alexander Korotkov

On Sun, Jan 19, 2014 at 5:57 AM, Andreas Karlsson andr...@proxel.se wrote:

 On 01/18/2014 08:13 PM, Jeremy Harris wrote:

 On 31/12/13 01:41, Andreas Karlsson wrote:

 On 12/29/2013 08:24 AM, David Rowley wrote:

 If it was possible to devise some way to reuse any
 previous tuplesortstate perhaps just inventing a reset method which
 clears out tuples, then we could see performance exceed the standard
 seqscan - sort. The code the way it is seems to lookup the sort
 functions from the syscache for each group then allocate some sort
 space, so quite a bit of time is also spent in palloc0() and pfree()

 If it was not possible to do this then maybe adding a cost to the number
 of sort groups would be better so that the optimization is skipped if
 there are too many sort groups.


 It should be possible. I have hacked a quick proof of concept for
 reusing the tuplesort state. Can you try it and see if the performance
 regression is fixed by this?

 One thing which have to be fixed with my patch is that we probably want
 to close the tuplesort once we have returned the last tuple from
 ExecSort().

 I have attached my patch and the incremental patch on Alexander's patch.


 How does this work in combination with randomAccess ?


 As far as I can tell randomAccess was broken by the partial sort patch
 even before my change since it would not iterate over multiple tuplesorts
 anyway.

 Alexander: Is this true or am I missing something?


Yes, I decided that Sort node shouldn't provide randomAccess in the case of
skipCols !=0. See assert in the beginning of ExecInitSort. I decided that
it would be better to add explicit materialize node rather than store extra
tuples in tuplesortstate each time.
I also adjusted ExecSupportsMarkRestore, ExecMaterializesOutput and
ExecMaterializesOutput to make planner believe so. I found path-pathtype
to be absolutely never T_Sort. Correct me if I'm wrong.

Another changes in this version of patch:
1) Applied patch to don't compare skipCols in tuplesort by Marti Raudsepp
2) Adjusting sort bound after processing buckets.

--
With best regards,
Alexander Korotkov.


partial-sort-6.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] plpgsql.warn_shadow

2014-01-20 Thread Robert Haas

On Mon, Jan 20, 2014 at 7:16 AM, Marko Tiikkaja ma...@joh.to wrote:
 What's so hard about plpgsql.warnings='all'?  Or if the fact that it's a
 list is your concern, I'm not going to oppose to making it a boolean.

Sure, that'd be fine.  What I don't want is to have to start each function with:

#option warn_this
#option warn_that
#option warn_theotherthing
#option warn_somethingelse
#option warn_yetanotherthing
#option warn_whatdoesthisdoagain

Also, I think that the way we've been doing it, each of those needs to
become a PL/pgsql keyword.  That's going to become a problem at some
point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] plpgsql.warn_shadow

2014-01-20 Thread Marko Tiikkaja


On 1/20/14 1:55 PM, Robert Haas wrote:

On Mon, Jan 20, 2014 at 7:16 AM, Marko Tiikkaja ma...@joh.to wrote:

What's so hard about plpgsql.warnings='all'?  Or if the fact that it's a
list is your concern, I'm not going to oppose to making it a boolean.


Sure, that'd be fine.  What I don't want is to have to start each function with:

#option warn_this
#option warn_that
#option warn_theotherthing
#option warn_somethingelse
#option warn_yetanotherthing
#option warn_whatdoesthisdoagain


Right.  Completely agreed.  The only reason I had them in the patch is 
to have the ability to turn *off* a specific warning for a particular 
function.  But even that's of a bit dubious a value.



Also, I think that the way we've been doing it, each of those needs to
become a PL/pgsql keyword.  That's going to become a problem at some
point.


Yeah, probably. :-(


Regards,
Marko Tiikkaja


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ALTER TABLESPACE ... MOVE ALL TO ...

* Simon Riggs (si...@2ndquadrant.com) wrote:
 The command uses the word ALL but then less than all objects, i.e.
 only moves objects that are owned by the user.

My thinking was that it was all from that user's perspective.

 I would like to see two variants of this...
 
 ALL ... which attempts to move all objects and fails if it doesn't own
 everything
 ALL OWNED ... which moves only objects that it owns, and ignores others

I could add that, though it feels like the next request would be to
allow a specific role to be passed in (ie: move all of *this* user's
objects) and I'm not sure we really need to go to that level.  It
doesn't seem like there's really much point in having two options
either- ALL OWNED run by the superuser would be identical to ALL and
normal users would have zero use for just ALL because it would either
be identical to ALL OWNED or it would fail with a permission denied
error.

If an extra noise word to clarify what is happening would be useful,
then I could simply require OWNED as well, but I'm not particularly
thrilled with that option, also ...

 i.e. ALL should mean all

This is a bit of a non-starter when it comes to tablespaces anyway- we
can't move another database's objects and so even if it was ALL, it
may only be moving a subset of the objects in the tablespace (namely
those which are in the current database).  I don't see it being an
improvement to require IN CURRENT DATABASE ALL OWNED even though it
would be more accurate.

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] array_length(anyarray)

2014-01-20 Thread Dean Rasheed

On 19 January 2014 11:43, Marko Tiikkaja ma...@joh.to wrote:
 New version attached, without the doc change.


This looks good to me.

 - applies cleanly.
 - compiles with no warnings.
 - passes a sensible set of new regression tests.
 - implements the agreed behaviour, per SQL spec.
 - I can't think of any corner cases to break it.

I think this is ready for committer, although I would also like to see
the doc changes to make the table of array function descriptions a bit
more explicit about corner cases.

Also, does this mean that we can now claim full support for SQL
feature S091 Basic array support?

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hstore 2.0 patch

2014-01-20 Thread Andrew Dunstan



On 01/19/2014 10:31 PM, Rami Grossman wrote:

Hi everyone,

Is it possible to add the hstore 2.0 patch (and later the json 
solution) as an extension for 9.3 right now, so we'll not have to wait 
for 9.4


I can say that this patch is very important for my organisation's 
large project, they won't approve it as a patch but as extension it 
will go.


Thanks for your important work!




In the past I have backported some of the json work as extensions. 
However, that has been done as paid work at commercial rates.


Furthermore it isn't always 100% doable. Backporting types (like jsonb) 
is particularly troublesome, since a type created by an extension 
normally can not be upgradable to a builtin type. That means playing 
some pretty ugly tricks. What is more, we can't even play those tricks 
until we know for sure what the builtin type Oid will be, which we do 
not yet know, as the patch has not been committed.


My intention at least this time around has been not to backport.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] array_length(anyarray)

2014-01-20 Thread Marko Tiikkaja


On 1/20/14 2:29 PM, Dean Rasheed wrote:

I think this is ready for committer


Thanks!


... although I would also like to see
the doc changes to make the table of array function descriptions a bit
more explicit about corner cases.


Hmm.  I completely missed the fact that unnest() already uses a 
structure similar to yours.  It looks like e.g. window functions do the 
same, but JSON functions all have proper capitalization and periods, and 
some others capitalize but omit periods.


I could submit a separate patch to describe array functions in a bit 
more detail, unless you wanted to do that?  I'm not planning on fixing 
the inconsistencies, though, despite them annoying me.



Regards,
Marko Tiikkaja


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] array_length(anyarray)

2014-01-20 Thread Dean Rasheed

On 20 January 2014 13:47, Marko Tiikkaja ma...@joh.to wrote:
 On 1/20/14 2:29 PM, Dean Rasheed wrote:

 I think this is ready for committer


 Thanks!

 ... although I would also like to see

 the doc changes to make the table of array function descriptions a bit
 more explicit about corner cases.


 Hmm.  I completely missed the fact that unnest() already uses a structure
 similar to yours.  It looks like e.g. window functions do the same, but JSON
 functions all have proper capitalization and periods, and some others
 capitalize but omit periods.

 I could submit a separate patch to describe array functions in a bit more
 detail,

Yes please. I think that would be helpful.


 I'm not planning on fixing the
 inconsistencies, though, despite them annoying me.


To be honest, I hadn't even noticed those inconsistencies. The main
thing is to alert new users to the fact that empty arrays behave in a
rather odd way for a couple of those functions.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ALTER TABLESPACE ... MOVE ALL TO ...

On 20 January 2014 14:24, Stephen Frost sfr...@snowman.net wrote:
 * Simon Riggs (si...@2ndquadrant.com) wrote:
 The command uses the word ALL but then less than all objects, i.e.
 only moves objects that are owned by the user.

 My thinking was that it was all from that user's perspective.

 I would like to see two variants of this...

 ALL ... which attempts to move all objects and fails if it doesn't own
 everything
 ALL OWNED ... which moves only objects that it owns, and ignores others

 I could add that, though it feels like the next request would be to
 allow a specific role to be passed in (ie: move all of *this* user's
 objects) and I'm not sure we really need to go to that level.  It
 doesn't seem like there's really much point in having two options
 either- ALL OWNED run by the superuser would be identical to ALL and
 normal users would have zero use for just ALL because it would either
 be identical to ALL OWNED or it would fail with a permission denied
 error.

 If an extra noise word to clarify what is happening would be useful,
 then I could simply require OWNED as well, but I'm not particularly
 thrilled with that option, also ...

 i.e. ALL should mean all

 This is a bit of a non-starter when it comes to tablespaces anyway- we
 can't move another database's objects and so even if it was ALL, it
 may only be moving a subset of the objects in the tablespace (namely
 those which are in the current database).  I don't see it being an
 improvement to require IN CURRENT DATABASE ALL OWNED even though it
 would be more accurate.

Not a good argument since IN CURRENT DATABASE applies to all SQL
commands, so would clearly be unnecessary.

At the moment, ALL does not include all objects. It's a POLA violation
to have a command affect just some objects and not others. That is
especially confusing when the command run as Superuser *will* move all
objects and a RC of zero has different meaning dependent upon who the
user is that executes the command.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-20 Thread Florian Pflug

On Jan20, 2014, at 08:42 , David Rowley dgrowle...@gmail.com wrote:
 On Mon, Jan 20, 2014 at 2:45 PM, Florian Pflug f...@phlo.org wrote:
 * An assert that the frame end doesn't move backwards - I realized that
   it is after all easy to do that, if it's done after the loop which adds
   the new values, not before.
 
 I've applied this too, but I'm wondering why an elog for if the head moves
 back, but an assert if the tail moves back? 

When I put the frame head check in, I was concerned that the code might crash
or loop endlessly if aggregatedbase was ever larger than frameheadpos, so
I made it elog(), just for safety.

The frame end check, OTOH, is done at the very end, so it doesn't protect
against much, it just documents that it's not supposed to happen. 

But yeah, it's bit weird. Feel free to turn the frame end check into an
elog() too if you want to.

 * I've also renamed INVFUNC to INVSFUNC. That's a pretty invasive change, and
   it's the last commit, so if you object to that, then you can merge up to
   eafa72330f23f7c970019156fcc26b18dd55be27 instead of
   de3d9148be9732c4870b76af96c309eaf1d613d7.
 
 
 Seems like sfunc really should be tfunc then we could have invtfunc. I'd 
 probably
 understand this better if I knew what the 's' was for in sfunc. I've not 
 applied
 this just yet. Do you have a reason why you think it's better?

My issue with just invfunc is mainly that it's too generic - it doesn't tell
you what it's supposed to be the inverse of.

I've always assumed that 's' in 'sfunc' and 'stype' stands for 'state', and that
the naming is inspired by control theory, where the function which acts on the
state space is often called S.

 Thanks, yeah those really do need a review. I've lost a bit of direction with
 them and I'm not quite sure just how much detail to go in to with it. I'd like
 to explain a bit that users who need to use their numeric columns in window
 aggregates might want to think about having a defined scale to the numeric 
 rather
 than an undefined scale and explain that this is because the inverse 
 transition
 function for numeric bails out if it loses the maximum seen dscale. Though it
 does seem generally a good idea to have a defined scale, but then I guess 
 you've
 got to have a bit of knowledge about the numbers you're storing in that case.
 I'm not quite sure how to put that into words friendly enough for the 
 documents
 just yet and or exactly where to put the words. So any ideas or patches you 
 have
 around that would be great.

Here's what I image the docs should look like, very roughly

* In CREATE AGGREGATE, we should state the precise axioms we assume about 
forward
and inverse transition functions. The last time I read the text there, it was
a bit ambiguous about whether inverse transition functions assume commutativity,
i.e. whether we assume that we can remove inputs other than the first one from
the aggregation. Currently, all the inverse transition functions we have are,
in fact, commutative, because all the forward transition function are also. But
we could e.g. add an inverse to array_agg and string_agg, and those would
obviously need this ordering guarantee. I'd also like us to explain that the
inverse in inverse transition function shouldn't be taken too literally. For
forward transition function F, inverse transition function G, and inputs a,b,...
we *don't require that G(F(s,a),a) == s. We we do require is that if I is the
initial state, then

  G(F(...F(F(I,a),b)...,z),a) == F(...F(I,b)...,z).

(Well, actually we don't strictly require even that, the two states don't
need to be identical, they just need to produce identical outputs when passed
to the final function)

* In CREATE AGGREGATE, we should also explain the NULL semantics you get
with various combinations of strict and non-strict forward and inverse
transition functions. I think a table listing all the combinations is in order
there, with a column explaining the semantics you get. We should also clearly
state that once you provide an inverse transition function, NULL isn't a valid
state value anymore, except as an initial state, i.e. that the forward 
transition
function must never return NULL in this case.

* The window function page should explain the performance hazards when
a moving frame head is involved. Ideally, we'd list which aggregates never
have to restart, and which sometimes might, and what you can do to avoid that.
We can also tell people to use EXPLAIN VERBOSE ANALYZE to check whether
restarts are occurring. This is were we'd tell people to cast their numeric
operands to a type with defined scale to avoid restarts, and were we'd state
the SUM(float) *does* restart always due to precision loss issues.

BTW, something that we haven't addressed at all is how inverse transition
functions interact with what is called ordered-set aggregates. I haven't
wrapped my head around these fully, I think, so I'm still not sure if there's
anything to do there or not. Can those even be

Re: [HACKERS] plpgsql.warn_shadow

2014-01-20 Thread Florian Pflug

On Jan20, 2014, at 14:05 , Marko Tiikkaja ma...@joh.to wrote:
 On 1/20/14 1:55 PM, Robert Haas wrote:
 On Mon, Jan 20, 2014 at 7:16 AM, Marko Tiikkaja ma...@joh.to wrote:
 What's so hard about plpgsql.warnings='all'?  Or if the fact that it's a
 list is your concern, I'm not going to oppose to making it a boolean.
 
 Sure, that'd be fine.  What I don't want is to have to start each function 
 with:
 
 #option warn_this
 #option warn_that
 #option warn_theotherthing
 #option warn_somethingelse
 #option warn_yetanotherthing
 #option warn_whatdoesthisdoagain
 
 Right.  Completely agreed.  The only reason I had them in the patch is to 
 have the
 ability to turn *off* a specific warning for a particular function.  But even
 that's of a bit dubious a value.

I think as long as there's an easy way to avoid a warning - in the case of
warn_shadow e.g. by renaming the shadowing variable - there's no real 
requirement
to be able to disable the warning on a per-function basis.

And if there isn't a simple way to avoid a particular warning then we
ought not warn about it anyway, I guess, because that would indicate that there
are genuine reasons for doing whatever it is the warning complains about.

best regards,
Florian Pflug




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-20 Thread Mel Gorman

On Mon, Jan 20, 2014 at 10:51:41AM +1100, Dave Chinner wrote:
 On Sun, Jan 19, 2014 at 03:37:37AM +0200, Marti Raudsepp wrote:
  On Wed, Jan 15, 2014 at 5:34 AM, Jim Nasby j...@nasby.net wrote:
   it's very common to create temporary file data that will never, ever, ever
   actually NEED to hit disk. Where I work being able to tell the kernel to
   avoid flushing those files unless the kernel thinks it's got better things
   to do with that memory would be EXTREMELY valuable
  
  Windows has the FILE_ATTRIBUTE_TEMPORARY flag for this purpose.
  
  ISTR that there was discussion about implementing something analogous
  in Linux when ext4 got delayed allocation support, but I don't think
  it got anywhere and I can't find the discussion now. I think the
  proposed interface was to create and then unlink the file immediately,
  which serves as a hint that the application doesn't care about
  persistence.
 
 You're thinking about O_TMPFILE, which is for making temp files that
 can't be seen in the filesystem namespace, not for preventing them
 from being written to disk.
 
 I don't really like the idea of overloading a namespace directive to
 have special writeback connotations. What we are getting into the
 realm of here is generic user controlled allocation and writeback
 policy...
 

Such overloading would be unwelcome. FWIW, I assumed this would be an
fadvise thing. Initially something that controlled writeback on an inode
and not an fd context that ignored the offset and length parameters.
Granded, someone will probably throw a fit about adding a Linux-specific
flag to the fadvise64 syscall. POSIX_FADV_NOREUSE is currently unimplemented
and it could be argued that it could be used to flag temporary files that
have a different writeback policy but it's not clear if that matches the
original intent of the posix flag.

  Postgres is far from being the only application that wants this; many
  people resort to tmpfs because of this:
  https://lwn.net/Articles/499410/
 
 Yes, we covered the possibility of using tmpfs much earlier in the
 thread, and came to the conclusion that temp files can be larger
 than memory so tmpfs isn't the solution here. :)
 

And swap IO patterns blow chunks because people rarely want to touch
that area of the code with a 50 foot pole. It gets filed under if you're
swapping, you already lost

-- 
Mel Gorman
SUSE Labs


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] change alter user to be a true alias for alter role

2014-01-20 Thread Jov

the doc say:

 ALTER USER is now an alias for ALTER 
 ROLEhttp://www.postgresql.org/docs/devel/static/sql-alterrole.html
 .


but alter user lack the following format:

 ALTER ROLE name [ IN DATABASE database_name ] SET configuration_parameter{ TO 
 | = } {
 value | DEFAULT }
 ALTER ROLE { name | ALL } [ IN DATABASE database_name ] SET
 configuration_parameter FROM CURRENT
 ALTER ROLE { name | ALL } [ IN DATABASE database_name ] RESET
 configuration_parameter
 ALTER ROLE { name | ALL } [ IN DATABASE database_name ] RESET ALL

attach patch add the per database set for alter user:
 ALTER
USER
 name [ IN DATABASE database_name ] SET configuration_parameter { TO | = }
{ value | DEFAULT }
 ALTER
USER
 { name | ALL } [ IN DATABASE database_name ] SET configuration_parameter FROM
CURRENT
 ALTER
USER
 { name | ALL } [ IN DATABASE database_name ] RESET configuration_parameter
 ALTER
USER
 { name | ALL } [ IN DATABASE database_name ] RESET ALL


this make alter user a true alias for alter role.

Jov
blog: http:amutu.com/blog http://amutu.com/blog


full-imp-alter-user-v1.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ALTER TABLESPACE ... MOVE ALL TO ...

* Simon Riggs (si...@2ndquadrant.com) wrote:
 Not a good argument since IN CURRENT DATABASE applies to all SQL
 commands, so would clearly be unnecessary.

I suppose it depends on how you're looking at it.

ALTER TABLESPACE ... RENAME, for example, updates a shared catalog and
therefore the change is seen across all databases.  That's not exactly
IN CURRENT DATABASE.

 At the moment, ALL does not include all objects. It's a POLA violation
 to have a command affect just some objects and not others. That is
 especially confusing when the command run as Superuser *will* move all
 objects and a RC of zero has different meaning dependent upon who the
 user is that executes the command.

So you're still looking for an 'OWNED' noise word to be added?  Also, I
did add the ability to specify types of objects (it's often that we'll
have a INDEXES tablespace, so this made sense), so how about:

ALTER TABLESPACE name MOVE OWNED TO name opt_nowait
ALTER TABLESPACE name MOVE TABLES OWNED TO name opt_nowait
ALTER TABLESPACE name MOVE INDEXES OWNED TO name opt_nowait
ALTER TABLESPACE name MOVE MATERIALIZED VIEWS OWNED TO name opt_nowait

Removing the 'ALL' entirely?

Should there be an OWNED BY name_list option also, since that's how we
use 'OWNED' elsewhere?  Should the use of OWNED elsewhere (eg:
REASSIGN OWNED BY) also support just 'OWNED' to mean the current role
(I'm not entirely sure how much sense that makes, but figured I'd ask).

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-20 Thread Mel Gorman

On Fri, Jan 17, 2014 at 03:24:01PM -0500, Gregory Smith wrote:
 On 1/17/14 10:37 AM, Mel Gorman wrote:
 There is not an easy way to tell. To be 100%, it would require an
 instrumentation patch or a systemtap script to detect when a
 particular page is being written back and track the context. There
 are approximations though. Monitor nr_dirty pages over time.
 
 I have a benchmarking wrapper for the pgbench testing program called
 pgbench-tools:  https://github.com/gregs1104/pgbench-tools  As of
 October, on Linux it now plots the Dirty value from /proc/meminfo
 over time.
 SNIP

Cheers for pointing that out, I was not previously aware of its
existence. While I have some support for running pgbench via another kernel
testing framework (mmtests) the postgres-based tests are miserable. Right
now for me, pgbench is only setup to reproduce a workload that detected a
scheduler regression in the past so that it does not get reintroduced. I'd
like to have it running IO-based tests even though I typically do not
do proper regression testing for IO. I have used sysbench as a workload
generator before but it's not great for a number of reasons.

 I've been working on the problem of how we can make a benchmark test
 case that acts enough like real busy PostgreSQL servers that we can
 share it with kernel developers, and then everyone has an objective
 way to measure changes.  These rate limited tests are working much
 better for that than anything I came up with before.
 

This would be very welcome and thanks for the other observations on IO
scheduler parameter tuning. They could potentially be used to evalate any IO
scheduler changes. For example -- deadline scheduler with these parameters
has X transactions/sec throughput with average latency of Y millieseconds
and a maximum fsync latency of Z seconds. Evaluate how well the out-of-box
behaviour compares against it with and without some set of patches.  At the
very least it would be useful for tracking historical kernel performance
over time and bisecting any regressions that got introduced. Once we have
a test I think many kernel developers (me at least) can run automated
bisections once a test case exists.

-- 
Mel Gorman
SUSE Labs


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-20 Thread Jeff Layton

On Mon, 20 Jan 2014 10:51:41 +1100
Dave Chinner da...@fromorbit.com wrote:

 On Sun, Jan 19, 2014 at 03:37:37AM +0200, Marti Raudsepp wrote:
  On Wed, Jan 15, 2014 at 5:34 AM, Jim Nasby j...@nasby.net wrote:
   it's very common to create temporary file data that will never, ever, ever
   actually NEED to hit disk. Where I work being able to tell the kernel to
   avoid flushing those files unless the kernel thinks it's got better things
   to do with that memory would be EXTREMELY valuable
  
  Windows has the FILE_ATTRIBUTE_TEMPORARY flag for this purpose.
  
  ISTR that there was discussion about implementing something analogous
  in Linux when ext4 got delayed allocation support, but I don't think
  it got anywhere and I can't find the discussion now. I think the
  proposed interface was to create and then unlink the file immediately,
  which serves as a hint that the application doesn't care about
  persistence.
 
 You're thinking about O_TMPFILE, which is for making temp files that
 can't be seen in the filesystem namespace, not for preventing them
 from being written to disk.
 
 I don't really like the idea of overloading a namespace directive to
 have special writeback connotations. What we are getting into the
 realm of here is generic user controlled allocation and writeback
 policy...
 

Agreed -- O_TMPFILE semantics are a different beast entirely.

Perhaps what might be reasonable though is a fadvise POSIX_FADV_TMPFILE
hint that tells the kernel: Don't write out this data unless it's
necessary due to memory pressure.

If the inode is only open with file descriptors that have that hint
set on them. Then we could exempt it from dirty_expire_interval and
dirty_writeback_interval?

Tracking that desire on an inode open multiple times might be
interesting though. We'd have to be quite careful not to allow that
to open an attack vector.

  Postgres is far from being the only application that wants this; many
  people resort to tmpfs because of this:
  https://lwn.net/Articles/499410/
 
 Yes, we covered the possibility of using tmpfs much earlier in the
 thread, and came to the conclusion that temp files can be larger
 than memory so tmpfs isn't the solution here. :)
 

-- 
Jeff Layton jlay...@redhat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] NOT Null constraint on foreign table not working

Rushabh Lathia rushabh.lat...@gmail.com writes:
 As per the PG documentation it says that foreign table do support the
 NOT NULL, NULL and DEFAULT.

There has been a great deal of debate about what constraints on foreign
tables ought to mean.  Right now, at least for postgres_fdw, they're just
taken as documentation of constraints that are supposed to exist on the
far side.  It's not clear what's the point of trying to enforce them
against insertions done locally if the remote table lacks them --- any
table update done on the far side could still violate the constraint.

We might change this in response to a well-reasoned argument, but it won't
happen just because somebody lobs a quick-and-dirty patch over the fence.

If we were going to enforce them locally, I'd opine it should be the FDW's
task to do it, anyway.  It might have more knowledge about the best way
to do it than nodeModifyTable.c can, and if not it could still call
ExecConstraints for itself.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ALTER TABLESPACE ... MOVE ALL TO ...

Stephen Frost sfr...@snowman.net writes:
 So you're still looking for an 'OWNED' noise word to be added?  Also, I
 did add the ability to specify types of objects (it's often that we'll
 have a INDEXES tablespace, so this made sense), so how about:

 ALTER TABLESPACE name MOVE OWNED TO name opt_nowait
 ALTER TABLESPACE name MOVE TABLES OWNED TO name opt_nowait
 ALTER TABLESPACE name MOVE INDEXES OWNED TO name opt_nowait
 ALTER TABLESPACE name MOVE MATERIALIZED VIEWS OWNED TO name opt_nowait

 Removing the 'ALL' entirely?

What if you're a superuser and you want to move everybody's objects
(perhaps in preparation for dropping the tablespace)?  I think there's
value in both the ALL and OWNED forms.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ALTER TABLESPACE ... MOVE ALL TO ...

On 20 January 2014 15:46, Stephen Frost sfr...@snowman.net wrote:

 So you're still looking for an 'OWNED' noise word to be added?

To clarify what the command is actually doing.


 Also, I
 did add the ability to specify types of objects (it's often that we'll
 have a INDEXES tablespace, so this made sense), so how about:

 ALTER TABLESPACE name MOVE OWNED TO name opt_nowait

The ALL seems to have value. MOVE ALL OWNED TO sounds better.


 ALTER TABLESPACE name MOVE TABLES OWNED TO name opt_nowait
 ALTER TABLESPACE name MOVE INDEXES OWNED TO name opt_nowait

On those two, I think the docs need to be clearer that we mean that
TABLES means tables, and yes we leave the indexes behind. Or that
INDEXES means and we leave the tables behind. This is intended to
more easily separate tables and indexes into their own tablespaces.
or similar.


 ALTER TABLESPACE name MOVE MATERIALIZED VIEWS OWNED TO name opt_nowait

 Removing the 'ALL' entirely?

 Should there be an OWNED BY name_list option also, since that's how we
 use 'OWNED' elsewhere?  Should the use of OWNED elsewhere (eg:
 REASSIGN OWNED BY) also support just 'OWNED' to mean the current role
 (I'm not entirely sure how much sense that makes, but figured I'd ask).

Maybe.

I'm not clamouring for squeezing additional goodies from you, just to
make a small change to avoid later confusion (for ALL users ;-) )

Good feature, thanks for working on it.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ALTER TABLESPACE ... MOVE ALL TO ...

* Tom Lane (t...@sss.pgh.pa.us) wrote:
 What if you're a superuser and you want to move everybody's objects
 (perhaps in preparation for dropping the tablespace)?  I think there's
 value in both the ALL and OWNED forms.

A superuser is considered to 'own' all objects and so 'ALL' and 'OWNED'
above would be the same when issued by a superuser, in the current
implementation.

Looking at DROP OWNED and REASSIGN OWNED, they operate at the more
specific level of OWNED == relowner rather than if the role is
considered an 'owner' of the object through role membership, as you are
implying above.

As such, I'll rework this to be more in-line with the existing OWNED BY
semantics of REASSIGN OWNED BY and DROP OWNED BY, which means we'd have:

ALTER TABLESPACE name MOVE [ ALL | OWNED [ BY reluser ] ]
[ TABLES | INDEXES | MATERIALIZED VIEWS ] TO name opt_nowait

eg:

ALTER TABLESPACE tblspc1 MOVE ALL TO tblspc2;
ALTER TABLESPACE tblspc1 MOVE OWNED TO tblspc2;
ALTER TABLESPACE tblspc1 MOVE OWNED BY myrole TO tblspc2;
ALTER TABLESPACE tblspc1 MOVE TABLES OWNED BY myrole TO tblspc2;
ALTER TABLESPACE tblspc1 MOVE ALL OWNED BY myrole TO tblspc2;

Thoughts?

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] ALTER TABLESPACE ... MOVE ALL TO ...

* Simon Riggs (si...@2ndquadrant.com) wrote:
  ALTER TABLESPACE name MOVE OWNED TO name opt_nowait
 
 The ALL seems to have value. MOVE ALL OWNED TO sounds better.

I could go either way on this, really.

  ALTER TABLESPACE name MOVE TABLES OWNED TO name opt_nowait
  ALTER TABLESPACE name MOVE INDEXES OWNED TO name opt_nowait
 
 On those two, I think the docs need to be clearer that we mean that
 TABLES means tables, and yes we leave the indexes behind. Or that
 INDEXES means and we leave the tables behind. This is intended to
 more easily separate tables and indexes into their own tablespaces.
 or similar.

Sure, I can certainly improve the documentation on that.

 I'm not clamouring for squeezing additional goodies from you, just to
 make a small change to avoid later confusion (for ALL users ;-) )

:)  What are your thoughts on what I just proposed to Tom?

 Good feature, thanks for working on it.

Thanks!

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] Backup throttling

2014-01-20 Thread Antonin Houska

On 01/15/2014 10:52 PM, Alvaro Herrera wrote:
 I gave this patch a look.  There was a bug that the final bounds check
 for int32 range was not done when there was no suffix, so in effect you
 could pass numbers larger than UINT_MAX and pg_basebackup would not
 complain until the number reached the server via BASE_BACKUP.  Maybe
 that's fine, given that numbers above 1G will cause a failure on the
 server side anyway, but it looked like a bug to me.  I tweaked the parse
 routine slightly; other than fix the bug, I made it accept fractional
 numbers, so you can say 0.5M for instance.

Thanks.

 Perhaps we should make sure pg_basebackup rejects numbers larger than 1G
 as well.

Is there a good place to define the constant, so that both backend and
client can use it? I'd say 'include/common' but no existing file seems
to be appropriate. I'm not sure if it's worth to add a new file there.

 Another thing I found a bit strange was the use of the latch.  What this
 patch does is create a separate latch which is used for the throttling.
 This means that if the walsender process receives a signal, it will not
 wake up if it's sleeping in throttling.  Perhaps this is okay: as Andres
 was quoted upthread as saying, maybe this is not a problem because the
 sleep times are typically short anyway.  But we're pretty much used to
 the idea that whenever a signal is sent, processes act on it
 *immediately*.  Maybe some admin will not feel comfortable about waiting
 some extra 20ms when they cancel their base backups.  In any case,
 having a secondary latch to sleep on in a process seems weird.  Maybe
 this should be using MyWalSnd-latch somehow.

o.k., MyWalSnd-latch is used now.

 You have this interesting THROTTLING_MAX_FREQUENCY constant defined to
 128, with the comment check this many times per second.
 Let's see: if the user requests 1MB/s, this value results in
 throttling_sample = 1MB / 128 = 8192.  So for every 8kB transferred, we
 would stop, check the wall clock time, and if less time has lapsed than
 we were supposed to spend transferring those 8kB then we sleep.  Isn't a
 check every 8kB a bit too frequent?  This doesn't seem sensible to me.
 I think we should be checking perhaps every tenth of the requested
 maximum rate, or something like that, not every 1/128th.
 
 Now, what the code actually does is not necessarily that, because the
 sampling value is clamped to a minimum of 32 kB.  But then if we're
 going to do that, why use such a large divisor value in the first place?
 I propose we set that constant to a smaller value such as 8.

I tried to use THROTTLING_SAMPLE_MIN and THROTTLING_MAX_FREQUENCY to
control both the minimum and maximum chunk size. It was probably too
generic, THROTTLING_SAMPLE_MIN is no longer there.

New patch version is attached.

// Antonin Houska (Tony)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 7d99976..799d214 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1720,7 +1720,7 @@ The commands accepted in walsender mode are:
   /varlistentry
 
   varlistentry
-termBASE_BACKUP [literalLABEL/literal replaceable'label'/replaceable] [literalPROGRESS/literal] [literalFAST/literal] [literalWAL/literal] [literalNOWAIT/literal]/term
+termBASE_BACKUP [literalLABEL/literal replaceable'label'/replaceable] [literalPROGRESS/literal] [literalFAST/literal] [literalWAL/literal] [literalNOWAIT/literal] [literalMAX_RATE/literal]/term
 listitem
  para
   Instructs the server to start streaming a base backup.
@@ -1788,7 +1788,23 @@ The commands accepted in walsender mode are:
   the waiting and the warning, leaving the client responsible for
   ensuring the required log is available.
  /para
- /listitem
+/listitem
+   /varlistentry
+   varlistentry
+termliteralMAX_RATE/literal replaceablerate//term
+listitem
+ para
+  Limit (throttle) the maximum amount of data transferred from server
+  to client per unit of time. The expected unit is bytes per second.
+  If this option is specified, the value must either be equal to zero
+  or it must fall within the range from 32 kB through 1 GB (inclusive).
+  If zero is passed or the option is not specified, no restriction is
+  imposed on the transfer.
+ /para
+ para
+  literalMAX_RATE/literal does not affect WAL streaming.
+ /para
+/listitem
/varlistentry
   /variablelist
  /para
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index c379df5..2ec81b7 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -189,6 +189,27 @@ PostgreSQL documentation
  /varlistentry
 
  varlistentry
+  termoption-r replaceable class=parameterrate/replaceable/option/term
+  termoption--max-rate=replaceable

Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation

2014-01-20 Thread Robert Haas

On Thu, Jan 16, 2014 at 12:07 AM, Amit Kapila amit.kapil...@gmail.com wrote:
Okay, got your point.
Another minor thing is that in latest patch which I have sent yesterday,
I have modified it such that while formation of chunks if there is a data
at end of string which doesn't have special pattern and is less than max
chunk size, we also consider that as a chunk. The reason of doing this
was that let us say if we have 104 bytes string which contains no special
bit pattern, then it will just have one 64 byte chunk and will leave the
remaining bytes, which might miss the chance of doing compression for
that data.

I ran Heikki's test suit on latest master and latest master plus
pgrb_delta_encoding_v4.patch on a PPC64 machine, but the results
didn't look too good.  The only tests where the WAL volume changed by
more than half a percent were the one short and one long field, no
change test, where it dropped by 17%, but at the expense of an
increase in duration of 38%; and the hundred tiny fields, half
nulled test, where it dropped by 2% without a change in runtime.
Unfortunately, some of the tests where WAL didn't change significantly
took a runtime hit - in particular, hundred tiny fields, half
changed slowed down by 10% and hundred tiny fields, all changed by
8%.  I've attached the full results in OpenOffice format.

Profiling the one short and one long field, no change test turns up
the following:

51.38% postgres  pgrb_delta_encode
23.58% postgres  XLogInsert
 2.54% postgres  heap_update
 1.09% postgres  LWLockRelease
 0.90% postgres  LWLockAcquire
 0.89% postgres  palloc0
 0.88% postgres  log_heap_update
 0.84% postgres  HeapTupleSatisfiesMVCC
 0.75% postgres  ExecModifyTable
 0.73% postgres  hash_search_with_hash_value

Yipes.  That's a lot more than I remember this costing before.  And I
don't understand why I'm seeing such a large time hit on this test
where you actually saw a significant time *reduction*.  One
possibility is that you may have been running with a default
checkpoint_segments value or one that's low enough to force
checkpointing activity during the test.  I ran with
checkpoint_segments=300.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgrb_delta_encoding_v4.ods
Description: application/vnd.oasis.opendocument.spreadsheet

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ALTER TABLESPACE ... MOVE ALL TO ...

On 20 January 2014 17:00, Stephen Frost sfr...@snowman.net wrote:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 What if you're a superuser and you want to move everybody's objects
 (perhaps in preparation for dropping the tablespace)?  I think there's
 value in both the ALL and OWNED forms.

 A superuser is considered to 'own' all objects and so 'ALL' and 'OWNED'
 above would be the same when issued by a superuser, in the current
 implementation.

 Looking at DROP OWNED and REASSIGN OWNED, they operate at the more
 specific level of OWNED == relowner rather than if the role is
 considered an 'owner' of the object through role membership, as you are
 implying above.

 As such, I'll rework this to be more in-line with the existing OWNED BY
 semantics of REASSIGN OWNED BY and DROP OWNED BY, which means we'd have:

 ALTER TABLESPACE name MOVE [ ALL | OWNED [ BY reluser ] ]
 [ TABLES | INDEXES | MATERIALIZED VIEWS ] TO name opt_nowait

 eg:

 ALTER TABLESPACE tblspc1 MOVE ALL TO tblspc2;
 ALTER TABLESPACE tblspc1 MOVE OWNED TO tblspc2;
 ALTER TABLESPACE tblspc1 MOVE OWNED BY myrole TO tblspc2;
 ALTER TABLESPACE tblspc1 MOVE TABLES OWNED BY myrole TO tblspc2;
 ALTER TABLESPACE tblspc1 MOVE ALL OWNED BY myrole TO tblspc2;

Sounds great, thanks.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] truncating pg_multixact/members

Robert Haas escribió:
 On Fri, Jan 3, 2014 at 9:11 AM, Alvaro Herrera alvhe...@2ndquadrant.com 
 wrote:

 Yeah, this stuff is definitely underdocumented relative to vacuum right now.

I have added a paragraph or two.  It's a (probably insufficient) start.
I would like to add a sample query to monitor usage, but I just realize
we don't have a function such as age(xid) to expose this info usefully.
We can't introduce one in 9.3 now, but probably we should do so in HEAD.

 Also, while multixactid_freeze_min_age should be low, perhaps a
 million as you suggest, multixactid_freeze_table_age should NOT be
 lowered to 3 million or anything like it.  If you do that, people who
 are actually doing lots of row locking will start getting many more
 full-table scans.  We want to avoid that at all cost.  I'd probably
 make the default the same as for vacuum_freeze_table_age, so that
 mxids only cause extra full-table scans if they're being used more
 quickly than xids.

I agree that the freeze_table limit should not be low, but 150 million
seems too high.  Not really sure what's a good value here.

Here's a first cut at this.  Note I have omitted a setting equivalent to
autovacuum_freeze_max_age, but I think we should have one too.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***
*** 5148,5153  COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
--- 5148,5168 
/listitem
   /varlistentry
  
+  varlistentry id=guc-multixact-freeze-table-age xreflabel=multixact_freeze_table_age
+   termvarnamemultixact_freeze_table_age/varname (typeinteger/type)/term
+   indexterm
+primaryvarnamemultixact_freeze_table_age/ configuration parameter/primary
+   /indexterm
+   listitem
+para
+ commandVACUUM/ performs a whole-table scan if the table's
+ structnamepg_class/.structfieldrelminmxid/ field has reached
+ the age specified by this setting.  The default is 5 million multixacts.
+ For more information see xref linkend=multixact-wraparound.
+/para
+   /listitem
+  /varlistentry
+ 
   varlistentry id=guc-vacuum-freeze-min-age xreflabel=vacuum_freeze_min_age
termvarnamevacuum_freeze_min_age/varname (typeinteger/type)/term
indexterm
***
*** 5169,5174  COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
--- 5184,5205 
/listitem
   /varlistentry
  
+  varlistentry id=guc-multixact-freeze-min-age xreflabel=multixact_freeze_min_age
+   termvarnamemultixact_freeze_min_age/varname (typeinteger/type)/term
+   indexterm
+primaryvarnamemultixact_freeze_min_age/ configuration parameter/primary
+   /indexterm
+   listitem
+para
+ Specifies the cutoff age (in multixacts) that commandVACUUM/
+ should use to decide whether to replace multixact IDs with a newer
+ transaction ID or multixact ID while scanning a table.  The default
+ is 1 million multixacts.
+ For more information see xref linkend=multixact-wraparound.
+/para
+   /listitem
+  /varlistentry
+ 
   varlistentry id=guc-bytea-output xreflabel=bytea_output
termvarnamebytea_output/varname (typeenum/type)/term
indexterm
*** a/doc/src/sgml/maintenance.sgml
--- b/doc/src/sgml/maintenance.sgml
***
*** 599,604  HINT:  Stop the postmaster and use a standalone backend to VACUUM in mydb.
--- 599,632 
  page for details about using a single-user backend.
 /para
  
+sect3 id=multixact-wraparound
+ 	titleMultixacts and Wraparound/title
+ 
+ 	indexterm
+ 	 primaryMultixact ID/primary
+ 	 secondarywraparound/secondary
+ 	/indexterm
+ 	!-- how about another index entry primary=wraparound,
+ 		 secondary=multixact, and the same for xids? --
+ 
+ 	para
+ 	 Similar to transaction IDs, Multixact IDs are implemented as a 32-bit
+ 	 counter and corresponding storage which requires careful aging management,
+ 	 storage cleanup, and wraparound handling.  Multixacts are used to implement
+ 	 row locking by multiple transactions: since there is limited space in the
+ 	 tuple header to store lock information, that information is stored separately
+ 	 and only a reference to it is in the tuple header.  As with transaction IDs,
+ 	 commandVACUUM/ is in charge of removing old values.  Each
+ 	 commandVACUUM/ run sets a mark in each table that indicates what's the
+ 	 oldest possible value still stored in it; every time this value is older than
+ 	 xref linkend=guc-multixact-freeze-table-age, a full-table scan is forced.
+ 	 Any Multixact older than xref linkend=guc-multixact-freeze-min-age is
+ 	 replaced by something else, which can be the zero value, a lone transaction ID,
+ 	 or a newer Multixact.  Eventually, as all tables in all databases have been
+ 	 scanned

[HACKERS] Comment typo in src/backend/command/cluster.c

2014-01-20 Thread Sawada Masahiko

Hi all,

Attached patch fixes the typo which is in src/backend/command/cluster.c.

Regards,

---
Sawada Masahiko


fix_typo-cluster.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Comment typo in src/backend/command/cluster.c

2014-01-20 Thread Fujii Masao

On Tue, Jan 21, 2014 at 2:13 AM, Sawada Masahiko sawada.m...@gmail.com wrote:
 Hi all,

 Attached patch fixes the typo which is in src/backend/command/cluster.c.

Thanks! Committed.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL Rate Limiting

On 17 January 2014 16:34, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 01/17/2014 05:20 PM, Simon Riggs wrote:

 +   if (RelationNeedsWAL(indexrel))
 +   CHECK_FOR_WAL_BUDGET();


 I don't think we need the RelationNeedsWAL tests. If the relation is not
 WAL-logged, you won't write much WAL, so you should easily stay under the
 limit. And CHECK_FOR_WAL_BUDGET() better be cheap when you're below the
 limit.

OK, I'll remove them, thanks.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL Rate Limiting

On 17 January 2014 16:10, Tom Lane t...@sss.pgh.pa.us wrote:
 Andres Freund and...@2ndquadrant.com writes:
 On 2014-01-17 09:04:54 -0500, Robert Haas wrote:
 That having been said, I bet it could be done at the tail of
 XLogInsert().

 I don't think there are many locations where this would be ok. Sleeping
 while holding exclusive buffer locks? Quite possibly inside a criticial
 section?

 More or less by definition, you're always doing both when you call
 XLogInsert.

 Surely not.

 I agree.  It's got to be somewhere further up the call stack.

Definitely.

 I'm inclined to think that what we ought to do is reconceptualize
 vacuum_delay_point() as something a bit more generic, and sprinkle
 calls to it in a few more places than now.

Agreed; that was the original plan, but implementation delays
prevented the whole vision/discussion/implementation. Requirements
from various areas include WAL rate limiting for replication, I/O rate
limiting, hard CPU and I/O limits for security and mixed workload
coexistence.

I'd still like to get something on this in 9.4 that alleviates the
replication issues, leaving wider changes for later releases.

The vacuum_* parameters don't allow any control over WAL production,
which is often the limiting factor. I could, for example, introduce a
new parameter for vacuum_cost_delay that provides a weighting for each
new BLCKSZ chunk of WAL, then rename all parameters to a more general
form. Or I could forget that and just press ahead with the patch as
is, providing a cleaner interface in next release.

 It's also interesting to wonder about the relationship to
 CHECK_FOR_INTERRUPTS --- although I think that currently, we assume
 that that's *cheap* (1 test and branch) as long as nothing is pending.
 I don't want to see a bunch of arithmetic added to it.

Good point.

I'll call these new calls CHECK_FOR_RESOURCES() to allow us to
implement various kinds of resource checking in future.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL Rate Limiting

On 17 January 2014 16:30, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 It occurs to me that this is very similar to the method I proposed in June
 to enforce a hard limit on WAL usage, to avoid PANIC caused by running out
 of disk space when writing WAL:

 http://www.postgresql.org/message-id/51b095fe.6050...@vmware.com

 Enforcing a global limit needs more book-keeping than rate limiting
 individual sesssions. But I'm hoping that the CHECK_WAL_BUDGET() calls could
 be used for both purposes.

We can't set a useful hard limit on WAL by default without that being
a bigger foot gun than what we have now.

If we did have a limit, I would look to make it less exact and take it
out of the main path for example by checkpointer - or perhaps only
make the check when we switch xlog files.

So I don't want to include that requirement into this current work.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GIN improvements part 1: additional information

2014-01-20 Thread Heikki Linnakangas


On 01/17/2014 08:49 PM, Alexander Korotkov wrote:

On Fri, Jan 17, 2014 at 10:38 PM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:


On 01/17/2014 01:05 PM, Alexander Korotkov wrote:


Seems to be fixed in the attached version of patch.
Another improvement in this version: only left-most segments and further
are re-encoded. Left part of page are left untouched.


I'm looking into this now. A few quick notes:

* Even when you don't re-encode the whole page, you still WAL-log the
whole page. While correct, it'd be a pretty obvious optimization to only
WAL-log the modified part.


Oh, I overlooked it. I wrote code in ginRedoInsertData which finds
appropriate place fro data but didn't notice that I still wrote whole page
to xlog record.


Yeah. I think ginRedoInsertData was too cute to be bug-free. I added an 
explicit unmodifiedsize field to the WAL record, so that 
ginRedoInsertData doesn't need to calculate it.



* When new items are appended to the end of the page so that only the last
existing compressed segment is re-encoded, and the page has to be split,
the items aren't divided 50/50 on the pages. The loop that moves segments
destined for the left page to the right won't move any existing, untouched,
segments.


I think this loop will move one segment. However, it's too small.


Implemented this.

I noticed that the gin vacuum redo routine is dead code, except for the 
data-leaf page handling, because we never remove entries or internal 
nodes (page deletion is a separate wal record type). And the data-leaf 
case is functionally equivalent to heap newpage records. I removed the 
dead code and made it more clear that it resembles heap newpage.


Attached is a yet another version, with more bugs fixed and more 
comments added and updated. I would appreciate some heavy-testing of 
this patch now. If you could re-run the tests you've been using, that 
could be great. I've tested the WAL replay by replicating GIN operations 
over streaming replication. That doesn't guarantee it's correct, but 
it's a good smoke test.


- Heikki


gin-packed-postinglists-20140120.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] truncating pg_multixact/members

Alvaro Herrera escribió:

 Here's a first cut at this.  Note I have omitted a setting equivalent to
 autovacuum_freeze_max_age, but I think we should have one too.

Some more comments on the patch:

* I haven't introduced settings to tweak this per table for
autovacuum.  I don't think those are needed.  It's not hard to do,
however; if people opine against this, I will implement that.

* The multixact_freeze_table_age value has been set to 5 million.
I feel this is a big enough number that shouldn't cause too much
vacuuming churn, while at the same time not leaving excessive storage
occupied by pg_multixact/members, which amplifies the space used by the
average number of member in each multi.

(A bit of math: each Xid uses 2 bits.  Therefore for the default 200
million transactions of vacuum_freeze_table_age we use 50 million bytes,
or about 27 MB of space, plus some room for per-page LSNs.  For each
Multi we use 4 bytes in offset plus 5 bytes per member; if we consider 2
members per multi in average, that totals 70 million bytes for the
default multixact_freeze_table_age, so 66 MB of space.)

* I have named the parameters by simply replacing vacuum with
multixact.  I could instead have added the multixact word in the
middle:
vacuum_multixact_freeze_min_age
but this doesn't seem an improvement.

* In the word Multixact in the docs I left the X as lowercase.  I used
uppercase first but that looked pretty odd.  In the middle of a
sentence, the M is also lowercase.


I reworded the paragraph in maintenance.sgml a bit.  If there are
suggestions, please shout.

   para
 Similar to transaction IDs, Multixact IDs are implemented as a 32-bit
 counter and corresponding storage which requires careful aging management,
 storage cleanup, and wraparound handling.  Multixacts are used to implement
 row locking by multiple transactions: since there is limited space in the
 tuple header to store lock information, that information is stored 
separately
 and only a reference to it is in the structfieldxmax/ field in the 
tuple
 header.
/para 

para
 As with transaction IDs,
 commandVACUUM/ is in charge of removing old values.  Each
 commandVACUUM/ run sets 
structnamepg_class/.structfieldrelminmxid/ 
 indicating the oldest possible value still stored in that table; every time
 this value is older than xref linkend=guc-multixact-freeze-table-age, a
 full-table scan is forced.
 During any table scan (either partial or full-table), any multixact older
 than xref linkend=guc-multixact-freeze-min-age is replaced by something
 else, which can be the zero value, a single transaction ID,
 or a newer multixact.  Eventually, as all tables in all databases are
 scanned and their oldest multixact values are advanced, on-disk storage for
 older multixacts can be removed.
/para

   /sect3

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] truncating pg_multixact/members

2014-01-20 Thread Andres Freund

Hi,

On 2014-01-20 15:39:33 -0300, Alvaro Herrera wrote:
 * The multixact_freeze_table_age value has been set to 5 million.
 I feel this is a big enough number that shouldn't cause too much
 vacuuming churn, while at the same time not leaving excessive storage
 occupied by pg_multixact/members, which amplifies the space used by the
 average number of member in each multi.

That seems to be *far* too low to me. In some workloads, remember we've
seen pg_controldata outputs with far high next multi than next xid, that
will cause excessive full table scans. I really think that we shouldn't
change the default for freeze_table_age for multis at all.
I think we should have a lower value for the vacuum_freeze_min_age
equivalent, but that's it.

 (A bit of math: each Xid uses 2 bits.  Therefore for the default 200
 million transactions of vacuum_freeze_table_age we use 50 million bytes,
 or about 27 MB of space, plus some room for per-page LSNs.  For each
 Multi we use 4 bytes in offset plus 5 bytes per member; if we consider 2
 members per multi in average, that totals 70 million bytes for the
 default multixact_freeze_table_age, so 66 MB of space.)

That doesn't seem sufficient cause to change the default to me.

 * I have named the parameters by simply replacing vacuum with
 multixact.  I could instead have added the multixact word in the
 middle:
 vacuum_multixact_freeze_min_age
 but this doesn't seem an improvement.

I vote for the longer version. Right now you can get all relevant vacuum
parameters by grepping/searching for vacuum, we shouldn't give up on
that. If we consider vacuum_multixact_freeze_min_age to be too long, I'd
rather vote for replacing multixact by mxid or such.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Trigger information for auto_explain.

2014-01-20 Thread Jaime Casanova

On Tue, Jan 14, 2014 at 4:25 AM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:

 This patch consists of two parts,

  0001_expose_explain_triggers_v1_20140114.patch

   Expose the code to print out trigger information currently
   hidden in ExplainOnePlan().

  0002_auto_explain_triggers_v1_20140114.patch

   Enable auto_explain to output trigger information.

 Documentation will be added later..


Hi,

I think documentation is the only thing that stops this patch to be
commitable... can you add it?

-- 
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566 Cell: +593 987171157


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUGS] surprising to_timestamp behavior

Jeevan Chalke jeevan.cha...@enterprisedb.com writes:
 I went to review this, and found that there's not actually a patch
 attached ...

 Attached. Sorry for that.

This looks good to me except for one thing: if the upcoming node is a
DCH_FX action (ie, turn on fx_mode), I don't think we want to skip
whitespace.  The original coding had that bug too, but it was only
exposed if the FX prefix was immediately preceded by whitespace in
the format.

It's a bit hard to think of useful cases where this would make a
difference, because it turns out that from_char_parse_int contains
an internal skip-whitespace action that's applied regardless of FX
mode, so it really only matters for non-integer field types.  I
kinda think that maybe now we should remove that extra skip, or at
least count the skipped whitespace against the field width, but
did not experiment to see what the results would be.

Committed with that change and some cosmetic adjustments.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Trigger information for auto_explain.

Jaime Casanova wrote:
 On Tue, Jan 14, 2014 at 4:25 AM, Kyotaro HORIGUCHI
 horiguchi.kyot...@lab.ntt.co.jp wrote:
 
  This patch consists of two parts,
 
   0001_expose_explain_triggers_v1_20140114.patch

   0002_auto_explain_triggers_v1_20140114.patch

  Documentation will be added later..

 I think documentation is the only thing that stops this patch to be
 commitable... can you add it?

Agreed.  I have pushed patch 0001 for now.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core

Peter Geoghegan p...@heroku.com writes:
 On Sat, Dec 7, 2013 at 9:26 AM, Fujii Masao masao.fu...@gmail.com wrote:
 The patch doesn't apply cleanly against the master due to recent change of
 pg_stat_statements. Could you update the patch?

 Revision is attached, including changes recently discussed on other
 thread [1] to allow avoiding sending query text where that's not
 desirable and increase in default pg_stat_statements.max to 5000.

I see this is marked as ready for committer.  Where does it stand in
relation to the other long-running thread about calls under-estimation
propagation?  I was surprised to find that there isn't any CommitFest
entry linked to that thread, so I'm wondering if that proposal is
abandoned or what.  If it's not, is committing this going to blow up
that patch?

BTW, I'm also thinking that the detected_version kluge is about ready
to collapse of its own weight, or at least is clearly going to break in
future.  What we need to do is embed the API version in the C name of the
pg_stat_statements function instead.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core

2014-01-20 Thread Peter Geoghegan

On Mon, Jan 20, 2014 at 12:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I see this is marked as ready for committer.  Where does it stand in
 relation to the other long-running thread about calls under-estimation
 propagation?  I was surprised to find that there isn't any CommitFest
 entry linked to that thread, so I'm wondering if that proposal is
 abandoned or what.  If it's not, is committing this going to blow up
 that patch?

I believe that proposal was withdrawn. I think the conclusion there
was that we should just expose queryid and be done with it. In a way,
exposing the queryid enabled this work, because it provides an
identifier that can be used instead of sending large query texts each
call.

 BTW, I'm also thinking that the detected_version kluge is about ready
 to collapse of its own weight, or at least is clearly going to break in
 future.  What we need to do is embed the API version in the C name of the
 pg_stat_statements function instead.

I agree that it isn't scalable.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core

Peter Geoghegan p...@heroku.com writes:
 On Mon, Jan 20, 2014 at 12:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 BTW, I'm also thinking that the detected_version kluge is about ready
 to collapse of its own weight, or at least is clearly going to break in
 future.  What we need to do is embed the API version in the C name of the
 pg_stat_statements function instead.

 I agree that it isn't scalable.

Yeah.  It was more or less tolerable as long as we were only using it in
connection with identifying the set of output columns, but using it to
identify the expected input arguments too seems darn rickety.  I'll
refactor so there are separate C call points for each supported API
version.  (Well, I guess it's too late to separate 1.0 and 1.1, but
we can make 1.2 and future versions separate.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Closing commitfest 2013-11

With apologies to our beloved commitfest-mace-wielding CFM, commitfest
2013-11 intentionally still contains a few open patches.  I think that
CF is largely being ignored by most people now that we have CF 2014-01
in progress.  If we don't want to do anything about these patches in the
immediate future, I propose we move them to CF 2014-01.

* shared memory message queues
  This is part of the suite involving dynamic shmem; not sure whether
  this is a patch that needs more review, or is it ready for
  application, or has it been superceded by later versions in the next
  commitfest.  Patch authors please chime in.
 
* Shave a few instructions from child-process startup sequence
  Discussion stalled without a conclusion; opinions diverge on whether
  this is a useful patch to have.  My personal inclination is to drop
  this patch because it seems pointless, but if someone feels otherwise
  I won't object.  (The objection that it will break as soon as we
  decide to change the invariant about invalid sockets no longer applies
  because it has Asserts to that effect.)  Do we really care about
  performance during process termination?  I'd say this is mildly
  interesting if this code is executed for non-authenticated clients.
 
* Widening application of indices.
  Was this re-posted in 2014-01?
 
* fault tolerant DROP IF EXISTS
  I gave a look and it looks good for application.  This wasn't
  superceded by a future version, correct?
 
* SSL: better default ciphersuite
  I think we should apply this.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Closing commitfest 2013-11

Alvaro Herrera alvhe...@2ndquadrant.com writes:
 With apologies to our beloved commitfest-mace-wielding CFM, commitfest
 2013-11 intentionally still contains a few open patches.  I think that
 CF is largely being ignored by most people now that we have CF 2014-01
 in progress.  If we don't want to do anything about these patches in the
 immediate future, I propose we move them to CF 2014-01.

I think the idea was that patch authors should take responsibility for
pushing their patches forward to 2014-01 if they still wanted them
considered.  Quite a few patches already were moved that way, IIRC.

Agreed though that we shouldn't let them just rot.


 * shared memory message queues

Isn't this committed?  There's something by that name breaking the
buildfarm ;-)

 * Shave a few instructions from child-process startup sequence
   Discussion stalled without a conclusion; opinions diverge on whether
   this is a useful patch to have.  My personal inclination is to drop
   this patch because it seems pointless, but if someone feels otherwise
   I won't object.

I was one of the people objecting to it, so +1 for dropping.
 
 * Widening application of indices.
   Was this re-posted in 2014-01?

Yes, there's a newer version already in 2014-01.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] pg_basebackup: progress report max once per second

2014-01-20 Thread Oskari Saarenmaa


18.11.2013 07:53, Sawada Masahiko kirjoitti:

On 13 Nov 2013, at 20:51, Mika Eloranta m...@ohmu.fi wrote:

Prevent excessive progress reporting that can grow to gigabytes
of output with large databases.


I got error with following scenario.

$ initdb -D data -E UTF8 --no-locale
/* setting the replication parameters */
$ pg_basebackup -D 2data
Floating point exception
LOG:  could not send data to client: Broken pipe
ERROR:  base backup could not send data, aborting backup
FATAL:  connection to client lost


Attached a rebased patch with a fix for division by zero error plus some 
code style issues.  I also moved the patch to the current commitfest.


/ Oskari
From 1c54ffc5006320da1b021c2a07939f948ba9fdb1 Mon Sep 17 00:00:00 2001
From: Mika Eloranta m...@ohmu.fi
Date: Tue, 21 Jan 2014 00:15:27 +0200
Subject: [PATCH] pg_basebackup: progress report max once per second

Prevent excessive progress reporting that can grow to gigabytes
of output with large databases.
---
 src/bin/pg_basebackup/pg_basebackup.c | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 9d13d57..cae181c 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -15,6 +15,7 @@
 #include libpq-fe.h
 #include pqexpbuffer.h
 #include pgtar.h
+#include pgtime.h
 
 #include unistd.h
 #include dirent.h
@@ -46,6 +47,7 @@ static bool	streamwal = false;
 static bool	fastcheckpoint = false;
 static bool	writerecoveryconf = false;
 static int	standby_message_timeout = 10 * 1000;		/* 10 sec = default */
+static pg_time_t last_progress_report = 0;
 
 /* Progress counters */
 static uint64 totalsize;
@@ -75,7 +77,7 @@ static PQExpBuffer recoveryconfcontents = NULL;
 /* Function headers */
 static void usage(void);
 static void verify_dir_is_empty_or_create(char *dirname);
-static void progress_report(int tablespacenum, const char *filename);
+static void progress_report(int tablespacenum, const char *filename, bool force);
 
 static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
 static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
@@ -401,11 +403,18 @@ verify_dir_is_empty_or_create(char *dirname)
  * is enabled, also print the current file name.
  */
 static void
-progress_report(int tablespacenum, const char *filename)
+progress_report(int tablespacenum, const char *filename, bool force)
 {
-	int			percent = (int) ((totaldone / 1024) * 100 / totalsize);
+	int			percent;
 	char		totaldone_str[32];
 	char		totalsize_str[32];
+	pg_time_t	now = time(NULL);
+
+	if (!showprogress || (now == last_progress_report  !force))
+		return; /* Max once per second */
+
+	last_progress_report = now;
+	percent = totalsize ? (int) ((totaldone / 1024) * 100 / totalsize) : 0;
 
 	/*
 	 * Avoid overflowing past 100% or the full size. This may make the total
@@ -852,9 +861,9 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 			}
 		}
 		totaldone += r;
-		if (showprogress)
-			progress_report(rownum, filename);
+		progress_report(rownum, filename, false);
 	}			/* while (1) */
+	progress_report(rownum, filename, true);
 
 	if (copybuf != NULL)
 		PQfreemem(copybuf);
@@ -1079,8 +1088,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
 disconnect_and_exit(1);
 			}
 			totaldone += r;
-			if (showprogress)
-progress_report(rownum, filename);
+			progress_report(rownum, filename, false);
 
 			current_len_left -= r;
 			if (current_len_left == 0  current_padding == 0)
@@ -1096,6 +1104,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
 			}
 		}		/* continuing data in existing file */
 	}			/* loop over all data blocks */
+	progress_report(rownum, filename, true);
 
 	if (file != NULL)
 	{
@@ -1456,8 +1465,7 @@ BaseBackup(void)
 	tablespacecount = PQntuples(res);
 	for (i = 0; i  PQntuples(res); i++)
 	{
-		if (showprogress)
-			totalsize += atol(PQgetvalue(res, i, 2));
+		totalsize += atol(PQgetvalue(res, i, 2));
 
 		/*
 		 * Verify tablespace directories are empty. Don't bother with the
@@ -1504,7 +1512,7 @@ BaseBackup(void)
 
 	if (showprogress)
 	{
-		progress_report(PQntuples(res), NULL);
+		progress_report(PQntuples(res), NULL, true);
 		fprintf(stderr, \n);	/* Need to move to next line */
 	}
 	PQclear(res);
-- 
1.8.4.2


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Closing commitfest 2013-11

2014-01-20 Thread Dean Rasheed

On 20 January 2014 21:24, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 * fault tolerant DROP IF EXISTS
   I gave a look and it looks good for application.  This wasn't
   superceded by a future version, correct?


No, this hasn't been superceded. +1 for applying it.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-20 Thread Bruce Momjian

On Wed, Jan 15, 2014 at 11:49:09AM +, Mel Gorman wrote:
 It may be the case that mmap/madvise is still required to handle a double
 buffering problem but it's far from being a free lunch and it has costs
 that read/write does not have to deal with. Maybe some of these problems
 can be fixed or mitigated but it is a case where a test case demonstrates
 the problem even if that requires patching PostgreSQL.

We suspected trying to use mmap would have costs, but it is nice to hear
actual details about it.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-01-20 Thread Marti Raudsepp

Hi,

On Tue, Jan 14, 2014 at 5:49 PM, Alexander Korotkov
aekorot...@gmail.com wrote:
On Tue, Jan 14, 2014 at 12:54 AM, Marti Raudsepp ma...@juffo.org wrote:
 I've been trying it out in a few situations. I implemented a new
 enable_partialsort GUC to make it easier to turn on/off, this way it's a lot
 easier to test. The attached patch applies on top of partial-sort-5.patch

 I though about such option. Generally not because of testing convenience,
 but because of overhead of planning. This way you implement it is quite
 naive :)

I don't understand. I had another look at this and cost_sort still
seems like the best place to implement this, since that's where the
patch decides how many pre-sorted columns to use. Both mergejoin and
simple order-by plans call into it. If enable_partialsort=false then I
skip all pre-sorted options except full sort, making cost_sort behave
pretty much like it did before the patch.

I could change pathkeys_common to return 0, but that seems like a
generic function that shouldn't be tied to partialsort. The old code
paths called pathkeys_contained_in anyway, which has similar
complexity. (Apart for initial_cost_mergejoin, but that doesn't seem
special enough to make an exception for).

Or should I use?:
  enable_partialsort ? pathkeys_common(...) : 0

 For instance, merge join rely on partial sort which will be
 replaced with simple sort.

Are you saying that enable_partialsort=off should keep
partialsort-based mergejoins enabled?

Or are you saying that merge joins shouldn't use simple sort at all?
But merge join was already able to use full Sort nodes before your
patch.

What am I missing?

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fix memset usage in pgcrypto

Marko Kreen escribió:
 http://www.viva64.com/en/b/0227/ reported that on-stack memset()s
 might be optimized away by compilers.  Fix it.

Just to clarify, this needs to be applied to all branches, right?  If
so, does the version submitted apply cleanly to all of them?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Shave a few instructions from child-process startup sequence

Peter Eisentraut escribió:
 src/backend/postmaster/postmaster.c:2255: indent with spaces.
 +else
 src/backend/postmaster/postmaster.c:2267: indent with spaces.
 +break;

I just checked the Jenkins page for this patch:
http://pgci.eisentraut.org/jenkins/job/postgresql_commitfest_world/243/
just to make sure I can figure out what it means.  You reported it as
build unstable in the commitfest entry:
https://commitfest.postgresql.org/action/patch_view?id=1277
However, looking at Jenkins, I couldn't figure out what the problem is.
I can go to the GNU make + GCC warnings page, which lists one
warning -- but we already know it:
unused variable ‘yyg’ [-Wunused-variable]

I can go to the console output page, which has this:

01:24:53 + tar cJf postgresql-9.4.bin.tar.xz postgresql-9.4.bin/
01:24:53 [WARNINGS] Parsing warnings in console log with parser GNU Make + GNU 
Compiler (gcc)
01:24:53 [WARNINGS] Computing warning deltas based on reference build #242
01:24:53 [WARNINGS] Ignore new warnings since this is the first valid build
01:24:53 Archiving artifacts
01:24:53 WARN: No artifacts found that match the file pattern 
**/regression.diffs,**/regression.out,cpluspluscheck.out. Configuration error?
01:24:53 WARN: ‘**/regression.diffs’ doesn’t match anything: ‘**’ exists but 
not ‘**/regression.diffs’
01:24:53 Checking console output
01:24:53 
/var/lib/jenkins/jobs/postgresql_commitfest_world/builds/2013-11-25_01-14-27/log:
01:24:53 5644dbce38ce0f5f16155eba9988fee1  -
01:24:53 Build step 'Jenkins Text Finder' changed build result to UNSTABLE

But I can hardly blame the patch for the above, can I?

I'm not saying I like the patch; I just wonder how to make Jenkins work
for me effectively.  Is this a configuration error?  Should I be looking
at some other page?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP patch (v2) for updatable security barrier views


(2014/01/13 22:53), Craig Ringer wrote:

On 01/09/2014 11:19 PM, Tom Lane wrote:

Dean Rasheed dean.a.rash...@gmail.com writes:

My first thought was that it should just preprocess any security
barrier quals in subquery_planner() in the same way as other quals are
preprocessed. But thinking about it further, those quals are destined
to become the quals of subqueries in the range table, so we don't
actually want to preprocess them at that stage --- that will happen
later when the new subquery is planned by recursion back into
subquery_planner(). So I think the right answer is to make
adjust_appendrel_attrs() handle recursion into sublink subqueries.


TBH, this sounds like doubling down on a wrong design choice.  I see
no good reason that updatable security views should require any
fundamental rearrangements of the order of operations in the planner.


In that case, would you mind offerign a quick sanity check on the
following alternative idea:

- Add sourceRelation to Query. This refers to the RTE that supplies
tuple projections to feed into ExecModifyTable, with appropriate resjunk
ctid and (if requ'd) oid cols present.

- When expanding a target view into a subquery, set sourceRelation on
the outer view to the index of the RTE of the newly expanded subquery.


I have sane opinion. Existing assumption, resultRelation is RTE of the
table to be read not only modified, makes the implementation complicated.
If we would have a separate sourceRelation, it allows to have flexible
form including sub-query with security_barrier on the reader side.

Could you tell me the direction you're inclined right now?
I wonder whether I should take the latest patch submitted on 09-Jan for
a deep code reviewing and testing.

Thanks,


- In rewriteTargetView, as now, reassign resultRelation to the target
view's base rel. This is required so that  do any RETURNING and WITH
CHECK OPTION fixups required to adjust the RETURNING list to the new
result relation, so they act on the final tuple after any BEFORE
triggers act. Do not flatten the view subquery and merge the quals (as
currently happens); allow it to be expanded as a subquery by the
rewriter instead. Don't mess with the view tlist at this point except by
removing the whole-row Var added by rewriteTargetListUD.

- When doing tlist expansion in preprocess_targetlist, when we process
the outer Query (the only one for which query type is not SELECT, and
the only one that has a non-zero resultRelation), if resultRelation !=
sourceRelation recursively follow the chain of sourceRelation s to the
bottom one with type RTE_RELATION. Do tlist expansion on that inner-most
Query first, using sourceRelation to supply the varno for injected TLEs,
including injecting ctid, oid if req'd, etc. During call stack
unwind, have each intermediate layer do regular tlist expansion, adding
a Var pointing to each tlist entry of the inner subquery.

At the outer level of preprocess_targetlist, sort the tlist, now
expanded to include all required vars, into attribute order for the
resultRelation. (this level is the only one that has resultRelation set).

Avoid invoking preprocess_targetlist on the inner Query again when it's
processed in turn, or just bail out when we see sourceRelation set since
we know it's already been done.

(Alternately, it might be possible to run preprocess_targetlist
depth-first instead of the current outermost-first, but I haven't looked
at that).


The optimizer can still flatten non-security-barrier updatable views,
following the chain of Vars as it collapses each layer. That's
effectively what the current rewriteTargetView code is doing manually at
each pass right now.

I'm sure there are some holes in this outline, but it's struck me as
possibly workable. The key is to set sourceRelation on every inner
subquery in the target query chain, not just the outer one, so it can be
easily followed from the outer query though the subqueries into the
innermost query with RTE_RELATION type.



The only alternative I've looked at is looking clumsier the longer I
examine it: adding a back-reference in each subquery's Query struct, to
the Query containing it and the RTI of the subquery within the outer
Query. That way, once rewriting hits the innermost rel with RTE_RELATION
type, the rewriter can walk back up the Query tree doing tlist
rewriting. I'm not sure if this is workable yet, and it creates ugly
pointer-based backrefs *up* the Query chain, making what was previously
a tree of Query* into a graph. That could get exciting, though there'd
never be any need for mutators to follow the parent query pointer so it
wouldn't make tree rewrites harder.








--
OSS Promotion Center / The PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Add value to error message when size extends

Daniel Erez de...@redhat.com writes:
 Many thanks for the prompt response and the suggestions!

 So, regarding the issue of production quality you've mentioned,
 we understand there are two remaining matters to address:
 1. debug_query_string:
As we can't rely on this flag, is there any alternative we can rely on?
 2. encapsulation:
Is there any utility file we could extract the logic to?
Or, any other functions that are used for encapsulation mechanism?

The big picture here is that there are two ways we could go:

* Encapsulate some improved version of the logic I posted into an error
reporting auxiliary function, along the lines of
  function_errposition(fcinfo)
and then run around and add calls to this function in the ereports
where it seems useful.  This would be the right answer if we think
only a few such ereports need the extra info --- but if we want it
to apply to many/most function-level error reports, it sounds pretty
darn tedious.  Also it'd require that everyplace that is going to
throw such reports have access to the appropriate fcinfo, which would
require some refactoring in places where SQL functions have been
divided into multiple C routines.

* Add logic in execQual.c to automatically add the information whenever
an error is thrown while executing an identifiable expression node.
This solves the problem for all functions at one stroke.  The problem
is that it would add some overhead to the non-error code path.
It's hard to see how to do this without at least half a dozen added
instructions per function or operator node (it'd likely involve
saving, setting, and restoring a global variable ...).  I'm not sure
if that would be significant but it would very possibly be measurable
on function-call-heavy workloads.  We might think that such overhead
is worth paying for better error reporting; but given how few people
have commented on this thread, I'm not sure many people are excited
about it.

I'd like to hear some more comments before undertaking either approach.

As for getting rid of the use of debug_query_string, it'd take some
work, though it seems do-able.  I think ActivePortal-sourceText is
the right thing to reference in the normal case, but it may not
be the right thing when executing queries from SQL-language functions
for instance.  Also, the reason I didn't use that in the quick-hack
patch was that the test cases I wanted to post involved calls that
will get executed during the planner's constant-simplification pass
(since varchar() is marked IMMUTABLE).  It turns out there isn't
an ActivePortal at that point, so we'd need some thought about how
to make the right query string accessible during planning.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] inherit support for foreign tables


(2014/01/14 18:24), Shigeru Hanada wrote:

2013/11/18 Tom Lane t...@sss.pgh.pa.us:

Robert Haas robertmh...@gmail.com writes:

I think it's been previously proposed that we have some version of a
CHECK constraint that effectively acts as an assertion for query
optimization purposes, but isn't actually enforced by the system.  I
can see that being useful in a variety of situations, including this
one.


Yeah, I think it would be much smarter to provide a different syntax
to explicitly represent the notion that we're only assuming the condition
is true, and not trying to enforce it.


I'd like to revisit this feature.

Explicit notation to represent not-enforcing (assertive?) is an
interesting idea.  I'm not sure which word is appropriate, but for
example, let's use the word ASSERTIVE as a new constraint attribute.

CREATE TABLE parent (
 id int NOT NULL,
 group int NOT NULL,
 name text
);

CREATE FOREIGN TABLE child_grp1 (
   /* no additional column */
) INHERITS (parent) SERVER server1;
ALTER  TABLE child_grp1 ADD CONSTRAINT chk_group1 CHECK (group = 1) ASSERTIVE;

If ASSERTIVE is specified, it's not guaranteed that the constraint is
enforced completely, so it should be treated as a hint for planner.
As Robert mentioned, enforcing as much as we can during INSERT/UPDATE
is one option about this issue.

In addition, an idea which I can't throw away is to assume that all
constraints defined on foreign tables as ASSERTIVE.  Foreign tables
potentially have dangers to have wrong data by updating source data
not through foreign tables.  This is not specific to an FDW, so IMO
constraints defined on foreign tables are basically ASSERTIVE.  Of
course PG can try to maintain data correct, but always somebody might
break it.
qu


Does it make sense to apply assertive CHECK constraint on the qual
of ForeignScan to filter out tuples with violated values at the local
side, as if row-level security feature doing.
It enables to handle a situation that planner expects only clean
tuples are returned but FDW driver is unavailable to anomalies.

Probably, this additional check can be turned on/off on the fly,
if FDW driver has a way to inform the core system its capability,
like FDW_CAN_ENFORCE_CHECK_CONSTRAINT that informs planner to skip
local checks.


Besides CHECK constraints, currently NOT NULL constraints are
virtually ASSERTIVE (not enforcing).  Should it also be noted
explicitly?


Backward compatibility

NOT NULL [ASSERTIVE] might be an option.

Thanks,
--
OSS Promotion Center / The PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GIN improvements part 1: additional information

2014-01-20 Thread Tomas Vondra

On 20.1.2014 19:30, Heikki Linnakangas wrote:
 
 Attached is a yet another version, with more bugs fixed and more 
 comments added and updated. I would appreciate some heavy-testing of 
 this patch now. If you could re-run the tests you've been using,
 that could be great. I've tested the WAL replay by replicating GIN
 operations over streaming replication. That doesn't guarantee it's
 correct, but it's a good smoke test.

I gave it a try - the OOM error seems to be gone, but now get this

   PANIC:  cannot insert duplicate items to GIN index page

This only happens when building the index incrementally (i.e. using a
sequence of INSERT statements into a table with GIN index). When I
create a new index on a table (already containing the same dataset) it
works just fine.

Also, I tried to reproduce the issue by running a simple plpgsql loop
(instead of a complex python script):

DO LANGUAGE plpgsql $$
DECLARE
r tsvector;
BEGIN
FOR r IN SELECT body_tsvector FROM data_table LOOP
INSERT INTO idx_table (body_tsvector) VALUES (r);
END LOOP;
END$$;

where data_table is the table with imported data (the same data I
mentioned in the post about OOM errors), and index_table is an empty
table with a GIN index. And indeed it fails, but only if I run the block
in multiple sessions in parallel.

regards
Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] inherit support for foreign tables

2014-01-20 Thread Shigeru Hanada

Thanks for the comments.

2014/1/21 KaiGai Kohei kai...@ak.jp.nec.com:
 In addition, an idea which I can't throw away is to assume that all
 constraints defined on foreign tables as ASSERTIVE.  Foreign tables
 potentially have dangers to have wrong data by updating source data
 not through foreign tables.  This is not specific to an FDW, so IMO
 constraints defined on foreign tables are basically ASSERTIVE.  Of
 course PG can try to maintain data correct, but always somebody might
 break it.
 qu

 Does it make sense to apply assertive CHECK constraint on the qual
 of ForeignScan to filter out tuples with violated values at the local
 side, as if row-level security feature doing.
 It enables to handle a situation that planner expects only clean
 tuples are returned but FDW driver is unavailable to anomalies.

 Probably, this additional check can be turned on/off on the fly,
 if FDW driver has a way to inform the core system its capability,
 like FDW_CAN_ENFORCE_CHECK_CONSTRAINT that informs planner to skip
 local checks.

Hmm, IIUC you mean that local users can't (or don't need to) know that
data which violates the local constraints exist on remote side.
Applying constraints to the data which is modified through FDW would
be necessary as well.  In that design, FDW is a bidirectional filter
which provides these features:

1) Don't push wrong data into remote data source, by applying local
constraints to the result of the modifying query executed on local PG.
 This is not perfect filter, because remote constraints don't mapped
automatically or perfectly (imagine constraints which is available on
remote but is not supported in PG).
2) Don't retrieve wrong data from remote to local PG, by applying
local constraints

I have a concern about consistency.  It has not been supported, but
let's think of Aggregate push-down invoked by a query below.

SELECT count(*) FROM remote_table;

If this query was fully pushed down, the result is the # of records
exist on remote side, but the result would be # of valid records when
we don't push down the aggregate.  This would confuse users.

 Besides CHECK constraints, currently NOT NULL constraints are
 virtually ASSERTIVE (not enforcing).  Should it also be noted
 explicitly?

 Backward compatibility….

Yep, backward compatibility (especially visible ones to users) should
be minimal, ideally zero.

 NOT NULL [ASSERTIVE] might be an option.

Treating [ASSERTIVE | NOT ASSERTIVE] like DEFERRABLE, and allow
ingASSERTIVE for only foreign tables?  It makes sense, though we need
consider exclusiveness .  But It needs to default to ASSERTIVE on
foreign tables, and NOT ASSERTIVE (means forced) on others.  Isn't
is too complicated?

CREATE FOREIGN TABLE foo (
id int NOT NULL ASSERTIVE CHECK (id  1) ASSERTIVE,
…
CONSTRAINT chk_foo_name_upper CHECK (upper(name) = name) ASSERTIVE
) SERVER server;

BTW, I noticed that this is like push-down-able expressions in
JOIN/WHERE.  We need to check a CHECK constraint defined on a foreign
tables contains only expressions which have same semantics as remote
side (in practice, built-in and immutable)?
-- 
Shigeru HANADA


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] [PATCH] Implement json_array_elements_text

2014-01-20 Thread Laurence Rowe

Following the discussion on pgsql-general, I thought I'd have a go
implementing json_array_elements_text following the same pattern as
json_each_text. The function makes it possible to join elements of a
json array onto a table, for example:

CREATE TABLE object (name TEXT PRIMARY KEY, properties JSON);

INSERT INTO object (name, properties) VALUES
('one', '{}'),
('two', '{links: [one]}'),
('three', '{links: [one, two]}');

SELECT source.name, target.name
FROM (
SELECT *, json_array_elements_text(properties-'links')::text AS
link_to FROM object
) AS source
JOIN object target ON source.link_to = target.name;


My particular use case has uuid keys for object, which are difficult
to cast from json.

Laurence

---
 doc/src/sgml/func.sgml   | 22 
 src/backend/utils/adt/jsonfuncs.c| 67 +---
 src/include/catalog/pg_proc.h|  2 ++
 src/include/utils/json.h |  1 +
 src/test/regress/expected/json.out   | 34 +++---
 src/test/regress/expected/json_1.out | 34 +++---
 src/test/regress/sql/json.sql|  6 ++--
 7 files changed, 144 insertions(+), 22 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index c76d357..e7338b5 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10277,6 +10277,28 @@ table2-mapping
   row
entry
  indexterm
+  primaryjson_array_elements_text/primary
+ /indexterm
+ literaljson_array_elements_text(json)/literal
+   /entry
+   entrytypeSETOF json/type/entry
+   entry
+ Expands a JSON array to a set of JSON values. The returned value will be of
+ type text.
+   /entry
+   entryliteraljson_array_elements_text('[foo, bar]')/literal/entry
+   entry
+programlisting
+   value
+---
+ foo
+ bar
+/programlisting
+   /entry
+  /row
+  row
+   entry
+ indexterm
   primaryjson_typeof/primary
  /indexterm
  literaljson_typeof(json)/literal
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 90fa447..b8e64f3 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -66,6 +66,9 @@ static void each_object_field_end(void *state, char *fname, bool isnull);
 static void each_array_start(void *state);
 static void each_scalar(void *state, char *token, JsonTokenType tokentype);
 
+/* common worker for json_each* functions */
+static inline Datum elements_worker(PG_FUNCTION_ARGS, bool as_text);
+
 /* semantic action functions for json_array_elements */
 static void elements_object_start(void *state);
 static void elements_array_element_start(void *state, bool isnull);
@@ -157,6 +160,9 @@ typedef struct ElementsState
 	TupleDesc	ret_tdesc;
 	MemoryContext tmp_cxt;
 	char	   *result_start;
+	bool		normalize_results;
+	bool		next_scalar;
+	char	   *normalized_scalar;
 }	ElementsState;
 
 /* state for get_json_object_as_hash */
@@ -1061,7 +1067,7 @@ each_scalar(void *state, char *token, JsonTokenType tokentype)
 }
 
 /*
- * SQL function json_array_elements
+ * SQL function json_array_elements and json_array_elements_text
  *
  * get the elements from a json array
  *
@@ -1070,10 +1076,22 @@ each_scalar(void *state, char *token, JsonTokenType tokentype)
 Datum
 json_array_elements(PG_FUNCTION_ARGS)
 {
+	return elements_worker(fcinfo, false);
+}
+
+Datum
+json_array_elements_text(PG_FUNCTION_ARGS)
+{
+	return elements_worker(fcinfo, true);
+}
+
+static inline Datum
+elements_worker(PG_FUNCTION_ARGS, bool as_text)
+{
 	text	   *json = PG_GETARG_TEXT_P(0);
 
-	/* elements doesn't need any escaped strings, so use false here */
-	JsonLexContext *lex = makeJsonLexContext(json, false);
+	/* elements only needs escaped strings when as_text */
+	JsonLexContext *lex = makeJsonLexContext(json, as_text);
 	JsonSemAction *sem;
 	ReturnSetInfo *rsi;
 	MemoryContext old_cxt;
@@ -1116,6 +1134,9 @@ json_array_elements(PG_FUNCTION_ARGS)
 	sem-array_element_start = elements_array_element_start;
 	sem-array_element_end = elements_array_element_end;
 
+	state-normalize_results = as_text;
+	state-next_scalar = false;
+
 	state-lex = lex;
 	state-tmp_cxt = AllocSetContextCreate(CurrentMemoryContext,
 		 json_array_elements temporary cxt,
@@ -1138,7 +1159,17 @@ elements_array_element_start(void *state, bool isnull)
 
 	/* save a pointer to where the value starts */
 	if (_state-lex-lex_level == 1)
-		_state-result_start = _state-lex-token_start;
+	{
+		/*
+		 * next_scalar will be reset in the array_element_end handler, and
+		 * since we know the value is a scalar there is no danger of it being
+		 * on while recursing down the tree.
+		 */
+		if (_state-normalize_results  _state-lex-token_type == JSON_TOKEN_STRING)
+			_state-next_scalar = true;
+		else
+			_state-result_start = _state-lex-token_start;
+	}
 }
 
 static void
@@ -1150,7 +1181,7 @@ elements_array_element_end(void

Re: [HACKERS] [PATCH] Implement json_array_elements_text

2014-01-20 Thread Andrew Dunstan



On 01/20/2014 09:58 PM, Laurence Rowe wrote:

Following the discussion on pgsql-general, I thought I'd have a go
implementing json_array_elements_text following the same pattern as
json_each_text. The function makes it possible to join elements of a
json array onto a table,


Can we sneak this very small feature into 9.4? I'm happy to take on the 
review etc.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] NOT Null constraint on foreign table not working

On Mon, Jan 20, 2014 at 8:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Rushabh Lathia rushabh.lat...@gmail.com writes:
 As per the PG documentation it says that foreign table do support the
 NOT NULL, NULL and DEFAULT.

 There has been a great deal of debate about what constraints on foreign
 tables ought to mean.  Right now, at least for postgres_fdw, they're just
 taken as documentation of constraints that are supposed to exist on the
 far side.  It's not clear what's the point of trying to enforce them
 against insertions done locally if the remote table lacks them --- any
 table update done on the far side could still violate the constraint.

What is the reason for keeping DEFAULT behaviour different than
constraints. Right now the behaviour for DEFAULT is if it is
defined on foreign table, then it will use that even if original table has
different or no default value?

Create Database foo;
\c foo
create table tbl(c1 int Default 20);

\c postgres
create foreign table tbl(c1 int Default 10) server pgdbfdw; --here
pgdbfdw is server for postgres_fdw
insert into tbl values(Default);
select * from tbl;
 c1

 10
(1 row)

\c foo
insert into tbl values(Default);
select * from tbl;
 c1

 10
 20
(2 rows)


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP patch (v2) for updatable security barrier views

2014-01-20 Thread Craig Ringer

On 01/21/2014 09:09 AM, KaiGai Kohei wrote:
 (2014/01/13 22:53), Craig Ringer wrote:
 On 01/09/2014 11:19 PM, Tom Lane wrote:
 Dean Rasheed dean.a.rash...@gmail.com writes:
 My first thought was that it should just preprocess any security
 barrier quals in subquery_planner() in the same way as other quals are
 preprocessed. But thinking about it further, those quals are destined
 to become the quals of subqueries in the range table, so we don't
 actually want to preprocess them at that stage --- that will happen
 later when the new subquery is planned by recursion back into
 subquery_planner(). So I think the right answer is to make
 adjust_appendrel_attrs() handle recursion into sublink subqueries.

 TBH, this sounds like doubling down on a wrong design choice.  I see
 no good reason that updatable security views should require any
 fundamental rearrangements of the order of operations in the planner.

 In that case, would you mind offerign a quick sanity check on the
 following alternative idea:

 - Add sourceRelation to Query. This refers to the RTE that supplies
 tuple projections to feed into ExecModifyTable, with appropriate resjunk
 ctid and (if requ'd) oid cols present.

 - When expanding a target view into a subquery, set sourceRelation on
 the outer view to the index of the RTE of the newly expanded subquery.

 I have sane opinion. Existing assumption, resultRelation is RTE of the
 table to be read not only modified, makes the implementation complicated.
 If we would have a separate sourceRelation, it allows to have flexible
 form including sub-query with security_barrier on the reader side.
 
 Could you tell me the direction you're inclined right now?

My focus right now is getting your original RLS patch rebased on top of
head, separated into a series of patches that can be understood as
separate functional units, then rewritten on top of updatable security
barrier views.

( See
http://www.postgresql.org/message-id/52dcbef1.3010...@2ndquadrant.com )

I don't really care which updatable security barrier views
implementation it is. Dean's has the advantage of actually working, so
I'm basing it on his.

It sounds like Tom objects to Dean's approach to some degree, but we'll
have to see whether that turns into concrete issues. If it does, it
should be possible to replace the underlying updatable security barrier
views implementation without upsetting the rewritten RLS significantly.
That's part of why I've gone down this path - it separates filtering
rows according to security predicates from the rest of RLS, into a
largely functionally separate patch.

 I wonder whether I should take the latest patch submitted on 09-Jan for
 a deep code reviewing and testing.

That would be extremely valuable. Please break it, or show that you
could not figure out how to break it.

If you find an unfixable flaw, we'll go back to the approach outlined in
this mail - split resultRelation, insert ctids down the subquery chain, etc.

If you don't find a major flaw, then that's one less thing to worry
about, and we can focus on the RLS layer.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] NOT Null constraint on foreign table not working

Amit Kapila amit.kapil...@gmail.com writes:
 On Mon, Jan 20, 2014 at 8:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 There has been a great deal of debate about what constraints on foreign
 tables ought to mean.

 What is the reason for keeping DEFAULT behaviour different than
 constraints. Right now the behaviour for DEFAULT is if it is
 defined on foreign table, then it will use that even if original table has
 different or no default value?

If you look back to the original thread about the writable-foreign-tables
patch, we expended a lot of sweat on that point too.  The ideal thing IMO
would have been to allow the remote end's default specification to control
what happens, but we found enough difficulties with that that we ended up
punting and allowing the default expression to be evaluated locally.
I'm not terribly satisfied with that result, but that's where we are.

Another thing to keep in mind is that the preferred behavior isn't
necessarily the same for every FDW.  If there isn't a remote SQL server
underlying a foreign table (think file_fdw for instance) then you might
end up with different choices about what defaults and constraints mean.

We basically ran out of time to deal with these issues back in 9.3.
It'd be worth thinking through a holistic proposal about how it ought
to work across a range of FDW types.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] NOT Null constraint on foreign table not working

On Tue, Jan 21, 2014 at 9:32 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Amit Kapila amit.kapil...@gmail.com writes:
 On Mon, Jan 20, 2014 at 8:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 There has been a great deal of debate about what constraints on foreign
 tables ought to mean.

 What is the reason for keeping DEFAULT behaviour different than
 constraints. Right now the behaviour for DEFAULT is if it is
 defined on foreign table, then it will use that even if original table has
 different or no default value?

 If you look back to the original thread about the writable-foreign-tables
 patch, we expended a lot of sweat on that point too.  The ideal thing IMO
 would have been to allow the remote end's default specification to control
 what happens, but we found enough difficulties with that that we ended up
 punting and allowing the default expression to be evaluated locally.
 I'm not terribly satisfied with that result, but that's where we are.

okay, I think we can specify more clearly in documentation of Foreign Table
as right now it is bit difficult to get the right behaviour by reading
documentation. Another thing could be to return Syntax Error like it does
for other constraints like CKECK.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] dynamic shared memory and locks


(2014/01/11 3:11), Robert Haas wrote:

On Mon, Jan 6, 2014 at 5:50 PM, Robert Haas robertmh...@gmail.com wrote:

This is only part of the solution, of course: a complete solution will
involve making the hash table key something other than the lock ID.
What I'm thinking we can do is making the lock ID consist of two
unsigned 32-bit integers.  One of these will be stored in the lwlock
itself, which if my calculations are correct won't increase the size
of LWLockPadded on any common platforms (a 64-bit integer would).
Let's call this the tranch id.  The other will be derived from the
LWLock address.  Let's call this the instance ID.  We'll keep a
table of tranch IDs, which will be assigned consecutively starting
with 0.  We'll keep an array of metadata for tranches, indexed by
tranch ID, and each entry will have three associated pieces of
information: an array base, a stride length, and a printable name.
When we need to identify an lwlock in the log or to dtrace, we'll
fetch the tranch ID from the lwlock itself and use that to index into
the tranch metadata array.  We'll then take the address of the lwlock,
subtract the array base address for the tranch, and divide by the
stride length; the result is the instance ID.  When reporting the
user, we can report either the tranch ID directly or the associated
name for that tranch; in either case, we'll also report the instance
ID.

So initially we'll probably just have tranch 0: the main LWLock array.
  If we move buffer content and I/O locks to the buffer headers, we'll
define tranch 1 and tranch 2 with the same base address: the start of
the buffer descriptor array, and the same stride length, the size of a
buffer descriptor.  One will have the associated name buffer content
lock and the other buffer I/O lock.  If we want, we can define
split the main LWLock array into several tranches so that we can more
easily identify lock manager locks, predicate lock manager locks, and
buffer mapping locks.


OK, I've implemented this: here's what I believe to be a complete
patch, based on the previous lwlock-pointers.patch but now handling
LOCK_DEBUG and TRACE_LWLOCKS and dtrace and a bunch of other loose
ends.  I think this should be adequate for allowing lwlocks to be
stored elsewhere in the main shared memory segment as well as in
dynamic shared memory segments.

Thoughts?


I briefly checked the patch. Most of lines are mechanical replacement
from LWLockId to LWLock *, and compiler didn't claim anything with
-Wall -Werror option.

My concern is around LWLockTranche mechanism. Isn't it too complicated
towards the purpose?
For example, it may make sense to add char lockname[NAMEDATALEN]; at
the tail of LWLock structure if LOCK_DEBUG is enabled. Probably, it
also adds an argument of LWLockAssign() to gives the human readable
name. Is the additional 64bytes (= NAMEDATALEN) per lock too large
for recent hardware?

Below is minor comments.

It seems to me this newer form increased direct reference to
the MainLWLockArray. Is it really graceful?
My recommendation is to define a macro that wraps the reference to
MainLWLockArray.

For example:
  #define PartitionLock(i) \
(MainLWLockArray[LOCK_MANAGER_LWLOCK_OFFSET + (i)].lock)

instead of the following manner.

@@ -3400,7 +3406,12 @@ GetLockStatusData(void)
 * Must grab LWLocks in partition-number order to avoid LWLock deadlock.
 */
for (i = 0; i  NUM_LOCK_PARTITIONS; i++)
-   LWLockAcquire(FirstLockMgrLock + i, LW_SHARED);
+   {
+   LWLock *partitionLock;
+
+   partitionLock = MainLWLockArray[LOCK_MANAGER_LWLOCK_OFFSET + i].lock;
+   LWLockAcquire(partitionLock, LW_SHARED);
+   }


This chunk is enclosed by #ifdef EXEC_BACKEND, the legacy LWLockArray
was not removed.

@@ -5633,6 +5630,7 @@ save_backend_variables(BackendParameters *param, Port 
*port,
param-SpinlockSemaArray = SpinlockSemaArray;
 #endif
param-LWLockArray = LWLockArray;
+   param-MainLWLockArray = MainLWLockArray;
param-ProcStructLock = ProcStructLock;
param-ProcGlobal = ProcGlobal;
param-AuxiliaryProcs = AuxiliaryProcs;

Thanks,
--
OSS Promotion Center / The PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] Implement json_array_elements_text

2014-01-20 Thread Laurence Rowe

On 20 January 2014 18:58, Laurence Rowe l...@lrowe.co.uk wrote:

 Following the discussion on pgsql-general, I thought I'd have a go
 implementing json_array_elements_text following the same pattern as
 json_each_text.


This updated patch makes the return type of ``json_array_elements_text``
text rather than json, I'd not set it correctly in pg_proc.h.

Laurence


0001-Implement-json_array_elements_text.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] NOT Null constraint on foreign table not working

2014-01-20 Thread Rushabh Lathia

On Mon, Jan 20, 2014 at 8:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Rushabh Lathia rushabh.lat...@gmail.com writes:
  As per the PG documentation it says that foreign table do support the
  NOT NULL, NULL and DEFAULT.

 There has been a great deal of debate about what constraints on foreign
 tables ought to mean.  Right now, at least for postgres_fdw, they're just
 taken as documentation of constraints that are supposed to exist on the
 far side.  It's not clear what's the point of trying to enforce them
 against insertions done locally if the remote table lacks them --- any
 table update done on the far side could still violate the constraint.

 We might change this in response to a well-reasoned argument, but it won't
 happen just because somebody lobs a quick-and-dirty patch over the fence.

 If we were going to enforce them locally, I'd opine it should be the FDW's
 task to do it, anyway.  It might have more knowledge about the best way
 to do it than nodeModifyTable.c can, and if not it could still call
 ExecConstraints for itself.


Submitted patch was never intended to get checked in without the proper
discussion
and decision about what behaviour should be for foreign table constraint.
Sorry if I
passed wrong message but was never intended.

I found constraints on foreign table is very useful for the application
when the multiple
user accessing same remote table using fdw and both user want to enforce
different
constraint on particular table or different user want to enforce different
DEFAULT
expression for the same table column.

I agree with you that if we want to enforce constraint locally then it
should be
FDW's task to do it rather then nodeModifyTable.c.

Regards
Rushabh Lathia

[HACKERS] proposal: hide application_name from other users

2014-01-20 Thread Harold Giménez

First of all, I apologize for submitting a patch and missing the commitfest
deadline. Given the size of the patch, I thought I'd submit it for your
consideration regardless.

This patch prevents non-superusers from viewing other user's
pg_stat_activity.application_name. This topic was discussed some time ago
[1] and consequently application_name was made world readable [2].

I would like to propose that we hide it instead by reverting to the
original behavior. There is a very large number of databases on the same
cluster shared across different users who can easily view each other's
application_name values. Along with that, there are some libraries that
default application_name to the name of the running process [3], which can
leak information about what web servers applications are running, queue
systems, etc. Furthermore leaking application names in a multi-tenant
environment is more information than an attacker should have access to on
services like Heroku and other similar providers.

Thanks and regards,

-Harold Giménez

[1] http://www.postgresql.org/message-id/14808.1259452...@sss.pgh.pa.us
[2]
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0c61cff57a1dc7685fcac9f09451b261f14cb711
[3]
https://bitbucket.org/ged/ruby-pg/src/6c2444dc63e17eb695363993e8887cc5d67750bc/lib/pg/connection.rb?at=default#cl-44

hide_application_name_v1.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] inherit support for foreign tables


(2014/01/21 11:44), Shigeru Hanada wrote:

Thanks for the comments.

2014/1/21 KaiGai Kohei kai...@ak.jp.nec.com:

In addition, an idea which I can't throw away is to assume that all
constraints defined on foreign tables as ASSERTIVE.  Foreign tables
potentially have dangers to have wrong data by updating source data
not through foreign tables.  This is not specific to an FDW, so IMO
constraints defined on foreign tables are basically ASSERTIVE.  Of
course PG can try to maintain data correct, but always somebody might
break it.
qu


Does it make sense to apply assertive CHECK constraint on the qual
of ForeignScan to filter out tuples with violated values at the local
side, as if row-level security feature doing.
It enables to handle a situation that planner expects only clean
tuples are returned but FDW driver is unavailable to anomalies.

Probably, this additional check can be turned on/off on the fly,
if FDW driver has a way to inform the core system its capability,
like FDW_CAN_ENFORCE_CHECK_CONSTRAINT that informs planner to skip
local checks.


Hmm, IIUC you mean that local users can't (or don't need to) know that
data which violates the local constraints exist on remote side.
Applying constraints to the data which is modified through FDW would
be necessary as well.  In that design, FDW is a bidirectional filter
which provides these features:

1) Don't push wrong data into remote data source, by applying local
constraints to the result of the modifying query executed on local PG.
  This is not perfect filter, because remote constraints don't mapped
automatically or perfectly (imagine constraints which is available on
remote but is not supported in PG).
2) Don't retrieve wrong data from remote to local PG, by applying
local constraints


Yes. (1) can be done with ExecConstraints prior to FDW callback on
UPDATE or INSERT, even not a perfect solution because of side-channel
on the remote data source. For (2), my proposition tries to drop
retrieved violated tuples, however, the result is same.


I have a concern about consistency.  It has not been supported, but
let's think of Aggregate push-down invoked by a query below.

SELECT count(*) FROM remote_table;

If this query was fully pushed down, the result is the # of records
exist on remote side, but the result would be # of valid records when
we don't push down the aggregate.  This would confuse users.


Hmm. In this case, FDW driver needs to be responsible to push-down
the additional check quals into remote side, so it does not work
transparently towards the ForeignScan.
It might be a little bit complicated suggestion for the beginning
of the efforts.


Besides CHECK constraints, currently NOT NULL constraints are
virtually ASSERTIVE (not enforcing).  Should it also be noted
explicitly?


Backward compatibility….


Yep, backward compatibility (especially visible ones to users) should
be minimal, ideally zero.


NOT NULL [ASSERTIVE] might be an option.


Treating [ASSERTIVE | NOT ASSERTIVE] like DEFERRABLE, and allow
ingASSERTIVE for only foreign tables?  It makes sense, though we need
consider exclusiveness .  But It needs to default to ASSERTIVE on
foreign tables, and NOT ASSERTIVE (means forced) on others.  Isn't
is too complicated?


I think it is not easy to implement assertive checks, except for
foreign tables, because all the write stuff to regular tables are
managed by PostgreSQL itself.
So, it is a good first step to add support ASSERTIVE CHECK on
foreign table only, and to enforce FDW drivers nothing special
from my personal sense.

How about committer's opinion?

Thanks,
--
OSS Promotion Center / The PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [bug fix] pg_ctl always uses the same event source

On Mon, Jan 20, 2014 at 5:38 PM, MauMau maumau...@gmail.com wrote:
 From: Amit Kapila amit.kapil...@gmail.com

   Do you think without this the problem reported by you is resolved
 completely.
   User can hit same problem, if he tries to follow similar steps mentioned
 in
   your first mail. I had tried below steps based on description in your
   first mail:

 If user register/unregister different versions of pgevent.dll blindly,
 then I think
 any fix in this area could not prevent the error event source not found


 OK, I'll add a check to prevent duplicate registration of the same event
 source with the message:

 The specified event source is already registered.

 Please correct me if there's better expression in English.

How about below message:

event source event_source_name is already registered.

This can make it more consistent with any other message in PG,
example create table.

 Are there any other suggestions to make this patch ready for committer?

Today, I reviewed the patch again and found it okay, except a small
inconsistency which is about default event source name in
postgresql.conf, all other places it's changed except in .conf file.
Do you think it makes sense to change there as well?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation