Re: [HACKERS] Abbreviated keys for Numeric
Gavin == Gavin Flower gavinflo...@archidevsys.co.nz writes: Gavin What are the standard deviations? Gavin Do the arithmetic means change much if you exclude the 2 fastest Gavin 2 slowest? Gavin How do the arithmetic means compare to their respective medians? Gavin Essentially, how consistent are the results, or how great is the Gavin noise? There may be better indicators than the ones I've Gavin suggested above. This is all rather missing the point. The relevant metric is not how much noise is introduced between runs of the same code, but rather how much noise is introduced as a result of non-consequential changes to the code. I can get variations of several percent - easily more than three sigmas of the timing of repeated runs of unchanged code - in the time taken to sort a float8 column simply from introducing varying amounts of padding into the body of a function which is never called in the test. Clearly, the only possible effect here is that the changed memory addresses of functions must be resulting in different patterns of cache misses / cache replacements, or TLB misses, or similar low-level effects which have nothing to do with the code as such. (That this is a low-level alignment effect is supported by the fact that the performance changes are not monotonic in the size of the padding; adding more padding may cause either speedups or slowdowns.) -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
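The summary statistics Gavin asks about are easy to compute even if, as Andrew argues, they only bound run-to-run noise rather than code-alignment noise. A minimal sketch (the timing values are invented for illustration):

```python
import statistics

def summarize(timings):
    """Summarize benchmark timings with the indicators suggested above:
    arithmetic mean, standard deviation, median, and a trimmed mean
    that excludes the 2 fastest and 2 slowest runs."""
    s = sorted(timings)
    trimmed = s[2:-2] if len(s) > 4 else s
    return {
        "mean": statistics.mean(s),
        "stdev": statistics.stdev(s),
        "median": statistics.median(s),
        "trimmed_mean": statistics.mean(trimmed),
    }

# Two hypothetical runs whose means differ by ~3%: per Andrew's point,
# such a gap can exceed three sigmas of repeated-run noise and still be
# an artifact of function placement rather than the code change itself.
baseline = [10.00, 10.02, 9.98, 10.05, 9.95, 10.01, 10.03, 9.99]
patched = [10.31, 10.28, 10.33, 10.30, 10.27, 10.35, 10.29, 10.32]
print(summarize(baseline))
print(summarize(patched))
```

If the mean and trimmed mean agree and the stdev is small, the runs are consistent — which, per the reply above, still says nothing about alignment effects.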
Re: [HACKERS] proposal: searching in array function - array_position
2015-02-22 3:00 GMT+01:00 Petr Jelinek p...@2ndquadrant.com:

On 28/01/15 08:15, Pavel Stehule wrote:

2015-01-28 0:01 GMT+01:00 Jim Nasby jim.na...@bluetreble.com:

On 1/27/15 4:36 AM, Pavel Stehule wrote:

It is only partially identical - I would like to use a cache for array_offset, but it is not necessary for array_offsets ... it depends on how we modify the current API to support externally cached data.

Externally cached data?

Some of these functions have their own caches to minimize access to the typcache (array_map and array_cmp, for example). In the first case I am trying to pass this information via fn_extra; in the second case I don't, because I don't expect repeated calls (and I expect the typcache to be enough).

You actually do caching via fn_extra in both cases, and I think that's the correct way; yes, that part can be moved to a common function. I also see that the documentation does not say what is returned by array_offset if nothing is found (it's documented in the code but not in the sgml).

rebased + fixed docs

Regards

Pavel

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

diff --git a/doc/src/sgml/array.sgml b/doc/src/sgml/array.sgml
new file mode 100644
index 9ea1068..b3630b4
*** a/doc/src/sgml/array.sgml
--- b/doc/src/sgml/array.sgml
*** SELECT * FROM sal_emp WHERE pay_by_quart
*** 600,605 ****
--- 600,614 ----
    index, as described in <xref linkend="indexes-types">.
   </para>
+
+  <para>
+   You can also search for any value in an array using the <function>array_offset</function>
+   function (it returns the position of the first occurrence of the value in the array):
+
+ <programlisting>
+ SELECT array_offset(ARRAY['sun','mon','tue','wen','thu','fri','sat'], 'mon');
+ </programlisting>
+  </para>
+
  <tip>
   <para>
    Arrays are not sets; searching for specific array elements

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index da2ed67..459343a
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** SELECT NULLIF(value, '(none)') ...
*** 11480,11485 ****
--- 11480,11488 ----
     <primary>array_lower</primary>
    </indexterm>
    <indexterm>
+    <primary>array_offset</primary>
+   </indexterm>
+   <indexterm>
     <primary>array_prepend</primary>
    </indexterm>
    <indexterm>
*** SELECT NULLIF(value, '(none)') ...
*** 11598,11603 ****
--- 11601,11633 ----
    </row>
    <row>
+    <entry>
+     <literal>
+      <function>array_offset</function>(<type>anyarray</type>, <type>anyelement</type> <optional>, <type>int</type></optional>)
+     </literal>
+    </entry>
+    <entry><type>int</type></entry>
+    <entry>returns the offset of the first occurrence of the element in the array. It uses
+    the <literal>IS NOT DISTINCT FROM</literal> operator for comparison. The optional third
+    argument specifies the initial offset at which searching starts.
+    Returns NULL when the value does not occur in the array.</entry>
+    <entry><literal>array_offset(ARRAY['sun','mon','tue','wen','thu','fri','sat'], 'mon')</literal></entry>
+    <entry><literal>2</literal></entry>
+   </row>
+   <row>
+    <entry>
+     <literal>
+      <function>array_offsets</function>(<type>anyarray</type>, <type>anyelement</type>)
+     </literal>
+    </entry>
+    <entry><type>int[]</type></entry>
+    <entry>returns an array of the offsets of all occurrences of the element in the array.
+    It uses the <literal>IS NOT DISTINCT FROM</literal> operator for comparison. Returns an
+    empty array when the element does not occur in the input array.</entry>
+    <entry><literal>array_offsets(ARRAY['A','A','B','A'], 'A')</literal></entry>
+    <entry><literal>{1,2,4}</literal></entry>
+   </row>
+   <row>
    <entry>
     <literal>
      <function>array_prepend</function>(<type>anyelement</type>, <type>anyarray</type>)
     </literal>

diff --git a/src/backend/utils/adt/array_userfuncs.c b/src/backend/utils/adt/array_userfuncs.c
new file mode 100644
index 5781b95..9bf0eb5
*** a/src/backend/utils/adt/array_userfuncs.c
--- b/src/backend/utils/adt/array_userfuncs.c
***
*** 12,20 ****
--- 12,24 ----
  */
  #include "postgres.h"
+ #include "catalog/pg_type.h"
  #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
+ #include "utils/typcache.h"
+
+ static bool array_offset_common(FunctionCallInfo fcinfo, int *result);

*** array_agg_array_finalfn(PG_FUNCTION_ARGS
*** 644,646 ****
--- 648,911 ----
  	PG_RETURN_DATUM(result);
  }
+
+ /*
+  * array_offset - returns the offset of the given element in an array.
+  * Returns NULL when the value is not an element of the array. It allows
+  * searching for a NULL value thanks to using the IS NOT DISTINCT FROM
+  * operator.
+  *
+  * The biggest difference from width_array is that the input array is
+  * unsorted.
+  */
+ Datum
+ array_offset(PG_FUNCTION_ARGS)
+ {
+ 	int		result;
+
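The semantics described in the patch's documentation — 1-based offsets, comparison with IS NOT DISTINCT FROM (so NULL matches NULL), an optional starting offset, and an array of all matches — can be sketched in Python. This is only an illustration of the documented behavior, not the C implementation:

```python
def array_offset(arr, value, start=1):
    """Return the 1-based position of the first occurrence of `value`
    in `arr`, searching from position `start`; None (SQL NULL) if the
    value does not occur. Python `==` with an explicit None check
    stands in for IS NOT DISTINCT FROM."""
    for i in range(start - 1, len(arr)):
        if arr[i] == value or (arr[i] is None and value is None):
            return i + 1
    return None

def array_offsets(arr, value):
    """Return the 1-based positions of every occurrence of `value`
    in `arr`; an empty list if it never occurs."""
    return [i + 1 for i, v in enumerate(arr)
            if v == value or (v is None and value is None)]

# The examples from the patched documentation:
print(array_offset(['sun', 'mon', 'tue', 'wen', 'thu', 'fri', 'sat'], 'mon'))  # 2
print(array_offsets(['A', 'A', 'B', 'A'], 'A'))                                # [1, 2, 4]
```

Note that a NULL (None) search value can still be found, which is exactly why the patch uses IS NOT DISTINCT FROM rather than plain equality.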
Re: [HACKERS] SSL renegotiation
On 2015-02-22 01:27:54 +0100, Emil Lenngren wrote:

I honestly wonder why postgres uses renegotiation at all. The motivation that cryptanalysis is easier as more data is sent seems quite far-fetched.

I don't think so. There's a fair number of algorithms that can/could be attacked much more easily with lots of data available, especially if you can guess/know/control some of the data. Additionally, renegotiating regularly helps constrain a possible key leakage to a certain window of time. With backend connections often being alive for weeks at a time, that's not a bad thing.

And it's not just us. E.g. openssh also triggers renegotiations based on the amount of data sent.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
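The "renegotiate after N bytes" policy mentioned above (and used by openssh) amounts to a simple byte counter on the connection. A hedged sketch — the class name and threshold are illustrative, not PostgreSQL's actual `ssl_renegotiation_limit` machinery:

```python
class RenegotiationCounter:
    """Track bytes sent on a connection and signal when a fresh session
    key should be negotiated, limiting how much traffic any single key
    protects (hypothetical threshold and API, for illustration only)."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.sent = 0            # bytes sent under the current key
        self.renegotiations = 0  # how many times we have rekeyed

    def record_send(self, nbytes):
        self.sent += nbytes
        if self.sent >= self.limit:
            self.renegotiations += 1
            self.sent = 0        # counter restarts under the new key
            return True          # caller should trigger renegotiation
        return False

conn = RenegotiationCounter(limit_bytes=1000)
conn.record_send(600)            # under the limit, keep going
conn.record_send(600)            # crosses the limit: rekey now
```

For a connection that stays open for weeks, this bounds the exposure of any one key both in bytes and, indirectly, in time.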
Re: [HACKERS] Replication identifiers, take 4
On 22/02/15 09:57, Andres Freund wrote: On 2015-02-19 00:49:50 +0100, Petr Jelinek wrote: On 16/02/15 10:46, Andres Freund wrote: On 2015-02-16 11:34:10 +0200, Heikki Linnakangas wrote: At a quick glance, this basic design seems workable. I would suggest expanding the replication IDs to regular 4 byte oids. Two extra bytes is a small price to pay, to make it work more like everything else in the system. I don't know. Growing from 3 to 5 byte overhead per relevant record (or even 0 to 5 in case the padding is reused) is rather noticeable. If we later find it to be a limit (I seriously doubt that), we can still increase it in a major release without anybody really noticing. I agree that limiting the overhead is important. But I have a related thought about this - I wonder if it's worth trying to map this more directly to the CommitTsNodeId. Maybe. I'd rather go the other way round though; replication_identifier.c/h's stuff seems much more generic than CommitTsNodeId. Probably not more generic, but definitely nicer, as the nodes are named, and yes, it has a richer API. I mean it is currently using that for the actual storage, but I am thinking of having the Replication Identifiers be the public API and the CommitTsNodeId stuff be just a hidden implementation detail. This thought is based on the fact that CommitTsNodeId provides a somewhat meaningless number, while Replication Identifiers give us a nice name for that number. And if we do that, then the type should perhaps be the same for both? I'm not sure. Given that this is included in a significant number of records, I'd really rather not increase the overhead as described above. Maybe we can just limit CommitTsNodeId to the same size for now? That would make sense. I also wonder about the default nodeid infrastructure that committs provides.
I can't think of a use-case for it that isn't solved by replication identifiers - in the end, the main reason I added that infrastructure was to make it possible to write something like replication identifiers as an extension. In any case, I don't think the default nodeid can be used in parallel with replication identifiers, since one will overwrite the SLRU record for the other. Maybe it's enough if this is documented, but I think it might be better if this patch removed the default committs nodeid infrastructure. It's just a few lines of code which nobody should be using yet.

Thinking about this some more and rereading the code, I see that you are calling TransactionTreeSetCommitTsData during xlog replay, but not during transaction commit; that does not seem correct, as the local records will not have any nodeid/origin.

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Redesigning checkpoint_segments
I am wondering a bit about the interaction with wal_keep_segments. One thing is that wal_keep_segments is still specified as a number of segments rather than in size units; maybe it would be worth changing that too? The other thing is that, if set, wal_keep_segments is the real max_wal_size from the user's perspective (not from the perspective of the algorithm in this patch, but the user does not really care about that), which is somewhat weird given the naming.

In my opinion, keeping wal_keep_segments as a number of segments would help a lot. In my experience handling production databases, to arrive at an optimal value for wal_keep_segments we calculate the number of segments generated in the WAL archive destination (on an hourly or daily basis); this further helps us calculate how many segments to keep, considering various other factors in a replication environment, to ensure the master has enough WALs in pg_xlog when a standby comes back up after an outage. Of course, if we can calculate a number of segments, we can calculate the same in terms of size too, but calculating the number of segments is more feasible.

Regards, VBN
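The conversion VBN describes between a segment count and a size is simple arithmetic. A sketch, assuming the default 16MB WAL segment size (it is configurable at build time):

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default 16MB segment (assumption)

def segments_to_bytes(n_segments):
    """Size occupied by n WAL segment files."""
    return n_segments * WAL_SEGMENT_SIZE

def bytes_to_segments(size_bytes):
    """Segments needed to cover size_bytes, rounding up because a
    partial segment still occupies a whole file."""
    return -(-size_bytes // WAL_SEGMENT_SIZE)

# e.g. an archive growing by 4GB per hour corresponds to:
per_hour = bytes_to_segments(4 * 1024 ** 3)  # 256 segments/hour
```

Either unit therefore carries the same information; the question in the thread is which one users find more natural to reason about.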
Re: [HACKERS] hash agg is slower on wide tables?
Pavel == Pavel Stehule pavel.steh...@gmail.com writes:

Pavel why we read all columns from t1? [...]
Pavel so it looks so hashagg doesn't eliminate source columns

well I don't think it's supposed to eliminate them. This is, if I'm understanding the planner logic right, physical-tlist optimization; it's faster for a table scan to simply return the whole row (copying nothing, just pointing to the on-disk tuple) and let hashagg pick out the columns it needs, rather than for the scan to run a projection step just to select specific columns.

If there's a Sort step, this isn't done because Sort neither evaluates its input nor projects new tuples on its output, it simply accepts the tuples it receives and returns them with the same structure. So now it's important to have the node providing input to the Sort projecting out only the minimum required set of columns.

Why it's slower on the wider table... that's less obvious.

-- Andrew (irc:RhodiumToad)
Re: [HACKERS] hash agg is slower on wide tables?
On 2015-02-22 10:33:16 +0000, Andrew Gierth wrote: This is, if I'm understanding the planner logic right, physical-tlist optimization; it's faster for a table scan to simply return the whole row (copying nothing, just pointing to the on-disk tuple) and let hashagg pick out the columns it needs, rather than for the scan to run a projection step just to select specific columns. If there's a Sort step, this isn't done because Sort neither evaluates its input nor projects new tuples on its output, it simply accepts the tuples it receives and returns them with the same structure. So now it's important to have the node providing input to the Sort projecting out only the minimum required set of columns. Why it's slower on the wider table... that's less obvious.

It's likely to just be tuple deforming. I've not tried it, but I'd bet you'll see slot_deform* very high in the profile. For the narrow table only two attributes need to be extracted; for the wider one, everything up to a11 will get extracted. I've wondered before if we shouldn't use the caching via slot->tts_values so freely - if you only use a couple of values from a wide tuple, the current implementation really sucks if those few aren't at the beginning of the tuple.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Enforce creation of destination folders for source files in pg_regress (Was: pg_regress writes into source tree)
On Sat, Feb 21, 2015 at 6:51 AM, Peter Eisentraut pete...@gmx.net wrote: On 2/20/15 1:56 AM, Michael Paquier wrote: We'd still need the .gitignore files somewhere. Do you want to move them one directory up? I am not sure I am getting what you are pointing to... For extensions that already have non-empty sql/ and expected/, they should have their own ignore entries as sql/.gitignore and expected/.gitignore. The point of the patch is to simplify the code tree of extensions that need to keep empty sql/ and expected/, for example to be able to run regression tests after a fresh repository clone.

The affected modules have sql/.gitignore and/or expected/.gitignore files, so the case that the directory doesn't exist and needs to be created doesn't actually happen.

What about extensions that do not have sql/ and expected/ in their code tree? I don't have an example of such an extension in mind, but I cannot assume either that they do not exist. See for example something that happened a couple of months back: dummy_seclabel was trapped by that with df761e3, fixed afterwards with 3325624 (the structure has been changed again since, but the error showed up because of those missing sql/ and expected/ directories).

You could argue that these .gitignore files don't actually belong there, but your patch doesn't change or move those files, and even modules that have non-empty sql/ or expected/ directories have .gitignore files there, so it is considered the appropriate location.

This is up to the maintainer of each extension to manage in their code tree. However, I can imagine that some people would be grateful if we allowed them to not need sql/ and expected/ directories containing only a single .gitignore file ignoring everything with *, making the code tree of their extensions cleaner.

-- Michael
Re: [HACKERS] hash agg is slower on wide tables?
Andres Freund and...@2ndquadrant.com writes: I've wondered before if we shouldn't use the caching via slot->tts_values so freely - if you only use a couple of values from a wide tuple, the current implementation really sucks if those few aren't at the beginning of the tuple.

Don't see how you expect to get a win that way. Visiting column k requires crawling over columns 1..k-1 in practically all cases. You could maybe save a cycle or so by omitting the physical store into the Datum array, assuming that you never did need the column value later ... but the extra bookkeeping for more complicated tracking of which columns had been extracted would eat that savings handily.

regards, tom lane
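Tom's point — that with variable-width columns, reaching column k means stepping over columns 1..k-1 — can be shown with a toy tuple layout. This is a deliberately simplified model (every column variable-width with a 4-byte length header; the real heap tuple format, alignment, and null bitmap are ignored):

```python
import struct

def pack_row(values):
    """Pack variable-length text columns, each prefixed with a 4-byte
    length header, like a simplified on-disk tuple with no cached
    attribute offsets."""
    out = b""
    for v in values:
        b = v.encode()
        out += struct.pack("<I", len(b)) + b
    return out

def fetch_attr(row, k):
    """Return column k (0-based) plus the number of length headers
    walked: every earlier column must be stepped over first, which is
    why extracting a11 costs far more than extracting a2."""
    off = 0
    for i in range(k + 1):
        (n,) = struct.unpack_from("<I", row, off)
        off += 4
        if i == k:
            return row[off:off + n].decode(), i + 1
        off += n

row = pack_row(["a%d" % i for i in range(12)])
val, headers_walked = fetch_attr(row, 10)  # like touching a11 in the profile
```

This is why slot_deform_tuple dominates the wide-table profile later in the thread: the cost of each extraction grows with the column's position, not with the number of columns actually used.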
Re: [HACKERS] Allow snapshot too old error, to prevent bloat
On Mon, Feb 16, 2015 at 10:49 AM, Kevin Grittner kgri...@ymail.com wrote: What this discussion has made me reconsider is the metric for considering a transaction too old. The number of transaction IDs consumed seems inferior as the unit of measure for that to LSN or time. It looks to me to be pretty trivial (on the order of maybe 30 lines of code) to specify this GUC in minutes rather than transaction IDs.

It seems to me that SQL Server also uses a similar mechanism to avoid bloat in the version store (the place to store previous versions of records). The mechanism they use is that during the shrink process, the longest-running transactions that have not yet generated row versions are marked as victims. If a transaction is marked as a victim, it can no longer read the row versions in the version store. When it attempts to read row versions, an error message is generated and the transaction is rolled back.

I am not sure how much weight it adds to the usefulness of this patch; however, I think that if other leading databases provide a way to control the bloat, it indicates that most customers with write-intensive workloads would like to see such an option.

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Replication identifiers, take 4
On 2/15/15 7:24 PM, Andres Freund wrote: On 2015-02-16 01:21:55 +0100, Andres Freund wrote: Here's my next attempt at producing something we can agree upon. The major change that might achieve that is that I've now provided a separate method to store the origin_id of a node. I've made it conditional on !REPLICATION_IDENTIFIER_REUSE_PADDING, to show both paths. That new method uses Heikki's xlog rework to dynamically add the origin to the record if an origin is set up. That works surprisingly simply.

I'm trying to figure out what this feature is supposed to do, but the patch contains no documentation or a commit message. Where is one supposed to start?
Re: [HACKERS] Allow snapshot too old error, to prevent bloat
Amit Kapila amit.kapil...@gmail.com wrote: It seems to me that SQL Server also uses similar mechanism to avoid the bloat in version store (place to store previous versions or record). I think if other leading databases provide a way to control the bloat, it indicates that most of the customers having write-intesive workload would like to see such an option. Yeah, I think many users are surprised by the damage that can be done to a database by an idle connection or long-running read-only query, especially when coming from other databases which have protections against that. The bad news is that changing to specifying the limit in time rather than transaction IDs is far more complicated than I thought. Basically, we need to associate times in the past with the value of latestCompletedXid (or derived values like snapshot xmin) at those times. I was initially thinking that by adding a timestamp to the snapshot we could derive this from active snapshots. For any one connection, it's not that hard for it to scan the pairingheap of snapshots and pick out the right values; but the problem is that it is process-local memory and this needs to work while the connection is sitting idle for long periods of time. I've got a couple ideas on how to approach it, but neither seems entirely satisfying: (1) Use a new timer ID to interrupt the process whenever its oldest snapshot which hasn't yet crossed the old snapshot threshold does so. The problem is that this seems as though it would need to do an awful lot more work (scanning the pairing heap for the best match) than anything else is currently doing in a timeout handler. One alternative would be to keep a second pairing heap to track snapshots which have not crossed the threshold, removing items as they get old -- that would make the work needed in the timeout handler closer to other handlers. 
(2) Use a coarse enough granularity on time and a short enough maximum for the GUC to just keep a circular buffer of the mappings in memory. We might be able to make this dense enough that one-minute resolution for up to 60 days could fit in 338kB. Obviously that could be reduced with coarser granularity or a shorter maximum.

-- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
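Kevin's option (2) — a circular buffer mapping minute-granularity timestamps to the xmin observed at that minute — can be sketched as follows. The class and its API are hypothetical illustrations, not the eventual implementation; note that the size arithmetic matches the 338kB estimate above (60 days × 1440 minutes/day = 86,400 slots × 4-byte xids ≈ 337.5 kB):

```python
class OldSnapshotTimeMap:
    """Fixed-size circular buffer mapping minute numbers to the xmin
    recorded at that minute (sketch of option (2)). A real version
    would need to fill in skipped minutes and handle concurrency."""

    def __init__(self, max_minutes):
        self.size = max_minutes
        self.slots = [None] * max_minutes
        self.head_minute = None  # most recent minute recorded

    def advance(self, minute, xmin):
        """Record the xmin observed at `minute`, overwriting the slot
        that held the entry from `size` minutes ago."""
        self.slots[minute % self.size] = xmin
        self.head_minute = minute

    def lookup(self, minute):
        """Return the xmin recorded at `minute`, or None if that minute
        is outside the retained window."""
        if (self.head_minute is None
                or minute > self.head_minute
                or minute <= self.head_minute - self.size):
            return None
        return self.slots[minute % self.size]

# Size check: 60 days at one-minute resolution, 4 bytes per xid.
entries = 60 * 24 * 60            # 86,400 slots
kilobytes = entries * 4 / 1024    # ~337.5 kB, matching the estimate
```

Lookups and advances are O(1), which avoids the pairing-heap scan that makes option (1)'s timeout handler expensive.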
Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg
On 22.2.2015 09:14, Jeff Davis wrote: On Wed, 2015-01-07 at 20:07 +0100, Tomas Vondra wrote: So I started digging in the code and I noticed this: hash_mem = MemoryContextMemAllocated(aggstate-hashcontext, true); which is IMHO obviously wrong, because that accounts only for the hashtable itself. It might be correct for aggregates with state passed by value, but it's incorrect for state passed by reference (e.g. Numeric, arrays etc.), because initialize_aggregates does this: oldContext = MemoryContextSwitchTo(aggstate-aggcontext); pergroupstate-transValue = datumCopy(peraggstate-initValue, peraggstate-transtypeByVal, peraggstate-transtypeLen); MemoryContextSwitchTo(oldContext); and it's also wrong for all the user-defined aggregates that have no access to hashcontext at all, and only get a reference to aggcontext using AggCheckCallContext(). array_agg() is a prime example.

Actually, I believe the first one is right, and the second one is wrong. If we allocate pass-by-ref transition states in aggcontext, that defeats the purpose of the patch, because we can't release the memory when we start a new batch (because we can't reset aggcontext). Instead, the transition states should be copied into hashcontext; we should count the memory usage of hashcontext; AggCheckCallContext should return hashcontext; and after each batch we should reset hashcontext.

Hmmm, maybe. I'd however suggest not stuffing everything into a single memory context, but creating a 'transvalue' context next to hashcontext, and returning that one from AggCheckCallContext(). I.e. something like this: aggcontext -> hashcontext (hash table etc.), aggcontext -> transcontext (new context for transition values). It's better to keep things separated, I guess.

It might be worth refactoring a bit to call it the groupcontext or something instead, and use it for both HashAgg and GroupAgg. That would reduce the need for conditionals.

Not sure that's worth it. The patch already seems complex enough. [ ...
problems with array_agg subcontexts ... ] and it runs indefinitely (I gave up after a few minutes). I believe this renders the proposed memory accounting approach dead. ... The array_agg() patch I submitted to this CF would fix this particular query, as it removes the child contexts (so there's no need for recursion in MemoryContextMemAllocated), but it does nothing for the user-defined aggregates out there. And it's not committed yet.

Now that your patch is committed, I don't think I'd call the memory accounting patch I proposed dead yet. We decided that creating one context per group is essentially a bug, so we don't need to optimize for that case.

Yes, but the question is how many other aggregates designed similarly to array_agg are out there (in extensions etc.), and whether we actually care about them. Those will be equally impacted by this, and we can't fix them. But maybe we don't really care, and it's the author's responsibility to test them on a new major version and fix the performance issue just as we did with array_agg().

Your approach may be better, though.

I'm not sure - it really boils down to the goal and assumptions. If we restrict ourselves to a HashAggregate-only solution, under the assumption that the number of child contexts is small (and leave the fix up to the author of the extension), then your approach is probably simpler and better than the approach I proposed. If we're aiming for a general-purpose solution that might be used elsewhere, then I'm afraid we can't make the 'small number of child contexts' assumption, and the simple solution may not be the right one. Although I believe we shouldn't be creating large numbers of 'parallel' memory contexts anyway.

Thank you for reviewing. I'll update my patches and resubmit for the next CF.

Thanks for working on this.
-- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
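The context layout Tomas sketches — hashcontext and transcontext as siblings under aggcontext, with accounting that can either include or exclude descendants — maps onto a small tree model. A toy sketch only; the real MemoryContext machinery in mcxt.c is far more involved:

```python
class MemoryContext:
    """Toy memory-context tree: each context tracks its own allocations,
    and mem_allocated() can recurse into children, mirroring the
    recursive MemoryContextMemAllocated() discussed above."""

    def __init__(self, name, parent=None):
        self.name = name
        self.allocated = 0
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def alloc(self, nbytes):
        self.allocated += nbytes

    def reset(self):
        """Release this context's memory and delete its children."""
        self.allocated = 0
        self.children.clear()

    def mem_allocated(self, recurse=True):
        total = self.allocated
        if recurse:
            total += sum(c.mem_allocated(True) for c in self.children)
        return total

# The proposed layout: transition values live beside the hash table, so
# resetting both between batches frees them, while aggcontext survives.
aggcontext = MemoryContext("aggcontext")
hashcontext = MemoryContext("hashcontext", aggcontext)
transcontext = MemoryContext("transcontext", aggcontext)
hashcontext.alloc(1024)
transcontext.alloc(512)
```

The recursion in mem_allocated() also shows why one-context-per-group hurts: with thousands of child contexts, every accounting call walks the whole tree.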
Re: [HACKERS] hash agg is slower on wide tables?
On 2015-02-22 09:58:31 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: I've wondered before if we shouldn't use the caching via slot->tts_values so freely - if you only use a couple of values from a wide tuple, the current implementation really sucks if those few aren't at the beginning of the tuple. Don't see how you expect to get a win that way. Visiting column k requires crawling over columns 1..k-1 in practically all cases. You could maybe save a cycle or so by omitting the physical store into the Datum array, assuming that you never did need the column value later ... but the extra bookkeeping for more complicated tracking of which columns had been extracted would eat that savings handily.

Depends a bit on the specifics. In this case attcacheoff would allow you direct access, which surely is going to be more efficient. I'm not sure how frequently that happens in practice.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
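The attcacheoff idea Andres refers to: an attribute's offset within the tuple can be precomputed and cached only while every preceding column is fixed-width; the first variable-width column still gets a cached offset, but everything after it must be found by walking. A simplified sketch (alignment padding and the null bitmap, which also defeat caching, are ignored):

```python
def compute_cached_offsets(attlens):
    """attlens: list of attribute lengths, positive = fixed width,
    -1 = variable width (varlena). Returns the cacheable offset for
    each attribute, or None where direct access is impossible."""
    offsets = []
    off = 0  # becomes None once an offset can no longer be computed
    for attlen in attlens:
        if off is None:
            offsets.append(None)
            continue
        offsets.append(off)
        off = off + attlen if attlen > 0 else None
    return offsets

# int4, int4, text, int4: the text column itself still has a known
# offset, but the column after it must be reached by crawling.
print(compute_cached_offsets([4, 4, -1, 4]))  # [0, 4, 8, None]
```

So whether direct access helps depends, as Andres says, on the schema: a table whose frequently-used columns sit behind a varlena column gets no benefit.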
Re: [HACKERS] Replication identifiers, take 4
On 2015-02-22 10:03:59 -0500, Peter Eisentraut wrote: On 2/15/15 7:24 PM, Andres Freund wrote: On 2015-02-16 01:21:55 +0100, Andres Freund wrote: Here's my next attempt at producing something we can agree upon. The major change that might achieve that is that I've now provided a separate method to store the origin_id of a node. I've made it conditional on !REPLICATION_IDENTIFIER_REUSE_PADDING, to show both paths. That new method uses Heikki's xlog rework to dynamically add the origin to the record if an origin is set up. That works surprisingly simply. I'm trying to figure out what this feature is supposed to do, but the patch contains no documentation or a commit message. Where is one supposed to start?

For a relatively short summary: http://archives.postgresql.org/message-id/20150216002155.GI15326%40awork2.anarazel.de For a longer one: http://www.postgresql.org/message-id/20140923182422.ga15...@alap3.anarazel.de

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] hash agg is slower on wide tables?
2015-02-22 13:22 GMT+01:00 Andres Freund and...@2ndquadrant.com: On 2015-02-22 10:33:16 +0000, Andrew Gierth wrote: This is, if I'm understanding the planner logic right, physical-tlist optimization; it's faster for a table scan to simply return the whole row (copying nothing, just pointing to the on-disk tuple) and let hashagg pick out the columns it needs, rather than for the scan to run a projection step just to select specific columns. If there's a Sort step, this isn't done because Sort neither evaluates its input nor projects new tuples on its output, it simply accepts the tuples it receives and returns them with the same structure. So now it's important to have the node providing input to the Sort projecting out only the minimum required set of columns. Why it's slower on the wider table... that's less obvious. It's likely to just be tuple deforming. I've not tried it but I'd bet you'll see slot_deform* very high in the profile. For the narrow table only two attributes need to be extracted, for the wider one everything up to a11 will get extracted. I've wondered before if we shouldn't use the caching via slot->tts_values so freely - if you only use a couple values from a wide tuple the current implementation really sucks if those few aren't at the beginning of the tuple.

The number of columns has a strong effect, but it is not the only one.
I tested the first two columns, and the bigger table is aggregated more slowly - about 30%:

postgres=# explain analyze select count(*), a1, a2 from t1 group by 3,2 order by 3,2;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Sort  (cost=2023263.19..2023263.25 rows=24 width=4) (actual time=84073.451..84073.452 rows=24 loops=1)
   Sort Key: a2, a1
   Sort Method: quicksort  Memory: 26kB
   ->  HashAggregate  (cost=2023262.40..2023262.64 rows=24 width=4) (actual time=84073.430..84073.433 rows=24 loops=1)   -- 23700
         Group Key: a2, a1
         ->  Seq Scan on t1  (cost=0.00..1497532.80 rows=70097280 width=4) (actual time=67.325..60152.052 rows=70097280 loops=1)
 Planning time: 0.107 ms
 Execution time: 84073.534 ms
(8 rows)

postgres=# explain analyze select count(*), a1, a2 from t2 group by 3,2 order by 3,2;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Sort  (cost=1536868.33..1536868.39 rows=24 width=4) (actual time=21963.230..21963.231 rows=24 loops=1)
   Sort Key: a2, a1
   Sort Method: quicksort  Memory: 26kB
   ->  HashAggregate  (cost=1536867.54..1536867.78 rows=24 width=4) (actual time=21963.209..21963.213 rows=24 loops=1)   -- 16000
         Group Key: a2, a1
         ->  Seq Scan on t2  (cost=0.00..1011137.88 rows=70097288 width=4) (actual time=0.063..5647.404 rows=70097280 loops=1)
 Planning time: 0.069 ms
 Execution time: 21963.340 ms
(8 rows)

Profile when the data are in the first two columns:

  7.87% postgres [.] slot_deform_tuple
  7.48% postgres [.] slot_getattr
  7.10% postgres [.] hash_search_with_hash_value
  3.74% postgres [.] execTuplesMatch
  3.68% postgres [.] ExecAgg

Profile when the data are in the first and eleventh columns:

 20.35% postgres [.] slot_deform_tuple
  6.55% postgres [.] hash_search_with_hash_value
  5.86% postgres [.] slot_getattr
  4.15% postgres [.] ExecAgg

So your hypothesis is valid.

Regards

Pavel

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Precedence of standard comparison operators
I wrote: Attached is a draft patch to bring the precedence of comparison operators and IS tests into line with the SQL standard. I have not yet looked into producing warnings for changes in parsing decisions ...

I've made some progress on getting parse_expr.c to produce warnings by after-the-fact analysis of the raw parse tree, and I think it can be made to work. However, I've run into two stumbling blocks that greatly limit the quality of the warnings, and will need to be fixed as separate patches:

1. BooleanTest and NullTest node types don't carry location pointers, so there's no way to give an error cursor position referencing them. Simple enough addition. I think we left these out because there were no post-grammar error reports involving them, but now we need some.

2. gram.y expands BETWEEN operations into complex AND/OR nests on sight, losing the information that there was a BETWEEN there, which is important if we're to detect possible precedence changes involving BETWEEN.

What we should do about #2 is preserve BETWEEN as a distinct construct in the output of raw parsing. This is a good thing anyway, because in the long run we are going to want to fix BETWEEN's semantic issues (such as multiple evaluation of possibly-volatile input expressions) and fixing what the grammar does with it is an essential first step.

Barring objection I'll go do both of those things as separate patches, and then come back to the precedence warnings.

regards, tom lane
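The multiple-evaluation problem with the current BETWEEN expansion can be demonstrated outside the parser. A hedged Python stand-in — `volatile_expr` plays the role of a volatile SQL expression, and the two functions model expand-on-sight versus a dedicated BETWEEN node; none of this is the actual gram.y code:

```python
calls = {"n": 0}

def volatile_expr():
    """Stands in for a volatile SQL expression; counts evaluations."""
    calls["n"] += 1
    return 5

def between_expanded(x_fn, lo, hi):
    """BETWEEN rewritten on sight as `x >= lo AND x <= hi`, the way
    gram.y currently expands it: the input expression is evaluated
    twice (when the first comparison succeeds)."""
    return x_fn() >= lo and x_fn() <= hi

def between_node(x_fn, lo, hi):
    """A BETWEEN construct preserved through raw parsing can evaluate
    its input exactly once."""
    x = x_fn()
    return lo <= x <= hi

r1 = between_expanded(volatile_expr, 1, 10)
expanded_calls = calls["n"]   # 2 evaluations
calls["n"] = 0
r2 = between_node(volatile_expr, 1, 10)
node_calls = calls["n"]       # 1 evaluation
```

With a truly volatile expression (random(), a sequence nextval, a function with side effects), the two evaluations can even disagree, which is why preserving BETWEEN as a node is a prerequisite for fixing the semantics.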
Re: [HACKERS] Abbreviated keys for Numeric
On 22/02/15 22:59, Andrew Gierth wrote:
[...]
 (That this is a low-level alignment effect is supported by the fact that the performance changes are not monotonic in the size of the padding; adding more padding may cause either speedups or slowdowns.)

What you have pointed out is 'obvious' in retrospect - but still, far more extreme than I would have ever expected!

There may well be analogues of the story where a DP manager was convinced to get faster hard drives, 50% faster than the old ones, to speed up an overnight batch run. The program then took about 50% longer to run. It turns out that previously the mainframe processed one block in a little less time than the old hard drive took to rotate once. So the new hard drives ended up doing two rotations per block read, and so took longer, despite being faster!
(A modern low-end smartphone probably has at least a thousand times more processing power than that poor old mainframe!)

Cheers,
Gavin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] deparsing utility commands
On 2015-02-21 17:30:24 +0100, Andres Freund wrote:

 /*
 + * deparse_CreateFunctionStmt
 + *		Deparse a CreateFunctionStmt (CREATE FUNCTION)
 + *
 + * Given a function OID and the parsetree that created it, return the JSON
 + * blob representing the creation command.
 + *
 + * XXX this is missing the per-function custom-GUC thing.
 + */

 Subject: [PATCH 27/42] deparse: Support ALTER FUNCTION

 + * deparse_AlterFunctionStmt
 + *		Deparse a AlterFunctionStmt (ALTER FUNCTION)
 + *
 + * Given a function OID and the parsetree that created it, return the JSON
 + * blob representing the alter command.
 + *
 + * XXX this is missing the per-function custom-GUC thing.
 + */

 Hm, so my earlier patch needs to partially be copied here. I'm running out of time now though...

Updated 0003 attached that supports SET for both CREATE and ALTER. In my previous patch I'd looked at proconfig for the values. A bit further thought revealed that that's not such a great idea: it works well enough for CREATE FUNCTION, but already breaks down at CREATE OR REPLACE FUNCTION unless we want to emit RESET ALL (SET ...)+ which seems mighty ugly. Also, the code is actually a good bit easier to understand this way.

I. hate. arrays. ;)

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services

From bb8b0027e82e628fb098b9707f36fa5ce08b9234 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Sat, 21 Feb 2015 16:36:34 +0100
Subject: [PATCH 3/3] Support SET/RESET in CREATE/ALTER FUNCTION.
---
 src/backend/tcop/deparse_utility.c | 61 ++++++++++++++++++++++++++++++++++----
 1 file changed, 55 insertions(+), 6 deletions(-)

diff --git a/src/backend/tcop/deparse_utility.c b/src/backend/tcop/deparse_utility.c
index 35c4eca..cb52cb2 100644
--- a/src/backend/tcop/deparse_utility.c
+++ b/src/backend/tcop/deparse_utility.c
@@ -2802,14 +2802,46 @@ deparse_CreateDomain(Oid objectId, Node *parsetree)
 	return createDomain;
 }

+static ObjTree *
+deparse_FunctionSet(VariableSetKind kind, char *name, char *value)
+{
+	ObjTree    *r;
+
+	if (kind == VAR_RESET_ALL)
+	{
+		r = new_objtree_VA("RESET ALL", 0);
+	}
+	else if (value != NULL)
+	{
+		/*
+		 * Some GUC variable names are 'LIST' type and hence must not be
+		 * quoted. FIXME: shouldn't this and pg_get_functiondef() rather use
+		 * guc.c to check for GUC_LIST?
+		 */
+		if (pg_strcasecmp(name, "DateStyle") == 0
+			|| pg_strcasecmp(name, "search_path") == 0)
+			r = new_objtree_VA("SET %{set_name}I TO %{set_value}s", 0);
+		else
+			r = new_objtree_VA("SET %{set_name}I TO %{set_value}L", 0);
+
+		append_string_object(r, "set_name", name);
+		append_string_object(r, "set_value", value);
+	}
+	else
+	{
+		r = new_objtree_VA("RESET %{set_name}I", 0);
+		append_string_object(r, "set_name", name);
+	}
+
+	return r;
+}
+
 /*
  * deparse_CreateFunctionStmt
  *		Deparse a CreateFunctionStmt (CREATE FUNCTION)
  *
  * Given a function OID and the parsetree that created it, return the JSON
  * blob representing the creation command.
- *
- * XXX this is missing the per-function custom-GUC thing.
  */
 static ObjTree *
 deparse_CreateFunction(Oid objectId, Node *parsetree)
@@ -2825,6 +2857,7 @@ deparse_CreateFunction(Oid objectId, Node *parsetree)
 	char	   *probin;
 	List	   *params;
 	List	   *defaults;
+	List	   *sets = NIL;
 	ListCell   *cell;
 	ListCell   *curdef;
 	ListCell   *table_params = NULL;
@@ -3089,7 +3122,21 @@ deparse_CreateFunction(Oid objectId, Node *parsetree)
 		append_float_object(tmp, "rows", procForm->prorows);
 	append_object_object(createFunc, "rows", tmp);

-	append_array_object(createFunc, "set_options", NIL);
+	foreach(cell, node->options)
+	{
+		DefElem    *defel = (DefElem *) lfirst(cell);
+		ObjTree    *tmp = NULL;
+
+		if (strcmp(defel->defname, "set") == 0)
+		{
+			VariableSetStmt *sstmt = (VariableSetStmt *) defel->arg;
+			char	   *value = ExtractSetVariableArgs(sstmt);
+
+			tmp = deparse_FunctionSet(sstmt->kind, sstmt->name, value);
+			sets = lappend(sets, new_object_object(tmp));
+		}
+	}
+	append_array_object(createFunc, "set_options", sets);

 	if (probin == NULL)
 	{
@@ -3114,8 +3161,6 @@ deparse_CreateFunction(Oid objectId, Node *parsetree)
  *
  * Given a function OID and the parsetree that created it, return the JSON
  * blob representing the alter command.
- *
- * XXX this is missing the per-function custom-GUC thing.
  */
 static ObjTree *
 deparse_AlterFunction(Oid objectId, Node *parsetree)
@@ -3203,8 +3248,12 @@ deparse_AlterFunction(Oid objectId, Node *parsetree)
 									defGetNumeric(defel));
 		}
 		else if (strcmp(defel->defname, "set") == 0)
-			elog(ERROR, "unimplemented deparse of ALTER FUNCTION SET");
+		{
+			VariableSetStmt *sstmt = (VariableSetStmt *) defel->arg;
+			char	   *value = ExtractSetVariableArgs(sstmt);

+			tmp = deparse_FunctionSet(sstmt->kind, sstmt->name, value);
+		}
 		elems = lappend(elems, new_object_object(tmp));
 	}
--
Re: [HACKERS] Abbreviated keys for Numeric
On 22.2.2015 10:59, Andrew Gierth wrote:
[...]
 Clearly, the only possible effect here is that the changed memory addresses of functions must be resulting in different patterns of cache misses / cache replacements, or TLB misses, or similar low-level effects which have nothing to do with the code as such.

Interesting, but I think Gavin was asking about how much variation there was for each tested case (e.g. a query executed on the same code / dataset). And in those cases the padding / alignment won't change, and thus the effects you describe won't influence the results, no?

--
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enforce creation of destination folders for source files in pg_regress (Was: pg_regress writes into source tree)
On 2/22/15 5:41 AM, Michael Paquier wrote: You could argue that these .gitignore files don't actually belong there, but your patch doesn't change or move those files, and even modules that have non-empty sql/ or expected/ directories have .gitignore files there, so it is considered the appropriate location. This is up to the maintainer of each extension to manage their code tree. However I can imagine that some people would be grateful if we allow them to not need sql/ and expected/ containing only one single .gitignore file ignoring everything with *, making the code tree of their extensions cleaner. I would like to have an extension in tree that also does this, so we have a regression test of this functionality. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication identifiers, take 4
On 2015-02-22 04:59:30 +0100, Petr Jelinek wrote:

 Now that the issue with padding seems to no longer exist since the patch works both with and without padding, I went through the code and here are some comments I have (in no particular order).

 In CheckPointReplicationIdentifier:

 + * FIXME: Add a CRC32 to the end.

 The function already does it (I guess you forgot to remove the comment).

Yep. I locally have a WIP version that's much cleaned up and doesn't contain it anymore.

 Using max_replication_slots as the limit for replication_identifier states does not really make sense to me, as replication identifiers track remote info while slots are local, and in case of master-slave replication you need replication identifiers but don't need slots.

On the other hand, it's quite cheap if unused. Not sure if several variables are worth it.

 In bootstrap.c:

 #define MARKNOTNULL(att) \
 	((att)->attlen > 0 || \
 	 (att)->atttypid == OIDVECTOROID || \
-	 (att)->atttypid == INT2VECTOROID)
+	 (att)->atttypid == INT2VECTOROID || \
+	 strcmp(NameStr((att)->attname), "riname") == 0 \
+	)

 Huh? Can this be solved in a nicer way?

Yes. I'd mentioned that this is just a temporary hack ;). I've since pushed a more proper solution; BKI_FORCE_NOT_NULL can now be specified on the column.

 Since we call XLogFlush with local_lsn as parameter, shouldn't we check that it's actually within the valid range? Currently we'll get errors like this if set to an invalid value:

 ERROR: xlog flush request 123/123 is not satisfied --- flushed only to 0/168FB18

I think we should rather remove local_lsn from all parameters that are user callable; adding them there was a mistake. It's really only relevant for the cases where it's called by commit.

 In AdvanceReplicationIndentifier:

 +/*
 + * XXX: should we restore into a hashtable and dump into shmem only after
 + * recovery finished?
 + */

 Probably not, given that the function is also callable via the SQL interface.

Well, it's still a good idea regardless...
As I wrote in another email, I would like to integrate the RepNodeId and CommitTSNodeId into single thing. Will reply separately there. There are no docs for the new sql interfaces. Yea. The whole thing needs docs. Thanks, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] hash agg is slower on wide tables?
Hi

I did some benchmarks and I found some strange numbers.

do $$
begin
  drop table if exists t1;
  execute 'create table t1(' || array_to_string(array(select 'a' || i || ' smallint' from generate_series(1,30) g(i)), ',') || ')';
  -- special columns a2, a11
  insert into t1
    select 2008, i % 12 + 1,
           random()*20, random()*20, random()*20, random()*20,
           random()*20, random()*20, random()*20, random()*20,
           case when random() > 0.9 then 1 else 0 end,
           random()*20, random()*20, random()*20, random()*20,
           random()*20, random()*20, random()*20, random()*20,
           random()*20, random()*20, random()*20, random()*20,
           random()*20, random()*20, random()*20, random()*20,
           random()*20, random()*20, random()*20
    from generate_series(1,7009728) g(i);
  drop table if exists t2;
  create table t2 as select a2, a11 from t1;
  analyze t1;
  analyze t2;
end;
$$;

postgres=# \dt+ t*
                   List of relations
 Schema | Name | Type  | Owner |  Size  | Description
--------+------+-------+-------+--------+-------------
 public | t1   | table | pavel | 622 MB |
 public | t2   | table | pavel | 242 MB |
(2 rows)

postgres=# explain analyze select count(*), a2, a11 from t1 group by 3,2 order by 3,2;
                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Sort  (cost=202327.03..202327.09 rows=24 width=4) (actual time=2609.159..2609.161 rows=24 loops=1)
   Sort Key: a11, a2
   Sort Method: quicksort  Memory: 26kB
   ->  HashAggregate  (cost=202326.24..202326.48 rows=24 width=4) (actual time=2609.137..2609.139 rows=24 loops=1) --- grouping 1997 ms
         Group Key: a11, a2
         ->  Seq Scan on t1  (cost=0.00..149753.28 rows=7009728 width=4) (actual time=0.071..616.222 rows=7009728 loops=1)
 Planning time: 0.138 ms
 Execution time: 2609.247 ms
(8 rows)

postgres=# explain analyze select count(*), a2, a11 from t2 group by 3,2 order by 3,2;
                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Sort  (cost=153688.03..153688.09 rows=24 width=4) (actual time=2100.058..2100.059 rows=24 loops=1)
   Sort Key: a11, a2
   Sort Method: quicksort  Memory: 26kB
   ->  HashAggregate  (cost=153687.24..153687.48 rows=24 width=4) (actual time=2100.037..2100.040 rows=24 loops=1) --- grouping 1567 ms -- 25% faster
         Group Key: a11, a2
         ->  Seq Scan on t2  (cost=0.00..101114.28 rows=7009728 width=4) (actual time=0.043..532.680 rows=7009728 loops=1)
 Planning time: 0.178 ms
 Execution time: 2100.158 ms
(8 rows)

postgres=# \dt+ t*
                    List of relations
 Schema | Name | Type  | Owner |  Size   | Description
--------+------+-------+-------+---------+-------------
 public | t1   | table | pavel | 6225 MB |
 public | t2   | table | pavel | 2423 MB |
(2 rows)

postgres=# explain analyze select count(*), a2, a11 from t1 group by 3,2 order by 3,2;
                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2023263.19..2023263.25 rows=24 width=4) (actual time=99453.272..99453.274 rows=24 loops=1)
   Sort Key: a11, a2
   Sort Method: quicksort  Memory: 26kB
   ->  HashAggregate  (cost=2023262.40..2023262.64 rows=24 width=4) (actual time=99453.244..99453.249 rows=24 loops=1) --- 31891 ms
         Group Key: a11, a2
         ->  Seq Scan on t1  (cost=0.00..1497532.80 rows=70097280 width=4) (actual time=16.935..67562.615 rows=70097280 loops=1)
 Planning time: 14.526 ms
 Execution time: 99453.413 ms
(8 rows)

postgres=# explain analyze select count(*), a2, a11 from t2 group by 3,2 order by 3,2;
                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1536868.33..1536868.39 rows=24 width=4) (actual time=20656.397..20656.399 rows=24 loops=1)
   Sort Key: a11, a2
   Sort Method: quicksort  Memory: 26kB
   ->  HashAggregate  (cost=1536867.54..1536867.78 rows=24 width=4) (actual time=20656.375..20656.378 rows=24 loops=1) --- 15248 ms -- 100% faster
         Group Key: a11, a2
         ->  Seq Scan on t2  (cost=0.00..1011137.88 rows=70097288 width=4) (actual time=0.060..5408.205 rows=70097280 loops=1)
 Planning time: 0.161 ms
 Execution time: 20656.475 ms
(8 rows)

It looks like hash agg is slower when it is based on a wide table - by about 25-100%. Is that so, or am I missing something?

Regards

Pavel
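One way to see why the wide table loses: heap tuples store attributes back to back, so fetching a11 means computing the offsets of the ten attributes before it on every row unless those offsets are cached. A toy Python model of that walk (the function and layout are illustrative, not the actual heap tuple format):

```python
def deform_upto(widths, upto):
    """Compute attribute offsets by walking the tuple attribute by
    attribute, roughly the way slot_deform_tuple must when a later
    column has no cached offset. Returns the offsets walked."""
    offsets, off = [], 0
    for w in widths[:upto]:
        offsets.append(off)
        off += w
    return offsets

widths = [2] * 30          # 30 smallint columns, 2 bytes each

# A query grouping on (a2, a11) has to deform through attribute 11 on
# every row of the wide table t1 ...
assert len(deform_upto(widths, 11)) == 11
# ... while on the two-column table t2 only 2 attributes are walked.
assert len(deform_upto(widths, 2)) == 2
```

That per-row walk is consistent with slot_deform_tuple rising from ~8% to ~20% of the profile upthread once a later column is involved.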
Re: [HACKERS] Replication identifiers, take 4
On 2015-02-19 00:49:50 +0100, Petr Jelinek wrote:

 On 16/02/15 10:46, Andres Freund wrote:

 On 2015-02-16 11:34:10 +0200, Heikki Linnakangas wrote:

 At a quick glance, this basic design seems workable. I would suggest expanding the replication IDs to regular 4 byte oids. Two extra bytes is a small price to pay, to make it work more like everything else in the system.

 I don't know. Growing from 3 to 5 byte overhead per relevant record (or even 0 to 5 in case the padding is reused) is rather noticeable. If we later find it to be a limit (I seriously doubt that), we can still increase it in a major release without anybody really noticing.

 I agree that limiting the overhead is important. But I have a related thought about this - I wonder if it's worth trying to map this more directly to the CommitTsNodeId.

Maybe. I'd rather go the other way round though; replication_identifier.c/h's stuff seems much more generic than CommitTsNodeId.

 I mean it is currently using that for the actual storage, but I am thinking of having the Replication Identifiers be the public API and the CommitTsNodeId stuff be just a hidden implementation detail. This thought is based on the fact that CommitTsNodeId provides a somewhat meaningless number while Replication Identifiers give us a nice name for that number. And if we do that then the type should perhaps be the same for both?

I'm not sure. Given that this is included in a significant number of records I'd really rather not increase the overhead as described above. Maybe we can just limit CommitTsNodeId to the same size for now?

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg
On Wed, 2015-01-07 at 20:07 +0100, Tomas Vondra wrote:

 So I started digging in the code and I noticed this:

     hash_mem = MemoryContextMemAllocated(aggstate->hashcontext, true);

 which is IMHO obviously wrong, because that accounts only for the hashtable itself. It might be correct for aggregates with state passed by value, but it's incorrect for state passed by reference (e.g. Numeric, arrays etc.), because initialize_aggregates does this:

     oldContext = MemoryContextSwitchTo(aggstate->aggcontext);
     pergroupstate->transValue = datumCopy(peraggstate->initValue,
                                           peraggstate->transtypeByVal,
                                           peraggstate->transtypeLen);
     MemoryContextSwitchTo(oldContext);

 and it's also wrong for all the user-defined aggregates that have no access to hashcontext at all, and only get a reference to aggcontext using AggCheckCallContext(). array_agg() is a prime example.

Actually, I believe the first one is right, and the second one is wrong. If we allocate pass-by-ref transition states in aggcontext, that defeats the purpose of the patch, because we can't release the memory when we start a new batch (because we can't reset aggcontext).

Instead, the transition states should be copied into hashcontext; we should count the memory usage of hashcontext; AggCheckCallContext should return hashcontext; and after each batch we should reset hashcontext. It might be worth refactoring a bit to call it the groupcontext or something instead, and use it for both HashAgg and GroupAgg. That would reduce the need for conditionals.

 [ ... problems with array_agg subcontexts ... ]

 and it runs indefinitely (I gave up after a few minutes). I believe this renders the proposed memory accounting approach dead. ... The array_agg() patch I submitted to this CF would fix this particular query, as it removes the child contexts (so there's no need for recursion in MemoryContextMemAllocated), but it does nothing to the user-defined aggregates out there. And it's not committed yet.
Now that your patch is committed, I don't think I'd call the memory accounting patch I proposed dead yet. We decided that creating one context per group is essentially a bug, so we don't need to optimize for that case. Your approach may be better, though. Thank you for reviewing. I'll update my patches and resubmit for the next CF. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
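The batch discipline described here - copy by-ref transition values into hashcontext, account for that context's allocations, and reset it between batches - can be sketched as a toy context tree in Python (all names here are hypothetical; the real machinery lives in mcxt.c/aset.c):

```python
class MemoryContext:
    """Toy model of a memory context with an allocation counter,
    roughly the accounting discussed in this thread."""
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.allocated = 0
        if parent is not None:
            parent.children.append(self)

    def alloc(self, n):
        self.allocated += n

    def mem_allocated(self, recurse=True):
        # Recursing into children is what made per-group subcontexts
        # (as array_agg used to create) expensive to account for.
        total = self.allocated
        if recurse:
            total += sum(c.mem_allocated(True) for c in self.children)
        return total

    def reset(self):
        # Resetting a context releases its memory and its children's.
        self.allocated = 0
        for c in self.children:
            c.reset()

agg = MemoryContext()
hashctx = MemoryContext(parent=agg)   # resettable per-batch context
hashctx.alloc(1024)                   # hash table entry
hashctx.alloc(64)                     # by-ref transition value copied here
assert agg.mem_allocated() == 1088
hashctx.reset()                       # start of a new batch
assert agg.mem_allocated() == 0
```

The point of putting transition values in the resettable child context is visible in the last two lines: the whole batch's memory can be released without touching the long-lived parent.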
Re: [HACKERS] hash agg is slower on wide tables?
2015-02-22 9:28 GMT+01:00 Pavel Stehule pavel.steh...@gmail.com:

[...]

 It looks like hash agg is slower when it is based on a wide table - by about 25-100%. Is that so, or am I missing something?

next query? why we read all columns from t1?

postgres=# explain analyze verbose select count(*),
Re: [HACKERS] Abbreviated keys for text cost model fix
On Sun, Feb 22, 2015 at 1:30 PM, Peter Geoghegan p...@heroku.com wrote: You should try it with the data fully sorted like this, but with one tiny difference: The very last tuple is out of order. How does that look? Another thing that may be of particular interest to you as a Czech person is how various locales perform. I believe that the collation rules of Czech and Hungarian are particularly complex, with several passes often required for strcoll() (I think maybe more than 4). You should either measure that, or control for it. I was prepared to attribute the differences in the two systems to differences in compute bandwidth between the CPUs involved (which are perhaps more noticeable now, since memory latency is less of a bottleneck). However, the inconsistent use of collations could actually matter here. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
Magnus Hagander mag...@hagander.net writes: On Sun, Feb 15, 2015 at 9:46 PM, Peter Geoghegan p...@heroku.com wrote: also think that it's a waste of screen space to show who within the annotation view. Granted, the old app supported this, but I tend to think that if I actually cared who added a certain annotation, I'd be happy to drill down into history. I haven't cared so far, AFAICR. Hmm. Personally, I definitely care about who made an annotation, but i guess I could be OK with going into the history as well. Anybody else have an opinion on the matter? I don't personally feel strongly either way. Hmm ... in the old system, I agree with Peter, in fact I tended to find the who annotations positively misleading, because people frequently added other peoples' messages to the CF app. It always looked to me like this had the effect of attributing person A's patch to person B, who'd actually only updated the CF entry. I think the primary thing you usually want to know is who is the author of a given email message. With the new system of auto-adding messages, that particular problem should be gone, and annotations should be mostly special events rather than routine maintenance. So maybe who made the annotation is of more interest than it often was before. I'm inclined to say you should leave it alone for the moment. We don't have enough experience with the new system to be sure how we'll use it, so why do work you might just end up reverting? We can adjust the display later if we become convinced we don't need this info. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Abbreviated keys for text cost model fix
On Sun, Feb 22, 2015 at 1:19 PM, Tomas Vondra tomas.von...@2ndquadrant.com wrote:

 In short, this fixes all the cases except for the ASC sorted data.

I haven't done any code review, but I think we want this.

 I'll use data from the i5-2500k, but it applies to the Xeon too, except that the Xeon results are more noisy and the speedups are not that significant. For the 'text' data type, and 'random' dataset, the results are these:

    scale     datum    cost-model
    ------------------------------
    10         328%       323%
    100        392%       391%
    200         96%       565%
    300         97%       572%
    400         97%       571%
    500         98%       570%

 The numbers are speedup vs. master, so 100% means exactly the same speed, 200% means twice as fast. So while with the 'datum' patch this actually caused very nice speedup for small datasets - about 3-4x speedup up to 1M rows - for larger datasets we've seen a small regression (~3% slower). With the cost model fix, we actually see a significant speedup (about 5.7x) for these cases.

Cool. I haven't verified whether this produces the same results, but if it does this is very nice.

 For the 'DESC' dataset (i.e. data sorted in reverse order), we do get even better numbers, with up to 6.5x speedup on large datasets. But for the 'ASC' dataset (i.e. already sorted data), we do get this:

    scale     datum    cost-model
    ------------------------------
    10          85%        84%
    100         87%        87%
    200         76%        96%
    300         82%        90%
    400         91%        83%
    500         93%        81%

 Ummm, not that great, I guess :-(

You should try it with the data fully sorted like this, but with one tiny difference: The very last tuple is out of order. How does that look?

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
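For readers new to the thread, the technique being costed works roughly like this: pack a prefix of the strxfrm() blob into a single machine word, compare those words first, and fall back to a full comparison only on ties. A Python sketch of that idea (locale.strxfrm stands in for the C strxfrm; widths and details differ from the actual tuplesort code):

```python
import locale

def abbreviate(s, width=8):
    """Pack the first `width` bytes of the collation-transformed
    string into an integer; equal prefixes force a tie-break."""
    blob = locale.strxfrm(s).encode("utf-8", "surrogateescape")[:width]
    return int.from_bytes(blob.ljust(width, b"\0"), "big")

def compare(a, b):
    ka, kb = abbreviate(a), abbreviate(b)
    if ka != kb:
        return -1 if ka < kb else 1               # cheap path: two ints
    fa, fb = locale.strxfrm(a), locale.strxfrm(b)  # authoritative tie-break
    return (fa > fb) - (fa < fb)

assert compare("apple", "banana") == -1
assert compare("same", "same") == 0
```

This also shows why already-sorted input is the hard case for the cost model: if most keys share long prefixes, the abbreviated comparison rarely resolves anything and the extra strxfrm() work is pure overhead - and why multi-pass collations like Czech or Hungarian make the full-comparison fallback even more expensive.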
Re: [HACKERS] New CF app deployment
On Sun, Feb 15, 2015 at 7:59 PM, Peter Eisentraut pete...@gmx.net wrote:

 On 2/14/15 7:30 AM, Magnus Hagander wrote:

 On Mon, Feb 9, 2015 at 4:56 PM, Robert Haas robertmh...@gmail.com

 Can we make it smarter, so that the kinds of things people produce intending for them to be patches are thought by the CF app to be patches?

 Doing it wouldn't be too hard, as the code right now is simply:

     # Attempt to identify the file using magic information
     mtype = mag.buffer(contents)
     if mtype.startswith('text/x-diff'):
         a.ispatch = True
     else:
         a.ispatch = False

 (where mag is the API call into the magic module)

 So we could easily add for example our own regexp parsing or so. The question is do we want to - because we'll have to maintain it as well. But I guess if we have a restricted enough set of rules, we can probably live with that.

 As I had described in http://www.postgresql.org/message-id/54dd2413.8030...@gmx.net, this is all but impossible. The above rule is certainly completely detached from the reality of what people actually send in. If you are just ignoring patches that don't match your rule set, this is not going to work very well. I think the old system where the patch submitter declared, this message contains my patch, is the only one that will work.

Would you suggest removing the automated system completely, or keep it around and just make it possible to override it (either by removing the note that something is a patch, or by making something that's not listed as a patch become marked as such)?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
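A regexp fallback of the kind floated above could be layered on top of the quoted snippet. A sketch (the `mag.buffer` call and attachment model come from the snippet; the regexp and function name are assumptions, not the CF app's actual code):

```python
import re

# Lines that typically open unified/context diffs or git patches.
DIFF_RE = re.compile(r"^(diff --git |--- |\+\+\+ |@@ |Index: )", re.MULTILINE)

def looks_like_patch(mtype, contents):
    """Combine libmagic's verdict with a simple regexp scan of the
    attachment body, so near-miss patches are still detected."""
    if mtype.startswith("text/x-diff"):
        return True
    return bool(DIFF_RE.search(contents))

assert looks_like_patch("text/plain", "--- a/f.c\n+++ b/f.c\n@@ -1 +1 @@\n")
assert not looks_like_patch("text/plain", "just an email body")
```

Even then, Peter's point stands: heuristics will misclassify some of what people actually send, so a manual override remains necessary.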
Re: [HACKERS] New CF app deployment
On Sun, Feb 15, 2015 at 9:46 PM, Peter Geoghegan p...@heroku.com wrote:

 On Sun, Feb 15, 2015 at 4:59 PM, Peter Eisentraut pete...@gmx.net wrote:

 I think the old system where the patch submitter declared, this message contains my patch, is the only one that will work.

 I tend to agree. That being said, calling out latest attachments is also useful (or highlighting that a particular mail has a particular file attached in general).

We can keep listing the attachment and just remove the automated check of what *kind* of attachment it is.

 Annotations-wise, I think that it would be nice to be able to modify an annotation, in case a typo is made (the old app supported this).

I was sort of thinking it was something that happened seldom enough that you could just delete it and add it again. But I guess if we're specifically thinking typos, it would make sense to add the ability to edit.

 also think that it's a waste of screen space to show who within the annotation view. Granted, the old app supported this, but I tend to think that if I actually cared who added a certain annotation, I'd be happy to drill down into history. I haven't cared so far, AFAICR.

Hmm. Personally, I definitely care about who made an annotation, but I guess I could be OK with going into the history as well. Anybody else have an opinion on the matter? I don't personally feel strongly either way.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Abbreviated keys for text cost model fix
On 23.2.2015 00:16, Peter Geoghegan wrote: On Sun, Feb 22, 2015 at 1:30 PM, Peter Geoghegan p...@heroku.com wrote: You should try it with the data fully sorted like this, but with one tiny difference: The very last tuple is out of order. How does that look? I'm running that test now, I'll post the results tomorrow. Another thing that may be of particular interest to you as a Czech person is how various locales perform. I believe that the collation rules of Czech and Hungarian are particularly complex, with several passes often required for strcoll() (I think maybe more than 4). You should either measure that, or control for it. I was prepared to attribute the differences in the two systems to differences in compute bandwidth between the CPUs involved (which are perhaps more noticeable now, since memory latency is less of a bottleneck). However, the inconsistent use of collations could actually matter here. That's a good point, but all the tests were executed with en_US.UTF-8. -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
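As background on why locale choice matters so much here: abbreviated keys win precisely because a multi-pass strcoll() (as in Czech or Hungarian collations) is expensive, while a bytewise compare of a short prefix is cheap. A toy sketch of the idea follows (hypothetical helper, not the backend's sortsupport code, which packs the prefix into a Datum instead; the default "C" locale is assumed so the sketch runs anywhere):

```c
#include <string.h>

/* Toy illustration of the abbreviated-key idea (hypothetical helper):
 * compare a fixed-size byte prefix first, and only fall back to the
 * potentially multi-pass, locale-aware strcoll() on a prefix tie. */
static int
compare_with_abbrev(const char *a, const char *b)
{
    char    ka[8] = {0};
    char    kb[8] = {0};
    int     c;

    strncpy(ka, a, sizeof(ka));     /* "abbreviated key": first 8 bytes */
    strncpy(kb, b, sizeof(kb));

    c = memcmp(ka, kb, sizeof(ka)); /* cheap path: most comparisons end here */
    if (c != 0)
        return c;
    return strcoll(a, b);           /* authoritative, possibly multi-pass */
}
```

The benchmark-relevant observation is that the expensive branch only fires on prefix ties, so the cost of a complex collation is amortized away on well-distributed data.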
Re: [HACKERS] Redesigning checkpoint_segments
On Sat, Feb 14, 2015 at 4:43 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/04/2015 11:41 PM, Josh Berkus wrote: On 02/04/2015 12:06 PM, Robert Haas wrote: On Wed, Feb 4, 2015 at 1:05 PM, Josh Berkus j...@agliodbs.com wrote: Let me push max_wal_size and min_wal_size again as our new parameter names, because:
* does what it says on the tin
* new user friendly
* encourages people to express it in MB, not segments
* very different from the old name, so people will know it works differently
That's not bad. If we added a hard WAL limit in a future release, how would that fit into this naming scheme? Well, first, nobody's at present proposing a patch to add a hard limit, so I'm reluctant to choose non-obvious names to avoid conflict with a feature nobody may ever write. There's a number of reasons a hard limit would be difficult and/or undesirable. If we did add one, I'd suggest calling it wal_size_limit or something similar. However, we're most likely to only implement the limit for archives, which means that it might actually be called archive_buffer_limit or something more to the point. Ok, I don't hear any loud objections to min_wal_size and max_wal_size, so let's go with that then. Attached is a new version of this. It now comes in four patches. The first three are just GUC-related preliminary work, the first of which I posted on a separate thread today. I applied all 4 patches to the latest master successfully and performed a test with heavy continuous load. I don't see much difference in the checkpoint behaviour, and all seems to be working as expected. I did a test with the following parameter values -
max_wal_size = 1MB
min_wal_size = 1000MB
checkpoint_timeout = 5min
Upon performing a heavy load operation, the checkpoints were occurring based on timeouts. pg_xlog size fluctuated a bit (not very much). For the first few minutes pg_xlog size stayed at 3.3G, and it gradually increased to a max of 5.5G during the operation.
There was continuous fluctuation in the number of segments being removed/recycled. A part of the checkpoint logs follows -
2015-02-23 15:16:00.318 GMT-10 LOG: checkpoint starting: time
2015-02-23 15:16:53.943 GMT-10 LOG: checkpoint complete: wrote 3010 buffers (18.4%); 0 transaction log file(s) added, 0 removed, 159 recycled; write=27.171 s, sync=25.945 s, total=53.625 s; sync files=20, longest=5.376 s, average=1.297 s; distance=2748844 kB, estimate=2748844 kB
2015-02-23 15:21:00.438 GMT-10 LOG: checkpoint starting: time
2015-02-23 15:22:01.352 GMT-10 LOG: checkpoint complete: wrote 2812 buffers (17.2%); 0 transaction log file(s) added, 0 removed, 168 recycled; write=25.351 s, sync=35.346 s, total=60.914 s; sync files=34, longest=9.025 s, average=1.039 s; distance=1983318 kB, estimate=2672291 kB
2015-02-23 15:26:00.314 GMT-10 LOG: checkpoint starting: time
2015-02-23 15:26:25.612 GMT-10 LOG: checkpoint complete: wrote 2510 buffers (15.3%); 0 transaction log file(s) added, 0 removed, 121 recycled; write=22.623 s, sync=2.477 s, total=25.297 s; sync files=20, longest=1.418 s, average=0.123 s; distance=2537230 kB, estimate=2658785 kB
2015-02-23 15:31:00.477 GMT-10 LOG: checkpoint starting: time
2015-02-23 15:31:25.925 GMT-10 LOG: checkpoint complete: wrote 2625 buffers (16.0%); 0 transaction log file(s) added, 0 removed, 155 recycled; write=23.657 s, sync=1.592 s, total=25.447 s; sync files=13, longest=0.319 s, average=0.122 s; distance=2797386 kB, estimate=2797386 kB
2015-02-23 15:36:00.607 GMT-10 LOG: checkpoint starting: time
2015-02-23 15:36:52.686 GMT-10 LOG: checkpoint complete: wrote 3473 buffers (21.2%); 0 transaction log file(s) added, 0 removed, 171 recycled; write=31.257 s, sync=20.446 s, total=52.078 s; sync files=33, longest=4.512 s, average=0.619 s; distance=2153903 kB, estimate=2733038 kB
2015-02-23 15:41:00.675 GMT-10 LOG: checkpoint starting: time
2015-02-23 15:41:25.092 GMT-10 LOG: checkpoint complete: wrote 2456 buffers (15.0%); 0 transaction log file(s) added, 0 removed, 131 recycled; write=21.974 s, sync=2.282 s, total=24.417 s; sync files=27, longest=1.275 s, average=0.084 s; distance=2258648 kB, estimate=2685599 kB
2015-02-23 15:46:00.671 GMT-10 LOG: checkpoint starting: time
2015-02-23 15:46:26.757 GMT-10 LOG: checkpoint complete: wrote 2644 buffers (16.1%); 0 transaction log file(s) added, 0 removed, 138 recycled; write=23.619 s, sync=2.181 s, total=26.086 s; sync files=12, longest=0.709 s, average=0.181 s; distance=2787124 kB, estimate=2787124 kB
2015-02-23 15:51:00.509 GMT-10 LOG: checkpoint starting: time
2015-02-23 15:53:30.793 GMT-10 LOG: checkpoint complete: wrote 13408 buffers (81.8%); 0 transaction log file(s) added, 0 removed, 170 recycled; write=149.432 s, sync=0.664 s, total=150.284 s; sync files=13, longest=0.286 s, average=0.051 s; distance=1244483 kB, estimate=2632860 kB
The above checkpoint logs were generated at the time when pg_xlog size was at 5.4G. *Code Review* I had a look at the
Re: [HACKERS] Enforce creation of destination folders for source files in pg_regress (Was: pg_regress writes into source tree)
On Mon, Feb 23, 2015 at 12:00 AM, Peter Eisentraut pete...@gmx.net wrote: On 2/22/15 5:41 AM, Michael Paquier wrote: You could argue that these .gitignore files don't actually belong there, but your patch doesn't change or move those files, and even modules that have non-empty sql/ or expected/ directories have .gitignore files there, so it is considered the appropriate location. This is up to the maintainer of each extension to manage in their code tree. However, I can imagine that some people would be grateful if we allowed them to not need sql/ and expected/ directories containing only a single .gitignore file ignoring everything with *, making the code tree of their extensions cleaner. I would like to have an extension in tree that also does this, so we have a regression test of this functionality. Sure. Here is one, added as a test module in the attached patch. The name of the module is regress_dynamic. Perhaps the name could be better... -- Michael From 0b89d35f9605f866b45943d842898e30923476c3 Mon Sep 17 00:00:00 2001 From: Michael Paquier mich...@otacoo.com Date: Mon, 23 Feb 2015 15:02:14 +0900 Subject: [PATCH 1/2] Enforce creation of input and output paths in pg_regress This is particularly useful for extensions whose only regression tests live in input/ and output/ and are generated dynamically when running the tests: it keeps the code tree of such extensions clean, with no empty folders whose only content is a .gitignore ignoring everything other than its own existence.
--- src/test/regress/pg_regress.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index 3af0e57..a7aa580 100644 --- a/src/test/regress/pg_regress.c +++ b/src/test/regress/pg_regress.c @@ -496,6 +496,7 @@ convert_sourcefiles_in(char *source_subdir, char *dest_dir, char *dest_subdir, c { char testtablespace[MAXPGPATH]; char indir[MAXPGPATH]; + char result_dir[MAXPGPATH]; struct stat st; int ret; char **name; @@ -520,6 +521,14 @@ convert_sourcefiles_in(char *source_subdir, char *dest_dir, char *dest_subdir, c /* Error logged in pgfnames */ exit(2); + /* + * Enforce creation of destination directory if it does not exist yet. + * This is useful for tests using only source files. + */ + snprintf(result_dir, MAXPGPATH, "%s/%s", dest_dir, dest_subdir); + if (!directory_exists(result_dir)) + make_directory(result_dir); + snprintf(testtablespace, MAXPGPATH, "%s/testtablespace", outputdir); #ifdef WIN32 @@ -565,7 +574,7 @@ convert_sourcefiles_in(char *source_subdir, char *dest_dir, char *dest_subdir, c /* build the full actual paths to open */ snprintf(prefix, strlen(*name) - 6, "%s", *name); snprintf(srcfile, MAXPGPATH, "%s/%s", indir, *name); - snprintf(destfile, MAXPGPATH, "%s/%s/%s.%s", dest_dir, dest_subdir, + snprintf(destfile, MAXPGPATH, "%s/%s.%s", result_dir, prefix, suffix); infile = fopen(srcfile, "r"); -- 2.3.0 From c5e58c85e743c3c7b133234588e2010612166f8f Mon Sep 17 00:00:00 2001 From: Michael Paquier mich...@otacoo.com Date: Mon, 23 Feb 2015 15:23:44 +0900 Subject: [PATCH 2/2] Add regress_dynamic as a test module This simple extension has the characteristic of containing only non-static regression test content, and is aimed at testing whether pg_regress appropriately generates input and output directories when they do not exist.
--- src/test/modules/regress_dynamic/.gitignore | 8 ++++++++ src/test/modules/regress_dynamic/Makefile | 16 ++++++++++++++++ src/test/modules/regress_dynamic/README | 6 ++++++ src/test/modules/regress_dynamic/input/basic.source | 9 +++++++++ src/test/modules/regress_dynamic/output/basic.source | 11 +++++++++++ .../modules/regress_dynamic/regress_dynamic--1.0.sql | 8 ++++++++ src/test/modules/regress_dynamic/regress_dynamic.control | 5 +++++ 7 files changed, 63 insertions(+) create mode 100644 src/test/modules/regress_dynamic/.gitignore create mode 100644 src/test/modules/regress_dynamic/Makefile create mode 100644 src/test/modules/regress_dynamic/README create mode 100644 src/test/modules/regress_dynamic/input/basic.source create mode 100644 src/test/modules/regress_dynamic/output/basic.source create mode 100644 src/test/modules/regress_dynamic/regress_dynamic--1.0.sql create mode 100644 src/test/modules/regress_dynamic/regress_dynamic.control diff --git a/src/test/modules/regress_dynamic/.gitignore b/src/test/modules/regress_dynamic/.gitignore new file mode 100644 index 0000000..122ede3 --- /dev/null +++ b/src/test/modules/regress_dynamic/.gitignore @@ -0,0 +1,8 @@ +# Generated sub-directories +/log/ +/results/ +/tmp_check/ + +# Input and output directories of regression tests +/expected/ +/sql/ diff --git a/src/test/modules/regress_dynamic/Makefile b/src/test/modules/regress_dynamic/Makefile new file mode 100644 index 0000000..2cab345 --- /dev/null +++
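For reference, the directory_exists()/make_directory() pair that the pg_regress patch calls wraps a plain stat()-then-mkdir() idiom; a standalone sketch with a hypothetical helper name (not the pg_regress code itself):

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical standalone equivalent of pg_regress's
 * directory_exists()/make_directory() pair: create the directory if it
 * is missing, and bail out on real errors. */
static void
ensure_directory(const char *path)
{
    struct stat st;

    if (stat(path, &st) == 0)
    {
        if (!S_ISDIR(st.st_mode))
        {
            fprintf(stderr, "\"%s\" exists but is not a directory\n", path);
            exit(2);
        }
        return;                     /* already present: nothing to do */
    }
    if (mkdir(path, 0777) != 0 && errno != EEXIST)
    {
        fprintf(stderr, "could not create directory \"%s\"\n", path);
        exit(2);
    }
}
```

The EEXIST check makes the helper safe against a concurrent creator, which matters when several test targets run in parallel.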
Re: [HACKERS] multiple backends attempting to wait for pincount 1
On 02/13/2015 06:27 AM, Tom Lane wrote: Two different CLOBBER_CACHE_ALWAYS critters recently reported exactly the same failure pattern on HEAD: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhordt=2015-02-06%2011%3A59%3A59 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tickdt=2015-02-12%2010%3A22%3A57 I'd say we have a problem. I'd even go so far as to say that somebody has completely broken locking, because this looks like autovacuum and manual vacuuming are hitting the same table at the same time. fwiw - looks like spoonbill (not doing CLOBBER_CACHE_ALWAYS) managed to trigger that one as well: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbilldt=2015-02-23%2000%3A00%3A06 There are also some failures from the BETWEEN changes in that regression.diff, but those might be fallout from the above problem. Stefan
Re: [HACKERS] dblink: add polymorphic functions.
Changes in this patch:
- added polymorphic versions of dblink_fetch()
- upped dblink version # to 1.2 because of new functions
- migration 1.1 -> 1.2
- DocBook changes for dblink(), dblink_get_result(), dblink_fetch()
On Sun, Feb 22, 2015 at 11:38 PM, Corey Huinker corey.huin...@gmail.com wrote: nevermind. Found it. On Sun, Feb 22, 2015 at 11:18 PM, Corey Huinker corey.huin...@gmail.com wrote: Yes, that was it, I discovered it myself and should have posted a nevermind. Now I'm slogging through figuring out where to find elog() messages from the temporary server. It's slow, but it's progress. On Sun, Feb 22, 2015 at 10:39 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Mon, Feb 23, 2015 at 12:03 PM, Corey Huinker corey.huin...@gmail.com wrote: + ERROR: could not stat file /home/ubuntu/src/postgres/contrib/dblink/tmp_check/install/usr/local/pgsql/share/extension/dblink--1.2.sql: No such file or directory Didn't you forget to add dblink--1.2.sql to DATA in the contrib/dblink Makefile? That would explain why this file has not been included in the temporary installation deployed by pg_regress.
-- Michael diff --git a/contrib/dblink/Makefile b/contrib/dblink/Makefile index e833b92..a00466e 100644 --- a/contrib/dblink/Makefile +++ b/contrib/dblink/Makefile @@ -6,7 +6,8 @@ PG_CPPFLAGS = -I$(libpq_srcdir) SHLIB_LINK = $(libpq) EXTENSION = dblink -DATA = dblink--1.1.sql dblink--1.0--1.1.sql dblink--unpackaged--1.0.sql +DATA = dblink--1.1.sql dblink--1.0--1.1.sql dblink--unpackaged--1.0.sql \ + dblink--1.2.sql dblink--1.1--1.2.sql REGRESS = paths dblink REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress \ diff --git a/contrib/dblink/dblink--1.1--1.2.sql b/contrib/dblink/dblink--1.1--1.2.sql index e5a0900..128611d 100644 --- a/contrib/dblink/dblink--1.1--1.2.sql +++ b/contrib/dblink/dblink--1.1--1.2.sql @@ -32,3 +32,15 @@ CREATE FUNCTION dblink_get_result(text, bool, anyelement) RETURNS SETOF anyelement AS 'MODULE_PATHNAME', 'dblink_get_result' LANGUAGE C; + +CREATE FUNCTION dblink_fetch (text, int, anyelement) +RETURNS setof anyelement +AS 'MODULE_PATHNAME','dblink_fetch' +LANGUAGE C; + +CREATE FUNCTION dblink_fetch (text, int, boolean, anyelement) +RETURNS setof anyelement +AS 'MODULE_PATHNAME','dblink_fetch' +LANGUAGE C; + + diff --git a/contrib/dblink/dblink--1.2.sql b/contrib/dblink/dblink--1.2.sql index bf5ddaa..9d31e2e 100644 --- a/contrib/dblink/dblink--1.2.sql +++ b/contrib/dblink/dblink--1.2.sql @@ -71,6 +71,19 @@ RETURNS setof record AS 'MODULE_PATHNAME','dblink_fetch' LANGUAGE C STRICT; +CREATE FUNCTION dblink_fetch (text, int, anyelement) +RETURNS setof anyelement +AS 'MODULE_PATHNAME','dblink_fetch' +LANGUAGE C; + +CREATE FUNCTION dblink_fetch (text, int, boolean, anyelement) +RETURNS setof anyelement +AS 'MODULE_PATHNAME','dblink_fetch' +LANGUAGE C; + + + + CREATE FUNCTION dblink_fetch (text, text, int) RETURNS setof record AS 'MODULE_PATHNAME','dblink_fetch' diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index 009b877..fde750c 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -537,52 +537,39 @@ 
dblink_fetch(PG_FUNCTION_ARGS) char *curname = NULL; int howmany = 0; bool fail = true; /* default to backward compatible */ + int first_optarg; prepTuplestoreResult(fcinfo); DBLINK_INIT; - if (PG_NARGS() == 4) + if (get_fn_expr_argtype(fcinfo->flinfo, 1) == TEXTOID) { - /* text,text,int,bool */ + /* text,text,int,[bool],[anytype] */ conname = text_to_cstring(PG_GETARG_TEXT_PP(0)); curname = text_to_cstring(PG_GETARG_TEXT_PP(1)); howmany = PG_GETARG_INT32(2); - fail = PG_GETARG_BOOL(3); - + first_optarg = 3; rconn = getConnectionByName(conname); if (rconn) conn = rconn->conn; } - else if (PG_NARGS() == 3) - { - /* text,text,int or text,int,bool */ - if (get_fn_expr_argtype(fcinfo->flinfo, 2) == BOOLOID) - { - curname = text_to_cstring(PG_GETARG_TEXT_PP(0)); - howmany = PG_GETARG_INT32(1); - fail = PG_GETARG_BOOL(2); - conn = pconn->conn; - } - else - { - conname = text_to_cstring(PG_GETARG_TEXT_PP(0)); - curname = text_to_cstring(PG_GETARG_TEXT_PP(1)); - howmany = PG_GETARG_INT32(2); - - rconn = getConnectionByName(conname); - if (rconn) - conn = rconn->conn; - } - } - else if (PG_NARGS() == 2) + else { - /* text,int */ + /*
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On Sun, Feb 22, 2015 at 6:57 AM, Tom Lane t...@sss.pgh.pa.us wrote: After some more hacking, the only remaining uses of foo[1] in struct declarations are: 1. A couple of places where the array is actually the only struct member; for some unexplainable reason gcc won't let you use flexible array syntax in that case. Yes. Thanks for adding notes at those places. 2. struct sqlda_struct in ecpg's sqlda-native.h. We have a problem with using [FLEXIBLE_ARRAY_MEMBER] there because (1) pg_config.h isn't (and I think shouldn't be) #included by this file, and (2) there is very possibly application code depending on sizeof() this struct; the risk of breaking such code seems to outweigh any likely benefit. Also, despite that I tried changing [1] to [] and fixing the places I could find that depended on sizeof(struct sqlda_struct), but I apparently can't find them all because the ecpg regression tests fell over :-(. Yeah, I think we can live with that. The rest of the backend code is clean now, and the allocation calculations are more understandable. Anyway, barring excitement in the buildfarm I think this project is complete. Thanks! -- Michael
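For readers following along, the conversion is mechanical; a minimal sketch with a hypothetical struct (not one of the backend's) shows the before/after shape:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct showing the conversion's shape: a C99 flexible
 * array member (what FLEXIBLE_ARRAY_MEMBER expands to when available)
 * replaces the old "char data[1]" hack. */
typedef struct varbuf
{
    int     len;                /* bytes stored in data, counting the NUL */
    char    data[];             /* flexible array member; was data[1] */
} varbuf;

/* The allocation size becomes offsetof() + the real payload size,
 * with no "- 1" fudge factor for the dummy element. */
static varbuf *
varbuf_make(const char *s)
{
    size_t  n = strlen(s) + 1;
    varbuf *v = malloc(offsetof(varbuf, data) + n);

    v->len = (int) n;
    memcpy(v->data, s, n);
    return v;
}
```

This is also why the sizeof() concern for struct sqlda_struct is real: with a flexible array member, sizeof(varbuf) no longer accounts for any trailing element, so application code that allocated sizeof(struct) bytes would change behavior.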
Re: [HACKERS] pg_check_dir comments and implementation mismatch
On Fri, Feb 20, 2015 at 12:59 AM, Noah Misch n...@leadboat.com wrote: On Mon, Feb 02, 2015 at 03:48:33PM -0500, Robert Haas wrote: On Sat, Jan 31, 2015 at 8:28 AM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: I've attached a new version of the patch fixing the missing closedir on readdir error. If readdir() fails and closedir() succeeds, the return will be -1 but errno will be 0. Out of curiosity, have you seen a closedir() implementation behave that way? It would violate C99 ("The value of errno is zero at program startup, but is never set to zero by any library function.") and POSIX. No. Good point, I didn't think about that. I think this way is safer, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_check_dir comments and implementation mismatch
Robert Haas robertmh...@gmail.com writes: On Fri, Feb 20, 2015 at 12:59 AM, Noah Misch n...@leadboat.com wrote: On Mon, Feb 02, 2015 at 03:48:33PM -0500, Robert Haas wrote: If readdir() fails and closedir() succeeds, the return will be -1 but errno will be 0. Out of curiosity, have you seen a closedir() implementation behave that way? It would violate C99 ("The value of errno is zero at program startup, but is never set to zero by any library function.") and POSIX. No. Good point, I didn't think about that. I think this way is safer, though. While the spec forbids library functions from setting errno to zero, there is no restriction on them changing errno in other ways despite returning success; their exit-time value of errno is only well-defined if they fail. So we do need to preserve errno explicitly across closedir(), or we may report the wrong failure from readdir(). regards, tom lane
Re: [HACKERS] restrict global access to be readonly
On Tue, Feb 17, 2015 at 4:40 AM, happy times guangzhouzh...@qq.com wrote: The first choice Tom pointed out makes sense to me: adding this as equivalent to setting all subsequent transactions as read only. It is useful enough in scenarios where the disk limit for the instance is reached and we want to block all write access (this limit is typically a soft limit, and vacuum logs or sort spills could be permitted). I previously thought of the choice of not-generating-any-WAL semantics, but now doubt that's practically useful. We are forced to restart the old master in recovery mode when switching master-slave roles, which would put it into the state of not generating any WAL. And for logical replication, it seems setting transactions as read-only could do the job of preventing logs from being shipped to the slave. One other thing to consider is which users are to be blocked. I expect this command to prevent write access even for superusers, since there may be other internal apps that connect as superuser and do writes; they are expected to be blocked too. And sometimes we may use this command to avoid any unexpected write operation. The last thing is when the command returns. I expect it to return immediately, not waiting for existing active transactions to finish. This avoids having existing long-running transactions block it, and lets the user decide whether to wait for or kill existing transactions. The use cases I can think of are:
- Opening a possibly-damaged database strictly read-only so you can inspect it without risking further damage; or forcing a damaged database that is already up and running to go read-only to prevent further damage. In this case, you'd want to prohibit all writes to the data files, even hint bit changes, but the use of temporary files for sorts or hashes would be fine.
- Forcing a master into a read-only state in preparation for a controlled switchover.
If you can completely stop WAL generation on the master, replay all WAL generated on the master prior to the switchover on the standby, and then switch over, you could make the old master a slave of the new master. I think the requirements here are similar to the previous case. I'm not 100% sure that we need to prevent hint bits getting set in this case, but it probably wouldn't hurt. Temp file writes are, again, OK.
- Taking a copy of the database for backup purposes. Same thing again.
- Accessing a cluster stored on a read-only medium, like a CD or DVD. In this case, even temporary file writes are no good.
- Your proposed use case of preventing the disk from being filled up is a little different. There's no real problem if the data files fill up the disk; at a certain point, users will get errors, but forcing the database read only is going to do that anyway (and sooner). There's a big problem if the xlog partition fills up, though. You could want a few different things here depending on the details of your use case, but preventing all WAL generation is one of them.
Based on the above, I'm inclined to think that making the system read only should (1) prevent all WAL generation and (2) prevent all data file modifications, but (3) still allow the use of temporary files.
--
It's also worth taking a look at what other systems support. SQL Server supports something like this feature using this (ugly) syntax: ALTER DATABASE [whatever] SET READ_ONLY WITH NO_WAIT I'm not sure what the semantics are, exactly. In Oracle, you can open a database read-only: ALTER DATABASE whatever OPEN READ ONLY; It's not clear to me whether you can use this to force an already-read-write database back to read only mode. Oracle also lets you make a tablespace read-only if there are no open transactions, running hot backups, etc. anywhere in the system. That syntax is just: ALTER TABLESPACE whatever READ ONLY; That forbids all future data file modifications.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] dblink: add polymorphic functions.
Yes, that was it, I discovered it myself and should have posted a nevermind. Now I'm slogging through figuring out where to find elog() messages from the temporary server. It's slow, but it's progress. On Sun, Feb 22, 2015 at 10:39 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Mon, Feb 23, 2015 at 12:03 PM, Corey Huinker corey.huin...@gmail.com wrote: + ERROR: could not stat file /home/ubuntu/src/postgres/contrib/dblink/tmp_check/install/usr/local/pgsql/share/extension/dblink--1.2.sql: No such file or directory Didn't you forget to add dblink--1.2.sql to DATA in contrib/dblink Makefile? That would explain why this file has not been included in the temporary installation deployed by pg_regress. -- Michael
Re: [HACKERS] Precedence of standard comparison operators
Attached is an improved patch that includes optional warnings for constructs that changed parsing. It's not quite 100% but I think it's about 90% correct; the difference in size between this and the previous patch should be a pretty fair indication of what it's going to cost us to have a warning capability. What's missing from this version is that it can't tell the difference between LIKE/ILIKE/SIMILAR TO and the underlying operators, that is, it sees a LIKE b as a ~~ b because that's what the grammar emits. However, those inputs have different operator-precedence behavior. Likewise, we can't tell the difference between xmlexpr IS NOT DOCUMENT NOT (xmlexpr IS DOCUMENT) because the grammar converts the former into the latter --- and again, those two things have different precedence behavior. It wouldn't take very much additional code to fix these things by changing what the grammar emits; but I'm running out of energy for today. In any case, I thought I should put this up and see if this general approach is going to satisfy people's concerns about making such a change. regards, tom lane diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 6bcb106..22a2d32 100644 *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *** dynamic_library_path = 'C:\tools\postgre *** 6752,6757 --- 6752,6780 /listitem /varlistentry + varlistentry id=guc-operator-precedence-warning xreflabel=operator_precedence_warning + termvarnameoperator_precedence_warning/varname (typeboolean/type) + indexterm +primaryvarnameoperator_precedence_warning/ configuration parameter/primary + /indexterm + /term + listitem +para + When on, the parser will emit a warning for any construct that might + have changed meanings since productnamePostgreSQL/ 9.4 as a result + of changes in operator precedence. 
This is useful for auditing + applications to see if precedence changes have broken anything; but it + is not meant to be left turned on in production, since it will warn + about some perfectly valid, standard-compliant SQL code. + The default is literaloff/. +/para + +para + See xref linkend=sql-precedence for more information. +/para + /listitem + /varlistentry + varlistentry id=guc-quote-all-identifiers xreflabel=quote-all-identifiers termvarnamequote_all_identifiers/varname (typeboolean/type) indexterm diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index 4b81b08..e7484c4 100644 *** a/doc/src/sgml/syntax.sgml --- b/doc/src/sgml/syntax.sgml *** CAST ( 'replaceablestring/replaceable *** 984,993 associativity of the operators in productnamePostgreSQL/. Most operators have the same precedence and are left-associative. The precedence and associativity of the operators is hard-wired ! into the parser. This can lead to non-intuitive behavior; for ! example the Boolean operators literallt;/ and ! literalgt;/ have a different precedence than the Boolean ! operators literallt;=/ and literalgt;=/. Also, you will sometimes need to add parentheses when using combinations of binary and unary operators. For instance: programlisting --- 984,994 associativity of the operators in productnamePostgreSQL/. Most operators have the same precedence and are left-associative. The precedence and associativity of the operators is hard-wired ! into the parser. !/para ! !para ! You will sometimes need to add parentheses when using combinations of binary and unary operators. For instance: programlisting *** SELECT (5 !) - 6; *** 1008,1014 /para table id=sql-precedence-table ! titleOperator Precedence (decreasing)/title tgroup cols=3 thead --- 1009,1015 /para table id=sql-precedence-table ! titleOperator Precedence (highest to lowest)/title tgroup cols=3 thead *** SELECT (5 !) 
- 6; *** 1063,1087 /row row !entrytokenIS/token/entry !entry/entry !entryliteralIS TRUE/, literalIS FALSE/, literalIS NULL/, etc/entry ! /row ! ! row !entrytokenISNULL/token/entry !entry/entry !entrytest for null/entry ! /row ! ! row !entrytokenNOTNULL/token/entry !entry/entry !entrytest for not null/entry ! /row ! ! row !entry(any other)/entry entryleft/entry entryall other native and user-defined operators/entry /row --- 1064,1070 /row row !entry(any other operator)/entry entryleft/entry entryall other native and user-defined operators/entry /row ***
Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)
Hi! On 28.1.2015 05:03, Abhijit Menon-Sen wrote: At 2015-01-27 17:00:27 -0600, jim.na...@bluetreble.com wrote: It would be best to get this into a commit fest so it's not lost. It's there already. -- Abhijit I looked at this patch today, so a few comments from me:
1) I believe the patch is slightly broken, because the version was bumped to 1.3 but only partially, so I get this on make install:
$ make -j9 -s install
/usr/bin/install: cannot stat ‘./pgstattuple--1.3.sql’: No such file or directory
../../src/makefiles/pgxs.mk:129: recipe for target 'install' failed
make[1]: *** [install] Error 1
Makefile:85: recipe for target 'install-pgstattuple-recurse' failed
make: *** [install-pgstattuple-recurse] Error 2
make: *** Waiting for unfinished jobs
The problem seems to be that the Makefile already lists --1.3.sql in the DATA section, but the file was not renamed (there's just --1.2.sql). After fixing that, it's also necessary to fix the version in the control file, otherwise CREATE EXTENSION fails: default_version = '1.2' -> '1.3'. At least that fixed the install for me.
2) Returning Datum from fbstat_heap, which is a static function, seems a bit strange to me; I'd use the HeapTuple directly, but meh ...
3) The way the tuple is built - by first turning the values into strings and then turning them back into Datums by calling BuildTupleFromCStrings - is more serious, I guess. Why not just keep the Datums and call heap_form_tuple directly? I see the other functions in pgstattuple.c do use the string way, so maybe there's a reason for that? Or is this just to keep the code consistent within the extension?
4) I really doubt 'fastbloat' is a good name. I propose 'pgstatbloat' to make it consistent with the rest of the pgstattuple functions. Or something similar, but 'fastbloat' is just plain confusing.
Otherwise, the code looks OK to me.
Now, there are a few features I'd like to have for production use (to minimize the impact):
1) no index support :-(
I'd like to see support for more relation types (at least btree indexes). Are there any plans for that? Do we have an idea on how to compute that?
2) sampling just a portion of the table
For example, being able to sample just 5% of blocks, making it less obtrusive, especially on huge tables. Interestingly, there's a TABLESAMPLE patch in this CF, so maybe it's possible to reuse some of the methods (e.g. functions behind SYSTEM sampling)?
3) throttling
Another feature minimizing the impact of running this on production might be some sort of throttling, e.g. saying 'limit the scan to 4 MB/s' or something along those lines.
4) prefetch
fbstat_heap is using the visibility map to skip fully-visible pages, which is nice, but if we skip too many pages it breaks readahead, similarly to a bitmap heap scan. I believe this is another place where effective_io_concurrency (i.e. prefetch) would be appropriate.
regards -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
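On point 2), the SYSTEM-style sampling idea boils down to visiting each block with a fixed probability while preserving physical scan order. A self-contained sketch (toy deterministic PRNG for reproducibility; PostgreSQL's actual sampler and the TABLESAMPLE patch differ in detail):

```c
#include <stdint.h>

/* Toy deterministic PRNG so the sketch is reproducible; PostgreSQL's
 * real sampler differs. */
static uint32_t
xorshift32(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Visit each block with probability "fraction", SYSTEM-sampling style,
 * returning chosen block numbers in physical (ascending) order so the
 * scan stays mostly sequential. Hypothetical helper for illustration;
 * picks beyond max_chosen are silently dropped. */
static int
sample_blocks(uint32_t nblocks, double fraction, uint32_t seed,
              uint32_t *chosen, int max_chosen)
{
    uint32_t    state = (seed != 0) ? seed : 1;
    int         n = 0;

    for (uint32_t blk = 0; blk < nblocks; blk++)
    {
        double  u = xorshift32(&state) / 4294967296.0;

        if (u < fraction && n < max_chosen)
            chosen[n++] = blk;
    }
    return n;
}
```

Keeping the chosen blocks in ascending physical order is what makes this compatible with both the readahead concern in point 4) and a future prefetch pass over the selected pages.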
Re: [HACKERS] Redesigning checkpoint_segments
On Sat, Feb 21, 2015 at 11:29 PM, Petr Jelinek p...@2ndquadrant.com wrote:
 I am wondering a bit about interaction with wal_keep_segments. One thing is that wal_keep_segments is still specified in number of segments and not size units, maybe it would be worth to change it also? And the other thing is that, if set, wal_keep_segments is the real max_wal_size from the user perspective (not from the perspective of the algorithm in this patch, but the user does not really care about that), which is somewhat weird given the naming.

It seems like wal_keep_segments is more closely related to wal_*min*_size. The idea of both settings is that each is a minimum amount of WAL we want to keep around for some purpose. But they're not quite the same, I guess, because wal_min_size just forces us to keep that many files around - they can be overwritten whenever. wal_keep_segments is an amount of actual WAL data we want to keep around.

Would it make sense to require that wal_keep_segments <= wal_min_size?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dblink: add polymorphic functions.
I seem to be getting tripped up in the regression test. This line was found in regression.diffs:

+ ERROR: could not stat file /home/ubuntu/src/postgres/contrib/dblink/tmp_check/install/usr/local/pgsql/share/extension/dblink--1.2.sql: No such file or directory

The file dblink--1.2.sql does exist in /home/ubuntu/src/postgres/contrib/dblink/:

~/src/postgres/contrib/dblink$ ls -la dblink--1.?.sql
-rw-rw-r-- 1 ubuntu ubuntu 5.7K Feb 22 16:02 dblink--1.1.sql
-rw-rw-r-- 1 ubuntu ubuntu 6.8K Feb 22 16:50 dblink--1.2.sql

But it evidently isn't making it into the tmp_check dir. What needs to happen for new files to be seen by the regression test harness?

On Fri, Feb 20, 2015 at 1:14 AM, Michael Paquier michael.paqu...@gmail.com wrote:
 On Fri, Feb 20, 2015 at 2:50 PM, Corey Huinker corey.huin...@gmail.com wrote:
 Thanks - completely new to this process, so I'm going to need walking through it. I promise to document what I learn and try to add it to the commitfest wiki. Where can I go for guidance about documentation format and regression tests?

 Here are some guidelines about how to submit a patch: https://wiki.postgresql.org/wiki/Submitting_a_Patch
 You can also have a look here to see how extensions like dblink are structured: http://www.postgresql.org/docs/devel/static/extend-pgxs.html
 What you need to do in the case of your patch is to add the necessary test cases in sql/dblink.sql for the new functions you have added and then update the expected output in expected/. Be sure not to break any existing things as well. After running the tests in your development environment, the output results are available in results/, and pg_regress generates a differential file in regression.diffs. For the documentation, updating dblink.sgml with the new function prototypes would be sufficient, perhaps with an example or two to show that what you want to add is useful.
 Regards,
 --
 Michael
Re: [HACKERS] dblink: add polymorphic functions.
nevermind. Found it. On Sun, Feb 22, 2015 at 11:18 PM, Corey Huinker corey.huin...@gmail.com wrote: Yes, that was it, I discovered it myself and should have posted a nevermind. Now I'm slogging through figuring out where to find elog() messages from the temporary server. It's slow, but it's progress. On Sun, Feb 22, 2015 at 10:39 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Mon, Feb 23, 2015 at 12:03 PM, Corey Huinker corey.huin...@gmail.com wrote: + ERROR: could not stat file /home/ubuntu/src/postgres/contrib/dblink/tmp_check/install/usr/local/pgsql/share/extension/dblink--1.2.sql: No such file or directory Didn't you forget to add dblink--1.2.sql to DATA in contrib/dblink Makefile? That would explain why this file has not been included in the temporary installation deployed by pg_regress. -- Michael
Re: [HACKERS] Allow snapshot too old error, to prevent bloat
On 02/22/2015 11:48 AM, Kevin Grittner wrote:
 (2) Use a coarse enough granularity on time and a short enough maximum for the GUC to just keep a circular buffer of the mappings in memory. We might be able to make this dense enough that one-minute resolution for up to 60 days could fit in 338kB. Obviously that could be reduced with coarser granularity or a shorter maximum.

This doesn't sound too bad to me. Presumably these would be tunable. I think one-minute granularity would be fine for most purposes, but I can imagine people who would want it finer, like 10 seconds, say.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
Hi,

On 22.2.2015 18:57, Petr Jelinek wrote:
 Tomas noticed that the patch is missing an error check when TABLESAMPLE is used on a view, so here is a new version that checks it's only used against a table or matview. No other changes.

Curious question - could/should this use page prefetch, similar to what bitmap heap scan does? I believe the answer is 'yes'.

With SYSTEM that should be rather straightforward to implement, because it already works at the page level, and it's likely to give a significant performance speedup, similar to bitmap index scan: http://www.postgresql.org/message-id/CAHyXU0yiVvfQAnR9cyH=HWh1WbLRsioe=mzRJTHwtr=2azs...@mail.gmail.com

With BERNOULLI that might be more complex to implement because of the page/tuple sampling, and the benefit is probably much lower than for SYSTEM, because it's likely that at least one tuple will be sampled.

I'm not saying it has to be done in this CF (or that it makes the patch uncommittable). For example, this seems like a very nice project for GSoC (clear scope, not too large, ...).

--
Tomas Vondra    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)
On 23.2.2015 03:20, Jim Nasby wrote:
 On 2/22/15 5:41 PM, Tomas Vondra wrote:
 Otherwise, the code looks OK to me. Now, there are a few features I'd like to have for production use (to minimize the impact):
 1) no index support :-(
 I'd like to see support for more relation types (at least btree indexes). Are there any plans for that? Do we have an idea on how to compute that?

 It'd be cleaner if we had an actual AM function for this, but see below.

 2) sampling just a portion of the table
 For example, being able to sample just 5% of blocks, making it less obtrusive, especially on huge tables. Interestingly, there's a TABLESAMPLE patch in this CF, so maybe it's possible to reuse some of the methods (e.g. the functions behind SYSTEM sampling)?
 3) throttling
 Another feature minimizing the impact of running this on production might be some sort of throttling, e.g. saying 'limit the scan to 4 MB/s' or something along those lines.
 4) prefetch
 fbstat_heap is using the visibility map to skip fully-visible pages, which is nice, but if we skip too many pages it breaks readahead, similarly to bitmap heap scan. I believe this is another place where effective_io_concurrency (i.e. prefetch) would be appropriate.

 All of those wishes are solved in one way or another by vacuum and/or analyze. If we had a hook in the tuple-scanning loop and at the end of vacuum, you could just piggy-back on it. But really, all we'd need for vacuum to be able to report this info is one more field in LVRelStats, a call to GetRecordedFreeSpace for all-visible pages, and some logic to deal with pages skipped because we couldn't get the vacuum lock. Should we just add this to vacuum instead?

Possibly. I think the ultimate goal is to be able to get this info easily and without disrupting system performance too much (which is difficult without sampling/throttling). If we can stuff that into autovacuum reasonably, and then get the info from the catalogs, I'm OK with that.
However, I'm not sure putting this into autovacuum is actually possible, because how do you merge data from multiple partial runs (when each of them skipped different pages)? Also, autovacuum is not the only place where we free space - we'd have to handle HOT, for example, I guess.

--
Tomas Vondra    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] __attribute__ for non-gcc compilers
On Tue, Feb 17, 2015 at 8:41 AM, Oskari Saarenmaa o...@ohmu.fi wrote:
 On 15.01.2015 21:58, Robert Haas wrote:
 On Wed, Jan 14, 2015 at 5:54 PM, Andres Freund and...@2ndquadrant.com wrote:
 I think I'd for now simply not define pg_attribute_aligned() on platforms where it's not supported, instead of defining it empty. If we need a softer variant we can name it pg_attribute_aligned_if_possible or something. Sounds sane?
 Yes, that sounds like a much better plan.

 Attached is an updated patch, rebased on today's git master, that never defines aligned or packed empty. This is also included in the current commitfest: https://commitfest.postgresql.org/4/115/

Is this going to play nicely with pgindent?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Redesigning checkpoint_segments
On 23/02/15 03:24, Robert Haas wrote:
 On Sat, Feb 21, 2015 at 11:29 PM, Petr Jelinek p...@2ndquadrant.com wrote:
 I am wondering a bit about interaction with wal_keep_segments. One thing is that wal_keep_segments is still specified in number of segments and not size units, maybe it would be worth to change it also? And the other thing is that, if set, wal_keep_segments is the real max_wal_size from the user perspective (not from the perspective of the algorithm in this patch, but the user does not really care about that), which is somewhat weird given the naming.

 It seems like wal_keep_segments is more closely related to wal_*min*_size. The idea of both settings is that each is a minimum amount of WAL we want to keep around for some purpose. But they're not quite the same, I guess, because wal_min_size just forces us to keep that many files around - they can be overwritten whenever. wal_keep_segments is an amount of actual WAL data we want to keep around.

Err, yes, of course - min, not max :)

 Would it make sense to require that wal_keep_segments <= wal_min_size?

It would to me; the patch as it stands is confusing in the sense that you can set min and max, but then wal_keep_segments somewhat overrides those. And BTW this brings up another point: I actually don't see a check for min_wal_size <= max_wal_size anywhere in the patch.

--
Petr Jelinek    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
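Under the semantics discussed here - and assuming the settings keep the names used in the thread - a configuration honoring both suggested orderings might look like this (values are illustrative only):

```
# sketch of a postgresql.conf fragment; not from the patch itself
min_wal_size = 512MB             # must not exceed max_wal_size
max_wal_size = 2GB
wal_keep_segments = 16           # 16 * 16MB = 256MB, kept <= min_wal_size
```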
Re: [HACKERS] dblink: add polymorphic functions.
On Mon, Feb 23, 2015 at 12:03 PM, Corey Huinker corey.huin...@gmail.com wrote: + ERROR: could not stat file /home/ubuntu/src/postgres/contrib/dblink/tmp_check/install/usr/local/pgsql/share/extension/dblink--1.2.sql: No such file or directory Didn't you forget to add dblink--1.2.sql to DATA in contrib/dblink Makefile? That would explain why this file has not been included in the temporary installation deployed by pg_regress. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
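The fix Michael suggests is a one-line change to the DATA list in the extension Makefile: pg_regress deploys the extension into tmp_check from exactly the files listed there. A sketch of the pattern (file names are illustrative of the convention, not the actual dblink Makefile):

```
# contrib/dblink/Makefile (sketch)
EXTENSION = dblink
DATA = dblink--1.1.sql dblink--1.1--1.2.sql dblink--1.2.sql   # new script added here
```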
Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)
On 2/22/15 5:41 PM, Tomas Vondra wrote:
 Otherwise, the code looks OK to me. Now, there are a few features I'd like to have for production use (to minimize the impact):
 1) no index support :-(
 I'd like to see support for more relation types (at least btree indexes). Are there any plans for that? Do we have an idea on how to compute that?

It'd be cleaner if we had an actual AM function for this, but see below.

 2) sampling just a portion of the table
 For example, being able to sample just 5% of blocks, making it less obtrusive, especially on huge tables. Interestingly, there's a TABLESAMPLE patch in this CF, so maybe it's possible to reuse some of the methods (e.g. the functions behind SYSTEM sampling)?
 3) throttling
 Another feature minimizing the impact of running this on production might be some sort of throttling, e.g. saying 'limit the scan to 4 MB/s' or something along those lines.
 4) prefetch
 fbstat_heap is using the visibility map to skip fully-visible pages, which is nice, but if we skip too many pages it breaks readahead, similarly to bitmap heap scan. I believe this is another place where effective_io_concurrency (i.e. prefetch) would be appropriate.

All of those wishes are solved in one way or another by vacuum and/or analyze. If we had a hook in the tuple-scanning loop and at the end of vacuum, you could just piggy-back on it. But really, all we'd need for vacuum to be able to report this info is one more field in LVRelStats, a call to GetRecordedFreeSpace for all-visible pages, and some logic to deal with pages skipped because we couldn't get the vacuum lock. Should we just add this to vacuum instead?

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers