Re: [HACKERS] Parallel Seq Scan
On 2015-02-07 17:16:12 -0500, Robert Haas wrote:
> On Sat, Feb 7, 2015 at 4:30 PM, Andres Freund wrote:
> > > [ criticism of Amit's heapam integration ]
> > I'm not convinced that that reasoning is generally valid. While it may
> > work out nicely for seqscans - which might be useful enough on its own -
> > the more stuff we parallelize the *more* the executor will have to know
> > about it to make it sane. To actually scale nicely, e.g. a parallel sort
> > will have to execute the nodes below it on each backend, instead of
> > doing that in one backend as a separate step, ferrying over all tuples
> > to individual backends through queues, and only then parallelizing the
> > sort.
> >
> > Now, none of that is likely to matter immediately, but I think starting
> > to build the infrastructure at the points where we'll later need it does
> > make some sense.
>
> Well, I agree with you, but I'm not really sure what that has to do
> with the issue at hand. I mean, if we were to apply Amit's patch,
> we'd be in a situation where, for a non-parallel heap scan, heapam.c
> decides the order in which blocks get scanned, but for a parallel heap
> scan, nodeParallelSeqscan.c makes that decision. Maybe I'm an old
> fuddy-duddy[1] but that seems like an abstraction violation to me. I
> think the executor should see a parallel scan as a stream of tuples
> that streams into a bunch of backends in parallel, without really
> knowing how heapam.c is dividing up the work. That's how it's
> modularized today, and I don't see a reason to change it. Do you?

I don't really agree. Normally heapam just sequentially scans the heap
in one go; there is not much logic to that. Ok, then there's also the
synchronized seqscan stuff - which just about every user of heapscans
but the executor promptly disables again. I don't think a
heap_scan_page() or similar API will constitute a relevant layering
violation over what we already have.
Note that I'm not saying that Amit's patch is right - I haven't read it -
but that I don't think a 'scan this range of pages' heapscan API would be
a bad idea. Not even just for parallelism, but for a bunch of use cases.

> Regarding tuple flow between backends, I've thought about that before,
> I agree that we need it, and I don't think I know how to do it. I can
> see how to have a group of processes executing a single node in
> parallel, or a single process executing a group of nodes we break off
> from the query tree and push down to it, but what you're talking about
> here is a group of processes executing a group of nodes jointly.

I don't think it really is that. I think you'd do it essentially by
introducing a couple more nodes. Something like

                    SomeUpperLayerNode
                            |
                            |
                     AggCombinerNode
                    /        |       \
                   /         |        \
                  /          |         \
 PartialHashAggNode  PartialHashAggNode ... PartialHashAggNode
         |                   |
         |                   |
  PartialSeqScan      PartialSeqScan

The only thing that'd potentially need to end up working jointly would
be the block selection of the individual PartialSeqScans, to avoid
having to wait for stragglers for too long. E.g. each might just ask for
a range of 16 megabytes or so that it scans sequentially.

In such a plan - a pretty sensible and not that uncommon thing for
parallelized aggregates - you'd need to be able to tell the heap scans
which blocks to scan. Right?

> That seems like an excellent idea, but I don't know how to design it.
> Actually routing the tuples between whichever backends we want to
> exchange them between is easy enough, but how do we decide whether to
> generate such a plan? What does the actual plan tree look like?

I described above how I think it'd roughly look. Whether to generate it
would probably be dependent on the cardinality (not much point to do the
above if all groups are distinct) and possibly the aggregates in use (if
we have a parallelizable sum/count/avg etc).
> Maybe we designate nodes as can-generate-multiple-tuple-streams (seq
> scan, mostly, I would think) and can-absorb-parallel-tuple-streams
> (sort, hash, materialize), or something like that, but I'm really
> fuzzy on the details.

I don't think we really should have individual nodes that produce
multiple streams - that seems like it'd end up being really complicated.
I'd more say that we have distinct nodes (like the PartialSeqScan ones
above) that do a teensy bit of coordination about which work to perform.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
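The coordination sketched above - each PartialSeqScan asking for the next unclaimed range of roughly 16 MB that it then scans sequentially - boils down to handing out chunks from a shared cursor. The following is a hypothetical standalone illustration, not PostgreSQL source: the names (ParallelScanState, claim_block_range) are invented, and a real implementation would keep this state in shared memory and advance next_block with an atomic fetch-and-add or under a spinlock.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLCKSZ        8192
#define CHUNK_BYTES   (16 * 1024 * 1024)        /* ~16 MB per request */
#define CHUNK_BLOCKS  (CHUNK_BYTES / BLCKSZ)    /* 2048 blocks */

typedef struct ParallelScanState
{
    uint32_t    nblocks;        /* total blocks in the relation */
    uint32_t    next_block;     /* next unclaimed block */
} ParallelScanState;

/*
 * Claim the next chunk of blocks for one worker.  Returns false once the
 * relation is exhausted; otherwise sets *start and *count to a contiguous
 * range the worker should scan sequentially.  The last chunk may be short.
 */
bool
claim_block_range(ParallelScanState *ps, uint32_t *start, uint32_t *count)
{
    if (ps->next_block >= ps->nblocks)
        return false;
    *start = ps->next_block;
    *count = (ps->nblocks - *start < CHUNK_BLOCKS)
        ? ps->nblocks - *start
        : CHUNK_BLOCKS;
    ps->next_block += *count;
    return true;
}
```

With chunked claiming, a slow worker only delays completion by at most one chunk's worth of blocks, which is the straggler concern mentioned above.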
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

A bug had been introduced in the latest versions of the patch: the order
of the parameters passed to pglz_decompress was wrong. Please find
attached a patch with the following correction.

Original code:

+ if (pglz_decompress(block_image, record->uncompressBuf,
+                     bkpb->bkp_len, bkpb->bkp_uncompress_len) == 0)

Correction:

+ if (pglz_decompress(block_image, bkpb->bkp_len,
+                     record->uncompressBuf, bkpb->bkp_uncompress_len) == 0)

>For example here you should not have a space between the function definition
>and the variable declarations:
>+{
>+
>+    int orig_len = BLCKSZ - hole_length;

>This is as well incorrect in two places:
>if(hole_length != 0)
>There should be a space between the if and its condition in parenthesis.

Also corrected the above code format mistakes.

Thank you,
Rahila Syed

-----Original Message-----
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Syed, Rahila
Sent: Monday, February 09, 2015 6:58 PM
To: Michael Paquier; Fujii Masao
Cc: PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

Hello,

>> Do we always need extra two bytes for compressed backup block?
>> ISTM that extra bytes are not necessary when the hole length is zero.
>> In this case the length of the original backup block (i.e.,
>> uncompressed) must be BLCKSZ, so we don't need to save the original
>> size in the extra bytes.

>Yes, we would need an additional bit to identify that. We could steal it
>from length in XLogRecordBlockImageHeader.

This is implemented in the attached patch by dividing the length field
as follows:

uint16 length:15,
       with_hole:1;

>"2" should be replaced with the macro variable indicating the size of
>extra header for compressed backup block.

Macro SizeOfXLogRecordBlockImageCompressionInfo is used instead of 2.

>You can refactor XLogCompressBackupBlock() and move all the above code
>to it for more simplicity

This is also implemented in the patch attached.
Thank you,
Rahila Syed

-----Original Message-----
From: Michael Paquier [mailto:michael.paqu...@gmail.com]
Sent: Friday, February 06, 2015 6:00 PM
To: Fujii Masao
Cc: Syed, Rahila; PostgreSQL mailing lists
Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

On Fri, Feb 6, 2015 at 3:03 PM, Fujii Masao wrote:
> Do we always need extra two bytes for compressed backup block?
> ISTM that extra bytes are not necessary when the hole length is zero.
> In this case the length of the original backup block (i.e.,
> uncompressed) must be BLCKSZ, so we don't need to save the original
> size in the extra bytes.

Yes, we would need an additional bit to identify that. We could steal it
from length in XLogRecordBlockImageHeader.

> Furthermore, when fpw compression is disabled and the hole length is
> zero, we seem to be able to save one byte from the header of backup
> block. Currently we use 4 bytes for the header, 2 bytes for the length
> of backup block, 15 bits for the hole offset and 1 bit for the flag
> indicating whether block is compressed or not. But in that case, the
> length of backup block doesn't need to be stored because it must be
> BLCKSZ. Shouldn't we optimize the header in this way? Thought?

If we do it, that's something to tackle even before this patch on HEAD,
because you could use the 16th bit of the first 2 bytes of
XLogRecordBlockImageHeader to do the necessary sanity checks, and
actually reduce the record not by 1 byte, but by 2 bytes, as the
hole-related data is then not necessary. I imagine that a patch
optimizing that wouldn't be that hard to write as well.
> +    int page_len = BLCKSZ - hole_length;
> +    char *scratch_buf;
> +    if (hole_length != 0)
> +    {
> +        scratch_buf = compression_scratch;
> +        memcpy(scratch_buf, page, hole_offset);
> +        memcpy(scratch_buf + hole_offset,
> +               page + (hole_offset + hole_length),
> +               BLCKSZ - (hole_length + hole_offset));
> +    }
> +    else
> +        scratch_buf = page;
> +
> +    /* Perform compression of block */
> +    if (XLogCompressBackupBlock(scratch_buf,
> +                                page_len,
> +                                regbuf->compressed_page,
> +                                &compress_len))
> +    {
> +        /* compression is done, add record */
> +        is_compressed = true;
> +    }
>
> You can refactor XLogCompressBackupBlock() and move all the above code
> to it for more simplicity.

Sure.
--
Michael
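The quoted snippet's core move is eliding the page "hole" (the unused gap in the middle of the block) before handing the buffer to the compressor. A self-contained sketch of just that copy, assuming nothing beyond the memcpy logic shown above - the function name elide_page_hole and the buffer parameters are illustrative, not from the patch:

```c
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192

/*
 * Copy a BLCKSZ-byte page into scratch with the hole
 * [hole_offset, hole_offset + hole_length) removed, mirroring the
 * memcpy pair in the quoted patch.  Returns the number of bytes
 * written to scratch.
 */
size_t
elide_page_hole(const char *page, char *scratch,
                uint16_t hole_offset, uint16_t hole_length)
{
    /* bytes before the hole */
    memcpy(scratch, page, hole_offset);
    /* bytes after the hole, shifted down over it */
    memcpy(scratch + hole_offset,
           page + hole_offset + hole_length,
           BLCKSZ - (hole_offset + hole_length));
    return BLCKSZ - hole_length;
}
```

Compressing BLCKSZ - hole_length bytes instead of BLCKSZ is what makes storing the original (uncompressed) length necessary when a hole is present - the point of the extra-bytes discussion above.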
Re: [HACKERS] RangeType internal use
On Mon, Feb 09, 2015 at 12:37:05PM -0500, Tom Lane wrote:
> Robert Haas writes:
> > Yeah, but people expect to be able to partition on ranges that are not
> > all of equal width. I think any proposal that we shouldn't support
> > that is the kiss of death for a feature like this - it will be so
> > restricted as to eliminate 75% of the use cases.
>
> Well, that's debatable IMO (especially your claim that variable-size
> partitions would be needed by a majority of users).

I don't know about user wishes directly, though I do suspect a fixed
partition stride would cover more than 25% of use cases. I do know that
SQL Server, Oracle and MySQL have variable-stride range partitioning,
and none of them have fixed-stride range partitioning. So, like Heikki
and Robert, I would bet on variable-stride range partitioning.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench -f and vacuum
On Mon, Feb 9, 2015 at 2:54 PM, Tatsuo Ishii wrote:
> Agreed. Here is the patch to implement the idea: -f just implies -n.

Some small comments:
- is_no_vacuum, as well as is_init_mode, are defined as integers but
their use implies that they are boolean switches. This patch sets
is_no_vacuum to true, while the rest of the code actually increments its
value when -n is used. Why not simply change both flags to booleans? My
suggestion is not really related to this patch and purely cosmetic, but
we could change things at the same time, or update your patch to
increment is_no_vacuum when -f is used.
- The documentation misses some markups for pgbench and VACUUM and did
not respect the 80-character limit.

Attached is an updated patch with those things changed.
Regards,
--
Michael

From 3553151908a1be40b67703fcc27b696bad546b96 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Tue, 10 Feb 2015 03:59:19 +0900
Subject: [PATCH] Imply -n when using -f in pgbench

At the same time make the flags is_init_mode and is_no_vacuum booleans.
This is a purely cosmetic change but those flags have always been used
as such, even if they were defined as integers.
---
 contrib/pgbench/pgbench.c | 9 +++++----
 doc/src/sgml/pgbench.sgml | 7 +++++++
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index ddd11a0..eed9324 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -2679,8 +2679,8 @@ main(int argc, char **argv)
 	int		c;
 	int		nclients = 1;	/* default number of simulated clients */
 	int		nthreads = 1;	/* default number of threads */
-	int		is_init_mode = 0;	/* initialize mode? */
-	int		is_no_vacuum = 0;	/* no vacuum at all before testing? */
+	bool	is_init_mode = false;	/* initialize mode? */
+	bool	is_no_vacuum = false;	/* no vacuum at all before testing? */
 	int		do_vacuum_accounts = 0;	/* do vacuum accounts before testing? */
 	int		ttype = 0;	/* transaction type.  0: TPC-B, 1: SELECT only,
 				 * 2: skip update of branches and tellers */
@@ -2753,13 +2753,13 @@ main(int argc, char **argv)
 		switch (c)
 		{
 			case 'i':
-				is_init_mode++;
+				is_init_mode = true;
 				break;
 			case 'h':
 				pghost = pg_strdup(optarg);
 				break;
 			case 'n':
-				is_no_vacuum++;
+				is_no_vacuum = true;
 				break;
 			case 'v':
 				do_vacuum_accounts++;
@@ -2872,6 +2872,7 @@ main(int argc, char **argv)
 			case 'f':
 				benchmarking_option_set = true;
 				ttype = 3;
+				is_no_vacuum = true;	/* "-f" implies "-n" (no vacuum) */
 				filename = pg_strdup(optarg);
 				if (process_file(filename) == false || *sql_files[num_files - 1] == NULL)
 					exit(1);
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index 7d203cd..0070882 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -316,6 +316,13 @@ pgbench options dbname
     -N, -S, and -f are mutually exclusive.
+
+    Please note that -f implies -n,
+    which means that VACUUM for the standard
+    pgbench tables will not be performed before
+    running the benchmarking. The VACUUM command should
+    be run beforehand if needed.
+
--
2.3.0
Re: [HACKERS] Odd behavior of updatable security barrier views on foreign tables
On 2015/02/10 7:23, Dean Rasheed wrote:
> On 9 February 2015 at 21:17, Stephen Frost wrote:
>>> On Fri, Jan 30, 2015 at 5:20 AM, Etsuro Fujita wrote:
>>>> I noticed that when updating security barrier views on foreign tables,
>>>> we fail to give FOR UPDATE to selection queries issued at ForeignScan.
>>
>> I've looked into this a fair bit more over the weekend and the issue
>> appears to be that the FDW isn't expecting a do-instead sub-query.
>> I've been considering how we might be able to address that but haven't
>> come up with any particularly great ideas and would welcome any
>> suggestions. Simply having the FDW try to go up through the query
>> would likely end up with too many queries showing up with 'for
>> update'. We add the 'for update' to the sub-query before we even get
>> called from the 'Modify' path too, which means we can't use that to
>> realize when we're getting ready to modify rows and therefore need to
>> lock them.
>>
>> In any case, I'll continue to look but would welcome any other thoughts.
>
> Sorry, I didn't have time to look at this properly. My initial thought
> is that expand_security_qual() needs to request a lock on rows coming
> from the relation it pushes down into a subquery if that relation was
> the result relation, because otherwise it won't have any locks, since
> preprocess_rowmarks() only adds PlanRowMarks to non-target relations.

That seems close to what I had in mind; expand_security_qual() needs to
request a FOR UPDATE lock on rows coming from the relation it pushes
down into a subquery only when that relation is the result relation and
a *foreign table*.

Thanks for discussing this issue!

Best regards,
Etsuro Fujita
Re: [HACKERS] ExplainModifyTarget doesn't work as expected
On 2015/02/07 1:09, Tom Lane wrote:
> IIRC, this code was written at a time when we didn't have NO INHERIT check
> constraints and so it was impossible for the parent table to get optimized
> away while leaving children. So the comment in ExplainModifyTarget was
> good at the time. But it no longer is.
>
> I think your basic idea of preserving the original parent table's relid
> is correct; but instead of doing it like this patch does, I'd be inclined
> to make ModifyTable inherit from Scan not Plan, and use the scan.scanrelid
> field to carry the parent RTI. Then you would probably end up with a net
> savings of code rather than a net addition; certainly ExplainModifyTarget
> would go away entirely since you'd just treat ModifyTable like any other
> Scan in this part of EXPLAIN.

Will follow your revision. Thanks!

Best regards,
Etsuro Fujita
Re: [HACKERS] [pgsql-advocacy] GSoC 2015 - mentors, students and admins.
I am up for mentoring again. On 10 Feb 2015 02:23, "Thom Brown" wrote: > Hi all, > > Google Summer of Code 2015 is approaching. I'm intending on registering > PostgreSQL again this year. > > Before I do that, I'd like to have an idea of how many people are > interested in being either a student or a mentor. > > I've volunteered to be admin again, but if anyone else has a strong > interest of seeing the projects through this year, please let yourself be > known. > > Who would be willing to mentor projects this year? > > Project ideas are welcome, and I've copied many from last year's > submissions into this year's wiki page. Please feel free to add more (or > delete any that stand no chance or are redundant): > https://wiki.postgresql.org/wiki/GSoC_2015 > > Students can find more information about GSoC here: > http://www.postgresql.org/developer/summerofcode > > Thanks > > Thom >
Re: [HACKERS] RangeType internal use
On 10-02-2015 AM 02:37, Tom Lane wrote:
> Robert Haas writes:
>> On Mon, Feb 9, 2015 at 10:36 AM, Tom Lane wrote:
>>> It's going to be complicated and probably buggy, and I think it is heading
>>> in the wrong direction altogether. If you want to partition in some
>>> arbitrary complicated fashion that the system can't reason about very
>>> effectively, we *already have that*. IMO the entire point of building
>>> a new partitioning infrastructure is to build something simple, reliable,
>>> and a whole lot faster than what you can get from inheritance
>>> relationships. And "faster" is going to come mainly from making the
>>> partitioning rules as simple as possible, not as complex as possible.
>
>> Yeah, but people expect to be able to partition on ranges that are not
>> all of equal width. I think any proposal that we shouldn't support
>> that is the kiss of death for a feature like this - it will be so
>> restricted as to eliminate 75% of the use cases.
>
> Well, that's debatable IMO (especially your claim that variable-size
> partitions would be needed by a majority of users). But in any case,
> partitioning behavior that is emergent from a bunch of independent pieces
> of information scattered among N tables seems absolutely untenable from
> where I sit. Whatever we support, the behavior needs to be described by
> *one* chunk of information --- a sorted list of bin bounding values,
> perhaps.

I'm a bit confused here. I got the impression that the partitioning
formula you suggest would consist of two pieces of information - an
origin point and a bin width. Then routing a tuple consists of using
exactly these two values to compute a bin number, and hence a partition,
in O(1) time, assuming we've made all partitions exactly bin-width wide.
You mention here a sorted list of bin bounding values, which we can very
well put together for a partitioned table in its relation descriptor
based on whatever information we store in the catalog.

That is, we can always have *one* chunk of partitioning information as
the *internal* representation, irrespective of how generalized we make
our on-disk representation. We can get O(log N) if not O(1) from that,
I'd hope. In fact, that's what I had in mind about this. Did I read it
wrong?

Thanks,
Amit
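The O(log N) routing over a sorted list of bin bounding values mentioned above is an ordinary upper-bound binary search. A hypothetical standalone sketch - route_to_partition and the bounds array are illustrative names, not anything from a proposed patch:

```c
#include <stdint.h>

/*
 * Given nbounds sorted upper-bound values, return the index of the
 * partition that should hold the key: partition 0 holds keys below
 * bounds[0], partition i holds keys in [bounds[i-1], bounds[i]), and
 * keys >= bounds[nbounds-1] fall into an overflow partition with
 * index nbounds.  Runs in O(log N).
 */
int
route_to_partition(const int64_t *bounds, int nbounds, int64_t key)
{
    int lo = 0,
        hi = nbounds;

    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (key < bounds[mid])
            hi = mid;       /* key is below this bound */
        else
            lo = mid + 1;   /* key is at or above this bound */
    }
    return lo;
}
```

With fixed-stride partitioning the same routing degenerates to the O(1) formula (key - origin) / bin_width, which is the trade-off being discussed.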
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Feb 9, 2015 at 10:27 PM, Syed, Rahila wrote:
> (snip)

Thanks for showing up here! I have not tested the patch; those comments
are based on what I read from v17.

>>> Do we always need extra two bytes for compressed backup block?
>>> ISTM that extra bytes are not necessary when the hole length is zero.
>>> In this case the length of the original backup block (i.e.,
>>> uncompressed) must be BLCKSZ, so we don't need to save the original
>>> size in the extra bytes.
>
>> Yes, we would need an additional bit to identify that. We could steal
>> it from length in XLogRecordBlockImageHeader.
>
> This is implemented in the attached patch by dividing the length field
> as follows:
> uint16 length:15,
>        with_hole:1;

IMO, we should add details about how this new field is used in the
comments on top of XLogRecordBlockImageHeader, meaning that when a page
hole is present we use the compression info structure, and when there is
no hole, we are sure that the FPW raw length is BLCKSZ, meaning that the
two bytes of the CompressionInfo stuff are unnecessary.

>> "2" should be replaced with the macro variable indicating the size of
>> extra header for compressed backup block.
> Macro SizeOfXLogRecordBlockImageCompressionInfo is used instead of 2.
>
>> You can refactor XLogCompressBackupBlock() and move all the
>> above code to it for more simplicity
> This is also implemented in the patch attached.

This portion looks correct to me.

A couple of other comments:
1) Nitpicky, but the code format is sometimes strange. For example, here
you should not have a space between the function definition and the
variable declarations:
+{
+
+    int orig_len = BLCKSZ - hole_length;
This is as well incorrect in two places:
if(hole_length != 0)
There should be a space between the if and its condition in parentheses.
2) For correctness, with_hole should be set even for uncompressed pages.
I think that we should as well use it for sanity checks in xlogreader.c
when decoding records.
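The field split under discussion - 15 bits of length plus a 1-bit hole flag packed into the same uint16 - can be sketched in isolation. This is a hypothetical stand-in for the real XLogRecordBlockImageHeader, only showing the bit-field layout; the struct name is invented, and actual bit-field packing is compiler-dependent (gcc/clang pack this into two bytes):

```c
#include <stdint.h>

/*
 * Hypothetical sketch of the discussed header layout, not the actual
 * PostgreSQL definition.  15 bits carry the (possibly compressed)
 * image length; the 16th bit records whether the page had a hole.
 * When with_hole == 0 the raw length must be BLCKSZ, so the extra
 * CompressionInfo bytes can be omitted from the record.
 */
typedef struct BlockImageHeaderSketch
{
    uint16_t    length:15,      /* image length, up to 32767 bytes */
                with_hole:1;    /* does the page image have a hole? */
} BlockImageHeaderSketch;
```

The point of stealing the bit is exactly this: one flag distinguishes "length is self-describing (BLCKSZ)" from "length needs the extra header", at zero additional header bytes.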
Regards,
--
Michael
Re: [HACKERS] enabling nestedloop and disabling hashjon
Ravi Kiran writes:
> I tried modifying the postgresql.conf file where I set the value
> enable_hashjoin=off and also enable_mergejoin=off, so that I could force
> postgres to use nested loop.
> but postgres is still using the hash join algorithm even after modifying
> the postgresql code.

Did you remember "pg_ctl reload"? enable_hashjoin=off will most
certainly work if it's active.

			regards, tom lane
Re: [HACKERS] sloppy back-patching of column-privilege leak
Stephen Frost writes:
> * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
>> FWIW using different commit messages for different branches is a
>> mistake. The implicit policy we have is that there is one message,
>> identical for all branches, that describes how the patch differs in
>> each branch whenever necessary. Note that the git_changelog output that
>> Robert pasted is not verbatim from the tool; what it actually prints is
>> three separate entries, one for each different message you used, which
>> is not what is supposed to occur.

> Ok, thanks. That's certainly easy enough to do and I'll do so in the
> future. I could have sworn I'd seen cases where further clarification
> was done for branch-specific commits but perhaps something else was
> different there.

Some people do it differently when the per-branch commits are very much
different, but what Alvaro suggests is certainly handy when it comes
time to make release notes ;-).

>> I say this policy is implicit because I don't recall it being spelled
>> out anywhere, but since it's embodied in git_changelog's working
>> principle it's important we stick to it.

> I have to admit that I've never really used git_changelog.

It's pretty handy. The output for a couple of recent commits looks like

Author: Noah Misch
Branch: master [a7a4adcf8] 2015-02-06 23:14:27 -0500

    Assert(PqCommReadingMsg) in pq_peekbyte().

    Interrupting pq_recvbuf() can break protocol sync, so its callers
    all deserve this assertion.  The one pq_peekbyte() caller suffices
    already.

Author: Heikki Linnakangas
Branch: master [ff16b40f8] 2015-02-06 11:26:50 +0200
Branch: REL9_4_STABLE [3bc4c6942] 2015-02-06 11:27:12 +0200
Branch: REL9_3_STABLE [5f0ba4abb] 2015-02-06 11:32:16 +0200
Branch: REL9_2_STABLE [2af568c6b] 2015-02-06 11:32:37 +0200
Branch: REL9_1_STABLE [0d36d9f2b] 2015-02-06 11:32:42 +0200

    Report WAL flush, not insert, position in replication IDENTIFY_SYSTEM

    When beginning streaming replication, the client usually issues the
    IDENTIFY_SYSTEM command, which used to return the current WAL insert
    position. That's not suitable for the intended purpose of that
    field, however. pg_receivexlog uses it to start replication from
    the reported point, but if it hasn't been flushed to disk yet, it
    will fail. Change IDENTIFY_SYSTEM to report the flush position
    instead.

    Backpatch to 9.1 and above. 9.0 doesn't report any WAL position.

Heikki's five commits got merged into one entry because they had
identical log-message texts and were committed on the same day. Further
back in the output you find things like

Author: Peter Eisentraut
Branch: master [1332bbb30] 2015-02-01 22:36:44 -0500
Branch: REL9_4_STABLE Release: REL9_4_1 [0ca8cc581] 2015-02-01 22:40:13 -0500
Branch: REL9_3_STABLE Release: REL9_3_6 [6b9b705c9] 2015-02-01 22:40:25 -0500
Branch: REL9_2_STABLE Release: REL9_2_10 [040f5ef50] 2015-02-01 22:40:36 -0500
Branch: REL9_1_STABLE Release: REL9_1_15 [2b0d33599] 2015-02-01 22:40:45 -0500
Branch: REL9_0_STABLE Release: REL9_0_19 [b09ca8834] 2015-02-01 22:40:53 -0500

    doc: Improve claim about location of pg_service.conf

    The previous wording claimed that the file was always in /etc, but
    of course this varies with the installation layout. Write instead
    that it can be found via `pg_config --sysconfdir`. Even though this
    is still somewhat incorrect because it doesn't account for moved
    installations, it at least conveys that the location depends on the
    installation.

which show the first actual release incorporating each patch.
So even if you're not writing release notes, it's mighty handy for
identifying when/whether a given patch has shipped. I tend to run this
once a week or so and keep the output around in a text file for quick
searching.

			regards, tom lane
Re: [HACKERS] Odd behavior of updatable security barrier views on foreign tables
On 9 February 2015 at 21:17, Stephen Frost wrote:
>>> On Fri, Jan 30, 2015 at 5:20 AM, Etsuro Fujita wrote:
>>>> I noticed that when updating security barrier views on foreign tables,
>>>> we fail to give FOR UPDATE to selection queries issued at ForeignScan.
>
> I've looked into this a fair bit more over the weekend and the issue
> appears to be that the FDW isn't expecting a do-instead sub-query.
> I've been considering how we might be able to address that but haven't
> come up with any particularly great ideas and would welcome any
> suggestions. Simply having the FDW try to go up through the query would
> likely end up with too many queries showing up with 'for update'. We
> add the 'for update' to the sub-query before we even get called from
> the 'Modify' path too, which means we can't use that to realize when
> we're getting ready to modify rows and therefore need to lock them.
>
> In any case, I'll continue to look but would welcome any other thoughts.

Sorry, I didn't have time to look at this properly. My initial thought
is that expand_security_qual() needs to request a lock on rows coming
from the relation it pushes down into a subquery if that relation was
the result relation, because otherwise it won't have any locks, since
preprocess_rowmarks() only adds PlanRowMarks to non-target relations.

Of course that means that it may end up locking more rows than are
actually updated, but that's essentially the same as a SELECT FOR UPDATE
on a s.b. view right now.

Regards,
Dean
Re: [HACKERS] For cursors, there is FETCH and MOVE, why no TELL?
On 09.02.15 at 13:13, Hakan Kocaman wrote:
> Hi,
>
> 2015-02-09 10:37 GMT+01:00 Marc Balmer <m...@msys.ch>:
>
>> (I use cursors to display large datasets in a page-wise way, where the
>> user can move per-page, or, when displaying a single record, per record.
>> When the user goes back from per-record view to page-view, I have to
>> restore the cursor to the position it was on before the user changed to
>> per-record view.)
>
> On a totally unrelated note:
> http://use-the-index-luke.com/de/blog/2013-07/pagination-done-the-postgresql-way

yes, totally unrelated, indeed.
[HACKERS] enabling nestedloop and disabling hashjon
Hi,

I want to disable the hash join algorithm used by postgres by default,
and enable the nested loop join algorithm. Can someone tell me how to do
that?

I tried modifying the postgresql.conf file, where I set the values
enable_hashjoin=off and also enable_mergejoin=off, so that I could force
postgres to use a nested loop, but postgres is still using the hash join
algorithm even after modifying postgresql.conf.

Can someone tell me what I am doing wrong, or is there any other file I
need to modify to enable the nested loop join algorithm?

Thanks
[HACKERS] Corner case for add_path_precheck
While thinking about the add_path_precheck() function, it occurred to me
that it can discard some paths that otherwise would have a chance of
being accepted in this part of add_path():

    /*
     * Same pathkeys and outer rels, and fuzzily
     * the same cost, so keep just one; to decide
     * which, first check rows and then do a fuzzy
     * cost comparison with very small fuzz limit.
     * (We used to do an exact cost comparison,
     * but that results in annoying
     * platform-specific plan variations due to
     * roundoff in the cost estimates.)  If things
     * are still tied, arbitrarily keep only the
     * old path.  Notice that we will keep only
     * the old path even if the less-fuzzy
     * comparison decides the startup and total
     * costs compare differently.
     */
    if (new_path->rows < old_path->rows)
        remove_old = true;      /* new dominates old */
    else if (new_path->rows > old_path->rows)
        accept_new = false;     /* old dominates new */
    else if (compare_path_costs_fuzzily(new_path, ...

The special case is that the path passed to add_path_precheck() has
costs *equal to* those of the old_path. If pathkeys, outer rels and
costs are the same, as summarized in the comment above, I expect
add_path_precheck() to return false. Do I misread anything?

(Maybe the reason this does not happen too often is that
add_path_precheck() compares the costs exactly, as opposed to the "fuzzy
comparison"?)

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
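The corner case hinges on the difference between an exact cost comparison and the fuzzy one the quoted comment describes. The following is a hypothetical sketch in the style of compare_path_costs_fuzzily(), not the actual PostgreSQL source; the 1.0000000001 factor stands in for the "very small fuzz limit" the comment mentions:

```c
#include <stdbool.h>

/* Hypothetical sketch, not actual PostgreSQL source. */

/*
 * cost1 is "clearly larger" than cost2 only when it exceeds it by
 * more than the fuzz factor (> 1.0); costs closer together than the
 * fuzz are treated as equal.
 */
static bool
cost_clearly_larger(double cost1, double cost2, double fuzz_factor)
{
    return cost1 > cost2 * fuzz_factor;
}

/*
 * Returns +1 if total1 is fuzzily larger, -1 if total2 is fuzzily
 * larger, and 0 for a fuzzy tie.  With exactly equal costs this
 * returns 0 - the tie that add_path() resolves by arbitrarily keeping
 * the old path, whereas an exact precheck comparison may already have
 * rejected the new path before this logic is ever reached.
 */
int
compare_costs_fuzzily(double total1, double total2)
{
    const double fuzz = 1.0000000001;   /* very small fuzz limit */

    if (cost_clearly_larger(total1, total2, fuzz))
        return +1;
    if (cost_clearly_larger(total2, total1, fuzz))
        return -1;
    return 0;
}
```

Costs that differ by less than the fuzz land in the tie branch, which is exactly where the rows check and the keep-the-old-path rule from the quoted comment take over.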
Re: [HACKERS] sloppy back-patching of column-privilege leak
Alvaro,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> FWIW using different commit messages for different branches is a
> mistake. The implicit policy we have is that there is one message,
> identical for all branches, that describes how the patch differs in
> each branch whenever necessary. Note that the git_changelog output that
> Robert pasted is not verbatim from the tool; what it actually prints is
> three separate entries, one for each different message you used, which
> is not what is supposed to occur.

Ok, thanks. That's certainly easy enough to do and I'll do so in the
future. I could have sworn I'd seen cases where further clarification
was done for branch-specific commits but perhaps something else was
different there.

> I say this policy is implicit because I don't recall it being spelled
> out anywhere, but since it's embodied in git_changelog's working
> principle it's important we stick to it.

I have to admit that I've never really used git_changelog.

Thanks!

Stephen
Re: [HACKERS] Odd behavior of updatable security barrier views on foreign tables
* Stephen Frost (sfr...@snowman.net) wrote: > * Robert Haas (robertmh...@gmail.com) wrote: > > On Fri, Jan 30, 2015 at 5:20 AM, Etsuro Fujita > > wrote: > > > I noticed that when updating security barrier views on foreign tables, > > > we fail to give FOR UPDATE to selection queries issued at ForeignScan. > > > Here is an example. > [...] > > > postgres=# alter view rw_view set (security_barrier = true); > > > ALTER VIEW > > > postgres=# explain verbose delete from rw_view; > > > QUERY PLAN > > > -- > > > Delete on public.base_ftbl base_ftbl_1 (cost=100.00..144.54 rows=14 > > > width=6) > > >Remote SQL: DELETE FROM public.base_tbl WHERE ctid = $1 > > >-> Subquery Scan on base_ftbl (cost=100.00..144.54 rows=14 width=6) > > > Output: base_ftbl.ctid > > > -> Foreign Scan on public.base_ftbl base_ftbl_2 > > > (cost=100.00..144.40 rows=14 width=6) > > >Output: base_ftbl_2.ctid > > >Remote SQL: SELECT ctid FROM public.base_tbl WHERE > > > ((visibility = 'public'::text)) > > > (7 rows) > > > > > > Correct me if I am wrong. > > > > That looks like a bug to me. > > Agreed. I've been looking at this and I suspect it's related to the > discussion around prepsecurity.c and generating the security barrier > subquery that I've been having with Dean. An initial look, at least, > shows that GetForeignPlan is looking at the subquery instead of the base > relation (as it expects to be). > > I'll continue digging into it. I've looked into this a fair bit more over the weekend and the issue appears to be that the FDW isn't expecting a do-instead sub-query. I've been considering how we might be able to address that but haven't come up with any particularly great ideas and would welcome any suggestions. Simply having the FDW try to go up through the query would likely end up with too many queries showing up with 'for update'. 
We add the 'for update' to the sub-query before we even get called from the 'Modify' path too, which means we can't use that to realize when we're getting ready to modify rows and therefore need to lock them. In any case, I'll continue to look but would welcome any other thoughts. Thanks, Stephen
Re: [HACKERS] sloppy back-patching of column-privilege leak
Stephen Frost wrote: > > Besides the sloppiness of back-patching in > > that one macro and nothing else, this is a huge fraction of the patch > > that's just *gone* in the 9.0 and 9.1 branches, and there's not so > > much as a hint about it in the commit message, which is pretty > > astonishing to me. > > Right, 9.0 and 9.1 don't have as detailed messages and so there wasn't > as much to do in those older branches. I was well aware of that and I > could have sworn that I included something in the commit messages.. > Looks like I did, but not in a way which was as clear as it should have > been. Specifically, in those older branches, the commit message only > talks about the functions which exist in those branches- > BuildIndexValueDescription and ri_ReportViolation. The commit messages > for 9.2 and beyond also reference ExecBuildSlotValueDescription, which > is what you're getting at. > > In hindsight, I agree that doing just that wasn't sufficient and > thinking on it now I realize that, while having different commit > messages for each branch may make sense if you're only looking at that > branch, it ends up being confusing for folks following the overall > project as they likely look at just the subject of each commit and > expect the contents for every branch to be the same. To that point, > I'll try to be clearer when there's a difference in commit message for > each branch, or simply include everything for every branch in an > identical commit message across all of them. FWIW using different commit messages for different branches is a mistake. The implicit policy we have is that there is one message, identical for all branches, that describes how the patch differs in each branch whenever necessary. Note that the git_changelog output that Robert pasted is not verbatim from the tool; what it actually prints is three separate entries, one for each different message you used, which is not what is supposed to occur. 
I say this policy is implicit because I don't recall it being spelled out anywhere, but since it's embodied in git_changelog's working principle it's important we stick to it. -- Álvaro Herrera  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] more RLS oversights
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > I happened to notice this morning while hacking that the > "hasRowSecurity" fields added to PlannerGlobal and PlannedStmt have > not been given proper nodefuncs.c support. Both need to be added to > outfuncs.c, and the latter to copyfuncs.c. The latter omission may > well be a security bug, although I haven't attempted to verify that, > but fortunately this isn't released yet. I saw this and will address it. Would be great if you wouldn't mind CC'ing me directly on anything RLS-related, same as you CC'd me on the column-privilege backpatch. I expect I'll probably notice anyway, but I'll see them faster when I'm CC'd. I agree that it's great that we're catching issues prior to when the feature is released and look forward to anything else you (or anyone else!) finds. Thanks! Stephen
Re: [HACKERS] sloppy back-patching of column-privilege leak
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > In 9.2 and newer branches, this commit makes substantial changes to > execMain.c. In 9.1 and 9.0, though, the change to that file is just > this: > > --- a/src/backend/executor/execMain.c > +++ b/src/backend/executor/execMain.c > @@ -95,6 +95,15 @@ static void intorel_receive(TupleTableSlot *slot, > DestReceiver *self); > static void intorel_shutdown(DestReceiver *self); > static void intorel_destroy(DestReceiver *self); > > +/* > + * Note that this macro also exists in commands/trigger.c. There does not > + * appear to be any good header to put it into, given the structures that > + * it uses, so we let them be duplicated. Be sure to update both if one > needs > + * to be changed, however. > + */ > +#define GetModifiedColumns(relinfo, estate) \ > + (rt_fetch((relinfo)->ri_RangeTableIndex, > (estate)->es_range_table)->modifiedCols) > + > /* end of local decls */ > > We shouldn't be adding a macro to that file that isn't used anywhere. Ah, yes, agreed, that didn't need to be added in the older branches. > Either that file needs more substantial changes, or it doesn't need > this, either. What gives? It doesn't need it. The older branches don't produce as detailed errors and because of that there weren't any further changes to be made. Including the extraneous #define in those older branches was certainly a mistake and I'll correct it. > Besides the sloppiness of back-patching in > that one macro and nothing else, this is a huge fraction of the patch > that's just *gone* in the 9.0 and 9.1 branches, and there's not so > much as a hint about it in the commit message, which is pretty > astonishing to me. Right, 9.0 and 9.1 don't have as detailed messages and so there wasn't as much to do in those older branches. I was well aware of that and I could have sworn that I included something in the commit messages.. Looks like I did, but not in a way which was as clear as it should have been. 
Specifically, in those older branches, the commit message only talks about the functions which exist in those branches- BuildIndexValueDescription and ri_ReportViolation. The commit messages for 9.2 and beyond also reference ExecBuildSlotValueDescription, which is what you're getting at. In hindsight, I agree that doing just that wasn't sufficient and thinking on it now I realize that, while having different commit messages for each branch may make sense if you're only looking at that branch, it ends up being confusing for folks following the overall project as they likely look at just the subject of each commit and expect the contents for every branch to be the same. To that point, I'll try to be clearer when there's a difference in commit message for each branch, or simply include everything for every branch in an identical commit message across all of them. > Notice how the comments match the actual behavior. But in 9.0, you've > got the SAME COMMENTS with DIFFERENT BEHAVIOR: Good point. The comments clearly were for the master-based behavior but the older branches don't have the same detail and so they should have been updated. I knew that they were different, and understood and expected those differences, made the specific changes which were appropriate for each branch, but missed updating the comments. > The comments say that you'll see the relevant columns, but you don't. > Since when is it OK to check things into our regression tests where > the comments say one thing and the behavior is something else? I'm a bit confused by this- I certainly didn't intend this mistake to be implying that our policy on comments and the code they're associated with should be changed to allow that they don't match up, nor do I advocate such a position now. > I don't even know what to say about this. 
I cannot understand how you > can back-patch something like this and fail to notice that the > regression tests give different output on different branches, and that > that output is inconsistent with the comments. I was quite aware that the regression tests were different on different branches, as I knew (which the commit messages show) that only a subset of the patch was relevant for the older branches since they didn't include the detail where the security issue originated from. I certainly reviewed the regression tests but obviously missed that the comments ended up outdated due to the back-branch capabilities. Thanks for the review and comments. I'll remove the extraneous #define, the comment change which was made to the other #define (as it's not relevant since the #define is only located in one place) and fix the regression test comments to match the behavior in the older branches. Thanks again! Stephen
[HACKERS] GSoC 2015 - mentors, students and admins.
Hi all, Google Summer of Code 2015 is approaching. I'm intending on registering PostgreSQL again this year. Before I do that, I'd like to have an idea of how many people are interested in being either a student or a mentor. I've volunteered to be admin again, but if anyone else has a strong interest of seeing the projects through this year, please let yourself be known. Who would be willing to mentor projects this year? Project ideas are welcome, and I've copied many from last year's submissions into this year's wiki page. Please feel free to add more (or delete any that stand no chance or are redundant): https://wiki.postgresql.org/wiki/GSoC_2015 Students can find more information about GSoC here: http://www.postgresql.org/developer/summerofcode Thanks Thom
[HACKERS] sloppy back-patching of column-privilege leak
Branch: master [804b6b6db] 2015-01-28 12:31:30 -0500 Branch: REL9_4_STABLE Release: REL9_4_1 [3cc74a3d6] 2015-01-28 12:32:06 -0500 Branch: REL9_3_STABLE Release: REL9_3_6 [4b9874216] 2015-01-28 12:32:39 -0500 Branch: REL9_2_STABLE Release: REL9_2_10 [d49f84b08] 2015-01-28 12:32:56 -0500 Branch: REL9_1_STABLE Release: REL9_1_15 [9406884af] 2015-01-28 12:33:15 -0500 Branch: REL9_0_STABLE Release: REL9_0_19 [3a2063369] 2015-01-28 12:33:29 -0500 In 9.2 and newer branches, this commit makes substantial changes to execMain.c. In 9.1 and 9.0, though, the change to that file is just this: --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -95,6 +95,15 @@ static void intorel_receive(TupleTableSlot *slot, DestReceiver *self); static void intorel_shutdown(DestReceiver *self); static void intorel_destroy(DestReceiver *self); +/* + * Note that this macro also exists in commands/trigger.c. There does not + * appear to be any good header to put it into, given the structures that + * it uses, so we let them be duplicated. Be sure to update both if one needs + * to be changed, however. + */ +#define GetModifiedColumns(relinfo, estate) \ + (rt_fetch((relinfo)->ri_RangeTableIndex, (estate)->es_range_table)->modifiedCols) + /* end of local decls */ We shouldn't be adding a macro to that file that isn't used anywhere. Either that file needs more substantial changes, or it doesn't need this, either. What gives? Besides the sloppiness of back-patching in that one macro and nothing else, this is a huge fraction of the patch that's just *gone* in the 9.0 and 9.1 branches, and there's not so much as a hint about it in the commit message, which is pretty astonishing to me. What's even more troubling is the new regression tests. 
In master, this commit added this: +INSERT INTO t1 (c1, c2) VALUES (1, 1); -- fail, but row not shown +ERROR: duplicate key value violates unique constraint "t1_pkey" +UPDATE t1 SET c2 = 1; -- fail, but row not shown +ERROR: duplicate key value violates unique constraint "t1_pkey" +INSERT INTO t1 (c1, c2) VALUES (null, null); -- fail, but see columns being inserted +ERROR: null value in column "c1" violates not-null constraint +DETAIL: Failing row contains (c1, c2) = (null, null). +INSERT INTO t1 (c3) VALUES (null); -- fail, but see columns being inserted or have SELECT +ERROR: null value in column "c1" violates not-null constraint +DETAIL: Failing row contains (c1, c3) = (null, null). +INSERT INTO t1 (c1) VALUES (5); -- fail, but see columns being inserted or have SELECT +ERROR: null value in column "c2" violates not-null constraint +DETAIL: Failing row contains (c1) = (5). +UPDATE t1 SET c3 = 10; -- fail, but see columns with SELECT rights, or being modified +ERROR: new row for relation "t1" violates check constraint "t1_c3_check" +DETAIL: Failing row contains (c1, c3) = (1, 10). Notice how the comments match the actual behavior. 
But in 9.0, you've got the SAME COMMENTS with DIFFERENT BEHAVIOR: +INSERT INTO t1 (c1, c2) VALUES (1, 1); -- fail, but row not shown +ERROR: duplicate key value violates unique constraint "t1_pkey" +UPDATE t1 SET c2 = 1; -- fail, but row not shown +ERROR: duplicate key value violates unique constraint "t1_pkey" +INSERT INTO t1 (c1, c2) VALUES (null, null); -- fail, but see columns being inserted +ERROR: null value in column "c1" violates not-null constraint +INSERT INTO t1 (c3) VALUES (null); -- fail, but see columns being inserted or have SELECT +ERROR: null value in column "c1" violates not-null constraint +INSERT INTO t1 (c1) VALUES (5); -- fail, but see columns being inserted or have SELECT +ERROR: null value in column "c2" violates not-null constraint +UPDATE t1 SET c3 = 10; -- fail, but see columns with SELECT rights, or being modified +ERROR: new row for relation "t1" violates check constraint "t1_c3_check" The comments say that you'll see the relevant columns, but you don't. Since when is it OK to check things into our regression tests where the comments say one thing and the behavior is something else? I don't even know what to say about this. I cannot understand how you can back-patch something like this and fail to notice that the regression tests give different output on different branches, and that that output is inconsistent with the comments. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] libpq's multi-threaded SSL callback handling is busted
I did some more digging on bug http://www.postgresql.org/message-id/cahul3dpwyfnugdgo95ohydq4kugdnbkptjq0mnbtubhcmg4...@mail.gmail.com which describes a deadlock when using libpq with SSL in a multi-threaded environment with other threads doing SSL independently. Attached is a reproducing Python script that in my experience is faster at showing the problem. Run it with: python -u pg_lock.py

As Heikki correctly diagnosed, the problem is with libpq unsetting the OpenSSL locking callback while another thread is holding one of the locks. The other thread then never releases the lock and the next time anyone tries to take it, the application deadlocks. The exact way it goes down is like this:

    T1 (libpq)                        T2 (Python)
    start libpq connection
      init ssl system
      add locking callback
                                      start ssl operation
                                        take lock
    finish libpq connection
      destroy ssl system
      remove locking callback (!)
                                      release lock, noop since no callback

And the next time any thread tries to take the lock, it deadlocks. We added unsetting the locking callback in 4e816286533dd34c10b368487d4079595a3e1418 due to this bug report: http://www.postgresql.org/message-id/48620925.6070...@pws.com.au Indeed, commenting out the CRYPTO_set_locking_callback(NULL) call in fe-secure-openssl.c gets rid of the deadlock. However, it makes php segfault with the (attached) reproduction script from the original 2008 bug report. If your php.ini loads both the pgsql and curl extensions, reproduce the segfault with: php -f pg_segfault.php

The most difficult part about fixing this bug is to determine *who's at fault*. I now lean towards the opinion that we shouldn't be messing with OpenSSL callbacks *at all*. First of all, the current behaviour is crazy. We're setting and unsetting the locking callback every time a connection is made/closed, which is not how OpenSSL is supposed to be used. The *application* using libpq should set a callback before it starts threads; it's no business of the library's. 
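The interleaving above can be simulated without OpenSSL at all: an ordinary lock table plus a removable callback reproduces exactly the "release becomes a no-op" failure. This is a hedged sketch — the names mimic the shape of OpenSSL's pre-1.1.0 locking-callback contract but are purely illustrative, and a timeout stands in for the real deadlock:

```python
import threading

# Simulation of the T1/T2 interleaving: a lock table guarded by a
# removable callback, standing in for OpenSSL's CRYPTO lock array.
# All names here are illustrative, not the actual OpenSSL API.

locks = [threading.Lock()]      # the library keeps an array of locks
locking_callback = None         # installed/removed by the "library user"

def set_locking_callback(cb):
    global locking_callback
    locking_callback = cb

def crypto_lock(n, acquire):
    # the library only locks/unlocks through the callback, if one is set
    if locking_callback is not None:
        locking_callback(n, acquire)

def default_callback(n, acquire):
    if acquire:
        locks[n].acquire()
    else:
        locks[n].release()

# T1 (libpq): connection start installs the callback
set_locking_callback(default_callback)
# T2: mid-way through an SSL operation, takes lock 0 via the callback
crypto_lock(0, True)
# T1 (libpq): connection teardown removes the callback ...
set_locking_callback(None)
# T2: ... so this release is a silent no-op and lock 0 stays held
crypto_lock(0, False)

# The next attempt to take lock 0 would block forever; a timeout shows it.
assert locks[0].acquire(timeout=0.1) is False   # lock is stuck held
```

This mirrors the failure mode: nothing crashes at the moment the callback is removed; the damage only surfaces on the *next* lock acquisition, which is what makes the bug so hard to attribute.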
The old behaviour was slightly less insane (set callbacks first time we're engaging OpenSSL code, never touch them again). The real sane solution is to leave it up to the application. I posit we should remove all CRYPTO_set_*_callback functions and associated cruft from libpq. This unfortunately means that multi-threaded applications using libpq and SSL will break if they haven't been setting their own callbacks (if they have, well, tough luck! libpq will just stomp over them the first time it connects to Postgres, but at least *some* callbacks are left present after that). However, AFAICS if your app is not in C, then runtimes already handle that for you (as they should). Python: https://hg.python.org/cpython/file/dc820b44ce21/Modules/_ssl.c#l4284 PHP: https://github.com/php/php-src/blob/master/ext/curl/interface.c#L1235 Note that the PHP pgsql extension doesn't set the OpenSSL callbacks, because libpq was setting them on its own. If we remove the callback handling from libpq, PHP will need to add them. By the way, the MySQL extension for PHP also does not set those callbacks. Let me reiterate: I now believe the callbacks should be set by the application; libraries should not touch them, since they don't know what they will be stomping on. If the application is run through a VM like Python or PHP, it's the VM that should make sure the callbacks are set. I could submit a patch to get rid of the crazy CRYPTO_*_callback dance in libpq, but at the very least this will require a warning in the release notes about how you can't assume that libpq will take care of making sure your app is multi-threaded safe when using OpenSSL. I also don't know how far that's back-patchable. I would very much like to have this change back-patched, since setting and resetting the callback makes using libpq in a threaded OpenSSL-enabled app arguably less safe than if it didn't use any locking. If the app is written correctly, it will have set locking callbacks before starting. 
Then libpq will happily stomp on them. If the app hasn't set callbacks, it wasn't written correctly in the first place and it will get segfaults instead of deadlocks. Thanks, Jan

#!/usr/bin/env python
import sys
import time
import threading
import urllib

import psycopg2

DB_HOST = '127.0.0.1'
DB_USER = 'postgres'
DB_NAME = 'postgres'
HTTPS_URL = 'https://google.com'
NUM_HTTPS_THREADS = 10

def connect():
    conn = psycopg2.connect(
        host=DB_HOST, user=DB_USER, database=DB_NAME, sslmode='require')
    return conn

class Worker(threading.Thread):
    def __init__(self):
        super(Worker, self).__init__()
        self._stop = threading.Event()

    def stop(self):
        self._stop.set()

    def stopped(self):
        return self._stop.isSet()

    def run(self):
        i = 0
        while not self.stopped():
            i += 1
            self.do_work(i)

class Libpq(Worke
Re: [HACKERS] RangeType internal use
On Mon, Feb 9, 2015 at 12:37 PM, Tom Lane wrote: > Robert Haas writes: >> On Mon, Feb 9, 2015 at 10:36 AM, Tom Lane wrote: >>> It's going to be complicated and probably buggy, and I think it is heading >>> in the wrong direction altogether. If you want to partition in some >>> arbitrary complicated fashion that the system can't reason about very >>> effectively, we *already have that*. IMO the entire point of building >>> a new partitioning infrastructure is to build something simple, reliable, >>> and a whole lot faster than what you can get from inheritance >>> relationships. And "faster" is going to come mainly from making the >>> partitioning rules as simple as possible, not as complex as possible. > >> Yeah, but people expect to be able to partition on ranges that are not >> all of equal width. I think any proposal that we shouldn't support >> that is the kiss of death for a feature like this - it will be so >> restricted as to eliminate 75% of the use cases. > > Well, that's debatable IMO (especially your claim that variable-size > partitions would be needed by a majority of users). But in any case, > partitioning behavior that is emergent from a bunch of independent pieces > of information scattered among N tables seems absolutely untenable from > where I sit. Whatever we support, the behavior needs to be described by > *one* chunk of information --- a sorted list of bin bounding values, > perhaps. That's exactly the representation I had in mind. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
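The "sorted list of bin bounding values" representation both Tom and Robert converge on admits a simple log-N routing step. A minimal sketch of that idea (illustrative only — this is not PostgreSQL code, and the boundary convention chosen here, lower bound inclusive, is an assumption):

```python
from bisect import bisect_right

# Hedged sketch: route a row to a partition by binary search over one
# sorted list of bin bounding values, as discussed in the thread above.

bounds = [10, 20, 30]   # partitions: (-inf,10), [10,20), [20,30), [30,+inf)

def route(key):
    """Return the partition index for key in O(log N)."""
    return bisect_right(bounds, key)

assert route(5) == 0     # below the first bound
assert route(10) == 1    # lower bound is inclusive
assert route(25) == 2
assert route(99) == 3    # above the last bound
```

Because the whole partitioning rule is one chunk of information (the sorted list), there is nothing emergent from N per-table CHECK constraints, and overlap/gap-freedom holds by construction.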
Re: [HACKERS] RangeType internal use
Robert Haas writes: > On Mon, Feb 9, 2015 at 10:36 AM, Tom Lane wrote: >> It's going to be complicated and probably buggy, and I think it is heading >> in the wrong direction altogether. If you want to partition in some >> arbitrary complicated fashion that the system can't reason about very >> effectively, we *already have that*. IMO the entire point of building >> a new partitioning infrastructure is to build something simple, reliable, >> and a whole lot faster than what you can get from inheritance >> relationships. And "faster" is going to come mainly from making the >> partitioning rules as simple as possible, not as complex as possible. > Yeah, but people expect to be able to partition on ranges that are not > all of equal width. I think any proposal that we shouldn't support > that is the kiss of death for a feature like this - it will be so > restricted as to eliminate 75% of the use cases. Well, that's debatable IMO (especially your claim that variable-size partitions would be needed by a majority of users). But in any case, partitioning behavior that is emergent from a bunch of independent pieces of information scattered among N tables seems absolutely untenable from where I sit. Whatever we support, the behavior needs to be described by *one* chunk of information --- a sorted list of bin bounding values, perhaps. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RangeType internal use
On Mon, Feb 9, 2015 at 10:36 AM, Tom Lane wrote: > It's going to be complicated and probably buggy, and I think it is heading > in the wrong direction altogether. If you want to partition in some > arbitrary complicated fashion that the system can't reason about very > effectively, we *already have that*. IMO the entire point of building > a new partitioning infrastructure is to build something simple, reliable, > and a whole lot faster than what you can get from inheritance > relationships. And "faster" is going to come mainly from making the > partitioning rules as simple as possible, not as complex as possible. Yeah, but people expect to be able to partition on ranges that are not all of equal width. I think any proposal that we shouldn't support that is the kiss of death for a feature like this - it will be so restricted as to eliminate 75% of the use cases. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On Mon, Feb 9, 2015 at 5:38 AM, Magnus Hagander wrote: > On Mon, Feb 9, 2015 at 11:09 AM, Marco Nenciarini > wrote: >> >> Il 08/02/15 17:04, Magnus Hagander ha scritto: >> > >> > Filenames are now shown for attachments, including a direct link to the >> > attachment itself. I've also run a job to populate all old threads. >> > >> >> I wonder what is the algorithm to detect when an attachment is a patch. >> >> If you look at https://commitfest.postgresql.org/4/94/ all the >> attachments are marked as "Patch: no", but many of them are >> clearly a patch. > > It uses the "magic" module, same as the "file" command. And that one claims: > > mha@mha-laptop:/tmp$ file 0003-File-based-incremental-backup-v9.patch > 0003-File-based-incremental-backup-v9.patch: ASCII English text, with very > long lines > > I think it doesn't consider it a patch because it's not actually a patch - > it looks like a git-format actual email message that *contains* a patch. It > even includes the unix From separator line. So if anything it should have > detected that it's an email message, which it apparently doesn't. > > Picking from the very top patch on the cf, an actual patch looks like this: > > mha@mha-laptop:/tmp$ file psql_fix_uri_service_004.patch > psql_fix_uri_service_004.patch: unified diff output, ASCII text, with very > long lines Can we make it smarter, so that the kinds of things people produce intending for them to be patches are thought by the CF app to be patches? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
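The detection gap Magnus describes — a git-format-patch attachment is really an email message *containing* a unified diff, so libmagic classifies it as plain text — could be closed with a cheap content heuristic layered on top of the magic check. The heuristic below is my own illustration, not the commitfest app's actual code:

```python
# Hedged sketch: a "does this attachment contain a patch?" heuristic that
# also accepts git format-patch emails, which libmagic reports as plain
# text rather than "unified diff output". Illustrative code only.

def looks_like_patch(text):
    """True if the text contains unified-diff file headers anywhere."""
    lines = text.splitlines()
    has_old = any(l.startswith('--- ') for l in lines)
    has_new = any(l.startswith('+++ ') for l in lines)
    return has_old and has_new

unified_diff = "--- a/foo.c\n+++ b/foo.c\n@@ -1 +1 @@\n-old\n+new\n"
format_patch = ("From 0123abc Mon Sep 17 00:00:00 2001\n"
                "Subject: [PATCH] fix\n\n"
                + unified_diff)
assert looks_like_patch(unified_diff)
assert looks_like_patch(format_patch)   # email wrapper no longer hides it
assert not looks_like_patch("just some prose\n")
```

Scanning for diff headers inside the body sidesteps the whole-file classification problem, at the cost of occasional false positives on mails that merely quote a diff.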
Re: [HACKERS] RangeType internal use
Amit Langote writes: > Okay, let me back up a little and think about your suggestion which I do > not seem to understand very well - it raises a few questions for me: > does this mean a partitioning criteria is associated with parent > (partitioned table) rather than each individual partition? Absolutely. Anything else is not scalable; it's just another flavor of the inheritance + CHECK constraint mechanism. The entire point of doing a new partitioning design IMO is to get away from that. It should be possible to determine which partition a row belongs to in O(1) time, not O(N). > I would guess > that bin width is partition interval such that each bin number gives > partition number (of equal-sized consecutively numbered partitions > without gaps). But I don't quite understand what origin point is? Is > that a key literal value from which to begin counting bins and if so, is > it stored in catalog as part of the partitioning rule? Yeah, I would think so. regards, tom lane
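The origin-plus-bin-width scheme Tom confirms here gives O(1) routing by pure arithmetic: two catalog values describe every partition. A minimal sketch, with all names assumed for illustration:

```python
# Hedged sketch: O(1) tuple routing for equal-sized, consecutively
# numbered partitions described by just an origin and a bin width.
# Names and conventions are illustrative, not PostgreSQL code.

def bin_for_key(key, origin, width):
    """Bin number for key; bin n covers [origin + n*width, origin + (n+1)*width)."""
    return (key - origin) // width

# e.g. bins of width 100 counted from origin 0
assert bin_for_key(0, 0, 100) == 0
assert bin_for_key(99, 0, 100) == 0
assert bin_for_key(100, 0, 100) == 1   # lower bound of the next bin
assert bin_for_key(250, 0, 100) == 2
```

This is the O(1) alternative to the log-N binary search over explicit bounds: no per-partition metadata is consulted at all, which is exactly why it only works for equal-width partitions.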
Re: [HACKERS] RangeType internal use
Heikki Linnakangas writes: > On 02/09/2015 03:21 AM, Tom Lane wrote: >> Meh. I don't care for that much --- it sounds a lot like deciding that >> your problem is a nail because there is a hammer within reach. A random >> collection of ranges doesn't seem like a very appropriate representation >> to me; first because there is no simple way to enforce that it partitions >> the key space (no overlaps and no missing portions), and second because it >> provides little purchase for efficient tuple routing algorithms. The best >> you could possibly hope for is some log-N tree search mechanism, and that >> would require a fair amount of setup beforehand. > Building a tree or a sorted array of the min or max bounds of the ranges > doesn't sound hard. log-N sounds fast enough. It's going to be complicated and probably buggy, and I think it is heading in the wrong direction altogether. If you want to partition in some arbitrary complicated fashion that the system can't reason about very effectively, we *already have that*. IMO the entire point of building a new partitioning infrastructure is to build something simple, reliable, and a whole lot faster than what you can get from inheritance relationships. And "faster" is going to come mainly from making the partitioning rules as simple as possible, not as complex as possible. Just to start with: one of the absolutely fundamental things we need out of partitioning is the ability to have uniqueness constraints that span a partitioned table set. That means the partitions have to be disjoint, and it has to be not possible to break that. The design proposed here won't enforce that without complicated (and again, possibly buggy) logic. In short, this design requires a whole lot of extra mechanism to get to places that we have to get to, and I don't believe that that extra complexity is going to buy anything useful at all. 
regards, tom lane
Re: [HACKERS] For cursors, there is FETCH and MOVE, why no TELL?
Hi, 2015-02-09 10:37 GMT+01:00 Marc Balmer : > > (I use cursors to display large datasets in a page-wise way, where the > user can move per-page, or, when displaying a single record, per record. > When the user goes back from per-record view to page-view, I have to > restore the cursor to the position it was on before the user changed to > per-record view.) > > > On a totally unrelated note: http://use-the-index-luke.com/de/blog/2013-07/pagination-done-the-postgresql-way kind regards hakm kocaman
Re: [HACKERS] Parallel Seq Scan
On Mon, Feb 9, 2015 at 2:31 AM, Amit Kapila wrote: > Another idea is to use Executor level interfaces (like ExecutorStart(), > ExecutorRun(), ExecutorEnd()) for execution rather than using Portal > level interfaces. I have used Portal level interfaces with the > thought that we can reuse the existing infrastructure of Portal to > make parallel execution of scrollable cursors, but as per my analysis > it is not so easy to support them especially backward scan, absolute/ > relative fetch, etc, so Executor level interfaces seems more appealing > to me (something like how Explain statement works (ExplainOnePlan)). > Using Executor level interfaces will have advantage that we can reuse them > for other parallel functionalaties. In this approach, we need to take > care of constructing relavant structures (with the information passed by > master backend) required for Executor interfaces, but I think these should > be lesser than what we need in previous approach (extract seqscan specific > stuff from executor). I think using the executor-level interfaces instead of the portal-level interfaces is a good idea. That would possibly let us altogether prohibit access to the portal layer from within a parallel worker, which seems like it might be a good sanity check to add. But that seems to still require us to have a PlannedStmt and a QueryDesc, and I'm not sure whether that's going to be too much of a pain. We might need to think about an alternative API for starting the Executor like ExecutorStartParallel() or ExecutorStartExtended(). But I'm not sure. If you can revise things to go through the executor interfaces I think that would be a good start, and then perhaps after that we can see what else makes sense to do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello, >> Do we always need extra two bytes for compressed backup block? >> ISTM that extra bytes are not necessary when the hole length is zero. >> In this case the length of the original backup block (i.e., >> uncompressed) must be BLCKSZ, so we don't need to save the original >> size in the extra bytes. >Yes, we would need an additional bit to identify that. We could steal it from >length in XLogRecordBlockImageHeader. This is implemented in the attached patch by dividing the length field as follows: uint16 length:15, with_hole:1; >"2" should be replaced with the macro variable indicating the size of >extra header for compressed backup block. The macro SizeOfXLogRecordBlockImageCompressionInfo is used instead of 2. >You can refactor XLogCompressBackupBlock() and move all the >above code to it for more simplicity This is also implemented in the patch attached. Thank you, Rahila Syed -----Original Message----- From: Michael Paquier [mailto:michael.paqu...@gmail.com] Sent: Friday, February 06, 2015 6:00 PM To: Fujii Masao Cc: Syed, Rahila; PostgreSQL mailing lists Subject: Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes On Fri, Feb 6, 2015 at 3:03 PM, Fujii Masao wrote: > Do we always need extra two bytes for compressed backup block? > ISTM that extra bytes are not necessary when the hole length is zero. > In this case the length of the original backup block (i.e., > uncompressed) must be BLCKSZ, so we don't need to save the original > size in the extra bytes. Yes, we would need an additional bit to identify that. We could steal it from length in XLogRecordBlockImageHeader. > Furthermore, when fpw compression is disabled and the hole length is > zero, we seem to be able to save one byte from the header of backup > block. Currently we use 4 bytes for the header, 2 bytes for the length > of backup block, 15 bits for the hole offset and 1 bit for the flag > indicating whether block is compressed or not. 
But in that case, the > length of backup block doesn't need to be stored because it must be > BLCKSZ. Shouldn't we optimize the header in this way? Thought? If we do it, that's something to tackle even before this patch on HEAD, because you could use the 16th bit of the first 2 bytes of XLogRecordBlockImageHeader to do necessary sanity checks, to actually not reduce record by 1 byte, but 2 bytes as hole-related data is not necessary. I imagine that a patch optimizing that wouldn't be that hard to write as well. > +int page_len = BLCKSZ - hole_length; > +char *scratch_buf; > +if (hole_length != 0) > +{ > +scratch_buf = compression_scratch; > +memcpy(scratch_buf, page, hole_offset); > +memcpy(scratch_buf + hole_offset, > + page + (hole_offset + hole_length), > + BLCKSZ - (hole_length + hole_offset)); > +} > +else > +scratch_buf = page; > + > +/* Perform compression of block */ > +if (XLogCompressBackupBlock(scratch_buf, > +page_len, > +regbuf->compressed_page, > +&compress_len)) > +{ > +/* compression is done, add record */ > +is_compressed = true; > +} > > You can refactor XLogCompressBackupBlock() and move all the above code > to it for more simplicity. Sure. -- Michael Attachment: Support-compression-for-full-page-writes-in-WAL_v17.patch
Re: [HACKERS] What exactly is our CRC algorithm?
At 2015-02-09 12:52:41 +0200, hlinnakan...@vmware.com wrote: > > Ok, I've committed a patch that just moves the existing code to > common/pg_crc.c […] Thanks, looks good. > Attached is a rebased version of the slicing-by-8 patch. Looks OK too. > Do you have access to big-endian hardware to test this on? Yes, I tested it on a Linux/ppc system. I wasn't speculating when I said it "does make a noticeable difference", though I'm afraid I did not keep the timings after submitting the revised patch. The speedup was some double-digit percentage, IIRC. -- Abhijit
Re: [HACKERS] The return value of allocate_recordbuf()
On Mon, Feb 9, 2015 at 7:58 PM, Fujii Masao wrote: > MemoryContextAllocExtended() was added, so isn't it time to replace palloc() > with MemoryContextAllocExtended(CurrentMemoryContext, MCXT_ALLOC_NO_OOM) > in allocate_recordbuf()? Yeah, let's move on here, but not with this patch, I'm afraid: it breaks any client utility using xlogreader.c in frontend code, like pg_xlogdump or pg_rewind, because MemoryContextAllocExtended() is available only in the backend, not in the frontend. We are going to need something like palloc_noerror, defined on both the backend (mcxt.c) and frontend (fe_memutils.c) sides. Btw, the MCXT_ALLOC_HUGE flag should be used as well, since palloc only allows allocations up to 1GB, and that is incompatible with ~9.4 where malloc is used. -- Michael
Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
On Sun, Feb 8, 2015 at 2:54 PM, Michael Paquier wrote: > On Fri, Feb 6, 2015 at 4:58 PM, Fujii Masao wrote: >> - * Wait for more WAL to arrive. Time out after 5 >> seconds, >> - * like when polling the archive, to react to a trigger >> - * file promptly. >> + * Wait for more WAL to arrive. Time out after >> the amount of >> + * time specified by wal_retrieve_retry_interval, like >> + * when polling the archive, to react to a >> trigger file promptly. >> */ >> WaitLatch(&XLogCtl->recoveryWakeupLatch, >>WL_LATCH_SET | WL_TIMEOUT, >> - 5000L); >> + wal_retrieve_retry_interval * 1000L); >> >> This change can prevent the startup process from reacting to >> a trigger file. Imagine the case where the large interval is set >> and the user want to promote the standby by using the trigger file >> instead of pg_ctl promote. I think that the sleep time should be 5s >> if the interval is set to more than 5s. Thought? > > I disagree here. It is interesting to accelerate the check of WAL > availability from a source in some cases for replication, but the > opposite is true as well as mentioned by Alexey at the beginning of > the thread to reduce the number of requests when requesting WAL > archives from an external storage type AWS. Hence a correct solution > would be to check periodically for the trigger file with a maximum > one-time wait of 5s to ensure backward-compatible behavior. We could > reduce it to 1s or something like that as well. You seem to have misunderstood the code in question. Or I'm missing something. The timeout of the WaitLatch is just the interval to check for the trigger file while waiting for more WAL to arrive from streaming replication. Not related to the retry time to restore WAL from the archive. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The return value of allocate_recordbuf()
On Mon, Jan 5, 2015 at 1:25 PM, Michael Paquier wrote: > On Thu, Jan 1, 2015 at 1:10 AM, Robert Haas wrote: >> On Mon, Dec 29, 2014 at 6:14 AM, Heikki Linnakangas >> wrote: >>> Hmm. There is no way to check beforehand if a palloc() will fail because of >>> OOM. We could check for MaxAllocSize, though. >> >> I think we need a version of palloc that returns NULL instead of >> throwing an error. The error-throwing behavior is for the best in >> almost every case, but I think the no-error version would find enough >> users to be worthwhile. > Compression is one of those areas, be it compression of WAL or another > type. The new API would allow to fallback to the non-compression code > path if buffer allocation for compression cannot be done because of an > OOM. > > FWIW, I actually looked at how to do that a couple of weeks back, and > you just need a wrapper function, whose content is the existing > AllocSetAlloc, taking an additional boolean flag to trigger an ERROR > or leave with NULL if an OOM appears. On top of that we will need a > new method in MemoryContextMethods, let's call it alloc_safe, for its > equivalent, the new palloc_safe. MemoryContextAllocExtended() was added, so isn't it time to replace palloc() with MemoryContextAllocExtended(CurrentMemoryContext, MCXT_ALLOC_NO_OOM) in allocate_recordbuf()? Regards, -- Fujii Masao *** a/src/backend/access/transam/xlogreader.c --- b/src/backend/access/transam/xlogreader.c *** *** 149,155 allocate_recordbuf(XLogReaderState *state, uint32 reclength) if (state->readRecordBuf) pfree(state->readRecordBuf); ! state->readRecordBuf = (char *) palloc(newSize); state->readRecordBufSize = newSize; return true; } --- 149,163 if (state->readRecordBuf) pfree(state->readRecordBuf); ! state->readRecordBuf = ! (char *) MemoryContextAllocExtended(CurrentMemoryContext, ! newSize, ! MCXT_ALLOC_NO_OOM); ! if (state->readRecordBuf == NULL) ! { ! state->readRecordBufSize = 0; ! return false; ! 
} state->readRecordBufSize = newSize; return true; }
Re: [HACKERS] For cursors, there is FETCH and MOVE, why no TELL?
On 09.02.15 11:47, Marc Balmer wrote: > > > On 09.02.15 10:46, Heikki Linnakangas wrote: >> [...] >> You could fairly easily write an extension to do that, btw. A C function >> could call GetPortalByName() and peek into the PortalData.portalPos field. >> > > Would > > PGresult *PQdescribePortal(PGconn *conn, const char *portalName); > > from libpq also provide this information? Should there be a way to run > PQdescribePortal() from psql (e.g. \dP)? ... and a quickly hacked test program shows that it does *not* return the cursor position.
Re: [HACKERS] What exactly is our CRC algorithm?
On 02/08/2015 08:33 PM, Abhijit Menon-Sen wrote: At 2015-02-08 18:46:30 +0200, hlinnakan...@vmware.com wrote: So I propose to move pg_crc.c to src/common, and move the tables from pg_crc_tables.h directly to pg_crc.c. Thoughts? Sounds fine to me. Ok, I've committed a patch that just moves the existing code to common/pg_crc.c. I also moved pg_crc.h from include/utils to include/common. That'll need any external programs to change their #include accordingly, but I think it was worth that. include/common is clearly the correct home for that file, and the only reason to keep it in include/utils would've been for backwards-compatibility. Attached is a rebased version of the slicing-by-8 patch. I've made some cosmetic changes. Most notably, I turned the bswap32() function into a macro. Better to avoid the overhead of a function call, and it also avoids building the function altogether on little-endian systems that don't need it. I'll continue reviewing this. At 2015-01-02 16:46:29 +0200, hlinnakan...@vmware.com wrote: Would it even make sense to keep the crc variable in different byte order, and only do the byte-swap once in END_CRC32() ? ...this certainly does make a noticeable difference. Will investigate. Do you have access to big-endian hardware to test this on? It seems like an obvious optimization to shave off that one instruction from the hot loop, but if it turns out not to make any measurable difference, I'd prefer to keep the tables in the same order on big and little endian systems, reducing the number of #ifdefs needed. I tested this on my laptop by adding a BSWAP32() into the hot loop - which is bogus on a little endian Intel system - and it seems to make about 5% difference in a quick micro-benchmark. But it would be nice to get some numbers from the kind of big endian systems that people run in the real world. 
- Heikki diff --git a/config/c-compiler.m4 b/config/c-compiler.m4 index 90b56e7..509f961 100644 --- a/config/c-compiler.m4 +++ b/config/c-compiler.m4 @@ -193,6 +193,23 @@ fi])# PGAC_C_TYPES_COMPATIBLE +# PGAC_C_BUILTIN_BSWAP32 +# - +# Check if the C compiler understands __builtin_bswap32(), +# and define HAVE__BUILTIN_BSWAP32 if so. +AC_DEFUN([PGAC_C_BUILTIN_BSWAP32], +[AC_CACHE_CHECK(for __builtin_bswap32, pgac_cv__builtin_bswap32, +[AC_TRY_COMPILE([static unsigned long int x = __builtin_bswap32(0xaabbccdd);], +[], +[pgac_cv__builtin_bswap32=yes], +[pgac_cv__builtin_bswap32=no])]) +if test x"$pgac_cv__builtin_bswap32" = xyes ; then +AC_DEFINE(HAVE__BUILTIN_BSWAP32, 1, + [Define to 1 if your compiler understands __builtin_bswap32.]) +fi])# PGAC_C_BUILTIN_BSWAP32 + + + # PGAC_C_BUILTIN_CONSTANT_P # - # Check if the C compiler understands __builtin_constant_p(), diff --git a/configure b/configure index 8490eb7..fa271fe 100755 --- a/configure +++ b/configure @@ -10333,6 +10333,36 @@ if test x"$pgac_cv__types_compatible" = xyes ; then $as_echo "#define HAVE__BUILTIN_TYPES_COMPATIBLE_P 1" >>confdefs.h fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for __builtin_bswap32" >&5 +$as_echo_n "checking for __builtin_bswap32... " >&6; } +if ${pgac_cv__builtin_bswap32+:} false; then : + $as_echo_n "(cached) " >&6 +else + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. 
*/ +static unsigned long int x = __builtin_bswap32(0xaabbccdd); +int +main () +{ + + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO"; then : + pgac_cv__builtin_bswap32=yes +else + pgac_cv__builtin_bswap32=no +fi +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv__builtin_bswap32" >&5 +$as_echo "$pgac_cv__builtin_bswap32" >&6; } +if test x"$pgac_cv__builtin_bswap32" = xyes ; then + +$as_echo "#define HAVE__BUILTIN_BSWAP32 1" >>confdefs.h + +fi { $as_echo "$as_me:${as_lineno-$LINENO}: checking for __builtin_constant_p" >&5 $as_echo_n "checking for __builtin_constant_p... " >&6; } if ${pgac_cv__builtin_constant_p+:} false; then : diff --git a/configure.in b/configure.in index b4bd09e..e6a49d1 100644 --- a/configure.in +++ b/configure.in @@ -1185,6 +1185,7 @@ PGAC_C_SIGNED PGAC_C_FUNCNAME_SUPPORT PGAC_C_STATIC_ASSERT PGAC_C_TYPES_COMPATIBLE +PGAC_C_BUILTIN_BSWAP32 PGAC_C_BUILTIN_CONSTANT_P PGAC_C_BUILTIN_UNREACHABLE PGAC_C_VA_ARGS diff --git a/src/common/pg_crc.c b/src/common/pg_crc.c index faf5a66..281707f 100644 --- a/src/common/pg_crc.c +++ b/src/common/pg_crc.c @@ -19,76 +19,1156 @@ #include "c.h" +#include "common/pg_crc.h" + +#ifdef WORDS_BIGENDIAN +#define CRC8(x) pg_crc32c_table[0][((crc >> 24) ^ (x)) & 0xFF] ^ (crc << 8) +#else +#define CRC8(x) pg_crc32c_table[0][(crc ^ (x)) & 0xFF] ^ (crc >> 8) +#endif + +/* + * This function computes a CRC using the slicing-by-8 algorithm, which + * uses an 8*256 lookup table to operate on eight bytes in parallel and + * recombine the results. + * + * Michael E. Kounavi
Re: [HACKERS] Parallel Seq Scan
On Sun, Feb 8, 2015 at 11:03 PM, Robert Haas wrote: > > On Sat, Feb 7, 2015 at 10:36 PM, Amit Kapila wrote: > >> Well, I agree with you, but I'm not really sure what that has to do > >> with the issue at hand. I mean, if we were to apply Amit's patch, > >> we'd been in a situation where, for a non-parallel heap scan, heapam.c > >> decides the order in which blocks get scanned, but for a parallel heap > >> scan, nodeParallelSeqscan.c makes that decision. > > > > I think other places also decide about the order/way heapam.c has > > to scan; for example, the order in which rows/pages have to be traversed is > > decided at the portal/executor layer and the same is passed down to heap, and > > in case of index, the scan limits (heap_setscanlimits()) are decided > > outside heapam.c, and something similar is done for parallel seq scan. > > In general, the scan is driven by the scan descriptor, which is constructed > > at an upper level, and there are some APIs exposed to drive the scan. > > If you are not happy with the current way nodeParallelSeqscan has > > set the scan limits, we can have some form of callback which does the > > required work, and this callback can be called from heapam.c. > > I thought about a callback, but what's the benefit of doing that vs. > hard-coding it in heapam.c? Basically I want to address your concern of setting the scan limit via the sequence scan node. One way could be to pass a callback_function and callback_state to heap_beginscan, which will remember that information in HeapScanDesc and then use it in heap_getnext(); callback_state will have info about the next page, which will be updated by callback_function. We can remember the callback_function and callback_state information in estate, which will be set only by a parallel worker, which means it won't affect the non-parallel case. I think this will be helpful in the future as well, where we want a particular scan or sort to use that information to behave as a parallel scan or sort. With Regards, Amit Kapila. 
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] For cursors, there is FETCH and MOVE, why no TELL?
On 09.02.15 10:46, Heikki Linnakangas wrote: > [...] > You could fairly easily write an extension to do that, btw. A C function > could call GetPortalByName() and peek into the PortalData.portalPos field. > Would PGresult *PQdescribePortal(PGconn *conn, const char *portalName); from libpq also provide this information? Should there be a way to run PQdescribePortal() from psql (e.g. \dP)?
Re: [HACKERS] New CF app deployment
On Mon, Feb 9, 2015 at 11:09 AM, Marco Nenciarini <marco.nenciar...@2ndquadrant.it> wrote: > On 08/02/15 17:04, Magnus Hagander wrote: > > > > Filenames are now shown for attachments, including a direct link to the > > attachment itself. I've also run a job to populate all old threads. > > > > I wonder what algorithm is used to detect when an attachment is a patch. > > If you look at https://commitfest.postgresql.org/4/94/ all the > attachments are marked as "Patch: no", but many of them are > clearly a patch. > It uses the "magic" module, same as the "file" command. And that one claims: mha@mha-laptop:/tmp$ file 0003-File-based-incremental-backup-v9.patch 0003-File-based-incremental-backup-v9.patch: ASCII English text, with very long lines I think it doesn't consider it a patch because it's not actually a patch - it looks like an actual git format-patch email message that *contains* a patch. It even includes the unix From separator line. So if anything it should have detected that it's an email message, which it apparently didn't. Picking from the very top patch on the cf, an actual patch looks like this: mha@mha-laptop:/tmp$ file psql_fix_uri_service_004.patch psql_fix_uri_service_004.patch: unified diff output, ASCII text, with very long lines -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] New CF app deployment
On 08/02/15 17:04, Magnus Hagander wrote: > > Filenames are now shown for attachments, including a direct link to the > attachment itself. I've also run a job to populate all old threads. > I wonder what algorithm is used to detect when an attachment is a patch. If you look at https://commitfest.postgresql.org/4/94/ all the attachments are marked as "Patch: no", but many of them are clearly a patch. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
Re: [HACKERS] For cursors, there is FETCH and MOVE, why no TELL?
2015-02-09 10:59 GMT+01:00 Marc Balmer : > > > > 2015-02-09 10:37 GMT+01:00 Marc Balmer <m...@msys.ch>: > > > > Currently there are FETCH and the (non standard) MOVE commands to > work > > on cursors. > > > > (I use cursors to display large datasets in a page-wise way, where > the > > user can move per-page, or, when displaying a single record, per > record. > > When the user goes back from per-record view to page-view, I have to > > restore the cursor to the position it was on before the user changed > to > > per-record view.) > > > > I have to "manually" keep track of the cursor position, but in some > > cases it would definitely be easier to just query the current cursor > > position directly from the database and later use "MOVE ABSOLUTE" to > > rewind it to that position. That could be achieved e.g. by a > > hypothetical "TELL " command. It does, however, not > exist > > and I have not found an alternative. Is there a way to query the > > current cursor position at all? If not, does a TELL command sound > like > > a good or bad idea? > > > > > > It sounds like a good idea. > > > > Do we need a new statement? We could make the MOVE statement return the > > position. It currently returns a delta, but it could return an absolute > > position too. > > On second thought, a new statement is not needed at all. As Heikki > noticed in his reply, it could either be a new function or have MOVE > return the current position somehow(tm). Or a new option to MOVE, maybe > "MOVE NOT" (don't move the cursor but return its position)? > > Returning an absolute position from the FETCH and MOVE statements would probably have minimal overhead, so you could get the current position as a side effect of the last statement - and we already support MOVE RELATIVE 0; Regards Pavel
Re: [HACKERS] For cursors, there is FETCH and MOVE, why no TELL?
> > 2015-02-09 10:37 GMT+01:00 Marc Balmer <m...@msys.ch>: > > Currently there are FETCH and the (non standard) MOVE commands to work > on cursors. > > (I use cursors to display large datasets in a page-wise way, where the > user can move per-page, or, when displaying a single record, per record. > When the user goes back from per-record view to page-view, I have to > restore the cursor to the position it was on before the user changed to > per-record view.) > > I have to "manually" keep track of the cursor position, but in some > cases it would definitely be easier to just query the current cursor > position directly from the database and later use "MOVE ABSOLUTE" to > rewind it to that position. That could be achieved e.g. by a > hypothetical "TELL " command. It does, however, not exist > and I have not found an alternative. Is there a way to query the > current cursor position at all? If not, does a TELL command sound like > a good or bad idea? > > > It sounds like a good idea. > > Do we need a new statement? We could make the MOVE statement return the > position. It currently returns a delta, but it could return an absolute > position too. On second thought, a new statement is not needed at all. As Heikki noticed in his reply, it could either be a new function or have MOVE return the current position somehow(tm). Or a new option to MOVE, maybe "MOVE NOT" (don't move the cursor but return its position)?
Re: [HACKERS] For cursors, there is FETCH and MOVE, why no TELL?
Hi 2015-02-09 10:37 GMT+01:00 Marc Balmer : > Currently there are FETCH and the (non standard) MOVE commands to work > on cursors. > > (I use cursors to display large datasets in a page-wise way, where the > user can move per-page, or, when displaying a single record, per record. > When the user goes back from per-record view to page-view, I have to > restore the cursor to the position it was on before the user changed to > per-record view.) > > I have to "manually" keep track of the cursor position, but in some > cases it would definitely be easier to just query the current cursor > position directly from the database and later use "MOVE ABSOLUTE" to > rewind it to that position. That could be achieved e.g. by a > hypothetical "TELL " command. It does, however, not exist > and I have not found an alternative. Is there a way to query the > current cursor position at all? If not, does a TELL command sound like > a good or bad idea? > It sounds like a good idea. Do we need a new statement? We could make the MOVE statement return the position. It currently returns a delta, but it could return an absolute position too. Regards Pavel
Re: [HACKERS] For cursors, there is FETCH and MOVE, why no TELL?
On 02/09/2015 11:37 AM, Marc Balmer wrote: Currently there are FETCH and the (non standard) MOVE commands to work on cursors. (I use cursors to display large datasets in a page-wise way, where the user can move per-page, or, when displaying a single record, per record. When the user goes back from per-record view to page-view, I have to restore the cursor to the position it was on before the user changed to per-record view.) I have to "manually" keep track of the cursor position, but in some cases it would definitely be easier to just query the current cursor position directly from the database and later use "MOVE ABSOLUTE" to rewind it to that position. That could be achieved e.g. by a hypothetical "TELL " command. It does, however, not exist and I have not found an alternative. Is there a way to query the current cursor position at all? If not, does a TELL command sound like a good or bad idea? It's the first time I hear anyone wanting that, but I guess it might come in handy sometimes. I think you'd usually still rather keep track of that in the client, though, because it's easy to do, and it avoids the extra round-trip to execute the TELL command. Not sure we'd want to add the TELL keyword for this. Perhaps a pg_cursor_pos() function would be better. You could fairly easily write an extension to do that, btw. A C function could call GetPortalByName() and peek into the PortalData.portalPos field. - Heikki
[HACKERS] For cursors, there is FETCH and MOVE, why no TELL?
Currently there are FETCH and the (non standard) MOVE commands to work on cursors. (I use cursors to display large datasets in a page-wise way, where the user can move per-page, or, when displaying a single record, per record. When the user goes back from per-record view to page-view, I have to restore the cursor to the position it was on before the user changed to per-record view.) I have to "manually" keep track of the cursor position, but in some cases it would definitely be easier to just query the current cursor position directly from the database and later use "MOVE ABSOLUTE" to rewind it to that position. That could be achieved e.g. by a hypothetical "TELL " command. It does, however, not exist and I have not found an alternative. Is there a way to query the current cursor position at all? If not, does a TELL command sound like a good or bad idea?
Re: [HACKERS] 9.6 Feature help requested: Inclusion Constraints
On Sat, 2015-02-07 at 16:08 -0800, Jeff Davis wrote: > I believe Inclusion Constraints will be important for postgres. I forgot to credit Darren Duncan with the name of this feature: http://www.postgresql.org/message-id/4f8bb9b0.5090...@darrenduncan.net Regards, Jeff Davis