Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-09-28 Thread Kouhei Kaigai
> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Etsuro Fujita
> Sent: Tuesday, September 29, 2015 12:15 PM
> To: Kaigai Kouhei(海外 浩平); Robert Haas
> Cc: PostgreSQL-development; 花田茂
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
> 
> On 2015/09/29 9:13, Kouhei Kaigai wrote:
> >> -Original Message-
> >> From: pgsql-hackers-ow...@postgresql.org
> >> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> >> Sent: Tuesday, September 29, 2015 5:46 AM
> >> To: Kaigai Kouhei(海外 浩平)
> >> Cc: Etsuro Fujita; PostgreSQL-development; 花田茂
> >> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
> >>
> >> On Mon, Sep 28, 2015 at 3:34 AM, Kouhei Kaigai  
> >> wrote:
> >>> The attached patch allows the FDW driver to handle the EPQ recheck in
> >>> its own preferred way, even if it is an alternative local join or
> >>> ExecQual on the expression being pushed down.
> 
> Thanks for the work, KaiGai-san!
> 
> >> Thanks.  I was all set to commit this, or at least part of it, when I
> >> noticed that we already have an FDW callback called RefetchForeignRow.
> >> We seem to be intending that this new callback should refetch the row
> >> from the foreign server and verify that any pushed-down quals apply to
> >> it.  But why can't RefetchForeignRow do that?  That seems to be what
> >> it's for.
> 
> Thanks for the comments, Robert!
> 
> I thought the same thing [1].  While I thought it was relatively easy to
> make changes to RefetchForeignRow that way for the foreign table case
> (scanrelid>0), I was not sure how hard it would be to do so for the
> foreign join case (scanrelid==0).  So, I proposed to leave that change
> for 9.6.  I'll rethink this issue along the lines of that
> approach.
>
Even in the base relation case, is it really easy to do?

RefetchForeignRow() does not take a ForeignScanState as its argument,
so it is not obvious how to access the scan's private field, is it?
ExecRowMark contains an "rti" field, so it might be feasible to find
the target PlanState using the recently added walker routine, although
that is not simple enough.
Unless we have a reference to the private field, we cannot access the
expression that was pushed down to the remote side, and therefore we
cannot apply the proper rechecks here.

In addition, the scanrelid==0 case is problematic because we have no
relevant ForeignScanState representing the base relation, even though
the ExecRowMark is associated with a particular base relation.
When scanrelid==0, the EPQ recheck routine also has to ensure that the
EPQ tuple is visible through the join condition, in addition to the
qualifier of the base relation.  This information is also stored in the
private data field, so the recheck has to have a reference to the
private data of the ForeignScanState of the remote join (scanrelid==0)
which contains the target relation.

Could you explain (1) how to access the private data field of the
ForeignScanState from the RefetchForeignRow callback, and (2) why that
is more reasonable than implementing the callback on ForeignRecheck()?
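(For illustration, a minimal sketch of the two interfaces being compared.
The first follows the RefetchForeignRow declaration in 9.5's fdwapi.h;
the second follows the recheck callback proposed in the patch, so its
exact name and shape are assumptions here:)

    /* Row-mark side: no ForeignScanState argument, hence no access to
     * the node's private fdw_state where pushed-down expressions live. */
    typedef HeapTuple (*RefetchForeignRow_function) (EState *estate,
                                                     ExecRowMark *erm,
                                                     Datum rowid,
                                                     bool *updated);

    /* Recheck side: the node argument exposes node->fdw_state. */
    typedef bool (*RecheckForeignScan_function) (ForeignScanState *node,
                                                 TupleTableSlot *slot);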

> Sorry for the late response.  I was on vacation.
>
Me too. :-)

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread Tom Lane
Stephen Frost  writes:
> * Jim Nasby (jim.na...@bluetreble.com) wrote:
>> 2 years ago is when they first released the enterprise edition,
>> which according to [1] had "The most important new feature is that
>> you can now add members to groups of projects."

> It needed a lot more than a single feature.

Just going to their primary web page, and noting that the first line gives
links to "Features" and "Pricing" (but not "Documentation"), and the
second line makes it clear that there's a big difference between the
"Community Edition" and the "Enterprise Edition", is already enough to
turn me off.  We've seen that model before (mysql...) and it doesn't bode
well in the long run.

Further poking shows no evidence of any decent email integration, to name
just one of the things that have been mentioned as requirements in this
thread.  On the other hand, they are big on LDAP logins, and even
two-factor authentication.  (We need this for an issue tracker that's
supposed to provide visibility and easier reporting to people outside the
project?)  And JIRA integration, which seems to be an anti-feature to some
on this thread.  And they'd sure love to be in charge of our code repo.
And the main, possibly only, metadata they can track about an issue is
"assignee" and "milestone".

I can't imagine that this is what we want.

regards, tom lane




Re: [HACKERS] Comment update to pathnode.c

2015-09-28 Thread Etsuro Fujita

On 2015/09/12 4:26, Robert Haas wrote:

On Fri, Sep 11, 2015 at 6:22 AM, Etsuro Fujita
 wrote:

The comment for create_foreignscan_path says the following, but it's
now possible that the function is called by GetForeignJoinPaths, which
was added in 9.5.

1450 /*
1451  * create_foreignscan_path
1452  *Creates a path corresponding to a scan of a foreign table,
1453  *returning the pathnode.
1454  *
1455  * This function is never called from core Postgres; rather, it's expected
1456  * to be called by the GetForeignPaths function of a foreign data wrapper.
1457  * We make the FDW supply all fields of the path, since we do not have any
1458  * way to calculate them in core.
1459  */

So, I've updated the comments.  Please find attached a patch.


I would write "to be called by the GetForeignPaths or
GetForeignJoinPaths function" to keep it a bit more concise.  And I
would reflow the paragraph.


Thanks for the comments!  Attached is an updated version of the patch.

Best regards,
Etsuro Fujita
*** a/src/backend/optimizer/util/pathnode.c
--- b/src/backend/optimizer/util/pathnode.c
*************** create_worktablescan_path(PlannerInfo *root, RelOptInfo *rel,
*** 1449,1461 ****
  
  /*
   * create_foreignscan_path
!  *	  Creates a path corresponding to a scan of a foreign table,
!  *	  returning the pathnode.
   *
   * This function is never called from core Postgres; rather, it's expected
!  * to be called by the GetForeignPaths function of a foreign data wrapper.
!  * We make the FDW supply all fields of the path, since we do not have any
!  * way to calculate them in core.
   */
  ForeignPath *
  create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,
--- 1449,1461 ----
  
  /*
   * create_foreignscan_path
!  *	  Creates a path corresponding to a scan of a foreign table or
!  *	  a foreign join, returning the pathnode.
   *
   * This function is never called from core Postgres; rather, it's expected
!  * to be called by the GetForeignPaths or GetForeignJoinPaths function of
!  * a foreign data wrapper.  We make the FDW supply all fields of the path,
!  * since we do not have any way to calculate them in core.
   */
  ForeignPath *
  create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel,



Re: [HACKERS] Doubt in pgbench TPS number

2015-09-28 Thread Fabien COELHO



(Note: I've not read the patch, so this is not an opinion about its
correctness.)


As Fabien analyzed, the problem was that pgbench simply uses the wrong
variable: nthreads (number of threads, specified by the "-j" option)
vs. nclients (number of clients, specified by the "-c" option).

@@ -2616,7 +2616,7 @@ printResults(int ttype, int64 normal_xacts, int nclients,
time_include = INSTR_TIME_GET_DOUBLE(total_time);
tps_include = normal_xacts / time_include;
tps_exclude = normal_xacts / (time_include -
-		(INSTR_TIME_GET_DOUBLE(conn_total_time) / nthreads));
+		(INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));

Here conn_total_time is the sum of the time spent establishing
connections to PostgreSQL.  Since establishing connections to
PostgreSQL is done serially rather than in parallel, conn_total_time
should have been divided by nclients.


After some more thinking and another look at the connection code, I
have slightly revised my diagnosis:


 - the amount of parallelism is "nclients", as discussed above, when 
reconnecting on each transaction (-C) because the connections are managed 
in parallel from doCustom,


* BUT *

 - if there are no reconnections (no -C), the connections are performed in 
threadRun sequentially: all clients wait for the connections of the other 
clients in the same thread before starting to process transactions, so 
"nthreads" is the right amount of parallelism in this case.


So on second thought the formula should rather be:

  ...  / (is_connect? nclients: nthreads)

Here is a v2 with this formula.  Note that it does not take thread 
creation time into account, which might be significant; the start and end 
of a pgbench run are quite fuzzy.
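(A worked example with assumed numbers: 8 clients in 2 threads,
time_include = 10s, conn_total_time = 4s.  Without -C the connections are
serialized per thread, so the subtracted connection time is
4/nthreads = 2s and tps_exclude = normal_xacts / 8s; with -C the
connections proceed in parallel across clients, so the subtraction is
4/nclients = 0.5s and tps_exclude = normal_xacts / 9.5s.  Using nthreads
in the -C case would subtract too much and report an optimistic figure.)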


I've removed the doCustom parameter change as it is included in a larger 
patch I submitted about reworking stat collection in pgbench, so the 
attached patch is a bug fix only.


For the record, I still plainly disagree with the idea of shipping a 
performance measuring tool which knowingly displays wrong and optimistic 
figures under some conditions.


--
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 6ae1b86..a6763f6 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -2616,7 +2616,8 @@ printResults(int ttype, int64 normal_xacts, int nclients,
 	time_include = INSTR_TIME_GET_DOUBLE(total_time);
 	tps_include = normal_xacts / time_include;
 	tps_exclude = normal_xacts / (time_include -
-		(INSTR_TIME_GET_DOUBLE(conn_total_time) / nthreads));
+		(INSTR_TIME_GET_DOUBLE(conn_total_time) /
+		 (is_connect? nclients: nthreads)));
 
 	if (ttype == 0)
 		s = "TPC-B (sort of)";
@@ -3462,7 +3463,7 @@ main(int argc, char **argv)
 
 		/* thread level stats */
 		throttle_lag += thread->throttle_lag;
-		throttle_latency_skipped += threads->throttle_latency_skipped;
+		throttle_latency_skipped += thread->throttle_latency_skipped;
 		latency_late += thread->latency_late;
 		if (throttle_lag_max > thread->throttle_lag_max)
 			throttle_lag_max = thread->throttle_lag_max;



Re: [HACKERS] unclear about row-level security USING vs. CHECK

2015-09-28 Thread Charles Clavadetscher
Good morning

> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org 
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Stephen Frost
> Sent: Montag, 28. September 2015 21:16
> To: Peter Eisentraut ; Robert Haas 
> Cc: pgsql-hackers ; Charles Clavadetscher 
> 
> Subject: Re: [HACKERS] unclear about row-level security USING vs. CHECK
> 
> * Peter Eisentraut (pete...@gmx.net) wrote:
> > I see.  But it is a bit odd to hide this very fundamental behavior
> > somewhere in a paragraph that starts out with something about roles.
> 
> I'm happy to change that.  You're right, it should be a paragraph by
> itself.
> 
> > There is also a mistake, I believe: DELETE policies also take both a
> > CHECK and a USING clause.
> 
> DELETE never adds records and therefore does not take a CHECK clause,
> only a USING clause:
> 
> =*# create policy p1 on t1 for delete using (c1 > 5) with check (c1 > 10);
> ERROR:  WITH CHECK cannot be applied to SELECT or DELETE
> 
> There has been some discussion about changing that, but that would be a
> future change and not for 9.5.
> 
> > I still find something about this weird, but I'm not sure what.  It's
> > not clear to me at what level this USING->CHECK mapping is applied.  I
> > can write FOR ALL USING and it will be mapped to CHECK for all actions,
> > including INSERT, but when I write FOR INSERT USING it complains.  Why
> >> doesn't it do the mapping in that case, too?
> 
> INSERT is only adding records and therefore only the CHECK policy
> applies:
> 
> =*# create policy p1 on t1 for insert using (c1 > 5) with check (c1 > 10);
> ERROR:  only WITH CHECK expression allowed for INSERT
> 
> The USING clause is for existing records while the CHECK option is for
> new records, which is why DELETE only has a USING clause and INSERT only
> has a WITH CHECK clause.  ALL allows you to specify clauses for all
> commands, which is why it accepts both.  The only other case which
> allows both is UPDATE, where records are both retrieved and added.
> 
> > >> (Btw., what's the meaning of a policy for DELETE?)
> > >
> > > The DELETE policy controls what records a user is able to delete.
> >
> > That needs to be documented somewhere.
> 
> This is included in the CREATE POLICY documentation:
> 
> DELETE
> 
> Using DELETE for a policy means that it will apply to DELETE
>   commands. Only rows which pass this policy will be seen by a DELETE
>   command. Rows may be visible through a SELECT which are not seen by
>   a DELETE, as they do not pass the USING expression for the DELETE,
>   and rows which are not visible through the SELECT policy may be
>   deleted if they pass the DELETE USING policy. The DELETE policy only
>   accepts the USING expression as it only ever applies in cases where
>   records are being extracted from the relation for deletion.
> 
> I'm certainly all for improving the documentation, of course.  What
> about the above isn't clear regarding what DELETE policies do?  Or is
> the issue that it wasn't covered in ddl-rowsecurity?  Perhaps we should
> simply move much of the CREATE POLICY documentation into ddl-rowsecurity
> instead, since that's where people seem to be looking for this
> information?

I think that many people will look first into ddl-rowsecurity to get an 
understanding of what it can do and how it can be used.  Detailed 
information is then in the CREATE POLICY doc.  So it could make sense to 
move the parts that contribute to understanding the mechanics as a whole 
from the CREATE POLICY doc to ddl-rowsecurity.  As an alternative, when it 
comes to the characteristics of a specific command, a link to the place in 
the CREATE POLICY doc may be enough.  Just no duplicated information; that 
would be difficult to keep in sync.

> * Robert Haas (robertmh...@gmail.com) wrote:
> > On Sat, Sep 26, 2015 at 9:46 PM, Peter Eisentraut  wrote:
> > > On 9/23/15 3:41 PM, Stephen Frost wrote:
> > > I see.  But it is a bit odd to hide this very fundamental behavior
> > > somewhere in a paragraph that starts out with something about roles.
> > >
> > > There is also a mistake, I believe: DELETE policies also take both a
> > > CHECK and a USING clause.
> > >
> > > I still find something about this weird, but I'm not sure what.  It's
> > > not clear to me at what level this USING->CHECK mapping is applied.  I
> > > can write FOR ALL USING and it will be mapped to CHECK for all actions,
> > > including INSERT, but when I write FOR INSERT USING it complains.  Why
> > doesn't it do the mapping in that case, too?
> >
> > We are really pushing our luck only hammering this stuff out now.  But
> > I think I agree with Peter's concerns, FWIW.
> 
> I listed out the various alternatives but didn't end up getting any
> responses to it.  I'm still of the opinion that the documentation is the
main thing which needs improving here, but we can also change CREATE
POLICY, et al, to require an explicit WITH CHECK clause for the commands
where that makes sense if that's the consensus.

Re: [HACKERS] Parallel Seq Scan

2015-09-28 Thread Amit Kapila
On Sun, Sep 27, 2015 at 1:39 AM, Robert Haas  wrote:
>
> I intend to commit this patch (but not the crappy test code, of
> course) pretty soon, and then I'm going to start working on the
> portion of the patch that actually adds the Funnel node, which I think
> you are working on renaming to Gather.

The attached patch is rebased on the latest commit (d1b7c1ff) for the
Gather node.

- I had to reorganize the defines in execParallel.h and .c.  To keep
ParallelExecutorInfo in the GatherState node, we would need to include
execParallel.h in execnodes.h, which was creating compilation issues as
execParallel.h also includes execnodes.h; so for now I have defined
ParallelExecutorInfo in execnodes.h and the instrumentation-related
structures in instrument.h.
- Renamed parallel_seqscan_degree to degree_of_parallelism
- Renamed Funnel to Gather
- Removed the PARAM_EXEC parameter handling code; I think we can do this
separately.

I have to work more on the partial seq scan patch, rebasing it and
handling the review comments for it, so for now I am sending the first
part of the patch (which includes the Gather node functionality and some
general support for parallel-query execution).


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


parallel_seqscan_gather_v19.patch
Description: Binary data



[HACKERS] ToDo: possibility to set sqlcode, detail, hint in elog like functions in plpython and plperl

2015-09-28 Thread Pavel Stehule
Hi

Currently, it isn't possible to raise a rich exception from PL/Python or
PL/Perl.  This should be fixed.
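
A hypothetical sketch of what such an interface could look like from
PL/Python (the keyword arguments are illustrative only - no such API
exists yet, which is the point of this item):

    plpy.error("no such user",
               detail="user id 42 was not found",
               hint="check the contents of the users table",
               sqlcode="P0002")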

Regards

Pavel


Re: [HACKERS] row_security GUC, BYPASSRLS

2015-09-28 Thread Stephen Frost
* Noah Misch (n...@leadboat.com) wrote:
> On Thu, Sep 10, 2015 at 03:23:13PM -0400, Stephen Frost wrote:
> > First off, thanks again for your review and comments on RLS.  Hopefully
> > this addresses the last remaining open item from that review.
> 
> Item (3a), too, remains open.

Provided I didn't break the buildfarm (keeping an eye on it), this has
been done.

Will update the open items wiki once it looks like the buildfarm is
happy.

Thanks!

Stephen




Re: [HACKERS] 9.5: Can't connect with PGSSLMODE=require on Windows

2015-09-28 Thread Asif Naeem
I have spent some time investigating the issue; it is reproducible.  On
Windows, when pqsecure_raw_read() fails with the error code
WSAEWOULDBLOCK (EWOULDBLOCK) because no data is queued to be read from
the non-blocking socket, the retry flag needs to be set.  The related
error code can be retrieved via the Windows WSAGetLastError() function
instead of errno; the SOCK_ERRNO preprocessor macro handles this
gracefully.  PFA patch, it resolves the issue i.e.

C:\PG\postgresql\pg_with_openssl_inst_v1_patch>bin\psql.exe -d postgres -h 172.16.141.210
> psql (9.5alpha2)
> WARNING: Console code page (437) differs from Windows code page (1252)
>  8-bit characters might not work correctly. See psql reference
>  page "Notes for Windows users" for details.
> SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384,
> bits: 256, compression: off)
> Type "help" for help.
> postgres=# select version();
>                              version
> -----------------------------------------------------------------
>  PostgreSQL 9.5alpha2, compiled by Visual C++ build 1800, 64-bit
> (1 row)


Regards,
Muhammad Asif Naeem


On Thu, Sep 24, 2015 at 5:12 PM, Thom Brown  wrote:

> On 23 September 2015 at 13:10, Michael Paquier
>  wrote:
> >
> >
> > On Wed, Sep 23, 2015 at 2:15 AM, Robert Haas 
> wrote:
> >>
> >> On Tue, Sep 22, 2015 at 11:23 AM, Andrew Dunstan 
> >> wrote:
> >> > "git bisect" is your friend.
> >>
> >> Yeah, but finding someone who has a working Windows build environment
> >> and a lot of time to run this down is my enemy.  We're trying, but if
> >> anyone else has a clue, that would be much appreciated.
> >
> >
> > That's not cool. I have added this problem in the list of open items for
> > 9.5.
>
> It appears that this might be related to the version of OpenSSL that's
> been packaged with PostgreSQL 9.5 alpha 2.  When swapping this out for
> the version that's shipped with 9.4, it works.  I don't have the
> specific OpenSSL versions to hand, but I'll report back anything as I
> learn more.
>
> --
> Thom
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


win_ssl_issue_v1.patch
Description: Binary data



Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-28 Thread Shulgin, Oleksandr
On Mon, Sep 28, 2015 at 7:09 PM, Pavel Stehule 
wrote:

>
> 2015-09-28 12:37 GMT+02:00 Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de>:
>
>>
>>> I didn't propose a very different solution. There is only one difference -
>>> sharing a DSM for smaller data. It is similar to using the usual shared memory.
>>>
>>
>> Does this mean implementing some sort of allocator on top of the shared
>> memory segment?  If so, how are you going to prevent fragmentation?
>>
>
> Yes, a simple memory allocator is necessary in this case. But it can be
> really simple - you can allocate only fixed-size blocks - 10KB, 100KB and
> 1MB - from separate buffers, so fragmentation is not possible.
>

Maybe we're talking about completely different ideas, but I don't see how
fixing the block size helps to fight fragmentation, in particular.  Can you
sketch a simple example?  E.g. 400 backends share the common segment and
all of them want to publish a plan of ~1kb, for example.  Now what?

--
Alex


Re: [HACKERS] optimizing vacuum truncation scans

2015-09-28 Thread Robert Haas
On Mon, Sep 28, 2015 at 2:05 AM, Haribabu Kommi
 wrote:
> I got the following way to get the whether data file is present in the
> DISK or SSD.
>
> 1. Get the device file system that table data file is mapped using the
> following or similar.
>
> df -P "filename" | awk 'NR==2{print $1}'
>
> 2. if the device file system is of type /dev/sd* then treat is as a
> disk system and proceed
> with the prefetch optimization.
>
> 3. if we are not able to find the device details directly then we need
> to get the information
> from the mapping system.
>
> Usually the devices will map like the following
>
> /dev/mapper/v** points to ../dm-*
>
> 4. Get the name of the "dm-*"  from the above details and check
> whether it is a SSD or not
> with the following command.
>
> /sys/block/dm-*/queue/rotation
>
> 5. If the value is 0 then it is an SSD drive, 1 means disk drive.
>
> The described above procedure works only for linux. I didn't check for
> other operating systems yet.
> Is it worth to consider?
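
(A rough shell sketch of the quoted probing steps, for concreteness --
Linux only, the file path is hypothetical, and NVMe-style device names
are not handled:)

    datafile=/var/lib/postgresql/data/base/16384/16385
    dev=$(df -P "$datafile" | awk 'NR==2{print $1}')
    name=$(basename "$(readlink -f "$dev")")   # e.g. sda1 or dm-0
    case "$name" in
        dm-*) disk=$name ;;                          # device-mapper target
        *)    disk=$(echo "$name" | sed 's/[0-9]*$//') ;;  # strip partition number
    esac
    if [ "$(cat /sys/block/$disk/queue/rotational)" = "0" ]; then
        echo "SSD"
    else
        echo "rotating disk"
    fi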

No.  If we need to have the behavior depend on the hardware, it should
be a GUC or tablespace option or reloption, not some kind of crazy
OS-dependent discovery.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] track_commit_timestamp and COMMIT PREPARED

2015-09-28 Thread Petr Jelinek

On 2015-09-28 18:59, Robert Haas wrote:


>> The patch looks good to me except the following minor points.
>>
>>   * or not.  Normal path through RecordTransactionCommit() will be related
>>   * to a transaction commit XLog record, and so should pass "false" here.
>>
>> The above source comment of TransactionTreeSetCommitTsData() seems to
>> need to be updated.
>>
>> + * Note: if you change this functions you should also look at
>> + * RecordTransactionCommitPrepared in twophase.c.
>>
>> Typo: "this functions" should be "this function"
>>
>> +if (replorigin_sesssion_origin == InvalidRepOriginId ||
>>
>> This is not the problem of the patch, but I started wondering what
>> "sesssion" in the variable name "replorigin_sesssion_origin" means.
>> Is it just a typo and it should be "session"? Or it's the abbreviation
>> of something?
>
> This should go in before beta.  Is someone updating the patch?



Sorry, missed your reply.

The "sesssion" is typo and it actually affects several variables around 
the replication origin, so I attached separate patch (which should be 
applied first) which fixes the typo everywhere.


I reworded the comment, hopefully it's better this way.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From e8c68da28dd3f6b785aa23d4cf3c2973d6bca6c5 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Mon, 28 Sep 2015 19:43:19 +0200
Subject: [PATCH 2/2] commit_ts-2pcfix

---
 src/backend/access/transam/commit_ts.c |  9 +
 src/backend/access/transam/twophase.c  | 26 +-
 src/backend/access/transam/xact.c  |  3 +++
 3 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 33136e3..cddde30 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -123,10 +123,11 @@ static void WriteSetTimestampXlogRec(TransactionId mainxid, int nsubxids,
  * decision of storing timestamp info for each subxid.
  *
  * The do_xlog parameter tells us whether to include an XLog record of this
- * or not.  Normal path through RecordTransactionCommit() will be related
- * to a transaction commit XLog record, and so should pass "false" here.
- * Other callers probably want to pass true, so that the given values persist
- * in case of crashes.
+ * or not.  Normally, this is called from transaction (both normal and
+ * prepared) commit code and the information will be stored in the transaction
+ * commit XLog record, and so it should pass "false" here. The XLog redo code
+ * should use "false" here as well. Other callers probably want to pass true,
+ * so that the given values persist in case of crashes.
  */
 void
 TransactionTreeSetCommitTsData(TransactionId xid, int nsubxids,
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index d48d101..6e7ede9 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 
+#include "access/commit_ts.h"
 #include "access/htup_details.h"
 #include "access/subtrans.h"
 #include "access/transam.h"
@@ -56,6 +57,7 @@
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "pgstat.h"
+#include "replication/origin.h"
 #include "replication/walsender.h"
 #include "replication/syncrep.h"
 #include "storage/fd.h"
@@ -2087,6 +2089,7 @@ RecordTransactionCommitPrepared(TransactionId xid,
 bool initfileinval)
 {
 	XLogRecPtr	recptr;
+	TimestampTz	committs = GetCurrentTimestamp();
 
 	START_CRIT_SECTION();
 
@@ -2094,13 +2097,34 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	MyPgXact->delayChkpt = true;
 
 	/* Emit the XLOG commit record */
-	recptr = XactLogCommitRecord(GetCurrentTimestamp(),
+	recptr = XactLogCommitRecord(committs,
  nchildren, children, nrels, rels,
  ninvalmsgs, invalmsgs,
  initfileinval, false,
  xid);
 
 	/*
+	 * Record plain commit ts if not replaying remote actions, or if no
+	 * timestamp is configured.
+	 */
+	if (replorigin_session_origin == InvalidRepOriginId ||
+		replorigin_session_origin == DoNotReplicateId ||
+		replorigin_session_origin_timestamp == 0)
+		replorigin_session_origin_timestamp = committs;
+	else
+		replorigin_session_advance(replorigin_session_origin_lsn,
+   XactLastRecEnd);
+
+	/*
+	 * We don't need to WAL log origin or timestamp here, the commit
+	 * record contains all the necessary information and will redo the SET
+	 * action during replay.
+	 */
+	TransactionTreeSetCommitTsData(xid, nchildren, children,
+   replorigin_session_origin_timestamp,
+   replorigin_session_origin, false);
+
+	/*
 	 * We don't currently try to sleep before flush here ... nor is there any
 	 * support for async commit of a prepared xact (the very idea is probably
 	 * a contradiction)
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 

Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-28 Thread Josh Berkus
On 09/28/2015 08:10 AM, Robert Haas wrote:
> -1 on that idea.  I really don't think that we should categorically
> decide we don't support higher optimization levels.  If the compiler
> has a bug, then the compiler manufacturer should fix it, and it's not
> our fault.  If the compiler doesn't have a bug and our stuff is
> blowing up, then we have a bug and should fix it.  I suppose there
> could be some grey area but hopefully not too much.

Or it's PILBChAK. I know Sun-CC used to warn that -O3 was unsuitable for
most programs because it could change behavior.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] unclear about row-level security USING vs. CHECK

2015-09-28 Thread Stephen Frost
* Peter Eisentraut (pete...@gmx.net) wrote:
> I see.  But it is a bit odd to hide this very fundamental behavior
> somewhere in a paragraph that starts out with something about roles.

I'm happy to change that.  You're right, it should be a paragraph by
itself.

> There is also a mistake, I believe: DELETE policies also take both a
> CHECK and a USING clause.

DELETE never adds records and therefore does not take a CHECK clause,
only a USING clause:

=*# create policy p1 on t1 for delete using (c1 > 5) with check (c1 > 10);
ERROR:  WITH CHECK cannot be applied to SELECT or DELETE

There has been some discussion about changing that, but that would be a
future change and not for 9.5.

> I still find something about this weird, but I'm not sure what.  It's
> not clear to me at what level this USING->CHECK mapping is applied.  I
> can write FOR ALL USING and it will be mapped to CHECK for all actions,
> including INSERT, but when I write FOR INSERT USING it complains.  Why
> doesn't it do the mapping in that case, too?

INSERT is only adding records and therefore only the CHECK policy
applies:

=*# create policy p1 on t1 for insert using (c1 > 5) with check (c1 > 10);
ERROR:  only WITH CHECK expression allowed for INSERT

The USING clause is for existing records while the CHECK option is for
new records, which is why DELETE only has a USING clause and INSERT only
has a WITH CHECK clause.  ALL allows you to specify clauses for all
commands, which is why it accepts both.  The only other case which
allows both is UPDATE, where records are both retrieved and added.
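
For illustration, UPDATE accepts both clauses (a sketch against the same
toy table as above):

=*# create policy p1 on t1 for update using (c1 > 5) with check (c1 > 10);
CREATE POLICY

Here the USING expression filters which existing rows may be updated,
while the WITH CHECK expression validates the new row versions being
written.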

> >> (Btw., what's the meaning of a policy for DELETE?)
> > 
> > The DELETE policy controls what records a user is able to delete.
> 
> That needs to be documented somewhere.

This is included in the CREATE POLICY documentation:

DELETE

Using DELETE for a policy means that it will apply to DELETE
commands. Only rows which pass this policy will be seen by a DELETE
command. Rows may be visible through a SELECT which are not seen by
a DELETE, as they do not pass the USING expression for the DELETE,
and rows which are not visible through the SELECT policy may be
deleted if they pass the DELETE USING policy. The DELETE policy only
accepts the USING expression as it only ever applies in cases where
records are being extracted from the relation for deletion.

I'm certainly all for improving the documentation, of course.  What
about the above isn't clear regarding what DELETE policies do?  Or is
the issue that it wasn't covered in ddl-rowsecurity?  Perhaps we should
simply move much of the CREATE POLICY documentation into ddl-rowsecurity
instead, since that's where people seem to be looking for this
information?

* Robert Haas (robertmh...@gmail.com) wrote:
> On Sat, Sep 26, 2015 at 9:46 PM, Peter Eisentraut  wrote:
> > On 9/23/15 3:41 PM, Stephen Frost wrote:
> > I see.  But it is a bit odd to hide this very fundamental behavior
> > somewhere in a paragraph that starts out with something about roles.
> >
> > There is also a mistake, I believe: DELETE policies also take both a
> > CHECK and a USING clause.
> >
> > I still find something about this weird, but I'm not sure what.  It's
> > not clear to me at what level this USING->CHECK mapping is applied.  I
> > can write FOR ALL USING and it will be mapped to CHECK for all actions,
> > including INSERT, but when I write FOR INSERT USING it complains.  Why
> > doesn't it do the mapping in that case, too?
> 
> We are really pushing our luck only hammering this stuff out now.  But
> I think I agree with Peter's concerns, FWIW.

I listed out the various alternatives but didn't end up getting any
responses to it.  I'm still of the opinion that the documentation is the
main thing which needs improving here, but we can also change CREATE
POLICY, et al, to require an explicit WITH CHECK clause for the commands
where that makes sense if that's the consensus.

Thanks!

Stephen




Re: [HACKERS] unclear about row-level security USING vs. CHECK

2015-09-28 Thread Robert Haas
On Mon, Sep 28, 2015 at 3:15 PM, Stephen Frost  wrote:
> I listed out the various alternatives but didn't end up getting any
> responses to it.  I'm still of the opinion that the documentation is the
> main thing which needs improving here, but we can also change CREATE
> POLICY, et al, to require an explicit WITH CHECK clause for the commands
> where that makes sense if that's the consensus.

My vote is to remove the behavior where USING flows over to WITH
CHECK.  So you only get a WITH CHECK policy if you explicitly specify
one.

If there's some other consensus, OK, but tempus fugit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Rename withCheckOptions to insertedCheckClauses

2015-09-28 Thread Stephen Frost
* Dean Rasheed (dean.a.rash...@gmail.com) wrote:
> None of this renaming seems like an improvement to me.

I have to admit that I'm not entirely sure it's improving things either.

Certainly, we shouldn't be dumping out the withCheckOptions node and
perhaps we should move its place in Query and add comments to it, but
I'm not sure about the other changes.

> ri_InsertedCheckClauses is misleading because they're not clauses,
> they're WithCheckOption nodes carrying various pieces of information,
> only one of which is the clause to check.
> 
> ri_InsertedCheckOptions is a bit better from that perspective, but it
> still seems messy for the executor to carry the knowledge that these
> checks were inserted by the rewriter. In the executor they're just
> checks to be executed, and "WithCheck" reflects their original source
> better than "InsertedCheck".

Yeah, the actual structure is 'WithCheckOption' and everything is pretty
consistent around that.  One possibly confusing bit is that we have a
ViewCheckOption enum in ViewStmt called "withCheckOption".  Perhaps
that's what we should change?  Semes like it's either that or rename the
structure itself to NewTupleCheck (or similar) and rename everything to
go along with that instead.

> Also, these were added in 9.4, so introducing this many differences
> between 9.4 and 9.5+ will make back-patching harder.

That's certainly true, but we don't want current or future hackers to be
confused either.

> The original objection to the name WithCheckOptions was in relation to
> the Query struct, and the fact that this field isn't the parsed
> representation of WITH CHECK OPTION's on the query. I think that can
> be cured with a suitable comment.

There's also the issue that outfuncs is including that node when it
really shouldn't be.  That can be fixed independently of this
discussion, of course.

Thanks!

Stephen




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-28 Thread Pavel Stehule
2015-09-28 12:37 GMT+02:00 Shulgin, Oleksandr 
:

> On Mon, Sep 28, 2015 at 12:05 PM, Pavel Stehule 
> wrote:
>
>>
>> 2015-09-28 12:01 GMT+02:00 Shulgin, Oleksandr <
>> oleksandr.shul...@zalando.de>:
>>
>>> On Sun, Sep 27, 2015 at 8:05 AM, Pavel Stehule 
>>> wrote:
>>>
>>> the preparing of content before execution is an interesting idea that
>>> can be used more. Most queries and plans are not too big, so when the
>>> size of the content is not too big - less than 1MB - one DSM can be
>>> used for all backends.

>>>
>>>
>>> When the size of the content is bigger than the limit, a DSM will be
>>> allocated specifically for this content. The pointer to the DSM and the
>>> offset can be stored in the requesting process's slot. Reading and
>>> writing the requested slot should be protected by a spinlock, but it
>>> should block only the two related processes for a short time (a memory
>>> copy).

>>>
>>> Sorry, I don't think this will fly.
>>>
>>> The whole idea is that a backend publishes the plan of a query just
>>> before running it and it doesn't care which other backend(s) might be
>>> reading it, how many times and in which order.  The only required locking
>>> (implicit) is contained in the code for dsm_attach/detach().
>>>
>>
>> I didn't propose a very different solution. There is only one difference -
>> sharing a DSM for smaller data. It is similar to using the usual shared memory.
>>
>
> Does this mean implementing some sort of allocator on top of the shared
> memory segment?  If so, how are you going to prevent fragmentation?
>

Yes, a simple memory allocator is necessary in this case. But it can be
really simple - you can allocate only fixed-size blocks - 10KB, 100KB and
1MB - from separate buffers, so fragmentation is not possible.

Regards

Pavel


>
> --
> Alex
>
>


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-28 Thread Pavel Stehule
2015-09-28 19:13 GMT+02:00 Shulgin, Oleksandr 
:

> On Mon, Sep 28, 2015 at 7:09 PM, Pavel Stehule 
> wrote:
>
>>
>> 2015-09-28 12:37 GMT+02:00 Shulgin, Oleksandr <
>> oleksandr.shul...@zalando.de>:
>>
>>>
>>>> I didn't propose a very different solution. There is only one difference -
>>>> sharing a DSM for smaller data. It is similar to using the usual shared memory.

>>>
>>> Does this mean implementing some sort of allocator on top of the shared
>>> memory segment?  If so, how are you going to prevent fragmentation?
>>>
>>
>> Yes, a simple memory allocator is necessary in this case. But it can be
>> really simple - you can allocate only fixed-size blocks - 10KB, 100KB and
>> 1MB - from separate buffers, so fragmentation is not possible.
>>
>
> Maybe we're talking about completely different ideas, but I don't see how
> fixing the block size helps to fight fragmentation, in particular.  Can you
> sketch a simple example?  E.g. 400 backends share the common segment and
> all of them want to publish a plan of ~1kb, for example.  Now what?
>


Probably only a few backends will be data requesters, and only a few
backends will be asked for data.  The size of the shared data will
typically be less than 10KB, with a few exceptions around 100KB and a few
exceptions around 1MB.  So for 400 backends you need 400*10KB +
100*100KB + 20*1MB = 34MB.  You can have three independent buffers for
these sizes.  If some process requires 5KB, then you return a 10KB block
(it is good enough) from the first buffer.  This simple mechanism can
cover 90% of the usage; for the other 10% you can create a new DSM.
Because you are working with fixed-size blocks and you never split a
block, fragmentation isn't possible.  This model allows very fast
allocation and very fast deallocation.

It is not terribly space-efficient, but 34MB is better than 400 DSMs or
than 400MB (the maximum size for every backend).
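
(A minimal sketch of such a fixed-block allocator over a shared segment;
the struct, sizes and omitted locking are illustrative only:)

    typedef struct FixedPool
    {
        char       *base;           /* start of this pool in the segment */
        size_t      block_size;     /* e.g. 10KB, 100KB or 1MB */
        int         nblocks;        /* e.g. 400, 100 or 20 */
        bool       *used;           /* one in-use flag per block */
    } FixedPool;

    /* Pick the smallest pool whose block size fits the request. */
    static void *
    pool_alloc(FixedPool *pools, int npools, size_t size)
    {
        for (int p = 0; p < npools; p++)
        {
            FixedPool  *pool = &pools[p];

            if (size > pool->block_size)
                continue;           /* try the next, larger pool */
            for (int i = 0; i < pool->nblocks; i++)
            {
                if (!pool->used[i])
                {
                    pool->used[i] = true;
                    return pool->base + (size_t) i * pool->block_size;
                }
            }
            return NULL;            /* pool exhausted: fall back to a new DSM */
        }
        return NULL;                /* larger than the biggest block: own DSM */
    }

    /* Freeing just clears the flag; no coalescing is ever needed. */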

Our super-long queries are terribly long (> 1MB), but we use only a few
clients (less than 2x the number of CPU cores).

Regards

Pavel


>
> --
> Alex
>
>


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread Andres Freund
On 2015-09-28 09:41:18 -0700, David Fetter wrote:
> Since you're convinced that this is an unqualified win, please put
> together a project plan for switching from our current system to
> Github.

Err, no. That's a waste of all our time.

It has been stated pretty clearly in this thread by a number of senior
community people that we're not going to use a closed source system.




Re: [HACKERS] COPY planning

2015-09-28 Thread Tom Lane
Alvaro Herrera  writes:
> Yup.  However I notice that there are a few other callers of
> expression_planner() that do not involve the optimizer for anything
> else.  Maybe it makes sense to have a separate header file that's just

> #include "nodes/primnodes.h"
> extern Expr *expression_planner(Expr *expr);
> extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);

> Seems it could be used by a large percentage of files currently
> including planner.h.

Hmm.  I'd be a bit inclined to define such a file as "planner's
externally visible API", which would mean it should also include
planner() itself, and maybe also eval_const_expressions --- are
there external uses of that?

If we wanted to get rid of external uses of clauses.h, we'd probably
want to move some things like contain_mutable_functions() to nodeFuncs.c,
rather than put them into this hypothetical new header.

regards, tom lane




Re: [HACKERS] track_commit_timestamp and COMMIT PREPARED

2015-09-28 Thread Robert Haas
On Mon, Sep 28, 2015 at 2:07 PM, Petr Jelinek  wrote:
> Sorry, missed your reply.

To be clear, that was actually Fujii Masao's reply, not mine.  I hope
he can have a look at this version.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-28 Thread Robert Haas
On Mon, Sep 28, 2015 at 2:34 PM, Josh Berkus  wrote:
> On 09/28/2015 08:10 AM, Robert Haas wrote:
>> -1 on that idea.  I really don't think that we should categorically
>> decide we don't support higher optimization levels.  If the compiler
>> has a bug, then the compiler manufacturer should fix it, and it's not
>> our fault.  If the compiler doesn't have a bug and our stuff is
>> blowing up, then we have a bug and should fix it.  I suppose there
>> could be some grey area but hopefully not too much.
>
> Or it's PILBChAK. I know Sun-CC used to warn that -O3 was unsuitable for
> most programs because it could change behavior.

I'm attempting to decipher this acronym.  Problem is located between
chair and keyboard?  But I don't see how that applies here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread David Fetter
On Mon, Sep 28, 2015 at 07:14:40PM +0300, YUriy Zhuravlev wrote:
> On Monday 28 September 2015 08:23:46 David Fetter wrote:
> > They may well be, but until we decide it's worth the switching
> > costs to move to a totally different way of doing things, that
> > system will stay in place.
> Until we decide we're wasting time

Great!

Since you're convinced that this is an unqualified win, please put
together a project plan for switching from our current system to
Github.  Make sure to account for all current functionality, even if
your plan is to drop it without archiving.  For bonus points, you can
do a side-by-side comparison of the system we have with the system you
envision, including the running and switching costs.  While
people-hours isn't a great metric, you could start with that, or
propose a different one.

> >Neither magic wands nor a green field project are in operation here.
> Now any stick will help. IMHO

That is an assertion that you now have an opportunity to back with
some kind of plan based on our current system and the future system
you envision.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread Joshua D. Drake

On 09/28/2015 05:50 AM, YUriy Zhuravlev wrote:

On Thursday 24 September 2015 12:10:07 Ryan Pedela wrote:

Kam Lasater wrote:

I'd suggest: Github Issues, Pivotal Tracker or Redmine (probably in
that order). There are tens to hundreds of other great ones out there,
I'm sure one of them would also work.


Why not just use Github issues?


I will also vote for github.  We have a github mirror now.

In a pinch, you can use gitlab.

PS mailing lists are outdated IMHO.



Github is not an option, period. .Org should as much as reasonably 
possible rely on Open Source tools and contributing companies/persons. 
The idea that we would use Github was a non-starter before the first 
person typed Gi...


JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.




Re: [HACKERS] COPY planning

2015-09-28 Thread Alvaro Herrera
Stephen Frost wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > Alvaro Herrera  writes:
> > > I noticed that COPY calls planner() (this was introduced in 85188ab88).
> > > I think it should be calling pg_plan_query() instead.
> > 
> > +1 --- AFAICS, this is the *only* place that is going directly to
> > planner() without going through pg_plan_query(); other utility
> > functions such as CREATE TABLE AS do the latter.

Yeah, exactly.  Thus, pushed.

> > As far as the patch goes, do copy.c's #include's need adjustment?
> > I'm wondering if optimizer/planner.h could be removed, in particular.
> 
> BeginCopyFrom still uses expression_planner(), at least..

Yup.  However I notice that there are a few other callers of
expression_planner() that do not involve the optimizer for anything
else.  Maybe it makes sense to have a separate header file that's just

#include "nodes/primnodes.h"
extern Expr *expression_planner(Expr *expr);
extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);

Seems it could be used by a large percentage of files currently
including planner.h.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] track_commit_timestamp and COMMIT PREPARED

2015-09-28 Thread Robert Haas
On Fri, Sep 18, 2015 at 12:53 AM, Fujii Masao  wrote:
> On Sat, Sep 5, 2015 at 7:48 PM, Petr Jelinek  wrote:
>> On 2015-09-02 16:14, Fujii Masao wrote:
>>>
>>> On Wed, Aug 5, 2015 at 2:16 AM, Robert Haas  wrote:

 On Mon, Aug 3, 2015 at 10:31 AM, Fujii Masao 
 wrote:
>
> track_commit_timestamp tracks COMMIT PREPARED as expected in standby
> server,
> but not in master server. Is this intentional? It should track COMMIT
> PREPARED
> even in master? Otherwise, we cannot use commit_timestamp feature to
> check
> the replication lag properly while we use 2PC.


 That sounds like it must be a bug.  I think you should add it to the
 open items list.
>>>
>>>
>>
>> Attached fixes this. It includes advancement of the replication origin as
>> well. I didn't feel like doing a refactor of the commit code this late in
>> the 9.5 cycle though, so I went with the code duplication + a note in xact.c.
>
> Agreed. We can refactor the code later if needed.
>
> The patch looks good to me except the following minor points.
>
>  * or not.  Normal path through RecordTransactionCommit() will be related
>  * to a transaction commit XLog record, and so should pass "false" here.
>
> The above source comment of TransactionTreeSetCommitTsData() seems to
> need to be updated.
>
> + * Note: if you change this functions you should also look at
> + * RecordTransactionCommitPrepared in twophase.c.
>
> Typo: "this functions" should be "this function"
>
> +if (replorigin_sesssion_origin == InvalidRepOriginId ||
>
> This is not the problem of the patch, but I started wondering what
> "sesssion" in the variable name "replorigin_sesssion_origin" means.
> Is it just a typo and it should be "session"? Or it's the abbreviation
> of something?

This should go in before beta.  Is someone updating the patch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RLS open items are vague and unactionable

2015-09-28 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Tue, Sep 15, 2015 at 10:22 AM, Stephen Frost  wrote:
> > Unless there are other concerns or issues raised, I'll push this later
> > today.
> 
> So does this mean that the first RLS open item is addressed?  If so,
> can it be moved to the "resolved after 9.5alpha2" section?  Based on
> commit 4f3b2a8883c47b6710152a8e157f8a02656d0e68 I *think* yes but...

I hadn't moved it because there was ongoing discussion and I had an open
item (see: 20150923185403.gc3...@tamriel.snowman.net and the thread
leading up to it).

Attached is a patch to address exactly that issue.  This is all in the
commit message, of course, but the gist of it is:

If SELECT rights are required then apply the SELECT policies, even if
the actual command is an UPDATE or DELETE.  This covers the RETURNING
case which was discussed previously, so we don't need the explicit check
for that, and further addresses the concern raised by Zhaomo about
someone abusing the WHERE clause in an UPDATE or DELETE.

Further, if UPDATE rights are required then apply the UPDATE policies,
even if the actual command is a SELECT.  This addresses the concern that
a user might be able to lock rows they're not actually allowed to UPDATE
through the UPDATE policies.
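
To sketch the effect with hypothetical policies:

    create policy p_sel on accounts for select using (owner = current_user);
    create policy p_upd on accounts for update using (manager = current_user);

With the patch, a SELECT ... FOR UPDATE on accounts filters through both
p_sel and p_upd, and an UPDATE ... RETURNING must also pass p_sel, since
RETURNING requires SELECT rights on the relation.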

Comments welcome, of course.  Barring concerns, I'll get this pushed
tomorrow.

Thanks!

Stephen
From b8ad8ca34ad473813b072cc871ad33c48ba68b42 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Mon, 28 Sep 2015 13:25:28 -0400
Subject: [PATCH] Include policies based on ACLs needed

When considering which policies should be included, rather than look at
individual bits of the query (eg: if a RETURNING clause exists, or if a
WHERE clause exists which is referencing the table, or if it's a
FOR SHARE/UPDATE query), consider any case where we've determined
the user needs SELECT rights on the relation to be a case where we apply
SELECT policies, and any case where we've deteremind that the user needs
UPDATE rights on the relation to be a case where we apply UPDATE
policies.

This simplifies the logic and addresses concerns that a user could use
UPDATE or DELETE with a WHERE clauses to determine if rows exist, or
they could lock rows which they are not actually allowed to modify
through UPDATE policies.

Back-patch to 9.5 where RLS was added.
---
 src/backend/rewrite/rowsecurity.c |  76 +-
 src/test/regress/expected/rowsecurity.out | 166 ++
 2 files changed, 150 insertions(+), 92 deletions(-)

diff --git a/src/backend/rewrite/rowsecurity.c b/src/backend/rewrite/rowsecurity.c
index c20a178..1a51c20 100644
--- a/src/backend/rewrite/rowsecurity.c
+++ b/src/backend/rewrite/rowsecurity.c
@@ -187,28 +187,55 @@ get_row_security_policies(Query *root, RangeTblEntry *rte, int rt_index,
 		   hasSubLinks);
 
 	/*
-	 * For the target relation, when there is a returning list, we need to
-	 * collect up CMD_SELECT policies and add them via add_security_quals.
-	 * This is because, for the RETURNING case, we have to filter any records
-	 * which are not visible through an ALL or SELECT USING policy.
+	 * For a SELECT, if UPDATE privileges are required (eg: the user has
+	 * specified FOR [KEY] UPDATE/SHARE), then also add the UPDATE USING quals.
 	 *
-	 * We don't need to worry about the non-target relation case because we are
-	 * checking the ALL and SELECT policies for those relations anyway (see
-	 * above).
+	 * This way, we filter out any records from the SELECT FOR SHARE/UPDATE
+	 * which the user does not have access to via the UPDATE USING policies,
+	 * similar to how we require normal UPDATE rights for these queries.
 	 */
-	if (root->returningList != NIL &&
-		(commandType == CMD_UPDATE || commandType == CMD_DELETE))
+	if (commandType == CMD_SELECT && rte->requiredPerms & ACL_UPDATE)
 	{
-		List	   *returning_permissive_policies;
-		List	   *returning_restrictive_policies;
+		List	   *update_permissive_policies;
+		List	   *update_restrictive_policies;
+
+		get_policies_for_relation(rel, CMD_UPDATE, user_id,
+								  &update_permissive_policies,
+								  &update_restrictive_policies);
+
+		add_security_quals(rt_index,
+		   update_permissive_policies,
+		   update_restrictive_policies,
+		   securityQuals,
+		   hasSubLinks);
+	}
+
+	/*
+	 * Similar to above, during an UPDATE or DELETE, if SELECT rights are also
+	 * required (eg: when a RETURNING clause exists, or the user has provided
+	 * a WHERE clause which involves columns from the relation), we collect up
+	 * CMD_SELECT policies and add them via add_security_quals.
+	 *
+	 * This way, we filter out any records which are not visible through an ALL
+	 * or SELECT USING policy.
+	 *
+	 * We don't need to worry about the non-target relation case here because
+	 * we are checking the ALL and SELECT policies for those relations anyway
+	 * (see above).
+	 */
+	if ((commandType == CMD_UPDATE || commandType == 

Re: [HACKERS] 9.5: Can't connect with PGSSLMODE=require on Windows

2015-09-28 Thread Alvaro Herrera
Asif Naeem wrote:
> I have spent some time investigating the issue; it is reproducible.  On
> Windows, when pqsecure_raw_read() fails with the error code
> WSAEWOULDBLOCK (EWOULDBLOCK) because no data is queued to be read from
> the non-blocking socket, the retry flag needs to be set.  The related
> error code can be retrieved via the Windows WSAGetLastError() function
> instead of errno; the SOCK_ERRNO preprocessor macro handles this
> gracefully.

Hmm, wow.  I think you should also change my_sock_write.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-09-28 Thread Robert Haas
On Mon, Sep 28, 2015 at 3:34 AM, Kouhei Kaigai  wrote:
> The attached patch allows the FDW driver to handle the EPQ recheck in its
> own preferred way, even if it is an alternative local join or ExecQual on
> the expression being pushed down.

Thanks.  I was all set to commit this, or at least part of it, when I
noticed that we already have an FDW callback called RefetchForeignRow.
We seem to be intending that this new callback should refetch the row
from the foreign server and verify that any pushed-down quals apply to
it.  But why can't RefetchForeignRow do that?  That seems to be what
it's for.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] 9.5: Can't connect with PGSSLMODE=require on Windows

2015-09-28 Thread Tom Lane
Asif Naeem  writes:
> I have spent some time investigating the issue; it is reproducible.  On
> Windows, when pqsecure_raw_read() fails with the error code
> WSAEWOULDBLOCK (EWOULDBLOCK) because no data is queued to be read from
> the non-blocking socket, the retry flag needs to be set.  The related
> error code can be retrieved via the Windows WSAGetLastError() function
> instead of errno; the SOCK_ERRNO preprocessor macro handles this
> gracefully.  PFA patch, it resolves the issue i.e.

> @@ -1601,7 +1601,7 @@ my_sock_read(BIO *h, char *buf, int size)
>   int save_errno;
>  
>   res = pqsecure_raw_read((PGconn *) h->ptr, buf, size);
> - save_errno = errno;
> + save_errno = SOCK_ERRNO;
>   BIO_clear_retry_flags(h);
>   if (res < 0)
>   {


Great detective work!  But if that's broken, then surely the identical
code in my_sock_write is as well; and the reassignment to errno at the
bottom of my_sock_read needs to be SOCK_ERRNO_SET(); and why doesn't
my_sock_write have a reassignment at all?

regards, tom lane




Re: [HACKERS] 9.5: Can't connect with PGSSLMODE=require on Windows

2015-09-28 Thread Tom Lane
I wrote:
> ... and the reassignment to errno at the
> bottom of my_sock_read needs to be SOCK_ERRNO_SET(); and why doesn't
> my_sock_write have a reassignment at all?

Comparison to the backend versions of these routines, which have been
through quite a few releases, suggests that the reassignment to errno at
the bottom of my_sock_read is simply bogus/unnecessary.  There is no
reason to believe that BIO_clear_retry_flags, BIO_set_retry_read, or
BIO_set_retry_write will munge errno.  Hence we should remove that flight
of fantasy rather than clone it into my_sock_write.
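Putting this subthread together, the repaired callback would look
something like the sketch below: save the error with SOCK_ERRNO, set the
retry flag on the usual interrupt/would-block codes, and drop the errno
reassignment entirely. This is an illustration of the fix under
discussion, not the committed patch:

static int
my_sock_read(BIO *h, char *buf, int size)
{
	int			res;

	res = pqsecure_raw_read((PGconn *) h->ptr, buf, size);
	BIO_clear_retry_flags(h);
	if (res < 0)
	{
		/* If we were interrupted, tell caller to retry */
		int			err = SOCK_ERRNO;

		if (err == EINTR || err == EWOULDBLOCK || err == EAGAIN)
			BIO_set_retry_read(h);
	}
	return res;
}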

regards, tom lane




Re: [HACKERS] 9.5: Can't connect with PGSSLMODE=require on Windows

2015-09-28 Thread Andres Freund
On 2015-09-28 17:28:48 -0400, Tom Lane wrote:

> > What I do find curious is that afaics before 680513ab79 the code also
> > looked at errno, not SOCK_ERRNO. And apparently things worked back then?
> 
> No; AFAICS, before that commit, libpq did not use a custom BIO at all.
> That commit cloned the backend's custom BIO, but did not correctly
> adjust the backend's errno handling for the libpq environment.

Oh, yea.

> Will go fix it.

We now probably could remove
 * XXX OpenSSL 1.0.1e considers many more errcodes than just EINTR as reasons
 * to retry; do we need to adopt their logic for that?
since we now actually check for more than just EINTR.

Greetings,

Andres Freund




Re: [HACKERS] 9.5: Can't connect with PGSSLMODE=require on Windows

2015-09-28 Thread Tom Lane
Andres Freund  writes:
> We now probably could remove
>  * XXX OpenSSL 1.0.1e considers many more errcodes than just EINTR as reasons
>  * to retry; do we need to adopt their logic for that?
> since we now actually check for more than just EINTR.

Well, that comment is cloned from the backend which is already checking
for all three of these errno codes.  I am too lazy to go look at the
OpenSSL code right now, but my recollection is that they checked for
some truly weird stuff, not just the expected spellings of EINTR.

regards, tom lane




Re: [HACKERS] row_security GUC, BYPASSRLS

2015-09-28 Thread Stephen Frost
* Noah Misch (n...@leadboat.com) wrote:
> On Tue, Sep 22, 2015 at 10:38:53AM -0400, Stephen Frost wrote:
> > This would lead to trivial to cause (and likely hard to diagnose)
> > referential integrity data corruption issues.  I find that a very hard
> > pill to swallow in favor of a theoretical concern that new code may open
> > avenues of exploit for a new security context mode to bypass RLS when
> > doing internal referential integrity checks.  Further, it changes this
> > additional capability, which was agreed would be added to offset removal
> > of the 'row_security = force' option, from useful to downright
> > dangerous.
> 
> In schema reviews, I will raise a red flag for use of this feature; the best
> designs will instead use additional roles.  I forecast that PostgreSQL would
> fare better with no owner-constrained-by-RLS capability.  Even so, others want
> it, and FORCE ROW SECURITY would deliver it with an acceptable risk profile.

I've attached a patch to implement it.  It's not fully polished but it's
sufficient for comment, I believe.  Additional comments, documentation
and regression tests are to be added, if we have agreement on the
grammar and implementation approach.

> "SET row_security=force" was too risky, and not in a way particular to foreign
> key constraints, because the session user chose row_security=force independent
> of object owners.  With FORCE ROW SECURITY, each table owner would make both
> decisions.  A foreign key constraint, plus a SELECT policy hiding rows from
> the table owner, plus FORCE ROW SECURITY is one example of self-contradictory
> policy design.  That example is unexceptional amidst the countless ways a
> table owner can get security policy wrong.

I agree that a table owner can get security policy wrong in any number
of ways and that having a FORCE RLS option at the table level is better
than the GUC.

> SECURITY_ROW_LEVEL_DISABLED could have been okay.  I removed an incomplete
> implementation (e.g. didn't affect CASCADE constraints).  Writing a full one
> would be a mammoth job, and for what?  Setting the wrong SELECT policies can
> disrupt _any_ application logic; no foreign keys or FORCE ROW SECURITY need be
> involved.  Protecting just foreign keys brings some value, but it will not
> materially reduce the vigilance demanded of RLS policy authors and reviewers.

I have a hard time with this.  We're not talking about the application
logic in this case, we're talking about the guarantees which the
database is making to the user, be it an application or an individual.

I've included a patch (the second in the set attached) which adds a
SECURITY_NOFORCE_RLS bit with a much tighter scope: it only applies
after the regular owner check is done.  This dramatically reduces the
risk of it being set mistakenly, I believe.  Further, the cascaded
case is correctly handled.  This also needs more polish and regression
tests, but I wanted to solicit comments before moving forward, since
we don't have a consensus on whether this should be done.
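To illustrate the intended scope (my reading of the proposal, not text
from the patch): the bit would be set only around the internal
referential-integrity queries, after the owner check has already been
performed, along these lines:

/* Hypothetical sketch: run the RI check as the table owner with the
 * proposed NOFORCE bit set, so FORCE ROW LEVEL SECURITY does not
 * apply to the internal query. */
Oid			save_userid;
int			save_sec_context;

GetUserIdAndSecContext(&save_userid, &save_sec_context);
SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
					   save_sec_context | SECURITY_NOFORCE_RLS);
/* ... execute the RI check query here ... */
SetUserIdAndSecContext(save_userid, save_sec_context);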

Thanks!

Stephen
From 32f0298403dcf3ea72c104fa9bddadfa75f3114e Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Mon, 28 Sep 2015 11:28:15 -0400
Subject: [PATCH 1/2] ALTER TABLE .. FORCE ROW LEVEL SECURITY

To allow users to force RLS to always be applied, even for table owners,
add ALTER TABLE .. FORCE ROW LEVEL SECURITY.

Note that this ends up applying RLS even to referential integrity checks
and can therefore result in corruption.
---
 doc/src/sgml/catalogs.sgml | 10 +++
 doc/src/sgml/ref/alter_table.sgml  | 17 +
 src/backend/catalog/heap.c |  1 +
 src/backend/commands/tablecmds.c   | 68 
 src/backend/parser/gram.y  | 14 +
 src/backend/utils/misc/rls.c   | 16 +++--
 src/bin/pg_dump/pg_dump.c  | 20 +-
 src/bin/pg_dump/pg_dump.h  |  1 +
 src/bin/psql/describe.c| 44 +++--
 src/include/catalog/pg_class.h | 72 +++---
 src/include/nodes/parsenodes.h |  2 +
 .../modules/test_ddl_deparse/test_ddl_deparse.c|  6 ++
 12 files changed, 213 insertions(+), 58 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index e8bd3a1..5c6a480 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1972,6 +1972,16 @@
  
 
  
+  relforcerowsecurity
+  bool
+  
+  
+   True if row level security (when enabled) will also apply to table owner; see
+   pg_policy catalog
+  
+ 
+
+ 
   relispopulated
   bool
   
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 8e35cd9..aca40f5 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -61,6 +61,8 

Re: [HACKERS] Rework the way multixact truncations work

2015-09-28 Thread Jim Nasby

On 9/27/15 2:25 PM, Andres Freund wrote:

> On 2015-09-27 14:21:08 -0500, Jim Nasby wrote:
>> IMHO just doing a LOG for something this serious is wrong; it should at
>> least be a WARNING.
>
> In postgres LOG, somewhat confusingly, is more severe than WARNING.

Ahh, right. Which in this case stinks, because WARNING is a lot more
attention-grabbing than LOG. :/



>> I think the concern about upgrading a replica before the master is valid;
>> is there some way we could override a PANIC when that's exactly what
>> someone is trying to do? Check for a special file maybe?
>
> I don't understand this concern - that's just the situation we have in
> all released branches today.

There was discussion about making this a PANIC instead of a LOG, which I
think is a good idea... but then there'd need to be some way to not
PANIC if you were doing an upgrade.



>> +   bool    sawTruncationInCkptCycle;
>> What happens if someone downgrades the master, back to a version that no
>> longer logs truncation? (I don't think assuming that the replica will
>> need to restart if that happens is a safe bet...)
>
> It'll just do legacy truncation again - without a restart on the
> standby required.

Oh, I thought once that was set it would stay set. NM.


>> -   if (MultiXactIdPrecedes(oldestMXact, earliest))
>> +   /* If there's nothing to remove, we can bail out early. */
>> +   if (MultiXactIdPrecedes(oldestMulti, earliest))
>> {
>> -   DetermineSafeOldestOffset(oldestMXact);
>> +   LWLockRelease(MultiXactTruncationLock);
>> If/when this is backpatched, would it be safer to just leave this alone?
>
> What do you mean? This can't just be left alone in isolation?

I thought removing DetermineSafeOldestOffset was just an optimization,
but I guess I was confused.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




[HACKERS] Has anyone run Cachegrind against the code?

2015-09-28 Thread Jim Nasby
Has anyone ever run cachegrind [1] against Postgres? I see little about 
it on the mailing list so I'm guessing no...


[1] http://valgrind.org/docs/manual/cg-manual.html
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] 9.5: Can't connect with PGSSLMODE=require on Windows

2015-09-28 Thread Tom Lane
Andres Freund  writes:
> On 2015-09-28 16:57:24 -0400, Tom Lane wrote:
>> Great detective work!  But if that's broken, then surely the identical
>> code in my_sock_write is as well; and the reassignment to errno at the
>> bottom of my_sock_read needs to be SOCK_ERRNO_SET(); and why doesn't
>> my_sock_write have a reassignment at all?

> I wonder if we couldn't remove saving/restoring errno entirely from
> my_sock_*. We didn't do so before 680513ab79 and I can't see a reason
> why we'd need to now.

Agreed, see my comment to the same effect.

> What I do find curious is that afaics before 680513ab79 the code also
> looked at errno, not SOCK_ERRNO. And apparently things worked back then?

No; AFAICS, before that commit, libpq did not use a custom BIO at all.
That commit cloned the backend's custom BIO, but did not correctly
adjust the backend's errno handling for the libpq environment.

Will go fix it.

regards, tom lane




Re: [HACKERS] Rename withCheckOptions to insertedCheckClauses

2015-09-28 Thread Tom Lane
Stephen Frost  writes:
> * Dean Rasheed (dean.a.rash...@gmail.com) wrote:
>> Also, these were added in 9.4, so introducing this many differences
>> between 9.4 and 9.5+ will make back-patching harder.

> That's certainly true, but we don't want current or future hackers to be
> confused either.

Yes.  I do not think that we should stick with badly chosen names just
because of back-patching concerns.  By that argument, we should never
fix any erroneous comments either.

Whether these particular names are improvements is, of course, a fit
subject for debate.  I have to agree that I don't feel like we've quite
hit on le mot juste yet.

regards, tom lane




Re: [HACKERS] 9.5: Can't connect with PGSSLMODE=require on Windows

2015-09-28 Thread Andres Freund
On 2015-09-28 16:57:24 -0400, Tom Lane wrote:
> Asif Naeem  writes:
> > I have spent some time investigating the issue; it is reproducible. On
> > Windows, when pqsecure_raw_read() fails with WSAEWOULDBLOCK (EWOULDBLOCK)
> > because no data is queued on the non-blocking socket, the retry flag needs
> > to be set. The related error code must be retrieved via the Windows
> > WSAGetLastError() instead of errno; the SOCK_ERRNO preprocessor macro
> > handles this gracefully. PFA patch; it resolves the issue, i.e.:
> 
> > @@ -1601,7 +1601,7 @@ my_sock_read(BIO *h, char *buf, int size)
> > int save_errno;
> >  
> > res = pqsecure_raw_read((PGconn *) h->ptr, buf, size);
> > -   save_errno = errno;
> > +   save_errno = SOCK_ERRNO;
> > BIO_clear_retry_flags(h);
> > if (res < 0)
> > {
> 
> 
> Great detective work!  But if that's broken, then surely the identical
> code in my_sock_write is as well; and the reassignment to errno at the
> bottom of my_sock_read needs to be SOCK_ERRNO_SET(); and why doesn't
> my_sock_write have a reassignment at all?

I wonder if we couldn't remove saving/restoring errno entirely from
my_sock_*. We didn't do so before 680513ab79 and I can't see a reason
why we'd need to now.

What I do find curious is that afaics before 680513ab79 the code also
looked at errno, not SOCK_ERRNO. And apparently things worked back then?
I guess the difference is that pgsecure_raw_read now unconditionally
does SOCK_ERRNO_SET(result_errno).

Greetings,

Andres Freund




Re: [HACKERS] PGXS "check" target forcing an install ?

2015-09-28 Thread Robert Haas
On Thu, Sep 24, 2015 at 4:21 PM, Noah Misch  wrote:
> On Fri, Jun 26, 2015 at 09:09:15AM -0400, Robert Haas wrote:
>> On Tue, Jun 23, 2015 at 1:31 AM, Michael Paquier  
>> wrote:
>> >> I tracked the dangerous -rf down to Makefile.global, its empty
>> >> string being due to abs_top_builddir not being defined in my own Makefile.
>> >> But beside that, which I can probably fix, it doesn't sound correct
>> >> that a "check" rule insists on finding an "install" rule.
>> >
>> > Oops, this is a regression, and a dangerous one indeed. This is caused
>> > by dcae5fac.
>> >
>> > One fix is to use NO_TEMP_INSTALL=yes in Makefile.global in the
>> > context of PGXS, like in the patch attached, this variable needing to
>> > be set before Makefile.global is loaded.
>
> This seems reasonable in concept, though the patch's addition is off-topic in
> a section marked "# Support for VPATH builds".  However, ...
>
>> Gulp.  I certainly agree that emitting rm -rf /tmp_install is a scary
>> thing for a PostgreSQL Makefile to be doing.  Fortunately, people
>> aren't likely to have a directory under / by that name, and maybe not
>> permissions on it even if they did, but all the same it's not good.  I
>> propose trying to guard against that a bit more explicitly, as in the
>> attached.
>
> ... agreed.

Thanks for reminding me about this patch.  I've rebased it and
committed it to master and 9.5.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RLS open items are vague and unactionable

2015-09-28 Thread Robert Haas
On Tue, Sep 15, 2015 at 10:22 AM, Stephen Frost  wrote:
> Dean,
>
> * Dean Rasheed (dean.a.rash...@gmail.com) wrote:
>> A minor point -- this comment isn't quite right:
>
> Fixed.
>
>> because the policies that are fetched there are only used for
>> add_security_quals(), not for add_with_check_options(). It might be
>> cleaner if the 'if' statement that follows were merged with the
>> identical one a few lines down, and then those returning policies
>> could be local to that block, with the 2 pieces of RETURNING handling
>> done together. Similarly for the upsert block.
>
> Done.
>
>> Actually, it isn't necessary to test that rt_index ==
>> root->resultRelation, because for all other relations commandType is
>> set to CMD_SELECT higher up, so the 'returning' bool variable could
>> just be replaced with 'root->returningList != NIL' throughout.
>
> Done.
>
> Updated patch attached for review.
>
> Unless there are other concerns or issues raised, I'll push this later
> today.

So does this mean that the first RLS open item is addressed?  If so,
can it be moved to the "resolved after 9.5alpha2" section?  Based on
commit 4f3b2a8883c47b6710152a8e157f8a02656d0e68 I *think* yes but...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] BRIN Scan: Optimize memory allocation in function 'bringetbitmap'

2015-09-28 Thread Simon Riggs
On 27 September 2015 at 02:15, Jinyu Zhang  wrote:

>
> BRIN Scan: Optimize memory allocation in function 'bringetbitmap'.
> We can allocate memory for some pointer before do long loop instead of
> allocating
> memory in long loop.
>
> Before optimizing code (warm run)
> postgres=# select count(*) from lineitem where l_orderkey=1;
>  count
> ---
>  6
> (1 row)
>
> Time: 456.219 ms
>
> After optimizing code (warm run)
> postgres=# select count(*) from lineitem where l_orderkey=1;
>  count
> ---
>  6
> (1 row)
>
> Time: 349.219 ms
>

Patch?

-- 
Simon Riggs                   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Tab completion for ALTER COLUMN SET STATISTICS

2015-09-28 Thread Robert Haas
On Sat, Sep 26, 2015 at 7:24 AM, Michael Paquier
 wrote:
> On Sat, Sep 26, 2015 at 7:18 AM, Jeff Janes  wrote:
>> If I have "alter table foo alter COLUMN bar SET STATISTICS" in the line
>> buffer,
>> it tab completes to add " TO", which is not legal.
>>
>> The attached patch makes it not tab complete anything at all, which is at
>> least not actively misleading.
>> I thought of having it complete "-1", "<integer>" so that it gives a clue
>> about what is needed, but I didn't see any precedent for non-literal
>> clue-giving and I did not want to try to create a new one.
>
> +1 for the way you are doing it in your patch.

Before we take that approach, can I back up and ask whether we
shouldn't instead narrow the rule that's inserting TO?  I think that
completion is coming from here:

else if (pg_strcasecmp(prev2_wd, "SET") == 0 &&
 pg_strcasecmp(prev4_wd, "UPDATE") != 0 &&
 pg_strcasecmp(prev_wd, "TABLESPACE") != 0 &&
 pg_strcasecmp(prev_wd, "SCHEMA") != 0 &&
 prev_wd[strlen(prev_wd) - 1] != ')' &&
 prev_wd[strlen(prev_wd) - 1] != '=' &&
 pg_strcasecmp(prev4_wd, "DOMAIN") != 0)
COMPLETE_WITH_CONST("TO");

Now, that is basically an incredibly broad production: every time the
second-most recent word is SET, complete with TO.  It looks to me like
this has already been patched around multiple times to make it
slightly narrower.  Those exceptions were added by three different
commits, in 2004, 2010, and 2012.

Maybe it's time to back up and make the whole thing a lot narrower.
Like, if second-most-recent word of the command is also the FIRST word
of the command, this is probably right.  And there may be a few other
situations where it's right.  But in general, SET is used in lots of
places and the fact that we've seen one recently isn't enough to
decide on TO.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] unclear about row-level security USING vs. CHECK

2015-09-28 Thread Robert Haas
On Sat, Sep 26, 2015 at 9:46 PM, Peter Eisentraut  wrote:
> On 9/23/15 3:41 PM, Stephen Frost wrote:
> I see.  But it is a bit odd to hide this very fundamental behavior
> somewhere in a paragraph that starts out with something about roles.
>
> There is also a mistake, I believe: DELETE policies also take both a
> CHECK and a USING clause.
>
> I still find something about this weird, but I'm not sure what.  It's
> not clear to me at what level this USING->CHECK mapping is applied.  I
> can write FOR ALL USING and it will be mapped to CHECK for all actions,
> including INSERT, but when I write FOR INSERT USING it complains.  Why
> doesn't it do the mapping that case, too?

We are really pushing our luck only hammering this stuff out now.  But
I think I agree with Peter's concerns, FWIW.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Doubt in pgbench TPS number

2015-09-28 Thread Tom Lane
Fabien COELHO  writes:
>> Yeah, that's definitely a bug but I'm afraid the fix will change the
>> TPS number and may break backward compatibility. Since we have
>> lived with this bug for years, I hesitate to back-port it to older stable
>> branches...

> My 2¥: I do not see a good argument to keep wrong tps numbers once it
> is known that they are plain wrong, especially as it is not a behavioral
> change as such which could block applications or whatever, just a
> different number printed at the end of a run. So I would not bother much
> with upward-compatibility considerations in this case.

FWIW, I vote with Tatsuo-san.  Such a change will break comparability of
results with all previous versions, which means it's not something to do
in minor releases, even if we now believe the previous results were
somewhat bogus.  Arguing that it's "not a behavioral change" seems
quite loony to me: for most people, the TPS numbers are the only
interesting output of pgbench.

I think we should fix it in 9.5 and document it as an incompatible change.

(Note: I've not read the patch, so this is not an opinion about its
correctness.)

regards, tom lane




Re: [HACKERS] BRIN indexes for MAX, MIN, ORDER BY?

2015-09-28 Thread Marti Raudsepp
Hi Gavin

Note that Alexander Korotkov already started work in 2013 on a
somewhat similar feature called partial sort:
http://www.postgresql.org/message-id/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=gon-...@mail.gmail.com

In particular, see the 2nd patch for KNN sort -- it uses known
bounding boxes from the GiST index for sorting, which in many ways is
similar to min/max BRIN ranges.

> *partial-knn-1.patch*
>
> KNN-GiST provides ability to get ordered results from index, but this order
> is based only on index information. For instance, GiST index contains
> bounding rectangles for polygons, and we can't get exact distance to
> polygon from index (similar situation is in PostGIS). In attached patch,
> GiST distance method can set recheck flag (similar to consistent method).
> This flag means that distance method returned lower bound of distance and
> we should recheck it from heap.

Unfortunately this work has stalled.

Regards,
Marti

On Sun, Sep 27, 2015 at 11:58 PM, Gavin Wahl  wrote:
> It seems trivial to accelerate a MAX or MIN query with a BRIN index. You
> just find the page range with the largest/smallest value, and then only scan
> that one. Would that be hard to implement? I'm interested in working on it
> if someone can give me some pointers.
>
> Somewhat harder but still possible would be using BRIN indexes to accelerate
> ORDER BY. This would require a sorting algorithm that can take advantage of
> mostly-sorted inputs. You would sort the page ranges by their minimum or
> maximum value, then feed the sorting algorithm in that order.




Re: [HACKERS] pgbench stats per script & other stuff

2015-09-28 Thread Robert Haas
On Sat, Sep 26, 2015 at 3:27 AM, Fabien COELHO  wrote:
> Here is a v10, which is a rebase because of the "--progress-timestamp"
> option addition.

I do not see it attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Manual bitswizzling -> LOCKBIT_ON

2015-09-28 Thread Robert Haas
On Thu, Sep 24, 2015 at 10:24 PM, Thomas Munro
 wrote:
> While studying lmgr code, I noticed that there are a couple of places
> that use 1 << x to convert a LOCKMODE to a LOCKMASK instead of the
> macro that is used elsewhere.  Should that be changed for consistency,
> as in the attached?

Sure, why not?

Committed.
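For reference, the macro being adopted is a one-liner in
src/include/storage/lock.h (quoted from memory):

#define LOCKBIT_ON(lockmode) (1 << (lockmode))
#define LOCKBIT_OFF(lockmode) (~(1 << (lockmode)))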

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] CustomScan support on readfuncs.c

2015-09-28 Thread Robert Haas
On Thu, Sep 24, 2015 at 9:19 PM, Kouhei Kaigai  wrote:
> Then, let's look back a bit. Next issue is how to reproduce
> the "methods" pointer from the text representation.
> I try to lookup the methods table using a pair of library
> and symbol name; probably, it is a straightforward way.
> The "methods" field is put earlier than all private fields
> generated by TextOutCustomScan, so it should be reconstructable
> prior to TextReadCustomScan.
> To support this feature, I had to add an interface contract
> that requires extensions to put library and symbol name on
> CustomScanMethods table.
> Although INIT_CUSTOM_SCAN_METHODS() macro can set up these
> fields, author of extension needs to pay attention.
>
> In addition to these enhancement, a new NodeCopyCustomScan
> callback eliminates a restriction; custom-scan provider
> cannot define its own structure that embeds CustomScan.
> Only CSP knows exact size of the structure, this callback
> is intended to allocate a new one and copy the private fields,
> but no need to copy the common fields.
>
> These three callbacks (one existing, two new) will make
> CustomScan node copyObject, nodeToString and stringToNode
> aware.

Instead of doing this:

+/* Dump library and symbol name instead of raw pointer */
 appendStringInfoString(str, " :methods ");
-_outToken(str, node->methods->CustomName);
+_outToken(str, node->methods->methods_library_name);
+appendStringInfoChar(str, ' ');
+_outToken(str, node->methods->methods_symbol_name);

Suppose we just make library_name and symbol_name fields in the node
itself, that are dumped and loaded like any others.

Would that be better?
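Concretely, something like the sketch below (field names are
illustrative, not settled; unrelated CustomScan fields are omitted):

/* Carry the identifying strings in the plan node itself so the
 * generic outfuncs/readfuncs machinery dumps and restores them, and
 * the methods pointer can be looked up again after a read. */
typedef struct CustomScan
{
	Scan		scan;
	uint32		flags;
	List	   *custom_private;
	char	   *methods_library_name;	/* library to load */
	char	   *methods_symbol_name;	/* symbol for the methods table */
	const CustomScanMethods *methods;	/* reconstructed, not serialized */
} CustomScan;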

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GIN vacuum bug

2015-09-28 Thread Robert Haas
On Thu, Sep 24, 2015 at 11:47 AM, Teodor Sigaev  wrote:
> Hi!
>
> After reading thread
> http://www.postgresql.org/message-id/flat/CAMkU=1xalflhuuohfp5v33rzedlvb5aknnujceum9knbkrb...@mail.gmail.com#CAMkU=1xalflhuuohfp5v33rzedlvb5aknnujceum9knbkrb...@mail.gmail.com
>
> I agree that it's a bug, although seemingly one of low probability. I'd
> like to suggest a patch which introduces:
> 1. only one cleanup process can run at the same time
> 2. if a cleanup called from a regular insert sees a waiting concurrent
>    cleanup, it will stop, in the hope that the other cleanup is one
>    called by vacuum.
>
> The insert-called cleanup process locks the metapage with
> ConditionalLockPage(ExclusiveLock), and if that fails it just goes away,
> because another cleanup is already in progress.  Also, the insert-called
> cleanup process will sometimes lock/conditionally lock the metapage. The
> vacuum-called cleanup process locks with an unconditional LockPage() and
> will not relock the metapage during processing, which guarantees a single
> cleanup process. Relocking lets the insert-called cleanup process leave
> the rest of the work to vacuum if one is running right now. This reduces
> latency for inserts and allows vacuum to be sure that the pending list is
> clear.
>
> Also, patch differentiates which limit of memory to use:
> autovacuum_work_mem, maintenance_work_mem or work_mem depending on call
> path.

Using a heavyweight lock on the metapage seems like a better idea to
me than the previous proposal of using the relation lock.  But I think
it would be best to discuss this on the original thread instead of
starting a new one.
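As a sketch of what the metapage-lock approach might look like on the
insert path (the lmgr calls are real; the surrounding logic is
paraphrased from the patch description, not taken from the patch):

/* Insert-called cleanup: bail out if somebody else already holds the
 * cleanup lock on the metapage.  Vacuum instead takes the lock
 * unconditionally with LockPage() and is thus guaranteed to be the
 * only cleanup running. */
if (!ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock))
	return;					/* another cleanup is in progress */

/* ... flush the pending list into the main index structure ... */

UnlockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);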

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Doubt in pgbench TPS number

2015-09-28 Thread Fabien COELHO


Hello Tom,

> FWIW, I vote with Tatsuo-san.  Such a change will break comparability of
> results

I would not classify a performance measure as a "result compatibility"
issue. What matters is the *correctness* of the results.

When a bug is fixed anywhere in pg it may change performance significantly
for some tests, and I have never seen this as a relevant consideration not
to fix a problem...

> with all previous versions, which means it's not something to do
> in minor releases, even if we now believe the previous results were
> somewhat bogus.  Arguing that it's "not a behavioral change" seems
> quite loony to me: for most people, the TPS numbers are the only
> interesting output of pgbench.

I think that if people are interested in tps without reconnecting on each
transaction they would not run with "-C" to trigger reconnections and then
look at the "tps without connections" figure for the number they want... so
I do not think that keeping this error is worth anything.

On the other hand, and on principle, keeping the bug looks wrong. I cannot
agree with the logic behind shipping something which is known bad,
especially to display *optimistic* performance... It looks too much like
the VW way:-)

> I think we should fix it in 9.5 and document it as an incompatible
> change.

Hmm.

> (Note: I've not read the patch, so this is not an opinion about its
> correctness.)

That is another question:-)

--
Fabien.




Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-09-28 Thread Robert Haas
On Thu, Sep 24, 2015 at 8:37 AM, Masahiko Sawada  wrote:
> When we run "VACUUM;", all the tables of the current database will be
> vacuumed. So pg_stat_vacuum_progress should have these OIDs in order to
> show which table is being vacuumed now.

Hmm, I would tend to instead show the schema & table name, like "foo"."bar".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread David Fetter
On Mon, Sep 28, 2015 at 03:50:29PM +0300, YUriy Zhuravlev wrote:
> On Thursday 24 September 2015 12:10:07 Ryan Pedela wrote:
> > Kam Lasater wrote:
> > > I'd suggest: Github Issues, Pivotal Tracker or Redmine (probably
> > > in that order). There are tens to hundreds of other great ones
> > > out there, I'm sure one of them would also work.
> > 
> > Why not just use Github issues?
> 
> I will also vote for github.  We have github mirror now. 
> 
> In a pinch, you can use gitlab.

Github is a company, and companies go out of business all the time,
frequently without warning.  However healthy it may look right now,
this is a scenario for which we need to have plans.

What is your plan for this contingency?

I assure you that when it's happening, the very last thing on the mind
of the people winding down the company is whether some project's bug
tracker gets properly exported and its functionality ported to another
compatible system.

> PS mail lists outdated IMHO.

They may well be, but until we decide it's worth the switching costs
to move to a totally different way of doing things, that system will
stay in place.

There are a good many decisions I would make differently if I were
waving a magic wand over a green field project.  Neither magic wands
nor a green field project are in operation here.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] BRIN indexes for MAX, MIN, ORDER BY?

2015-09-28 Thread Heikki Linnakangas

On 09/28/2015 05:28 PM, Marti Raudsepp wrote:
> Note that Alexander Korotkov already started work in 2013 on a
> somewhat similar feature called partial sort:
> http://www.postgresql.org/message-id/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=gon-...@mail.gmail.com
>
> In particular, see the 2nd patch for KNN sort -- it uses known
> bounding boxes from the GiST index for sorting, which in many ways is
> similar to min/max BRIN ranges.
>
>> *partial-knn-1.patch*
>>
>> KNN-GiST provides ability to get ordered results from index, but this order
>> is based only on index information. For instance, GiST index contains
>> bounding rectangles for polygons, and we can't get exact distance to
>> polygon from index (similar situation is in PostGIS). In attached patch,
>> GiST distance method can set recheck flag (similar to consistent method).
>> This flag means that distance method returned lower bound of distance and
>> we should recheck it from heap.
>
> Unfortunately this work has stalled.

No, that was actually committed for 9.5. It's this item in the release
notes:

> Allow queries to perform accurate distance filtering of
> bounding-box-indexed objects (polygons, circles) using GiST indexes
> (Alexander Korotkov, Heikki Linnakangas)
>
> Previously, a common table expression was required to return a large
> number of rows ordered by bounding-box distance, and then filtered
> further with a more accurate non-bounding-box distance calculation.

- Heikki




Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-28 Thread Shulgin, Oleksandr
On Fri, Sep 25, 2015 at 7:13 PM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

>
> Some implementation details:
>
> For every backend that might be running (MaxBackends) we reserve a
> dsm_handle slot in the addins shared memory.  When the new option is turned
> on, the ExecutorRun hook will produce a plan in whatever format is
> specified by the auto_explain.log_format, allocate a DSM segment, copy the
> plan into the segment and finally store the DSM handle into its own slot.
> No locking is required around this because every backend writes to its slot
> exclusively, no other backend can be writing into it concurrently.  In the
> ExecutorFinish hook we invalidate the backend's slot by setting the handle
> to 0 and deallocate the DSM segment.
>

And this patch adds back the use of a table of contents in the DSM.
Benefits: magic-number validation and communication of the actual plan size.

--
Alex
From 9294995e200fca2abdccf0825ef852f26df1f4a2 Mon Sep 17 00:00:00 2001
From: Oleksandr Shulgin 
Date: Mon, 28 Sep 2015 16:20:42 +0200
Subject: [PATCH 2/2] Add SHM table of contents to the explain DSM

By doing so we achieve two things:

1) verify the actual contents of the segment we've attached to, by
enforcing the magic number check;

2) transmit the actual size of the payload, so the receiving side can
allocate and parse in process-local memory, detaching from the DSM
early.
---
 contrib/auto_explain/auto_explain.c | 59 -
 1 file changed, 51 insertions(+), 8 deletions(-)

diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 0fa123f..ac6b750 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -23,6 +23,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "storage/shmem.h"
+#include "storage/shm_toc.h"
 #include "utils/guc.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
@@ -48,6 +49,7 @@ static const struct config_enum_entry format_options[] = {
 	{NULL, 0, false}
 };
 
+#define PG_EXPLAIN_BACKEND_MAGIC	0x79fb2449
 
 /* Shared memory structs */
 static dsm_handle *published_plan_handles = NULL;
@@ -455,18 +457,40 @@ explain_query(QueryDesc *queryDesc)
 static void
 explain_publish_plan(QueryDesc *queryDesc)
 {
-	StringInfo		str = explain_query(queryDesc);
+	StringInfo		str;
+	Size			plan_size;
+	shm_toc_estimator	toc_estimator;
+	Size			seg_size;
 	dsm_segment	   *dsm;
+	shm_toc		   *toc = NULL;
 
 	attach_published_plan_handles();
 
-	dsm = dsm_create(str->len + 1, DSM_CREATE_NULL_IF_MAXSEGMENTS);
+	str = explain_query(queryDesc);
+	plan_size = str->len + 1;
+
+	shm_toc_initialize_estimator(&toc_estimator);
+	shm_toc_estimate_keys(&toc_estimator, 2);	/* size, plan */
+	shm_toc_estimate_chunk(&toc_estimator, sizeof(Size));
+	shm_toc_estimate_chunk(&toc_estimator, plan_size);
+	seg_size = shm_toc_estimate(&toc_estimator);
+
+	dsm = dsm_create(seg_size, DSM_CREATE_NULL_IF_MAXSEGMENTS);
 	if (dsm)
 	{
-		char	   *buffer = (char *) dsm_segment_address(dsm);
+		Size	   *size_ptr;
+		char	   *buffer;
 		dsm_handle	handle;
 
-		memcpy(buffer, str->data, str->len + 1);
+		toc = shm_toc_create(PG_EXPLAIN_BACKEND_MAGIC, dsm_segment_address(dsm), seg_size);
+
+		size_ptr = (Size *) shm_toc_allocate(toc, sizeof(Size));
+		*size_ptr = plan_size;
+		shm_toc_insert(toc, 0, size_ptr);
+
+		buffer = (char *) shm_toc_allocate(toc, plan_size);
+		memcpy(buffer, str->data, plan_size);
+		shm_toc_insert(toc, 1, buffer);
 
 		MyPublishedPlanSegment = dsm;
 
@@ -570,9 +594,30 @@ pg_explain_backend(PG_FUNCTION_ARGS)
 	{
 		PG_TRY();
 		{
-			const char	   *p = (const char *) dsm_segment_address(seg);
+			shm_toc		   *toc;
+			Size			plan_size;
+			char		   *buffer;
+			const char	   *p;
+
+			toc = shm_toc_attach(PG_EXPLAIN_BACKEND_MAGIC, dsm_segment_address(seg));
+			if (toc == NULL)
+ereport(ERROR,
+		(errmsg("bad magic in dynamic memory segment")));
+
+			plan_size = *((Size *) shm_toc_lookup(toc, 0));
+
+			buffer = palloc(plan_size);
+			memcpy(buffer, shm_toc_lookup(toc, 1), plan_size);
+
+			/* we can detach DSM already, but only if it's not our own one */
+			if (handle)
+			{
+handle = 0;		/* avoid possible extra detach in PG_CATCH */
+dsm_detach(seg);
+			}
 
 			/* break into tuples on newline */
+			p = buffer;
 			while (*p)
 			{
 const char *q = strchr(p, '\n');
@@ -592,9 +637,7 @@ pg_explain_backend(PG_FUNCTION_ARGS)
 p += len + 1;
 			}
 
-			/* detach DSM, but only if it's not our own one */
-			if (handle)
-dsm_detach(seg);
+			pfree(buffer);
 		}
 		PG_CATCH();
 		{
-- 
2.1.4




[HACKERS] Re: BRIN Scan: Optimize memory allocation in function 'bringetbitmap'

2015-09-28 Thread zhangjinyu
Sorry, I forgot to attach the patch, but I have sent it in another thread.
Please see the thread "Patch: Optimize memory allocation in function
'bringetbitmap'".





--
View this message in context: 
http://postgresql.nabble.com/BRIN-Scan-Optimize-memory-allocation-in-function-bringetbitmap-tp5867536p5867648.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread YUriy Zhuravlev
On Monday 28 September 2015 08:23:46 David Fetter wrote:
> They may well be, but until we decide it's worth the switching costs
> to move to a totally different way of doing things, that system will
> stay in place.
Until we decide we're wasting time

>Neither magic wands nor a green field project are in operation here.
Now any stick will help. IMHO

-- 
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] pageinspect patch, for showing tuple data

2015-09-28 Thread Nikolay Shaplov
In a message of 26 September 2015 20:57:25, you wrote:

> >> Thanks! I just had a short look at it:
> >> - I am not convinced that it is worth declaring 3 versions of
> >> tuple_data_split.
> > 
> > How which of them should we leave?
> 
> The one with the most arguments. Now perhaps we could have as well two
> of them: one which has the minimum number of arguments with default
> values, and the second one with the maximum number of arguments.

I've been thinking about this warning mode for a while...

I added this argument to heap_page_item_attrs because, if somebody would
like to force it to parse fake data, it would be really difficult to
provide proper fake data from one heap that perfectly fits the tuple
descriptor of another heap. So to do this you have to lower the error
level.

But since we now actually parse data with tuple_data_split, we can provide
precisely formed fake information, so you are not limited by how the data
is actually stored in the page. You just pass any arguments you want. So
you do not need the warning mode anymore.

So maybe I should get rid of the "warning mode" now.

Concerning the do_detoast argument, I do not have any clear opinion. I
think that tuple_data_split is a kind of internal function, so it is OK if
it is not so convenient to use.

But in heap_page_item_attrs it should be optional, because it is used from
the psql console, and in most cases the user does not want deTOASTing.

So if do_detoast is optional in one function, maybe it would be good to
have it optional in the other functions to keep them similar...

So, summarising: if you want to make things simpler, I would first get rid
of warning_mode entirely, and then, if you insist, make do_detoast
non-optional.

> 
> >> +t_infomask2
> >> +integer
> >> +stores number of attributes (11 bits of the word). The
> >> rest are used for flag bits:
> >> +
> >> +0x2000 - tuple was updated and key cols modified, or tuple deleted
> >> +0x4000 - tuple was HOT-updated
> >> +0x8000 - this is heap-only tuple
> >> +0xE000 - visibility-related bits (so called "hint bits")
> >> +
> >> This large chunk of documentation is a duplicate of storage.sgml. If
> >> that's really necessary, it looks adapted to me to have more detailed
> >> comments at code level directly in heapfuncs.c.
> > 
> > Hm... There is no explanation of t_infomask/t_infomask2 bits in
> > storage.sgml.
> > 
> > If there is no source of information other than the source code, then the
> > documentation is not good.
> 
> pageinspect is already something that works at low-level. My guess is
> that users of this module are going to have a look at the code either
> way to understand how tuple data is manipulated within a page.


> > So I would consider two options: Either move t_infomask/t_infomask2
> > details
> > into storage.sgml or leave as it is.
> 
> The documentation redirects the reader to look at htup_details.h (the
> documentation is actually incorrect, I sent a separate patch)
I do not see any redirect at
http://www.postgresql.org/docs/9.4/static/pageinspect.html

> , and I
> see no point in duplicating such low-level descriptions, that would be
> double maintenance for the same result.
> 
> > I am lazy, and do not feel confident about touching the main documentation,
> > so I would prefer to leave it as it is.
> 
> Your patch is proving the contrary :) Honestly I would just rip out
> the part you have added to describe all the fields related to
> HeapTupleHeaderData, and have only a simple self-contained example to
> show how to use the new function tuple_data_split.
Maybe. I will think about it a little bit more, and discuss it with my
friends.

> >> The example of tuple_data_split in the docs would be more interesting
> >> if embedded with a call to heap_page_items.
> > 
> > This example would be almost unreadable. Maybe I should add one more
> > phrase explaining where we take the arguments for tuple_data_split from.
> 
> I would as well just remove heap_page_item_attrs, an example in the
> docs would be just enough IMO (note that I never mind being outvoted).

This is a function that a pageinspect user will definitely use. And as a
user I would like to see the function's output in the documentation the way
I would see it live, in order to be better prepared to see it in real life.
There are limits, however; get_raw_page output should not be shown in the
documentation ;-) heap_page_item_attrs is close to these limits, but does
not reach them, in my opinion... So I'd prefer to keep it there.

> Btw, shouldn't the ereport messages in tuple_data_split use
> error_level instead of ERROR?
These errors are in the part where I parse t_bits from text back into a bit
representation. There is no chance of non-strict behavior here, since there
is no way to guess what we should do if we meet characters other than '0'
or '1', or what to do if there are only 7 characters where we should write
a whole byte, etc...
Moreover, as I wrote above, maybe we just do not need warning_mode at all.

Re: [HACKERS] pgbench stats per script & other stuff

2015-09-28 Thread Fabien COELHO



>> Here is a v10, which is a rebase because of the "--progress-timestamp"
>> option addition.
>
> I do not see it attached.

Indeed. Here it is.

--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 0ac40f1..e3a90e5 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -261,6 +261,25 @@ pgbench  options  dbname
 benchmarking arguments:
 
 
+ 
+  -b scriptname[@weight]
+  --builtin scriptname[@weight]
+  
+   
+Add the specified builtin script to the list of executed scripts.
+An optional integer weight after @ allows to adjust the
+probability of drawing the test.
+Available builtin scripts are: tpcb-like,
+simple-update and select-only.
+The provided scriptname needs only to be a prefix
+of the builtin name, hence simp would be enough to select
+simple-update.
+With special name list, show the list of builtin scripts
+and exit immediately.
+   
+  
+ 
+
 
  
   -c clients
@@ -307,14 +326,15 @@ pgbench  options  dbname
  
 
  
-  -f filename
-  --file=filename
+  -f filename[@weight]
+  --file=filename[@weight]
   

-Read transaction script from filename.
+Add a transaction script read from filename to
+the list of executed scripts.
+An optional integer weight after @ allows to adjust the
+probability of drawing the test.
 See below for details.
--N, -S, and -f
-are mutually exclusive.

   
  
@@ -404,10 +424,7 @@ pgbench  options  dbname
   --skip-some-updates
   

-Do not update pgbench_tellers and
-pgbench_branches.
-This will avoid update contention on these tables, but
-it makes the test case even less like TPC-B.
+Shorthand for -b simple-update@1.

   
  
@@ -512,9 +529,9 @@ pgbench  options  dbname
 Report the specified scale factor in pgbench's
 output.  With the built-in tests, this is not necessary; the
 correct scale factor will be detected by counting the number of
-rows in the pgbench_branches table.  However, when testing
-custom benchmarks (-f option), the scale factor
-will be reported as 1 unless this option is used.
+rows in the pgbench_branches table.
+However, when testing only custom benchmarks (-f option),
+the scale factor will be reported as 1 unless this option is used.

   
  
@@ -524,7 +541,7 @@ pgbench  options  dbname
   --select-only
   

-Perform select-only transactions instead of TPC-B-like test.
+Shorthand for -b select-only@1.

   
  
@@ -674,7 +691,20 @@ pgbench  options  dbname
   What is the Transaction Actually Performed in pgbench?
 
   
-   The default transaction script issues seven commands per transaction:
+   Pgbench executes test scripts chosen randomly from a specified list.
+   They include built-in scripts with -b and
+   user-provided custom scripts with -f.
+   Each script may be given a relative weight specified after a
+   @ so as to change its drawing probability.
+   The default weight is 1.
+ 
+
+  
+   The default builtin transaction script (also invoked with -b tpcb-like)
+   issues seven commands per transaction over randomly chosen aid,
+   tid, bid and balance.
+   The scenario is inspired by the TPC-B benchmark, but is not actually TPC-B,
+   hence the name.
   
 
   
@@ -688,9 +718,15 @@ pgbench  options  dbname
   
 
   
-   If you specify -N, steps 4 and 5 aren't included in the
-   transaction.  If you specify -S, only the SELECT is
-   issued.
+   If you select the simple-update builtin (also -N),
+   steps 4 and 5 aren't included in the transaction.
+   This will avoid update contention on these tables, but
+   it makes the test case even less like TPC-B.
+  
+
+  
+   If you select the select-only builtin (also -S),
+   only the SELECT is issued.
   
  
 
@@ -702,10 +738,7 @@ pgbench  options  dbname
benchmark scenarios by replacing the default transaction script
(described above) with a transaction script read from a file
(-f option).  In this case a transaction
-   counts as one execution of a script file.  You can even specify
-   multiple scripts (multiple -f options), in which
-   case a random one of the scripts is chosen each time a client session
-   starts a new transaction.
+   counts as one execution of a script file.
   
 
   
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 6ae1b86..0e56ed7 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -164,12 +164,11 @@ bool		use_log;			/* log transaction latencies to a file */
 bool		use_quiet;			/* quiet logging onto stderr */
 int			agg_interval;		/* log aggregates instead of individual
  * transactions */
+bool  

[HACKERS] [PATCH] Skip ALTER x SET SCHEMA if the schema didn't change

2015-09-28 Thread Marti Raudsepp
Hi list

The attached patch changes the behavior of multiple ALTER x SET SCHEMA
commands to skip, rather than fail, when the old and new schemas are
the same.

The advantage is that it's now easier to write DDL scripts that are idempotent.

This already matches the behavior of ALTER EXTENSION SET SCHEMA in
earlier versions, as well as many other SET-ish commands, e.g. ALTER
TABLE SET TABLESPACE, OWNER TO, CLUSTER ON, SET (storage_parameter...)
etc. I don't see why SET SCHEMA should be treated any differently.

The code is written such that object_access_hook is still called for
each object.

Regression tests included. I couldn't find any documentation that
needs changing.
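The shape of the change (my paraphrase, since the patch is attached as
a file rather than inline) is an early exit in the shared
AlterObjectNamespace path when the object is already in the requested
schema:

/* Hypothetical sketch; oldNspOid/nspOid naming follows alter.c. */
if (oldNspOid == nspOid)
	return oldNspOid;	/* already in the target schema: skip, don't fail */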

Regards,
Marti


0001-Skip-ALTER-x-SET-SCHEMA-if-the-schema-didn-t-change.patch
Description: binary/octet-stream



Re: [HACKERS] Patch: Optimize memory allocation in function 'bringetbitmap'

2015-09-28 Thread zhangjinyu
Yes, I forgot to disable cassert in the last test.
The following shows the test results with asserts disabled.
I think it's still worthwhile.
*After optimize code (warm run)*
postgres=# select count(*) from lineitem where l_orderkey=1;
 count 
---
 6
(1 row)

Time: 327.143 ms
*Before optimizing code (warm run)*
postgres=# select count(*) from lineitem where l_orderkey=1;
 count 
---
 6
(1 row)

Time: 404.323 ms

> However I wonder if it would be simpler to have the dtup structure hold
> the pointers, so that you can pass it as NULL in the first call and then
> follow-up calls reuse the one allocated in the first call.

Jinyu: the memory is allocated from perRangeCxt, and perRangeCxt is reset
inside the loop, so this approach doesn't work.
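For readers without the code at hand, the objection is that
bringetbitmap switches into a per-range memory context that is reset at
every iteration, roughly like this (paraphrased from brin.c):

oldcxt = MemoryContextSwitchTo(perRangeCxt);
for (heapBlk = 0; heapBlk < nblocks; heapBlk += pagesPerRange)
{
	/* Anything palloc'd below disappears here on the next iteration,
	 * so pointers cannot simply be allocated once outside the loop. */
	MemoryContextResetAndDeleteChildren(perRangeCxt);

	/* ... fetch the range's summary tuple and test the scan keys ... */
}
MemoryContextSwitchTo(oldcxt);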

Jinyu Zhang



--
View this message in context: 
http://postgresql.nabble.com/Patch-Optimize-memory-allocation-in-function-bringetbitmap-tp5867537p5867647.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.




Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-28 Thread Robert Haas
On Fri, Sep 25, 2015 at 3:41 AM, Andreas Seltenreich
 wrote:
> I think the intention was to make configure complain if there's a -O > 2
> in CFLAGS.

-1 on that idea.  I really don't think that we should categorically
decide we don't support higher optimization levels.  If the compiler
has a bug, then the compiler manufacturer should fix it, and it's not
our fault.  If the compiler doesn't have a bug and our stuff is
blowing up, then we have a bug and should fix it.  I suppose there
could be some grey area but hopefully not too much.

> OTOH, a unit test for multixact.c that exercises the code including
> wraparounds sounds like a desirable thing regardless of the fact that it
> could have caught this miscompilation earlier than 6 months into
> production.

Definitely.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Partial Aggregation / GROUP BY before JOIN

2015-09-28 Thread Amit Langote
On 2015/09/28 13:31, David Rowley wrote:
> I've been spending time working on allowing the planner to perform
> aggregation before the final join relation is created.
>
...

>
> The patch is however so far capable of giving us extremely nice performance
> improvements for some (likely artificial) queries.
> 
> Let's look at a quick example:
> 
> CREATE TABLE product (product_id INT NOT NULL,product_code VARCHAR(64) NOT
> NULL, PRIMARY KEY(product_id));
> CREATE UNIQUE INDEX product_product_code_uidx ON product (product_code);
> -- create small list of products
> INSERT INTO product SELECT g.id,'ABC' || CAST(g.id AS TEXT) FROM
> generate_series(1,100) g(id);
> 
> CREATE TABLE sale (sale_id INT NOT NULL, product_id INT NOT NULL, quantity
> INT NOT NULL);
> 
> INSERT INTO sale (sale_id, product_id,quantity) SELECT
> x.x,x.x%100+1,CAST(random() * 1000 AS INT) FROM
> generate_series(1,1) x(x);
> 
> ALTER TABLE sale ADD CONSTRAINT sale_pkey PRIMARY KEY(sale_id);
> 
> test=# SELECT count(sale.sale_id) FROM sale, product;
> count
> -
>  100
> (1 row)
> Time: 10323.053 ms
> 
> 
> And if I disable the optimisation:
> 
> test=# set enable_earlygrouping = off;
> SET
> Time: 0.302 ms
> test=# SELECT count(sale.sale_id) FROM sale, product;
> count
> -
>  100
> (1 row)
> Time: 775790.764 ms
> 
> So, in this probably rather unlikely query, we get something around a 7500%
> performance increase. Of course as the ratio of groups per underlying
> tuples increase, the performance increase will tail off.
> 
> The explain output from the optimised version is as follows:
> 
>  QUERY PLAN
> 
>  Finalize Aggregate  (cost=1790544.37..1790544.38 rows=1 width=4)
>->  Nested Loop  (cost=1790541.10..1790544.12 rows=100 width=4)
>  ->  Partial Aggregate  (cost=1790541.10..1790541.11 rows=1 width=4)
>->  Seq Scan on sale  (cost=0.00..1540541.08 rows=10008
> width=4)
>  ->  Seq Scan on product  (cost=0.00..2.00 rows=100 width=0)
> 
> 

Did you perhaps attach a version of the patch you didn't intend to?

I get the following plan and hence a different result from what's shown above:

postgres=# EXPLAIN SELECT count(sale.sale_id) FROM sale, product;
   QUERY PLAN


 Aggregate  (cost=17909.27..17909.28 rows=1 width=4)
   ->  Nested Loop  (cost=17906.00..17909.02 rows=100 width=4)
 ->  Aggregate  (cost=17906.00..17906.01 rows=1 width=4)
   ->  Seq Scan on sale  (cost=0.00..15406.00 rows=100
width=4)
 ->  Seq Scan on product  (cost=0.00..2.00 rows=100 width=0)

postgres=# SELECT count(sale.sale_id) FROM sale, product;
 count
---
   100
(1 row)

postgres=# set enable_earlygrouping = off;
SET

postgres=# EXPLAIN SELECT count(sale.sale_id) FROM sale, product;
QUERY PLAN
---
 Aggregate  (cost=1515408.25..1515408.26 rows=1 width=4)
   ->  Nested Loop  (cost=0.00..1265408.25 rows=1 width=4)
 ->  Seq Scan on sale  (cost=0.00..15406.00 rows=100 width=4)
 ->  Materialize  (cost=0.00..2.50 rows=100 width=0)
   ->  Seq Scan on product  (cost=0.00..2.00 rows=100 width=0)
(5 rows)

postgres=# SELECT count(sale.sale_id) FROM sale, product;
   count
---
 1
(1 row)

Am I missing something?

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doubt in pgbench TPS number

2015-09-28 Thread Fabien COELHO


Hello Tatsuo-san,


>> I think that the degree of parallelism to consider is nclients, not
>> nthreads: while connection time is accumulated in conn_time, other
>> clients are possibly doing their transactions, in parallel,

> I'm not sure about this. I think pgbench will not start transactions
> until all clients establish connections to PostgreSQL.


I think that is true if "!is_connect": all client connections are
performed at the beginning of threadRun. But under -C each client has its
own connect/disconnect integrated within doCustom, so it is done in
parallel with other clients having their transactions processed, hence the
issue with the formula.



> I found this while playing with pgpool-II. pgpool-II pre-forks child
> processes, whose number is defined by the "num_init_children"
> directive. What I observed was that pgbench keeps connecting to pgpool-II
> until "-c" connections are established. So, if "-c" is larger than
> "num_init_children", no transaction starts.


Yep, unless -C, I think, where some clients may start nevertheless? Not
sure... It does not matter.



> I have tested your patch. It seems the tolerance is much better than
> before with your patch.


Yep.


> I'm going to commit your patch if there's no objection.


This seems fine to me.

The formula change, and just this one, should probably be backported
somehow, as this is a bug, wherever pgbench resides in older versions. It
is just 's/nthreads/nclients/' in the printResults formula for computing
tps_exclude.
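
For reference, the corrected computation in printResults would then read
roughly as follows (a sketch based on the 9.5-era pgbench.c; surrounding
variable names may differ in older branches):

    time_include = INSTR_TIME_GET_DOUBLE(total_time);
    tps_include = normal_xacts / time_include;
    /* conn_total_time accumulates per-client connection time, and
     * clients connect in parallel, so divide by nclients, not nthreads */
    tps_exclude = normal_xacts / (time_include -
            (INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));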


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-09-28 Thread Kouhei Kaigai
> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> Sent: Saturday, September 12, 2015 1:39 AM
> To: Etsuro Fujita
> Cc: Kaigai Kouhei(海外 浩平); PostgreSQL-development; 花田茂
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
> 
> On Thu, Sep 10, 2015 at 11:36 PM, Etsuro Fujita
>  wrote:
> > I've proposed the following API changes:
> >
> > * I modified create_foreignscan_path, which is called from
> > postgresGetForeignJoinPaths/postgresGetForeignPaths, so that a path,
> > subpath, is passed as the eighth argument of the function. subpath
> > represents a local join execution path if scanrelid==0, but NULL if
> > scanrelid>0.
> 
> OK, I see now.  But I don't much like the way
> get_unsorted_unparameterized_path() looks.
> 
> First, it's basically praying that MergePath, NodePath, and NestPath
> can be flat-copied without breaking anything.  In general, we have
> copyfuncs.c support for nodes that we need to be able to copy, and we
> use copyObject() to do it.  Even if what you've got here works today,
> it's not very future-proof.
> 
> Second, what guarantee do we have that we'll find a path with no
> pathkeys and a NULL param_info?  Why can't all of the paths for a join
> relation have pathkeys?  Why can't they all be parameterized?  I can't
> think of anything that would guarantee that.
> 
> Third, even if such a guarantee existed, why is this the right
> behavior?  Any join type will produce the same output; it's just a
> question of performance.  And if you have only one tuple on each side,
> surely a nested loop would be fine.
> 
> It seems to me that what you ought to be doing is using data hung off
> the fdw_private field of each RelOptInfo to cache a NestPath that can
> be used for EPQ rechecks at that level.  When you go to consider
> pushing down another join, you can build up a new NestPath that's
> suitable for the new level.  That seems much cleaner than groveling
> through the list of surviving paths and hoping you find the right kind
> of thing.
>
> And all that having been said, I still don't really understand why you
> are resisting the idea of providing a callback so that the FDW can
> execute arbitrary code in the recheck path.  There doesn't seem to be
> any reason not to let the FDW take control of the rechecks if it
> wishes, and there's no real cost in complexity that I can see.
>
The discussion has been pending for two weeks, even though we put this
problem on the open items list for v9.5; that means we recognize it as a
problem to be fixed by the v9.5 release.

The attached patch allows the FDW driver to handle EPQ rechecks in its own
preferred way, whether that is an alternative local join or ExecQual on
the expressions being pushed down.
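
For concreteness, such a recheck callback might look roughly as follows.
This is a hypothetical sketch only, not the attached patch itself;
ExampleFdwState, pushed_quals and local_join_planstate are names invented
for illustration:

    static bool
    exampleRecheckForeignScan(ForeignScanState *node, TupleTableSlot *slot)
    {
        ExampleFdwState *fss = (ExampleFdwState *) node->fdw_state;
        ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;

        if (fsplan->scan.scanrelid > 0)
        {
            /* Base foreign relation: re-evaluate the pushed-down quals
             * against the EPQ tuple already stored in "slot". */
            ExprContext *econtext = node->ss.ps.ps_ExprContext;

            econtext->ecxt_scantuple = slot;
            return ExecQual(fss->pushed_quals, econtext, false);
        }
        else
        {
            /* Pushed-down join: re-run the alternative local join plan
             * (an unparameterized nestloop) over the EPQ tuples and copy
             * the surviving row, if any, into "slot". */
            TupleTableSlot *result = ExecProcNode(fss->local_join_planstate);

            if (TupIsNull(result))
                return false;
            ExecCopySlot(slot, result);
            return true;
        }
    }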

Regarding the alternative join path selection, I initially thought it
would be valuable to choose the best path from a performance standpoint.
However, what we need to do here is a visibility check against the EPQ
tuples already loaded into the EState, so an unparameterized NestLoop is
sufficient to execute the qualifiers across relations.
(What happens if a HashJoin is chosen? That is probably problematic.)

So, if your modified postgres_fdw keeps an alternative path, what we need
to do is construct a dummy NestPath with no param_info, no pathkeys, and
dummy cost, then attach this path to the fdw_paths of the ForeignPath. It
will be transformed into plan nodes, then eventually into a plan-state
node by postgres_fdw itself.
I can no longer see anything difficult here.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


pgsql-fdw-epq-recheck.v2.patch
Description: pgsql-fdw-epq-recheck.v2.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doubt in pgbench TPS number

2015-09-28 Thread Tatsuo Ishii
> I have tested your patch. It seems the tolerance is much better than
> before with your patch.

[snip]

> I'm going to commit your patch if there's no objection.

I think we should commit this to master and the 9.5 stable branch only,
because the fix significantly changes the benchmark results of pgbench.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doubt in pgbench TPS number

2015-09-28 Thread Tatsuo Ishii
>> I'm not sure about this. I think pgbench will not start transactions
>> until all clients establish connections to PostgreSQL.
> 
> I think that is true if "!is_connect": all client connections are
> performed at the beginning of threadRun. But under -C each client has
> its own connect/disconnect integrated within doCustom, so it is done in
> parallel with other clients having their transactions processed, hence
> the issue with the formula.

Really?

I have tested with pgpool-II configured to accept up to 2 connections,
then ran pgbench with -C and -c 32. pgbench was blocked as expected, and I
attached gdb and got a stack trace:

(gdb) bt
#0  0x7f48d5f17110 in __poll_nocancel ()
at ../sysdeps/unix/syscall-template.S:81
#1  0x7f48d6724056 in pqSocketCheck ()
   from /usr/local/src/pgsql/current/lib/libpq.so.5
#2  0x7f48d6724940 in pqWaitTimed ()
   from /usr/local/src/pgsql/current/lib/libpq.so.5
#3  0x7f48d671f3e2 in connectDBComplete ()
   from /usr/local/src/pgsql/current/lib/libpq.so.5
#4  0x7f48d671fbcf in PQconnectdbParams ()
   from /usr/local/src/pgsql/current/lib/libpq.so.5
#5  0x00402b2b in doConnect () at pgbench.c:650
#6  0x00404591 in doCustom (thread=0x25c2f40, st=0x25c2770, 
conn_time=0x25c2f90, logfile=0x0, agg=0x7fffba224260) at pgbench.c:1353
#7  0x0040a1d5 in threadRun (arg=0x25c2f40) at pgbench.c:3581
#8  0x00409ab4 in main (argc=12, argv=0x7fffba224668) at pgbench.c:3455

As you can see, one of threads wants to connect to PostgreSQL
(actually pgpool-II) and waits for reply.

In threadRun() around line 3581:

/* send start up queries in async manner */
for (i = 0; i < nstate; i++)
{
CState *st = &state[i];
Command   **commands = sql_files[st->use_file];
int prev_ecnt = st->ecnt;

st->use_file = getrand(thread, 0, num_files - 1);
if (!doCustom(thread, st, &thread->conn_time, logfile, &aggs))
remains--;  /* I've aborted */

if (st->ecnt > prev_ecnt && commands[st->state]->type == 
META_COMMAND)
{
fprintf(stderr, "client %d aborted in state %d; 
execution of meta-command failed\n",
i, st->state);
remains--;  /* I've aborted */
PQfinish(st->con);
st->con = NULL;
}
}

Here doCustom() is called with st->con == NULL. In doCustom() around
line 1353:

if (st->con == NULL)
{
instr_time  start,
end;

INSTR_TIME_SET_CURRENT(start);
if ((st->con = doConnect()) == NULL)
{

doConnect() blocks until PostgreSQL (pgpool-II) allows the connection.

Because the outer loop in threadRun() keeps looping until all clients
successfully connect to PostgreSQL, pgbench is blocked here.

/* send start up queries in async manner */
for (i = 0; i < nstate; i++)

>> I'm going to commit your patch if there's no objection.
> 
> This seems fine with me.
> 
> The formula change, and just this one, should probably be backported
> somehow, as this is a bug, wherever pgbench resides in older
> versions. It is just 's/nthreads/nclients/' in the printResult formula
> for computing tps_exclude.

Yeah, that's definitely a bug, but I'm afraid the fix will change the
TPS number and may break backward compatibility. Since we have
lived with this bug for years, I hesitate to back-port it to older stable
branches...

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partial Aggregation / GROUP BY before JOIN

2015-09-28 Thread David Rowley
On 28 September 2015 at 20:36, Amit Langote 
wrote:

>
> Did you perhaps attach a version of the patch you didn't intend to?
>

Oops. It seems so.

Please find the correct version attached.

Thanks for checking and letting me know.

--
David Rowley   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Training & Services


group_before_join_2015-09-28_c059c53.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] optimizing vacuum truncation scans

2015-09-28 Thread Haribabu Kommi
On Tue, Aug 4, 2015 at 2:18 AM, Jeff Janes  wrote:
> On Mon, Jul 27, 2015 at 1:40 PM, Simon Riggs  wrote:
>>
>> On 22 July 2015 at 17:11, Jeff Janes  wrote:
>>>
>>> On Wed, Jul 22, 2015 at 6:59 AM, Robert Haas 
>>> wrote:

 On Mon, Jun 29, 2015 at 1:54 AM, Jeff Janes 
 wrote:
 > Attached is a patch that implements the vm scan for truncation.  It
 > introduces a variable to hold the last blkno which was skipped during
 > the
 > forward portion.  Any blocks after both this blkno and after the last
 > inspected nonempty page (which the code is already tracking) must have
 > been
 > observed to be empty by the current vacuum.  Any other process
 > rendering the
 > page nonempty are required to clear the vm bit, and no other process
 > can set
 > the bit again during the vacuum's lifetime.  So if the bit is still
 > set, the
 > page is still empty without needing to inspect it.

 Urgh.  So if we do this, that forever precludes having HOT pruning set
 the all-visible bit.
>>>
>>>
>>> I wouldn't say forever, as it would be easy to revert the change if
>>> something more important came along that conflicted with it.
>>
>>
>> I think what is being said here is that someone is already using this
>> technique, or if not, then we actively want to encourage them to do so as an
>> extension or as a submission to core.
>>
>> In that case, I think the rely-on-VM technique sinks again, sorry Jim,
>> Jeff. Probably needs code comments added.
>
>
> Sure, that sounds like the consensus.  The VM method was very efficient, but
> I agree it is pretty fragile and restricting.
>
>>
>>
>> That does still leave the prefetch technique, so all is not lost.
>>
>> Can we see a patch with just prefetch, probably with a simple choice of
>> stride? Thanks.
>
>
> I probably won't get back to it this commit fest, so it can be set to
> returned with feedback.  But if anyone has good ideas for how to set the
> stride (or detect that it is on SSD and so is pointless to try) I'd love to
> hear about them anytime.

I found the following way to determine whether a data file resides on a
spinning disk or an SSD.

1. Get the device that the table's data file is mapped to, using the
following or something similar.

df -P "filename" | awk 'NR==2{print $1}'

2. If the device is of the form /dev/sd*, then treat it as a disk system
and proceed with the prefetch optimization.

3. If we are not able to find the device details directly, then we need
to get the information from the device-mapper system.

Usually the devices will map like the following

/dev/mapper/v** points to ../dm-*

4. Get the name of the "dm-*" device from the above details and check
whether it is an SSD or not with the following sysfs attribute:

/sys/block/dm-*/queue/rotational

5. If the value is 0 then it is an SSD drive; 1 means a rotating disk drive.

The procedure described above works only on Linux; I haven't checked
other operating systems yet.
Is it worth considering?
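
As a rough illustration of steps 4 and 5, the check can be done by reading
the sysfs attribute directly (a Linux-only sketch with minimal error
handling; resolving the device name is assumed to have been done as
described above):

    #include <stdio.h>

    /* Returns 1 if the block device reports itself as non-rotational
     * (SSD), 0 if rotational (spinning disk), -1 if the attribute
     * cannot be read.  devname is e.g. "sda" or "dm-0". */
    static int
    block_device_is_ssd(const char *devname)
    {
        char    path[128];
        FILE   *f;
        int     rotational;

        snprintf(path, sizeof(path),
                 "/sys/block/%s/queue/rotational", devname);
        f = fopen(path, "r");
        if (f == NULL)
            return -1;
        if (fscanf(f, "%d", &rotational) != 1)
        {
            fclose(f);
            return -1;
        }
        fclose(f);
        return (rotational == 0) ? 1 : 0;
    }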

Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doubt in pgbench TPS number

2015-09-28 Thread Tatsuo Ishii
> I think that the degree of parallelism to consider is nclients, not
> nthreads: while connection time is accumulated in conn_time, other
> clients are possibly doing their transactions, in parallel, 

I'm not sure about this. I think pgbench will not start transactions
until all clients establish connections to PostgreSQL.

I found this while playing with pgpool-II. pgpool-II pre-forks child
processes, whose number is defined by the "num_init_children"
directive. What I observed was that pgbench keeps connecting to pgpool-II
until "-c" connections are established. So, if "-c" is larger than
"num_init_children", no transaction starts.

> even if it
> is in the same thread, so it is not "stopped time" for all clients. It
> starts to matter with "-j 1 -c 30" and slow transactions, the
> cumulated conn_time in each thread may be arbitrary close to the whole
> time if there are many clients.
> 
> Now, I do not think that this tps computation makes that much
> sense. If you want to know the tps without reconnect, run without
> reconnecting... It is clear that I do not get this figure when running
> without -C, so maybe
> the tps with/without reconnection should be dropped?
> 
> Anyway, here is a patch, which:
>  - fixes the "exclude" computation (s/nthreads/nclients/)
>  - fixes the count total for skipped (s/threads/thread/)
>  - removes a useless parameter to doCustom
>(the conn_time is available through the thread parameter).

I have tested your patch. It seems the tolerance is much better than
before with your patch.

With the patch:
./pgbench -C -n -p 11002 -c 10 -T 30 -f test.sql  test
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 30 s
number of transactions actually processed: 2887
latency average: 103.914 ms
tps = 95.896850 (including connections establishing)
tps = 98.101662 (excluding connections establishing)

Without the patch:
./pgbench -C -n -p 11002 -c 10 -T 30 -f test.sql  test
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 30 s
number of transactions actually processed: 2887
latency average: 103.914 ms
tps = 95.919415 (including connections establishing)
tps = 124.732475 (excluding connections establishing)

I'm going to commit your patch if there's no objection.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partial Aggregation / GROUP BY before JOIN

2015-09-28 Thread David Rowley
On 28 September 2015 at 23:17, Amit Langote 
wrote:

> On 2015/09/28 17:04, David Rowley wrote:
> > On 28 September 2015 at 20:36, Amit Langote <
> langote_amit...@lab.ntt.co.jp>
> > wrote:
> >
> >>
> >> Did you perhaps attach a version of the patch you didn't intend to?
> >>
> >
> > Oops. It seems so.
> >
> > Please find the correct version attached.
>
> Thanks, this one works fine.
>
> By the way, you may have noticed that the append_rel_list would be broken
> if the proposed optimization is applied to an appendrel parent.
>
> CREATE TABLE sale_1() INHERITS(sale);
> CREATE TABLE sale_2() INHERITS(sale);
>
> EXPLAIN SELECT count(*) FROM sale;
>   QUERY PLAN
> --
>  Finalize Aggregate  (cost=0.01..0.02 rows=1 width=0)
>->  Result  (cost=0.00..0.01 rows=1 width=0)
>  One-Time Filter: false
> (3 rows)
>
>
Thanks. I've changed this locally to disable the optimisation in this case.


> Moreover, would partial aggregation work below Append?
>

Do you mean for cases like:

create table a as select x.x a from generate_series(1,100) x(x);
select sum(a) from (select a from a union all select a from a) a;

to allow the aggregation to happen before the append?

On testing this I do see that writing the query as:

select sum(a) from (select sum(a) a from a union all select sum(a) from a)
a;

causes it to execute marginally faster. 174.280 ms vs 153.498 ms on my
laptop.
However pushing aggregation below Append nodes is not something I'm aiming
to do for this patch.

--
 David Rowley   http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-28 Thread Shulgin, Oleksandr
On Mon, Sep 28, 2015 at 12:05 PM, Pavel Stehule 
wrote:

>
> 2015-09-28 12:01 GMT+02:00 Shulgin, Oleksandr <
> oleksandr.shul...@zalando.de>:
>
>> On Sun, Sep 27, 2015 at 8:05 AM, Pavel Stehule 
>> wrote:
>>
>> the preparing of content before execution is interesting idea, that can
>>> be used more. The almost queries and plans are not too big, so when the
>>> size of content is not too big - less than 1MB, then can be used one DSM
>>> for all backends.
>>>
>>
>>
>>> When size of content is bigger than limit, then DSM will be allocated
>>> specially for this content. The pointer to DSM and offset can be stored in
>>> requested process slot. The reading and writing to requested slot should be
>>> protected by spinlock, but it should block only two related processes for
>>> short time (copy memory).
>>>
>>
>> Sorry, I don't think this will fly.
>>
>> The whole idea is that a backend publishes the plan of a query just
>> before running it and it doesn't care which other backend(s) might be
>> reading it, how many times and in which order.  The only required locking
>> (implicit) is contained in the code for dsm_attach/detach().
>>
>
> I didn't propose too different solution. There is only one difference -
> sharing DSM for smaller data. It is similar to using usual shared memory.
>

Does this mean implementing some sort of allocator on top of the shared
memory segment?  If so, how are you going to prevent fragmentation?

--
Alex


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-28 Thread Shulgin, Oleksandr
On Sun, Sep 27, 2015 at 8:05 AM, Pavel Stehule 
wrote:

> the preparing of content before execution is interesting idea, that can be
> used more. The almost queries and plans are not too big, so when the size
> of content is not too big - less than 1MB, then can be used one DSM for all
> backends.
>


> When size of content is bigger than limit, then DSM will be allocated
> specially for this content. The pointer to DSM and offset can be stored in
> requested process slot. The reading and writing to requested slot should be
> protected by spinlock, but it should block only two related processes for
> short time (copy memory).
>

Sorry, I don't think this will fly.

The whole idea is that a backend publishes the plan of a query just before
running it and it doesn't care which other backend(s) might be reading it,
how many times and in which order.  The only required locking (implicit) is
contained in the code for dsm_attach/detach().
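
To make that model concrete, here is a minimal sketch of the intended
publish/read cycle (assuming the 9.5-era dsm API; MyProcSlot is an
invented name for whatever per-backend slot would hold the handle):

    /* Publisher: create a segment, write the plan text, expose the
     * handle.  After this the publisher does not care who reads it. */
    dsm_segment *seg = dsm_create(len + 1, 0);

    memcpy(dsm_segment_address(seg), plan_text, len + 1);
    MyProcSlot->plan_handle = dsm_segment_handle(seg);

    /* Any reader, at any time, any number of times: */
    dsm_segment *rseg = dsm_attach(MyProcSlot->plan_handle);

    if (rseg != NULL)
    {
        char *plan = pstrdup((char *) dsm_segment_address(rseg));

        dsm_detach(rseg);
        /* ... use plan ... */
    }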


> Other possibility is showing the size of content in requested process
> slot. Then the requester can preallocate enough size of shared memory. But
> this doesn't solve a issues related to waiting requester for content. So
> first variant is pretty simple, and should be preferred. The disadvantage
> is clear - it can enforce maybe significant slowdown of fast queries.
>

Both of these approaches have just too many synchronization problems, IMO.

--
Alex


Re: [HACKERS] Doubt in pgbench TPS number

2015-09-28 Thread Fabien COELHO



> #5  0x00402b2b in doConnect () at pgbench.c:650
> #6  0x00404591 in doCustom (thread=0x25c2f40, st=0x25c2770,
>    conn_time=0x25c2f90, logfile=0x0, agg=0x7fffba224260) at pgbench.c:1353
> #7  0x0040a1d5 in threadRun (arg=0x25c2f40) at pgbench.c:3581
> #8  0x00409ab4 in main (argc=12, argv=0x7fffba224668) at pgbench.c:3455


Hmm. OK, whatever the connection position (from doCustom or from
threadRun), doConnect would block the thread. On the other hand, if you
start several threads, those threads whose clients could all connect
would probably proceed.


What seems to be needed is some timeout when connecting to the
server.
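
For what it's worth, such a timeout could be expressed through libpq's
standard connect_timeout parameter; a sketch of what doConnect() might
pass (pgbench would still need an option to carry the value through;
pghost, pgport and dbName are the usual pgbench globals):

    /* connect_timeout makes a stalled server fail the attempt after 10
     * seconds instead of blocking the thread indefinitely. */
    const char *keywords[] = {"host", "port", "dbname",
                              "connect_timeout", NULL};
    const char *values[]   = {pghost, pgport, dbName, "10", NULL};
    PGconn     *conn = PQconnectdbParams(keywords, values, 0);

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
    }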



>> The formula change, and just this one, should probably be backported
>> somehow, as this is a bug, wherever pgbench resides in older
>> versions. It is just 's/nthreads/nclients/' in the printResults formula
>> for computing tps_exclude.

> Yeah, that's definitely a bug, but I'm afraid the fix will change the
> TPS number and may break backward compatibility. Since we have
> lived with this bug for years, I hesitate to back-port it to older stable
> branches...


My 2¥: I cannot think of a good argument for keeping wrong tps numbers once
it is known that they are plainly wrong, especially as this is not a
behavioral change as such which could block applications or whatever, just a
different number printed at the end of a run. So I would not bother much
with backward-compatibility considerations in this case.


--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doubt in pgbench TPS number

2015-09-28 Thread Tatsuo Ishii
>> Yeah, that's definitely a bug, but I'm afraid the fix will change the
>> TPS number and may break backward compatibility. Since we have
>> lived with this bug for years, I hesitate to back-port it to older stable
>> branches...
> 
> My 2¥: I cannot think of a good argument for keeping wrong tps numbers
> once it is known that they are plainly wrong, especially as this is not a
> behavioral change as such which could block applications or whatever,
> just a different number printed at the end of a run. So I would not
> bother much with backward-compatibility considerations in this case.
> 
> -- 
> Fabien.

FYI, the bug has been there since thread support was introduced in commit:

da0dfb4b1460c3701abc8ed5f516d138dc4654c

That was in 9.0, which was released in 2010.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-28 Thread Pavel Stehule
2015-09-28 12:01 GMT+02:00 Shulgin, Oleksandr 
:

> On Sun, Sep 27, 2015 at 8:05 AM, Pavel Stehule 
> wrote:
>
> the preparing of content before execution is interesting idea, that can be
>> used more. The almost queries and plans are not too big, so when the size
>> of content is not too big - less than 1MB, then can be used one DSM for all
>> backends.
>>
>
>
>> When size of content is bigger than limit, then DSM will be allocated
>> specially for this content. The pointer to DSM and offset can be stored in
>> requested process slot. The reading and writing to requested slot should be
>> protected by spinlock, but it should block only two related processes for
>> short time (copy memory).
>>
>
> Sorry, I don't think this will fly.
>
> The whole idea is that a backend publishes the plan of a query just before
> running it and it doesn't care which other backend(s) might be reading it,
> how many times and in which order.  The only required locking (implicit) is
> contained in the code for dsm_attach/detach().
>

I didn't propose a very different solution. There is only one difference:
sharing one DSM segment for smaller data. It is similar to using plain
shared memory.

Regards

Pavel

>
>
>> Other possibility is showing the size of content in requested process
>> slot. Then the requester can preallocate enough size of shared memory. But
>> this doesn't solve a issues related to waiting requester for content. So
>> first variant is pretty simple, and should be preferred. The disadvantage
>> is clear - it can enforce maybe significant slowdown of fast queries.
>>
>
> Both of these approaches have just too many synchronization problems, IMO.
>
> --
> Alex
>
>


Re: [HACKERS] Partial Aggregation / GROUP BY before JOIN

2015-09-28 Thread Amit Langote
On 2015/09/28 17:04, David Rowley wrote:
> On 28 September 2015 at 20:36, Amit Langote 
> wrote:
> 
>>
>> Did you perhaps attach a version of the patch you didn't intend to?
>>
> 
> Oops. It seems so.
> 
> Please find the correct version attached.

Thanks, this one works fine.

By the way, you may have noticed that the append_rel_list would be broken
if the proposed optimization is applied to an appendrel parent.

CREATE TABLE sale_1() INHERITS(sale);
CREATE TABLE sale_2() INHERITS(sale);

EXPLAIN SELECT count(*) FROM sale;
  QUERY PLAN
--
 Finalize Aggregate  (cost=0.01..0.02 rows=1 width=0)
   ->  Result  (cost=0.00..0.01 rows=1 width=0)
 One-Time Filter: false
(3 rows)

Moreover, would partial aggregation work below Append?

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.5: Can't connect with PGSSLMODE=require on Windows

2015-09-28 Thread Tom Lane
Thom Brown  writes:
> With 9.5 alpha 2 on Windows 8 (64-bit), trying to require SSL results
> in a blocking error:

I've pushed a patch for this; can you verify it on Windows?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-09-28 Thread Jim Nasby

On 9/18/15 5:05 AM, Shulgin, Oleksandr wrote:


>> It should not be true - the data sender creates the DSM and fills it,
>> then sets the caller's slot and sends a signal to the caller. The caller
>> can free the DSM at any time, because the data sender never touches it
>> again.

> But the requester can timeout on waiting for reply and exit before it
> sees the reply DSM.  Actually, I now don't even think a backend can free
> the DSM it has not created.  First it will need to attach it,
> effectively increasing the refcount, and upon detach it will only
> decrease the refcount, but not actually release the segment...
>
> So this has to be the responsibility of the reply-sending backend in the
> end: to create and release the DSM *at some point*.


What's wrong with just releasing it at the end of the statement? When
the statement is done there's no point in reading it asynchronously anymore.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread Tom Lane
Jim Nasby  writes:
> On 9/28/15 11:43 AM, Andres Freund wrote:
>> It has been stated pretty clearly in this thread by a number of senior
>> community people that we're not going to use a closed source system.

> GitLab OTOH is released under a MIT license, so it is an option. I don't 
> know how it compares to other suggested options, but if YUriy wants to 
> propose it it's at least a viable option.

I think a more accurate summary of what's been said is that we won't
consider putting any important functionality on proprietary platforms,
of which closed-source tools would be a subset.  The intention of the
community is that we'll be around for as long as there's a critical mass
of people interested in maintaining Postgres.  We will not be dependent
on any one company, and that's why e.g. github is out.  (A lot of smaller
open-source projects don't have the luxury of rejecting such options ...
but we do, and we will.)

Now, running gitlab on community-owned hardware would potentially be an
option, if we find gitlab attractive from a functionality standpoint.
The question I'd have about that is whether it has a real development
community, or is open-source in name only.  If github did go belly up,
would we find ourselves maintaining the gitlab code all by ourselves?
That might not be the end of the world, but it wouldn't be a good use
of community time either.

Fundamentally, we're playing the long game here.  We do not want to make
a choice of tools that we're going to regret ten years from now.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread Ryan Pedela
On Mon, Sep 28, 2015 at 4:09 PM, Jim Nasby  wrote:

> On 9/28/15 11:43 AM, Andres Freund wrote:
>
>> On 2015-09-28 09:41:18 -0700, David Fetter wrote:
>>
>>> Since you're convinced that this is an unqualified win, please put
>>> together a project plan for switching from our current system to
>>> Github.
>>>
>>
>> Err, no. That's a waste of all our time.
>>
>> It has been stated pretty clearly in this thread by a number of senior
>> community people that we're not going to use a closed source system.
>>
>
> GitLab OTOH is released under a MIT license, so it is an option. I don't
> know how it compares to other suggested options, but if YUriy wants to
> propose it it's at least a viable option.


I haven't used Gitlab extensively, but it has a feature set similar to
Github and then some [1]. The OSS project does seem active [2], but it is
still relatively new.

1. https://about.gitlab.com/better-than-github/
2. https://gitlab.com/gitlab-org/gitlab-ce


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread Joshua D. Drake

On 09/28/2015 03:40 PM, Alvaro Herrera wrote:

> Tom Lane wrote:
>
>> Now, running gitlab on community-owned hardware would potentially be an
>> option, if we find gitlab attractive from a functionality standpoint.
>> The question I'd have about that is whether it has a real development
>> community, or is open-source in name only.  If github did go belly up,
>> would we find ourselves maintaining the gitlab code all by ourselves?
>> That might not be the end of the world, but it wouldn't be a good use
>> of community time either.
>>
>> Fundamentally, we're playing the long game here.  We do not want to make
>> a choice of tools that we're going to regret ten years from now.
>
> We already made a similar choice some years ago when we started
> depending on the then-recently open sourced SourceForge code for
> pgFoundry.  That didn't turn out all that well in the long run.


I think we need to look at long-standing FOSS projects with a large and
extended user base (Redmine, RT). Anything that is fairly
community-specific (Debbugs) likely will cause more heartache than it is
worth in the long run.


JD


--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread Tom Lane
Josh Berkus  writes:
> The infra team seems to be good with debbugs, and several committers
> seem to like it, why not go with it?

It certainly seems like debbugs is the proposal to beat at this point.

In the end though, what matters is somebody doing the dogwork to make
it happen: connecting some tool up to our workflow and populating it
with an initial collection of items.  Until that happens, we're just
arguing in a vacuum with no way to see whether a proposal will really
work.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread Jim Nasby

On 9/28/15 5:34 PM, Tom Lane wrote:

> Now, running gitlab on community-owned hardware would potentially be an
> option, if we find gitlab attractive from a functionality standpoint.
> The question I'd have about that is whether it has a real development
> community, or is open-source in name only.  If github did go belly up,
> would we find ourselves maintaining the gitlab code all by ourselves?
> That might not be the end of the world, but it wouldn't be a good use
> of community time either.


FWIW, Gitlab appears to be a completely separate company from GitHub. 
According to https://about.gitlab.com/about/, over 800 people have 
contributed to it. It actually started as an open source project.



> Fundamentally, we're playing the long game here.  We do not want to make
> a choice of tools that we're going to regret ten years from now.


Agreed.

In this case it's a question of whether we want (or think in the future 
we might want) the advanced features that Gitlab offers. Things like 
commenting directly on "patches" (really, pull requests), direct 
integration with the issue tracker, a CI framework, etc.


Perhaps there's not a strong desire for those features today, but
tomorrow could be a very different case. I'm honestly rather shocked
(pleasantly) that the community is seriously considering a bug tracker,
given the reaction that's happened every time in the past. I think it'd
be a real shame if a few years from now the community wanted to consider,
say, pull requests instead of emailed patches... except that would mean
we'd need Yet Another Tool. Another example is CI. Yes, we have the
buildfarm, but that does nothing to prevent bitrot in patches.


Actually, now that I've poked around the site a bit, it might well be 
worth adopting Gitlab just to replace some of our current stand-alone 
tools with a single integrated solution, depending on how much time we 
spend maintaining all the separate stuff.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LISTEN denial of service with aborted transaction

2015-09-28 Thread Tom Lane
Matt Newell  writes:
> 1. When a connection issues its first LISTEN command, in 
> Exec_ListenPreCommit
> QUEUE_BACKEND_POS(MyBackendId) = QUEUE_TAIL;
> this causes the connection to iterate through every notify queued in the 
> slru, 
> even though at that point I believe the connection can safely ignore any 
> notifications from transactions that are already committed, and if I 
> understand correctly notifications aren't put into the slru until precommit,  
> so wouldn't it be safe to do:
> QUEUE_BACKEND_POS(MyBackendId) = QUEUE_HEAD;
> inside Async_Listen?

Per the comment in Exec_ListenPreCommit, the goal here is to see entries
that have been made and not yet committed.  If you don't do it like this,
then a new listener has the possibility of receiving notifications from
one transaction, while not seeing notifications from another one that
committed later, even though longer-established listeners saw outputs from
both transactions.  We felt that that sort of application-visible
inconsistency would be a bad thing.

> If that's not safe, then could a new member be added to 
> AsyncQueueControl that points to the first uncommitted QueuePosition 
> (wouldn't 
> have to be kept completely up to date).

Err ... isn't that QUEUE_TAIL already?  I've not studied this code in
detail recently, though.

> 2. Would it be possible when a backend is signaled to check if it is idle 
> inside an aborted transaction, and if so process the notifications and put 
> any 
> that match what the backend is listening on into a local list.

Possibly.  sinval catchup notification works like that, so you could maybe
invent a similar mechanism for notify.  Beware that we've had to fix
performance issues with sinval catchup; you don't want to release a whole
bunch of processes to do work at the same time.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread Jim Nasby

On 9/28/15 6:06 PM, Tom Lane wrote:

> Josh Berkus  writes:
>> The infra team seems to be good with debbugs, and several committers
>> seem to like it, why not go with it?
>
> It certainly seems like debbugs is the proposal to beat at this point.
>
> In the end though, what matters is somebody doing the dogwork to make
> it happen: connecting some tool up to our workflow and populating it
> with an initial collection of items.  Until that happens, we're just
> arguing in a vacuum with no way to see whether a proposal will really
> work.


Note that since they also offer a hosted solution we should use that to 
play with instead of trying to install it at this point.


Integrating the issue tracker looks like it's just a call to this API: 
http://doc.gitlab.com/ce/api/issues.html#new-issue. I don't normally do 
web development myself so I'd rather not figuring out how to setup a 
copy of the website to hack on, but if no one else wants to try it I can 
take a stab at it.


Presumably mirroring our git repository would work the same as it does 
for mirroring to GitHub. My guess is that would be enough to get the 
basic git/issue tracker integration working.


Commitfest could be tied in as well. Presumably each commitfest would be 
a milestone (http://doc.gitlab.com/ce/api/milestones.html) and each 
submission an issue.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] LISTEN denial of service with aborted transaction

2015-09-28 Thread Matt Newell
This morning with our production database I began receiving reports
of the database being "down".

I checked the log and was surprised to see extremely long durations for a
LISTEN that happens after each connection is made by our database library.
This coincided with many (approx. 600) new connections happening in a short
time window due to render nodes automatically being turned on when the first
job of the morning was submitted (idle nodes are turned off to save power).
My initial hunch was that there was some code in postgres that resulted in
exponential execution time if enough listens on new connections happened at
the same time.  As I was trying to gather more information the listen times
began to decrease and after about 20 minutes things were back to normal.

A few hours later the symptoms returned but this time the listen was taking 
upwards of 15 minutes.  I did some more reading and checked the pg_notify 
directory and found that there were 49 files.  Having never checked that 
directory before I wasn't sure if that was normal.  A short time later I 
noticed there was a process sitting idle with an aborted transaction.  After 
killing the process things quickly returned to normal.

After doing a search for "idle in transaction (aborted)" I came upon 
http://stackoverflow.com/questions/15036438/postgres-connection-leaks-idle-in-transaction-aborted
 

While this is a potential solution for dealing with the problem, it seems that
the PostgreSQL developers have decided to let connections stay in the "idle in
transaction (aborted)" state for a reason, most likely under the assumption
that it's relatively safe and only eats up the resources of a single
connection.  However, it's easy to demonstrate that doing:

listen "abc";
begin;
select bad_column from non_existant_table;

...will eventually cause a denial-of-service situation if the DBA hasn't set
up guards against connections sitting idle in an aborted transaction.

Looking at the code in src/backend/commands/async.c I think there are a couple 
ways to eliminate this problem.

1. When a connection issues its first LISTEN command, in Exec_ListenPreCommit
QUEUE_BACKEND_POS(MyBackendId) = QUEUE_TAIL;
this causes the connection to iterate through every notify queued in the slru, 
even though at that point I believe the connection can safely ignore any 
notifications from transactions that are already committed, and if I 
understand correctly notifications aren't put into the slru until precommit,  
so wouldn't it be safe to do:
QUEUE_BACKEND_POS(MyBackendId) = QUEUE_HEAD;
inside Async_Listen?  If that's not safe, then could a new member be added to 
AsyncQueueControl that points to the first uncommitted QueuePosition (wouldn't 
have to be kept completely up to date).

This would solve the problem of slow initial LISTEN in the face of a growing 
pg_notify queue.

2. Would it be possible when a backend is signaled to check if it is idle 
inside an aborted transaction, and if so process the notifications and put any 
that match what the backend is listening on into a local list.  This would 
allow the slru to be cleaned up.  In practice I think most notifications would 
either be disregarded or combined with a duplicate, so the list would most 
likely end up staying very small. If the backend local list does grow too 
large then the connection could be killed or handled in some other appropriate 
way.

I am happy to attempt coming up with a patch if the ideas are deemed 
worthwhile.

Thanks,
Matt Newell





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread Gavin Flower

On 29/09/15 11:54, Joshua D. Drake wrote:

> On 09/28/2015 03:40 PM, Alvaro Herrera wrote:
>> Tom Lane wrote:
>>
>>> Now, running gitlab on community-owned hardware would potentially be an
>>> option, if we find gitlab attractive from a functionality standpoint.
>>> The question I'd have about that is whether it has a real development
>>> community, or is open-source in name only.  If github did go belly up,
>>> would we find ourselves maintaining the gitlab code all by ourselves?
>>> That might not be the end of the world, but it wouldn't be a good use
>>> of community time either.
>>>
>>> Fundamentally, we're playing the long game here.  We do not want to make
>>> a choice of tools that we're going to regret ten years from now.
>>
>> We already made a similar choice some years ago when we started
>> depending on the then-recently open sourced SourceForge code for
>> pgFoundry.  That didn't turn out all that well in the long run.
>
> I think we need to look at long-standing FOSS projects with a large
> and extended user base (Redmine, RT). Anything that is fairly
> community-specific (Debbugs) likely will cause more heartache than it
> is worth in the long run.
>
> JD



Linux kernel project uses bugzilla (https://bugzilla.kernel.org)
and so does LibreOffice (https://bugs.documentfoundation.org)

I think they are both fairly big projects in for the long haul.


Cheers,
Gavin



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread Jim Nasby

On 9/28/15 11:43 AM, Andres Freund wrote:

> On 2015-09-28 09:41:18 -0700, David Fetter wrote:
>> Since you're convinced that this is an unqualified win, please put
>> together a project plan for switching from our current system to
>> Github.
>
> Err, no. That's a waste of all our time.
>
> It has been stated pretty clearly in this thread by a number of senior
> community people that we're not going to use a closed source system.


GitLab OTOH is released under an MIT license [1], so it is an option. I don't
know how it compares to other suggested options, but if YUriy wants to
propose it, it's at least a viable option.


[1] https://gitlab.com/gitlab-org/gitlab-ce/blob/master/LICENSE
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread Alvaro Herrera
Tom Lane wrote:

> Now, running gitlab on community-owned hardware would potentially be an
> option, if we find gitlab attractive from a functionality standpoint.
> The question I'd have about that is whether it has a real development
> community, or is open-source in name only.  If github did go belly up,
> would we find ourselves maintaining the gitlab code all by ourselves?
> That might not be the end of the world, but it wouldn't be a good use
> of community time either.
> 
> Fundamentally, we're playing the long game here.  We do not want to make
> a choice of tools that we're going to regret ten years from now.

We already made a similar choice some years ago when we started
depending on the then-recently open sourced SourceForge code for
pgFoundry.  That didn't turn out all that well in the long run.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] track_commit_timestamp and COMMIT PREPARED

2015-09-28 Thread Alvaro Herrera
Fujii Masao wrote:

> +if (replorigin_sesssion_origin == InvalidRepOriginId ||
> 
> This is not the problem of the patch, but I started wondering what
> "sesssion" in the variable name "replorigin_sesssion_origin" means.
> Is it just a typo and it should be "session"? Or it's the abbreviation
> of something?

Just a typo; I just pushed a fix.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread Josh Berkus
On 09/28/2015 03:40 PM, Alvaro Herrera wrote:
> Tom Lane wrote:
> 
>> Now, running gitlab on community-owned hardware would potentially be an
>> option, if we find gitlab attractive from a functionality standpoint.
>> The question I'd have about that is whether it has a real development
>> community, or is open-source in name only.  If github did go belly up,
>> would we find ourselves maintaining the gitlab code all by ourselves?
>> That might not be the end of the world, but it wouldn't be a good use
>> of community time either.
>>
>> Fundamentally, we're playing the long game here.  We do not want to make
>> a choice of tools that we're going to regret ten years from now.
> 
> We already made a similar choice some years ago when we started
> depending on the then-recently open sourced SourceForge code for
> pgFoundry.  That didn't turn out all that well in the long run.

No kidding.

Anyway, we don't have to have this discussion because the Github Issue
model is insufficiently sophisticated for our usage:

* crappy-to-nonexistent email integration
* flat "tag" categorization system
* no concept of releases
* too-simple two-level permissions model
* poor search tools
* no ability to add new fields to extend
* dependency on markup-based cross-referencing
* inability to flag issues for specific people's attention
* no workflow other than open/closed
* no support for attachments

... in short, Github issues is great for a small 6-dev project, but is
utterly inadequate for a project the size of PostgreSQL.

Now, if those issues were common to other tools we could find, then
maybe it would be worth fixing them.  But we already have access to
other tools which are more mature, so why would we bother?

The infra team seems to be good with debbugs, and several committers
seem to like it, why not go with it?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread Joshua D. Drake

On 09/28/2015 04:08 PM, Gavin Flower wrote:


> Linux kernel project uses bugzilla (https://bugzilla.kernel.org)
> and so does LibreOffice (https://bugs.documentfoundation.org)
>
> I think they are both fairly big projects in for the long haul.

I am not anti-bugzilla, just not all that familiar with it recently.

JD




--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-09-28 Thread Robert Haas
On Mon, Sep 28, 2015 at 5:47 PM, Jim Nasby  wrote:
> There was discussion about making this a PANIC instead of a LOG, which I
> think is a good idea... but then there'd need to be some way to not PANIC if
> you were doing an upgrade.

I think you're worrying about a non-problem.  This code has not been
back-patched prior to 9.5, and the legacy truncation code has been
removed in 9.5+.  So it's a complete non-issue right at the moment.
If at some point we back-patch this further, then it potentially
becomes a live issue, but I would like to respectfully inquire what
exactly you think making it a PANIC would accomplish?  There are a lot
of scary things about this patch, but the logic for deciding whether
to perform a legacy truncation is solid as far as I know.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread Stephen Frost
* Ryan Pedela (rped...@datalanche.com) wrote:
> I haven't used Gitlab extensively, but it has a feature set similar to
> Github and then some [1]. The OSS project does seem active [2], but it is
> still relatively new.

I've set it up and used it for a relatively small environment and was
*not* impressed with it.  Perhaps it's come a long way in a year or two,
but I doubt it.

Thanks!

Stephen




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Josh Berkus  writes:
> > The infra team seems to be good with debbugs, and several committers
> > seem to like it, why not go with it?
> 
> It certainly seems like debbugs is the proposal to beat at this point.

Agreed.

> In the end though, what matters is somebody doing the dogwork to make
> it happen: connecting some tool up to our workflow and populating it
> with an initial collection of items.  Until that happens, we're just
> arguing in a vacuum with no way to see whether a proposal will really
> work.

Setting up debbugs is on my list of things to do in the not-too-distant
future, but don't expect much before beta1 as I've got a few items in
the hopper for that which are clearly higher priority than the bug
tracker we've lived without for 20 years.

I'll be sure to update the thread once there is progress to report.

Thanks!

Stephen




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread Stephen Frost
JD,

* Joshua D. Drake (j...@commandprompt.com) wrote:
> On 09/28/2015 03:40 PM, Alvaro Herrera wrote:
> >We already made a similar choice some years ago when we started
> >depending on the then-recently open sourced SourceForge code for
> >pgFoundry.  That didn't turn out all that well in the long run.
> 
> I think we need to look at long standing FOSS projects with a large
> and extended user base (Redmine, RT). Anything that is fairly
> community specific (Debbugs) likely will cause more heartache than
> it is worth in the long run.

debbugs is being used by Debian, and has been since before our first
release (I believe; according to Wikipedia, it started in 1994).
Further, it's now being used by the GNU project for things as important
(well, to some ;) as Emacs.

It also requires a minimum of customization to integrate with our
existing workflow.

That's good enough for us to at least test it out, in my view.

I've run both Redmine (ugh) and RT (it's good, but I don't feel it's
quite as good as debbugs, for us).

Further, to your specific point, neither of those has the kind of FOSS
community backing that debbugs has.  I have no issue with BestPractical,
but I don't know that there are 100 individuals ready to jump in and help
keep RT going if they go south.  Redmine seems a bit better in that
regard, but it's painful to maintain and is only 9 years old, while
debbugs is old enough to drink in the US at this point.

Thanks!

Stephen




Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-09-28 Thread Kouhei Kaigai
> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> Sent: Tuesday, September 29, 2015 5:46 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Etsuro Fujita; PostgreSQL-development; 花田茂
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
> 
> On Mon, Sep 28, 2015 at 3:34 AM, Kouhei Kaigai  wrote:
> > The attached patch allows FDW driver to handle EPQ recheck by its own
> > preferable way, even if it is alternative local join or ExecQual to
> > the expression being pushed down.
> 
> Thanks.  I was all set to commit this, or at least part of it, when I
> noticed that we already have an FDW callback called RefetchForeignRow.
> We seem to be intending that this new callback should refetch the row
> from the foreign server and verify that any pushed-down quals apply to
> it.  But why can't RefetchForeignRow do that?  That seems to be what
> it's for.
>
At least two issues need to be solved before RefetchForeignRow() can take on this job.

1. RefetchForeignRow() does not take a ForeignScanState argument, so it is
   not obvious how it could cooperate with the private state kept in
   ForeignScanState; that state may include the pushed-down expressions,
   and so on.

2. A ForeignScan with scanrelid == 0 represents the result of joined
   relations.  Even if the refetched tuple is visible at the base-relation
   level, it may not survive the join conditions at the upper level.
   Once a join has been pushed down, only the FDW driver knows how the
   base relations are joined.

So the only reasonable way is to ask the FDW driver, at ExecScanFetch,
to check the visibility of a particular tuple (or of another tuple made
from it).
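
To be concrete, here is a minimal sketch (the callback names are
illustrative, not a committed API) of the shape I am arguing for.
Because the hook receives the ForeignScanState, the FDW driver can
reach its private data (fdw_state / fdw_private), re-evaluate the
quals it pushed down, and, in the scanrelid == 0 case, re-apply the
join conditions as well:

  typedef bool (*RecheckForeignScan_function) (ForeignScanState *node,
                                               TupleTableSlot *slot);

  /* hypothetical use on the ExecScanFetch recheck path */
  static bool
  ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)
  {
      FdwRoutine *fdwroutine = node->fdwroutine;

      /* only the driver knows how the pushed-down scan/join was built */
      if (fdwroutine->RecheckForeignScan != NULL)
          return fdwroutine->RecheckForeignScan(node, slot);

      return true;    /* nothing was pushed down; the tuple stands */
  }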

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 




Re: [HACKERS] Partial Aggregation / GROUP BY before JOIN

2015-09-28 Thread Amit Langote
On 2015/09/28 20:58, David Rowley wrote:
> On 28 September 2015 at 23:17, Amit Langote wrote:
>> Moreover, would partial aggregation work below Append?
>>
> 
> Do you mean for cases like:
> 
> create table a as select x.x a from generate_series(1,100) x(x);
> select sum(a) from (select a from a union all select a from a) a;
> 
> to allow the aggregation to happen before the append?
> 

Yes.

> On testing this I do see that writing the query as:
> 
> select sum(a) from (select sum(a) a from a union all select sum(a) from a)
> a;
> 
> causes it to execute marginally faster. 174.280 ms vs 153.498 ms on my
> laptop.
> However pushing aggregation below Append nodes is not something I'm aiming
> to do for this patch.

I see.  I recall reading in the archives that pushing aggregates below
Append was not found to make much of a difference, which your little test
seems to confirm.
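
For the archives, the reason the simple rewrite works at all is that
sum() combines by summing; an aggregate like avg() would need each
branch to emit partial state that the outer query combines, which is
what partial aggregation generalizes.  A rough sketch, reusing David's
table "a" from above:

  -- manual "partial aggregation below Append" for avg(a):
  -- each branch emits partial state (sum, count) and the
  -- outer query combines the states
  SELECT sum(s) / sum(n) AS avg_a
  FROM (SELECT sum(a) AS s, count(a) AS n FROM a
        UNION ALL
        SELECT sum(a), count(a) FROM a) t;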

Thanks,
Amit






Re: [HACKERS] CustomScan support on readfuncs.c

2015-09-28 Thread Kouhei Kaigai
> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> Sent: Tuesday, September 29, 2015 12:08 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Amit Kapila; Andres Freund; pgsql-hackers
> Subject: Re: [HACKERS] CustomScan support on readfuncs.c
> 
> On Thu, Sep 24, 2015 at 9:19 PM, Kouhei Kaigai  wrote:
> > Then, let's look back a bit. Next issue is how to reproduce
> > the "methods" pointer from the text representation.
> > I try to lookup the methods table using a pair of library
> > and symbol name; probably, it is a straightforward way.
> > The "methods" field is put earlier than all private fields
> > generated by TextOutCustomScan, so it should be reconstructable
> > prior to TextReadCustomScan.
> > To support this feature, I had to add an interface contract
> > that requires extensions to put library and symbol name on
> > CustomScanMethods table.
> > Although INIT_CUSTOM_SCAN_METHODS() macro can set up these
> > fields, author of extension needs to pay attention.
> >
> > In addition to these enhancement, a new NodeCopyCustomScan
> > callback eliminates a restriction; custom-scan provider
> > cannot define its own structure that embeds CustomScan.
> > Only CSP knows exact size of the structure, this callback
> > is intended to allocate a new one and copy the private fields,
> > but no need to copy the common fields.
> >
> > These three callbacks (one existing, two new) will make
> > CustomScan node copyObject, nodeToString and stringToNode
> > aware.
> 
> Instead of doing this:
> 
> +/* Dump library and symbol name instead of raw pointer */
>  appendStringInfoString(str, " :methods ");
> -_outToken(str, node->methods->CustomName);
> +_outToken(str, node->methods->methods_library_name);
> +appendStringInfoChar(str, ' ');
> +_outToken(str, node->methods->methods_symbol_name);
> 
> Suppose we just make library_name and symbol_name fields in the node
> itself, that are dumped and loaded like any others.
> 
> Would that be better?
>
I have no preference here.

Even if we dump library_name/symbol_name as though they were fields of
CustomScan rather than of CustomScanMethods, we cannot use
WRITE_STRING_FIELD as-is, because its 'fldname' argument assumes these
members are direct fields of CustomScan.

  /* Write a character-string (possibly NULL) field */
  #define WRITE_STRING_FIELD(fldname) \
  (appendStringInfo(str, " :" CppAsString(fldname) " "), \
   _outToken(str, node->fldname))
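
Just to illustrate the point outside of PostgreSQL, a toy,
self-contained example (hypothetical structs and names) of why the
macro only fits members living directly in the node: the stringified
fldname becomes the output token, so an indirect member would emit a
bogus token like ":methods->library_name":

  #include <stdio.h>

  typedef struct Methods  { const char *library_name; } Methods;
  typedef struct FakeScan { const Methods *methods; const char *name; } FakeScan;

  #define WRITE_STRING_FIELD(fldname) \
      printf(" :%s %s", #fldname, node->fldname)

  int main(void)
  {
      Methods   m    = { "pg_strom.so" };   /* hypothetical library name */
      FakeScan  scan = { &m, "GpuJoin" };   /* hypothetical node */
      FakeScan *node = &scan;

      WRITE_STRING_FIELD(name);                  /* direct field: fine */
      WRITE_STRING_FIELD(methods->library_name); /* emits a bogus token */
      printf("\n");
      return 0;
  }

(If they became true direct fields of CustomScan, as you suggest, the
stock macros would of course apply unchanged.)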



One other question: do we have a portable way to look up a
library/symbol pair by address?  Glibc has the dladdr() function, which
returns exactly this information, but its man page warns that it is not
defined by POSIX.  If we had a portable equivalent, it could make the
interface simpler.
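
For reference, a minimal sketch of the glibc approach (compile with
-ldl on older glibc); since dladdr() is a GNU extension rather than
POSIX, a portable version would need a configure check and a fallback:

  #define _GNU_SOURCE
  #include <dlfcn.h>
  #include <stdio.h>

  int main(void)
  {
      Dl_info info;

      /* resolve the shared object and symbol containing this address */
      if (dladdr((void *) printf, &info) != 0)
          printf("library: %s\nsymbol:  %s\n",
                 info.dli_fname,
                 info.dli_sname ? info.dli_sname : "(unknown)");
      return 0;
  }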

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 



  1   2   >