Re: [HACKERS] Autonomous Transaction is back

2015-07-27 Thread Ashutosh Bapat
> Ah, you're missing how commits in ATX are expected to work.  Let me
> illustrate:
>
> X (
>Data write A1
>call Y(
> Start ATX
> Data write B1
> Commit ATX
>)
>Data write A2
>Exception
> )
>
> In this workflow, B1 would be committed and persistent. Neither A1 nor
> A2 would be committed, or visible to other users.  Depending on what
> implementation we end up with, A1 might not even be visible to Y().
>
>
A1 should never be visible to Y(), else we will end up with
inconsistencies. E.g.

Suppose A1 inserts a primary key and B1 writes a foreign key referencing A1.
Commit ATX will not complain, since it sees A1, but if X then rolls back, we
may end up with B1 referencing nothing.

Am I missing something?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] [DESIGN] ParallelAppend

2015-07-27 Thread Amit Langote

KaiGai-san,

On 2015-07-27 PM 11:07, Kouhei Kaigai wrote:
> 
>   Append
>--> Funnel
> --> PartialSeqScan on rel1 (num_workers = 4)
>--> Funnel
> --> PartialSeqScan on rel2 (num_workers = 8)
>--> SeqScan on rel3
> 
>  shall be rewritten to
>   Funnel
> --> PartialSeqScan on rel1 (num_workers = 4)
> --> PartialSeqScan on rel2 (num_workers = 8)
> --> SeqScan on rel3 (num_workers = 1)
> 

In the rewritten plan, are the respective scans (PartialSeq or Seq) on rel1,
rel2 and rel3 asynchronous w.r.t. each other? Or does each one wait for the
earlier one to finish? I would think they must be asynchronous, because
otherwise this would be no different from the former case, right? The
original premise seems to be that the partitions rel1, rel2, rel3 may be on
different volumes, so parallelism across volumes seems like a goal of
parallelizing Append.

From my understanding of the parallel seqscan patch, each worker's
PartialSeqScan asks for a block to scan using a shared parallel heap scan
descriptor that effectively keeps track of the division of work among
PartialSeqScans in terms of blocks. What if we invent a PartialAppend,
which each worker would run in case of a parallelized Append? It would use
some kind of shared descriptor to pick a relation (Append member) to scan.
The shared structure could be the list of subplans, including a mutex for
concurrency. It doesn't sound as effective as the proposed
ParallelHeapScanDescData is for PartialSeqScan, but anything more granular
might be complicated. For example, consider a (current_relation,
current_block) pair. If there are more workers than subplans/partitions,
then multiple workers might start working on the same relation after a
round-robin assignment of relations (though, of course, a later worker would
start scanning from a later block in the same relation). I imagine that
might help with parallelism across volumes, if that's the case. MergeAppend
parallelization might involve a bit more complication, but may be feasible
with a PartialMergeAppend and a slightly different kind of coordination
among workers. What do you think of such an approach?
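
To make the idea concrete, below is a minimal sketch of what such a shared
descriptor might look like, loosely modeled on ParallelHeapScanDescData.
All names here (ParallelAppendDescData, partial_append_next_subplan) are
hypothetical, and a real implementation would place the structure in the
shared memory segment set up by the leader:

  #include "storage/spin.h"

  /* Hypothetical shared state for PartialAppend; illustrative only. */
  typedef struct ParallelAppendDescData
  {
      slock_t     pa_mutex;       /* protects the fields below */
      int         pa_nplans;      /* number of Append subplans */
      int         pa_next_plan;   /* next subplan to hand out */
  } ParallelAppendDescData;

  /* Claim the next unclaimed subplan; returns -1 once all are taken. */
  static int
  partial_append_next_subplan(ParallelAppendDescData *pad)
  {
      int         result;

      SpinLockAcquire(&pad->pa_mutex);
      if (pad->pa_next_plan < pad->pa_nplans)
          result = pad->pa_next_plan++;
      else
          result = -1;
      SpinLockRelease(&pad->pa_mutex);

      return result;
  }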

Thanks,
Amit





Re: [HACKERS] Feature - Index support on an lquery field (from the ltree module)

2015-07-27 Thread Heikki Linnakangas

On 07/28/2015 06:40 AM, David Belle wrote:

I have a requirement for a project that I am working on and was
hoping to start some discussion on the idea of implementing an index
type for lquery fields (see ltree's
http://www.postgresql.org/docs/9.4/static/ltree.html)

Currently ltree fields can have GiST/GIN indexes, but you cannot
create an index for lquery or ltxtquery. In my scope, an index for
lquery would be enough, although if it weren't too difficult for it
to be expanded to ltxtquery fields, then this could also be
implemented.


Hmm. My first thought is to extract labels from the query which are 
required for any matches, and put those in the GIN index. When querying, 
find all lqueries that have at least one label in common with the search 
key.
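
A rough sketch of the label extraction, operating on the text form of an
lquery purely for illustration (a real GIN opclass would work on ltree's
internal representation; the helper below is hypothetical, not ltree code):

  #include <stdio.h>
  #include <string.h>

  /*
   * Collect candidate labels from an lquery's text form, e.g.
   * "Top.*.Science|Math" yields "Science" and "Math"; wildcard and
   * modifier-bearing tokens are skipped since they name no fixed label.
   */
  static void
  collect_lquery_labels(const char *lquery)
  {
      char    buf[256];
      char   *level, *save1;

      strncpy(buf, lquery, sizeof(buf) - 1);
      buf[sizeof(buf) - 1] = '\0';

      for (level = strtok_r(buf, ".", &save1); level != NULL;
           level = strtok_r(NULL, ".", &save1))
      {
          char   *alt, *save2;

          for (alt = strtok_r(level, "|", &save2); alt != NULL;
               alt = strtok_r(NULL, "|", &save2))
          {
              if (strpbrk(alt, "*%@{!") == NULL)
                  printf("index key: %s\n", alt);  /* would become a GIN key */
          }
      }
  }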


- Heikki





Re: [HACKERS] [DESIGN] ParallelAppend

2015-07-27 Thread Amit Kapila
On Tue, Jul 28, 2015 at 7:59 AM, Kouhei Kaigai  wrote:
>
> > -Original Message-
> > From: pgsql-hackers-ow...@postgresql.org
> > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai
> > Sent: Monday, July 27, 2015 11:07 PM
> > To: Amit Kapila
> > >
> > > Is there a real need to have a new node like ParallelAppendPath?
> > > Can't we have a Funnel node beneath the Append node, with each
> > > worker responsible for a SeqScan on each inherited child
> > > relation?  Something like
> > >
> > > Append
> > >---> Funnel
> > >   --> SeqScan rel1
> > >   --> SeqScan rel2
> > >
> > If Funnel can handle both horizontal and vertical parallelism,
> > it is a great simplification. I don't insist on a new node.
> >
> > Once Funnel gets the capability to have multiple child nodes, the
> > Append node above will probably be gone. I expect set_append_rel_pathlist()
> > to add two paths based on Append and Funnel, and the planner will choose
> > the cheaper one according to its cost.
> >
> In the latest v16 patch, Funnel is declared as follows:
>
>   typedef struct Funnel
>   {
>       Scan        scan;
>       int         num_workers;
>   } Funnel;
>
> If we try to add Append capability here, I expect the structure will
> be adjusted as follows, for example:
>
>   typedef struct Funnel
>   {
>       Scan        scan;
>       List       *funnel_plans;
>       List       *funnel_num_workers;
>   } Funnel;
>
> As the names suggest, funnel_plans saves the underlying Plan nodes instead
> of the lefttree. Also, funnel_num_workers saves the number of workers
> expected to be assigned to the individual child plans.
>

Or shall we have a node like the above and name it FunnelAppend or
AppendFunnel?

> Even though create_parallelscan_paths() in v16 sets num_workers no
> larger than parallel_seqscan_degree, the total number of concurrent
> background workers may exceed this configuration if more than two
> PartialSeqScan nodes are underneath.
> It is a different configuration from max_worker_processes, so it is
> not a problem as long as we have another restriction.
> However, how do we control the cap on the number of worker processes per
> "appendable" Funnel node? For example, suppose a parent table has 200
> child tables but max_worker_processes is configured to 50.
> It is obviously impossible to launch all the background workers
> simultaneously. One idea I have is to suspend the launch of some plans
> until earlier ones are completed.
>

Okay, but I think with that idea you need to re-launch the workers for each
new set of relation scans, which could turn out to be costly. How about
designing some way for workers, after completing their assigned work, to
check for a new task (which in this case would be to scan a new relation)
and then execute it?  I think in this way we can achieve dynamic allocation
of work and achieve maximum parallelism with the available set of workers.
We have achieved this in ParallelSeqScan by scanning at the block level:
once a worker finishes a block, it checks for a new block to scan.

>
> > We will need to pay attention another issues we will look at when Funnel
> > kicks background worker towards asymmetric relations.
> >
> > If number of rows of individual child nodes are various, we may
> > want to assign 10 background workers to scan rel1 with PartialSeqScan.
> > On the other hands, rel2 may have very small number of rows thus
> > its total_cost may be smaller than cost to launch a worker.
> > In this case, Funnel has child nodes to be executed asynchronously and
> > synchronously.
> >

I think this might turn out to be slightly tricky; for example, how do we
know, for a given size of relation, how many workers are sufficient?
Another way to divide the work in this case could be in terms of chunks of
blocks: once a worker finishes its current set of blocks, it should be able
to get a new set of blocks to scan, as sketched below.  So let us assume we
decide on a chunk size of 32 and the total number of blocks in the whole
inheritance hierarchy is 3200; then the maximum number of workers we should
allocate to this scan is 100, and if we have parallel_seqscan_degree less
than that, then we can use that many workers and let them scan
32-blocks-at-a-time.
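
A minimal sketch of such an allocator, under the assumption that the blocks
of all child relations are numbered as one logical sequence (all names are
hypothetical; pg_atomic_fetch_add_u64() and Min() are real PostgreSQL
primitives from port/atomics.h and c.h):

  #include "postgres.h"
  #include "port/atomics.h"

  #define PA_CHUNK_BLOCKS 32          /* assumed chunk size from above */

  typedef struct SharedBlockAllocator
  {
      pg_atomic_uint64 next_block;    /* next unclaimed logical block */
      uint64           total_blocks;  /* blocks across all children */
  } SharedBlockAllocator;

  /* Atomically claim the next chunk; returns false once exhausted. */
  static bool
  claim_next_chunk(SharedBlockAllocator *sba, uint64 *first, uint64 *nblocks)
  {
      uint64      start = pg_atomic_fetch_add_u64(&sba->next_block,
                                                  PA_CHUNK_BLOCKS);

      if (start >= sba->total_blocks)
          return false;
      *first = start;
      *nblocks = Min(PA_CHUNK_BLOCKS, sba->total_blocks - start);
      return true;
  }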


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] proposal: multiple psql option -c

2015-07-27 Thread Pavel Stehule
2015-07-28 5:24 GMT+02:00 Pavel Stehule :

>
>
> 2015-07-27 21:57 GMT+02:00 Andrew Dunstan :
>
>>
>> On 07/27/2015 02:53 PM, Pavel Stehule wrote:
>>
>>>
>>>
>>>
>>>
>>> I am trying to run parallel execution
>>>
>>> psql -At -c "select datname from pg_database" postgres | xargs -n 1 -P 3
>>> psql -c "select current_database()"
>>>
>>>
>>>
>>
>> I don't think it's going to be a hugely important feature, but I don't
>> see a problem with creating a new option (-C seems fine) which would have
>> the same effect as if the arguments were concatenated into a file which is
>> then used with -f. IIRC -c has some special characteristics which mean
>> it's probably best not to try to extend it for this feature.
>>
>
> ok, I'll try to write patch.
>

I have a question: can the -C option be given multiple times?

SQL statements are no problem, but what about backslash commands?

postgres=# \dt \dn select 10;
No relations found.
List of schemas
┌──┬───┐
│ Name │ Owner │
╞══╪═══╡
└──┴───┘
(0 rows)

\dn: extra argument "10;" ignored


Something like

psql -C "\dt \dn" -C "select 10"

That looks better than psql -c "\dt \dn \n select 10"

Regards

Pavel



>
> Pavel
>
>
>>
>> cheers
>>
>> andrew
>>
>
>


[HACKERS] Feature - Index support on an lquery field (from the ltree module)

2015-07-27 Thread David Belle
Hi 

Forgive me if this is not the right place to discuss this. I am new to the 
postgresql community.

I have a requirement for a project that I am working on and was hoping to
start some discussion on the idea of implementing an index type for lquery
fields (see ltree's http://www.postgresql.org/docs/9.4/static/ltree.html)

Currently ltree fields can have GiST/GIN indexes, but you cannot create an
index for lquery or ltxtquery. In my scope, an index for lquery would be
enough, although if it weren't too difficult for it to be expanded to
ltxtquery fields, then this could also be implemented.

I have not contributed to postgres yet, so I’m not really sure how to get
started.

Thoughts?


Re: [HACKERS] Autonomous Transaction is back

2015-07-27 Thread Pavel Stehule
2015-07-27 23:59 GMT+02:00 Merlin Moncure :

> On Mon, Jul 27, 2015 at 4:41 PM, Joel Jacobson  wrote:
> > On Fri, Jul 24, 2015 at 9:39 PM, Merlin Moncure 
> wrote:
> >>
> >> On Thu, Jul 23, 2015 at 1:49 PM, Josh Berkus  wrote:
> >> > Batch Jobs: large data-manipulation tasks which need to be broken up
> >> > into segments, with each segment committing separately.  Example:
> >> > updating 1 million records in batches of 1000.
> >>
> >> Autonomous transactions are not a good fit for this case; stored
> >> procedures are a better way to go for any scenario where you don't
> >> want be be in a snapshot (for example, suppose you want to change
> >> isolation level on the fly).
> >
> >
> > Hm, you mean we need real "stored procedures" in PostgreSQL and not just
> > "functions"?
>
> Yes, exactly.
>
> Autonomous transactions aren't really set up for cases where the
> function runs for a very long time or indefinitely.  This is the
> 'advancing xmin' problem as Josh puts it but I think the problem is
> much bigger than that.  Anyways, this is mostly irrelevant to
> autonomous transactions as long as the design isn't extended to try
> and cover that case.
>
> Is the Autonomous Transaction feature only going to be exposed through
> pl/pgsql?
>

I hope not.

The integration with plpgsql can be a secondary question. In this case I
prefer tying it to a block statement, without the possibility of an explicit
COMMIT. At minimum in functions.

Something like

BEGIN
  BEGIN AUTONOMOUS
   ...
  END;
END;

This is consistent with current subtransaction support, and disallows some
corner cases like a forgotten COMMIT.

Regards

Pavel


> merlin


Re: [HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c

2015-07-27 Thread Tom Lane
Andreas Seltenreich  writes:
> I let sqlsmith run during the night, and it did no longer trigger the
> first two.  During roughly a million random queries it triggered the
> already mentioned brin one 10 times, but there was also one instance of
> this new one in the log:

> TRAP: FailedAssertion("!(join_clause_is_movable_into(rinfo, joinrel->relids, 
> join_and_req))", File: "relnode.c", Line: 987)
> LOG:  server process (PID 12851) was terminated by signal 6: Aborted
> DETAIL:  Failed process was running: select  
> rel65543066.tmplname as c0, 
> rel65543064.umuser as c1
>   from 
> public.dept as rel65543059
>   inner join pg_catalog.pg_user_mappings as rel65543064
>   left join pg_catalog.pg_enum as rel65543065
>   on (rel65543064.srvname = rel65543065.enumlabel )
> inner join pg_catalog.pg_ts_template as rel65543066
> on (rel65543065.enumtypid = rel65543066.tmplnamespace )
>   on (rel65543059.dname = rel65543064.srvname )
>   where ((rel65543059.mgrname = rel65543059.mgrname) 
>   and (rel65543064.usename = rel65543066.tmplname)) 
> and (rel65543059.mgrname ~~ rel65543059.mgrname)
>   fetch first 128 rows only;

I looked into this one.  What seems to be the story is that
join_clause_is_movable_into() is approximate in the conservative
direction, that is it sometimes can return "false" when a strict analysis
would conclude "true" (a fact that is undocumented in its comments;
I shall fix that).  This is acceptable, as long as the answers are
consistent across different jointree levels, since the worst consequence
is that we might fail to push a join clause as far down the jointree as it
could be pushed.  However, the Assert that's failing here supposes that
the answer is exact.  I think a sufficient fix in the near term is to
disable that Assert and add suitable commentary.  In the long run it might
be nice if the answers were exact, but that would require more information
than is currently passed to join_clause_is_movable_into(), which would
imply non-back-patchable API changes.

> ,[ git bisect ]
> |   first bad commit: [3f8c23c4d31d4a0e801041733deb2c7cfa577b32] Improve
> |   predtest.c's ability to reason about operator expressions.
> `

There seems to be something odd going on with your bisection tests,
because once again the query fails clear back to 9.2 for me, which is what
I'd expect based on the above analysis --- the movable-join-clause logic
all came in with parameterized paths in 9.2.

regards, tom lane




Re: [HACKERS] more RLS oversights

2015-07-27 Thread Joe Conway
On 07/03/2015 10:03 AM, Noah Misch wrote:
>> +static void
>> +dumpPolicy(Archive *fout, PolicyInfo *polinfo)
> ...
>> +if (polinfo->polqual != NULL)
>> +appendPQExpBuffer(query, " USING %s", polinfo->polqual);
> 
> (3) The USING clause needs parentheses; a dump+reload failed like so:

Also needed for WITH CHECK. Fix pushed to 9.5 and HEAD.

Joe







Re: [HACKERS] proposal: multiple psql option -c

2015-07-27 Thread Pavel Stehule
2015-07-27 21:57 GMT+02:00 Andrew Dunstan :

>
> On 07/27/2015 02:53 PM, Pavel Stehule wrote:
>
>>
>>
>>
>>
>> I am trying to run parallel execution
>>
>> psql -At -c "select datname from pg_database" postgres | xargs -n 1 -P 3
>> psql -c "select current_database()"
>>
>>
>>
>
> I don't think it's going to be a hugely important feature, but I don't see
> a problem with creating a new option (-C seems fine) which would have the
> same effect as if the arguments were concatenated into a file which is then
> used with -f. IIRC -c has some special characteristics which mean it's
> probably best not to try to extend it for this feature.
>

ok, I'll try to write patch.

Pavel


>
> cheers
>
> andrew
>


Re: [HACKERS] MultiXact member wraparound protections are now enabled

2015-07-27 Thread Noah Misch
On Mon, Jul 27, 2015 at 07:59:40AM +0100, Simon Riggs wrote:
> On 26 July 2015 at 20:15, Noah Misch  wrote:
> > On Fri, Jul 24, 2015 at 09:14:09PM -0400, Peter Eisentraut wrote:
> > > On 7/22/15 4:45 PM, Robert Haas wrote:
> > > > But it seemed to me that this could be rather confusing.  I thought it
> > > > would be better to be explicit about whether the protections are
> > > > enabled in all cases.  That way, (1) if you see the message saying
> > > > they are enabled, they are enabled; (2) if you see the message saying
> > > > they are disabled, they are disabled; and (3) if you see neither
> > > > message, your version does not have those protections.
> > >
> > > But this is not documented, AFAICT, so I don't think anyone is going to
> > > be able to follow that logic.  I don't see anything in the release notes
> > > saying, look for this message to see how this applies to you, or
> > whatever.
> >
> > I supported inclusion of the message, because it has good potential to help
> > experts studying historical logs to find the root cause of data corruption.
> > The complex histories of clusters showing corruption from this series of
> > bugs
> > have brought great expense to the task of debugging new reports.  Given a
> > cluster having full mxact wraparound protections since last corruption-free
> > backup (or since initdb), one can rule out some causes.
> 
> 
> Would it be better to replace it with a less specific and more generally
> useful message?
> 
> For example, Server started with release X.y.z
> from which we could infer various useful things.

That message does sound generally useful, but we couldn't infer $subject from
it.  While the $subject message appears at startup in simple cases, autovacuum
prerequisite work can delay it indefinitely.




Re: [HACKERS] CustomScan and readfuncs.c

2015-07-27 Thread Kouhei Kaigai
> 2. Reproduce method table on background worker
> --
> The methods field of CustomPath/Scan/ScanState is expected to be
> a reference to a static structure. Thus, copyObject() does not
> copy the entire table, but only the pointers.
> However, we have no way to guarantee that the callback functions have
> the same entrypoint addresses on background workers. So, we may need
> an infrastructure to reproduce the same CustomScan node with the same
> callback function table, probably identified by name.
> We may have a few ways to solve the problem.
> 
> * Add a system catalog, with a function that returns the pointers
> The simplest way, like FDW. A system catalog has a name and a function
> that returns the callback pointers. It also needs SQL statement support,
> which is a small downside.
>
I tried to design a DDL statement and relevant system catalog as follows.

  #define CustomPlanRelationId		3999

  CATALOG(pg_custom_plan,3999)
  {
      NameData    custom_name;
      regproc     custom_handler;
  } FormData_pg_custom_plan;

This simple catalog saves a pair of the name and handler function of a
custom plan provider. Like an FDW handler, this handler function returns
pointers to the entrypoints to be called by set_(rel|join)_pathlist_hook and
the relevant CustomXXXMethods table.

A user can register a custom plan provider using the following statement:
  CREATE CUSTOM PLAN <name> HANDLER <handler_function>;

And unregister it:
  DROP CUSTOM PLAN <name>;

This enhancement allows background workers to reproduce a CustomScan node
that was serialized by nodeToString(), as long as the provider is specified
by name.
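
For illustration, a handler of this kind might look as follows. This is
only a sketch of the proposal; the struct and function names are
hypothetical:

  #include "postgres.h"
  #include "fmgr.h"

  /* Hypothetical container for a provider's static callback tables. */
  typedef struct CustomPlanHandlerData
  {
      const void *scan_methods;   /* e.g. a static CustomScanMethods table */
  } CustomPlanHandlerData;

  static const CustomPlanHandlerData my_provider = { NULL };

  PG_FUNCTION_INFO_V1(my_custom_plan_handler);

  /*
   * Looked up via pg_custom_plan by name, so a background worker can
   * recover the same callback pointers the leader used.
   */
  Datum
  my_custom_plan_handler(PG_FUNCTION_ARGS)
  {
      PG_RETURN_POINTER(&my_provider);
  }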

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


> -Original Message-
> From: Kaigai Kouhei(海外 浩平)
> Sent: Monday, July 27, 2015 8:42 AM
> To: 'Tom Lane'
> Cc: pgsql-hackers@postgresql.org
> Subject: RE: [HACKERS] CustomScan and readfuncs.c
> 
> > Kouhei Kaigai  writes:
> > > Under the investigation of ParallelAppend, I noticed here is a few
> > > problems in CustomScan, that prevents to reproduce an equivalent
> > > plan node on the background worker from serialized string.
> >
> > > 1. CustomScanMethods->TextOutCustomScan callback
> > > 
> > > This callback allows to output custom information on nodeToString.
> > > Originally, we intend to use this callback for debug only, because
> > > CustomScan must be copyObject() safe, thus, all the private data
> > > also must be stored in custom_exprs or custom_private.
> > > However, it will lead another problem when we try to reproduce
> > > CustomScan node from the string form generated by outfuncs.c.
> > > If TextOutCustomScan prints something, upcoming _readCustomScan
> > > has to deal with unexpected number of tokens in unexpected format.
> >
> > Um ... wait a second.  There is no support in readfuncs for any
> > plan node type, and never has been, and I seriously doubt that there
> > ever should be.  I do not think it makes sense to ship plans around
> > in the way you seem to have in mind.  (Also, I don't think the
> > problems you mention are exactly unique to CustomScan.  There's no
> > reason to assume that FDW plans could survive this treatment either,
> > since we do not know what's in the fdw_private stuff; certainly no
> > one has ever suggested that it should not contain pointers to static
> > data.)
> >
> Yep, no Plan node types are supported at this moment; however, they will
> appear soon with the Funnel + PartialSeqScan nodes.
> It serializes a partial plan subtree using nodeToString(), then gives
> the flattened PlannedStmt to background workers.
> I'm now investigating applying the same structure to Append, to kick
> child nodes in parallel.
> Once various plan node types appear in readfuncs.c, we will have to care
> about this problem, won't we? I'm working on the patch submission
> of ParallelAppend for the next commit-fest, so I would like to reach a
> consensus on how to treat this matter.
> 
> > > I'd like to propose to omit this callback prior to v9.5 release,
> > > for least compatibility issues.
> >
> > I regard our commitment to cross-version compatibility for the
> > custom scan APIs as being essentially zero, for reasons previously
> > discussed.  So if this goes away in 9.6 it will not matter, but we
> > might as well leave it in for now for debug support.
> >
> I don't argue this point strongly. If TextOutCustomScan is to be
> obsoleted in v9.6, it is simply a kindness to developers not to let them
> use this callback.
> 
> Thanks,
> --
> NEC Business Creation Division / PG-Strom Project
> KaiGai Kohei 





Re: [HACKERS] [DESIGN] ParallelAppend

2015-07-27 Thread Kouhei Kaigai
> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai
> Sent: Monday, July 27, 2015 11:07 PM
> To: Amit Kapila
> Cc: pgsql-hackers@postgresql.org; Robert Haas; Kyotaro HORIGUCHI
> Subject: Re: [HACKERS] [DESIGN] ParallelAppend
> 
> > On Sun, Jul 26, 2015 at 8:43 AM, Kouhei Kaigai  wrote:
> > >
> > > Hello,
> > >
> > > I'm recently working/investigating on ParallelAppend feature
> > > towards the next commit fest. Below is my design proposal.
> > >
> > > 1. Concept
> > > --
> > > Its concept is quite simple; anybody might have considered it more than
> > > once. A ParallelAppend node kicks background worker processes to execute
> > > its child nodes in parallel / asynchronously.
> > > It intends to improve the performance of scanning large partitioned
> > > tables from the standpoint of overall throughput; however, latency of
> > > the first multi-hundred rows is not in the scope of this project.
> > > From the standpoint of technology trends, it primarily tries to utilize
> > > the multi-core capability within a system, but it also enables expanding
> > > to a distributed database environment using the foreign-table
> > > inheritance feature.
> > > Its behavior is very similar to the Funnel node except for several
> > > points; thus, we can reuse the infrastructure that we have had long-
> > > standing discussion about through the v9.5 development cycle.
> > >
> > > 2. Problems to be solved
> > > -
> > > Typical OLAP workloads take tons of table joins and scans on large
> > > tables, which are often partitioned; their KPI is query response
> > > time, but only a very small number of sessions are active simultaneously.
> > > So, we are required to run a single query as rapidly as possible, even
> > > if it consumes larger computing resources than typical OLTP workloads.
> > >
> > > The current implementation of heap scanning is painful when we look at
> > > its behavior from that standpoint - how many rows we can read within a
> > > certain time - because of its synchronous manner.
> > > In the worst case, when a SeqScan node tries to fetch the next tuple,
> > > heap_getnext() looks up a block in the shared buffers, then ReadBuffer()
> > > calls the storage manager to read the target block from the filesystem
> > > if it is not in the buffers. Next, the operating system puts the calling
> > > process to sleep until the required I/O is completed.
> > > Most cases are helped at an earlier stage than the above worst
> > > case; however, the best scenario we can expect is that the next tuple
> > > already appears on top of the message queue (with visibility
> > > checks already done, of course) with no fall-down to the buffer manager
> > > or deeper.
> > > If we can run multiple scans in parallel / asynchronously, the CPU core
> > > shall be assigned to another process by the operating system; thus,
> > > it eventually improves the I/O density and enables higher processing
> > > throughput.
> > > The Append node is an ideal point to parallelize because:
> > > - child nodes can have physically different locations by tablespace,
> > >   so further tuning is possible according to the system landscape.
> > > - it can control whether a subplan is actually executed on a background
> > >   worker, on a per-subplan basis. If the subplans contain large tables
> > >   and small tables, ParallelAppend may kick background workers to scan
> > >   the large tables only, and scan the small tables by itself.
> > > - like the Funnel node, we don't need to care about enhancement of
> > >   individual node types. SeqScan, IndexScan, ForeignScan and others
> > >   can perform as usual, but actually in parallel.
> > >
> > >
> > > 3. Implementation
> > > --
> > > * Plan & Cost
> > >
> > > ParallelAppend shall appear wherever Append can appear, except for the
> > > usage as a dummy. So, I'll enhance set_append_rel_pathlist() to add
> > > both an AppendPath and a ParallelAppendPath, with a cost for each.
> > >
> >
> > Is there a real need to have a new node like ParallelAppendPath?
> > Can't we have a Funnel node beneath the Append node, with each
> > worker responsible for a SeqScan on each inherited child
> > relation?  Something like
> >
> > Append
> >---> Funnel
> >   --> SeqScan rel1
> >   --> SeqScan rel2
> >
> If Funnel can handle both horizontal and vertical parallelism,
> it is a great simplification. I don't insist on a new node.
>
> Once Funnel gets the capability to have multiple child nodes, the
> Append node above will probably be gone. I expect set_append_rel_pathlist()
> to add two paths based on Append and Funnel, and the planner will choose
> the cheaper one according to its cost.
>
In the latest v16 patch, Funnel is declared as follows:

  typedef struct Funnel
  {
      Scan        scan;
      int         num_workers;
  } Funnel;

If we try to add Append capability here, I expect the structure will
be adjusted as follows, for example:

  typedef struct Funnel
  {
      Scan        scan;
      List       *funnel_plans;
      List       *funnel_num_workers;
  } Funnel;

As the names suggest, funnel_plans saves the underlying Plan nodes instead
of the lefttree, and funnel_num_workers saves the number of workers expected
to be assigned to the individual child plans.

Re: [HACKERS] A little RLS oversight?

2015-07-27 Thread Joe Conway

On 07/27/2015 03:05 PM, Stephen Frost wrote:
> AFK at the moment, but my thinking was that we should avoid having
> the error message change based on what a GUC is set to. I agree
> that there should be comments which explain that.

I changed back to using GetUserId() for the call to check_enable_rls()
at those three call sites, and added to the comments to explain why.

While looking at ri_ReportViolation() I spotted what I believe to be a
bug in the current logic -- namely, has_perm is initialized to true,
and when check_enable_rls() returns RLS_ENABLED we never reset
has_perm to false, and thus leak info even though the comments claim
we don't. I fixed that here, but someone please take a look and
confirm I am reading that correctly.
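
In outline, the suspected problem and fix look like this (a paraphrase of
the logic in ri_ReportViolation(), not the actual committed diff):

  /*
   * has_perm starts out true; if RLS is enabled on the relation we must
   * flip it to false so the error message omits the key values, rather
   * than leaving it true and leaking them.
   */
  bool        has_perm = true;

  if (check_enable_rls(RelationGetRelid(rel), GetUserId(), true) == RLS_ENABLED)
      has_perm = false;       /* don't show key values in the error */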

Beyond that, any additional comments?

Thanks,

Joe

-- 
Joe Conway
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 98d1497..fd82ea4 100644
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** SET search_path TO schemaboolean
 does current user have privilege for role

+   
+row_security_active(table)
+
+boolean
+does current user have row level security active for table
+   
   
  
 
*** SET search_path TO schema
  pg_has_role
 
+
+ row_security_active
+
  
 
  has_table_privilege checks whether a user
*** SELECT has_function_privilege('joeuser',
*** 15462,15467 
--- 15471,15483 
  are immediately available without doing SET ROLE.
 
  
+
+ row_security_active checks whether row level
+ security is active for the specified table in the context of the
+ current_user and environment. The table can
+ be specified by name or by OID.
+
+ 

  shows functions that
 determine whether a certain object is visible in the
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 1043362..2797c56 100644
*** a/src/backend/access/index/genam.c
--- b/src/backend/access/index/genam.c
*** BuildIndexValueDescription(Relation inde
*** 203,209 
  	indrelid = idxrec->indrelid;
  	Assert(indexrelid == idxrec->indexrelid);
  
! 	/* RLS check- if RLS is enabled then we don't return anything. */
  	if (check_enable_rls(indrelid, GetUserId(), true) == RLS_ENABLED)
  	{
  		ReleaseSysCache(ht_idx);
--- 203,213 
  	indrelid = idxrec->indrelid;
  	Assert(indexrelid == idxrec->indexrelid);
  
! 	/*
! 	 * RLS check - if RLS is enabled then we don't return anything.
! 	 * Explicitly pass GetUserId() to ensure the result is not
! 	 * dependent on the value of row_security for users with BYPASSRLS.
! 	 */
  	if (check_enable_rls(indrelid, GetUserId(), true) == RLS_ENABLED)
  	{
  		ReleaseSysCache(ht_idx);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e82a53a..c0bd6fa 100644
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*** CREATE VIEW pg_indexes AS
*** 150,156 
   LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
  WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
  
! CREATE VIEW pg_stats AS
  SELECT
  nspname AS schemaname,
  relname AS tablename,
--- 150,156 
   LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
  WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
  
! CREATE VIEW pg_stats WITH (security_barrier) AS
  SELECT
  nspname AS schemaname,
  relname AS tablename,
*** CREATE VIEW pg_stats AS
*** 211,217 
  FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
   JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
   LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
! WHERE NOT attisdropped AND has_column_privilege(c.oid, a.attnum, 'select');
  
  REVOKE ALL on pg_statistic FROM public;
  
--- 211,219 
  FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
   JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)

Re: [HACKERS] Buildfarm TAP testing is useless as currently implemented

2015-07-27 Thread Michael Paquier
On Tue, Jul 28, 2015 at 1:15 AM, Andrew Dunstan  wrote:
> Well, it does create a lot of files that we don't pick up. An example list
> is shown below, and I am attaching their contents in a single gzipped
> attachment. However, these are in the wrong location. This was a vpath build
> and yet these tmp_check directories are all created in the source tree.
> Let's fix that and then I'll set about having the buildfarm collect them.
> That should get us further down the track.
>
> [log list]

The attached patch fixes that. I suggest that we use env{TESTDIR}/log as
the location for the logs, so that even a vpath build will locate the log
files correctly.
-- 
Michael


20150728_tap_logs_vpath.patch
Description: binary/octet-stream



Re: [HACKERS] Sharing aggregate states between different aggregate functions

2015-07-27 Thread David Rowley
On 27 July 2015 at 20:11, Heikki Linnakangas  wrote:

> On 07/27/2015 08:34 AM, David Rowley wrote:
>
>> - * agg_input_types, agg_state_type, agg_result_type identify the input,
>> - * transition, and result types of the aggregate.  These should all be
>> - * resolved to actual types (ie, none should ever be ANYELEMENT etc).
>> + * agg_input_types identifies the input types of the aggregate.  These
>> should
>> + * be resolved to actual types (ie, none should ever be ANYELEMENT etc).
>>
>> I'm not sure I understand why agg_state_type and agg_result_type have
>> changed here.
>>
>
> The function no longer has the agg_result_type argument, but the removal
> of agg_state_type from the comment was a mistake.


I've put agg_state_type back in the attached delta which is again based on
your version of the patch.


>
>
>  + peraggstate->sortstates = (Tuplesortstate **)
>> + palloc0(sizeof(Tuplesortstate *) * numGroupingSets);
>> + for (currentsortno = 0; currentsortno < numGroupingSets;
>> currentsortno++)
>> + peraggstate->sortstates[currentsortno] = NULL;
>>
>> This was not you, but this NULL setting looks unneeded due to the
>> palloc0().
>>
>
> Yeah, I noticed that too. Ok, let's take it out.
>
>
Removed in attached.


>  In this function I also wasn't quite sure about comparing both
>> non-NULL INITCONDs here. I believe my code comments may slightly
>> contradict what the code actually does, as the comments talk about them
>> having to match, but the code just bails if any are non-NULL. The reason I
>> didn't check them was that it seems inevitable that some duplicate work
>> needs to be done when setting up the INITCOND. Perhaps it's worth it?
>>
>
> It would be nice to handle non-NULL initconds. I think you'll have to
> check that the input function isn't volatile. Or perhaps just call the
> input function, and check that the resulting Datum is byte-per-byte
> identical, although that might be awkward to do with the current code
> structure.
>
>
I've not done anything with this.
I'd not thought of an input function being volatile before, but I guess
it's possible, which makes me a bit scared that we could be treading on
ground we shouldn't be. I know it's more of an output function thing than
an input function thing, but a GUC like extra_float_digits could cause
problems here.

In summary, I'm much less confident it's safe to enable the optimisation in
this case.
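
For reference, a minimal sketch of the byte-wise check Heikki describes,
assuming a non-volatile input function (the variable names are illustrative,
not from the patch):

  /* Run both initconds through the input function and compare the Datums. */
  Datum   init1 = OidInputFunctionCall(typinput, initval1, typioparam, -1);
  Datum   init2 = OidInputFunctionCall(typinput, initval2, typioparam, -1);

  if (datumIsEqual(init1, init2, typbyval, typlen))
  {
      /* initconds match byte-for-byte; the transition state is shareable */
  }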



>  BTW, the name of the AggStatePerAggStateData struct is pretty horrible.
>>> The repeated "AggState" feels awkward. Now that I've stared at the patch
>>> for some time, it doesn't bother me anymore, but it took me quite a while
>>> to get over that. I'm sure it will for others too. And it's not just that
>>> struct, the comments talk about "aggregate state", which could be
>>> confused
>>> to mean "AggState", but it actually means AggStatePerAggStateData. I
>>> don't
>>> have any great suggestions, but can you come up a better naming scheme?
>>>
>>
>> I agree, they're horrible. The thing that's causing the biggest problem is
>> the struct named AggState, which carries state for *all* aggregates, and
>> has nothing to do with "transition state", so it seems there are two
>> different meanings of the word "state", and I've used both meanings in the
>> name of AggStatePerAggStateData.
>>
>> Perhaps just renaming AggStatePerAggStateData to AggStateTransStateData
>> would be good enough?
>>
>
> Hmm. I think it should be "AggStatePerTransData" then, to keep the same
> pattern as AggStatePerAggData and AggStatePerGroupData.
>
>
Sounds good. I've renamed it to that in the attached delta patch.

Regards

David Rowley

--
 David Rowley   http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services


sharing_aggstate-heikki-1_delta2.patch
Description: Binary data



Re: [HACKERS] Planner debug views

2015-07-27 Thread Qingqing Zhou
On Thu, Jul 23, 2015 at 4:11 PM, Tatsuo Ishii  wrote:
> Sounds like a great feature!
>

Thanks!

Attached is a draft patch implementing the idea. To play with it, you
need to create the following two foreign tables:

CREATE EXTENSION file_fdw;
CREATE SERVER pglog FOREIGN DATA WRAPPER file_fdw;
create foreign table pg_planner_rels(rel text, content text) server
pglog options(filename '/data/debug_planner_relopt.csv', format 'csv');
create foreign table pg_planner_paths(rel text, path text, replacedby
text, reason int, startupcost float, totalcost float, cheapest text,
innerp text, outerp text, content text) server pglog options(filename
'/data/debug_planner_paths.csv', format 'csv');

Example output is attached.

Questions:
1. Which document shall we update? This is more than the existing
debug_print_* knobs.
2. GEQO is not supported yet. I would suggest we do that in a
separate check-in.
3. Where do we want to put the csv files? Currently I just put them under /data.
4. Do we want to push these two foreign tables into system_views.sql?
One problem is that a foreign table needs an absolute path. Any way to
handle this?
5. As the output is a csv file: I wrap strings with '"', but I am not sure
whether there are any quote characters within the string itself. Do we have
any guarantee here?

Thanks,
Qingqing

---

postgres=# select p.rel, p.path, p.replacedby, p.reason,
p.startupcost, p.totalcost, p.cheapest, p.innerp, p.outerp,
substr(p.content, 1,30),r.content from pg_planner_paths p join
pg_planner_rels r on p.rel=r.rel;
rel|   path| replacedby | reason | startupcost | totalcost
|   cheapest   |  innerp   |  outerp   | substr
 |content
---+---+++-+---+--+---+---++
 0x2791a10 | 0x279d4b0 |||   0 |  40.1
| +total+startup+param |   |   | ForeignScan(1)
rows=301 cost=0 | RELOPTINFO (1): rows=301 width=244
 0x279f998 | 0x27a2238 |||   0 |   1.1
| +total+startup+param |   |   | ForeignScan(1) rows=1
cost=0.0 | RELOPTINFO (1): rows=1 width=244
 0x279fbd0 | 0x27a28b8 |||   0 |   1.1
| +total+startup+param |   |   | ForeignScan(2) rows=1
cost=0.0 | RELOPTINFO (2): rows=1 width=64
 0x27a2ab0 | 0x27a3c68 |||   0 |  2.21
| +total+startup+param | 0x27a28b8 | 0x27a2238 | NestLoop(1 2) rows=1
cost=0.00 | RELOPTINFO (1 2): rows=1 width=308
 0x27a2ab0 | 0x27a4608 | 0x27a4608  |  2 |1.11 |  2.23
|  | 0x27a2238 | 0x27a28b8 | HashJoin(1 2) rows=1
cost=1.11 | RELOPTINFO (1 2): rows=1 width=308
 0x27a2ab0 | 0x27a4498 | 0x27a4498  |  0 |   0 |  2.22
|  | 0x27a4330 | 0x27a28b8 | NestLoop(1 2) rows=1
cost=0.00 | RELOPTINFO (1 2): rows=1 width=308
 0x27a2ab0 | 0x27a4388 | 0x27a4388  |  0 |   0 |  2.21
|  | 0x27a2238 | 0x27a28b8 | NestLoop(1 2) rows=1
cost=0.00 | RELOPTINFO (1 2): rows=1 width=308
 0x27a2ab0 | 0x27a4220 | 0x27a4220  |  2 |2.22 |  2.25
|  | 0x27a2238 | 0x27a28b8 | MergeJoin(1 2) rows=1
cost=2.2 | RELOPTINFO (1 2): rows=1 width=308
 0x27a2ab0 | 0x27a3f90 | 0x27a3f90  |  2 |1.11 |  2.23
|  | 0x27a28b8 | 0x27a2238 | HashJoin(1 2) rows=1
cost=1.11 | RELOPTINFO (1 2): rows=1 width=308
 0x27a2ab0 | 0x27a3e20 | 0x27a3e20  |  0 |   0 |  2.22
|  | 0x27a3c10 | 0x27a2238 | NestLoop(1 2) rows=1
cost=0.00 | RELOPTINFO (1 2): rows=1 width=308
 0x27a2ab0 | 0x27a3b18 | 0x27a3c68  |  1 |2.22 |  2.25
|  | 0x27a28b8 | 0x27a2238 | MergeJoin(1 2) rows=1
cost=2.2 | RELOPTINFO (1 2): rows=1 width=308


0002-local-change.patch
Description: Binary data



Re: [HACKERS] more RLS oversights

2015-07-27 Thread Joe Conway

On 07/27/2015 01:13 PM, Alvaro Herrera wrote:
> Hmm, these are not ACL objects, so conceptually it seems cleaner to
> use a different symbol for this.  I think the catalog state and the
> error messages would be a bit confusing otherwise.

Ok -- done

> Isn't this leaking the previously allocated array?  Not sure it's
> all that critical, but still.  (I don't think you really need to
> call palloc at all here.)

Agreed -- I think the attached is a bit cleaner.

Any other comments or concerns?

-- 
Joe Conway
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 9096ee5..7781c56 100644
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***
*** 5793,5798 
--- 5793,5808 
  
  
  
+  SHARED_DEPENDENCY_POLICY (r)
+  
+   
+The referenced object (which must be a role) is mentioned as the
+target of a dependent policy object.
+   
+  
+ 
+ 
+ 
   SHARED_DEPENDENCY_PIN (p)
   

diff --git a/src/backend/catalog/pg_shdepend.c b/src/backend/catalog/pg_shdepend.c
index 34fe4e2..43076c9 100644
*** a/src/backend/catalog/pg_shdepend.c
--- b/src/backend/catalog/pg_shdepend.c
*** storeObjectDescription(StringInfo descs,
*** 1083,1088 
--- 1083,1090 
  appendStringInfo(descs, _("owner of %s"), objdesc);
  			else if (deptype == SHARED_DEPENDENCY_ACL)
  appendStringInfo(descs, _("privileges for %s"), objdesc);
+ 			else if (deptype == SHARED_DEPENDENCY_POLICY)
+ appendStringInfo(descs, _("target of %s"), objdesc);
  			else
  elog(ERROR, "unrecognized dependency type: %d",
  	 (int) deptype);
diff --git a/src/backend/commands/policy.c b/src/backend/commands/policy.c
index 17b48d4..9544f75 100644
*** a/src/backend/commands/policy.c
--- b/src/backend/commands/policy.c
***
*** 22,27 
--- 22,28 
  #include "catalog/indexing.h"
  #include "catalog/namespace.h"
  #include "catalog/objectaccess.h"
+ #include "catalog/pg_authid.h"
  #include "catalog/pg_policy.h"
  #include "catalog/pg_type.h"
  #include "commands/policy.h"
***
*** 48,54 
  static void RangeVarCallbackForPolicy(const RangeVar *rv,
  		  Oid relid, Oid oldrelid, void *arg);
  static char parse_policy_command(const char *cmd_name);
! static ArrayType *policy_role_list_to_array(List *roles);
  
  /*
   * Callback to RangeVarGetRelidExtended().
--- 49,55 
  static void RangeVarCallbackForPolicy(const RangeVar *rv,
  		  Oid relid, Oid oldrelid, void *arg);
  static char parse_policy_command(const char *cmd_name);
! static Datum *policy_role_list_to_array(List *roles, int *num_roles);
  
  /*
   * Callback to RangeVarGetRelidExtended().
*** parse_policy_command(const char *cmd_nam
*** 130,159 
  
  /*
   * policy_role_list_to_array
!  *	 helper function to convert a list of RoleSpecs to an array of role ids.
   */
! static ArrayType *
! policy_role_list_to_array(List *roles)
  {
! 	ArrayType  *role_ids;
! 	Datum	   *temp_array;
  	ListCell   *cell;
- 	int			num_roles;
  	int			i = 0;
  
  	/* Handle no roles being passed in as being for public */
  	if (roles == NIL)
  	{
! 		temp_array = (Datum *) palloc(sizeof(Datum));
! 		temp_array[0] = ObjectIdGetDatum(ACL_ID_PUBLIC);
  
! 		role_ids = construct_array(temp_array, 1, OIDOID, sizeof(Oid), true,
!    'i');
! 		return role_ids;
  	}
  
! 	num_roles = list_length(roles);
! 	temp_array = (Datum *) palloc(num_roles * sizeof(Datum));
  
  	foreach(cell, roles)
  	{
--- 131,158 
  
  /*
   * policy_role_list_to_array
!  *	 helper function to convert a list of RoleSpecs to an array of
!  *	 role id Datums.
   */
! static Datum *
! policy_role_list_to_array(List *roles, int *num_roles)
  {
! 	Datum	   *role_oids;
  	ListCell   *cell;
  	int			i = 0;
  
  	/* Handle no roles being passed in as being for public */
  	if (roles == NIL)
  	{
! 	   *num_roles = 1;
! 		role_oids = (Datum *) palloc(*num_roles * sizeof(Datum));
! 		role_oids[0] = ObjectIdGetDatum(ACL_ID_PUBLIC);
  
! 		return role_oids;
  	}
  
!	*num_roles = list_length(roles);

Re: [HACKERS] Several memory leaks for pg_rewind caused by missing PQclear calls

2015-07-27 Thread Michael Paquier
On Tue, Jul 28, 2015 at 2:45 AM, Heikki Linnakangas  wrote:
> On 07/23/2015 05:08 PM, Michael Paquier wrote:
>>
>> Hi all,
>>
>> After a run of valgrind on pg_rewind, I found a couple of code paths
>> missing some PQclear calls after running a query. Attached is a patch
>> to fix all those leaks.
>
>
> Fixed, thanks.

Thanks for taking care of this.
-- 
Michael




Re: [HACKERS] Autonomous Transaction is back

2015-07-27 Thread Jim Nasby

On 7/27/15 6:40 PM, Jim Nasby wrote:

On 7/27/15 5:12 PM, Joel Jacobson wrote:

Right now, when writing a function, if you raise an exception, you can
be sure all writes you have made will be rolled back, but your caller
function might catch the exception and decide to carry on and commit
work made before your function was called; at least you can be
confident your writes won't be committed as long as you don't catch the
exception you raised in your own function. If I understand it correctly,
that would change with the addition of Autonomous Transactions, unless
given a way to prevent a function you call from starting and committing
an Autonomous Transaction. Wrong? If so, then please show how to prevent
Y() from committing the "Data write B1" in your example; I don't get it.


What's being described here doesn't make sense in either use case ([1] &
[2]), but I do understand the concern about what 3rd party software is
doing. It would be nice to have the ability to disallow and/or disable
autonomous transactions, but I don't see a practical way of doing that
other than introducing a new GUC. I'm not sure if it's worth that effort.


It just occurred to me that another option would be to have an event 
trigger for beginning an autonomous transaction.



[1] the "batch process" use case: batches that still hold their own
transaction open don't gain anything.

[2] the "audit logging" case. If you didn't care about auditing
surviving regardless of a rollback then you wouldn't go to the extra
work of an autonomous transaction to begin with.



--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Autonomous Transaction is back

2015-07-27 Thread Jim Nasby

On 7/27/15 5:56 PM, Josh Berkus wrote:

Can you explain what use case you have where simply telling the staff
"if you use ATX without clearing it, you'll be fired" is not sufficient?
  Possibly there's something we failed to account for in the unconference
discussion.


That there's no way to enforce that, short of hand-auditing code?
There are already enough things that are difficult/impossible to enforce;
I'd rather not add another one.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Autonomous Transaction is back

2015-07-27 Thread Jim Nasby

On 7/27/15 5:12 PM, Joel Jacobson wrote:

Right now, when writing a function, if you raise an exception, you can
be sure all writes you have made will be rolled back, but your caller
function might catch the exception and decide to carry on and commit
work made before your function was called; at least you can be
confident your writes won't be committed as long as you don't catch the
exception you raised in your own function. If I understand it correctly,
that would change with the addition of Autonomous Transactions, unless
given a way to prevent a function you call from starting and committing
an Autonomous Transaction. Wrong? If so, then please show how to prevent
Y() from committing the "Data write B1" in your example; I don't get it.


What's being described here doesn't make sense in either use case ([1] & 
[2]), but I do understand the concern about what 3rd party software is 
doing. It would be nice to have the ability to disallow and/or disable 
autonomous transactions, but I don't see a practical way of doing that 
other than introducing a new GUC. I'm not sure if it's worth that effort.


[1] the "batch process" use case: batches that still hold their own 
transaction open don't gain anything.


[2] the "audit logging" case. If you didn't care about auditing 
surviving regardless of a rollback then you wouldn't go to the extra 
work of an autonomous transaction to begin with.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Optimization idea: merging multiple EXISTS(...) with constraint-based join removal

2015-07-27 Thread David Rowley
On 28 July 2015 at 09:37, Frédéric TERRAZZONI  wrote:

>
> SELECT * FROM t1
> WHERE EXISTS(
> SELECT 1 FROM t2, t3, t4
> WHERE t2.id = t1.t2_id
> AND t3.id = t2.t3_id
> AND t4.id = t3.t4_id
> AND t4.val = 'XYZ'
> ) AND EXISTS(
> SELECT 1 FROM t2, t3, t5
> WHERE t2.id = t1.t2_id
> AND t3.id = t2.t3_id
> AND t5.id = t3.t5_id
> AND t5.val = 'Blablabla'
> ) AND EXISTS(
> SELECT 1 FROM t6
> WHERE t6.id = t1.t6_id
> AND t6.val = 'Hello'
> )
>
> ...


>
> The resulting query is:
>
> SELECT * FROM t1
> WHERE EXISTS(
> SELECT 1 FROM t2 t2_a, t3 t3_a, t4 t4_a, t2 t2_b, t3 t3_b, t5,
> t6
> WHERE t2_a.id = t1.t2_id
> AND t3_a.id = t2_a.t3_id
> AND t4_a.id = t3_a.t4_id
> AND t4_a.val = 'XYZ'
> AND t2_b.id = t1.t2_id
> AND t3_b.id = t2_b.t3_id
> AND t5.id = t3_b.t5_id
> AND t5.val = 'Blablabla'
> AND t6.id = t1.t6_id
> AND t6.val = 'Hello'
> )
>
> My questions are:
> - Does PostgreSQL already support this optimization? Maybe it's just not
> enabled in my case?
>

No, there's nothing which supports that currently.


> - If yes, is my reasoning incorrect? Can you point out my mistake?
>

It sounds reasonable to me.


> - Otherwise is there any plan to add this optimization to PostgreSQL ?
>
>
I did propose the idea here
http://www.postgresql.org/message-id/CAApHDvopmWq4i2BCf0VqU4mYmxzHCYwfnUMat9TWuKzdvo7=8...@mail.gmail.com
but I didn't focus just on semi-joins. Without re-reading, I think I was
saying that any join that could be proved not to duplicate rows could be
"consolidated".

I do believe that this optimisation would be useful in more cases than most
people might think, for example:

UPDATE t1 SET col1 = (SELECT col1 FROM t2 WHERE t1.id=t2.id), col2 =
(SELECT col2 FROM t2 WHERE t1.id=t2.id), ...;

Of course, this query could have been written using UPDATE/FROM,
(non-standard), or UPDATE t1 SET (col1,col2) = (SELECT ...), which was only
added recently.

There's also the case of the view which just has 1 column missing, so the
consumer joins a table that's already been joined to in the view.

I think it would be quite nice to have this, and I don't think it would be
all that expensive for the planner to detect this.

I think the planner would have to do something like:

1. Scan simple_rte_array looking for relids which are the same as another
entry's.
2. If found, is the join condition the same as the other one?
3. Is there a unique index to prove that joining to this does not duplicate
rows, or is it a semi-join?
4. Remove relation and replace all Vars from the removed relation with the
one from the other table, mark relation as REL_DEAD.

I think step 1 is quite cheap to perform, so normal queries wouldn't suffer
much of a slow-down from these extra checks, as most queries won't have
self-joins. A sketch of that step appears below.
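
A hypothetical sketch of step 1 follows; the real steps 2-4 would go on to
compare join quals, prove uniqueness, and mark the duplicate REL_DEAD:

  /* Scan the range table for two plain RTEs over the same relation. */
  static void
  find_duplicate_relids(PlannerInfo *root)
  {
      int         i, j;

      for (i = 1; i < root->simple_rel_array_size; i++)
      {
          RangeTblEntry *rte1 = root->simple_rte_array[i];

          if (rte1 == NULL || rte1->rtekind != RTE_RELATION)
              continue;

          for (j = i + 1; j < root->simple_rel_array_size; j++)
          {
              RangeTblEntry *rte2 = root->simple_rte_array[j];

              if (rte2 != NULL && rte2->rtekind == RTE_RELATION &&
                  rte2->relid == rte1->relid)
              {
                  /* candidate self-join: rels i and j scan the same table */
              }
          }
      }
  }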

Are you thinking of working on it?

Regards

David Rowley

--
 David Rowley   http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] WIP: Make timestamptz_out less slow.

2015-07-27 Thread David Rowley
On 28 July 2015 at 09:17, Andres Freund  wrote:

> Hi,
>
> I recently once more noticed that timestamptz_out is really, really
> slow. To quantify that, I created a bunch of similar sized tables:
>
> CREATE TABLE tbl_timestamp AS SELECT NOW() FROM generate_series(1, 10)
> a, generate_series(1, 100) b;
> CREATE TABLE tbl_int8 AS SELECT 1::bigint FROM generate_series(1,  10)
> a, generate_series(1, 100) b;
> CREATE TABLE tbl_bytea AS SELECT '   '::bytea FROM generate_series(1,
> 10) a, generate_series(1, 100) b;
>
> These all end up being 346MB large.
>
> COPY tbl_bytea TO '/dev/null';
> Time: 1173.484 ms
> COPY tbl_int8 TO '/dev/null';
> Time: 1030.756 ms
> COPY tbl_timestamp TO '/dev/null';
> Time: 6598.030
>
> (all best of three)
>
> Yes, timestamp outputs more data as a whole, but being 5 times as slow
> is still pretty darn insane. To make sure that's not the cause, here's
> another table:
>
> CREATE TABLE tbl_timestamptext AS SELECT NOW()::text FROM
> generate_series(1, 10) a, generate_series(1, 100) b;
> COPY tbl_timestamptext TO '/dev/null';
> Time: 1449.554 ms
>
> So it's really just the timestamp code.
>
>
> Profiling it shows:
>   Overhead  Command Shared Object Symbol
>   -   38.33%  postgres_stock  libc-2.19.so  [.] vfprintf
>  - 97.92% vfprintf
>   _IO_vsprintf
> - sprintf
>+ 70.25% EncodeDateTime
>+ 29.75% AppendSeconds.constprop.10
>  + 1.11% _IO_default_xsputn
>   -8.22%  postgres_stock  libc-2.19.so  [.] _IO_default_xsputn
>  - 99.43% _IO_default_xsputn
> - 90.09% vfprintf
>  _IO_vsprintf
>- sprintf
>   + 74.15% EncodeDateTime
>   + 25.85% AppendSeconds.constprop.10
> + 9.72% _IO_padn
>  + 0.57% vfprintf
>   +   7.76%  postgres_stock  postgres_stock[.] CopyOneRowTo
>
> So nearly all the time is spent somewhere inside the sprintf calls. Not
> nice.
>
> The only thing I could come up with to make the sprintfs cheaper was to
> combine them into one and remove some of the width specifiers that
> aren't needed. That doesn't buy us very much.
>
> I then proceeded to replace the sprintf call with hand-rolled
> conversions. And wow, the benefit is far bigger than I'd assumed:
> postgres[7236][1]=# COPY tbl_timestamp TO '/dev/null';
> Time: 2430.521 ms
>
> So, by hand-rolling the ISO conversion in EncodeDateTime() we got a
> ~250% performance improvement. I'd say that's worthwhile.
>
> The attached patch shows what I did. While there's some polishing
> possible, as a whole, it's pretty ugly. But I think timestamp data is so
> common that it's worth the effort.
>
> Does anybody have a fundamentally nicer idea than the attached to
> improve this?
>

It won't be quite as fast as what you've written, but I think it will be
much neater and more likely to be used in other places if we invent a
function like pg_ltoa() which returns a pointer to the new end of string.

Also, if we're specifying padding with zeros then we can skip the reverse
step that's in pg_ltoa() (normally needed since the numeric string is
built in reverse).

The code could then be written as:

str = pg_int2str_pad(str, year, 4);
*str++ = '-';
str = pg_int2str_pad(str, tm->tm_mon, 2);
*str++ = '-';
str = pg_int2str_pad(str, tm->tm_mday, 2);

etc
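
For illustration, a minimal sketch of such a helper (the name is taken from
the snippet above; this implementation is only a sketch, not from any
patch):

static char *
pg_int2str_pad(char *str, int value, int padding)
{
	int			divisor = 1;

	/* 10^(padding - 1) selects the most significant digit first */
	while (--padding > 0)
		divisor *= 10;

	/* emit digits left to right, so no reverse step is needed */
	for (; divisor > 0; divisor /= 10)
		*str++ = (value / divisor) % 10 + '0';

	/* returns a pointer just past the last digit written; assumes
	 * value >= 0 and that it fits in the requested number of digits */
	return str;
}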

I've used this method before and found it to be about 10 times faster than
snprintf(), but I was reversing the string, so quite likely it would be
more than 10 times here.

I'm interested to see how much you're really gaining by manually unrolling
the part that builds the fractional part of the second.

We could just build that part with: (untested)

if (fsec != 0)
{
	int		fseca = abs(fsec);

	*str++ = '.';
	/* print all 6 microsecond digits zero-padded (so leading zeros
	 * are kept), then trim the trailing zeros back off; fsec != 0
	 * guarantees a nonzero digit remains */
	str = pg_int2str_pad(str, fseca, 6);
	while (*(str - 1) == '0')
		str--;
}

Regards

David Rowley

--
 David Rowley   http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Autonomous Transaction is back

2015-07-27 Thread Josh Berkus
On 07/27/2015 03:12 PM, Joel Jacobson wrote:
> On Mon, Jul 27, 2015 at 11:49 PM, Josh Berkus wrote:
> 
> Ah, you're missing how commits in ATX are expected to work.  Let me
> illustrate:
> 
> X (
>Data write A1
>call Y(
> Start ATX
> Data write B1
> Commit ATX
>)
>Data write A2
>Exception
> )
> 
> In this workflow, B1 would be committed and persistent. Neither A1 nor
> A2 would be committed, or visible to other users.  Depending on what
> implementation we end up with, A1 might not even be visible to Y().
> 
> So that solves your use case without any need to "block" ATXs in called
> functions.  However, it leads to some interesting cases involving
> self-deadlocks; see the original post on this thread.
> 
> 
> I don't follow. In your example above, if I'm X(), how do I ensure Y()
> won't have committed anything at all when I later, at "Exception", decide
> to roll back everything from "Data write A1" to "Data write A2" including
> any writes made by Y() (in the example "Data write B1")?

Ah, ok.  The goal of the project is that the writer of X() *cannot*
prevent Y() from writing its data (B1) and committing it.

One of the primary use cases for ATX is audit triggers.  If a function
writer could override ATX and prevent the audit triggers from
committing, then that use case would be violated.

Can you explain what use case you have where simply telling the staff
"if you use ATX without clearing it, you'll be fired" is not sufficient?
 Possibly there's something we failed to account for in the unconference
discussion.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] optimizing vacuum truncation scans

2015-07-27 Thread Jim Nasby

On 7/27/15 10:39 AM, Robert Haas wrote:

But that's not enough: we also need to know that any tuple that
survived the prune operation (that is, it wasn't redirected or marked
unused) has a new-enough xmin, which isn't tracked anywhere.


Wouldn't that be old-enough xmin?

heap_prune_chain() is already calling HeapTupleSatisfiesVacuum, so it 
should be able to figure out if the page is all visible without a lot of 
extra work, and pass that info back to heap_page_prune (which would then 
pass it down to _execute()).


Definitely not a one-liner though.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous Transaction is back

2015-07-27 Thread Joel Jacobson
On Mon, Jul 27, 2015 at 11:49 PM, Josh Berkus  wrote:

> Ah, you're missing how commits in ATX are expected to work.  Let me
> illustrate:
>
> X (
>Data write A1
>call Y(
> Start ATX
> Data write B1
> Commit ATX
>)
>Data write A2
>Exception
> )
>
> In this workflow, B1 would be committed and persistent. Neither A1 nor
> A2 would be committed, or visible to other users.  Depending on what
> implementation we end up with, A1 might not even be visible to Y().
>
> So that solves your use case without any need to "block" ATXs in called
> functions.  However, it leads to some interesting cases involving
> self-deadlocks; see the original post on this thread.
>
>
I don't follow. In your example above, if I'm X(), how do I ensure Y()
won't have committed anything at all when I later, at "Exception", decide to
roll back everything from "Data write A1" to "Data write A2", including any
writes made by Y() (in the example "Data write B1")?

I understand the "Exception" will take care of rolling back my (X's) writes,
but that's not sufficient if you want to make sure you roll back
*everything*, including any writes made by functions you call.

Right now, when writing a function, if you raise an exception, you can be
sure all writes you have made will be rolled back. Your caller function
might catch the exception and decide to carry on and commit work made
before your function was called, but at least you can be confident your
writes won't be committed as long as you don't catch the exception you
raised in your own function. If I understand it correctly, that would
change with the addition of Autonomous Transactions, unless we are given a
way to prevent a function we call from starting and committing an
Autonomous Transaction. Wrong? If so, then please show how to prevent Y()
from committing the "Data write B1" in your example; I don't get it.


Re: [HACKERS] A little RLS oversight?

2015-07-27 Thread Stephen Frost
On Monday, July 27, 2015, Dean Rasheed wrote:

> On 27 July 2015 at 22:36, Joe Conway wrote:
> >
> > On 07/27/2015 01:25 PM, Dean Rasheed wrote:
> >> Looks good to me, except I'm not sure about those latest changes
> >> because I don't understand the reasoning behind the logic in
> >> check_enable_rls() when row_security is set to OFF.
> >>
> >> I would expect that if the current user has permission to bypass
> >> RLS, and they have set row_security to OFF, then it should be off
> >> for all tables that they have access to, regardless of how they
> >> access those tables (directly or through a view). If it really is
> >> intentional that RLS remains active when querying through a view
> >> not owned by the table's owner, then the other calls to
> >> check_enable_rls() should probably be left as they were, since the
> >> table might have been updated through such a view and that code
> >> can't really tell at that point.
> >
> > After looking again at those three call sites, I'm now on the fence.
> > All three are in functions which are trying to avoid leaking info in
> > error messages. If we leave the original coding, then the messages
> > will remain the same for a given user regardless of the row_security
> > setting, whereas if we change them as in my latest patch, the messages
> > will be different depending on row_security for users with BYPASSRLS.
> >
> > I think if we do the former (original coding) there should probably be
> > a comment added to indicate we are doing that intentionally.
> >
> > Thoughts?
> >
>
> I could go either way on that, but I don't think it's too serious to
> worry about leaking information to users with BYPASSRLS.
>

AFK at the moment, but my thinking was that we should avoid having the
error message change based on what a GUC is set to. I agree that there
should be comments which explain that.

Thanks!

Stephen


Re: [HACKERS] A little RLS oversight?

2015-07-27 Thread Dean Rasheed
On 27 July 2015 at 22:36, Joe Conway  wrote:
>
> On 07/27/2015 01:25 PM, Dean Rasheed wrote:
>> Looks good to me, except I'm not sure about those latest changes
>> because I don't understand the reasoning behind the logic in
>> check_enable_rls() when row_security is set to OFF.
>>
>> I would expect that if the current user has permission to bypass
>> RLS, and they have set row_security to OFF, then it should be off
>> for all tables that they have access to, regardless of how they
>> access those tables (directly or through a view). If it really is
>> intentional that RLS remains active when querying through a view
>> not owned by the table's owner, then the other calls to
>> check_enable_rls() should probably be left as they were, since the
>> table might have been updated through such a view and that code
>> can't really tell at that point.
>
> After looking again at those three call sites, I'm now on the fence.
> All three are in functions which are trying to avoid leaking info in
> error messages. If we leave the original coding, then the messages
> will remain the same for a given user regardless of the row_security
> setting, whereas if we change them as in my latest patch, the messages
> will be different depending on row_security for users with BYPASSRLS.
>
> I think if we do the former (original coding) there should probably be
> a comment added to indicate we are doing that intentionally.
>
> Thoughts?
>

I could go either way on that, but I don't think it's too serious to
worry about leaking information to users with BYPASSRLS.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous Transaction is back

2015-07-27 Thread Merlin Moncure
On Mon, Jul 27, 2015 at 4:41 PM, Joel Jacobson  wrote:
> On Fri, Jul 24, 2015 at 9:39 PM, Merlin Moncure  wrote:
>>
>> On Thu, Jul 23, 2015 at 1:49 PM, Josh Berkus  wrote:
>> > Batch Jobs: large data-manipulation tasks which need to be broken up
>> > into segments, with each segment committing separately.  Example:
>> > updating 1 million records in batches of 1000.
>>
>> Autonomous transactions are not a good fit for this case; stored
>> procedures are a better way to go for any scenario where you don't
>> want to be in a snapshot (for example, suppose you want to change
>> isolation level on the fly).
>
>
> Hm, you mean we need real "stored procedures" in PostgreSQL and not just
> "functions"?

Yes, exactly.

Autonomous transactions aren't really set up for cases where the
function runs for a very long time or indefinitely.  This is the
'advancing xmin' problem as Josh puts it but I think the problem is
much bigger than that.  Anyways, this is mostly irrelevant to
autonomous transactions as long as the design isn't extended to try
and cover that case.

Is the Autonomous Transaction feature only going to be exposed through pl/pgsql?

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multivariate statistics / patch v7

2015-07-27 Thread Tomas Vondra

Hello Horiguchi-san,

On 07/27/2015 09:04 AM, Kyotaro HORIGUCHI wrote:

Hello,

At Sat, 25 Jul 2015 23:09:31 +0200, Tomas Vondra  wrote 
in <55b3fb0b.7000...@2ndquadrant.com>

Hi,

On 07/16/2015 01:51 PM, Kyotaro HORIGUCHI wrote:

Hi, I'd like to show you the modified constitution of
multivariate statistics application logic. Please find the
attached. They apply on your v7 patch.


Sadly I do have some trouble getting it to apply correctly :-(
So for now all my comments are based on just reading the code.


Ah. My modification to rebase onto master at the time is probably
the culprit. Sorry for the dirty patch.

# I would have recreated the patch if you had complained before
# struggling with it..

The core of the modification is in clausesel.c. I attached the
patched clausesel.c.


I don't see any attachment. Perhaps you forgot to actually attach it?



My concern about the code at the time was following,

- You embedded the logic of multivariate estimation into
   clauselist_selectivity. I think estimation using multivariate
   statistics is quite different from the ordinary estimation based
   on single-column stats, so they are logically separable and
   we should separate them.


I don't see them as very different, actually quite the opposite. The two 
kinds of statistics are complementary and should naturally coexist. 
Perhaps the current code is not perfect and a refactoring would make the 
code more readable, but I don't think it's primary aim should be to 
separate regular and multivariate stats.




- You are taking a top-down approach, which runs tree-walking to
   check the applicability of mv-stats at every step down the
   clause tree. If a subtree is found to be mv-applicable, it is
   split into two parts - mv-compatible and non-compatible. These
   steps require expression tree walking, which looks like it uses
   too much CPU.


I'm taking top-down approach because that's what the regular stats do, 
and also because that's what allows implementing the features that I 
think are interesting - ability to combine multiple stats in an 
efficient way, pass conditions and such. I think those two features are 
very useful and allow very interesting things.


The bottom-up would work too, probably - I mean, we could start from 
leaves of the expression tree, and build the largest "subtree" 
compatible with multivariate stats and then try to estimate it. I don't 
see how we could pass conditions though, which works naturally in the 
top-down approach.


Or maybe a combination of both - identify the "compatible" subtrees 
first, then perform the top-down phase.



- You look to be considering cases where users create many
   multivariate statistics on attribute sets that overlap. But
   that looks like too much to me. MV-stats are more
   resource-hungry, so we can assume minimal usage of them.


Not really. I don't expect huge numbers of multivariate stats to be 
built on the tables.


But I think restricting the users to use a single multivariate 
statistics per table would be a significant limitation. And once you 
allow using multiple multivariate statistics for the set of clauses, 
supporting overlapping stats is not that difficult.


What it however makes possible is combining multiple "small" stats into 
a larger one in a very efficient way - it assumes the overlap is 
sufficient, of course. But if that's true you may build multiple small 
(and very accurate) stats instead of one huge (or very inaccurate) 
statistics.


This also makes it possible to handle complex combinations of clauses 
that are compatible and incompatible with multivariate statistics, by 
passing the conditions.




My suggestion in the patch is a bottom-up approach to find
mv-applicable portion(s) of the expression tree, which is the
planner's basic way of working overall. The approach requires no
repeated runs of the tree walker, that is, pull_varnos. It could
fail to find the 'optimal' solution in complex situations but
needs far less computation for almost the same return (I think..).

Even though it doesn't consider functional dependencies, the
reduction in code shows the efficiency. It does nothing tricky.


OK


On a conceptual level, I think the idea to split the estimation into
two phases - enrich the expression tree with nodes with details about
stats etc, and then actually do the estimation in the second phase
might be interesting. Not because it's somehow clearer, but because it
gives us a chance to see the expression tree as a whole, with details
about all the stats (with the current code we process/estimate the
tree incrementally). But I don't really know how useful that would be.


It is difficult to say which approach is better, since that depends
on what we consider most important. However, I am concerned that
your code substantially reconstructs the expression (clause) tree in
the midst of processing it. I believe that should be a separate
phase, for simplicity. Of course, the additional resources required
should also be considered.

Re: [HACKERS] Autonomous Transaction is back

2015-07-27 Thread Josh Berkus
On 07/27/2015 02:47 AM, Rajeev rastogi wrote:
>> Why have any fixed maximum?
> Since we are planning to have nested autonomous transactions, it is
> required to have a limit on this so that resources can be controlled.

Is there a particular reason why this limit wouldn't just be
max_stack_depth?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Make timestamptz_out less slow.

2015-07-27 Thread Tom Lane
Andres Freund  writes:
> On 2015-07-27 17:31:41 -0400, Tom Lane wrote:
>> What happens if you force use of port/snprintf.c instead of glibc's
>> version?

> Even worse. 15900.014 ms.

Interesting.  So as a separate optimization problem, we might consider
"try to put snprintf.c at least on a par with glibc".  I'm kind of
surprised by this result really, since snprintf.c lacks a lot of the
bells and whistles that are in glibc.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous Transaction is back

2015-07-27 Thread Josh Berkus
On 07/27/2015 02:41 PM, Joel Jacobson wrote:
> However, we should also add a way for the caller to protect against an
> Autonomous Transaction in a function called by the caller. Imagine you're
> the author of function X() and within X() you make use of some other
> function Y() written by some other author. If, within your function X(),
> it's very important that either all of your work or none at all gets
> committed, then you need to make sure none of the changes you made before
> calling Y() gets committed, and thus we need a way to prevent Y() from
> starting and committing an Autonomous Transaction. Otherwise we would
> increase the risk and complexity of working with functions and plpgsql in
> PostgreSQL, as you would then need to be sure none of the functions you
> are using within a function will start and commit an ATX.

Ah, you're missing how commits in ATX are expected to work.  Let me
illustrate:

X (
   Data write A1
   call Y(
Start ATX
Data write B1
Commit ATX
   )
   Data write A2
   Exception
)

In this workflow, B1 would be committed and persistent. Neither A1 nor
A2 would be committed, or visible to other users.  Depending on what
implementation we end up with, A1 might not even be visible to Y().

So that solves your use case without any need to "block" ATXs in called
functions.  However, it leads to some interesting cases involving
self-deadlocks; see the original post on this thread.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous Transaction is back

2015-07-27 Thread Joel Jacobson
On Fri, Jul 24, 2015 at 9:39 PM, Merlin Moncure  wrote:

> On Thu, Jul 23, 2015 at 1:49 PM, Josh Berkus  wrote:
> > Batch Jobs: large data-manipulation tasks which need to be broken up
> > into segments, with each segment committing separately.  Example:
> > updating 1 million records in batches of 1000.
>
> Autonomous transactions are not a good fit for this case; stored
> procedures are a better way to go for any scenario where you don't
> want to be in a snapshot (for example, suppose you want to change
> isolation level on the fly).


Hm, you mean we need real "stored procedures" in PostgreSQL and not just
"functions"?

If not, I think it would be sufficient to add Autonomous Transaction
support to the type of functions we already have in pg, to allow writing a
batch-job function which would commit after every X modified rows, instead
of having to write a script in an external language such as Perl to call
the function in a while-loop and commit between each function call.

However, we should also add a way for the caller to protect against an
Autonomous Transaction in a function called by the caller. Imagine you're
the author of function X() and within X() you make use of some other
function Y() written by some other author. If, within your function X(),
it's very important that either all of your work or none at all gets
committed, then you need to make sure none of the changes you made before
calling Y() gets committed, and thus we need a way to prevent Y() from
starting and committing an Autonomous Transaction. Otherwise we would
increase the risk and complexity of working with functions and plpgsql in
PostgreSQL, as you would then need to be sure none of the functions you are
using within a function will start and commit an ATX.


Re: [HACKERS] WIP: Make timestamptz_out less slow.

2015-07-27 Thread Andres Freund
On 2015-07-27 17:31:41 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > So nearly all the time is spent somewhere inside the sprintf calls. Not
> > nice.
> 
> What happens if you force use of port/snprintf.c instead of glibc's
> version?

Good question.

Even worse. 15900.014 ms.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Optimization idea: merging multiple EXISTS(...) with constraint-based join removal

2015-07-27 Thread Frédéric TERRAZZONI
I've frequently encountered queries where the WHERE clause contains a lot
of "EXISTS(...)" subqueries linked by logical ANDs. They are typically
generated queries, and the "EXISTS(...)" fragments are composable pieces of
SQL code acting as filters. Moreover, all those filters end up being very
similar: the joins used to navigate the object graph often involve the
same tables.

Unfortunately, PostgreSQL doesn't seem to be smart enough to simplify them.
In fact, according to my experiments, even dead-simple uncorrelated
subqueries such as "SELECT * FROM t1 WHERE EXISTS(SELECT * FROM t2) AND
EXISTS(SELECT * FROM t2)" are not simplified. (The execution plan is rather
naive: t2 is checked twice for non-emptiness.)

I've not seen anything related to that topic on the TODO list or in the
mailing list. Consequently, I would like to propose a method which can be
used to merge multiple EXISTS(...) semi-joins linked by logical ANDs.

For the sake of this example, the database schema is:

create table t1(id int primary key, t2_id int, t6_id int)
create table t2(id int primary key, t3_id int)
create table t3(id int primary key, t4_id int, t5_id int)
create table t4(id int primary key, val text)
create table t5(id int primary key, val text)
create table t6(id int primary key, val text)

Here is a machine-generated query. Notice that the path "t1->t2->t3" is
used twice.

SELECT * FROM t1
WHERE EXISTS(
SELECT 1 FROM t2, t3, t4
WHERE t2.id = t1.t2_id
AND t3.id = t2.t3_id
AND t4.id = t3.t4_id
AND t4.val = 'XYZ'
) AND EXISTS(
SELECT 1 FROM t2, t3, t5
WHERE t2.id = t1.t2_id
AND t3.id = t2.t3_id
AND t5.id = t3.t5_id
AND t5.val = 'Blablabla'
) AND EXISTS(
SELECT 1 FROM t6
WHERE t6.id = t1.t6_id
AND t6.val = 'Hello'
)

The following transformation can be applied:

SELECT * FROM [...]
WHERE
EXISTS(SELECT 1 FROM [...tables_1...] WHERE [...conditions_1...])
AND EXISTS(SELECT 1 FROM [...tables_2...] WHERE [...conditions_2...])

<=>

SELECT * FROM [...]
WHERE
EXISTS(
SELECT 1 FROM
(SELECT 1 FROM [...tables_1...] WHERE [...conditions_1...])
q1,
(SELECT 1 FROM [...tables_2...] WHERE [...conditions_2...])
q2
)

The resulting query is:

SELECT * FROM t1
WHERE EXISTS(
SELECT 1 FROM t2 t2_a, t3 t3_a, t4 t4_a, t2 t2_b, t3 t3_b, t5,
t6
WHERE t2_a.id = t1.t2_id
AND t3_a.id = t2_a.t3_id
AND t4_a.id = t3_a.t4_id
AND t4_a.val = 'XYZ'
AND t2_b.id = t1.t2_id
AND t3_b.id = t2_b.t3_id
AND t5.id = t3_b.t5_id
AND t5.val = 'Blablabla'
AND t6.id = t1.t6_id
AND t6.val = 'Hello'
)

Given the constraints, it can be shown that t2_a = t2_b AND t3_a = t3_b :

* t2_a.id = t1.t2_id AND t2_b.id = t1.t2_id => t2_a.id = t2_b.id
* t2_a.id = t2_b.id AND t2(id) is a pkey => t2_a = t2_b
* t2_a = t2_b => t2_a.t3_id = t2_b.t3_id
* t2_a.t3_id = t2_b.t3_id AND t3_b.id = t2_b.t3_id AND t3_a.id =
t2_a.t3_id => t3_b.id = t3_a.id
* t3_b.id = t3_a.id AND t3(id) is a pkey => t3_a = t3_b

It is then safe to remove the joins "t2_b" and "t3_b" in the subquery,
because they have been proved to be duplicates of "t2_a" and "t3_a".

SELECT * FROM t1
WHERE EXISTS(
SELECT 1 FROM t2, t3, t4, t5, t6
WHERE t2.id = t1.t2_id
AND t3.id = t2.t3_id
AND t4.id = t3.t4_id
AND t5.id = t3.t5_id
AND t4.val = 'XYZ'
AND t5.val = 'Blablabla'
AND t6.id = t1.t6_id
AND t6.val = 'Hello'
)

Using a connected component analysis on join predicates, joined tables can
be separated and we get back our original query with the first two
EXISTS(...) semi-joins merged.

SELECT * FROM t1
WHERE EXISTS(
SELECT 1 FROM t2, t3, t4, t5
WHERE t2.id = t1.t2_id
AND t3.id = t2.t3_id
AND t4.id = t3.t4_id
AND t5.id = t3.t5_id
AND t4.val = 'XYZ'
AND t5.val = 'Blablabla'
) AND EXISTS(
SELECT 1 FROM t6
WHERE t6.id = t1.t6_id
AND t6.val = 'Hello'
)

My questions are:
- Does PostgreSQL already support this optimization? Maybe it's just not
enabled in my case?
- If yes, is my reasoning incorrect? Can you point out my mistake?
- Otherwise, is there any plan to add this optimization to PostgreSQL?

Thank you !

Frédéric Terrazzoni


Re: [HACKERS] A little RLS oversight?

2015-07-27 Thread Joe Conway

On 07/27/2015 01:25 PM, Dean Rasheed wrote:
> Looks good to me, except I'm not sure about those latest changes 
> because I don't understand the reasoning behind the logic in 
> check_enable_rls() when row_security is set to OFF.
> 
> I would expect that if the current user has permission to bypass
> RLS, and they have set row_security to OFF, then it should be off
> for all tables that they have access to, regardless of how they
> access those tables (directly or through a view). If it really is
> intentional that RLS remains active when querying through a view
> not owned by the table's owner, then the other calls to
> check_enable_rls() should probably be left as they were, since the
> table might have been updated through such a view and that code
> can't really tell at that point.

After looking again at those three call sites, I'm now on the fence.
All three are in functions which are trying to avoid leaking info in
error messages. If we leave the original coding, then the messages
will remain the same for a given user regardless of the row_security
setting, whereas if we change them as in my latest patch, the messages
will be different depending on row_security for users with BYPASSRLS.

I think if we do the former (original coding) there should probably be
a comment added to indicate we are doing that intentionally.

Thoughts?

-- 
Joe Conway


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Make timestamptz_out less slow.

2015-07-27 Thread Tom Lane
Andres Freund  writes:
> So nearly all the time is spent somewhere inside the sprintf calls. Not
> nice.

What happens if you force use of port/snprintf.c instead of glibc's
version?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A little RLS oversight?

2015-07-27 Thread Dean Rasheed
On 27 July 2015 at 21:58, Stephen Frost  wrote:
> Dean,
>
> * Dean Rasheed (dean.a.rash...@gmail.com) wrote:
>> On 27 July 2015 at 18:13, Joe Conway  wrote:
>> >
>> > On 07/27/2015 10:03 AM, Joe Conway wrote:
>> >> On 07/26/2015 07:59 AM, Joe Conway wrote:
>> >>> On 07/26/2015 07:19 AM, Dean Rasheed wrote:
>>  Attached is an updated patch (still needs some docs for the
>>  functions).
>> >>>
>> >>> Thanks for that. I'll add the docs.
>> >>
>> >> Documentation added. Also added comment to check_enable_rls about
>> >> passing InvalidOid versus GetUserId().
>> >>
>> >> I believe this is ready to go -- any other comments?
>> >
>> > Strike that - now I really think it is ready to go :-)
>> >
>> > In this patch I additionally changed instances of:
>> >   check_enable_rls(indrelid, GetUserId(), true)
>> > to
>> >   check_enable_rls(indrelid, InvalidOid, true)
>> > per Dean's earlier remark and my new comment.
>>
>> Looks good to me, except I'm not sure about those latest changes
>> because I don't understand the reasoning behind the logic in
>> check_enable_rls() when row_security is set to OFF.
>>
>> I would expect that if the current user has permission to bypass RLS,
>> and they have set row_security to OFF, then it should be off for all
>> tables that they have access to, regardless of how they access those
>> tables (directly or through a view). If it really is intentional that
>> RLS remains active when querying through a view not owned by the
>> table's owner, then the other calls to check_enable_rls() should
>> probably be left as they were, since the table might have been updated
>> through such a view and that code can't really tell at that point.
>
> Joe and I were discussing this earlier and it was certainly intentional
> that RLS still be enabled if you're querying through a view as the RLS
> rights of the view owner are used, not your own.  Note that we don't
> allow a user to assume the BYPASSRLS right of the view owner though,
> also intentionally.
>
> As a comparison to what we do today, even if you have access to a table,
> if you query it through a view, it's the view owner's permissions which
> are used to determine access to the table through the view, not your
> own.  I agree that can be a bit odd at times, as you can get a
> permission denied error when using the view even though you have access
> to the table which is complained about, but that's how views have worked
> for quite a long time.
>

OK, fair enough.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] WIP: Make timestamptz_out less slow.

2015-07-27 Thread Andres Freund
Hi,

I recently once more noticed that timestamptz_out is really, really
slow. To quantify that, I created a bunch of similar sized tables:

CREATE TABLE tbl_timestamp AS SELECT NOW() FROM generate_series(1, 10) a, 
generate_series(1, 100) b;
CREATE TABLE tbl_int8 AS SELECT 1::bigint FROM generate_series(1,  10) a, 
generate_series(1, 100) b;
CREATE TABLE tbl_bytea AS SELECT '   '::bytea FROM generate_series(1, 10) 
a, generate_series(1, 100) b;

These all end up being 346MB large.

COPY tbl_bytea TO '/dev/null';
Time: 1173.484 ms
COPY tbl_int8 TO '/dev/null';
Time: 1030.756 ms
COPY tbl_timestamp TO '/dev/null';
Time: 6598.030

(all best of three)

Yes, timestamp outputs more data as a whole, but being 5 times as slow
is still pretty darn insane. To make sure that's not the cause, here's
another table:

CREATE TABLE tbl_timestamptext AS SELECT NOW()::text FROM generate_series(1, 
10) a, generate_series(1, 100) b;
COPY tbl_timestamptext TO '/dev/null';
Time: 1449.554 ms

So it's really just the timestamp code.


Profiling it shows:
  Overhead  Command Shared Object Symbol
  -   38.33%  postgres_stock  libc-2.19.so  [.] vfprintf
 - 97.92% vfprintf
  _IO_vsprintf
- sprintf
   + 70.25% EncodeDateTime
   + 29.75% AppendSeconds.constprop.10
 + 1.11% _IO_default_xsputn
  -8.22%  postgres_stock  libc-2.19.so  [.] _IO_default_xsputn
 - 99.43% _IO_default_xsputn
- 90.09% vfprintf
 _IO_vsprintf
   - sprintf
  + 74.15% EncodeDateTime
  + 25.85% AppendSeconds.constprop.10
+ 9.72% _IO_padn
 + 0.57% vfprintf
  +   7.76%  postgres_stock  postgres_stock[.] CopyOneRowTo   

So nearly all the time is spent somewhere inside the sprintf calls. Not
nice.

The only thing I could come up to make the sprintfs cheaper was to
combine them into one and remove some of the width specifiers that
aren't needed. That doesn't buy us very much.

I then proceeded to replace the sprintf call with hand-rolled
conversions. And wow, the benefit is far bigger than I'd assumed:
postgres[7236][1]=# COPY tbl_timestamp TO '/dev/null';
Time: 2430.521 ms

So, by hand-rolling the ISO conversion in EncodeDateTime() we got a
~250% performance improvement. I'd say that's worthwhile.

The attached patch shows what I did. While there's some polishing
possible, as a whole, it's pretty ugly. But I think timestamp data is so
common that it's worth the effort.

Does anybody have a fundamentally nicer idea than the attached to
improve this?

Greetings,

Andres Freund
>From 1d7b6110f8864ee00c1fe4f54d8937347ade7d80 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 27 Jul 2015 23:09:33 +0200
Subject: [PATCH] WIP: faster timestamp[tz]_out

---
 src/backend/utils/adt/datetime.c| 108 
 src/test/regress/expected/horology.out  |  24 ---
 src/test/regress/expected/timestamp.out |  62 +++---
 src/test/regress/sql/timestamp.sql  |   1 +
 4 files changed, 164 insertions(+), 31 deletions(-)

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 2a44b6e..4c13f01 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -3968,7 +3968,115 @@ EncodeDateTime(struct pg_tm * tm, fsec_t fsec, bool print_tz, int tz, const char
 
 	switch (style)
 	{
+#ifdef HAVE_INT64_TIMESTAMP
+		case USE_ISO_DATES:
+			/*
+			 * Fastpath for most common format and range. Not using sprintf
+			 * provides significant performance benefits, and timestamp data
+			 * is pretty common. All sane use cases dealing with large amounts
+			 * of data use iso timestamps, so we can focus on optimizing
+			 * those, and keep the rest of the code maintainable.
+			 */
+			if (tm->tm_year > 0 && tm->tm_year < 10000)
+			{
+int year  = (tm->tm_year > 0) ? tm->tm_year : -(tm->tm_year - 1);
+
+*str++ = (year / 1000) + '0';
+*str++ = (year / 100) % 10 + '0';
+*str++ = (year / 10) % 10 + '0';
+*str++ = year % 10 + '0';
+*str++ = '-';
+*str++ = (tm->tm_mon / 10) + '0';
+*str++ = tm->tm_mon % 10 + '0';
+*str++ = '-';
+*str++ = (tm->tm_mday / 10) + '0';
+*str++ = tm->tm_mday % 10 + '0';
+*str++ = ' ';
+*str++ = (tm->tm_hour / 10) + '0';
+*str++ = tm->tm_hour % 10 + '0';
+*str++ = ':';
+*str++ = (tm->tm_min / 10) + '0';
+*str++ = tm->tm_min % 10 + '0';
+*str++ = ':';
+*str++ = (tm->tm_sec / 10) + '0';
+*str++ = tm->tm_sec % 10 + '0';
+
+/*
+ * Yes, this is darned ugly and would look nicer in a loop,
+ * but some versions of gcc can't convert the divisions into
+ * more efficient instructions unless manually unrolled.
+ */
+if (fsec != 0)
+{
+	int fseca = abs(fsec);
+
+	*str++ = '.';
+
+	if (fseca % 100 != 0)
+	{
+		*str++ = (fseca / 10) + '0';
+
+		if (fseca % 10 != 0)
+		{
+			*str++ = ((fseca 

Re: [HACKERS] proposal: multiple psql option -c

2015-07-27 Thread Jim Nasby

On 7/27/15 2:57 PM, Andrew Dunstan wrote:


psql -At -c "select datname from pg_database" postgres | xargs -n 1 -P
3 psql -c "select current_database()"





I don't think it's going to be a hugely important feature, but I don't
see a problem with creating a new option (-C seems fine) which would
have the same effect as if the arguments were contatenated into a file
which is then used with -f. IIRC -c has some special characteristics
which means it's probably best not to try to extend it for this feature.


+1. I've occasionally wanted this as well.

I'd also vote for -C returning every status string as well.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-27 Thread Jim Nasby

On 7/27/15 1:46 PM, Robert Haas wrote:

On Mon, Jul 27, 2015 at 2:43 PM, Alvaro Herrera
 wrote:

Robert Haas wrote:

On Mon, Jul 27, 2015 at 2:32 PM, Alvaro Herrera
 wrote:



I think this is already possible, is it not?  You just have to look for
an identically-identified pg_locks entry with granted=true.  That gives
you a PID and vxid/xid.  You can self-join pg_locks with that, and join
to pg_stat_activity.

I remember we discussed having a layer of system views on top of
pg_stat_activity and pg_locks, probably defined recursively, that would
show the full graph of waiters/lockers.


It isn't necessarily the case that A is waiting for a unique process
B.  It could well be the case that A wants AccessExclusiveLock and
many processes hold a variety of other lock types.


Sure, but I don't think this makes it impossible to figure out who's
locking who.  I think the only thing you need other than the data in
pg_locks is the conflicts table, which is well documented.

Oh, hmm, one thing missing is the ordering of the wait queue for each
locked object.  If process A holds RowExclusive on some object, process
B wants ShareLock (stalled waiting) and process C wants AccessExclusive
(also stalled waiting), which of B and C is woken up first after A
releases the lock depends on order of arrival.


Agreed - it would be nice to expose that somehow.


+1. It's very common to want to know who's blocking who, and not at all 
easy to do that today. We should at minimum have a canonical example of 
how to do it, but something built in would be even better.
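
In the spirit of that canonical example, a minimal sketch of the pg_locks
self-join approach (it matches only on the lock identity columns, ignores
the lock conflict table, and says nothing about queue order, so it can
over-report blockers):

SELECT waiter.pid       AS waiting_pid,
       holder.pid       AS blocking_pid,
       waiter.locktype,
       waiter.mode      AS waiting_mode,
       holder.mode      AS holding_mode,
       act.query        AS blocking_query
FROM pg_locks waiter
JOIN pg_locks holder
  ON holder.granted
 AND NOT waiter.granted
 AND waiter.pid <> holder.pid
 AND waiter.locktype      IS NOT DISTINCT FROM holder.locktype
 AND waiter.database      IS NOT DISTINCT FROM holder.database
 AND waiter.relation      IS NOT DISTINCT FROM holder.relation
 AND waiter.page          IS NOT DISTINCT FROM holder.page
 AND waiter.tuple         IS NOT DISTINCT FROM holder.tuple
 AND waiter.virtualxid    IS NOT DISTINCT FROM holder.virtualxid
 AND waiter.transactionid IS NOT DISTINCT FROM holder.transactionid
JOIN pg_stat_activity act ON act.pid = holder.pid;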

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] copy.c handling for RLS is insecure

2015-07-27 Thread Stephen Frost
All,

* Stephen Frost (sfr...@snowman.net) wrote:
> * Andres Freund (and...@anarazel.de) wrote:
> > On 2015-07-09 01:28:28 -0400, Noah Misch wrote:
> > > > - Keep the OID check, shouldn't hurt to have it
> > > 
> > > What benefit is left?
> > 
> > A bit of defense in depth. We execute user defined code in COPY
> > (e.g. BEFORE triggers). That user defined code could very well replace
> > the relation. Now I think right now that'd happen late enough, so the
> > second lookup already happened. But a bit more robust defense against
> > that sounds good to me.
> 
> Attached patch keeps the relation locked, fully qualifies it when
> building up the query, and uses list_member_oid() to check that the
> relation's OID ends up in the resulting relationOids list (to address
> Noah's point that the planner doesn't guarantee the ordering; I doubt
> that list will ever be more than a few entries long).
> 
> Also removes the misguided Assert().
> 
> Barring objections, I'll commit this (and backpatch to 9.5) tomorrow.

Apologies for not pushing this before I left on vacation.  I've done so
now.

Thanks!

Stephen




Re: [HACKERS] A little RLS oversight?

2015-07-27 Thread Stephen Frost
Dean,

* Dean Rasheed (dean.a.rash...@gmail.com) wrote:
> On 27 July 2015 at 18:13, Joe Conway  wrote:
> >
> > On 07/27/2015 10:03 AM, Joe Conway wrote:
> >> On 07/26/2015 07:59 AM, Joe Conway wrote:
> >>> On 07/26/2015 07:19 AM, Dean Rasheed wrote:
>  Attached is an updated patch (still needs some docs for the
>  functions).
> >>>
> >>> Thanks for that. I'll add the docs.
> >>
> >> Documentation added. Also added comment to check_enable_rls about
> >> passing InvalidOid versus GetUserId().
> >>
> >> I believe this is ready to go -- any other comments?
> >
> > Strike that - now I really think it is ready to go :-)
> >
> > In this patch I additionally changed instances of:
> >   check_enable_rls(indrelid, GetUserId(), true)
> > to
> >   check_enable_rls(indrelid, InvalidOid, true)
> > per Dean's earlier remark and my new comment.
> 
> Looks good to me, except I'm not sure about those latest changes
> because I don't understand the reasoning behind the logic in
> check_enable_rls() when row_security is set to OFF.
> 
> I would expect that if the current user has permission to bypass RLS,
> and they have set row_security to OFF, then it should be off for all
> tables that they have access to, regardless of how they access those
> tables (directly or through a view). If it really is intentional that
> RLS remains active when querying through a view not owned by the
> table's owner, then the other calls to check_enable_rls() should
> probably be left as they were, since the table might have been updated
> through such a view and that code can't really tell at that point.

Joe and I were discussing this earlier and it was certainly intentional
that RLS still be enabled if you're querying through a view as the RLS
rights of the view owner are used, not your own.  Note that we don't
allow a user to assume the BYPASSRLS right of the view owner though,
also intentionally.

As a comparison to what we do today, even if you have access to a table,
if you query it through a view, it's the view owner's permissions which
are used to determine access to the table through the view, not your
own.  I agree that can be a bit odd at times, as you can get a
permission denied error when using the view even though you have access
to the table being complained about, but that's how views have worked
for quite a long time.
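
To illustrate with a sketch (made-up role and object names; assumes the
current user owns t and that role alice exists without BYPASSRLS):

CREATE TABLE t (a int);
INSERT INTO t VALUES (-1), (1);
ALTER TABLE t ENABLE ROW LEVEL SECURITY;
CREATE POLICY p ON t USING (a > 0);
GRANT SELECT ON t TO alice;

SET ROLE alice;
CREATE VIEW v AS SELECT * FROM t;
RESET ROLE;

SELECT * FROM t;  -- direct access: RLS is not applied to the table owner
SELECT * FROM v;  -- checked as alice, the view owner, so the policy on t
                  -- applies; BYPASSRLS does not carry over through a view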

Thanks!

Stephen




Re: [HACKERS] optimizing vacuum truncation scans

2015-07-27 Thread Simon Riggs
On 22 July 2015 at 17:11, Jeff Janes  wrote:

> On Wed, Jul 22, 2015 at 6:59 AM, Robert Haas 
> wrote:
>
>> On Mon, Jun 29, 2015 at 1:54 AM, Jeff Janes  wrote:
>> > Attached is a patch that implements the vm scan for truncation.  It
>> > introduces a variable to hold the last blkno which was skipped during
>> > the forward portion.  Any blocks after both this blkno and after the
>> > last inspected nonempty page (which the code is already tracking) must
>> > have been observed to be empty by the current vacuum.  Any other
>> > process rendering the page nonempty is required to clear the vm bit,
>> > and no other process can set the bit again during the vacuum's
>> > lifetime.  So if the bit is still set, the page is still empty without
>> > needing to inspect it.
>>
>> Urgh.  So if we do this, that forever precludes having HOT pruning set
>> the all-visible bit.
>
>
> I wouldn't say forever, as it would be easy to revert the change if
> something more important came along that conflicted with it.
>

I think what is being said here is that someone is already using this
technique, or if not, then we actively want to encourage them to do so as
an extension or as a submission to core.

In that case, I think the rely-on-VM technique sinks again, sorry Jim,
Jeff. Probably needs code comments added.

That does still leave the prefetch technique, so all is not lost.

Can we see a patch with just prefetch, probably with a simple choice of
stride? Thanks.
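
For what it's worth, a rough sketch of the prefetch idea in
count_nondeletable_pages() (the stride choice and placement here are
assumptions, not taken from Jeff's patch):

#define PREFETCH_STRIDE 32		/* assumed tunable */

	blkno = vacrelstats->rel_pages;
	while (blkno > vacrelstats->nonempty_pages)
	{
		blkno--;

		/* at each stride boundary, prefetch the next stride of blocks
		 * (the scan runs backwards, so "next" means lower-numbered) */
		if (blkno % PREFETCH_STRIDE == 0)
		{
			BlockNumber pblkno;
			BlockNumber pstart = (blkno >= PREFETCH_STRIDE) ?
				blkno - PREFETCH_STRIDE : 0;

			for (pblkno = pstart; pblkno < blkno; pblkno++)
				PrefetchBuffer(onerel, MAIN_FORKNUM, pblkno);
		}

		buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
								 RBM_NORMAL, vac_strategy);
		/* ... existing emptiness checks follow ... */
	}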

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] A little RLS oversight?

2015-07-27 Thread Dean Rasheed
On 27 July 2015 at 18:13, Joe Conway  wrote:
>
> On 07/27/2015 10:03 AM, Joe Conway wrote:
>> On 07/26/2015 07:59 AM, Joe Conway wrote:
>>> On 07/26/2015 07:19 AM, Dean Rasheed wrote:
 Attached is an updated patch (still needs some docs for the
 functions).
>>>
>>> Thanks for that. I'll add the docs.
>>
>> Documentation added. Also added comment to check_enable_rls about
>> passing InvalidOid versus GetUserId().
>>
>> I believe this is ready to go -- any other comments?
>
> Strike that - now I really think it is ready to go :-)
>
> In this patch I additionally changed instances of:
>   check_enable_rls(indrelid, GetUserId(), true)
> to
>   check_enable_rls(indrelid, InvalidOid, true)
> per Dean's earlier remark and my new comment.
>

Looks good to me, except I'm not sure about those latest changes
because I don't understand the reasoning behind the logic in
check_enable_rls() when row_security is set to OFF.

I would expect that if the current user has permission to bypass RLS,
and they have set row_security to OFF, then it should be off for all
tables that they have access to, regardless of how they access those
tables (directly or through a view). If it really is intentional that
RLS remains active when querying through a view not owned by the
table's owner, then the other calls to check_enable_rls() should
probably be left as they were, since the table might have been updated
through such a view and that code can't really tell at that point.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-07-27 Thread Robert Haas
On Mon, Jul 27, 2015 at 2:53 PM, Pavel Stehule  wrote:
> I am trying to run parallel execution
>
> psql -At -c "select datname from pg_database" postgres | xargs -n 1 -P 3
> psql -c "select current_database()"

Put this in a shell script called run-psql:

#!/bin/bash

test $# = 0 && exit
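# every argument except the last becomes one SQL statement;
# the last argument is passed through to psql (e.g. a database name)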
for f in "${@:1:$(($#-1))}"; do
echo "$f" \;
done | psql "${@:$#}"

Then:

psql -At -c "select datname from pg_database" postgres | xargs -n 1 -P
3 ./run-psql "select current_database()" "vacuum" "select 1"

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] more RLS oversights

2015-07-27 Thread Alvaro Herrera
Joe Conway wrote:
> 
> On 07/03/2015 10:03 AM, Noah Misch wrote:
> > (2) CreatePolicy() and AlterPolicy() omit to create a pg_shdepend
> > entry for each role in the TO clause.  Test case:
> 
> Please see the attached patch. Note that I used SHARED_DEPENDENCY_ACL
> for this. It seems appropriate, but possibly we should invent a new
> shared dependency type for this use case? Comments?

Hmm, these are not ACL objects, so conceptually it seems cleaner to use
a different symbol for this.  I think the catalog state and the error
messages would be a bit confusing otherwise.

>   if (spec->roletype == ROLESPEC_PUBLIC)
>   {
> !     Datum   *tmp_role_oids;
> !
> !     if (*num_roles != 1)
>           ereport(WARNING,
>                   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>                    errmsg("ignoring roles specified other than public"),
>                    errhint("All roles are members of the public role.")));
> !     *num_roles = 1;
> !     tmp_role_oids = (Datum *) palloc(*num_roles * sizeof(Datum));
> !     tmp_role_oids[0] = ObjectIdGetDatum(ACL_ID_PUBLIC);

Isn't this leaking the previously allocated array?  Not sure it's all
that critical, but still.  (I don't think you really need to call palloc
at all here.)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-07-27 Thread Andrew Dunstan


On 07/27/2015 02:53 PM, Pavel Stehule wrote:





I am trying to run parallel execution

psql -At -c "select datname from pg_database" postgres | xargs -n 1 -P 
3 psql -c "select current_database()"






I don't think it's going to be a hugely important feature, but I don't 
see a problem with creating a new option (-C seems fine) which would 
have the same effect as if the arguments were contatenated into a file 
which is then used with -f. IIRC -c has some special characteristics 
which means it's probably best not to try to extend it for this feature.
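
For illustration, a hypothetical invocation under that proposal (no -C
option exists today):

psql -C "SELECT 1;" -C "VACUUM;" -C "SELECT 2;" postgres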


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REVOKE [ADMIN OPTION FOR] ROLE

2015-07-27 Thread Stephen Frost
Egor,

* Egor Rogov (e.ro...@postgrespro.ru) wrote:
> >On Thu, Jul 23, 2015 at 8:30 AM, Egor Rogov  wrote:
> >>So, the question: is it a documentation bug (as it seems to me), code bug,
> >>or did I miss something?
> >Your analysis looks right to me, but I don't know whether the code or
> >the documentation should be changed.  This claim was added by Tom Lane
> >in 2005 in commit 58d214e51fe50b10b4439da6ec263d54c155afbf.  It might
> >be worth checking whether the claim was true at that time and later
> >became false, or whether it was never true to begin with.
> >
> As far as I can see, the modern REVOKE syntax for revoking membership in
> a role (along with "admin option") was introduced in commit 7762619
> (by Tom Lane, 2005). The code handling this command didn't pay
> attention to the "restrict/cascade" keywords then, just as it does not
> now. Before that, another syntax was in use: alter group groupname drop
> user username [, ...]. It did not include the notion of "cascade" at
> all.
> I guess that "revoke role_name from role_name" inherited the
> "[cascade|restrict]" section from the general REVOKE command but never
> actually used it. And I see no point in changing this, because role
> membership is somewhat more static than privileges.
> So I would propose the attached fix for documentation.
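
For reference, the syntax in question looks like the following (made-up
role names); per the above, the trailing CASCADE/RESTRICT is accepted by
the grammar but has no effect on role membership:

GRANT staff TO alice WITH ADMIN OPTION;
REVOKE ADMIN OPTION FOR staff FROM alice CASCADE;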

Have you looked at the SQL spec at all for this..?  That's what we
really should be looking at to determine if this is a documentation
issue or a code issue.

I'll take a look in a day or two after I've caught up on other things,
if no one beats me to it.

Thanks!

Stephen




Re: [HACKERS] spgist recovery assertion failure

2015-07-27 Thread Heikki Linnakangas

On 07/27/2015 04:24 PM, Michael Paquier wrote:

On Mon, Jul 27, 2015 at 2:33 PM, Piotr Stefaniak
 wrote:

On 07/27/2015 07:19 AM, Michael Paquier wrote:


On Mon, Jul 27, 2015 at 2:00 PM, Noah Misch  wrote:


When I caused a crash during the create_index regression test, recovery
hit an
assertion failure.  Minimal test case:

psql -X <


On which platform are you seeing the failure? I am afraid I could not
reproduce the failure on Linux and OSX after testing it on HEAD.



I'm having the same symptoms with
159cff58cf3b565be3c17901698a74238e9e23f8 on Ubuntu Linux 3.4.39 armv7l.


Yes, on armv7l this can be indeed reproduced.


Fixed, thanks everyone! The problem was that in the WAL format change 
patch, I had used "char" in a struct to hold -1, but "char" is unsigned 
on PowerPC and ARM.
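
For anyone curious, a standalone illustration of the hazard (not from the
patch):

#include <stdio.h>

struct demo
{
	char		offset;		/* -1 does not survive where char is unsigned */
};

int
main(void)
{
	struct demo d = {-1};

	/* prints -1 where char is signed (e.g. x86), 255 where it is
	 * unsigned by default (e.g. ARM, PowerPC) */
	printf("%d\n", (int) d.offset);
	return 0;
}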


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-07-27 Thread Pavel Stehule
2015-07-27 20:47 GMT+02:00 Robert Haas :

> On Mon, Jul 27, 2015 at 2:37 PM, Pavel Stehule 
> wrote:
> > 2015-07-27 20:32 GMT+02:00 Robert Haas :
> >>
> >> On Sat, Jul 25, 2015 at 5:27 AM, Pavel Stehule wrote:
> >> > It will be a nice side effect, but my primary problem was the
> >> > impossibility of combining VACUUM and any other statement into one
> >> > simple psql call.
> >>
> >> Seems like you can do that easily enough:
> >>
> >> [rhaas pgsql]$ (echo 'SELECT 1;'; echo 'VACUUM;'; echo 'SELECT 2;') |
> psql
> >>  ?column?
> >> --
> >> 1
> >> (1 row)
> >>
> >> VACUUM
> >>  ?column?
> >> --
> >> 2
> >> (1 row)
> >>
> >
> > how can I do it with xargs?
>
> I don't know specifically what you're trying to do, but I bet it's not
> that hard.
>

I am trying to run parallel execution

psql -At -c "select datname from pg_database" postgres | xargs -n 1 -P 3
psql -c "select current_database()"

Pavel


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] anole: assorted stability problems

2015-07-27 Thread Robert Haas
On Sun, Jul 26, 2015 at 11:36 AM, Andres Freund  wrote:
> On 2015-07-07 13:25:24 +0200, Andres Freund wrote:
>> So, it's starting to look good. Not exactly allowing for a lot of
>> confidence yet, but still:
>> http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=anole&br=HEAD
>
> Since there have not been any relevant failures since, I'm going to
> remove this issue from the open items list.

Woohoo!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-07-27 Thread Robert Haas
On Mon, Jul 27, 2015 at 2:37 PM, Pavel Stehule  wrote:
> 2015-07-27 20:32 GMT+02:00 Robert Haas :
>>
>> On Sat, Jul 25, 2015 at 5:27 AM, Pavel Stehule wrote:
>> > It will be a nice side effect, but my primary problem was the
>> > impossibility of combining VACUUM and any other statement into one
>> > simple psql call.
>>
>> Seems like you can do that easily enough:
>>
>> [rhaas pgsql]$ (echo 'SELECT 1;'; echo 'VACUUM;'; echo 'SELECT 2;') | psql
>>  ?column?
>> --
>> 1
>> (1 row)
>>
>> VACUUM
>>  ?column?
>> --
>> 2
>> (1 row)
>>
>
> how can I do it with xargs?

I don't know specifically what you're trying to do, but I bet it's not that hard.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-27 Thread Robert Haas
On Mon, Jul 27, 2015 at 2:43 PM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>> On Mon, Jul 27, 2015 at 2:32 PM, Alvaro Herrera
>>  wrote:
>
>> > I think this is already possible, is it not?  You just have to look for
>> > an identically-identified pg_locks entry with granted=true.  That gives
>> > you a PID and vxid/xid.  You can self-join pg_locks with that, and join
>> > to pg_stat_activity.
>> >
>> > I remember we discussed having a layer of system views on top of
>> > pg_stat_activity and pg_locks, probably defined recursively, that would
>> > show the full graph of waiters/lockers.
>>
>> It isn't necessarily the case that A is waiting for a unique process
>> B.  It could well be the case that A wants AccessExclusiveLock and
>> many processes hold a variety of other lock types.
>
> Sure, but I don't think this makes it impossible to figure out who's
> locking who.  I think the only thing you need other than the data in
> pg_locks is the conflicts table, which is well documented.
>
> Oh, hmm, one thing missing is the ordering of the wait queue for each
> locked object.  If process A holds RowExclusive on some object, process
> B wants ShareLock (stalled waiting) and process C wants AccessExclusive
> (also stalled waiting), which of B and C is woken up first after A
> releases the lock depends on order of arrival.

Agreed - it would be nice to expose that somehow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-27 Thread Alvaro Herrera
Robert Haas wrote:
> On Mon, Jul 27, 2015 at 2:32 PM, Alvaro Herrera
>  wrote:

> > I think this is already possible, is it not?  You just have to look for
> > an identically-identified pg_locks entry with granted=true.  That gives
> > you a PID and vxid/xid.  You can self-join pg_locks with that, and join
> > to pg_stat_activity.
> >
> > I remember we discussed having a layer of system views on top of
> > pg_stat_activity and pg_locks, probably defined recursively, that would
> > show the full graph of waiters/lockers.
> 
> It isn't necessarily the case that A is waiting for a unique process
> B.  It could well be the case that A wants AccessExclusiveLock and
> many processes hold a variety of other lock types.

Sure, but I don't think this makes it impossible to figure out who's
locking who.  I think the only thing you need other than the data in
pg_locks is the conflicts table, which is well documented.

Oh, hmm, one thing missing is the ordering of the wait queue for each
locked object.  If process A holds RowExclusive on some object, process
B wants ShareLock (stalled waiting) and process C wants AccessExclusive
(also stalled waiting), which of B and C is woken up first after A
releases the lock depends on the order of arrival.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-07-27 Thread Pavel Stehule
2015-07-27 20:32 GMT+02:00 Robert Haas :

> On Sat, Jul 25, 2015 at 5:27 AM, Pavel Stehule 
> wrote:
> > It will be a nice side effect, but my primary problem was the
> > impossibility of combining VACUUM and any other statement in one
> > simple psql call.
>
> Seems like you can do that easily enough:
>
> [rhaas pgsql]$ (echo 'SELECT 1;'; echo 'VACUUM;'; echo 'SELECT 2;') | psql
>  ?column?
> --
> 1
> (1 row)
>
> VACUUM
>  ?column?
> --
> 2
> (1 row)
>
>
how can I do it with xargs?

Regards

Pavel


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-27 Thread Robert Haas
On Mon, Jul 27, 2015 at 2:32 PM, Alvaro Herrera
 wrote:
> David Rowley wrote:
>> I've not looked into the feasibility of it, but if it were also possible to
>> have a "waiting_for" column which would store the process ID of the process
>> that's holding a lock that this process is waiting on, then it would be
>> possible for some smart guy to write some code which draws beautiful
>> graphs, perhaps in Pg Admin 4, of which processes are blocking other
>> processes. I imagine this as a chart with an icon for each process.
>> Processes waiting on locks being released would have an arrow pointing to
>> their blocking process; if we clicked on that blocking process we could see
>> the query that it's running and various other properties that are existing
>> columns in pg_stat_activity.
>
> I think this is already possible, is it not?  You just have to look for
> an identically-identified pg_locks entry with granted=true.  That gives
> you a PID and vxid/xid.  You can self-join pg_locks with that, and join
> to pg_stat_activity.
>
> I remember we discussed having a layer of system views on top of
> pg_stat_activity and pg_locks, probably defined recursively, that would
> show the full graph of waiters/lockers.

It isn't necessarily the case that A is waiting for a unique process
B.  It could well be the case that A wants AccessExclusiveLock and
many processes hold a variety of other lock types.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-27 Thread Alvaro Herrera
David Rowley wrote:

> I've not looked into the feasibility of it, but if it were also possible to
> have a "waiting_for" column which would store the process ID of the process
> that's holding a lock that this process is waiting on, then it would be
> possible for some smart guy to write some code which draws beautiful
> graphs, perhaps in Pg Admin 4, of which processes are blocking other
> processes. I imagine this as a chart with an icon for each process.
> Processes waiting on locks being released would have an arrow pointing to
> their blocking process; if we clicked on that blocking process we could see
> the query that it's running and various other properties that are existing
> columns in pg_stat_activity.

I think this is already possible, is it not?  You just have to look for
an identically-identified pg_locks entry with granted=true.  That gives
you a PID and vxid/xid.  You can self-join pg_locks with that, and join
to pg_stat_activity.

I remember we discussed having a layer of system views on top of
pg_stat_activity and pg_locks, probably defined recursively, that would
show the full graph of waiters/lockers.
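
As a minimal sketch of that self-join (simplified: it matches waiters
and holders on the locked object's identity columns and ignores which
lock modes actually conflict):

SELECT waiter.pid  AS waiting_pid,
       holder.pid  AS holding_pid,
       holder.mode AS held_mode,
       sa.query    AS holder_query
FROM pg_locks waiter
JOIN pg_locks holder
  ON holder.granted AND NOT waiter.granted
 AND waiter.locktype      IS NOT DISTINCT FROM holder.locktype
 AND waiter.database      IS NOT DISTINCT FROM holder.database
 AND waiter.relation      IS NOT DISTINCT FROM holder.relation
 AND waiter.transactionid IS NOT DISTINCT FROM holder.transactionid
 AND waiter.pid <> holder.pid
JOIN pg_stat_activity sa ON sa.pid = holder.pid;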

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-07-27 Thread Robert Haas
On Sat, Jul 25, 2015 at 5:27 AM, Pavel Stehule  wrote:
> It will be nice side effect, but my primary problem was a impossibility to
> combine VACUUM and any other statement to one simple psql call.

Seems like you can do that easily enough:

[rhaas pgsql]$ (echo 'SELECT 1;'; echo 'VACUUM;'; echo 'SELECT 2;') | psql
 ?column?
--
1
(1 row)

VACUUM
 ?column?
--
2
(1 row)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Several memory leaks for pg_rewind caused by missing PQclear calls

2015-07-27 Thread Heikki Linnakangas

On 07/23/2015 05:08 PM, Michael Paquier wrote:

Hi all,

After a run of valgrind on pg_rewind, I found a couple of code paths
missing some PQclear calls after running a query. Attached is a patch
to fix all those leaks.


Fixed, thanks.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Alpha 2 wrapping August 3

2015-07-27 Thread Josh Berkus
... so please get those fixes/overhauls in within the next week.

Thanks!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A little RLS oversight?

2015-07-27 Thread Joe Conway

On 07/27/2015 10:03 AM, Joe Conway wrote:
> On 07/26/2015 07:59 AM, Joe Conway wrote:
>> On 07/26/2015 07:19 AM, Dean Rasheed wrote:
>>> Attached is an updated patch (still needs some docs for the
>>> functions).
>> 
>> Thanks for that. I'll add the docs.
> 
> Documentation added. Also added comment to check_enable_rls about 
> passing InvalidOid versus GetUserId().
> 
> I believe this is ready to go -- any other comments?

Strike that - now I really think it is ready to go :-)

In this patch I additionally changed instances of:
  check_enable_rls(indrelid, GetUserId(), true)
to
  check_enable_rls(indrelid, InvalidOid, true)
per Dean's earlier remark and my new comment.
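
As a quick, hypothetical usage example of the new function (the table
name is made up; per the docs above, it can be given by name or by OID):

SELECT row_security_active('some_table');            -- by name
SELECT row_security_active('some_table'::regclass);  -- by OID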

-- 
Joe Conway
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 98d1497..fd82ea4 100644
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** SET search_path TO schemaboolean
 does current user have privilege for role

+   
+row_security_active(table)
+
+boolean
+does current user have row level security active for table
+   
   
  
 
*** SET search_path TO schema
  pg_has_role
 
+
+ row_security_active
+
  
 
  has_table_privilege checks whether a user
*** SELECT has_function_privilege('joeuser',
*** 15462,15467 
--- 15471,15483 
  are immediately available without doing SET ROLE.
 
  
+
+ row_security_active checks whether row level
+ security is active for the specified table in the context of the
+ current_user and environment. The table can
+ be specified by name or by OID.
+
+ 

  shows functions that
 determine whether a certain object is visible in the
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 1043362..aa5b28c 100644
*** a/src/backend/access/index/genam.c
--- b/src/backend/access/index/genam.c
*** BuildIndexValueDescription(Relation inde
*** 204,210 
  	Assert(indexrelid == idxrec->indexrelid);
  
  	/* RLS check- if RLS is enabled then we don't return anything. */
! 	if (check_enable_rls(indrelid, GetUserId(), true) == RLS_ENABLED)
  	{
  		ReleaseSysCache(ht_idx);
  		return NULL;
--- 204,210 
  	Assert(indexrelid == idxrec->indexrelid);
  
  	/* RLS check- if RLS is enabled then we don't return anything. */
! 	if (check_enable_rls(indrelid, InvalidOid, true) == RLS_ENABLED)
  	{
  		ReleaseSysCache(ht_idx);
  		return NULL;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e82a53a..c0bd6fa 100644
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*** CREATE VIEW pg_indexes AS
*** 150,156 
   LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
  WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
  
! CREATE VIEW pg_stats AS
  SELECT
  nspname AS schemaname,
  relname AS tablename,
--- 150,156 
   LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
  WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
  
! CREATE VIEW pg_stats WITH (security_barrier) AS
  SELECT
  nspname AS schemaname,
  relname AS tablename,
*** CREATE VIEW pg_stats AS
*** 211,217 
  FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
   JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
   LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
! WHERE NOT attisdropped AND has_column_privilege(c.oid, a.attnum, 'select');
  
  REVOKE ALL on pg_statistic FROM public;
  
--- 211,219 
  FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
   JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
   LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
! WHERE NOT attisdropped
! AND has_column_privilege(c.oid, a.attnum, 'select')
! AND (c.relrowsecurity = false OR NOT row_security_active(c.oid));
  
  REVOKE ALL on pg_statistic FROM public;
  
diff -

Re: [HACKERS] A little RLS oversight?

2015-07-27 Thread Joe Conway
On 07/26/2015 07:59 AM, Joe Conway wrote:
> On 07/26/2015 07:19 AM, Dean Rasheed wrote:
>> Attached is an updated patch (still needs some docs for the functions).
> 
> Thanks for that. I'll add the docs.

Documentation added. Also added comment to check_enable_rls about
passing InvalidOid versus GetUserId().

I believe this is ready to go -- any other comments?

Joe

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 98d1497..fd82ea4 100644
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** SET search_path TO schemaboolean
 does current user have privilege for role

+   
+row_security_active(table)
+
+boolean
+does current user have row level security active for table
+   
   
  
 
*** SET search_path TO schema
  pg_has_role
 
+
+ row_security_active
+
  
 
  has_table_privilege checks whether a user
*** SELECT has_function_privilege('joeuser',
*** 15462,15467 
--- 15471,15483 
  are immediately available without doing SET ROLE.
 
  
+
+ row_security_active checks whether row level
+ security is active for the specified table in the context of the
+ current_user and environment. The table can
+ be specified by name or by OID.
+
+ 

  shows functions that
 determine whether a certain object is visible in the
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e82a53a..c0bd6fa 100644
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*** CREATE VIEW pg_indexes AS
*** 150,156 
   LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
  WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
  
! CREATE VIEW pg_stats AS
  SELECT
  nspname AS schemaname,
  relname AS tablename,
--- 150,156 
   LEFT JOIN pg_tablespace T ON (T.oid = I.reltablespace)
  WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i';
  
! CREATE VIEW pg_stats WITH (security_barrier) AS
  SELECT
  nspname AS schemaname,
  relname AS tablename,
*** CREATE VIEW pg_stats AS
*** 211,217 
  FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
   JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
   LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
! WHERE NOT attisdropped AND has_column_privilege(c.oid, a.attnum, 'select');
  
  REVOKE ALL on pg_statistic FROM public;
  
--- 211,219 
  FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
   JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
   LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
! WHERE NOT attisdropped
! AND has_column_privilege(c.oid, a.attnum, 'select')
! AND (c.relrowsecurity = false OR NOT row_security_active(c.oid));
  
  REVOKE ALL on pg_statistic FROM public;
  
diff --git a/src/backend/rewrite/rowsecurity.c b/src/backend/rewrite/rowsecurity.c
index aaf0061..2386cf0 100644
*** a/src/backend/rewrite/rowsecurity.c
--- b/src/backend/rewrite/rowsecurity.c
*** get_row_security_policies(Query *root, C
*** 107,113 
  
  	Relation	rel;
  	Oid			user_id;
- 	int			sec_context;
  	int			rls_status;
  	bool		defaultDeny = false;
  
--- 107,112 
*** get_row_security_policies(Query *root, C
*** 117,138 
  	*hasRowSecurity = false;
  	*hasSubLinks = false;
  
! 	/* This is just to get the security context */
! 	GetUserIdAndSecContext(&user_id, &sec_context);
  
  	/* Switch to checkAsUser if it's set */
  	user_id = rte->checkAsUser ? rte->checkAsUser : GetUserId();
  
- 	/*
- 	 * If this is not a normal relation, or we have been told to explicitly
- 	 * skip RLS (perhaps because this is an FK check) then just return
- 	 * immediately.
- 	 */
- 	if (rte->relid < FirstNormalObjectId
- 		|| rte->relkind != RELKIND_RELATION
- 		|| (sec_context & SECURITY_ROW_LEVEL_DISABLED))
- 		return;
- 
  	/* Determine the state of RLS for this, pass checkAsUser explicitly */
  	rls_status = check_enable_rls(rte->relid, rte->checkAsUser, false);
  
--- 116,128 
  	*hasRowSecurity = false;
  	*hasSubLinks = false;
  
! 	/* If this is not a normal relation, just return immediately */
! 	if (rte->relkind != RELKIND_RELATION)
! 		return;
  
  	/* Switch to checkAsUser if it's set */
  	user_id = rte->checkAsUser ? rte->checkAsUser : GetUserId();
  
  	/* Determine the state of RLS for this, pass checkAsUser explicitly */
  	rls_status = check_enable_rls(rte->relid, rte->checkAsUser, false);
  
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index e6808e7..525794f 100644
*** a/src/backend/utils/cache/plancache.c
--- b/src/backend/utils/cache/plancache.c
*** CreateCachedPlan(Node *raw_parse_tree,
*** 153,160 
  	CachedPlanSource *plansou

Re: [HACKERS] Reduce ProcArrayLock contention

2015-07-27 Thread Pavan Deolasee
On Sat, Jul 25, 2015 at 10:12 AM, Amit Kapila 
wrote:

>
> >>
> >
> > Numbers look impressive and definitely show that the idea is worth
> pursuing. I tried the patch on my laptop. Unfortunately, at least for 4 and 8
> clients, I did not see any improvement.
> >
>
> I can't help with this because I think we need a somewhat
> bigger machine to test the impact of the patch.
>
>
I understand. IMHO it would be a good idea though to ensure that the patch
does not cause a regression for other setups, such as a less powerful
machine or runs with a lower number of clients.


>
> I was stating that fact even without my patch. Basically I have
> tried commenting out ProcArrayLock in ProcArrayEndTransaction.
>
>
I did not get that. You mean the TPS is the same if you run with
ProcArrayLock commented out in ProcArrayEndTransaction? Is that safe to do?


> > But those who don't get the lock will sleep and hence the contention is
> moved somewhere else, at least partially.
> >
>
> Sure, if contention is reduced at one place it will move
> to the next lock.
>

What I meant was that the lock may not show up in the contention because
all but one process now sleep waiting for their work to be done by the
group leader. So the total time spent in the critical section remains the
same, but is not shown in the profile. Sure, your benchmark numbers show
this is still better than all processes contending for the lock.


>
>
> No, autovacuum generates I/O due to which sometimes there
> is more variation in Write tests.
>

Sure, but on average does it still show similar improvements? Or does
the test become IO-bound and hence the bottleneck shifts somewhere else?
Can you please post those numbers as well when you get a chance?


>
> I can do something like that if others also agree with this new
> API in the LWLock series, but personally I don't think LWLock.c is
> the right place to expose an API for this work.  Broadly, the work
> we are doing can be thought of as the sub-tasks below.
>
> 1. Advertise each backend's xid.
> 2. Push all backends except one onto a global list.
> 3. Wait till someone wakes us and check if the xid is cleared;
>    repeat until the xid is clear.
> 4. Acquire the lock.
> 5. Pop all the backends, clear each one's xid, and use
>    their published xids to advance the global latestCompletedXid.
> 6. Release the lock.
> 7. Wake all the processes waiting for their xids to be cleared,
>    and before waking mark that the backend's xid is clear.
>
> So among these, only step 2 can be common among different
> algorithms; the others need some work specific to each optimization.
>
>
Right, but if we could encapsulate that work in a function that just needs
to work on some shared memory, I think we can build such infrastructure.
It's possible though that a more elaborate infrastructure is needed than just one
function pointer. For example, in this case, we also want to set the
latestCompletedXid after clearing xids for all pending processes.


>
> >
> > Regarding the patch, the compare-and-exchange function calls that you've
> used would work only for 64-bit machines, right? You would need to use
> equivalent 32-bit calls on a 32-bit machine.
> >
>
> I thought that the internal API would automatically take care of it;
> for example, for msvc it uses _InterlockedCompareExchange64,
> which, if it doesn't work on 32-bit systems or is not defined, means
> we have to use the 32-bit version, but I am not certain about
> that fact.
>
>
Hmm. The pointer will be a 32-bit field on a 32-bit machine. I don't know
how exchanging that with a 64-bit integer would be safe.

Thanks,
Pavan


-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Buildfarm TAP testing is useless as currently implemented

2015-07-27 Thread Andrew Dunstan


On 07/27/2015 10:06 AM, Tom Lane wrote:

I challenge anybody to figure out what happened here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-07-27%2010%3A25%3A17
or here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2015-07-04%2016%3A00%3A23
or here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2015-07-07%2016%3A35%3A06

With no visibility of pg_ctl's output, and no copy of the postmaster log,
there is no chance of debugging intermittent failures like this one.
This isn't entirely the buildfarm's fault --- AFAICS, prove-based testing
has inadequate error reporting by design.  If "not ok" isn't enough
information for you, tough beans.  (It might help if the farm script
captured the postmaster log after a failure, but that would do nothing
for prove's unwillingness to pass through client-side messages.)

I think we should disable TAP testing in the buildfarm until there is
some credible form of error reporting for it.  I've grown tired of
looking into buildfarm failure reports only to meet a dead end.
Aside from the wasted investigation time, which admittedly isn't huge,
there's an opportunity cost in that subsequent test steps didn't get run.




Well, it does create a lot of files that we don't pick up. An example 
list is shown below, and I am attaching their contents in a single 
gzipped attachment. However, these are in the wrong location. This was a 
vpath build and yet these tmp_check directories are all created in the 
source tree. Let's fix that and then I'll set about having the buildfarm 
collect them. That should get us further down the track.


cheers

andrew

   
  /home/andrew/pgl/pg_head/src/bin/pg_controldata/tmp_check/log/regress_log_001_pg_controldata
  /home/andrew/pgl/pg_head/src/bin/pg_basebackup/tmp_check/log/regress_log_020_pg_receivexlog
  /home/andrew/pgl/pg_head/src/bin/pg_basebackup/tmp_check/log/regress_log_010_pg_basebackup
  /home/andrew/pgl/pg_head/src/bin/pg_rewind/regress_log
  /home/andrew/pgl/pg_head/src/bin/pg_rewind/tmp_check/log/regress_log_003_extrafiles
  /home/andrew/pgl/pg_head/src/bin/pg_rewind/tmp_check/log/regress_log_001_basic
  /home/andrew/pgl/pg_head/src/bin/pg_rewind/tmp_check/log/regress_log_002_databases
  /home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_100_vacuumdb
  /home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_091_reindexdb_all
  /home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_050_dropdb
  /home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_070_dropuser
  /home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_020_createdb
  /home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_102_vacuumdb_stages
  /home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_030_createlang
  /home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_060_droplang
  /home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_040_createuser
  /home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_010_clusterdb
  /home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_011_clusterdb_all
  /home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_101_vacuumdb_all
  /home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_090_reindexdb
  /home/andrew/pgl/pg_head/src/bin/scripts/tmp_check/log/regress_log_080_pg_isready
  /home/andrew/pgl/pg_head/src/bin/pg_config/tmp_check/log/regress_log_001_pg_config
  /home/andrew/pgl/pg_head/src/bin/pg_ctl/tmp_check/log/regress_log_001_start_stop
  /home/andrew/pgl/pg_head/src/bin/pg_ctl/tmp_check/log/regress_log_002_status
  /home/andrew/pgl/pg_head/src/bin/initdb/tmp_check/log/regress_log_001_initdb






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Microvacuum for gist. Question about GISTPageOpaqueData flag

2015-07-27 Thread Anastasia Lubennikova
2015-07-27 20:05 GMT+04:00 Heikki Linnakangas :

> On 07/27/2015 06:46 PM, Teodor Sigaev wrote:
>
>> I need an advice, what would be better:
>>> - to add new flag like F_HAS_GARBAGE,
>>> - or to delete all mentions of F_TUPLES_DELETED and use it in gist
>>> microvacuum.
>>>
>>
>> According to commit message:
>> commit 2effb72e682a7dbdc9a8a60a80c22ec1fa9d8079
>> Author: Heikki Linnakangas 
>> Date:   Fri Nov 7 15:03:46 2014 +0200
>> ..
>>   The code that generated a record to clear the F_TUPLES_DELETED flag
>> hasn't
>>   existed since we got rid of old-style VACUUM FULL. I kept the code
>> that sets
>>   the flag, although it's not used for anything anymore, because it
>> might
>>   still be interesting information for debugging purposes that some
>> tuples
>>   have been deleted from a page.
>> ..
>>
>> If Heikki doesn't change his opinion then introduce a new flag. Although
>> I don't think that we need to keep F_TUPLES_DELETED.
>>
>
> It's certainly not needed for anything at the moment, although conceivably
> we might reintroduce code that needs it in the future. There are plenty of
> flag bits available, so let's use a new flag. If there was a shortage, I
> wouldn't blink reusing F_TUPLES_DELETED.
>
> - Heikki
>
>
Thanks for the quick reply

-- 
Best regards,
Lubennikova Anastasia


Re: [HACKERS] Microvacuum for gist. Question about GISTPageOpaqueData flag

2015-07-27 Thread Heikki Linnakangas

On 07/27/2015 06:46 PM, Teodor Sigaev wrote:

I need advice on what would be better:
- to add a new flag like F_HAS_GARBAGE,
- or to delete all mentions of F_TUPLES_DELETED and use it in gist microvacuum.


According to commit message:
commit 2effb72e682a7dbdc9a8a60a80c22ec1fa9d8079
Author: Heikki Linnakangas 
Date:   Fri Nov 7 15:03:46 2014 +0200
..
  The code that generated a record to clear the F_TUPLES_DELETED flag hasn't
  existed since we got rid of old-style VACUUM FULL. I kept the code that 
sets
  the flag, although it's not used for anything anymore, because it might
  still be interesting information for debugging purposes that some tuples
  have been deleted from a page.
..

If Heikki doesn't change his opinion then introduce a new flag. Although I
don't think that we need to keep F_TUPLES_DELETED.


It's certainly not needed for anything at the moment, although 
conceivably we might reintroduce code that needs it in the future. There 
are plenty of flag bits available, so let's use a new flag. If there was 
a shortage, I wouldn't blink reusing F_TUPLES_DELETED.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts

2015-07-27 Thread Fabien COELHO



Attached is the revised version of this patch.

The first patch is not changed from before.

The second fixes a kind of bug.

The third is a new one to allow backslash continuation for
backslash commands.


Ah, thanks:-)

Would you consider adding the patch to the next commitfest? I may put 
myself as a reviewer...



[...] I'm not satisfied by the design but I don't see another way..


I'll try to have a look.


I don't have idea how to deal with the copy of psqlscan.[lh] from
psql. Currently they are simply the dead copies of those of psql.


I think that there should be no copies, but it should use relative
symbolic links so that the files are kept synchronized.


Yeah, I think so, but symlinks could cause harm with git and on Windows.
Another way would be to make copies from the psql directory. They live 
next door to each other.


Indeed there are plenty of links already which are generated by makefiles 
(see src/bin/pg_xlogdump/*), and probably a copy is made on Windows. There 
should be no file duplication within the source tree.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Microvacuum for gist. Question about GISTPageOpaqueData flag

2015-07-27 Thread Teodor Sigaev

I need advice on what would be better:
- to add a new flag like F_HAS_GARBAGE,
- or to delete all mentions of F_TUPLES_DELETED and use it in gist microvacuum.


According to commit message:
commit 2effb72e682a7dbdc9a8a60a80c22ec1fa9d8079
Author: Heikki Linnakangas 
Date:   Fri Nov 7 15:03:46 2014 +0200
..
The code that generated a record to clear the F_TUPLES_DELETED flag hasn't
existed since we got rid of old-style VACUUM FULL. I kept the code that sets
the flag, although it's not used for anything anymore, because it might
still be interesting information for debugging purposes that some tuples
have been deleted from a page.
..

If Heikki doesn't change his opinion then introduce a new flag. Although I
don't think that we need to keep F_TUPLES_DELETED.



Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] optimizing vacuum truncation scans

2015-07-27 Thread Robert Haas
On Sat, Jul 25, 2015 at 4:46 AM, Simon Riggs  wrote:
> On 22 July 2015 at 14:59, Robert Haas  wrote:
>> Urgh.  So if we do this, that forever precludes having HOT pruning set
>> the all-visible bit.
>
> What is the reason why we don't do that already? Surely it's a one-liner?

I wish!

It actually doesn't look simple to me to modify heap_page_prune to
know whether any not-all-visible items remain at the end.  It's
definitely true that, in order for marking the page all-visible to be
a possibility, we need to reach heap_page_prune_execute() with ndead
== 0.  But that's not enough: we also need to know that any tuple that
survived the prune operation (that is, it wasn't redirected or marked
unused) has a new-enough xmin, which isn't tracked anywhere.
vacuumlazy.c solves that problem by making a second pass over all the
tuples on the page to determine whether the page is all-visible, but
that seems pretty inefficient and I think heap_page_prune() is already
a CPU hog on some workloads.  (Didn't Heikki have a patch to improve
that?)

Another sticky wicket is that, as things stand today, we'd have to
insert a second WAL record to mark the page all-visible, which would
add probably-significant overhead.  We could probably modify the
format of the record we're already emitting for pruning to carry a
flag indicating whether to additionally set PD_ALL_VISIBLE.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] All-zero page in GIN index causes assertion failure

2015-07-27 Thread Amit Langote
On Tue, Jul 28, 2015 at 12:21 AM, Heikki Linnakangas  wrote:
>
> Thanks, I've pushed this, as well a fix to similar failure from B-tree
> vacuum that Amit Langote reported in the other thread.
>

Thanks Heikki and sorry I didn't notice this new thread.

Regards,
Amit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] All-zero page in GIN index causes assertion failure

2015-07-27 Thread Heikki Linnakangas

On 07/23/2015 07:43 PM, Jeff Janes wrote:

On Mon, Jul 20, 2015 at 1:23 PM, Heikki Linnakangas  wrote:


I came up with the attached, for GIN and SP-GiST. Instead of WAL-logging
the page initialization in SP-GiST vacuum, I changed it so that it simply
leaves the page as all-zeroes, and adds it to the FSM. The code that grabs
a target page from the FSM handles that, and initializes the page anyway,
so that was simpler. This made the SP-GiST is-deleted flag obsolete; it's
no longer set, although the code still checks for it for backwards
compatibility. (Even that may actually be unnecessary, as a page that's
marked as deleted must also be empty, and wherever we check for the
deleted flag, we also check for PageIsEmpty().)


This patch, in conjunction with the LWLock deadlock patch, fixes all the
issues I was having with GIN indexes in 9.5.


Thanks, I've pushed this, as well a fix to similar failure from B-tree 
vacuum that Amit Langote reported in the other thread.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Buildfarm TAP testing is useless as currently implemented

2015-07-27 Thread Heikki Linnakangas

On 07/27/2015 05:06 PM, Tom Lane wrote:

I challenge anybody to figure out what happened here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-07-27%2010%3A25%3A17
or here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2015-07-04%2016%3A00%3A23
or here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2015-07-07%2016%3A35%3A06

With no visibility of pg_ctl's output, and no copy of the postmaster log,
there is no chance of debugging intermittent failures like this one.
This isn't entirely the buildfarm's fault --- AFAICS, prove-based testing
has inadequate error reporting by design.  If "not ok" isn't enough
information for you, tough beans.  (It might help if the farm script
captured the postmaster log after a failure, but that would do nothing
for prove's unwillingness to pass through client-side messages.)


Yep.


I think we should disable TAP testing in the buildfarm until there is
some credible form of error reporting for it.  I've grown tired of
looking into buildfarm failure reports only to meet a dead end.
Aside from the wasted investigation time, which admittedly isn't huge,
there's an opportunity cost in that subsequent test steps didn't get run.


Commit 1ea06203b - Improve logging of TAP tests - made it a lot better. 
The pg_ctl log should be in the log file now. The buildfarm doesn't seem 
to capture those logs at the moment, but that should be easy to fix.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reduce ProcArrayLock contention

2015-07-27 Thread Robert Haas
On Sat, Jul 25, 2015 at 12:42 AM, Amit Kapila  wrote:
> I thought that the internal API would automatically take care of it;
> for example, for msvc it uses _InterlockedCompareExchange64,
> which, if it doesn't work on 32-bit systems or is not defined, means
> we have to use the 32-bit version, but I am not certain about
> that fact.

Instead of using pg_atomic_uint64, how about using pg_atomic_uint32
and storing the pgprocno rather than the pointer directly?  Then it
can work the same way (and be the same size) on every platform.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Microvacuum for gist. Question about GISTPageOpaqueData flag

2015-07-27 Thread Anastasia Lubennikova
Hi,

I'm working on microvacuum for gist access method.
Briefly, microvacuum includes two steps:
1. When a search tells us that a tuple is invisible to all transactions,
it is marked LP_DEAD and the page is marked as "has dead tuples".
2. Then, when an insert touches a full page which has dead tuples, it
calls microvacuum instead of splitting the page.
You can find a kind of review here [1].

While writing the patch, I found a strange GISTPageOpaqueData flag -
F_TUPLES_DELETED.
Its description looks like it is the same as for BTP_HAS_GARBAGE:

#define F_TUPLES_DELETED   (1 << 2) /* some tuples on the page are dead */

#define BTP_HAS_GARBAGE   (1 << 6) /* page has LP_DEAD tuples */

But they are definitely not the same thing. I found only two mentions of
this flag. The function GistMarkTuplesDeleted sets the flag after dead
tuples are deleted.

Does anyone need it at all? I found no place where this flag is checked.
Is this a correct use of the flag?

I need advice on what would be better:
- to add a new flag like F_HAS_GARBAGE,
- or to delete all mentions of F_TUPLES_DELETED and use it in gist
microvacuum.


[1]
http://www.google-melange.com/gsoc/proposal/public/google/gsoc2015/ivanitskiy_ilya/5629499534213120
-- 
Best regards,
Lubennikova Anastasia


Re: [HACKERS] Buildfarm failure from overly noisy warning message

2015-07-27 Thread Kevin Grittner
Tom Lane  wrote:
> Kevin Grittner  writes:

>> I think a LOG entry when an autovacuum process is actually canceled
>> has value just in case it is happening on a particular table so
>> frequently that the table starts to bloat.  I see no reason to log
>> anything if there is an intention to cancel an autovacuum process
>> but it actually completes before we can do so.

> Hm.  By that logic, I'm not sure if we need anything to be logged here,
> because the autovacuum process will log something about having received
> a query cancel signal.

That seems sufficient to me for normal cases.

> If we're in the business of minimizing log chatter, I'd suggest that
> we remove the entirely-routine "sending cancel" log message, and only
> log something in the uncommon case where the kill() fails (but, per
> original point, reduce that entry to LOG or so; or else print something
> only for not-ESRCH cases).

+1 for only printing for the non-ESRCH cases.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c

2015-07-27 Thread Tom Lane
Andreas Seltenreich  writes:
> Tom Lane writes:
>> I've fixed the first two of these --- thanks for the report!

> I let sqlsmith run during the night, and it no longer triggered the
> first two.  During roughly a million random queries it triggered the
> already-mentioned brin one 10 times, but there was also one instance of
> this new one in the log:

Oh goody, more fun.  I'll take a look.

>> I'm a bit confused about this aspect of your report though, because in
>> my hands that example fails clear back to 9.2.  It doesn't seem to require
>> the predtest.c improvement to expose the fault.

> Hmm, I actually used a different, uglier query to trigger this assertion
> for the bisection run.

Ah, okay.  The triggering condition for both those cases is
provably-contradictory restriction clauses on an inheritance relation.
In what you showed yesterday, that was something like "x < x AND x IS
NULL", which the planner has been able to recognize as contradictory
for a long time because "<" is strict.  (It did not, and still doesn't,
notice that "x < x" all by itself is contradictory...).  But here it
looks like the trigger is

from public.b as rel4551420
where ( rel4551420.bb > rel4551420.bb )
  and ( rel4551420.bb < rel4551420.bb )

which the improved predtest.c logic can now prove contradictory, bb > bb
refuting bb < bb.  So that's why this run started to fail there, even
though the bug it was tickling is much older.
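
For illustration only, a minimal, hypothetical schema showing the shape
of such a provably-contradictory restriction on an inheritance parent
(this is just the query pattern, not a reproducer of the assertion
failure):

-- both clauses can never hold at once, so the scan is provably empty
CREATE TABLE b (bb int);
CREATE TABLE b_child () INHERITS (b);
SELECT * FROM b AS rel4551420
WHERE rel4551420.bb < rel4551420.bb
  AND rel4551420.bb IS NULL;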

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Restore-reliability mode

2015-07-27 Thread Alvaro Herrera
Noah Misch wrote:
> On Thu, Jul 23, 2015 at 04:53:49PM -0300, Alvaro Herrera wrote:
> > Peter Geoghegan wrote:
> > > On Sat, Jun 6, 2015 at 12:58 PM, Noah Misch  wrote:
> > > >   - Call VALGRIND_MAKE_MEM_NOACCESS() on a shared buffer when its local 
> > > > pin
> > > > count falls to zero.  Under CLOBBER_FREED_MEMORY, wipe a shared 
> > > > buffer
> > > > when its global pin count falls to zero.
> > > 
> > > Did a patch for this ever materialize?
> > 
> > I think the first part would be something like the attached.
> 
> Neat.  Does it produce any new complaints during "make installcheck"?

I only tried a few tests, for lack of time, and it didn't produce any.
(To verify that the whole thing was working properly, I reduced the
range of memory made available during PinBuffer and that resulted in a
crash immediately).  I am not really familiar with valgrind TBH and just
copied a recipe to run postmaster under it, so if someone with more
valgrind-fu could verify this, it would be great.


This part:

> > > > Under CLOBBER_FREED_MEMORY, wipe a shared buffer when its
> > > > global pin count falls to zero.

can be done without any valgrind, I think.  Any takers?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Failing assertions in indxpath.c, placeholder.c and brin_minmax.c

2015-07-27 Thread Alvaro Herrera
Tom Lane wrote:

> Bottom line is that somebody failed to consider the possibility of a
> null comparison value reaching the BRIN index lookup machinery.
> The code stanza that's failing supposes that only IS NULL or IS NOT NULL
> tests could have SK_ISNULL set, but that's just wrong.

Hm, okay, will look into that.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [DESIGN] ParallelAppend

2015-07-27 Thread Kouhei Kaigai
> On Sun, Jul 26, 2015 at 8:43 AM, Kouhei Kaigai  wrote:
> >
> > Hello,
> >
> > I'm recently working/investigating on ParallelAppend feature
> > towards the next commit fest. Below is my design proposal.
> >
> > 1. Concept
> > --
> > Its concept is quite simple; anybody might have considered it more than once.
> > ParallelAppend node kicks background worker process to execute
> > child nodes in parallel / asynchronous.
> > It intends to improve the performance to scan a large partitioned
> > tables from standpoint of entire throughput, however, latency of
> > the first multi-hundred rows are not scope of this project.
> > From standpoint of technology trend, it primarily tries to utilize
> > multi-cores capability within a system, but also enables to expand
> > distributed database environment using foreign-tables inheritance
> > features.
> > Its behavior is very similar to Funnel node except for several
> > points, thus, we can reuse its infrastructure we have had long-
> > standing discussion through the v9.5 development cycle.
> >
> > 2. Problems to be solved
> > -
> > Typical OLAP workloads takes tons of tables join and scan on large
> > tables which are often partitioned, and its KPI is query response
> > time but very small number of sessions are active simultaneously.
> > So, we are required to run a single query as rapid as possible even
> > if it consumes larger computing resources than typical OLTP workloads.
> >
> > Current implementation to scan heap is painful when we look at its
> > behavior from the standpoint - how many rows we can read within a
> > certain time, because of synchronous manner.
> > In the worst case, when SeqScan node tries to fetch the next tuple,
> > heap_getnext() looks up a block on shared buffer, then ReadBuffer()
> > calls storage manager to read the target block from the filesystem
> > if not on the buffer. Next, operating system makes the caller
> > process slept until required i/o get completed.
> > Most of the cases are helped in earlier stage than the above worst
> > case, however, the best scenario we can expect is: the next tuple
> > already appear on top of the message queue (of course visibility
> > checks are already done also) with no fall down to buffer manager
> > or deeper.
> > If we can run multiple scans in parallel / asynchronous, CPU core
> > shall be assigned to another process by operating system, thus,
> > it eventually improves the i/o density and enables higher processing
> > throughput.
> > Append node is an ideal point to be parallelized because
> > - child nodes can have physically different location by tablespace,
> >   so further tuning is possible according to the system landscape.
> > - it can control whether subplan is actually executed on background
> >   worker, per subplan basis. If subplan contains large tables and
> >   small tables, ParallelAppend may kick background worker to scan
> >   large tables only, but scan on small tables are by itself.
> > - Like as Funnel node, we don't need to care about enhancement of
> >   individual node types. SeqScan, IndexScan, ForeignScan or others
> >   can perform as usual, but actually in parallel.
> >
> >
> > 3. Implementation
> > --
> > * Plan & Cost
> >
> > ParallelAppend shall appear where Append can appear except for the
> > usage for dummy. So, I'll enhance set_append_rel_pathlist() to add
> > both of AppendPath and ParallelAppendPath with cost for each.
> >
> 
> Is there a real need to have new node like ParallelAppendPath?
> Can't we have Funnel node beneath AppendNode and then each
> worker will be responsible to have SeqScan on each inherited child
> relation.  Something like
> 
> Append
>---> Funnel
>   --> SeqScan rel1
>   --> SeqScan rel2
>
If Funnel can handle both horizontal and vertical parallelism,
that is a great simplification. I don't insist on a new node.

Once Funnel gets the capability to have multiple child nodes, the Append
node above will probably be gone. I expect set_append_rel_pathlist() to
add two paths based on Append and Funnel, and then the planner will
choose the cheaper one according to its cost.

We will need to pay attention to other issues that come up when Funnel
kicks background workers for asymmetric relations.

If the number of rows of the individual child nodes varies, we may
want to assign 10 background workers to scan rel1 with PartialSeqScan.
On the other hand, rel2 may have a very small number of rows, so
its total_cost may be smaller than the cost to launch a worker.
In this case, Funnel has child nodes to be executed asynchronously and
synchronously.

If the cheapest path of a child relation is a pair of Funnel and
PartialSeqScan, we have to avoid stacking Funnel nodes. Probably, a
Funnel node that performs like Append needs to pull up the underlying
Funnel and assign an equivalent number of workers as follows.

  Append
   --> Funnel
--> PartialSeqScan on rel1 (num_workers = 4)
   --> Funnel
--> PartialSeqScan on 

[HACKERS] Buildfarm TAP testing is useless as currently implemented

2015-07-27 Thread Tom Lane
I challenge anybody to figure out what happened here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2015-07-27%2010%3A25%3A17
or here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2015-07-04%2016%3A00%3A23
or here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2015-07-07%2016%3A35%3A06

With no visibility of pg_ctl's output, and no copy of the postmaster log,
there is no chance of debugging intermittent failures like this one.
This isn't entirely the buildfarm's fault --- AFAICS, prove-based testing
has inadequate error reporting by design.  If "not ok" isn't enough
information for you, tough beans.  (It might help if the farm script
captured the postmaster log after a failure, but that would do nothing
for prove's unwillingness to pass through client-side messages.)

I think we should disable TAP testing in the buildfarm until there is
some credible form of error reporting for it.  I've grown tired of
looking into buildfarm failure reports only to meet a dead end.
Aside from the wasted investigation time, which admittedly isn't huge,
there's an opportunity cost in that subsequent test steps didn't get run.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [DESIGN] ParallelAppend

2015-07-27 Thread Kouhei Kaigai
> Hello, can I ask some questions?
>
> I suppose we can take this as the analog of ParallelSeqScan.  I
> can see not so much distinction between Append(ParallelSeqScan) and
> ParallelAppend(SeqScan). What difference is there between them?
>
Append does not start executing the second or later nodes until the
first node reaches the end of its scan.
On the other hand, ParallelAppend will kick off all the child nodes
(almost) simultaneously.

> If other nodes will have the same functionality as you mention at
> the end of this proposal, it might be better that some part of
> this feature is implemented as a part of the existing executor
> itself, rather than as a dedicated additional node, just as my
> asynchronous fdw execution patch partially does. (Although it
> lacks the planner part and bgworker launching..) If that is the
> case, it might be better that ExecProcNode is modified so that it
> supports both in-process and inter-bgworker cases via a single
> API.
> 
> What do you think about this?
>
Its downside is that we need to adjust all the existing nodes to
follow the new executor's capability. At this moment, we have 38
node types derived from Plan. I think it is not an easy job to
review a patch that changes multiple dozens of files.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


> regards,
> 
> > Hello,
> >
> > I'm recently working/investigating on ParallelAppend feature
> > towards the next commit fest. Below is my design proposal.
> >
> > 1. Concept
> > --
> > Its concept is quite simple; anybody might have considered it more than once.
> > ParallelAppend node kicks background worker process to execute
> > child nodes in parallel / asynchronous.
> > It intends to improve the performance to scan a large partitioned
> > tables from standpoint of entire throughput, however, latency of
> > the first multi-hundred rows are not scope of this project.
> > From standpoint of technology trend, it primarily tries to utilize
> > multi-cores capability within a system, but also enables to expand
> > distributed database environment using foreign-tables inheritance
> > features.
> > Its behavior is very similar to Funnel node except for several
> > points, thus, we can reuse its infrastructure we have had long-
> > standing discussion through the v9.5 development cycle.
> >
> > 2. Problems to be solved
> > -
> > Typical OLAP workloads takes tons of tables join and scan on large
> > tables which are often partitioned, and its KPI is query response
> > time but very small number of sessions are active simultaneously.
> > So, we are required to run a single query as rapid as possible even
> > if it consumes larger computing resources than typical OLTP workloads.
> >
> > Current implementation to scan heap is painful when we look at its
> > behavior from the standpoint - how many rows we can read within a
> > certain time, because of synchronous manner.
> > In the worst case, when SeqScan node tries to fetch the next tuple,
> > heap_getnext() looks up a block on shared buffer, then ReadBuffer()
> > calls storage manager to read the target block from the filesystem
> > if not on the buffer. Next, operating system makes the caller
> > process slept until required i/o get completed.
> > Most of the cases are helped in earlier stage than the above worst
> > case, however, the best scenario we can expect is: the next tuple
> > already appear on top of the message queue (of course visibility
> > checks are already done also) with no fall down to buffer manager
> > or deeper.
> > If we can run multiple scans in parallel / asynchronous, CPU core
> > shall be assigned to another process by operating system, thus,
> > it eventually improves the i/o density and enables higher processing
> > throughput.
> > Append node is an ideal point to be parallelized because
> > - child nodes can have physically different location by tablespace,
> >   so further tuning is possible according to the system landscape.
> > - it can control whether subplan is actually executed on background
> >   worker, per subplan basis. If subplan contains large tables and
> >   small tables, ParallelAppend may kick background worker to scan
> >   large tables only, but scan on small tables are by itself.
> > - Like as Funnel node, we don't need to care about enhancement of
> >   individual node types. SeqScan, IndexScan, ForeignScan or others
> >   can perform as usual, but actually in parallel.
> >
> >
> > 3. Implementation
> > --
> > * Plan & Cost
> >
> > ParallelAppend shall appear where Append can appear except for the
> > usage for dummy. So, I'll enhance set_append_rel_pathlist() to add
> > both of AppendPath and ParallelAppendPath with cost for each.
> > Cost estimation logic shall take further discussions, however,
> > I expect the logic below to estimate the cost for ParallelAppend.
> >   1. Sum startup_cost and run_cost for each child pathnode, but
> >  distinguish according to synchronous or asynchronous

Re: [HACKERS] spgist recovery assertion failure

2015-07-27 Thread Michael Paquier
On Mon, Jul 27, 2015 at 2:33 PM, Piotr Stefaniak
 wrote:
> On 07/27/2015 07:19 AM, Michael Paquier wrote:
>>
>> On Mon, Jul 27, 2015 at 2:00 PM, Noah Misch  wrote:
>>>
>>> When I caused a crash during the create_index regression test, recovery
>>> hit an
>>> assertion failure.  Minimal test case:
>>>
>>> psql -X <<EOSQL
>>> CREATE TABLE t (c text);
>>> INSERT INTO t SELECT 'P0123456789abcdef' FROM generate_series(1,1000);
>>> INSERT INTO t VALUES ('P0123456789abcdefF');
>>> CREATE INDEX ON t USING spgist (c);
>>> EOSQL
>>> pg_ctl -m immediate -w restart
>>
>>
>> On which platform are you seeing the failure? I am afraid I could not
>> reproduce the failure on Linux and OSX after testing it on HEAD.
>>
>
> I'm having the same symptoms with
> 159cff58cf3b565be3c17901698a74238e9e23f8 on Ubuntu Linux 3.4.39 armv7l.

Yes, on armv7l this can be indeed reproduced.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] creating extension including dependencies

2015-07-27 Thread Michael Paquier
On Sun, Jul 26, 2015 at 1:01 AM, Petr Jelinek wrote:
> Yes that's what I meant by the change of checking order in the explanation
> above. I did that because I thought code would be more complicated
> otherwise, but apparently I was stupid...

+In case the extension specifies schema in its control file, the schema
s/schema/<literal>schema</literal>/

+++ b/src/test/modules/test_extensions/.gitignore
@@ -0,0 +1,3 @@
+# Generated subdirectories
+/results/
+/tmp_check/
test_extensions/.gitignore is missing /log/.

Something also has not been discussed yet: what to do with new_version
and old_version (the options of CreateExtensionStmt)? As of now if
those options are defined they are not passed down to the parent
extensions but shouldn't we raise an error if they are used in
combination with CASCADE? In any case, I think that the behavior
chosen should be documented.
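
For instance, a hypothetical illustration of the ambiguity (earthdistance
is just a familiar contrib extension that requires cube; the CASCADE
syntax is the one this patch adds):

CREATE EXTENSION earthdistance CASCADE;                -- also installs cube
CREATE EXTENSION earthdistance VERSION '1.0' CASCADE;  -- which version
                                                       -- should cube get?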
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REVOKE [ADMIN OPTION FOR] ROLE

2015-07-27 Thread Egor Rogov



On Thu, Jul 23, 2015 at 8:30 AM, Egor Rogov  wrote:

So, the question: is it a documentation bug (as it seems to me), code bug,
or I missed something?

Your analysis looks right to me, but I don't know whether the code or
the documentation should be changed.  This claim was added by Tom Lane
in 2005 in commit 58d214e51fe50b10b4439da6ec263d54c155afbf.  It might
be worth checking whether the claim was true at that time and later
became false, or whether it was never true to begin with.

As far as I can see, the modern REVOKE syntax for revoking membership in a
role (along with the "admin option") was introduced in commit 7762619 (by
Tom Lane, 2005). The code handling this command didn't pay attention to
the "restrict/cascade" keywords then, just as it does not now.
Before that, another syntax was in use: alter group groupname drop user
username [, ...]. It did not include the notion of "cascade" at all.
I guess that "revoke role_name from role_name" inherited the
"[cascade|restrict]" clause from the general REVOKE command but never
actually used it. And I see no point in changing this, because role
membership is somewhat more static than privileges.

So I would propose the attached fix for documentation.

Thanks,
Egor Rogov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/doc/src/sgml/ref/revoke.sgml b/doc/src/sgml/ref/revoke.sgml
index 36c286b..8417abe 100644
--- a/doc/src/sgml/ref/revoke.sgml
+++ b/doc/src/sgml/ref/revoke.sgml
@@ -167,8 +167,10 @@ REVOKE [ ADMIN OPTION FOR ]
   
 
   
-   When revoking membership in a role, GRANT OPTION is instead
-   called ADMIN OPTION, but the behavior is similar.
+   If ADMIN OPTION FOR is specified, only the admin option is
+   revoked, not the group membership itself. When revoking membership in a
+   role or the admin option, recursive revocation does not occur,
+   regardless of whether CASCADE is specified.
Note also that this form of the command does not
allow the noise word GROUP.
   

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] 9.5a1 BUG FIX: pgbench negative latencies

2015-07-27 Thread Fabien COELHO


Under 9.5a1, "pgbench -r" reports negative latencies on meta commands,
probably an oversight in commit 84f0ea3f.


This patch ensures that "now" is reset on each loop inside doCustom.

--
Fabien.diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 2c3e365..cce67e8 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -1146,16 +1146,19 @@ doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVa
 	bool		trans_needs_throttle = false;
 	instr_time	now;
 
+top:
 	/*
 	 * gettimeofday() isn't free, so we get the current timestamp lazily the
 	 * first time it's needed, and reuse the same value throughout this
 	 * function after that. This also ensures that e.g. the calculated latency
 	 * reported in the log file and in the totals are the same. Zero means
 	 * "not set yet".
+	 *
+	 * "now" must also be reset on "goto top;" issued when interpreting meta
+	 * commands, otherwise the per-command measured latency is wrong.
 	 */
 	INSTR_TIME_SET_ZERO(now);
 
-top:
 	commands = sql_files[st->use_file];
 
 	/*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [DESIGN] ParallelAppend

2015-07-27 Thread Amit Kapila
On Sun, Jul 26, 2015 at 8:43 AM, Kouhei Kaigai  wrote:
>
> Hello,
>
> I'm recently working/investigating on ParallelAppend feature
> towards the next commit fest. Below is my design proposal.
>
> 1. Concept
> --
> Its concept is simple enough that anybody might have considered it
> more than once: the ParallelAppend node kicks off background worker
> processes to execute its child nodes in parallel / asynchronously.
> It intends to improve the performance of scanning large partitioned
> tables from the standpoint of overall throughput; however, the latency
> of the first few hundred rows is not in the scope of this project.
> From the standpoint of technology trends, it primarily tries to
> utilize multi-core capability within a system, but it also makes it
> possible to scale out to a distributed database environment using the
> foreign-table inheritance features.
> Its behavior is very similar to the Funnel node except for several
> points, so we can reuse the infrastructure we discussed at length
> through the v9.5 development cycle.
>
> 2. Problems to be solved
> -
> Typical OLAP workloads involve many joins and scans on large tables,
> which are often partitioned; their KPI is query response time, while
> only a very small number of sessions are active simultaneously.
> So, we are required to run a single query as rapidly as possible, even
> if it consumes more computing resources than typical OLTP workloads.
>
> The current implementation of heap scans is painful when we look at
> its behavior from this standpoint - how many rows we can read within
> a certain time - because of its synchronous manner.
> In the worst case, when the SeqScan node tries to fetch the next tuple,
> heap_getnext() looks up a block in shared buffers, then ReadBuffer()
> calls the storage manager to read the target block from the filesystem
> if it is not in the buffer. Next, the operating system puts the caller
> process to sleep until the required i/o completes.
> Most cases are resolved at an earlier stage than the above worst
> case; the best scenario we can expect is that the next tuple already
> appears on top of the message queue (with visibility checks already
> done) with no descent into the buffer manager or deeper.
> If we can run multiple scans in parallel / asynchronously, a CPU core
> will be assigned to another process by the operating system, which
> eventually improves the i/o density and enables higher processing
> throughput.
> The Append node is an ideal point to parallelize because
> - child nodes can be in physically different locations by tablespace,
>   so further tuning is possible according to the system landscape.
> - it can control, per subplan, whether a subplan is actually executed
>   on a background worker. If the subplans contain large tables and
>   small tables, ParallelAppend may kick off background workers to scan
>   the large tables only, and scan the small tables by itself.
> - Like the Funnel node, we don't need to care about enhancing
>   individual node types. SeqScan, IndexScan, ForeignScan and others
>   can perform as usual, but actually in parallel.
>
>
> 3. Implementation
> --
> * Plan & Cost
>
> ParallelAppend shall appear wherever Append can appear, except for the
> dummy usage. So, I'll enhance set_append_rel_pathlist() to add
> both an AppendPath and a ParallelAppendPath with a cost for each.
>

Is there a real need to have a new node like ParallelAppendPath?
Can't we have a Funnel node beneath the Append node, with each
worker then responsible for a SeqScan on one inherited child
relation?  Something like

Append
   ---> Funnel
  --> SeqScan rel1
  --> SeqScan rel2


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Proposal for CSN based snapshots

2015-07-27 Thread Alexander Korotkov
On Sat, Jul 25, 2015 at 11:39 AM, Simon Riggs  wrote:

> On 24 July 2015 at 19:21, Robert Haas  wrote:
>
>> On Fri, Jul 24, 2015 at 1:00 PM, Simon Riggs 
>> wrote:
>> > It depends on the exact design we use to get that. Certainly we do not
>> want
>> > them if they cause a significant performance regression.
>>
>> Yeah.  I think the performance worries expressed so far are:
>>
>> - Currently, if you see an XID that is between the XMIN and XMAX of
>> the snapshot, you hit CLOG only on first access.  After that, the
>> tuple is hinted.  With this approach, the hint bit doesn't avoid
>> needing to hit CLOG anymore, because it's not enough to know whether
>> or not the tuple committed; you have to know the CSN at which it
>> committed, which means you have to look that up in CLOG (or whatever
>> SLRU stores this data).  Heikki mentioned adding some caching to
>> ameliorate this problem, but it sounds like he was worried that the
>> impact might still be significant.
>>
>
> This seems like the heart of the problem. Changing a snapshot from a list
> of xids into one number is easy. Making XidInMVCCSnapshot() work is the
> hard part because there needs to be a translation/lookup from CSN to
> determine if it contains the xid.
>
> That turns CSN into a reference to a cached snapshot, or a reference by
> which a snapshot can be derived on demand.
>

I see the problem. Currently, once we set hint bits, we don't have to
visit CLOG anymore. With CSN snapshots that is not so: we still have to
translate an XID into a CSN in order to compare it with the snapshot CSN.
The version of the CSN patch in this thread still has XMIN and XMAX in
the snapshot. AFAICS, with CSN snapshots, XMIN and XMAX are not strictly
required to express a snapshot; they were kept as an optimization. That
restricts usage of the XID => CSN map to the given range of XIDs.
However, with long-running transactions the [XMIN; XMAX] range could be
very wide, and we could use the XID => CSN map heavily over a wide range
of XIDs.

As I can see in the Huawei PGCon talk, a "Dense Map" in shared memory is
proposed for the XID => CSN transformation. With a large enough "Dense
Map" we can do most XID => CSN transformations with a single shared
memory access. The PGCon talk gives us pgbench results; however, pgbench
doesn't run any long transactions. With long-running transactions we can
run out of the "Dense Map" for a significant fraction of the XID => CSN
transformations. Dilip, did you test your patch with long transactions?
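
To make the idea concrete, here is a minimal sketch of what such a
lookup could look like (all names here, including the SlruXidToCsn()
slow path, are assumptions for illustration, not code from the talk or
from any patch):

/* Hypothetical sketch of a shared-memory "dense map" for XID => CSN. */
#define CSN_DENSE_MAP_SIZE (1 << 20)        /* assumed capacity: 1M XIDs */

typedef struct CsnDenseMap
{
    TransactionId oldest_xid;               /* oldest XID still in the map */
    uint64        csn[CSN_DENSE_MAP_SIZE];  /* 0 means "not committed" */
} CsnDenseMap;

static uint64
XidToCsn(CsnDenseMap *map, TransactionId xid)
{
    /*
     * XIDs that have fallen out of the dense map - for instance because
     * a long-running transaction still needs old entries - require a
     * slower lookup, say in an SLRU.
     */
    if (TransactionIdPrecedes(xid, map->oldest_xid))
        return SlruXidToCsn(xid);           /* hypothetical slow path */

    return map->csn[xid % CSN_DENSE_MAP_SIZE];
}

The interesting part is exactly where this degrades: once the live XID
range grows wider than the map, the slow path dominates, which is the
long-transaction concern above.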

I'm also thinking about a different option for optimizing this. When we
set hint bits, we could also replace XMIN/XMAX with the CSN. In this case
we wouldn't need to do the XID => CSN transformation for this tuple
anymore. However, we have 32-bit XIDs for now. We could also have 32-bit
CSNs, but that would double our troubles with wraparound: we would have
two counters that could wrap around. That could return us to thoughts
about 64-bit XIDs.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] raw output from copy

2015-07-27 Thread Pavel Stehule
2015-07-27 10:41 GMT+02:00 Heikki Linnakangas :

> On 07/27/2015 06:55 AM, Craig Ringer wrote:
>
>> On 7 July 2015 at 14:32, Pavel Stehule  wrote:
>>
>>> Hi
>>>
>>> previous patch was broken, and buggy
>>>
>>> Here is new version with fixed upload and more tests
>>>
>>
>> I routinely see people trying to use COPY ... FORMAT binary to export
>> a single binary field (like an image, for example) and getting
>> confused by the header PostgreSQL adds. Or using text-format COPY and
>> struggling with the hex escaping. It's clearly something people have
>> trouble with.
>>
>> It doesn't help that while lo_import and lo_export can read paths
>> outside the datadir (and refuse to read from within it),
>> pg_read_binary_file is superuser only and disallows absolute paths.
>> There's no corresponding pg_write_binary_file. So users who want to
>> import and export a single binary field tend to try to use COPY. We
>> have functionality for large objects that has no equivalent for
>> 'bytea'.
>>
>> I don't love the use of COPY for this, but it gets us support for
>> arbitrary clients pretty easily. Otherwise it'd be server-side only
>> via local filesystem access, or require special psql-specific
>> functionality like we have for lo_import etc.
>>
>
> COPY seems like a strange interface for this. I can see the point that the
> syntax is almost there already, for both input and output. But even that's
> not quite there yet, we'd need the new RAW format. And as an input method,
> COPY is a bit awkward, because you cannot easily pass the file to a
> function, for example. I think this should be implemented in psql, along
> the lines of Andrew's original \bcopy patch.
>
> There are a couple of related psql-features here actually, that would be
> useful on their own. The first is being able to send the query result to a
> file, for a single query only. You can currently do:
>
> \o /tmp/foo
> SELECT ...;
> \o
>
> But more often than not, when I try to do that, I forget to do the last
> \o, and run another query, and the output still goes to the file. So it'd
> be nice to have a \o option that only affects the next query. Something
> like:
>
> \O /tmp/foo
> SELECT ...;
>
> The second feature needed is to write the output without any headers, row
> delimiters and such. Just the datum. And the third feature is to write it
> in binary. Perhaps something like:
>
> \O /tmp/foo binary
> SELECT blob FROM foo WHERE id = 10;
>
> What about input? This is a whole new feature, but it would be nice to be
> able to pass the file contents as a query parameter. Something like:
>
> \P /tmp/foo binary
> INSERT INTO foo VALUES (?);
>

The input example is a strong reason why we shouldn't do it via INSERTs.
Just parsing a special "?" symbol would need a lot of new code.

In this case, I don't see any advantage of a psql-based solution. COPY is
the standard interface for input/output from/to files, and it should be
used here.

Regards

Pavel


>
>
> - Heikki
>
>


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-27 Thread Ildus Kurbangaliev

Hello.
In the attached patch I've made a refactoring for tranches.
The prefix for them was extended,  and I've did a split of LWLockAssign 
to two

functions (one with tranche and second for user defined LWLocks).
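
In short, the new interface looks like this (a sketch inferred from the
call sites in the diff; see the patch for the actual definitions):

extern LWLock *LWLockAssign(int tranche_id);   /* core code, tranche-aware */
extern LWLock *LWLockUserAssign(void);         /* user-defined LWLocks */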

--
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 0eb991c..e75ca4d 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -480,7 +480,7 @@ pgss_shmem_startup(void)
 	if (!found)
 	{
 		/* First time through ... */
-		pgss->lock = LWLockAssign();
+		pgss->lock = LWLockUserAssign();
 		pgss->cur_median_usage = ASSUMED_MEDIAN_INIT;
 		pgss->mean_query_len = ASSUMED_LENGTH_INIT;
 		SpinLockInit(&pgss->mutex);
diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
index 9c15950..b365565 100644
--- a/doc/src/sgml/xfunc.sgml
+++ b/doc/src/sgml/xfunc.sgml
@@ -3356,7 +3356,7 @@ if (!ptr)
 {
 initialize contents of shmem area;
 acquire any requested LWLocks using:
-ptr->mylockid = LWLockAssign();
+ptr->mylockid = LWLockUserAssign();
 }
 LWLockRelease(AddinShmemInitLock);
 }
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 3a58f1e..3a55511 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -457,7 +457,7 @@ CLOGShmemInit(void)
 {
 	ClogCtl->PagePrecedes = CLOGPagePrecedes;
 	SimpleLruInit(ClogCtl, "CLOG Ctl", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
-  CLogControlLock, "pg_clog");
+  CLogControlLock, "pg_clog", LW_TRANCHE_CLOG_BUFFERS);
 }
 
 /*
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 5ad35c0..2571203 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -466,7 +466,8 @@ CommitTsShmemInit(void)
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
 	SimpleLruInit(CommitTsCtl, "CommitTs Ctl", CommitTsShmemBuffers(), 0,
-  CommitTsControlLock, "pg_commit_ts");
+  CommitTsControlLock, "pg_commit_ts",
+  LW_TRANCHE_COMMITTS_BUFFERS);
 
 	commitTsShared = ShmemInitStruct("CommitTs shared",
 	 sizeof(CommitTimestampShared),
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 1933a87..e8d2474 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1842,10 +1842,12 @@ MultiXactShmemInit(void)
 
 	SimpleLruInit(MultiXactOffsetCtl,
   "MultiXactOffset Ctl", NUM_MXACTOFFSET_BUFFERS, 0,
-  MultiXactOffsetControlLock, "pg_multixact/offsets");
+  MultiXactOffsetControlLock, "pg_multixact/offsets",
+  LW_TRANCHE_MULTIXACT_BUFFERS);
 	SimpleLruInit(MultiXactMemberCtl,
   "MultiXactMember Ctl", NUM_MXACTMEMBER_BUFFERS, 0,
-  MultiXactMemberControlLock, "pg_multixact/members");
+  MultiXactMemberControlLock, "pg_multixact/members",
+  LW_TRANCHE_MULTIXACT_BUFFERS);
 
 	/* Initialize our shared state struct */
 	MultiXactState = ShmemInitStruct("Shared MultiXact State",
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 5fcea11..7709f6d 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -161,7 +161,7 @@ SimpleLruShmemSize(int nslots, int nlsns)
 
 void
 SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
-			  LWLock *ctllock, const char *subdir)
+			  LWLock *ctllock, const char *subdir, int lwlocktranche)
 {
 	SlruShared	shared;
 	bool		found;
@@ -218,7 +218,7 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
 			shared->page_dirty[slotno] = false;
 			shared->page_lru_count[slotno] = 0;
-			shared->buffer_locks[slotno] = LWLockAssign();
+			shared->buffer_locks[slotno] = LWLockAssign(lwlocktranche);
 			ptr += BLCKSZ;
 		}
 	}
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 6b70982..80690bc 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -179,7 +179,8 @@ SUBTRANSShmemInit(void)
 {
 	SubTransCtl->PagePrecedes = SubTransPagePrecedes;
 	SimpleLruInit(SubTransCtl, "SUBTRANS Ctl", NUM_SUBTRANS_BUFFERS, 0,
-  SubtransControlLock, "pg_subtrans");
+  SubtransControlLock, "pg_subtrans",
+  LW_TRANCHE_SUBTRANS_BUFFERS);
 	/* Override default assumption that writes should be fsync'd */
 	SubTransCtl->do_fsync = false;
 }
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e82a53a..fd04258 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -621,6 +621,7 @@ CREATE VIEW pg_stat_activity AS
 S.query_start,
   

Re: [HACKERS] Autonomous Transaction is back

2015-07-27 Thread Rajeev rastogi
On 23 July 2015 21:04, Robert Haas Wrote:
On Thu, Jul 23, 2015 at 1:31 AM, Rajeev rastogi  
wrote:
>> 2.It should be allowed to deadlock with master transaction. We
>> need to work-out a solution to avoid deadlock.

>This sentence seems to contradict itself.  I thought the consensus was that 
>the transaction should NOT conflict with the master transaction.

Since we are saying the transaction is autonomous from the parent
transaction, we cannot guarantee that it does not take any conflicting
lock unless the application is designed to avoid that.
But yes, we should have a mechanism to deal with the possible deadlock.

>> 3.It can support multiple level of nesting based on the
>> configuration (may be max as 70).

>Why have any fixed maximum?

Since we are planning to have nested autonomous transactions, a limit on
nesting is required so that resources can be controlled.

>> 2. The above commands can be issued either inside the procedure to 
>> make few statements of procedure inside autonomous transaction or even 
>> in stand-alone query execution.

>I think inside a procedure the autonomous transaction will need to be 
>lexically scoped.  You won't be able to do this, for example:

>BEGIN AUTONOMOUS TRANSACTION;
>FOR x IN SELECT ... LOOP
>COMMIT;
>BEGIN AUTONOMOUS TRANSACTION;
>END LOOP;

I am not sure how we will be able to control this. IMHO the user should
be able to control this, especially since such a restriction does not
have any meaning from the user's perspective.
Please let me know if I am missing something here.

>Rather you'd have to do something like this:

>FOR x IN SELECT .. LOOP
>   BEGIN WITH AUTONOMOUS TRANSACTION
>   do stuff
>   END;
>END LOOP;


Thanks and Regards,
Kumar Rajeev Rastogi


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] We need to support ForeignRecheck for late row locking, don't we?

2015-07-27 Thread Kouhei Kaigai
> On 2015/07/24 23:51, Kouhei Kaigai wrote:
> >> On 2015/07/22 19:10, Etsuro Fujita wrote:
> >>> While working on the issue "Foreign join pushdown vs EvalPlanQual", I
> >>> happened to notice odd behaviors of late row locking in FDWs.
> >>
> >>> I think the reason for that is that we don't check pushed-down quals
> >>> during EPQ testing, even if what was fetched by RefetchForeignRow was
> >>> an updated version of the tuple rather than the same version previously
> >>> obtained.  So, to fix this, I'd like to propose that pushed-down quals
> >>> be checked in ForeignRecheck.
> 
> >> * I've modified ForeignRecheck so as to check pushed-down quals whether
> >> doing late locking or early locking.
> 
> > Isn't it an option to put a new callback in ForeignRecheck?
> >
> > The FDW driver knows that its private data structure includes the
> > expression node that was pushed down to the remote side. So, it seems
> > to me best to consult the FDW driver on whether the supplied tuple
> > should be visible according to the pushed-down qualifier.
> >
> > More or less, this fix needs a new interface contract around the
> > EvalPlanQual logic. It is better to give the FDW driver more
> > flexibility in its private data structure and in the way it processes
> > the recheck logic, rather than a special-purpose variable.
> >
> > If the FDW driver managed the pushed-down expression in its own format,
> > a pushedDownQual requirement would make it hold the qualifier
> > redundantly. The callback approach does not have that kind of concern.
> 
> That might be an idea, but is there any performance disadvantage as
> discussed in [1]? It looks like that would need to perform another
> remote query to see if the supplied tuple satisfies the pushed-down
> quals during EPQ testing.
>
I expect the ForeignRecheck callback to run ExecQual() on the qualifier
expression that was pushed down but saved in the private data of the
ForeignScanState. It does not need to kick off another remote query
(unless the FDW driver is designed that way), so the performance
disadvantage is zero or quite limited.
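
As a rough illustration of that contract, such a callback could look
like the sketch below (ExampleFdwState and its pushed_quals list of
initialized ExprState nodes are my assumptions; only ExecQual() and the
ForeignScanState layout are real):

/* Hypothetical sketch of an FDW recheck callback running ExecQual(). */
static bool
exampleRecheckForeignScan(ForeignScanState *node, TupleTableSlot *slot)
{
    ExampleFdwState *fdw_state = (ExampleFdwState *) node->fdw_state;
    ExprContext *econtext = node->ss.ps.ps_ExprContext;

    /* Evaluate the pushed-down quals locally against the refetched tuple. */
    econtext->ecxt_scantuple = slot;
    return ExecQual(fdw_state->pushed_quals, econtext, false);
}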

Also, let's consider the case when scanrelid == 0 (join pushdown).
It is easy to add a special code path for scanrelid == 0, which
implies the ScanState is either a ForeignScan or a CustomScan.
If ForeignRecheck (= recheckMtd) is called instead of the if-block
below the Assert() in ExecScanFetch, the FDW driver will be able
to take its own special code path and run an alternative sub-plan.
How does this alternative sub-plan work? It walks down the sub-plan
tree, which typically consists of, for example, NestLoop + ForeignScan;
ExecScanFetch() is then called again, this time on a ScanState
with scanrelid > 0.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [DESIGN] ParallelAppend

2015-07-27 Thread Kyotaro HORIGUCHI
Hello, can I ask some questions?

I suppose we can take this as the analog of ParallelSeqScan.  I
don't see much distinction between Append(ParallelSeqScan) and
ParallelAppend(SeqScan). What difference is there between them?

If other nodes will have the same functionality, as you mention at
the end of this proposal, it might be better for some part of this
feature to be implemented as part of the existing executor itself
rather than as a dedicated additional node, just as my asynchronous
FDW execution patch partially does. (Although it lacks the planner
part and bgworker launching..) If that is the case, it might be
better to modify ExecProcNode so that it supports both the
in-process and inter-bgworker cases through a single API.

What do you think about this?



regards,

> Hello,
> 
> I'm recently working/investigating on ParallelAppend feature
> towards the next commit fest. Below is my design proposal.
> 
> 1. Concept
> --
> Its concept is simple enough that anybody might have considered it
> more than once: the ParallelAppend node kicks off background worker
> processes to execute its child nodes in parallel / asynchronously.
> It intends to improve the performance of scanning large partitioned
> tables from the standpoint of overall throughput; however, the latency
> of the first few hundred rows is not in the scope of this project.
> From the standpoint of technology trends, it primarily tries to
> utilize multi-core capability within a system, but it also makes it
> possible to scale out to a distributed database environment using the
> foreign-table inheritance features.
> Its behavior is very similar to the Funnel node except for several
> points, so we can reuse the infrastructure we discussed at length
> through the v9.5 development cycle.
> 
> 2. Problems to be solved
> -
> Typical OLAP workloads involve many joins and scans on large tables,
> which are often partitioned; their KPI is query response time, while
> only a very small number of sessions are active simultaneously.
> So, we are required to run a single query as rapidly as possible, even
> if it consumes more computing resources than typical OLTP workloads.
>
> The current implementation of heap scans is painful when we look at
> its behavior from this standpoint - how many rows we can read within
> a certain time - because of its synchronous manner.
> In the worst case, when the SeqScan node tries to fetch the next tuple,
> heap_getnext() looks up a block in shared buffers, then ReadBuffer()
> calls the storage manager to read the target block from the filesystem
> if it is not in the buffer. Next, the operating system puts the caller
> process to sleep until the required i/o completes.
> Most cases are resolved at an earlier stage than the above worst
> case; the best scenario we can expect is that the next tuple already
> appears on top of the message queue (with visibility checks already
> done) with no descent into the buffer manager or deeper.
> If we can run multiple scans in parallel / asynchronously, a CPU core
> will be assigned to another process by the operating system, which
> eventually improves the i/o density and enables higher processing
> throughput.
> The Append node is an ideal point to parallelize because
> - child nodes can be in physically different locations by tablespace,
>   so further tuning is possible according to the system landscape.
> - it can control, per subplan, whether a subplan is actually executed
>   on a background worker. If the subplans contain large tables and
>   small tables, ParallelAppend may kick off background workers to scan
>   the large tables only, and scan the small tables by itself.
> - Like the Funnel node, we don't need to care about enhancing
>   individual node types. SeqScan, IndexScan, ForeignScan and others
>   can perform as usual, but actually in parallel.
> 
> 
> 3. Implementation
> --
> * Plan & Cost
> 
> ParallelAppend shall appear wherever Append can appear, except for the
> dummy usage. So, I'll enhance set_append_rel_pathlist() to add
> both an AppendPath and a ParallelAppendPath with a cost for each.
> Cost estimation logic shall take further discussion; however,
> I expect the logic below to estimate the cost for ParallelAppend.
>   1. Sum startup_cost and run_cost for each child pathnode, but
>      distinguish according to synchronous or asynchronous.
>      Probably, a pathnode whose total cost is less than:
>        (parallel_setup_cost + its total cost / parallel_append_degree
>         + number of rows * cpu_tuple_comm_cost)
>      is nonsense to run on a background worker.
>   2. parallel_setup_cost * (# of asynchronous nodes) is added to
>      the sum of the startup_cost of the asynchronous nodes.
>   3. The sum of the run_cost of the asynchronous nodes is divided by
>      parallel_append_degree, then cpu_tuple_comm_cost * (total # of
>      rows from asynchronous nodes) is added.
>   4. The synchronous and asynchronous costs are added together; the
>      result becomes the cost of ParallelAppend.
> Obviously, it stands on the 

Re: [HACKERS] Sharing aggregate states between different aggregate functions

2015-07-27 Thread David Rowley
On 27 July 2015 at 20:11, Heikki Linnakangas  wrote:

> On 07/27/2015 08:34 AM, David Rowley wrote:
>
>> In this function I also wasn't quite sure if it was worth comparing both
>> non-NULL INITCOND's here. I believe my code comments may slightly
>> contradict what the code actually does, as the comments talk about them
>> having to match, but the code just bails if any are non-NULL. The reason I
>> didn't check them was because it seems inevitable that some duplicate work
>> needs to be done when setting up the INITCOND. Perhaps it's worth it?
>>
>
> It would be nice to handle non-NULL initconds. I think you'll have to
> check that the input function isn't volatile. Or perhaps just call the
> input function, and check that the resulting Datum is byte-per-byte
> identical, although that might be awkward to do with the current code
> structure.
>
>
Yeah, it's awkward, as I just managed to remind myself:

The aggtranstype needs to be known before we can call GetAggInitVal() on
the initval datum.

That currently happens in build_transstate_for_aggref() only when we've
decided to create a new state.

transstate->initValue = GetAggInitVal(textInitVal,
  transstate->aggtranstype);

And to get the aggtranstype:

transstate->aggtranstype =
resolve_aggregate_transtype(aggref->aggfnoid,
aggform->aggtranstype,
inputTypes,
numArguments);

Of course, not impossible, but lots of code gets duplicated.

Regards

David Rowley

--
 David Rowley   http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] False comment about speculative insertion

2015-07-27 Thread Heikki Linnakangas

On 07/26/2015 10:30 PM, Peter Geoghegan wrote:

Attached patch removes a reference to an executor README section about
speculative insertion. In fact, the high-level overview of speculative
insertion ended up at the top of execIndexing.c. The executor README
was not touched by the ON CONFLICT patch at all.

I don't think it's necessary to refer to execIndexing.c within
ExecInsert instead. All the routines being called from ExecInsert()
that relate to speculative insertion are in execIndexing.c anyway.


Fixed, thanks.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] raw output from copy

2015-07-27 Thread Heikki Linnakangas

On 07/27/2015 06:55 AM, Craig Ringer wrote:

On 7 July 2015 at 14:32, Pavel Stehule  wrote:

Hi

previous patch was broken, and buggy

Here is new version with fixed upload and more tests


I routinely see people trying to use COPY ... FORMAT binary to export
a single binary field (like an image, for example) and getting
confused by the header PostgreSQL adds. Or using text-format COPY and
struggling with the hex escaping. It's clearly something people have
trouble with.

It doesn't help that while lo_import and lo_export can read paths
outside the datadir (and refuse to read from within it),
pg_read_binary_file is superuser only and disallows absolute paths.
There's no corresponding pg_write_binary_file. So users who want to
import and export a single binary field tend to try to use COPY. We
have functionality for large objects that has no equivalent for
'bytea'.

I don't love the use of COPY for this, but it gets us support for
arbitrary clients pretty easily. Otherwise it'd be server-side only
via local filesystem access, or require special psql-specific
functionality like we have for lo_import etc.


COPY seems like a strange interface for this. I can see the point that 
the syntax is almost there already, for both input and output. But even 
that's not quite there yet, we'd need the new RAW format. And as an 
input method, COPY is a bit awkward, because you cannot easily pass the 
file to a function, for example. I think this should be implemented in 
psql, along the lines of Andrew's original \bcopy patch.


There are a couple of related psql-features here actually, that would be 
useful on their own. The first is being able to send the query result to 
a file, for a single query only. You can currently do:


\o /tmp/foo
SELECT ...;
\o

But more often than not, when I try to do that, I forget to do the last 
\o, and run another query, and the output still goes to the file. So 
it'd be nice to have a \o option that only affects the next query. 
Something like:


\O /tmp/foo
SELECT ...;

The second feature needed is to write the output without any headers, 
row delimiters and such. Just the datum. And the third feature is to 
write it in binary. Perhaps something like:


\O /tmp/foo binary
SELECT blob FROM foo WHERE id = 10;

What about input? This is a whole new feature, but it would be nice to 
be able to pass the file contents as a query parameter. Something like:


\P /tmp/foo binary
INSERT INTO foo VALUES (?);


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sharing aggregate states between different aggregate functions

2015-07-27 Thread Heikki Linnakangas

On 07/27/2015 08:34 AM, David Rowley wrote:

- * agg_input_types, agg_state_type, agg_result_type identify the input,
- * transition, and result types of the aggregate.  These should all be
- * resolved to actual types (ie, none should ever be ANYELEMENT etc).
+ * agg_input_types identifies the input types of the aggregate.  These
should
+ * be resolved to actual types (ie, none should ever be ANYELEMENT etc).

I'm not sure I understand why agg_state_type and agg_result_type have
changed here.


The function no longer has the agg_result_type argument, but the removal 
of agg_state_type from the comment was a mistake.



+ peraggstate->sortstates = (Tuplesortstate **)
+ palloc0(sizeof(Tuplesortstate *) * numGroupingSets);
+ for (currentsortno = 0; currentsortno < numGroupingSets; currentsortno++)
+ peraggstate->sortstates[currentsortno] = NULL;

This was not you, but this NULL setting looks unneeded due to the palloc0().


Yeah, I noticed that too. Ok, let's take it out.


In this function I also wasn't quite sure if it was worth comparing both
non-NULL INITCOND's here. I believe my code comments may slightly
contradict what the code actually does, as the comments talk about them
having to match, but the code just bails if any are non-NULL. The reason I
didn't check them was because it seems inevitable that some duplicate work
needs to be done when setting up the INITCOND. Perhaps it's worth it?


It would be nice to handle non-NULL initconds. I think you'll have to 
check that the input function isn't volatile. Or perhaps just call the 
input function, and check that the resulting Datum is byte-per-byte 
identical, although that might be awkward to do with the current code 
structure.
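
Something along those lines, perhaps (a minimal sketch; infn, typioparam,
typbyval, typlen and the two initval strings are assumed to have been
looked up already, and the volatility check is left out):

/* Hypothetical sketch: decide whether two non-NULL initconds match. */
Datum d1 = OidInputFunctionCall(infn, textInitVal1, typioparam, -1);
Datum d2 = OidInputFunctionCall(infn, textInitVal2, typioparam, -1);

if (datumIsEqual(d1, d2, typbyval, typlen))
{
    /* byte-per-byte identical: the transition state could be shared */
}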



BTW, the name of the AggStatePerAggStateData struct is pretty horrible.
The repeated "AggState" feels awkward. Now that I've stared at the patch
for a some time, it doesn't bother me anymore, but it took me quite a while
to over that. I'm sure it will for others too. And it's not just that
struct, the comments talk about "aggregate state", which could be confused
to mean "AggState", but it actually means AggStatePerAggStateData. I don't
have any great suggestions, but can you come up with a better naming scheme?


I agree, they're horrible. The thing that's causing the biggest problem is
the struct named AggState, which carries state for *all* aggregates, and
has nothing to do with "transition state", so it seems there are two
different meanings of the word "state", and I've used both meanings in the
name AggStatePerAggStateData.

Perhaps just renaming AggStatePerAggStateData to AggStateTransStateData
would be good enough?


Hmm. I think it should be "AggStatePerTransData" then, to keep the same 
pattern as AggStatePerAggData and AggStatePerGroupData.



I've attached a delta patch based on your patch, in this I've:

1. Renamed AggStatePerAggStateData to AggStateTransStateData and all
variables using that are renamed to suit better.
2. Removed surplus peraggstate->sortstates[currentsortno] = NULL; (not
related to this patch, but since we're moving that part of the code, we'd
better fix)
3. Put back the missing aggfnoid from the error message.
4. Changed initialize_aggregates() to not pass the states. They're already
in AggState and we're using aggstate->numstates to get the count of the
items in that array, so it seems wrong to allow a different array to ever
be passed in.
5. Changed wording of a few comments to try and reduce confusion between
'state' and 'transition state'.
6. Renamed AggState.peraggstate to transstates. I pluralised this to try to
reduce confusion with the singular state pointers named 'transstate' in the
functions in nodeAgg.c. I did think that peragg should also become peraggs
and pergroup should become pergroups, but didn't change those.

Anything else I changed is self explanatory.

What do you think?


Looks good, thanks!

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump -Fd and compression level

2015-07-27 Thread Marc Mamin

>>>
>>> As per attached patch.
>>>
>>> Comments?
>>
>> It seems that the first test on the compression in pg_backup_tar.c is now 
>> obsolete.
>> It didn't make much sense anyway.
>>
>>
>>
>>211 if (AH->compression < 0 || AH->compression > 9)
>>212 AH->compression = Z_DEFAULT_COMPRESSION;
>>213
>>214 /* Don't compress into tar files unless asked to do so */
>>215 if (AH->compression == Z_DEFAULT_COMPRESSION)
>>216 AH->compression = 0;
>>217
>>218 /*
>>219  * We don't support compression because reading the files 
>> back is not
>>220  * possible since gzdopen uses buffered IO which totally 
>> screws file
>>221  * positioning.
>>222  */
>>223 if (AH->compression != 0)
>>224 exit_horribly(modulename,
>>225  "compression is not supported by tar archive 
>> format\n");
>>226 }
>>
>>
>
>In fact, the first two tests look unnecessary. Neither condition should
>be possible now.
>

Hello,

Isn't the second test still required if you call pg_dump -Ft without
setting -Z0 explicitly?
(=> AH->compression == Z_DEFAULT_COMPRESSION)

There are still a few suspicious places in pg_backup_tar.c
that refer to compression even though it is not supported (except for
blobs?). (C programming is beyond my capabilities; I can only roughly
read simple code ...)

regards,
Marc Mamin



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

