Re: [HACKERS] Problem with displaying wide tables in psql
Hi.

How can I pass or set the value of a pset variable for a regression test?

The code with the ioctl was copied from the print_aligned_text function, which has a long history:

    43ee2282 (Bruce Momjian 2008-05-16 16:59:05 + 680)  if (ioctl(fileno(stdout), TIOCGWINSZ, &screen_size) != -1)

2014-04-08 20:27 GMT+04:00 Greg Stark st...@mit.edu:
> On Tue, Apr 8, 2014 at 12:19 PM, Andres Freund and...@2ndquadrant.com wrote:
>> I don't think this is easily testable that way - doesn't it rely on
>> determining the width of the terminal? Which you won't have when started
>> from pg_regress?
>
> There's a pset variable to set the target width so at least the formatting
> code can be tested. It would be nice to have the ioctl at least get called
> on the regression farm so we're sure we aren't doing something unportable.
>
> --
> greg

--
Best regards,
Sergey Muraviov
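For what it's worth, the variable Greg mentions can be driven from a regression script; a sketch (the width and query are illustrative, assuming psql's wrapped-format pset options):

    \pset format wrapped
    \pset columns 40
    SELECT repeat('x', 100) AS wide_column;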
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On 04/09/2014 01:18 AM, Andrew Dunstan wrote:
> On 04/08/2014 05:57 PM, Peter Geoghegan wrote:
>> On Tue, Apr 8, 2014 at 2:46 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> Well, let me see if I understand the situation correctly:
>>> * jsonb_ops supports more operators
>>> * jsonb_hash_ops produces smaller, better-performing indexes
>>> * jsonb_ops falls over on inputs with wide field values, but
>>>   jsonb_hash_ops does not
>> There might be some compelling cases for indexing existence rather than
>> containment, since the recheck flag isn't set there, but in general this
>> summary seems sound. I would say that broadly, existence is a less useful
>> operator than containment, and so jsonb_hash_ops is broadly more
>> compelling. I didn't propose changing the default due to concerns about
>> the POLA, but I'm happy to be told that those concerns were out of
>> proportion to the practical benefits of a different default.
> I tend to agree with Tom that POLA will be more violated by the default
> ops class not being able to index some values.

Yeah.

<rant>

Both of the operator classes are actually much less flexible than I'd like. Firstly, they index everything. In many cases, that's not what you want, so you end up with much larger indexes than necessary. Secondly, jsonb_ops indexes all values separately from the keys. That makes the index pretty much useless for a query on, say, WHERE json @> '{"needs_processing": true}', if all the rows also contain a key-value pair "active": true. Thirdly, inequality operators are not supported; you can't search for rows with (the JSON-syntax equivalent of) price > 12.3. Fourthly, sometimes you would want to include the path to an entry in the key, sometimes not.

If I understood correctly the way jsonb_hash_ops works, the limitation compared to jsonb_ops is that it cannot be used for foo ? 'bar' type queries. And the reason for that limitation is that it hashes the whole path to the key; the naked values are not indexed separately. But why not? jsonb_ops does - why is that decision related to whether you hash or not? Or it could index both. Sure, it would be wasteful when you don't need to support foo ? 'bar', but the point is that it should be up to the DBA to decide, based on his needs.

As the code stands, you don't have a choice on any of those things. The decisions have been made by us, PostgreSQL developers. The only choice you have is between jsonb_ops and jsonb_hash_ops, with a strange combination of tradeoffs in both. Sure, they're still useful, if not optimal, for a wide range of applications. For more complicated cases, you will have to resort to expression indexes.

It bugs me greatly that the underlying indexam could do all those things; we're just not exposing the capability. ISTM we need a way to parameterize opclasses, so that when you create the index, you specify the above things.

</rant>

The ship has clearly sailed to add parameterized opclasses to 9.4, but let's keep it in mind when we decide on the defaults.

In the absence of parameterizable opclasses, it would be much more flexible to have opclasses that index keys, values, key-value pairs and paths separately, instead of the current jsonb_ops and jsonb_hash_ops opclasses which index all of those in the same index. That way, if you e.g. only ever query on the existence of a key, you'd only need to index the keys. I don't understand how we ended up with the current dichotomy of jsonb_ops and jsonb_hash_ops...

- Heikki
[HACKERS] Pointer to structure in ECPG
Hi All,

I tried to use a pointer to an array to fetch the results of a query. The test case test_select.pgc is attached. Changes specific to one's environment are needed before that test can be tried. Otherwise, you may try the file pointer_to_struct.pgc in the attached patch, by putting it in the ecpg/test directory.

The file test_select.pgc compiles fine, but when run it produces error -203 and produces garbage, because the result of FETCH was not saved in the array pointed to by the pointer:

    ./test_select
    SQL error: SQL error -203 on line 40
    empno=-163754450, ename=, job=
    empno=634004672, ename=�, job=
    empno=634004624, ename=�, job=

This happens because of two problems:

1. When dumping individual members of structure emp in test_select.pgc, when a pointer to that structure is dumped, its array size is dumped as 1 instead of 0 or a negative number like scalar variable pointers.

2. As discussed in the other thread about using arrays within structures in ECPG, the offsets of members of structures are not dumped correctly for an unbounded array.

Problem 1 reason:
When dumping individual members of structure emp in test_select.pgc, when a pointer to that structure is dumped, the call is routed through the code path shown below:

    #0  ECPGdump_a_type (o=0x6cc4f0, name=0x6d2190 "empno", type=0x6d2130, brace_level=-1, ind_name=0x4c0ae8 "no_indicator", ind_type=0x6c8260, ind_brace_level=-1, prefix=0x6d1a20 "emp2->", ind_prefix=0x0, arr_str_siz=0x6d2070 "0", struct_sizeof=0x6d0c60 "sizeof( struct employee )", ind_struct_sizeof=0x0) at type.c:247
    #1  0x0042f21d in ECPGdump_a_struct (o=0x6cc4f0, name=0x6d20b0 "emp2", ind_name=0x435c20 "no_indicator", arrsiz=0x6d2070 "0", type=0x6d20d0, ind_type=0x6c8260, prefix=0x6d1a20 "emp2->", ind_prefix=0x0) at type.c:572
    #2  0x0042e2cc in ECPGdump_a_type (o=0x6cc4f0, name=0x6d20b0 "emp2", type=0x6d22b0, brace_level=1, ind_name=0x435c20 "no_indicator", ind_type=0x6c8260, ind_brace_level=0, prefix=0x0, ind_prefix=0x0, arr_str_siz=0x6d1e60 "0", struct_sizeof=0x0, ind_struct_sizeof=0x0) at type.c:293
    #3  0x0043335d in dump_variables (list=0x6d2310, mode=1) at variable.c:452

ECPGdump_a_type() at frame #0 then calls ECPGdump_a_simple() as follows:

    #0  ECPGdump_a_simple (o=0x6cc4f0, name=0x6d2190 "empno", type=ECPGt_int, varcharsize=0x6d0f70 "1", arrsize=0x6d2360 "-1", siz=0x6d0c60 "sizeof( struct employee )", prefix=0x6d1a20 "emp2->", counter=0) at type.c:405

ECPGdump_a_simple(), while dumping the member (here empno), dumps it with arrsize = 1 instead of -1, because of the code at lines

    523│         if (atoi(arrsize) < 0)
    524│                 strcpy(arrsize, "1");

In the ECPG run time library, when reading this variable, it is read with array size 1 and falsely interpreted as an array of size 1 instead of an unbounded array (which are read with arrsize < 0), hence giving error -203.

ECPGdump_a_struct has correctly called ECPGdump_a_type with arrsize "0" to mean an unbounded array, and ECPGdump_a_type() has correctly converted it to -1 again to differentiate a pointer to struct from a pointer to a scalar variable. This differentiation is important to dump pointer to struct and pointer to scalar variable differently in ECPGdump_a_simple(). So, in ECPGdump_a_simple() we should dump the variable with arrsize -1 if it's part of an outer structure.

The attached patch includes solutions to both problems and also a testcase preproc/pointer_to_struct.pgc. This test is a copy of the test preproc/array_of_struct.pgc, with the structure arrays replaced by pointers to structures and memory allocations. The patch is based on the development head.
Please consider this to be backpatched to 9.3 as well.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

test_select.pgc
Description: Binary data

diff --git a/src/interfaces/ecpg/preproc/type.c b/src/interfaces/ecpg/preproc/type.c
index 2982cb6..dbe2d7e 100644
--- a/src/interfaces/ecpg/preproc/type.c
+++ b/src/interfaces/ecpg/preproc/type.c
@@ -512,25 +512,33 @@ ECPGdump_a_simple(FILE *o, const char *name, enum ECPGttype type,
 			if (((atoi(arrsize) < 0) ||
 				 (atoi(arrsize) == 0 && strcmp(arrsize, "0") != 0)) &&
 				siz == NULL)
 				sprintf(variable, "(%s%s)", prefix ? prefix : "", name);
 			else
 				sprintf(variable, "&(%s%s)", prefix ? prefix : "", name);
 			sprintf(offset, "sizeof(%s)", ecpg_type_name(type));
 			break;
 	}
-
-	if (atoi(arrsize) < 0)
+
+	/*
+	 * Array size would be -1 for addresses of members within structure,
+	 * when pointer to structure is being dumped.
+	 */
+	if (atoi(arrsize) < 0 && !siz)
 		strcpy(arrsize, "1");
 
-	if (siz == NULL || strlen(siz) == 0 || strcmp(arrsize, "0") == 0 || strcmp(arrsize, "1") == 0)
+	/*
+	 * If siz, i.e. the size of the structure of which this variable is
+	 * part, is set, that gives the offset to the next element, if required
+	 */
+	if (siz == NULL || strlen(siz) == 0)
 		fprintf(o, "\n\t%s,%s,(long)%s,(long)%s,%s, ",
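For context, the usage pattern the patch aims to support looks roughly like this - a sketch only, with illustrative struct, cursor and table names, not the attached test case itself:

    EXEC SQL BEGIN DECLARE SECTION;
    struct employee
    {
        int     empno;
        char    ename[12];
        char    job[12];
    };
    /* unbounded: the preprocessor must dump this with a negative arrsize */
    struct employee *emp = NULL;
    EXEC SQL END DECLARE SECTION;

    /* allocate room for the expected rows, then fetch through the pointer */
    emp = malloc(14 * sizeof(struct employee));
    EXEC SQL DECLARE cur CURSOR FOR SELECT empno, ename, job FROM emptbl;
    EXEC SQL OPEN cur;
    EXEC SQL FETCH 14 FROM cur INTO :emp;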
Re: [HACKERS] Autonomous Transaction (WIP)
On Wed, Apr 9, 2014 at 11:03 AM, Rajeev rastogi rajeev.rast...@huawei.com wrote:
> Though autonomous transaction uses mixed approach of sub-transaction as
> well as main transaction, transaction state of autonomous transaction is
> handled independently.

Whenever I was asked to have a look at implementing this feature, I always wondered about the great amount of global state that a backend maintains which is normally tied to a single top transaction. Since an AT will have the same characteristics as a top-level transaction, I wonder how you plan to separate those global state variables. Sure, we can group them in a structure and put them on a stack when an AT starts and pop them off when the original top transaction becomes active again, but finding all such global state variables is going to be tricky.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Autonomous Transaction (WIP)
On 04/09/2014 08:44 AM, Pavan Deolasee wrote:
> Whenever I was asked to have a look at implementing this feature, I always
> wondered about the great amount of global state that a backend maintains
> which is normally tied to a single top transaction. [...] finding all such
> global state variables is going to be tricky.

I would hope most of this to be solved by having one (read-only) virtual transaction and then juggling the ATs in a way similar to the current subtransaction machinery. The main differences would be that:

A) the top-level transaction stays virtual, and
B) ATs are committed independently.

This would be greatly simplified if we can accept the restriction that there is only a single snapshot per backend (not per transaction). To me this seems a completely sensible restriction.

Re syntax, I think we need a way to name the transactions, so we can have a way to switch between multiple parallel active autonomous transactions:

    BEGIN TRANSACTION myfirsttransaction;
        -- do something in myfirsttransaction
    BEGIN TRANSACTION anothertransaction;
        -- do something in anothertransaction
    SET TRANSACTION myfirsttransaction;
        -- more work in myfirsttransaction
    ROLLBACK anothertransaction;
    COMMIT; -- or COMMIT myfirsttransaction;

Cheers,
Hannu
Re: [HACKERS] Autonomous Transaction (WIP)
On 04/09/2014 02:44 PM, Pavan Deolasee wrote:
> Sure, we can group them in a structure and put them on a stack when an AT
> starts and pop them off when the original top transaction becomes active
> again, but finding all such global state variables is going to be tricky.

... not to mention the fact that extensions may rely on having their own global state.

--
Craig Ringer  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Autonomous Transaction (WIP)
On 04/09/2014 09:55 AM, Hannu Krosing wrote:
> This would be greatly simplified if we can accept the restriction that
> there is only a single snapshot per backend (not per transaction). To me
> this seems a completely sensible restriction.

Huh? In read committed mode, every query within a transaction gets a different snapshot.

- Heikki
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On Tue, Apr 8, 2014 at 11:37 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> As the code stands, you don't have a choice on any of those things. The
> decisions have been made by us, PostgreSQL developers. The only choice you
> have is between jsonb_ops and jsonb_hash_ops, with a strange combination
> of tradeoffs in both. Sure, they're still useful, if not optimal, for a
> wide range of applications. For more complicated cases, you will have to
> resort to expression indexes. It bugs me greatly that the underlying
> indexam could do all those things; we're just not exposing the capability.

Why would you ever not have to use expression indexes? Idiomatic usage of jsonb involves expression indexes, because it's desirable to index only an expression. People will want to do things like only index the nested "tags" array far more frequently than they'll want to only index keys (that is, object pair keys) in the entire document. I don't get why you'd say that they'd "resort" to expression indexes, like they're a kludge.

Have you ever tried out one of the new document databases? I suggest you do. Expression indexes on jsonb map pretty closely onto how you're frequently expected to index data in those systems. That's something that they make heavy use of. Why would you ever not really have to consider ahead of time what is important enough to be indexed, and what is not?

> ISTM we need a way to parameterize opclasses, so that when you create the
> index, you specify the above things.

That would be nice.

> In the absence of parameterizable opclasses, it would be much more
> flexible to have opclasses that index keys, values, key-value pairs and
> paths separately, instead of the current jsonb_ops and jsonb_hash_ops
> opclasses which index all of those in the same index. That way, if you
> e.g. only ever query on the existence of a key, you'd only need to index
> the keys.

I think only ever needing to index the keys is not a common use-case. It pretty much makes exactly as much sense to do so as it would with hstore, and yet hstore doesn't support that after all these years.

> I don't understand how we ended up with the current dichotomy of
> jsonb_ops and jsonb_hash_ops...

It makes sense if you consider jsonb_ops best suited to simpler hstore-style indexing, while jsonb_hash_ops is best suited to testing containment of JSON documents, potentially with lots of nesting. These documents are typically homogeneous in structure. Idiomatic usage of systems like MongoDB involves collections of fairly homogeneous documents. If there is a lot of variability in their structure within a collection, the collection more or less becomes impossible to usefully query. They aim to be flexible, but still implicitly require you to insert data with a half-way sensible/consistent structure. This makes separately indexing the keys less than compelling as a default, because there is so much duplication of keys in practice.

--
Peter Geoghegan
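An expression index of the sort described above might look like this - a sketch with hypothetical table and column names; note that only queries phrased against the indexed expression can use it:

    -- index only the nested "tags" array, not the whole document
    CREATE INDEX docs_tags_idx ON docs USING gin ((doc -> 'tags'));

    -- a containment query against the same expression can use the index
    SELECT * FROM docs WHERE doc -> 'tags' @> '["postgresql"]';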
[HACKERS] libpq api wart: no PQconnect variant that can consume output of PQconninfoParse(...)
Hi all

While working on something else, I just noticed that there's no PQconnect variant that can consume the output from PQconninfoParse(...), i.e. an array of PQconninfoOption *.

This would be a nice-to-have for times when you want to parse a connstr, modify it, and then connect with the modified options. There's PQconnectdbParams, but you have to transform the option arrays to use them with it.

This might be a worthwhile TODO (beginner) option - and may simplify some of fe-connect.c in the process.

--
Craig Ringer  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
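To illustrate the boilerplate such a variant would remove, here is a sketch of what callers currently have to do (error handling largely elided; the application_name override is just an example):

    #include <stdlib.h>
    #include <string.h>
    #include <libpq-fe.h>

    static PGconn *
    connect_with_override(const char *connstr)
    {
        char       *errmsg = NULL;
        PQconninfoOption *opts = PQconninfoParse(connstr, &errmsg);
        const char **keywords;
        const char **values;
        int          i,
                     n;
        PGconn      *conn;

        if (opts == NULL)
        {
            /* errmsg (allocated by libpq) has the details */
            if (errmsg)
                PQfreemem(errmsg);
            return NULL;
        }

        /* count options so we can size the parallel arrays */
        for (n = 0; opts[n].keyword != NULL; n++)
            ;
        keywords = malloc((n + 1) * sizeof(char *));
        values = malloc((n + 1) * sizeof(char *));

        for (i = 0, n = 0; opts[i].keyword != NULL; i++)
        {
            /* example modification: override application_name */
            if (strcmp(opts[i].keyword, "application_name") == 0)
            {
                free(opts[i].val);
                opts[i].val = strdup("my_tool");
            }
            if (opts[i].val == NULL)
                continue;       /* skip options that weren't set */
            keywords[n] = opts[i].keyword;
            values[n] = opts[i].val;
            n++;
        }
        keywords[n] = values[n] = NULL;

        conn = PQconnectdbParams(keywords, values, 0);

        free(keywords);
        free(values);
        PQconninfoFree(opts);
        return conn;
    }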
Re: [HACKERS] Proposal for Merge Join for Non '=' Operators
2014-04-09 Dilip kumar dilip.ku...@huawei.com:
> I would like to propose a new merge join algorithm for optimizing
> non-'=' operators ('<', '<=', '>', '>=').

Do you have a real-world example use case of such joins, to offset the extra planner time that will likely have to be paid (even for queries for which the functionality ends up not being used)? I guess there might be queries that join on "values that are not too far apart" or something, but as those cases (there typically not being a lot of "inner" rows that join with each "outer" row) are probably executed efficiently using a nested loop + index scan, I don't see the point (yet). Are you aiming for the case where the inner relation is difficult to compute and cannot be accessed using an index scan?

> selecting this new cost calculation can be implemented in planner

Hmm. Of course, the difficult part will be adding support for this in the planner, but you don't seem to have any plans for implementing that?

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?
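A hypothetical example of the kind of "values that are not too far apart" join in question (table and column names are made up):

    -- match each sensor reading to events within a five-minute band
    SELECT r.id, e.id
    FROM   readings r
    JOIN   events   e ON e.ts > r.ts - interval '5 minutes'
                     AND e.ts < r.ts + interval '5 minutes';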
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 8 April 2014 21:48, Florian Pflug f...@phlo.org wrote:
> On Apr 7, 2014, at 17:41, Dean Rasheed dean.a.rash...@gmail.com wrote:
>> I've just finished reading through all the other patches, and they all
>> look OK to me. It's mostly straightforward stuff, so despite the size
>> it's hopefully all committable once the base patch goes in.
>
> Hm, I'm starting to have second thoughts about the minmax patch. The
> inverse transition functions for MIN and MAX have a non-trivial
> probability of failure - they trigger a rescan whenever the value that is
> removed isn't strictly smaller (respectively strictly larger) than the
> current maximum (respectively minimum). Thus, whenever that happens, we
> both call the inverse transition function *and* (since it fails) restart
> the aggregation.
>
> For windows based on ROWS, this isn't too bad - even if we fail every
> second time, we still avoid half the rescans, which should be a net win
> if the average window size is > 2. But for RANGE-based windows, more than
> one call of the inverse transition function must succeed for us to save
> anything, since we must successfully remove *all* peers to avoid one
> restart. This greatly increases the chance that using the inverse
> transition function hurts rather than helps - the situation is worse the
> larger the average number of peers is.

Argh, I hadn't really considered that case. I suppose any imperfect inverse transition function has the potential to make performance worse rather than better. But working out the likelihood of that isn't necessarily straightforward.

It might be possible to include some sort of heuristic based on the known information --- the number of rows P in the peer group about to be removed vs. the total number N of rows aggregated so far. If the data were fairly random, then a quick back-of-the-envelope calculation suggests that trying the inverse min/max functions would be worthwhile on average if P were less than around 0.4N, but of course different data distributions could make that much worse. Even a perfect inverse transition function isn't going to be much use if P > N/2 (e.g., imagine a case where the peer groups decrease in size exponentially), so perhaps we should be including such a check anyway.

That's also assuming that the cost of the inverse transition function is about the same as the cost of the forward function, which is not necessarily the case. Perhaps imperfect inverse transition functions should be assigned a higher cost, and that should be factored into the decision as to whether they are likely to be worth trying.

All that feels very speculative though, and I think it's too late to be considering that for 9.4, so yes, let's leave out the min/max aggregates for now.

> I've factored the BOOL_AND, BOOL_OR stuff out into a separate patch
> invtrans_bool - it previously was part of invtrans_minmax. Given the
> performance risk involved, I think that we probably shouldn't commit
> invtrans_minmax at this time. I should have brought this up earlier, but
> the issue had slipped my mind :-( Sorry for that.

I think that you're probably right that optimising window_gettupleslot() to eliminate the O(n^2) behaviour can be left to a later patch --- the existing performance benefits of this patch are enough to justify its inclusion IMO.

>> It would be nice to include the trivial optimisation to
>> window_gettupleslot() that I posted upthread, since it gives such a big
>> improvement for only a few lines of code changed.
>
> Agreed, but since it's independent from the rest of the base patch, I
> kept it as a separate patch (invtrans_optimize) instead of merging it
> into the base patch. It should probably be committed separately too.

It would be good to commit at least the base, arith and optimize patches for 9.4. I think the collecting and bool patches are also committable, but I also suspect that those aggregates are less well used, so they could be considered lower priority.

>> If you post a new patch set, I'll mark it as ready for committer
>> attention.
>
> New patch set is attached. The only difference to the previous one is
> that Forward Transitions and Inverse Transitions are now scaled with
> nloops, and that it includes your window_gettupleslot patch under the
> name invtrans_optimize.

OK, I'm marking this ready for committer attention, on the understanding that that doesn't include the invtrans_minmax patch.

Regards,
Dean
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On 04/09/2014 10:40 AM, Peter Geoghegan wrote:
> Why would you ever not have to use expression indexes? Idiomatic usage of
> jsonb involves expression indexes, because it's desirable to index only an
> expression. [...] I don't get why you'd say that they'd "resort" to
> expression indexes, like they're a kludge.

Expression indexes are definitely nice, but you have to be careful to formulate the query in exactly the same way to match the index.

> Have you ever tried out one of the new document databases? I suggest you
> do. [...] Why would you ever not really have to consider ahead of time
> what is important enough to be indexed, and what is not?

I didn't say that. On the contrary, I think the shotgun approach jsonb_ops and jsonb_hash_ops take is too broad. It should be possible to specify what to index in a more detailed fashion.

- Heikki
Re: [HACKERS] psql \d+ and oid display
On Wed, Apr 9, 2014 at 01:02:05AM -0400, Tom Lane wrote:
> Robert Haas robertmh...@gmail.com writes:
>> Well, that's sorta my concern. I mean, right now we've got people saying
>> "what the heck is a replica identity?". But, if the logical decoding
>> stuff becomes popular, as I hope it will, that's going to be an important
>> thing for people to adjust, and the information needs to be present in a
>> clear and easily-understood way. I haven't studied the current code in
>> detail so maybe it's fine. I just want to make sure we're not giving it
>> second-class treatment solely on the basis that it's new and people
>> aren't using it yet.
>
> I think the proposal is "don't mention the property if it has the default
> value". That's not second-class status, as long as people who know what
> the property is understand that behavior. It's just conserving screen
> space.

Yes, the lines will certainly appear if you have set _anything_ as non-default, both oids and replica identity. Kind of the same as how we show indexes and child tables if any exist (it is non-default), and unlogged tables.

I know I have fielded questions during training asking, "What is that OID line?", so I do know it confuses people. It does give me a chance to talk about it, but based on how often it is useful, I am not sure that is a win.

Ideally we would deal with oids for 9.5, but since I think everyone agrees that replica identity and oids should behave the same, we need to decide this for 9.4.

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com
+ Everyone has their own god. +
Re: [HACKERS] Firing trigger if only
Thank you, this did the job.

All the best
Re: [HACKERS] ipc_test
On Tue, Apr 8, 2014 at 02:08:25PM -0400, Greg Stark wrote:
> On Mon, Apr 7, 2014 at 10:43 AM, Robert Haas robertmh...@gmail.com wrote:
>> OK, done. One less thing to worry about when committing!
>
> Also one less thing to cause headaches with etags and similar tools. It
> always drove me nuts that I was constantly being sent to ipc_test files
> for various typedefs. Thanks!

Oh, yeah, that was a _big_ pain. Thanks!

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com
+ Everyone has their own god. +
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On Wed, Apr 9, 2014 at 1:21 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> I didn't say that. On the contrary, I think the shotgun approach jsonb_ops
> and jsonb_hash_ops take is too broad. It should be possible to specify
> what to index in a more detailed fashion.

It is - use an expression index. That's by far the most important way to specify what to index in a more detailed fashion. There are others, but that's the major one. Beyond that, yes, it's necessary to carefully write your query predicate a certain way. However, a similar situation exists in MongoDB, where there is a distinction between indexes on embedded fields (which must be accessed using special dot notation) and indexes on subdocuments (which cannot be accessed using dot notation). It's late here, but I'm pretty sure that's a feature and not a limitation.

--
Peter Geoghegan
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
On 21 March 2014 14:22, Andres Freund and...@2ndquadrant.com wrote:
> That seems to work fairly well. On the few tests I could run on my laptop
> - I've done this during a flight - it's a small performance win in all
> cases I could test. While saving a fair amount of memory.

We've got to the stage now that saving this much memory is essential, so this patch is a must-have.

The patch does all I would expect and no more, so approach and details look good to me.

Performance? Discussed many years ago, but I suspect the micro-tuning of those earlier patches wasn't as good as it is here.

--
Simon Riggs  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] ALTER TABLE set reloptions
As part of the ALTER TABLE lock reductions, we've now agreed that reloptions should have a lock level associated with them, so we can take appropriate lock levels.

The attached patch will be applied at the start of the 9.5 dev cycle, so that any new relopt authors are aware that lock levels are needed for any new work. A later patch will then use this infrastructure to attempt to reduce lock levels, assuming no problems.

Added to the next CF, for June 2014; not for commit yet.

--
Simon Riggs  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

relopt.patch
Description: Binary data
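For context, the reloptions in question are those set with statements like the following, which today take an AccessExclusiveLock regardless of which option is being changed - a trivial illustration with a hypothetical table:

    ALTER TABLE mytable SET (fillfactor = 70);
    ALTER TABLE mytable RESET (fillfactor);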
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
On 2014-04-09 05:34:42 -0400, Simon Riggs wrote:
> We've got to the stage now that saving this much memory is essential, so
> this patch is a must-have.

I think some patch like this is necessary - I am not 100% sure mine is the one true approach here, but it certainly seems simple enough.

> Performance? Discussed many years ago, but I suspect the micro-tuning of
> those earlier patches wasn't as good as it is here.

It's a small win on small machines (my laptop, 16GB), so we need to retest with 128GB shared_buffers or such on bigger ones. There PrivateRefCount previously was the source of a large portion of the cache misses...

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Pointer to structure in ECPG
Thanks Ashutosh, both patches committed and backported to the whole 9.* series.

Michael

--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at gmail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
Re: [HACKERS] Proposal for Merge Join for Non '=' Operators
On 09 April 2014 13:31, Nicolas Barbier wrote:
> Do you have a real-world example use case of such joins, to offset the
> extra planner time that will likely have to be paid (even for queries for
> which the functionality ends up not being used)? I guess there might be
> queries that join on "values that are not too far apart" or something,
> but as those cases (there typically not being a lot of "inner" rows that
> join with each "outer" row) are probably executed efficiently using a
> nested loop + index scan, I don't see the point (yet). Are you aiming for
> the case where the inner relation is difficult to compute and cannot be
> accessed using an index scan?

I think this will be more useful when both relations are big and of almost equal size. Here we can compare the cost of the existing methods and the new approach. For such a case the planner can currently select either

(1) NLJ [Outer (seq scan) JOIN Inner (Materialize)], or
(2) NLJ [Outer (seq scan) JOIN Inner (Index)], as you mentioned.

In approach (1), the cost will be:

    NLJ cost: outer tuples * inner tuples
    I/O cost: seq page cost * (inner rel pages + outer rel pages)

And the relation can be too big for materialization.

In approach (2), the cost will be:

    NLJ cost: outer tuples * inner tuples in path
              (inner tuples in path = join selectivity * number of inner tuples)
    I/O cost: outer tuples * index rescan cost
              (the index rescan cost depends on how many pages each scan fetches)

This will be costly because the I/O cost increases with the repeated index rescans (one for each outer tuple).

In the new approach (3), the cost will be (since here, for each outer tuple, we need not scan the complete inner relation):

    MJ cost:  outer tuples * inner tuples in path
              (every outer tuple needs to scan some tuples before reaching
              the qualifying tuples)
    I/O cost: index scan cost (only one scan)

So the best-case cost will be:

    MJ cost:  outer tuples * inner tuples in path
    I/O cost: index scan cost

And the worst-case cost will be:

    MJ cost:  outer tuples * inner tuples
    I/O cost: index scan cost

So for many cases approach (3) can be cheaper, and that can be detected by the planner's cost calculation.

>> selecting this new cost calculation can be implemented in planner
>
> Hmm. Of course, the difficult part will be adding support for this in the
> planner, but you don't seem to have any plans for implementing that?

Yes, I have a plan to implement this part, but it's not completed yet.

Thanks & Regards,
Dilip
Re: [HACKERS] Pointer to structure in ECPG
Thanks a lot Michael.

On Wed, Apr 9, 2014 at 3:59 PM, Michael Meskes mes...@postgresql.org wrote:
> Thanks Ashutosh, both patches committed and backported to the whole 9.*
> series.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] Dynamic Shared Memory stuff
On Tue, Apr 8, 2014 at 9:15 PM, Robert Haas robertmh...@gmail.com wrote:
> Apparently not. However, I'm fairly sure this is a step toward addressing
> the complaints previously raised, even if there may be some details
> people still want changed, so I've gone ahead and committed it.

A few observations:

1. One new warning has been introduced in the code:

    1>src\backend\port\win32_shmem.c(295): warning C4013: 'dsm_set_control_handle' undefined; assuming extern returning int

The attached patch fixes this warning.

2. On running test_shm_mq manually, the client side didn't get any problem (the results are as expected). However, I am seeing the below log on the server:

    LOG:  registering background worker "test_shm_mq"
    LOG:  starting background worker process "test_shm_mq"
    LOG:  unrecognized win32 error code: 127
    LOG:  worker process: test_shm_mq (PID 2528) exited with exit code 1
    LOG:  unregistering background worker "test_shm_mq"

I think the below message in the log is not expected:

    LOG:  unrecognized win32 error code: 127

This has started appearing after your commit related to DSM. I have debugged this issue and found that it comes from the below part of the code in dsm_impl_windows():

    else
    {
        hmap = OpenFileMapping(FILE_MAP_WRITE | FILE_MAP_READ,
                               FALSE,   /* do not inherit the name */
                               name);   /* name of mapping object */
        _dosmaperr(GetLastError());
    }

Here, even though the handle returned by OpenFileMapping() is a valid handle, GetLastError() still returns 127 ("The specified procedure could not be found."). Now the specs [1] of the API say that if the handle is non-NULL, consider it success, so I am not sure whether we should bother about this error or not. I have tried many ways (trying different parameters, searching on the net) to change this and related APIs, but the same problem exists. One strange thing is that if I just call the API twice (I know this is not right, but just to experiment for finding some info), the error doesn't occur the second time.

The only difference in the latest changes is that now OpenFileMapping() is getting called by the child process rather than a peer process (w.r.t. the process that called CreateFileMapping()); I don't know why this should make such a difference. In the examples I have seen on the net for such usage, they call GetLastError() only if the handle is invalid; we have called it in the above fashion just to keep the code a little bit simple. I am just not sure whether it is okay to rearrange the code and call GetLastError() only if the returned handle is invalid (NULL), or to try to look for more info.

One question:

1. I have seen that initdb still creates pg_dynshmem; is it required after your latest changes?

Let me know your opinion.

[1] http://msdn.microsoft.com/en-us/library/windows/desktop/aa366791(v=vs.85).aspx

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

fix_dsm_win_warning.patch
Description: Binary data
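The rearrangement contemplated above would amount to something like this - a sketch only, not committed code:

    /*
     * Consult GetLastError() only when OpenFileMapping() actually failed;
     * per the API contract, a non-NULL handle means success.
     */
    hmap = OpenFileMapping(FILE_MAP_WRITE | FILE_MAP_READ,
                           FALSE,   /* do not inherit the name */
                           name);   /* name of mapping object */
    if (!hmap)
        _dosmaperr(GetLastError());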
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
On Wed, Apr 9, 2014 at 5:34 AM, Simon Riggs si...@2ndquadrant.com wrote:
> We've got to the stage now that saving this much memory is essential, so
> this patch is a must-have. The patch does all I would expect and no more,
> so approach and details look good to me. Performance? Discussed many
> years ago, but I suspect the micro-tuning of those earlier patches wasn't
> as good as it is here.

I think this approach is practically a slam-dunk when the number of pins is small (as it typically is). I'm less clear what happens when we overflow from the small array into the hashtable. That certainly seems like it could be a loss, but how do we construct such a case to test it? A session with lots of suspended queries? Can we generate a regression by starting a few suspended queries to use up the array elements, and then running a scan that pins and unpins many buffers?

One idea is: if we fill up all the array elements and still need another one, evict all the elements to the hash table and then start refilling the array. The advantage of that over what's done here is that the active scan will always be using an array slot rather than repeated hash table manipulations. I guess you'd still have to probe the hash table repeatedly, but you'd avoid entering and removing items frequently.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
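For readers following along, the small-array-plus-overflow scheme under discussion looks roughly like this - a simplified sketch of the idea (assuming the backend's HTAB/hash_search API), not the patch itself:

    #define REFCOUNT_ARRAY_ENTRIES 8    /* size of the fast-path array */

    typedef struct PrivateRefCountEntry
    {
        Buffer      buffer;
        int32       refcount;
    } PrivateRefCountEntry;

    /* hot entries live here; the rest overflow into a local hash table */
    static PrivateRefCountEntry PrivateRefCountArray[REFCOUNT_ARRAY_ENTRIES];
    static HTAB *PrivateRefCountHash = NULL;    /* created lazily */

    static int32
    GetPrivateRefCount(Buffer buffer)
    {
        int         i;
        PrivateRefCountEntry *res;

        /* fast path: linear scan of the small array */
        for (i = 0; i < REFCOUNT_ARRAY_ENTRIES; i++)
        {
            if (PrivateRefCountArray[i].buffer == buffer)
                return PrivateRefCountArray[i].refcount;
        }

        /* slow path: look in the overflow hash table, if it exists */
        if (PrivateRefCountHash != NULL)
        {
            res = hash_search(PrivateRefCountHash, &buffer, HASH_FIND, NULL);
            if (res != NULL)
                return res->refcount;
        }
        return 0;               /* buffer not pinned by this backend */
    }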
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
On 2014-04-09 08:22:15 -0400, Robert Haas wrote:
> I think this approach is practically a slam-dunk when the number of pins
> is small (as it typically is). I'm less clear what happens when we
> overflow from the small array into the hashtable. That certainly seems
> like it could be a loss, but how do we construct such a case to test it?
> A session with lots of suspended queries? Can we generate a regression by
> starting a few suspended queries to use up the array elements, and then
> running a scan that pins and unpins many buffers?

I've tried to reproduce problems around this (when I wrote this), but it's really hard to construct cases that need more than 8 pins. I've tested performance for those cases by simply not using the array, and while the performance suffers a bit, it's not that bad.

> One idea is: if we fill up all the array elements and still need another
> one, evict all the elements to the hash table and then start refilling
> the array. [...]

We could do that, but my gut feeling is that it's not necessary. There'd have to be some heuristic to avoid doing that all the time, otherwise we'd probably regress. I think the fact that we pin/unpin very frequently will put frequently used pins in the array most of the time.

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
On Wed, Apr 9, 2014 at 6:02 PM, Andres Freund and...@2ndquadrant.com wrote:
> I've tried to reproduce problems around this (when I wrote this), but
> it's really hard to construct cases that need more than 8 pins. I've
> tested performance for those cases by simply not using the array, and
> while the performance suffers a bit, it's not that bad.

AFAIR this was suggested before and got rejected because constructing that worst case and proving that the approach does not perform too badly was a challenge. Having said that, I agree it's time to avoid that memory allocation, especially with a large number of backends running with large shared buffers.

An orthogonal issue I noted is that we never check for overflow in the ref count itself. While I understand that overflowing an int32 counter will take a large number of pins on the same buffer, it can still happen in the worst case, no? Or is there a theoretical limit on the number of pins on the same buffer by a single backend?

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
On 2014-04-09 18:13:29 +0530, Pavan Deolasee wrote:
> AFAIR this was suggested before and got rejected because constructing
> that worst case and proving that the approach does not perform too badly
> was a challenge.

Well, I've tested the worst case by making *all* pins go through the hash table. And it didn't regress too badly, although it *was* visible in the profile. I've searched the archive, and to my knowledge nobody has actually sent a patch implementing this sort of scheme for pins, although there's been talk about various ways to solve this.

> An orthogonal issue I noted is that we never check for overflow in the
> ref count itself. [...] Or is there a theoretical limit on the number of
> pins on the same buffer by a single backend?

I think we'll die much earlier, because the resource owner array keeping track of buffer pins will be larger than 1GB.

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
On Wed, Apr 9, 2014 at 8:32 AM, Andres Freund and...@2ndquadrant.com wrote:
> I've tried to reproduce problems around this (when I wrote this), but
> it's really hard to construct cases that need more than 8 pins. I've
> tested performance for those cases by simply not using the array, and
> while the performance suffers a bit, it's not that bad.

Suspended queries won't do it? Also, it would be good to quantify "not that bad". Actually, this thread is completely lacking any actual benchmark results...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] psql \d+ and oid display
On Wed, Apr 9, 2014 at 1:02 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> I think the proposal is "don't mention the property if it has the default
> value". That's not second-class status, as long as people who know what
> the property is understand that behavior. It's just conserving screen
> space.

One thing that concerns me is that replica identity has a different default for system tables (NOTHING) than for other tables (DEFAULT). So when we say we're not going to display the default value, are we going to display it when it's not NOTHING, when it's not DEFAULT, or when it's not the actual default for that particular kind of table?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
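For concreteness, the property under discussion is set per table, and under the proposal \d+ would only show it when non-default - an illustration with a hypothetical table:

    ALTER TABLE mytable REPLICA IDENTITY FULL;    -- non-default: displayed
    ALTER TABLE mytable REPLICA IDENTITY DEFAULT; -- default: line suppressed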
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
On 2014-04-09 09:17:59 -0400, Robert Haas wrote:
> Suspended queries won't do it?

What exactly do you mean by suspended queries? Defined and started portals? Recursive query execution?

> Also, it would be good to quantify "not that bad".

The "not bad" comes from my memory of the benchmarks I'd done after about 12h of flying around ;). Yes, it needs real benchmarks. Probably won't get to it in the next few days, though.

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Proposal for Merge Join for Non '=' Operators
Dilip kumar dilip.ku...@huawei.com writes:
> On 09 April 2014 13:31, Nicolas Barbier wrote:
>> Do you have a real-world example use case of such joins, to offset the
>> extra planner time that will likely have to be paid (even for queries
>> for which the functionality ends up not being used)?
> I think this will be more useful when both the relations are big and are
> of almost equal size.

I'm not following how this would help much. Consider "a.x < b.y" where we know that the inputs are sorted by x/y. I think you are saying that for a given b row, we could scan forward from the beginning of a, but stop as soon as we find a row with a.x >= b.y. Great ... but instead of comparing M*N rows, we are comparing M*N/2 rows. So it's still O(N^2), though with a slightly smaller constant. Is this going to be worth the added time to sort the inputs?

It's slightly more promising for range constraints (that is, "a.x between b.y and b.z"), but even then you have to assume that the ranges are narrow, at which point a nestloop join using an inner indexscan on a.x might do about as well. So personally, I suspect there's a reason why this isn't a standard join algorithm already.

If you want to try it, I'd suggest that you focus on getting to where you can do some performance testing ASAP, without doing much code polishing or worrying about teaching the planner how to cost it.

			regards, tom lane
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
On Wed, Apr 9, 2014 at 9:38 AM, Andres Freund and...@2ndquadrant.com wrote:
> What exactly do you mean by suspended queries? Defined and started
> portals? Recursive query execution?

Open a cursor and fetch from it; leave it open while doing other things.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Andres Freund and...@2ndquadrant.com writes:
> On 2014-04-09 18:13:29 +0530, Pavan Deolasee wrote:
>> An orthogonal issue I noted is that we never check for overflow in the
>> ref count itself. While I understand overflowing an int32 counter will
>> take a large number of pins on the same buffer, it can still happen in
>> the worst case, no? Or is there a theoretical limit on the number of
>> pins on the same buffer by a single backend?
> I think we'll die much earlier, because the resource owner array keeping
> track of buffer pins will be larger than 1GB.

The number of pins is bounded, more or less, by the number of scan nodes in your query plan. You'll have run out of memory trying to plan the query, assuming you live that long.

The resource managers are interesting to bring up in this context. That mechanism didn't exist when PrivateRefCount was invented. Is there a way we could lay off the work onto the resource managers? (I don't see one right at the moment, but I'm under-caffeinated still.)

			regards, tom lane
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
On 2014-04-09 10:09:44 -0400, Tom Lane wrote:
> The resource managers are interesting to bring up in this context. That
> mechanism didn't exist when PrivateRefCount was invented. Is there a way
> we could lay off the work onto the resource managers? (I don't see one
> right at the moment, but I'm under-caffeinated still.)

Yea, that's something I've also considered, but I couldn't come up with a performant and sensibly complicated way to do it. There are some nasty issues with pins held by different ResourceOwners and such, so even if we could provide sensible random access to check for existing pins, it wouldn't be a simple thing.

It's not unreasonable to argue that we just shouldn't optimize for several pins held by the same backend for the same buffer, and always touch the global count. Thanks to resource managers, the old reason for PrivateRefCount, which was the need to be able to clean up remaining pins in case of error, doesn't exist anymore.

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
Andres Freund and...@2ndquadrant.com writes:
> It's not unreasonable to argue that we just shouldn't optimize for
> several pins held by the same backend for the same buffer, and always
> touch the global count.

NAK. That would be a killer because of increased contention for buffer headers. The code is full of places where a buffer's PrivateRefCount jumps up and down a bit, for example when transferring a tuple into a TupleTableSlot. (I said upthread that the number of pins is bounded by the number of scan nodes, but actually it's probably some small multiple of that --- e.g. a seqscan would hold its own pin on the current buffer, and there'd be a slot or two holding the current tuple, each with its own pin count.)

			regards, tom lane
Re: [HACKERS] [RFC, POC] Don't require a NBuffer sized PrivateRefCount array of local buffer pins
On 2014-04-09 10:26:25 -0400, Tom Lane wrote:
> NAK.

Note I didn't implement it because I wasn't too convinced either ;)

> That would be a killer because of increased contention for buffer
> headers. The code is full of places where a buffer's PrivateRefCount
> jumps up and down a bit, for example when transferring a tuple into a
> TupleTableSlot.

On the other hand, in those scenarios the backend is pretty likely to already have the cacheline locally in exclusive mode...

Greetings,

Andres Freund

--
Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] New option in pg_basebackup to exclude pg_log files during base backup
On Wed, Apr 9, 2014 at 2:06 AM, Prabakaran, Vaishnavi vaishna...@fast.au.fujitsu.com wrote: Hi all, Following the discussion in message id cahgqgwffmor4ecugwhzpaapyqbsekdg66vmj1rvej6z-ega...@mail.gmail.com, I have developed a patch which gives the user an option to exclude pg_log directory contents in pg_basebackup. [Current situation] During pg_basebackup, all files in the pg_log directory are copied to the new backup directory. [Design] - Added a new optional -S/--skip-log-dir flag to pg_basebackup. - If skip-log-dir is specified in the pg_basebackup command, basebackup excludes copying log files from the standard pg_log directory and any other directory specified in the log_directory GUC variable. (An empty pg_log/$log_directory folder will still be created.) - In case pg_log/$log_directory is a symbolic link, an empty folder will be created. [Advantage] It gives the user an option to avoid copying large log files if they don't wish to, and hence save disk space. While pg_log is definitely the most common one being the default on many platforms, we'll still be missing other ones. Should we really hardcode it, or should we somehow derive it from the settings for log_directory instead? As a more general discussion, is this something we might want to expose as a more general facility rather than hardcode it to the log directory only? And is it perhaps something we'd rather have configured at the server than specified in pg_basebackup - like a GUC saying which directories should always be excluded from a basebackup? So you don't have to remember it every time? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] New option in pg_basebackup to exclude pg_log files during base backup
Magnus Hagander wrote: While pg_log is definitely the most common one being the default on many platforms, we'll still be missing other ones. Should we really hardcode it, or should we somehow derive it from the settings for log_directory instead? As a more general discussion, is this something we might want to expose as a more general facility rather than hardcode it to the log directory only? And is it perhaps something we'd rather have configured at the server than specified in pg_basebackup - like a GUC saying which directories should always be excluded from a basebackup? So you don't have to remember it every time? So it'd be an array, and by default you'd have something like: basebackup_skip_path = $log_directory ? Maybe use it to skip backup labels by default as well. basebackup_skip_path = $log_directory, $backup_label_files -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] New option in pg_basebackup to exclude pg_log files during base backup
On Wed, Apr 9, 2014 at 4:55 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Magnus Hagander wrote: While pg_log is definitely the most common one being the default on many platforms, we'll still be missing other ones. Should we really hardcode it, or should we somehow derive it from the settings for log_directory instead? As a more general discussion, is this something we might want to expose as a more general facility rather than hardcode it to the log directory only? And is it perhaps something we'd rather have configured at the server than specified in pg_basebackup - like a GUC saying which directories should always be excluded from a basebackup? So you don't have to remember it every time? So it'd be an array, and by default you'd have something like: basebackup_skip_path = $log_directory ? Maybe use it to skip backup labels by default as well. basebackup_skip_path = $log_directory, $backup_label_files I hadn't considered any details, but yes, something along those lines. And then you could also include arbitrary filenames or directories should you want. E.g. if you use the data directory to store your torrents or something. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
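For concreteness, the sketch above would amount to a postgresql.conf entry along these lines; note that basebackup_skip_path is only a name proposed in this thread, not an existing setting, and the $-substitution syntax is likewise hypothetical:

	# hypothetical GUC as sketched upthread; not an existing setting
	basebackup_skip_path = '$log_directory, $backup_label_files, torrents'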
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On Wed, Apr 9, 2014 at 10:37 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: The ship has clearly sailed to add parameterized opclasses to 9.4, but let's keep it in mind when we decide on the defaults. In the absence of parameterizable opclasses, it would be much more flexible to have opclasses that index keys, values, key-value pairs and paths separately, instead of the current jsonb_ops and jsonb_hash_ops opclasses which index all of those in the same index. That way, if you only e.g. ever query on the existence of a key, you'd only need to index the keys. I don't understand how we ended up with the current dichotomy of jsonb_ops and jsonb_hash_ops... +1 for parameterizable opclasses -- With best regards, Alexander Korotkov.
Re: [HACKERS] Get more from indices.
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes: Oops! I found a bug in this patch. The previous v8 patch missed the case that build_index_pathkeys() could build partial pathkeys from the index tlist. TBH I think that's barely the tip of the iceberg of cases where this patch will get the wrong answer. It looks like it thinks that *any* Var belonging to the indexed relation must be the same one indexed by the index. Also, I don't see it doing anything to check the ordering of multiple index columns --- for instance, an index on (a,b) and one on (b,a) would both pass or fail identically, though surely only one should match ORDER BY a,b. Also, what's with the success return before the loop:

+		if (list_length(pathkeys) == list_length(root->query_pathkeys))
+			return true;

At this point you haven't proven *anything at all* about whether the index columns have something to do with the query_pathkeys. regards, tom lane
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On Wed, Apr 9, 2014 at 12:40 PM, Peter Geoghegan p...@heroku.com wrote: On Wed, Apr 9, 2014 at 1:21 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I didn't say that. On the contrary, I think the shotgun approach jsonb_ops and jsonb_hash_ops take is too broad. It should be possible to specify what to index in a more detailed fashion. It is - use an expression index. That's by far the most important way to specify what to index in a more detailed fashion. There are others, but that's the major one. Beyond that, yes, it's necessary to carefully write your query predicate a certain way. However, a similar situation exists in MongoDB, where there is a distinction between indexes on embedded fields (which must be accessed using special dot notation) and indexes on subdocuments (which cannot be accessed using dot notation). It's late here, but I'm pretty sure that's a feature and not a limitation. I believe that a serious limitation we now have is that we actually specify the kind of index to be used in the SQL query. For example, you need to find objects with active = true. You can write: js @> '{"active": true}' and then a GIN index on js can be used. Alternatively you can write: js->'active' = 'true' and then a btree expression index on (js->'active') can be used. For sure, one can do js @> '{"active": true}' AND js->'active' = 'true'. This query can use either of the indexes, but it is: 1) a kludge, 2) an excess recheck, 3) if both indexes are present, an excess BitmapAnd. By having to choose the index in the SQL query we make our SQL more imperative and less declarative. Similar things can happen without json/hstore (the user has to rewrite SQL in order to use an expression index), but now it could become very common. My opinion is that we have to do something in the planner to make it understand at least these two kinds of queries to be equivalent. -- With best regards, Alexander Korotkov.
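For illustration, a minimal sketch of the two index strategies being contrasted here (table and column names are invented; jsonb syntax as of 9.4):

	CREATE TABLE docs (js jsonb);
	CREATE INDEX docs_js_idx ON docs USING gin (js);          -- default jsonb_ops
	CREATE INDEX docs_active_idx ON docs ((js->'active'));    -- btree expression index

	SELECT * FROM docs WHERE js @> '{"active": true}';        -- matches the GIN index
	SELECT * FROM docs WHERE js->'active' = 'true'::jsonb;    -- matches the expression index

The planner treats the two predicates as unrelated, so each query form can use only its matching index, which is exactly the complaint above.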
Re: [HACKERS] New option in pg_basebackup to exclude pg_log files during base backup
Magnus Hagander wrote: On Wed, Apr 9, 2014 at 4:55 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: So it'd be an array, and by default you'd have something like: basebackup_skip_path = $log_directory ? Maybe use it to skip backup labels by default as well. basebackup_skip_path = $log_directory, $backup_label_files I hadn't considered any details, but yes, something along those lines. And then you could also include arbitrary filenames or directories should you want. E.g. if you use the data directory to store your torrents or something. Man, that's a great idea. Database servers have lots of disk space in that partition, so it should work really well. Thanks! -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On Wed, Apr 9, 2014 at 2:37 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Both of the operator classes are actually much less flexible than I'd like. Firstly, they index everything. In many cases, that's not what you want, so you end up with much larger indexes than necessary. Secondly, jsonb_ops indexes all values separately from the keys. That makes the index pretty much useless for a query on, say, WHERE json @> '{"needs_processing": true}', if all the rows also contain a key-value pair "active": true. Thirdly, inequality operators are not supported; you can't search for rows with (the json-syntax equivalent of) price > 12.3. Fourthly, sometimes you would want to include the path to an entry in the key, sometimes not. Maybe we should make *neither* of these the default opclass, and give *neither* the name jsonb_ops. ISTM we need a way to parameterize opclasses, so that when you create the index, you specify the above things. Yeah, that would be great. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Patch: add psql tab completion for event triggers
On Tue, Apr 8, 2014 at 5:27 AM, Ian Barwick i...@2ndquadrant.com wrote: On 08/04/14 18:22, Ian Barwick wrote: As it was kind of annoying not to have this when playing around with event triggers. This also tightens up the existing tab completion for ALTER TRIGGER, which contained redundant code for table name completion, and which was also causing a spurious RENAME TO to be inserted in this context: CREATE EVENT TRIGGER foo ON {event} ^I Apologies, previous patch had some unrelated changes in it. Correct patch attached. This *still* has some unrelated things in it, like s/Pgsql/Postgres/, and numerous hunks consisting entirely of whitespace changes and/or changes to unrelated comments. Also, what's the point of this hunk:

*** psql_completion(const char *text, int st
*** 1318,1340 ****
  			 pg_strcasecmp(prev2_wd, "TRIGGER") == 0)
  		COMPLETE_WITH_CONST("ON");
- 	else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
- 			 pg_strcasecmp(prev3_wd, "TRIGGER") == 0)
- 	{
- 		completion_info_charp = prev2_wd;
- 		COMPLETE_WITH_QUERY(Query_for_list_of_tables_for_trigger);
- 	}
  	/*
! 	 * If we have ALTER TRIGGER <sth> ON, then add the correct tablename
  	 */
  	else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
  			 pg_strcasecmp(prev3_wd, "TRIGGER") == 0 &&
  			 pg_strcasecmp(prev_wd, "ON") == 0)
! 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);
  	/* ALTER TRIGGER <name> ON <name> */
! 	else if (pg_strcasecmp(prev4_wd, "TRIGGER") == 0 &&
  			 pg_strcasecmp(prev2_wd, "ON") == 0)
  		COMPLETE_WITH_CONST("RENAME TO");
--- 1355,1374 ----
  			 pg_strcasecmp(prev2_wd, "TRIGGER") == 0)
  		COMPLETE_WITH_CONST("ON");
  	/*
! 	 * If we have ALTER TRIGGER <name> ON, then add the correct tablename
  	 */
  	else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
  			 pg_strcasecmp(prev3_wd, "TRIGGER") == 0 &&
  			 pg_strcasecmp(prev_wd, "ON") == 0)
! 	{
! 		completion_info_charp = prev2_wd;
! 		COMPLETE_WITH_QUERY(Query_for_list_of_tables_for_trigger);
! 	}
  	/* ALTER TRIGGER <name> ON <name> */
! 	else if (pg_strcasecmp(prev5_wd, "ALTER") == 0 &&
! 			 pg_strcasecmp(prev4_wd, "TRIGGER") == 0 &&
  			 pg_strcasecmp(prev2_wd, "ON") == 0)
  		COMPLETE_WITH_CONST("RENAME TO");

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
Robert Haas robertmh...@gmail.com writes: On Wed, Apr 9, 2014 at 2:37 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Both of the operator classes are actually much less flexible than I'd like. Maybe we should make *neither* of these the default opclass, and give *neither* the name jsonb_ops. There's definitely something to be said for that. Default opclasses are sensible when there's basically only one behavior that's interesting for most people. We can already see that that's not going to be the case for jsonb indexes, at least not with the currently available alternatives. Not having a default would force users to make decisions explicitly. Is that what we want? One other point here is that non-default opclasses can't be used in UNIQUE/PRIMARY KEY/EXCLUDE constraints, because there's no place to specify an opclass name in those syntaxes. UNIQUE/PRIMARY KEY don't matter here since these aren't btree opclasses, but is there a use-case for EXCLUDE with any of the supported jsonb operators? regards, tom lane
Re: [HACKERS] Including replication slot data in base backups
On Tue, Apr 8, 2014 at 3:08 AM, Michael Paquier michael.paqu...@gmail.com wrote: On Tue, Apr 8, 2014 at 1:18 AM, Robert Haas robertmh...@gmail.com wrote: Not sure if this is exactly the right way to do it, but I agree that something along those lines is a good idea. I also think, maybe even importantly, that we should probably document that people using file-copy based hot backups should strongly consider removing the replication slots by hand before using the backup. Good point. Something here would be adapted in this case: http://www.postgresql.org/docs/devel/static/backup-file.html I am attaching an updated patch as well. What you've got here doesn't look like the right section to update - the section you've updated is on taking a cold backup. The section I think you want to update is Making a Base Backup Using the Low Level API, and specifically this part: You can, however, omit from the backup dump the files within the cluster's pg_xlog/ subdirectory. This slight adjustment is worthwhile because it reduces the risk of mistakes when restoring. This is easy to arrange if pg_xlog/ is a symbolic link pointing to someplace outside the cluster directory, which is a common setup anyway for performance reasons. You might also want to exclude postmaster.pid and postmaster.opts, which record information about the running postmaster, not about the postmaster which will eventually use this backup. (These files can confuse pg_ctl.) What I'd propose is adding a second paragraph like this: It is often a good idea to also omit from the backup dump the files within the cluster's pg_replslot/ directory, so that replication slots that exist on the master do not become part of the backup. Otherwise, the subsequent use of the backup to create a standby may result in indefinite retention of WAL files on the standby, and possibly bloat on the master if hot standby feedback is enabled, because the clients that are using those replication slots will still be connecting to and updating the slots on the master, not the standby. Even if the backup is only intended for use in creating a new master, copying the replication slots isn't expected to be particularly useful, since the contents of those slots will likely be badly out of date by the time the new master comes on line. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
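A sketch of the by-hand cleanup being recommended, to be run on a server started from such a backup (the slot name is illustrative):

	-- list any slots that were copied along with the base backup
	SELECT slot_name, plugin, slot_type FROM pg_replication_slots;

	-- drop a stale slot so it cannot force indefinite WAL retention
	SELECT pg_drop_replication_slot('stale_slot');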
Re: [HACKERS] psql \d+ and oid display
On Wed, Apr 9, 2014 at 09:27:11AM -0400, Robert Haas wrote: On Wed, Apr 9, 2014 at 1:02 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Well, that's sorta my concern. I mean, right now we've got people saying "what the heck is a replica identity?". But, if the logical decoding stuff becomes popular, as I hope it will, that's going to be an important thing for people to adjust, and the information needs to be present in a clear and easily-understood way. I haven't studied the current code in detail so maybe it's fine. I just want to make sure we're not giving it second-class treatment solely on the basis that it's new and people aren't using it yet. I think the proposal is don't mention the property if it has the default value. That's not second-class status, as long as people who know what the property is understand that behavior. It's just conserving screen space. One thing that concerns me is that replica identity has a different default for system tables (NOTHING) than for other tables (DEFAULT). So when we say we're not going to display the default value, are we going to display it when it's not NOTHING, when it's not DEFAULT, or when it's not the actual default for that particular kind of table? We exclude pg_catalog from displaying Replica Identity due to this inconsistency. I assume this was desired because you can't replicate system tables. Is that true? The current test is:

	if ((tableinfo.relkind == 'r' || tableinfo.relkind == 'm') &&
		/*
		 * No need to display default values; we already display a
		 * REPLICA IDENTITY marker on indexes.
		 */
		tableinfo.relreplident != 'd' && tableinfo.relreplident != 'i' &&
		strcmp(schemaname, "pg_catalog") != 0)

What might make more sense is this:

	if ((tableinfo.relkind == 'r' || tableinfo.relkind == 'm') &&
		/*
		 * No need to display default values; we already display a
		 * REPLICA IDENTITY marker on indexes.
		 */
		tableinfo.relreplident != 'i' &&
		((strcmp(schemaname, "pg_catalog") != 0 && tableinfo.relreplident != 'd') ||
		 (strcmp(schemaname, "pg_catalog") == 0 && tableinfo.relreplident != 'n')))

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
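For reference, the property being displayed here is set per table; a quick way to see the \d+ behaviour under discussion (table name invented):

	ALTER TABLE t REPLICA IDENTITY FULL;   -- or DEFAULT, NOTHING, USING INDEX t_pkey
	\d+ t                                  -- the Replica Identity line should now appear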
Re: [HACKERS] psql \d+ and oid display
On 2014-04-09 11:42:32 -0400, Bruce Momjian wrote: On Wed, Apr 9, 2014 at 09:27:11AM -0400, Robert Haas wrote: On Wed, Apr 9, 2014 at 1:02 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Well, that's sorta my concern. I mean, right now we've got people saying "what the heck is a replica identity?". But, if the logical decoding stuff becomes popular, as I hope it will, that's going to be an important thing for people to adjust, and the information needs to be present in a clear and easily-understood way. I haven't studied the current code in detail so maybe it's fine. I just want to make sure we're not giving it second-class treatment solely on the basis that it's new and people aren't using it yet. I think the proposal is don't mention the property if it has the default value. That's not second-class status, as long as people who know what the property is understand that behavior. It's just conserving screen space. One thing that concerns me is that replica identity has a different default for system tables (NOTHING) than for other tables (DEFAULT). So when we say we're not going to display the default value, are we going to display it when it's not NOTHING, when it's not DEFAULT, or when it's not the actual default for that particular kind of table? We exclude pg_catalog from displaying Replica Identity due to this inconsistency. I don't understand why it's inconsistent, but whatever. I assume this was desired because you can't replicate system tables. Is that true? Yes. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Dynamic Shared Memory stuff
On Wed, Apr 9, 2014 at 7:41 AM, Amit Kapila amit.kapil...@gmail.com wrote: Few Observations: 1. One new warning has been introduced in the code. 1>src\backend\port\win32_shmem.c(295): warning C4013: 'dsm_set_control_handle' undefined; assuming extern returning int Attached patch fixes this warning. OK, committed, after moving the header to the correct position in alphabetical order. 2. On running test_shm_mq manually, the client side didn't get any problem (the results are as expected). However I am seeing the below log on the server: LOG: registering background worker test_shm_mq LOG: starting background worker process test_shm_mq LOG: unrecognized win32 error code: 127 LOG: worker process: test_shm_mq (PID 2528) exited with exit code 1 LOG: unregistering background worker test_shm_mq I think the below message in the log is not expected: LOG: unrecognized win32 error code: 127 This has started appearing after your commit related to DSM. I have debugged this issue and found that it comes from this part of the code:

	dsm_impl_windows()
	{
		...
		else
		{
			hmap = OpenFileMapping(FILE_MAP_WRITE | FILE_MAP_READ,
								   FALSE,	/* do not inherit the name */
								   name);	/* name of mapping object */
			_dosmaperr(GetLastError());
		}
	}

Here even though the handle returned by OpenFileMapping() is a valid handle, GetLastError() still returns 127 (The specified procedure could not be found.). Now the specs [1] of the API say that if the handle is non-NULL, then consider it success, so I am not sure whether we should bother about this error or not. I have tried many ways (trying different parameters, searching the net) to change this and related APIs, but the same problem exists. One strange thing is that if I just call the API twice (I know this is not right, but just to experiment for finding some info), the second time this error doesn't occur. The only difference in the latest changes is that now OpenFileMapping() is getting called by the child process rather than a peer process (w.r.t. the process that has called CreateFileMapping); I don't know why this should make such a difference. On the net, whatever examples I have seen for such usage call GetLastError() only if the handle is invalid; we have called it in the above fashion just to keep the code a little bit simpler. I am just not sure whether it is okay to rearrange the code and call GetLastError() only if the returned handle is invalid (NULL), or to try to look for more info. Well, I don't know either. You wrote the Windows portion of this code, so you'll have to decide what's best. If the best practice in this area is to only call GetLastError() if the handle isn't valid, then that's probably what we should do, too. But I can't speak to what the best practice is. One question: 1. I have seen that initdb still creates pg_dynshmem, is it required after your latest changes? It's only used now if dynamic_shared_memory_type = mmap. I know Andres was never a huge fan of the mmap implementation, so we could rip that out and get rid of the directory, too, but I kind of liked having it, and broke the tie in favor of myself. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Dynamic Shared Memory stuff
On 2014-04-09 11:50:33 -0400, Robert Haas wrote: One question: 1. I have seen that initdb still creates pg_dynshmem, is it required after your latest changes? It's only used now if dynamic_shared_memory_type = mmap. I know Andres was never a huge fan of the mmap implementation, so we could rip that out and get rid of the directory, too, but I kind of liked having it, and broke the tie in favor of myself. It's purely a toy thing. I think pretty much every dynshm user that actually transfers data through it will be better off falling back to single-process style work, rather than parallelizing it. Anyway, it's there, it doesn't presently cause problems, and I don't have to maintain it, so ... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Autonomous Transaction (WIP)
On Wed, Apr 9, 2014 at 12:24 AM, Rajeev rastogi rajeev.rast...@huawei.com wrote: Deadlock Detection: I'm not sure how this would work out internally In order to resolve deadlocks, two member variables will be created in the structure PROCLOCK: a bitmask for lock types currently held by the autonomous transaction, LOCKMASK holdMaskByAutoTx[MAX_AUTO_TX_LEVEL], and a bitmask for lock types currently held by the main transaction, LOCKMASK holdMaskByNormalTx. Now when we grant the lock to a particular transaction, depending on the type of transaction, the bitmask will be set for either holdMaskByAutoTx or holdMaskByNormalTx. Similarly, when the lock is ungranted, the corresponding bitmask will be reset. That sounds pretty ugly, not to mention the fact that it will cause a substantial increase in the amount of memory required to store PROCLOCKs. It will probably slow things down, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
Tom Lane wrote: On Wed, Apr 9, 2014 at 2:37 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Both of the operator classes are actually much less flexible than I'd like. Maybe we should make *neither* of these the default opclass, and give *neither* the name jsonb_ops. +1. I was thinking the same thing after reading Heikki's rant. One other point here is that non-default opclasses can't be used in UNIQUE/PRIMARY KEY/EXCLUDE constraints, because there's no place to specify an opclass name in those syntaxes. UNIQUE/PRIMARY KEY don't matter here since these aren't btree opclasses, but is there a use-case for EXCLUDE with any of the supported jsonb operators? That sounds like an oversight that could better be fixed in EXCLUDE, no? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [DOCS] Call for GIST/GIN/SP-GIST opclass documentation
Robert Haas wrote: On Tue, Apr 8, 2014 at 4:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: I just created sections in the SGML manual chapters about GIST, GIN, and SP-GIST to hold documentation about the standard opclasses provided for them: http://www.postgresql.org/docs/devel/static/gist-builtin-opclasses.html http://www.postgresql.org/docs/devel/static/gin-builtin-opclasses.html http://www.postgresql.org/docs/devel/static/spgist-builtin-opclasses.html We never had any well-defined place to discuss such opclasses before, but I think it's past time we did. +1. Great idea. Agreed. I find that in my browser it's a bit difficult to make out the different operators in the long list; ISTM a single space between operators isn't enough. I wonder how we can do better; sticking commas (or any other single character, really) between them is not likely to improve matters; and using one column per operator is not going to work very well. Maybe if it were possible to split a single cell into several sub-cells; or perhaps there's a way to specify two whitespace units, or a long space, or something like that. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: One other point here is that non-default opclasses can't be used in UNIQUE/PRIMARY KEY/EXCLUDE constraints, because there's no place to specify an opclass name in those syntaxes. UNIQUE/PRIMARY KEY don't matter here since these aren't btree opclasses, but is there a use-case for EXCLUDE with any of the supported jsonb operators? That sounds like an oversight that could better be fixed in EXCLUDE, no? Well, there hasn't been a use-case up to now. I'm not sure there's one yet. regards, tom lane
Re: [HACKERS] [DOCS] Call for GIST/GIN/SP-GIST opclass documentation
Alvaro Herrera alvhe...@2ndquadrant.com writes: Robert Haas wrote: On Tue, Apr 8, 2014 at 4:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: http://www.postgresql.org/docs/devel/static/gist-builtin-opclasses.html http://www.postgresql.org/docs/devel/static/gin-builtin-opclasses.html http://www.postgresql.org/docs/devel/static/spgist-builtin-opclasses.html I find that in my browser it's a bit difficult to make out the different operators in the long list; ISTM a single space between operators isn't enough. Yeah, I noticed that too, but wasn't sure what to do about it. One line per operator isn't better, but I don't know how to get a bit more horizontal space into the lists. regards, tom lane
Re: [HACKERS] Autonomous Transaction (WIP)
Robert Haas robertmh...@gmail.com writes: On Wed, Apr 9, 2014 at 12:24 AM, Rajeev rastogi rajeev.rast...@huawei.com wrote: Now when we grant the lock to a particular transaction, depending on the type of transaction, the bitmask will be set for either holdMaskByAutoTx or holdMaskByNormalTx. Similarly, when the lock is ungranted, the corresponding bitmask will be reset. That sounds pretty ugly, not to mention the fact that it will cause a substantial increase in the amount of memory required to store PROCLOCKs. It will probably slow things down, too. More to the point, why isn't it a flat-out bad idea? I can see no justification for distinguishing normal and autonomous transactions at this level. regards, tom lane
Re: [HACKERS] GiST support for inet datatypes
Tom Lane t...@sss.pgh.pa.us: Committed with some additional documentation work. Thanks for submitting this! Thank you for committing. I had not thought of using a different structure for the index. It works faster with my test case, too. I am sending a rebased version of the consistent operator names patch in case you would like to include it in the same release. This is what I wrote about this change before: That is why I prepared it as a separate patch on top of the others. It is not only consistency with the range types: the @> and <@ symbols are used for containment everywhere except the inet data types, particularly on the geometric types and arrays, and in the cube, hstore, intarray, and ltree extensions. The patch does not just change the operator names; it leaves the old ones, adds new operators with GiST support, and changes the documentation, like your commit ba920e1c9182eac55d5f1327ab0d29b721154277 back in 2006. I could not find out why you left the inet operators unchanged in that commit, in the mailing list archives [1]. GiST support will be a promotion for users to switch to the new operators, if we make this change with it. This change will also indirectly deprecate the undocumented, non-transparent btree index support that works sometimes for some of the subnet inclusion operators [2]. [1] http://www.postgresql.org/message-id/14277.1157304...@sss.pgh.pa.us [2] http://www.postgresql.org/message-id/389.1042992...@sss.pgh.pa.us inet-operator-v2.patch Description: Binary data
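For illustration, the existing subnet-inclusion operators that the patch proposes to alias with <@ and @> (addresses are arbitrary):

	SELECT '192.168.1.5'::inet << '192.168.1.0/24'::inet;   -- is contained within: true
	SELECT '192.168.1.0/24'::inet >> '192.168.1.5'::inet;   -- contains: true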
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On Wed, Apr 9, 2014 at 8:24 AM, Tom Lane t...@sss.pgh.pa.us wrote: Maybe we should make *neither* of these the default opclass, and give *neither* the name jsonb_ops. There's definitely something to be said for that. Default opclasses are sensible when there's basically only one behavior that's interesting for most people. We can already see that that's not going to be the case for jsonb indexes, at least not with the currently available alternatives. I've heard worse ideas. -- Peter Geoghegan
Re: [HACKERS] [PATCH] Add transforms feature
On 2014-04-05 00:21:47 +0200, Andres Freund wrote: On 2014-01-15 21:13:18 -0500, Peter Eisentraut wrote: The attached patch will probably fail to apply because of the pg_proc changes. So if you want to try it out, look into the header for the Git hash it was based off. I'll produce a properly merged version when this approach is validated. Also, some implementation details could probably take some revising after a couple of nights of sleep. ;-) You've slept since? ;) So, I am only doing a look through the patch, to see where it has gone in the past year. ... So, unless somebody protests PDQ, I am going to mark this Returned with Feedback. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Problem with displaying wide tables in psql
How can I pass or set the value of a pset variable for a regression test? I just wrote some tests using \pset columns to control the output. Having figured out what the point of the patch is, I'm pretty happy with the functionality. It definitely is something I would appreciate having. One thing I noticed is that it's not using the formatting characters to indicate newlines and wrapping. I'm trying to add that myself but it's taking me a bit to figure out what's going on in the formatting code. I do have a feeling this is all rearranging the deck chairs on an obsolete technology and this should all be replaced with a web UI.
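A sketch of what such a test can look like, relying on \pset variables instead of the terminal ioctl (the query is illustrative):

	\pset format wrapped
	\pset columns 40
	SELECT repeat('ab cd ', 30) AS wide_column;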
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Dean Rasheed dean.a.rash...@gmail.com writes: OK, I'm marking this ready for committer attention, on the understanding that that doesn't include the invtrans_minmax patch. I've started to look at this patch set. After rereading the thread, I'm thinking that it's a mistake to just add the inverse transition function and require it to work with the standard forward transition function for the aggregate. There was discussion upthread of providing two separate forward transition functions, but Florian argued that that would do nothing that you couldn't accomplish with a runtime check in the forward function. I think that's nonsense though, because one of the key points here is that an invertible aggregate may require a more complex transition state data structure --- in particular, if you're forced to go from a pass-by-value to a pass-by-reference data type, right there you are going to take a big hit in aggregate performance, and there is no way for the forward transition function to avoid it. The patch has in fact already done that to a couple of basic aggregates like sum(int4). Has anyone bothered to test what side-effects that has on non-windowed aggregation performance? I think what'd make sense is to have a separate forward function *and* separate state datatype to be used when we want invertible aggregation (hm, and I guess a separate final function too). So an aggregate definition would include two entirely independent implementations. If they chance to share sfunc and stype, fine, but they don't have to. This would mean we'd need some new names for the doppelgangers of the CREATE AGGREGATE parameters sfunc, stype, sspace, finalfunc, and initcond (but not sortop). I guess that'd open up a chance to use a more uniform naming scheme, but I'm not too sure what would be good. Comments? regards, tom lane
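Roughly, the proposal would allow something like the following; every m-prefixed parameter name and every function name other than int4_sum is invented here purely to show the shape of the idea:

	CREATE AGGREGATE sum (int4) (
		sfunc = int4_sum,            -- plain aggregation: cheap int8 state
		stype = int8,
		msfunc = int4_msum,          -- hypothetical invertible implementation
		minvfunc = int4_msum_inv,    -- with its own, richer state type
		mstype = int8[],
		mfinalfunc = int4_msum_final
	);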
[HACKERS] Re: default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)
On Wed, Apr 9, 2014 at 11:24 AM, Tom Lane t...@sss.pgh.pa.us wrote: Maybe we should make *neither* of these the default opclass, and give *neither* the name jsonb_ops. There's definitely something to be said for that. Default opclasses are sensible when there's basically only one behavior that's interesting for most people. We can already see that that's not going to be the case for jsonb indexes, at least not with the currently available alternatives. Not having a default would force users to make decisions explicitly. Is that what we want? I don't like the idea of having no default opclass. I think there's a huge usability gain in being able to just create an index on a column and have it do something reasonable for most use cases. I can get behind the idea of having separate index opclasses for paths and path-value pairs but I suspect the default should just be to index both in the same index. If we can have one default index opclass that supports containment and existence and then other opclasses that are smaller but only support a subset of the operators that would seem like the best compromise. I'm a bit confused by Heikki's list though. I would expect path and path-value pair to be the only useful ones. I'm not clear what an index on keys or key-value would be -- it would index just the top-level keys and values without recursing? -- greg
[HACKERS] Buildfarm network under attack
According to Joshua Drake on IRC, there is a DDOS in progress against a customer in the same network as the buildfarm. As a consequence, you may experience slowness, or inability to connect while they deal with it. -- Vik
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
I wrote: ... an invertible aggregate may require a more complex transition state data structure --- in particular, if you're forced to go from a pass-by-value to a pass-by-reference data type, right there you are going to take a big hit in aggregate performance, and there is no way for the forward transition function to avoid it. The patch has in fact already done that to a couple of basic aggregates like sum(int4). Has anyone bothered to test what side-effects that has on non-windowed aggregation performance? As a quick check, I compared aggregation performance in HEAD, non-assert builds, with and without --disable-float8-byval on a 64-bit machine. So this tests replacing a pass-by-val transition datatype with a pass-by-ref one without any other changes. There's essentially no difference in performance of sum(int4), AFAICT, but that's because int4_sum goes out of its way to cheat and avoid palloc overhead. I looked at the bit_and() aggregates to see what would happen to an aggregate not thus optimized. As expected, int4 and int8 bit_and are just about the same speed if int8 is pass by value ... but if it's pass by ref, the int8 case is a good 60% slower. So added palloc overhead, at least, is a no-go. I see that the patched version of sum(int4) avoids that trap, but nonetheless it's replaced a pretty cheap transition function with a less cheap function, namely the function previously used for avg(int4). A quick test says that avg(int4) is about five percent slower than sum(int4), so that's the kind of hit we'd be taking on non-windowed aggregations if we do it like this. regards, tom lane
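The sort of quick check involved is easy to reproduce with \timing; something along these lines (exact numbers of course vary by build and machine):

	\timing on
	SELECT sum(g) FROM generate_series(1, 10000000) g;   -- int8 transition state
	SELECT avg(g) FROM generate_series(1, 10000000) g;   -- array (by-reference) transition state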
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
Dean Rasheed dean.a.rash...@gmail.com writes: On 10 March 2014 03:36, Craig Ringer cr...@2ndquadrant.com wrote: I've found an issue with updatable security barrier views. Locking is being pushed down into the subquery. Locking is thus applied before user-supplied quals are, so we potentially lock too many rows. That has nothing to do with *updatable* security barrier views, because that's not an update. In fact you get that exact same plan on HEAD, without the updatable security barrier views patch. I got around to looking at this. The proximate reason for the two LockRows nodes is: (1) The parser and rewriter intentionally insert RowMarkClauses into both the outer query and the subquery when a FOR UPDATE/SHARE applies to a subquery-in-FROM. The comment in transformLockingClause explains why:

	 * FOR UPDATE/SHARE of subquery is propagated to all of
	 * subquery's rels, too. We could do this later (based on
	 * the marking of the subquery RTE) but it is convenient
	 * to have local knowledge in each query level about which
	 * rels need to be opened with RowShareLock.

that is, if we didn't push down the RowMarkClause, processing of the subquery would be at risk of opening relations with too weak a lock. In the example case, this pushdown is actually done by the rewriter's markQueryForLocking function, but it's just emulating what the parser would have done if the view query had been written in-line. (2) The planner doesn't consider this, and generates a LockRows plan node for each level of the query, since both of them have rowMarks. However, I'm not sure this is a bug, or at least it's not the same bug you think it is. The lower LockRows node is doing a ROW_MARK_EXCLUSIVE mark on the physical table rows, while the upper one is doing a ROW_MARK_COPY on the virtual rows emitted by the subquery. *These are not the same thing*. A ROW_MARK_COPY isn't a lock of any sort; it's just there to allow EvalPlanQual to see the row that was emitted from the subquery, in case a recheck has to be done during a Read Committed UPDATE/DELETE. Since this is a pure SELECT, the upper locking action is useless, and arguably the planner ought to be smart enough not to emit it. But it's just wasting some cycles, it's not changing any semantics. So if you wanted user-supplied quals to limit which rows get locked physically, they would need to be applied before the lower LockRows node. To my mind, it's not immediately apparent that that is a reasonable expectation. The entire *point* of a security_barrier view is that unsafe quals don't get pushed into it, so why would you expect that those quals get applied early enough to limit locking? regards, tom lane
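A minimal reproduction of the plan shape discussed above, assuming a table and security_barrier view like the following (all names invented):

	CREATE TABLE accounts (id int PRIMARY KEY, owner text, balance int);
	CREATE VIEW my_accounts WITH (security_barrier) AS
		SELECT * FROM accounts WHERE owner = current_user;

	-- The user-supplied qual stays above the un-flattened subquery, so the
	-- lower LockRows node can lock rows that "balance < 0" would filter out:
	EXPLAIN SELECT * FROM my_accounts WHERE balance < 0 FOR UPDATE;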
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Apr9, 2014, at 21:35 , Tom Lane t...@sss.pgh.pa.us wrote: As a quick check, I compared aggregation performance in HEAD, non-assert builds, with and without --disable-float8-byval on a 64-bit machine. So this tests replacing a pass-by-val transition datatype with a pass-by-ref one without any other changes. There's essentially no difference in performance of sum(int4), AFAICT, but that's because int4_sum goes out of its way to cheat and avoid palloc overhead. I looked at the bit_and() aggregates to see what would happen to an aggregate not thus optimized. As expected, int4 and int8 bit_and are just about the same speed if int8 is pass by value ... but if it's pass by ref, the int8 case is a good 60% slower. True, but that just means that aggregate transition functions really ought to update the state in-place, no? So added palloc overhead, at least, is a no-go. I see that the patched version of sum(int4) avoids that trap, but nonetheless it's replaced a pretty cheap transition function with a less cheap function, namely the function previously used for avg(int4). A quick test says that avg(int4) is about five percent slower than sum(int4), so that's the kind of hit we'd be taking on non-windowed aggregations if we do it like this. That's rather surprising though. AFAICS, there isn't much difference between the two transfer functions int4_sum and int4_avg_accum at all. The differences come down to (+ denoting things which ought to make int4_avg_accum slower compared to int4_sum, - denoting the opposite):

1. +) int4_avg_accum calls AggCheckCallContext
2. -) int4_sum checks if the state is NULL (it never is for int4_avg_accum)
3. +) int4_avg_accum uses ARR_HASNULL, ARR_SIZE, ARR_OVERHEAD_NONULLS to verify that the state is a 2-element array without NULL entries
4. -) int4_sum checks if the input is NULL

The number of conditional branches should be about the same (and all are seldom taken). The validity checks on the state array, i.e. (3), should be rather cheap I think - not quite as cheap as PG_ARGISNULL maybe, but not so much more expensive either. That leaves the AggCheckCallContext call. If that call costs us 5%, maybe we can find a way to make it faster, or get rid of it entirely? Still, that seems like a lot for a call of a not-very-complex function, though... I'll go and check the disassembly - maybe something in int4_avg_accum turns out to be more complex than is immediately obvious. I'll also try to create a call profile, unless you already have one from your test runs. best regards, Florian Pflug
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
* Tom Lane (t...@sss.pgh.pa.us) wrote: Dean Rasheed dean.a.rash...@gmail.com writes: On 10 March 2014 03:36, Craig Ringer cr...@2ndquadrant.com wrote: I've found an issue with updatable security barrier views. Locking is being pushed down into the subquery. Locking is thus applied before user-supplied quals are, so we potentially lock too many rows. That has nothing to do with *updatable* security barrier views, because that's not an update. In fact you get that exact same plan on HEAD, without the updatable security barrier views patch. I got around to looking at this. Thanks! So if you wanted user-supplied quals to limit which rows get locked physically, they would need to be applied before the lower LockRows node. Right. To my mind, it's not immediately apparent that that is a reasonable expectation. The entire *point* of a security_barrier view is that unsafe quals don't get pushed into it, so why would you expect that those quals get applied early enough to limit locking? Right, we don't want unsafe quals to get pushed down below the security_barrier view. The point here is that those quals should not be able to lock rows which they don't even see. My understanding of the issue was that the unsafe quals were able to see and lock rows that they shouldn't know exist. If that's not the case then I don't think there's a bug here..? Craig/Dean, can you provide a complete test case which shows the issue? Thanks, Stephen
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Apr9, 2014, at 20:20 , Tom Lane t...@sss.pgh.pa.us wrote: There was discussion upthread of providing two separate forward transition functions, but Florian argued that that would do nothing that you couldn't accomplish with a runtime check in the forward function. I think that's nonsense though, because one of the key points here is that an invertible aggregate may require a more complex transition state data structure --- in particular, if you're forced to go from a pass-by-value to a pass-by-reference data type, right there you are going to take a big hit in aggregate performance, and there is no way for the forward transition function to avoid it. To be precise, my point was that *only* having a separate non-invertible forward transition function is pointless, exactly because of the reason you gave - that won't allow a cheaper state representation for the non-invertible case. So all such a non-invertible forward transition function can do is to skip some bookkeeping that's required by the inverse transition function - and *that* can just as easily be accomplished by a runtime check. I was (and still am) not in favour of duplicating the whole quadruple of (state, initialvalue, transferfunction, finalfunction) because it seems excessive. In fact, I believed that doing this would probably be grounds for outright rejection of the patch, on the basis of catalog bloat. And your initial response to this suggestion seemed to confirm this. The patch has in fact already done that to a couple of basic aggregates like sum(int4). Has anyone bothered to test what side-effects that has on non-windowed aggregation performance? I'm pretty sure David Rowley did some benchmarking. The results should be in this thread somewhere I think, but they currently evade me... Maybe David can re-post, if he's following this... I think what'd make sense is to have a separate forward function *and* separate state datatype to be used when we want invertible aggregation (hm, and I guess a separate final function too). So an aggregate definition would include two entirely independent implementations. If they chance to share sfunc and stype, fine, but they don't have to. This would mean we'd need some new names for the doppelgangers of the CREATE AGGREGATE parameters sfunc, stype, sspace, finalfunc, and initcond (but not sortop). I guess that'd open up a chance to use a more uniform naming scheme, but I'm not too sure what would be good. If we really go down that road (and I'm far from convinced), then maybe instead of having a bunch of additional fields, we could have separate entries in pg_aggregate for the two cases, with links between them. The non-invertible aggregates would have something like _noninv or so appended to their name, and we'd automatically use them if we know we won't need to remove entries from the aggregation state. That would also allow the users to *force* the non-invertible aggregate to be used by simply saying SUM_NONINV instead of SUM. Then all we'd need would be an additional OID field that links the invertible to the non-invertible definition. best regards, Florian Pflug
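Under that sketch, the user-visible effect would be roughly the following; sum_noninv is a hypothetical name from this proposal, not an existing aggregate, and the table t with columns x and ts is invented:

	-- default: invertible machinery, the frame slides cheaply
	SELECT sum(x) OVER (ORDER BY ts ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) FROM t;

	-- forced non-invertible variant, recomputing each frame from scratch
	SELECT sum_noninv(x) OVER (ORDER BY ts ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) FROM t;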
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: So if you wanted user-supplied quals to limit which rows get locked physically, they would need to be applied before the lower LockRows node. Right. To my mind, it's not immediately apparent that that is a reasonable expectation. The entire *point* of a security_barrier view is that unsafe quals don't get pushed into it, so why would you expect that those quals get applied early enough to limit locking? Right, we don't want unsafe quals to get pushed down below the security_barrier view. The point here is that those quals should not be able to lock rows which they don't even see. That sounds a bit confused. The quals aren't getting to see rows they shouldn't be allowed to see. The issue rather is whether rows that the user quals would exclude might get locked anyway because the locking happens before those quals are checked. [ thinks for awhile... ] I guess the thing that is surprising is that ordinarily, if you say SELECT ... WHERE ... FOR UPDATE, only rows passing the WHERE clause get locked. If there's a security view involved, that gets reversed (at least for unsafe clauses). So from that standpoint it does seem pretty bogus. I'm not sure how much we can do about it given the current implementation technique for security views. A physical row lock requires access to the source relation and the ctid column, neither of which are visible at all outside an un-flattened subquery. Anyway, this is definitely a pre-existing issue with security_barrier views (or really with any unflattenable subquery), and so I'm not sure if it has much to do with the patch at hand. regards, tom lane
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
Florian Pflug f...@phlo.org writes: I was (and still am) not in favour of duplicating the whole quadruple of (state, initialvalue, transferfunction, finalfunction) because it seems excessive. In fact, I believed that doing this would probably be grounds for outright rejection of the patch, on the basis of catalog bloat. And your initial response to this suggestion seemed to confirm this. Well, I think it's much more likely that causing a performance penalty for cases unrelated to window aggregates would lead to outright rejection :-(. The majority of our users probably don't ever use window functions, but for sure they've heard of SUM(). We can't penalize the non-window case. Expanding pg_aggregate from 10 columns (as per patch) to 14 (as per this suggestion) is a little annoying but it doesn't sound like a show stopper. It seems reasonable to assume that the extra initval would be NULL in most cases, so it's probably a net addition of 12 bytes per row. On Apr9, 2014, at 20:20 , Tom Lane t...@sss.pgh.pa.us wrote: The patch has in fact already done that to a couple of basic aggregates like sum(int4). Has anyone bothered to test what side-effects that has on non-windowed aggregation performance? I'm pretty sure David Rowley did some benchmarking. The results should be in this thread somewhere I think, but they currently evade me... Maybe David can re-post, if he's following this... I saw benchmarks addressing window aggregation, but none looking for side-effects on plain aggregation. If we really go down that road (and I'm far from convinced), then maybe instead of having a bunch of additional fields, we could have separate entries in pg_aggregate for the two cases, with links between them. That seems like a complete mess; in particular it would break the primary key for pg_aggregate (aggfnoid), and probably break every existing query that looks at pg_aggregate. Some extra fields would not break such expectations (in fact we've added fields to pg_aggregate in the past). regards, tom lane
[HACKERS] bogus tsdict, tsparser, etc object identities
I noticed that pg_identify_object() gives a bogus answer for the text search object types: it is failing to schema-qualify the objects. I guess at the time I thought that those object types were global, not contained within schemas (which seems reasonable: who would want to have different text search parsers in different schemas anyway?)

alvherre=# CREATE TEXT SEARCH DICTIONARY alt_ts_dict1 (template=simple);
CREATE TEXT SEARCH DICTIONARY
alvherre=# create schema foo;
CREATE SCHEMA
alvherre=# CREATE TEXT SEARCH DICTIONARY foo.alt_ts_dict1 (template=simple);
CREATE TEXT SEARCH DICTIONARY

Before patch,

alvherre=# select identity.* from pg_ts_dict, pg_identify_object('pg_ts_dict'::regclass, oid, 0) identity where schema <> 'pg_catalog';
          type           | schema |     name     |   identity
-------------------------+--------+--------------+--------------
 text search dictionary  | public | alt_ts_dict1 | alt_ts_dict1
 text search dictionary  | foo    | alt_ts_dict1 | alt_ts_dict1
(2 filas)

After patch,

alvherre=# select identity.* from pg_ts_dict, pg_identify_object('pg_ts_dict'::regclass, oid, 0) identity where schema <> 'pg_catalog';
          type           | schema |     name     |      identity
-------------------------+--------+--------------+---------------------
 text search dictionary  | public | alt_ts_dict1 | public.alt_ts_dict1
 text search dictionary  | foo    | alt_ts_dict1 | foo.alt_ts_dict1
(2 filas)

This problem is new in 9.3, so I'm backpatching to that. Prior to that we didn't have object identities. The attached patch demonstrates the fix for tsdict only, but as I said above all text search object types are affected, so I'm going to backpatch the fix for all of them. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

*** a/src/backend/catalog/objectaddress.c
--- b/src/backend/catalog/objectaddress.c
*** 3095,3100 **** getObjectIdentity(const ObjectAddress *object)
--- 3095,3101 ----
  			{
  				HeapTuple	tup;
  				Form_pg_ts_dict formDict;
+ 				char	   *schema;
  
  				tup = SearchSysCache1(TSDICTOID,
  									  ObjectIdGetDatum(object->objectId));
*** 3102,3109 **** getObjectIdentity(const ObjectAddress *object)
  					elog(ERROR, "cache lookup failed for text search dictionary %u",
  						 object->objectId);
  				formDict = (Form_pg_ts_dict) GETSTRUCT(tup);
  				appendStringInfoString(&buffer,
! 						quote_identifier(NameStr(formDict->dictname)));
  				ReleaseSysCache(tup);
  				break;
  			}
--- 3103,3112 ----
  					elog(ERROR, "cache lookup failed for text search dictionary %u",
  						 object->objectId);
  				formDict = (Form_pg_ts_dict) GETSTRUCT(tup);
+ 				schema = get_namespace_name(formDict->dictnamespace);
  				appendStringInfoString(&buffer,
! 						quote_qualified_identifier(schema,
! 								NameStr(formDict->dictname)));
  				ReleaseSysCache(tup);
  				break;
  			}
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
Haribabu Kommi wrote: I modified the autovac_balance_cost function to balance the costs using the number of running workers, instead of the default vacuum cost parameters. Just as a heads-up, this patch wasn't part of the commitfest, but I intend to review it and possibly commit it for 9.4 --- not immediately, but at some point. Arguably this is a bug fix, since the autovac rebalance code behaves horribly in cases such as the one described here, so I should consider a backpatch right away. However, I don't think it's a good idea to do that without more field testing. Perhaps we can backpatch later if the new code demonstrates its sanity. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
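For reference, the per-table settings whose rebalancing is at issue here are the autovacuum storage parameters; a minimal sketch (the table name is hypothetical):

    -- Per-table cost settings that autovac_balance_cost divides up
    -- among the running workers:
    ALTER TABLE big_table SET (
        autovacuum_vacuum_cost_limit = 1000,  -- per-table cost limit
        autovacuum_vacuum_cost_delay = 10     -- cost delay in milliseconds
    );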
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Apr 9, 2014, at 23:17, Florian Pflug f...@phlo.org wrote: On Apr 9, 2014, at 21:35, Tom Lane t...@sss.pgh.pa.us wrote: A quick test says that avg(int4) is about five percent slower than sum(int4), so that's the kind of hit we'd be taking on non-windowed aggregations if we do it like this. That's rather surprising though. AFAICS, there isn't much difference between the two transfer functions int4_sum and int4_avg_accum at all. The differences come down to (+ denoting things which ought to make int4_avg_accum slower compared to int4_sum, - denoting the opposite):

1. +) int4_avg_accum calls AggCheckCallContext
2. -) int4_sum checks if the state is NULL (it never is for int4_avg_accum)
3. +) int4_avg_accum uses ARR_HASNULL, ARR_SIZE, ARR_OVERHEAD_NONULLS to verify that the state is a 2-element array without NULL entries
4. -) int4_sum checks if the input is NULL

I've done a bit of profiling on this (using Instruments.app on OSX). One thing I missed is that int4_avg_accum also calls pg_detoast_datum - that call comes from the PG_GETARG_ARRAYTYPE_P macro. Doing that is a bit silly, since we know that the datum cannot possibly be toasted, I think (or if it was, nodeAgg.c should do the de-toasting). The profile also attributes a rather large percentage of the total runtime of int4_avg_accum (around 30%) to the call of AggCheckCallContext(). Getting rid of that would help quite a few transition functions, invertible or not. That certainly seems doable - we'd need a way to mark functions as internal support functions, and prevent user-initiated calls of such functions. Transition functions marked that way could then safely scribble over their state arguments without having to consult AggCheckCallContext() first. I've also compared the disassemblies of int4_sum and int4_avg_accum, and apart from these differences, they are pretty similar. Also, the *only* reason that SUM(int2|int4) cannot use int8 as its transition type is that it needs to return NULL, not 0, if zero rows were aggregated. It might seem that it could just use int8 as state with initial value NULL, but that only works if the transition functions are non-strict (since the special case of state type == input type doesn't apply here). And non-strict transition functions need to deal with NULL inputs themselves, which means counting the number of non-NULL inputs... That problem would go away though if we had support for an initial function, which would receive the first input value and initialize the state. In the case of SUM(), the initial function would be strict, and thus would be called on the first non-NULL input value, which it'd convert to int8 and return as the new state. However, I still believe the best approach at this point is to just work on making int4_avg_accum faster. I still see no principled reason why it has to be noticeably slower - the only additional work it absolutely *has* to perform is *one* 64-bit increment. best regards, Florian Pflug
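To make Florian's strictness point concrete, here is a minimal sketch using plain CREATE AGGREGATE (the aggregate name is hypothetical); assuming the usual aggregate-creation rules, it should be rejected with an error along the lines of "must not omit initial value when transition function is strict and transition type is not compatible with input type":

    -- int8pl is strict with signature (int8, int8) -> int8; with no
    -- initcond, the first int4 input would have to stand in as the
    -- initial int8 state, which is disallowed since int4 is not
    -- binary-coercible to int8:
    CREATE AGGREGATE my_sum (int4) (
        sfunc = int8pl,
        stype = int8
    );
    -- Supplying initcond = '0' would make it work, but then my_sum()
    -- over zero rows would return 0 rather than NULL - the very
    -- problem Florian describes.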
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 9 April 2014 22:55, Tom Lane t...@sss.pgh.pa.us wrote: Florian Pflug f...@phlo.org writes: I was (and still am) not in favour of duplicating the whole quadruple of (state, initialvalue, transferfunction, finalfunction) because it seems excessive. In fact, I believed that doing this would probably be grounds for outright rejection of the patch, on the basis of catalog bloat. And your initial response to this suggestion seemed to confirm this. Well, I think it's much more likely that causing a performance penalty for cases unrelated to window aggregates would lead to outright rejection :-(. The majority of our users probably don't ever use window functions, but for sure they've heard of SUM(). We can't penalize the non-window case. Expanding pg_aggregate from 10 columns (as per patch) to 14 (as per this suggestion) is a little annoying, but it doesn't sound like a show-stopper. It seems reasonable to assume that the extra initval would be NULL in most cases, so it's probably a net addition of 12 bytes per row. On Apr 9, 2014, at 20:20, Tom Lane t...@sss.pgh.pa.us wrote: The patch has in fact already done that to a couple of basic aggregates like sum(int4). Has anyone bothered to test what side-effects that has on non-windowed aggregation performance? I'm pretty sure David Rowley did some benchmarking. The results should be in this thread somewhere, I think, but they currently evade me... Maybe David can re-post, if he's following this... I saw benchmarks addressing window aggregation, but none looking for side-effects on plain aggregation. If we really go down that road (and I'm far from convinced), then maybe instead of having a bunch of additional fields, we could have separate entries in pg_aggregate for the two cases, with links between them. That seems like a complete mess; in particular it would break the primary key for pg_aggregate (aggfnoid), and probably break every existing query that looks at pg_aggregate. Some extra fields would not break such expectations (in fact we've added fields to pg_aggregate in the past). This may initially sound unrelated, but I think it might address some of these issues. Suppose we added a 'firsttrans' function that took a single argument (the first value to be aggregated) and was responsible for creating the initial state from that first value. This would apply to aggregates that ignore null values, but whose transition function cannot currently be declared strict (either because the state type is internal, or because it is not the same as the aggregate's argument type). I think quite a lot of the existing aggregates fall into this category, and this would allow their transition functions to be made strict and simplified --- no more testing if the state is null, and then building it, and no more testing if the argument is null and ignoring it. That might give a noticeable performance boost in the regular aggregate case, especially over data containing nulls. But in addition, it would help with writing inverse transition functions, because if the transition functions could be made strict, they wouldn't need to do null-counting, which would mean that their state types would not need to be expanded. So for example sum(int4) could continue to have int8 as its state type, it could use int8(int4) as its firsttrans function, and int4_sum() could become strict and lose all its null-handling logic. Then int4_sum_inv() would be trivial to write --- just doing the same in reverse.
I'm not sure it helps for all aggregates, but there are certainly some where it would seem to simplify things. Regards, Dean
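A purely hypothetical sketch of how Dean's proposal might look at the SQL level --- no 'firsttrans' option exists in any PostgreSQL release; the option name and usage here are illustration only, mirroring his sum(int4) example:

    CREATE AGGREGATE sum (int4) (
        firsttrans = int8,      -- hypothetical: int8(int4) builds the state from the first non-NULL input
        sfunc      = int4_sum,  -- could then be declared strict and lose its null-handling logic
        stype      = int8
    );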
Re: [HACKERS] Patch: add psql tab completion for event triggers
On 10/04/14 00:23, Robert Haas wrote: On Tue, Apr 8, 2014 at 5:27 AM, Ian Barwick i...@2ndquadrant.com wrote: On 08/04/14 18:22, Ian Barwick wrote: As it was kind of annoying not to have this when playing around with event triggers. This also tightens up the existing tab completion for ALTER TRIGGER, which contained redundant code for table name completion, and which was also causing a spurious RENAME TO to be inserted in this context: CREATE EVENT TRIGGER foo ON {event} ^I Apologies, previous patch had some unrelated changes in it. Correct patch attached. This *still* has some unrelated things in it, like s/Pgsql/Postgres/, and numerous hunks consisting entirely of whitespace changes and/or changes to unrelated comments. Apologies again, that was ill-thought-out. Revised patch attached with only the additions related to event triggers, and the small fix for ALTER TRIGGER mentioned above, which ensures RENAME TO is offered only when ALTER TRIGGER name ON sth was input; currently there is no check for a preceding ALTER, resulting in the spurious RENAME TO when completing CREATE EVENT TRIGGER. Also, what's the point of this hunk:

*** psql_completion(const char *text, int st
*** 1318,1340 ****
               pg_strcasecmp(prev2_wd, "TRIGGER") == 0)
          COMPLETE_WITH_CONST("ON");
-     else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
-              pg_strcasecmp(prev3_wd, "TRIGGER") == 0)
-     {
-         completion_info_charp = prev2_wd;
-         COMPLETE_WITH_QUERY(Query_for_list_of_tables_for_trigger);
-     }
-     /*
!      * If we have ALTER TRIGGER <sth> ON, then add the correct tablename
       */
      else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
               pg_strcasecmp(prev3_wd, "TRIGGER") == 0 &&
               pg_strcasecmp(prev_wd, "ON") == 0)
!         COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);
      /* ALTER TRIGGER <name> ON <name> */
!     else if (pg_strcasecmp(prev4_wd, "TRIGGER") == 0 &&
               pg_strcasecmp(prev2_wd, "ON") == 0)
          COMPLETE_WITH_CONST("RENAME TO");
--- 1355,1374 ----
               pg_strcasecmp(prev2_wd, "TRIGGER") == 0)
          COMPLETE_WITH_CONST("ON");
      /*
!      * If we have ALTER TRIGGER <name> ON, then add the correct tablename
       */
      else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
               pg_strcasecmp(prev3_wd, "TRIGGER") == 0 &&
               pg_strcasecmp(prev_wd, "ON") == 0)
!     {
!         completion_info_charp = prev2_wd;
!         COMPLETE_WITH_QUERY(Query_for_list_of_tables_for_trigger);
!     }
      /* ALTER TRIGGER <name> ON <name> */
!     else if (pg_strcasecmp(prev5_wd, "ALTER") == 0 &&
!              pg_strcasecmp(prev4_wd, "TRIGGER") == 0 &&
               pg_strcasecmp(prev2_wd, "ON") == 0)
          COMPLETE_WITH_CONST("RENAME TO");

I'll submit that as a separate patch. This was intended to fix this:

    else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
             pg_strcasecmp(prev3_wd, "TRIGGER") == 0)
    {
        completion_info_charp = prev2_wd;
        COMPLETE_WITH_QUERY(Query_for_list_of_tables_for_trigger);
    }
    /*
     * If we have ALTER TRIGGER <sth> ON, then add the correct tablename
     */
    else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
             pg_strcasecmp(prev3_wd, "TRIGGER") == 0 &&
             pg_strcasecmp(prev_wd, "ON") == 0)
        COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);

as the second else if clause is redundant.
Regards

Ian Barwick

-- Ian Barwick                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
new file mode 100644
index 202ffce..6d26ffc
*** a/src/bin/psql/tab-complete.c
--- b/src/bin/psql/tab-complete.c
*** static const SchemaQuery Query_for_list_
*** 714,719 ****
--- 714,724 ----
  "   FROM pg_catalog.pg_prepared_statements "\
  "  WHERE substring(pg_catalog.quote_ident(name),1,%d)='%s'"

+ #define Query_for_list_of_event_triggers \
+ " SELECT pg_catalog.quote_ident(evtname) "\
+ "   FROM pg_catalog.pg_event_trigger "\
+ "  WHERE substring(pg_catalog.quote_ident(evtname),1,%d)='%s'"
+
  /*
   * This is a list of all "things" in Pgsql, which can show up after CREATE or
   * DROP; and there is also a query to get a list of them.
*** static const pgsql_thing_t words_after_c
*** 746,751 ****
--- 751,757 ----
      {"DATABASE", Query_for_list_of_databases},
      {"DICTIONARY", Query_for_list_of_ts_dictionaries, NULL, THING_NO_SHOW},
      {"DOMAIN", NULL, &Query_for_list_of_domains},
+     {"EVENT TRIGGER", NULL, NULL},
      {"EXTENSION", Query_for_list_of_extensions},
      {"FOREIGN DATA WRAPPER", NULL, NULL},
      {"FOREIGN TABLE",
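For context, these are the statements the new completions target; a small sketch (the trigger and function names are hypothetical):

    -- After "CREATE EVENT TRIGGER <name> ON ", psql can offer the event names:
    CREATE EVENT TRIGGER my_evt ON ddl_command_start
        EXECUTE PROCEDURE my_evt_func();   -- function must be declared RETURNS event_trigger
    -- Existing event trigger names are completed after ALTER/DROP:
    ALTER EVENT TRIGGER my_evt DISABLE;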
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 10 April 2014 01:13, Florian Pflug f...@phlo.org wrote: Also, the *only* reason that SUM(int2|int4) cannot use int8 as its transition type is that it needs to return NULL, not 0, if zero rows were aggregated. It might seem that it could just use int8 as state with initial value NULL, but that only works if the transition functions are non-strict (since the special case of state type == input type doesn't apply here). And non-strict transition functions need to deal with NULL inputs themselves, which means counting the number of non-NULL inputs... That problem would go away though if we had support for an initial function, which would receive the first input value and initialize the state. In the case of SUM(), the initial function would be strict, and thus would be called on the first non-NULL input value, which it'd convert to int8 and return as the new state. Ah snap! However, I still believe the best approach at this point is to just work on making int4_avg_accum faster. I still see no principled reason why it has to be noticeably slower - the only additional work it absolutely *has* to perform is *one* 64-bit increment. In the best case that would make sum() not noticeably slower than avg(), whereas using a firsttrans/initial function would potentially make both of them faster than they currently are, and not just in window queries. Also, IMO a first/initial function leads to a cleaner separation of logic, allowing some of the transition functions to be simplified rather than becoming more complex. Regards, Dean
Re: [HACKERS] New option in pg_basebackup to exclude pg_log files during base backup
On Thursday, Apr 10, 2014 at 1:15 AM, Álvaro Herrera wrote: Magnus Hagander wrote: On Wed, Apr 9, 2014 at 4:55 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: So it'd be an array, and by default you'd have something like: basebackup_skip_path = $log_directory ? Maybe use it to skip backup labels by default as well: basebackup_skip_path = $log_directory, $backup_label_files I hadn't considered any details, but yes, something along that line. And then you could also include arbitrary filenames or directories should you want, e.g. if you use the data directory to store your torrents or something. Man, that's a great idea. Database servers have lots of disk space in that partition, so it should work really well. Thanks! Yes, it sounds like a good idea. I will look into this and start working on it sometime. Thanks Regards, Vaishnavi Fujitsu Australia
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
* Tom Lane (t...@sss.pgh.pa.us) wrote: That sounds a bit confused. It was --- I clearly hadn't followed the thread entirely. The quals aren't getting to see rows they shouldn't be allowed to see. The issue rather is whether rows that the user quals would exclude might get locked anyway, because the locking happens before those quals are checked. This, imv, moves this from a 'major' bug to more of a 'performance' or 'obnoxious' side-effect issue that we should address *eventually*, but not something to hold up the RLS implementation. We often have more locking than is strictly required and then reduce it later (see also: ALTER TABLE lock reduction thread). It will be an unfortunate gotcha for a release or two, but hopefully we'll be able to address it... [ thinks for awhile... ] I guess the thing that is surprising is that ordinarily, if you say SELECT ... WHERE ... FOR UPDATE, only rows passing the WHERE clause get locked. If there's a security view involved, that gets reversed (at least for unsafe clauses). So from that standpoint it does seem pretty bogus. I'm not sure how much we can do about it given the current implementation technique for security views. A physical row lock requires access to the source relation and the ctid column, neither of which are visible at all outside an un-flattened subquery. Interesting. I'm trying to reason out why we don't have it already in similar situations; we can't *always* flatten a subquery... Updatable security_barrier views and RLS may make this a more prominent issue, but hopefully that'll just encourage work on fixing it (perhaps take the higher-level lock and then figure out a way to drop it when the rows don't match the later quals..?). Anyway, this is definitely a pre-existing issue with security_barrier views (or really with any unflattenable subquery), and so I'm not sure if it has much to do with the patch at hand. Agreed. Thanks! Stephen
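A minimal sketch of the gotcha under discussion, with the updatable-security-barrier-view patch applied (all object names hypothetical):

    CREATE TABLE t (id int, visible boolean, val int);
    CREATE VIEW v WITH (security_barrier) AS
        SELECT id, val FROM t WHERE visible;
    -- some_check(int) is a hypothetical user function; not being marked
    -- LEAKPROOF, it is evaluated above the un-flattened subquery, after
    -- the row lock is taken, so rows it would reject (but which pass the
    -- security qual) may get locked anyway:
    SELECT * FROM v WHERE some_check(val) FOR UPDATE;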
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
Stephen Frost sfr...@snowman.net writes: Interesting. I'm trying to reason out why we don't have it already in similar situations; we can't *always* flatten a subquery... I think we do, just nobody's particularly noticed (which further reduces the urgency of finding a solution, I suppose). regards, tom lane
Re: [HACKERS] four minor proposals for 9.5
On Tue, Apr 8, 2014 at 9:07 PM, Pavel Stehule pavel.steh...@gmail.com wrote: 2014-04-08 6:27 GMT+02:00 Amit Kapila amit.kapil...@gmail.com: So do you want to just print lock time for error'd statements? Won't it be better to do it for non-error'd statements as well; in fact, I feel it can be more useful for non-error statements. Do we already have some easy way to get wait-time for non-error statements? There are two points: a) we have no infrastructure for attaching specific info to a query other than log_line_prefix. Can't we do that by using log_duration and log_min_duration_statement? For example, if I enable these parameters, below is the log: LOG: duration: 343.000 ms statement: create table t1(c1 int); And I have no idea how to implement it better, and I don't think any new infrastructure (mechanism) is necessary. log_line_prefix increases log size, but it is very well parseable - splunk and similar software have no problem with it. One thing that could happen if we implement total lock time in log_line_prefix is that if the user enables log_lock_waits, then it will start printing duration for each lock wait time; not sure, again it depends on the implementation. b) lock time can be interesting on error statements too - for example - you can cancel locked queries - so you would want to see a lock time and duration for cancelled queries. So this implementation can be sensible too. Agreed, I just said it will be quite useful for non-error'd long-running statements as well, so it might be a good idea to see if we can implement it for successful statements as well. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
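For comparison, the knobs that exist today can already be combined in a superuser session; a minimal sketch:

    SET log_min_duration_statement = 0;  -- log "duration: ... ms statement: ..." for every completed statement
    SET log_lock_waits = on;             -- log lock waits that exceed deadlock_timeout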
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
On Mon, Apr 7, 2014 at 7:10 PM, Tom Lane t...@sss.pgh.pa.us wrote: Amit Kapila amit.kapil...@gmail.com writes: On Sat, Apr 5, 2014 at 8:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: AFAICS, pg_ctl already reports to stderr if stderr is a tty. This whole issue only comes up when pg_ctl itself is running as a service (which I guess primarily means starting/stopping the postgresql service?). So that puts extra weight on trying to be sure that error messages go to a predictable place; the user's going to be hard-pressed to debug background failures without that. Agreed. I think if this needs to be fixed, then it will be better to do it as per your suggestion of adding a new switch. So I have marked this patch as Returned with Feedback. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
[HACKERS] small typo about comment in xlog.c
Hi, I'm reading xlog.c, and I noticed a typo in the comment for do_pg_abort_backup:

...
10247  * NB: This is only for aborting a non-exclusive backup that doesn't write
10248  * backup_label. A backup started with pg_stop_backup() needs to be finished
10249  * with pg_stop_backup().
...

I think "A backup started with pg_stop_backup()" should be "A backup started with pg_start_backup()". This is just a bug in a source comment, so it's not a big problem, but I want to fix it. See attached patch. regards, -- Tomonari Katsumata

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0b4a5f6..3551d94 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10245,7 +10245,7 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
  * an error handler.
  *
  * NB: This is only for aborting a non-exclusive backup that doesn't write
- * backup_label. A backup started with pg_stop_backup() needs to be finished
+ * backup_label. A backup started with pg_start_backup() needs to be finished
  * with pg_stop_backup().
  */
 void
Re: [HACKERS] Autonomous Transaction (WIP)
On 09 April 2014 12:14, Pavan Deolasee wrote: Whenever I was asked to have a look at implementing this feature, I always wondered about the great amount of global state that a backend maintains which is normally tied to a single top transaction. Since an AT will have the same characteristics as a top-level transaction, I wonder how you plan to separate those global state variables? Sure, we can group them in a structure and put them on a stack when an AT starts and pop them off when the original top transaction becomes active again, but finding all such global state variables is going to be tricky. I could think of a few global variables related to transaction properties (i.e. read-only mode, isolation level etc). As I plan to keep the transaction properties of an autonomous transaction the same as the main transaction's, there is no need to maintain these global variables separately. Apart from this, there are global variables like within-transaction counters, GUCs, and xactStartTimeStamp. I think there is no need to maintain these variables separately either; they can continue from their previous values for an autonomous transaction, similar to what a sub-transaction does. In the case of an autonomous transaction, the only global variables initialized are those related to resources (similar to a sub-transaction), which in any case get stored in the current transaction state. Please let me know if I am missing something or if you have some specific global-variable-related issue in mind. Thanks and Regards, Kumar Rajeev Rastogi
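For readers following along, a purely hypothetical sketch of the intended semantics --- no such syntax exists in PostgreSQL; the statement form and table names are illustration only:

    BEGIN;
        UPDATE accounts SET balance = balance - 100 WHERE id = 1;
        BEGIN AUTONOMOUS TRANSACTION;   -- hypothetical syntax from this WIP discussion
            INSERT INTO audit_log VALUES ('withdrawal attempted');
        COMMIT;                          -- commits independently of the outer transaction
    ROLLBACK;   -- the outer work is undone, but the audit_log row survives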
Re: [HACKERS] [BUG FIX] Compare returned value by socket() against PGINVALID_SOCKET instead of 0
On Tue, Apr 8, 2014 at 11:32 PM, Bruce Momjian br...@momjian.us wrote: On Sun, Apr 6, 2014 at 11:45:59AM +0530, Amit Kapila wrote: In fact, this C program compiled by gcc on Debian issues no compiler warnings and returns 'hello', showing that -1 and ~0 compare as equal:

#include <stdio.h>

int main(int argc, char **argv)
{
    int          i;
    unsigned int j;

    i = -1;
    j = ~0;
    if (i == j)
        printf("hello\n");
    return 0;
}

I have added the below code to check its usage as per PG:

    if (j < 0)
        printf("hello-1\n");

It doesn't print hello-1, which means that all the checks in the code of the form "sock_desc < 0" can have a problem.

1. int pg_foreach_ifaddr(PgIfAddrCallback callback, void *cb_data)
   sock = WSASocket(AF_INET, SOCK_DGRAM, 0, 0, 0, 0);
   if (sock == SOCKET_ERROR)

Well, the actual problem here is that WSASocket() returns INVALID_SOCKET per the documentation, not SOCKET_ERROR. I did not use PGINVALID_SOCKET here because this is Windows-specific code, defining 'sock' as SOCKET. We could have sock defined as pgsocket, but because this is Windows code already, it doesn't seem wise to mix portability code in there. I think it's better to use a check like the below, just as a matter of consistency with the other places:

    if (sock == INVALID_SOCKET)

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Autonomous Transaction (WIP)
On Thu, Apr 10, 2014 at 10:44 AM, Rajeev Rastogi rajeev.rast...@huawei.com wrote: On 09 April 2014 12:14, Pavan Deolasee wrote: Whenever I was asked to have a look at implementing this feature, I always wondered about the great amount of global state that a backend maintains which is normally tied to a single top transaction. Since an AT will have the same characteristics as a top-level transaction, I wonder how you plan to separate those global state variables? Sure, we can group them in a structure and put them on a stack when an AT starts and pop them off when the original top transaction becomes active again, but finding all such global state variables is going to be tricky. I could think of a few global variables related to transaction properties (i.e. read-only mode, isolation level etc). As I plan to keep the transaction properties of an autonomous transaction the same as the main transaction's, there is no need to maintain these global variables separately. Apart from this, there are global variables like within-transaction counters, GUCs, and xactStartTimeStamp. I think there is no need to maintain these variables separately either; they can continue from their previous values for an autonomous transaction, similar to what a sub-transaction does. Hmm. Is that in line with what other databases do? I would have preferred an AT to run like a standalone transaction, without any influence from the starting transaction, managing its own resources/locks/visibility/triggers etc. In the case of an autonomous transaction, the only global variables initialized are those related to resources (similar to a sub-transaction), which in any case get stored in the current transaction state. Please let me know if I am missing something or if you have some specific global-variable-related issue in mind. No, I don't have any specific issues in mind. Mostly all such global state is managed through the various AtStart/AtEOX and related routines, so a careful examination of all those routines will give a good idea of what needs to be handled. You will probably need to write AtATStart/AtATEOX and similar routines to manage the state at AT start/commit/rollback. Sorry, I haven't looked at your WIP patch yet. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] four minor proposals for 9.5
2014-04-10 5:50 GMT+02:00 Amit Kapila amit.kapil...@gmail.com: On Tue, Apr 8, 2014 at 9:07 PM, Pavel Stehule pavel.steh...@gmail.com wrote: 2014-04-08 6:27 GMT+02:00 Amit Kapila amit.kapil...@gmail.com: So do you want to just print lock time for error'd statements? Won't it be better to do it for non-error'd statements as well; in fact, I feel it can be more useful for non-error statements. Do we already have some easy way to get wait-time for non-error statements? There are two points: a) we have no infrastructure for attaching specific info to a query other than log_line_prefix. Can't we do that by using log_duration and log_min_duration_statement? For example, if I enable these parameters, below is the log: LOG: duration: 343.000 ms statement: create table t1(c1 int); Yes, sorry, you are right. I thought about this possibility, and I chose log_line_prefix due to its simple configurability and better parseability. But again, an enhanced log_duration feature can be implemented together with an enhanced log_line_prefix. I'll try to visualise it in a prototype. Regards Pavel And I have no idea how to implement it better, and I don't think any new infrastructure (mechanism) is necessary. log_line_prefix increases log size, but it is very well parseable - splunk and similar software have no problem with it. One thing that could happen if we implement total lock time in log_line_prefix is that if the user enables log_lock_waits, then it will start printing duration for each lock wait time; not sure, again it depends on the implementation. b) lock time can be interesting on error statements too - for example - you can cancel locked queries - so you would want to see a lock time and duration for cancelled queries. So this implementation can be sensible too. Agreed, I just said it will be quite useful for non-error'd long-running statements as well, so it might be a good idea to see if we can implement it for successful statements as well. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com