Re: [HACKERS] delta relations in AFTER triggers

2014-07-28 Thread Pavel Stehule
2014-07-28 19:27 GMT+02:00 Marti Raudsepp :

> On Mon, Jul 28, 2014 at 6:24 PM, Kevin Grittner  wrote:
> > Do you have some other suggestion?  Keep in mind that it must allow
> > the code which will *generate* the transition tables to know
> > whether any of the attached triggers use a given transition table
> > for the specific operation, regardless of the language of the
> > trigger function.
>
> You will need to access the pg_proc record of the trigger function
> anyway, so it's just a matter of coming up with syntax that makes
> sense, right?
>
> What I had in mind was that we could re-use function argument
> declaration syntax. For instance, use the "argmode" specifier to
> declare OLD and NEW. Shouldn't cause grammar conflicts because the
> current OUT and INOUT aren't reserved keywords.
>
> We could also re-use the refcursor type, which already has bindings in
> some PLs, if that's not too much overhead. That would make the
> behavior straightforward without introducing new constructs, plus you
> can pass them around between functions. Though admittedly it's
> annoying to integrate cursor results into queries.
>
> Something like:
>
> CREATE FUNCTION trig(OLD old_rows refcursor, NEW new_rows refcursor)
> RETURNS trigger LANGUAGE plpgsql AS '...';
>

I dislike this proposal - it is strongly inconsistent with the current
trigger design.

Regards

Pavel


>
> Or maybe if the grammar allows, we could spell out "NEW TABLE", "OLD
> TABLE", but that's redundant since you can already deduce that from
> the refcursor type.
>
> It could also be extended for different types, like tid[], and maybe
> "record" for the FOR EACH ROW variant (dunno if that can be made to
> work).
>
> Regards,
> Marti
>


Re: [HACKERS] gaussian distribution pgbench -- splits v4

2014-07-28 Thread Fabien COELHO


Hello Robert,

>> I wish to agree, but my interpretation of the previous code is that
>> they were ignored before, so ISTM that we are stuck with keeping the
>> same unfortunate behavior.
>
> I don't agree.  I'm not in a huge hurry to fix all the places where
> pgbench currently lacks error checks just because I don't have enough to
> do (hint: I do have enough to do), but when we're adding more
> complicated syntax in one particular place, bringing the error checks in
> that portion of the code up to scratch is an eminently sensible thing to
> do, and we should do it.


Ok. I'm in favor of that anyway. It is just that I was afraid that changing 
the behavior, however poor the said behavior, could be a blocker.



> Also, please stop changing the title of this thread every other post.
> It breaks threading for me (and anyone else using gmail), and that
> makes the thread hard to follow.


Sorry. It does not break my mailer, which relies on internal headers, but 
I'll try to be compatible with this gmail "feature" in the future.


--
Fabien.




Re: [HACKERS] whether I can see other alternative plantrees for one query?

2014-07-28 Thread Kyotaro HORIGUCHI
Hi,

> hi, all
>   NOTE: Version is 8.4   Fedora 20 X86_64

Why don't you play on 9.3 or later? 8.4 has just reached EOL.

>   for understanding optimizer's internals, I set debug_print_plan=on
> and created two tables as follows :
> 
> create table Reserves (sid integer, bid integer,day date,rname char(25));
> create table Sailors(sid integer,sname char(25),rating integer,age real);
> 
> and add 1,000,000 records for each.
> 
> and execute the cmd:
> 
> select S.rating,count(*)
> from Sailors S
> where S.rating > 5 and S.age = 20
> group by S.rating;
> 
> but from the log, I only found the final selected planTree, so I want to
> ask:
> what should I do if I want to see the other alternative planTrees?  any
> advice will be appreciated!

Forcing another plan by configuration parameters would help.

http://www.postgresql.org/docs/9.3/static/runtime-config-query.html

For example, "set enable_hashagg to off" makes the planner try
to avoid using HashAggregate for grouping. If you got a plan
using HashAggregate, you will get another one using GroupAggregate
that way.
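
Concretely (a sketch; the plans you actually get will depend on your
data and statistics):

    -- see the plan the planner prefers
    EXPLAIN select S.rating, count(*) from Sailors S
        where S.rating > 5 and S.age = 20 group by S.rating;

    -- forbid hash aggregation and look again
    SET enable_hashagg TO off;
    EXPLAIN select S.rating, count(*) from Sailors S
        where S.rating > 5 and S.age = 20 group by S.rating;
    RESET enable_hashagg;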

What you can do otherwise would be building PG with
CFLAGS="-DOPTIMIZER_DEBUG". This will show you a bit more than
debug_print_plan, but the query you mentioned is so simple that
the planner has almost no alternatives. Creating some index (say,
on age) would give the planner some alternatives.

Have a good day,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] whether I can see other alternative plantrees for one query?

2014-07-28 Thread Tom Lane
土卜皿  writes:
>   NOTE: Version is 8.4   Fedora 20 X86_64

You do realize that's five years out of date?  Not sure why you're running
an end-of-life database on a bleeding-edge OS.

>   for understanding optimizer's internals, I
> *set debug_print_plan=on*
> ...
> but from the log, I only found the final selected planTree, so I want to
> ask:
> what should I do if I want to see the other alternative planTrees?  any
> advice will be appreciated!

The printed plan tree is actually the only one that the planner follows to
completion; other alternatives are discarded as soon as possible to save
useless processing.

For a query as simple as you're showing here, there aren't all that many
alternatives.  You can probably explore all of them by fooling with the
planner parameters (enable_seqscan etc.), repeatedly disallowing whatever
plan the planner thought was cheapest so that it will select and print
the next cheapest.

If you want to look more closely than that, you could add some code to the
add_path() subroutine so that it prints rejected paths --- but they'll
necessarily be pretty abstract and not user-friendly (or perhaps I should
say even less user-friendly than EXPLAIN usually is :-() because the
details simply aren't there.
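
As a sketch of that kind of hack (hypothetical and untested; add_path()
lives in src/backend/optimizer/util/pathnode.c, and accept_new/new_path
follow its variable names):

    /* inside add_path(), at the point where new_path has lost */
    if (!accept_new)
    {
        /* hypothetical debug output for rejected paths */
        elog(DEBUG1, "add_path rejected: pathtype %d, cost %.2f..%.2f",
             (int) new_path->pathtype,
             new_path->startup_cost, new_path->total_cost);
        break;
    }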

There's lots of previous discussion in the PG archives, e.g. here, here,
and here:
http://www.postgresql.org/message-id/flat/CAN3Hncy1X9Zm4gJjGPc4ApYQe0Qs_pjZe=vw0V_J=rma-cl...@mail.gmail.com
http://www.postgresql.org/message-id/flat/canp-bfaraah2f9a55wqsanh4trbeerfp_g3karwc-jlu-kx...@mail.gmail.com
http://www.postgresql.org/message-id/flat/CAFcOn2-9j4fTcJ39xvdCByF6dg3U_=tgumccp1-7sh_j9g+...@mail.gmail.com

I'm fairly sure that I remember seeing some more-completely-worked-out
patches for printing partial paths as they're considered, but my search fu
is failing me at the moment.

regards, tom lane




Re: [HACKERS] 9.4 documentation: duplicate paragraph in logical decoding example

2014-07-28 Thread Fujii Masao
On Mon, Jul 14, 2014 at 2:33 PM, Christoph Moench-Tegeder
 wrote:
> ## Andres Freund (and...@2ndquadrant.com):
>
>> Care to submit a patch for it Christoph?
>
> There it is.

Thanks! Applied.

Regards,

-- 
Fujii Masao




[HACKERS] whether I can see other alternative plantrees for one query?

2014-07-28 Thread 土卜皿
hi, all
  NOTE: Version is 8.4   Fedora 20 X86_64

  for understanding optimizer's internals, I set debug_print_plan=on
and created two tables as follows:

create table Reserves (sid integer, bid integer, day date, rname char(25));
create table Sailors (sid integer, sname char(25), rating integer, age real);

and added 1,000,000 records to each.

and executed the command:

select S.rating, count(*)
from Sailors S
where S.rating > 5 and S.age = 20
group by S.rating;

but from the log, I only found the final selected planTree, so I want to
ask:
what should I do if I want to see the other alternative planTrees?  any
advice will be appreciated!

Dillon Peng


Re: [HACKERS] SKIP LOCKED DATA (work in progress)

2014-07-28 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> It might be better if we'd declared AclMode in a single-purpose header,
>> say utils/aclmode.h, and then #include'd that into parsenodes.h.
>> There's certainly plenty of other single-datatype headers laying about.

> Do you mean src/include/datatype/aclmode.h?

I was thinking src/include/utils/, actually, but maybe datatype/ would
be a good choice.

OTOH, what we've got in there now is just timestamp.h, and IIRC it was put
there because it needed to be accessible from both frontend and backend
contexts.  That would not be true of aclmode.h, so perhaps aclmode.h
doesn't belong there.

regards, tom lane




Re: [HACKERS] [w32] test_shm_mq test suite permanently burns connections slots

2014-07-28 Thread Alvaro Herrera
Robert Haas wrote:

> OK, I think I see the problem.  In EXEC_BACKEND mode,
> SubPostmasterMain() calls InitProcess() before IsBackgroundWorker has
> been set.  InitProcess() therefore pulls the PGPROC for the worker
> from freeProcs rather than bgworkerFreeProcs.  By exit time,
> IsBackgroundWorker has been set, so the PGPROC gets put back on the
> bgworkerFreeProcs list.  Eventually there are no regular PGPROCs left;
> they've all been moved to the bgworkerFreeProcs list.

Doh.  I'm surprised -- I tested a worker that crashed over and over to
ensure PGPROCs were reused sanely.  I guess I forgot to run it under
EXEC_BACKEND.

Are you fixing it?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] SKIP LOCKED DATA (work in progress)

2014-07-28 Thread Alvaro Herrera
David Rowley wrote:

> I've also been looking at the isolation tests and I see that you've added a
> series of tests for NOWAIT. I was wondering why you did that as that's
> really existing code, probably if you thought the tests were a bit thin
> around NOWAIT then maybe that should be a separate patch?

The isolation tester is new so we don't have nearly enough tests for it.
Adding more meaningful tests is good even if they're unrelated to the
patch at hand.

FWIW you can use configure --enable-coverage and "make coverage-html" to
get coverage reports.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] SKIP LOCKED DATA (work in progress)

2014-07-28 Thread Alvaro Herrera
Tom Lane wrote:

> It might be better if we'd declared AclMode in a single-purpose header,
> say utils/aclmode.h, and then #include'd that into parsenodes.h.
> There's certainly plenty of other single-datatype headers laying about.

Do you mean src/include/datatype/aclmode.h?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Reminder: time to stand down from 8.4 maintenance

2014-07-28 Thread Andrew Dunstan


I have removed it from the buildfarm server's branches_of_interest.txt.

Buildfarm members that rely on this file won't need to take any action, 
except possibly to clean up their build root.


cheers

andrew


On 07/28/2014 07:41 PM, Tom Lane wrote:

> PG 8.4.x is EOL as of last week's releases, so it's time to remove that
> branch from any auto-update scripts you might have, reconfigure buildfarm
> members that are force-building it, etc.
>
> 	regards, tom lane



Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-07-28 Thread Peter Geoghegan
On Mon, Jul 28, 2014 at 5:14 PM, Wim Lewis  wrote:
> A quick glance at OSX's strxfrm() suggests they're using an implementation of 
> strxfrm() from FreeBSD. You can find the source here:
>
> 
> http://www.opensource.apple.com/source/Libc/Libc-997.90.3/string/FreeBSD/strxfrm.c
>
> (and a really quick glance at the contents of libc on OSX 10.9 reinforces 
> this--- I don't see any calls into their CoreFoundation unicode string APIs.)

Something isn't quite accounted for, then. The FreeBSD behavior is to
append the primary weights only. That makes their returned blobs
smaller than those you'll see on Linux, but also appears to imply that
their implementation is substandard (The PostgreSQL port uses ICU on
FreeBSD for a reason, I suppose). FreeBSD did not add extra, redundant
"header bytes" right in the primary level when I tested it,
but I'm told Mac OS X does. I guess it could be that the collations
shipped differ, but I can't think why that would be. It does seem
peculiar that the Mac OS X blobs are always printable, whereas that
isn't the case with Glibc (the only restriction like that is that
there are no NULL bytes), and the Unicode algorithm standard
specifically says that that's okay.

-- 
Peter Geoghegan




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-07-28 Thread Wim Lewis
On 28 Jul 2014, at 4:57 PM, Peter Geoghegan wrote:
> [...] Then
> again, who knows? The Mac OS X behavior seems totally arbitrary to me.
> If I had to guess I'd say it has something to do with their providing
> an open standard shim to a UTF-16 based proprietary API.

A quick glance at OSX's strxfrm() suggests they're using an implementation of 
strxfrm() from FreeBSD. You can find the source here:


http://www.opensource.apple.com/source/Libc/Libc-997.90.3/string/FreeBSD/strxfrm.c

(and a really quick glance at the contents of libc on OSX 10.9 reinforces 
this--- I don't see any calls into their CoreFoundation unicode string APIs.)







Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL

2014-07-28 Thread Peter Geoghegan
On Mon, Jul 28, 2014 at 9:41 AM, Merlin Moncure  wrote:
>> I don't think a buildfarm animal that doesn't run the actual upstream
>> code is a good idea. That'll make it a lot harder to understand what's
>> going on when something breaks after a commit.  It'd also require the
>> custom patches being rebased ontop of $branch before every run...
>
> hm. oh well.  maybe if there was a separate page for custom builds
> (basically, an unsupported section).

I think that's a bad idea. The QNX OS seems to be mostly used in
safety-critical systems; it has a microkernel design. I think it would
be particularly bad to have iffy support for something like that.


-- 
Peter Geoghegan




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-07-28 Thread Peter Geoghegan
On Mon, Jul 28, 2014 at 4:41 PM, Peter Geoghegan  wrote:
> Actually, come to think of it that might not quite be true.

Another issue is that we might just happen to use the "C" locale when
the AC_TRY_RUN program is invoked, which probably doesn't exhibit the
broken behavior of Mac OS X, since at least with glibc on Linux that
leaves you with a blob exactly matching the original string. Then
again, who knows? The Mac OS X behavior seems totally arbitrary to me.
If I had to guess I'd say it has something to do with their providing
an open standard shim to a UTF-16 based proprietary API.

-- 
Peter Geoghegan




[HACKERS] Reminder: time to stand down from 8.4 maintenance

2014-07-28 Thread Tom Lane
PG 8.4.x is EOL as of last week's releases, so it's time to remove that
branch from any auto-update scripts you might have, reconfigure buildfarm
members that are force-building it, etc.

regards, tom lane




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-07-28 Thread Peter Geoghegan
On Sun, Jul 27, 2014 at 12:34 PM, Peter Geoghegan  wrote:
> It's more or less testing for a primary weight level (i.e. the first
> part of the blob) that is no larger than the original characters of
> the string, and has no "header bytes" or other redundancies.  It also
> matches secondary and subsequent weight levels to ensure that they
> match, since the two strings tested have identical case, use of
> diacritics, etc (they're both lowercase ASCII-safe strings). I don't
> set a locale, but that shouldn't matter.

Actually, come to think of it that might not quite be true. Consider
this output from Robert's strxfrm test program:

pg@hamster:~/code$ ./strxfrm hu_HU.utf8 potyty
"potyty" -> 2826303001090909090109090909 (14 bytes)
pg@hamster:~/code$ ./strxfrm hu_HU.utf8 potyta
"potyta" -> 2826302e0c010909090909010909090909 (17 bytes)

This is a very esoteric Hungarian collation rule [1], which at one
point we found we had to plaster over within varstr_cmp() to prevent
indexes giving wrong answers [2]. It turns out that with this
collation, strcoll("potyty", "potty") == 0. The point specifically is
that in principle, collations can alter the number of weights that
appear in the primary level of the blob. This might imply that the
number of primary level bytes for the ASCII-safe string "abcdefgh"
might not equal those of "ijklmnop" for some collation, because of the
application of some similar esoteric rule. In principle, collations
are at liberty to make that happen, even though this hardly ever
occurs in practice (we first heard about it in 2005, although the
Unicode algorithm standard warns of this), and even though in any case
where it does occur it probably happens not to affect my little
AC_TRY_RUN program. Even so, I'm not comfortable with the
deficiency of the program. I don't want my optimization to
accidentally not apply just because some hypothetical collation where
this is true was used when Postgres was built. It probably couldn't
happen, but I must admit guaranteeing that it can't is a mess.
I suppose I could fix this by no longer assuming that the number of
bytes that appear in the primary level is fixed at n for n original
ASCII code points. I think that in theory even that could
break, though, because we have no principled way of parsing out
different weight levels (the Unicode standard has some ideas about how,
but given strxfrm()'s "no NULL bytes in blob" restriction, that's
clearly implementation defined).
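
For anyone who wants to reproduce blobs like the ones above, a minimal
test program along the lines of Robert's might look like this (a sketch
from memory, not his actual code):

    #include <locale.h>
    #include <stdio.h>
    #include <string.h>

    int
    main(int argc, char **argv)
    {
        char    buf[1024];      /* assumes the blob fits */
        size_t  len,
                i;

        if (argc != 3 || setlocale(LC_COLLATE, argv[1]) == NULL)
        {
            fprintf(stderr, "usage: %s locale string\n", argv[0]);
            return 1;
        }
        len = strxfrm(buf, argv[2], sizeof(buf));
        printf("\"%s\" -> ", argv[2]);
        for (i = 0; i < len; i++)
            printf("%02x", (unsigned char) buf[i]);
        printf(" (%zu bytes)\n", len);
        return 0;
    }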

Given that Mac OS X is the only platform that appears to have this
header byte problem at all, I think we'd be better off specifically
disabling it on Mac OS X. I was very surprised to learn of the problem
on Mac OS X. Clearly it's going against the grain by having the
problem.

[1] http://www.postgresql.org/message-id/43a16bb7.7030...@mage.hu
[2] 
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=656beff59033ccc5261a615802e1a85da68e8fad
-- 
Peter Geoghegan




Re: [HACKERS] SKIP LOCKED DATA (work in progress)

2014-07-28 Thread Thomas Munro
On 27 July 2014 23:19, Thomas Munro  wrote:
> On the subject of isolation tests, I think skip-locked.spec is only
> producing schedules that reach the third of the three 'return
> HeapTupleWouldBlock' statements in heap_lock_tuple.  I will follow up
> with some more thorough isolation tests in the next week or so to
> cover the other two, and some other scenarios and interactions with
> other features.

Now with extra isolation tests so that the three different code
branches that can skip rows are covered.  I temporarily added some
logging lines to double check that the expected branches are reached
by each permutation while developing the specs.  They change the
output and are not part of the patch -- attaching separately.
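
For context, the queue-consumer pattern this targets looks roughly like
this ("jobs" is a hypothetical table; the syntax is per the patch below):

    BEGIN;
    SELECT id, payload FROM jobs
        WHERE done = false
        ORDER BY id
        LIMIT 1
        FOR UPDATE SKIP LOCKED;
    -- ... process the job, mark it done ...
    COMMIT;

Each concurrent worker gets a different unlocked row instead of blocking
on a row another worker has already claimed.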
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 231dc6a..0469705 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -45,7 +45,7 @@ SELECT [ ALL | DISTINCT [ ON ( expression
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
 [ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
-[ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT ] [...] ]
+[ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
 
@@ -1283,7 +1283,7 @@ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
 The locking clause has the general form
 
 
-FOR lock_strength [ OF table_name [, ...] ] [ NOWAIT ]
+FOR lock_strength [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ]
 
 
 where lock_strength can be one of
@@ -1359,11 +1359,17 @@ KEY SHARE
 

 To prevent the operation from waiting for other transactions to commit,
-use the NOWAIT option.  With NOWAIT, the statement
-reports an error, rather than waiting, if a selected row
-cannot be locked immediately.  Note that NOWAIT applies only
-to the row-level lock(s) — the required ROW SHARE
-table-level lock is still taken in the ordinary way (see
+use either the NOWAIT or SKIP LOCKED
+option.  With NOWAIT, the statement reports an error, rather
+than waiting, if a selected row cannot be locked immediately.
+With SKIP LOCKED, any selected rows that cannot be
+immediately locked are skipped.  Skipping locked rows provides an
+inconsistent view of the data, so this is not suitable for general purpose
+work, but can be used to avoid lock contention with multiple consumers
+accessing a queue-like table.  Note that NOWAIT
+and SKIP LOCKED apply only to the row-level lock(s)
+— the required ROW SHARE table-level lock is
+still taken in the ordinary way (see
 ).  You can use
 
 with the NOWAIT option first,
@@ -1386,14 +1392,14 @@ KEY SHARE

 

-Multiple locking
-clauses can be written if it is necessary to specify different locking
-behavior for different tables.  If the same table is mentioned (or
-implicitly affected) by more than one locking clause,
-then it is processed as if it was only specified by the strongest one.
-Similarly, a table is processed
-as NOWAIT if that is specified in any of the clauses
-affecting it.
+Multiple locking clauses can be written if it is necessary to specify
+different locking behavior for different tables.  If the same table is
+mentioned (or implicitly affected) by more than one locking clause, then
+it is processed as if it was only specified by the strongest one.
+Similarly, a table is processed as NOWAIT if that is specified
+in any of the clauses affecting it.  Otherwise, it is processed
+as SKIP LOCKED if that is specified in any of the
+clauses affecting it.

 

@@ -1930,9 +1936,9 @@ SELECT distributors.* WHERE distributors.name = 'Westward';
 PostgreSQL allows it in any SELECT
 query as well as in sub-SELECTs, but this is an extension.
 The FOR NO KEY UPDATE, FOR SHARE and
-FOR KEY SHARE variants,
-as well as the NOWAIT option,
-do not appear in the standard.
+FOR KEY SHARE variants, as well as the NOWAIT
+and SKIP LOCKED options, do not appear in the
+standard.

   
 
diff --git a/doc/src/sgml/sql.sgml b/doc/src/sgml/sql.sgml
index ba92607..57396d7 100644
--- a/doc/src/sgml/sql.sgml
+++ b/doc/src/sgml/sql.sgml
@@ -863,7 +863,7 @@ SELECT [ ALL | DISTINCT [ ON ( expression
 [ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start ]
-[ FOR { UPDATE | SHARE } [ OF table_name [, ...] ] [ NOWAIT ] [...] ]
+[ FOR { UPDATE | SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 
 
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 35f9404..51e4d28 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4091,7 +4091,7 @@ get_mxact_status_for_lock(LockTupleMode mode, bool is_update)
  *	cid: curr

Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-28 Thread Peter Geoghegan
On Mon, Jul 28, 2014 at 10:43 AM, Peter Geoghegan  wrote:
> On a mostly unrelated note, I'll remind you of the reason that I felt
> it was best to lock indexes. It wasn't so much about avoiding bloat as
> it was about avoiding deadlocks. When I highlighted the issue,
> Heikki's prototype, which did insert optimistically rather than
> locking, was then made to go and physically "super delete" the
> upsert-insert conflicting heap tuple (inserted optimistically before
> its index tuples), before going to lock a row, in order to avoid
> unprincipled deadlocking. In contrast, my design just used a callback
> that released page level heavyweight locks before going to lock a row.
> Heikki's prototype involved making it so that *even someone else's
> dirty snapshot* didn't see our dead speculatively-inserted heap tuple.
>
> Anyway, making all that happen is fairly invasive to a bunch of places
> that are just as critical as the nbtree code.

I think I should be more concrete about why this is more complicated
than it first appears. Andres said at pgCon that he would still go
with a design where "promise tuples" are inserted into indexes ahead
of any heap tuple (which differs from Heikki's prototype, where heap
tuple insertion occurs first). Accounting for deadlocking issues could
be particularly problematic with that design, since we must kill
tuples from each index in turn before row locking. In any case the
need to efficiently *release* locks must be weighed carefully. It
isn't obvious, but we must release locks if there is a conflict.

After Heikki acknowledged the problem [1], he produced a revision
addressing it. The details of the workaround and a patch were posted
[2]. I think it's fair to say that this area is a lot messier than it
first appears. If anyone wants to propose an alternative design, they
are of course quite welcome to, but I ask that the very real
difficulties with those designs be acknowledged. AFAICT, only Heikki
has independently acknowledged these issues on list.

In case it isn't obvious, let me be clear about what I care about
here. I feel it's important to get something that is easy to reason
about - you write a DML statement, and within certain fairly
reasonable parameters Postgres does the rest. I think we should not
accept something that may even occasionally throw deadlock errors,
or unique violations, or RC-level serialization failures through no
fault of the user. That would be inferior to the plpgsql looping
pattern we promote, which does the right thing and avoids all of this.
It would be awful to have to tell users who hit problems like this
that they should just stop doing so much upserting, or use the old
pattern.
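
That pattern, roughly as the documentation presents it (assuming a
table db(a int primary key, b text)):

    CREATE FUNCTION merge_db(k int, data text) RETURNS void AS $$
    BEGIN
        LOOP
            -- first try to update the key
            UPDATE db SET b = data WHERE a = k;
            IF found THEN
                RETURN;
            END IF;
            -- not there, so try to insert; if someone else inserts the
            -- same key concurrently, we get a unique-key failure and loop
            BEGIN
                INSERT INTO db (a, b) VALUES (k, data);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- do nothing, and loop to try the UPDATE again
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;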

[1] http://www.postgresql.org/message-id/52b4aaf0.5090...@vmware.com
[2] http://www.postgresql.org/message-id/52d00d2d.6030...@vmware.com

-- 
Peter Geoghegan




Re: [HACKERS] [w32] test_shm_mq test suite permanently burns connections slots

2014-07-28 Thread Robert Haas
On Mon, Jul 28, 2014 at 3:59 PM, Robert Haas  wrote:
> On Fri, Jul 25, 2014 at 3:25 PM, Noah Misch  wrote:
>> On a Windows or other EXEC_BACKEND build, the following eventually gets
>> failures because all, or all but one, max_connections slot is consumed:
>>
>>   for run in `seq 1 100`; do make -C contrib/test_shm_mq installcheck; done
>>
>> When I use max_connections=40, it fails on the sixth iteration.  Only the six
>> basic processes are actually running at that time.
>
> The tests start 7 workers each time, so that makes sense: 7 * 5 < 40
> but 7 * 6 > 40.  What I'm not sure is why they are leaking connection
> slots, and why they're only doing it in EXEC_BACKEND mode.  A quick
> code audit didn't uncover any obvious explanation, so I'll try to
> reproduce and debug.

OK, I think I see the problem.  In EXEC_BACKEND mode,
SubPostmasterMain() calls InitProcess() before IsBackgroundWorker has
been set.  InitProcess() therefore pulls the PGPROC for the worker
from freeProcs rather than bgworkerFreeProcs.  By exit time,
IsBackgroundWorker has been set, so the PGPROC gets put back on the
bgworkerFreeProcs list.  Eventually there are no regular PGPROCs left;
they've all been moved to the bgworkerFreeProcs list.
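
A sketch of the obvious kind of fix -- set the flag before InitProcess()
runs; the argv test here is from memory and may not match the code
exactly:

    /* in SubPostmasterMain(), before the InitProcess() call */
    if (strncmp(argv[1], "--forkbgworker", 14) == 0)
        IsBackgroundWorker = true;  /* so InitProcess() draws from
                                     * bgworkerFreeProcs */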

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Fix src/backend/storage/smgr/README

2014-07-28 Thread Tom Lane
=?UTF-8?Q?Fabr=C3=ADzio_de_Royes_Mello?=  writes:
> Small fix in src/backend/storage/smgr/README about where fork
> numbers are assigned.

Ah, looks like I missed that reference when I moved the enum :-(.
Thanks, will fix.

regards, tom lane




[HACKERS] Fix src/backend/storage/smgr/README

2014-07-28 Thread Fabrízio de Royes Mello
Hi all,

Small fix in src/backend/storage/smgr/README about where fork numbers
are assigned.

Regards,

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
diff --git a/src/backend/storage/smgr/README b/src/backend/storage/smgr/README
index 12df42a..37ed40b 100644
--- a/src/backend/storage/smgr/README
+++ b/src/backend/storage/smgr/README
@@ -50,7 +50,7 @@ independently of the main data file, while still treating it all as a single
 physical relation in system catalogs.
 
 It is assumed that the main fork, fork number 0 or MAIN_FORKNUM, always
-exists. Fork numbers are assigned in src/include/storage/relfilenode.h.
+exists. Fork numbers are assigned in src/include/common/relpath.h.
 Functions in smgr.c and md.c take an extra fork number argument, in addition
 to relfilenode and block number, to identify which relation fork you want to
 access. Since most code wants to access the main fork, a shortcut version of



Re: [HACKERS] [w32] test_shm_mq test suite permanently burns connections slots

2014-07-28 Thread Robert Haas
On Fri, Jul 25, 2014 at 3:25 PM, Noah Misch  wrote:
> On a Windows or other EXEC_BACKEND build, the following eventually gets
> failures because all, or all but one, max_connections slot is consumed:
>
>   for run in `seq 1 100`; do make -C contrib/test_shm_mq installcheck; done
>
> When I use max_connections=40, it fails on the sixth iteration.  Only the six
> basic processes are actually running at that time.

The tests start 7 workers each time, so that makes sense: 7 * 5 < 40
but 7 * 6 > 40.  What I'm not sure is why they are leaking connection
slots, and why they're only doing it in EXEC_BACKEND mode.  A quick
code audit didn't uncover any obvious explanation, so I'll try to
reproduce and debug.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-07-28 Thread Andres Freund
On 2014-07-28 15:29:57 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > One thing I am wondering about around this is: Why are we only
> > processing catchup events when DoingCommandRead? There's other paths
> > where we can wait for data from the client for a long time. Obviously we
> > don't want to process async.c stuff from inside copy, but I don't see
> > why that's the case for sinval.c.
> 
> It might be all right to do it during copy, but I'd just as soon treat
> that as a separate issue.  If you merge it into the basic patch then it
> might be hard to get rid of if there are problems.

Yea, not planning to merge it. Just wondering to make sure I understand
all the implications.

Another thing I'm wondering about - also not for the basic patch - is
accepting termination while writing to the client. It's rather annoying
that pg_terminate_backend() currently has no effect while we're writing
to the client.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-07-28 Thread Tom Lane
Andres Freund  writes:
> One thing I am wondering about around this is: Why are we only
> processing catchup events when DoingCommandRead? There's other paths
> where we can wait for data from the client for a long time. Obviously we
> don't want to process async.c stuff from inside copy, but I don't see
> why that's the case for sinval.c.

It might be all right to do it during copy, but I'd just as soon treat
that as a separate issue.  If you merge it into the basic patch then it
might be hard to get rid of if there are problems.

regards, tom lane




Re: [HACKERS] postgresql.auto.conf and reload

2014-07-28 Thread Josh Berkus
On 07/28/2014 11:03 AM, Fujii Masao wrote:
> On Sat, Jul 26, 2014 at 1:07 PM, Amit Kapila  wrote:
>> On Fri, Jul 25, 2014 at 6:11 PM, Fujii Masao  wrote:
>>> On Wed, Jul 9, 2014 at 11:05 PM, Amit Kapila 
>>> wrote:
 Okay. As mentioned upthread, I have fixed by ensuring that for duplicate
 config params, retain only which comes later during parsing.
 I think it might have been bit simpler to fix, if we try to fix after
 parsing
 is complete, but I think for that we might need to replicate the logic
 at multiple places.
>>>
>>> ISTM that the patch works fine. Only concern is that the logic needs
>>> O(n^2) comparison, which may cause performance problem. But
>>> "n" in O(n^2) is the number of uncommented parameters and I don't
>>> think it's so big, ISTM I can live with the logic...
>>
>> Thanks for reviewing the patch.  I also think that having O(n^2)
>> comparisons should not be a problem in this logic as it will be processed
>> only during load/parse of config file which we don't do in performance
>> sensitive path.
> 
> Yep.
> 
> There is another side effect of this patch. With the patch, when reloading
> the configuration file, the server cannot warn about an invalid setting value
> if it's not the last setting of the parameter. This may cause a problematic
> situation as follows.
> 
> 1. ALTER SYSTEM SET work_mem TO '1024kB';
> 2. Reload the configuration file ---> success
> 3. Then, a user accidentally adds "work_mem = '2048KB'" into postgresql.conf
>  The setting value '2048KB' is invalid, and the unit should be
> 'kB' instead of 'KB'.
> 4. Reload the configuration file ---> success
>  The invalid setting is ignored because the setting of work_mem in
>  postgresql.auto.conf is preferred. So a user cannot notice that
> postgresql.conf
>  has an invalid setting.
> 5. Failover on shared-disk HA configuration happens, then PostgreSQL fails to
> start up because of such an invalid setting. When PostgreSQL starts up,
> the last setting is preferred, but all the settings are checked.
> 
> Can we live with this issue?

I'd think so, yes.  That's a pretty extreme corner case.

Also, it's my perspective that users who change conf by concurrently
editing pg.conf *and* doing ALTER SYSTEM SET are hopeless anyway.
There's simply no way we can protect them from themselves.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] postgresql.auto.conf and reload

2014-07-28 Thread Fujii Masao
On Sat, Jul 26, 2014 at 1:07 PM, Amit Kapila  wrote:
> On Fri, Jul 25, 2014 at 6:11 PM, Fujii Masao  wrote:
>> On Wed, Jul 9, 2014 at 11:05 PM, Amit Kapila 
>> wrote:
>> > Okay. As mentioned upthread, I have fixed by ensuring that for duplicate
>> > config params, retain only which comes later during parsing.
>> > I think it might have been bit simpler to fix, if we try to fix after
>> > parsing
>> > is complete, but I think for that we might need to replicate the logic
>> > at multiple places.
>>
>> ISTM that the patch works fine. Only concern is that the logic needs
>> O(n^2) comparison, which may cause performance problem. But
>> "n" in O(n^2) is the number of uncommented parameters and I don't
>> think it's so big, ISTM I can live with the logic...
>
> Thanks for reviewing the patch.  I also think that having O(n^2)
> comparisons should not be a problem in this logic as it will be processed
> only during load/parse of config file which we don't do in performance
> sensitive path.

Yep.

There is another side effect of this patch. With the patch, when reloading
the configuration file, the server cannot warn about an invalid setting value
if it's not the last setting of the parameter. This may cause a problematic
situation as follows.

1. ALTER SYSTEM SET work_mem TO '1024kB';
2. Reload the configuration file ---> success
3. Then, a user accidentally adds "work_mem = '2048KB'" into postgresql.conf
 The setting value '2048KB' is invalid, and the unit should be
'kB' instead of 'KB'.
4. Reload the configuration file ---> success
 The invalid setting is ignored because the setting of work_mem in
 postgresql.auto.conf is preferred. So a user cannot notice that
postgresql.conf
 has an invalid setting.
5. Failover on shared-disk HA configuration happens, then PostgreSQL fails to
start up because of such an invalid setting. When PostgreSQL starts up,
the last setting is preferred, but all the settings are checked.
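
For reference, the unit check that eventually trips looks like this when
hit directly (message text from memory, so it may differ slightly):

    postgres=# SET work_mem = '2048KB';
    ERROR:  invalid value for parameter "work_mem": "2048KB"
    HINT:  Valid units for this parameter are "kB", "MB", and "GB".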

Can we live with this issue?

Regards,

-- 
Fujii Masao




Re: [HACKERS] pg_background (and more parallelism infrastructure patches)

2014-07-28 Thread Andres Freund
On 2014-07-26 12:20:34 -0400, Robert Haas wrote:
> On Sat, Jul 26, 2014 at 4:37 AM, Andres Freund  wrote:
> > On 2014-07-25 14:11:32 -0400, Robert Haas wrote:
> >> Attached is a contrib module that lets you launch arbitrary command in
> >> a background worker, and supporting infrastructure patches for core.
> >
> > Cool.
> >
> > I assume this 'fell out' of the work towards parallelism? Do you think
> > all of the patches (except the contrib one) are required for that or is
> > some, e.g. 3), only required to demonstrate the others?
>
> I'm fairly sure that patches 3, 4, and 5 are all required in some form
> as building blocks for parallelism.  Patch 1 contains two functions,
> one of which (shm_mq_set_handle) I think is generally useful for
> people using background workers, but not absolutely required; and one
> of which is infrastructure for patch 3 which might not be necessary
> with different design choices (shm_mq_sendv).  Patch 2 is only
> included because pg_background can benefit from it; we could instead
> use an eoxact callback, at the expense of doing cleanup at
> end-of-transaction rather than end-of-query.  But it's a mighty small
> patch and seems like a reasonable extension to the API, so I lean
> toward including it.

Don't get me wrong, I don't object to anything in here. It's just that
seeing the bigger picture helps with giving sensible feedback.

> >> Patch 3 adds the ability for a backend to request that the protocol
> >> messages it would normally send to the frontend get redirected to a
> >> shm_mq.  I did this by adding a couple of hook functions.  The best
> >> design is definitely arguable here, so if you'd like to bikeshed, this
> >> is probably the patch to look at.
> >
> > Uh. This doesn't sound particularly nice. Shouldn't this rather be
> > clearly layered by making reading/writing from the client a proper API
> > instead of adding hook functions here and there?
>
> I don't know exactly what you have in mind here.  There is an API for
> writing to the client that is used throughout the backend, but right
> now "the client" always has to be a socket.  Hooking a couple of parts
> of that API lets us write someplace else instead.  If you've got
> another idea how to do this, suggest away...

What I'm thinking of is providing an actual API for the writes instead
of hooking into the socket API in a couple places. I.e. have something
like

typedef struct DestIO DestIO;

struct DestIO
{
	/* flush any buffered output */
	void (*flush)(struct DestIO *io);
	/* send len bytes */
	int (*putbytes)(struct DestIO *io, const char *s, size_t len);
	/* read up to len bytes (buffer must be writable) */
	int (*getbytes)(struct DestIO *io, char *s, size_t len);
	...
};

and do everything through it. I haven't thought much about the specific
API we want, but abstracting the communication properly instead of
adding hooks here and there is imo much more likely to succeed in the
long term.
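
To make that concrete, a hypothetical shm_mq-backed implementation of
the write side might look like this -- shm_mq_send() is the existing
API, everything else is invented for illustration:

    typedef struct ShmMqDestIO
    {
        DestIO      io;         /* must be the first member */
        shm_mq_handle *mqh;
    } ShmMqDestIO;

    static int
    shm_mq_putbytes(DestIO *io, const char *s, size_t len)
    {
        ShmMqDestIO *self = (ShmMqDestIO *) io;

        /* forward the bytes into the shared memory queue */
        if (shm_mq_send(self->mqh, len, s, false) != SHM_MQ_SUCCESS)
            return EOF;
        return 0;
    }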

> > Also, you seem to have only touched receiving from the client, and not
> > sending back to the subprocess. Is that actually sufficient? I'd expect
> > that for this facility to be fully useful it'd have to be two way
> > communication. But perhaps I'm overestimating what it could be used for.
>
> Well, the basic shm_mq infrastructure can be used to send any kind of
> messages you want between any pair of processes that care to establish
> them.  But in general I expect that data is going to flow mostly in
> one direction - the user backend will launch workers and give them an
> initial set of instructions, and then results will stream back from
> the workers to the user backend.  Other messaging topologies are
> certainly possible, and probably useful for something, but I don't
> really know exactly what those things will be yet, and I'm not sure
> the FEBE protocol will be the right tool for the job anyway.

It's imo not particularly unreasonable to e.g. COPY to/from a bgworker,
which would require the ability to both read and write from the other side.

> But
> error propagation, which is the main thrust of this, seems like a need
> that will likely be pretty well ubiquitous.

Agreed.

> >> This patch also adds a function to
> >> help you parse an ErrorResponse or NoticeResponse and re-throw the
> >> error or notice in the originating backend.  Obviously, parallelism is
> >> going to need this kind of functionality, but I suspect a variety of
> >> other applications people may develop using background workers may
> >> want it too; and it's certainly important for pg_background itself.
> >
> > I would have had use for it previously.
>
> Cool.  I know Petr was interested as well (possibly for the same project?).

Well, I was aware of Petr's project, but I also have my own pet project
I'd been playing with :). Nothing real.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-28 Thread Peter Geoghegan
On Mon, Jul 28, 2014 at 10:43 AM, Peter Geoghegan  wrote:
> Plus,  I ask you to
> consider that.

Excuse me. I meant "Plus, you avoid bloat. I ask you to consider that".

-- 
Peter Geoghegan




Re: [HACKERS] sendLong in custom communication doesn't work

2014-07-28 Thread Robert Haas
On Fri, Jul 25, 2014 at 5:53 AM, Krystian Piećko
 wrote:
> I’m implementing functionality that will pass all the queries native to
> postgresql (those that ask about structures and versions) to the hidden
> postgresql, while other queries I would like to parse myself. I have a big
> problem with outputting a PG_TYPE_INT[248] value. I’m doing it because in my
> system I would like to provide a method to send XML messages via JDBC that
> would be resolved by my software, while other queries would be answered by
> PostgreSQL. I don’t mind the speed and the double processing, I just want to
> learn how to do it.
> Outputting texts works fine.
>
> So when I receive native query like (select * from pg_class) I have active
> JDBC connection to hidden postgresql (9.3) and this query is executed on
> that server. When I get ResultSet from the hidden postgresql I try to output
> the results in postgres form to the output. Example communication that
> returns current date in long (in my opinion) should look like this:
>
> //Example code that outputs current date in long format
> public void sendExampleResponse() {
> server.trace("Query");
> String query = readString();
> System.out.println("query = " + query);
[ ...and there's more... ]

I don't think you're going to get much help writing Java code here; if
you want to discuss the pgsql-jdbc code, you should use that mailing
list rather than this one.  This mailing list would be the right place
for discussions about the wire protocol itself, though, so maybe
someone could give you some advice on that if you were more specific
about what problem you're having.  From the provided information it's
hard to be sure what's going wrong.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-28 Thread Peter Geoghegan
On Mon, Jul 28, 2014 at 8:37 AM, Robert Haas  wrote:
> AFAIUI, this is because your implementation uses lwlocks in a way that
> Andres and I both find unacceptable.

That's not the case. My implementation uses page-level heavyweight
locks. The nbtree AM used to use them for other stuff. Plenty of other
systems use index level locks managed by a heavyweight lock manager.

>>> Here you seem to be suggested that I intended to propose your existing
>>> design rather than something else, which I didn't.  In this design,
>>> you find the conflict (at most one) but scanning for the tuple to be
>>> updated.
>>
>> Yes, but what if you don't see a conflict because it isn't visible to
>> your snapshot, and then you insert, and only then (step 5), presumably
>> with a dirty snapshot, you find a conflict? How does the loop
>> terminate if that brings you back to step 1 with the same MVCC
>> snapshot feeding the update?
>
> Good point.  Maybe the syntax should be something like:
>
> UPSERT table (keycol [, keycol] ...) { VALUES (val [, val] ...) [,
> ...] | select_query }
>
> That would address both the concern about being able to pipe multiple
> tuples through it and the point you just raised.  We look for a row
> that matches each given tuple on the key columns; if one is found, we
> update it; if none is found, we insert.
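
(So, with a hypothetical table counters(id int primary key, val int),
that would mean something like:

    -- proposed syntax, not implemented; matches on "id"
    UPSERT counters (id) VALUES (42, 1);

which updates the row with id = 42 if one exists, and inserts it
otherwise.)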

That basically is my design, except that (tangentially) yours risks
bloat in exchange for not having to use a real locking mechanism, and
has a different syntax. The parts of inserting into an index that I
stagger include an initial part that is more or less just an index
scan. With this design you'll have to set things up so that all
indexes can be directly accessed in the manner of ExecInsert() (get a
list of them from the relcache, open them in an order that avoids
possible deadlocks, etc). Why not just use ExecInsert()? I don't think
I'm alone in seeing things that way.

On a mostly unrelated note, I'll remind you of the reason that I felt
it was best to lock indexes. It wasn't so much about avoiding bloat as
it was about avoiding deadlocks. When I highlighted the issue,
Heikki's prototype, which did insert optimistically rather than
locking, was then made to go and physically "super delete" the
upsert-insert conflicting heap tuple (inserted optimistically before
its index tuples), before going to lock a row, in order to avoid
unprincipled deadlocking. In contrast, my design just used a callback
that released page level heavyweight locks before going to lock a row.
Heikki's prototype involved making it so that *even someone else's
dirty snapshot* didn't see our dead speculatively-inserted heap tuple.

Anyway, making all that happen is fairly invasive to a bunch of places
that are just as critical as the nbtree code. I'm not saying it can't
be done, or even that it definitely shouldn't be, but taking an
approach that produces bloat, rather than locking values the way other
systems do (and, to a limited extent Postgres already does) is at
least way more invasive than it first appears. Plus,  I ask you to
consider that.

-- 
Peter Geoghegan




Re: [HACKERS] delta relations in AFTER triggers

2014-07-28 Thread Marti Raudsepp
On Mon, Jul 28, 2014 at 6:24 PM, Kevin Grittner  wrote:
> Do you have some other suggestion?  Keep in mind that it must allow
> the code which will *generate* the transition tables to know
> whether any of the attached triggers use a given transition table
> for the specific operation, regardless of the language of the
> trigger function.

You will need to access the pg_proc record of the trigger function
anyway, so it's just a matter of coming up with syntax that makes
sense, right?

What I had in mind was that we could re-use function argument
declaration syntax. For instance, use the "argmode" specifier to
declare OLD and NEW. Shouldn't cause grammar conflicts because the
current OUT and INOUT aren't reserved keywords.

We could also re-use the refcursor type, which already has bindings in
some PLs, if that's not too much overhead. That would make the
behavior straightforward without introducing new constructs, plus you
can pass them around between functions. Though admittedly it's
annoying to integrate cursor results into queries.

Something like:

CREATE FUNCTION trig(OLD old_rows refcursor, NEW new_rows refcursor)
RETURNS trigger LANGUAGE plpgsql AS '...';

Or maybe if the grammar allows, we could spell out "NEW TABLE", "OLD
TABLE", but that's redundant since you can already deduce that from
the refcursor type.

It could also be extended for different types, like tid[], and maybe
"record" for the FOR EACH ROW variant (dunno if that can be made to
work).
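
For illustration, a trigger body consuming such cursors might look
something like this (a sketch of the proposal only -- this declaration
syntax does not exist today):

    CREATE FUNCTION trig(OLD old_rows refcursor, NEW new_rows refcursor)
    RETURNS trigger LANGUAGE plpgsql AS $$
    DECLARE
        r record;
    BEGIN
        LOOP
            FETCH new_rows INTO r;
            EXIT WHEN NOT FOUND;
            RAISE NOTICE 'new row: %', r;
        END LOOP;
        RETURN NULL;
    END;
    $$;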

Regards,
Marti




Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-07-28 Thread Andres Freund
On 2014-07-26 20:20:05 +0200, Andres Freund wrote:
> On 2014-07-26 13:58:38 -0400, Tom Lane wrote:
> 
> > Andres Freund  writes:
> > > That'd require either reneging on SA_RESTART or
> > > using WaitLatchOrSocket() and nonblocking send/recv.
> > 
> > Yeah, I was wondering about using WaitLatchOrSocket for client I/O too.
> > We already have a hook that lets us do the actual recv even when using
> > OpenSSL, and in principle that function could do interrupt-service-like
> > functions if it got kicked off the recv().
> 
> I've started playing with this. Looks clearly worthwile.
> 
> I think if we do it right we pretty much can get rid of the whole
> prepare_for_client_read() machinery and handle everything via
> ProcessInterrupts(). EnableCatchupInterrupt() et al don't really fill me
> with joy.

One thing I am wondering about around this is: Why are we only
processing catchup events when DoingCommandRead? There's other paths
where we can wait for data from the client for a long time. Obviously we
don't want to process async.c stuff from inside copy, but I don't see
why that's the case for sinval.c.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED

2014-07-28 Thread Fabrízio de Royes Mello
On Mon, Jul 28, 2014 at 1:41 PM, Christoph Berg  wrote:
>
> Re: Fabrízio de Royes Mello 2014-07-28

> > Is there something I should still do on this patch?
>
> I haven't got around to have a look at the newest incarnation yet, but
> I plan to do that soonish. (Of course that shouldn't stop others from
> doing that as well if they wish.)
>

Thanks!

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL

2014-07-28 Thread Merlin Moncure
On Mon, Jul 28, 2014 at 11:22 AM, Andres Freund  wrote:
> On 2014-07-28 11:19:48 -0500, Merlin Moncure wrote:
>> Maybe step #1 is to get a buildfarm member set up.  Is there any
>> policy against unsupported environments in the buildfarm? (I hope not)
>>
>> You're going to have to run it against a git repository containing
>> your custom patches.  It's a long and uncertain road to getting a new
>> port (re-) accepted, but demonstrated commitment to support is a
>> necessary first step. It will also advertise support for the platform.
>
> I don't think a buildfarm animal that doesn't run the actual upstream
> code is a good idea. That'll make it a lot harder to understand what's
> going on when something breaks after a commit.  It'd also require the
> custom patches being rebased on top of $branch before every run...

hm. oh well.  maybe if there was a separate page for custom builds
(basically, an unsupported section).

merlin




Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED

2014-07-28 Thread Christoph Berg
Re: Fabrízio de Royes Mello 2014-07-28 

> Is there something I should still do on this patch?

I haven't got around to have a look at the newest incarnation yet, but
I plan to do that soonish. (Of course that shouldn't stop others from
doing that as well if they wish.)

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/




Re: [HACKERS] parametric block size?

2014-07-28 Thread Thomas Kellerer
> Possibly stopping at the tablespace level might be more straightforward. 
> To avoid messing up the pages in shared buffers we'd perhaps need 
> something like several shared buffer pools - each with either its own 
> blocksize or associated with a (set of) tablespace(s).

This is exactly how Oracle does it. You can specify the blocksize when
creating a tablespace.

For each blocksize a separate buffer cache ("shared buffers" in Postgres
terms) can be configured. So the cache is not maintained on tablespace level
but on blocksize level.
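
For illustration, the Oracle side of this looks roughly as follows
(sketched from memory, so treat the exact clauses as an assumption rather
than a reference):

  -- size a buffer cache for 32K blocks, then create a tablespace using them
  ALTER SYSTEM SET db_32k_cache_size = 256M;
  CREATE TABLESPACE big_blocks DATAFILE 'big01.dbf' SIZE 1G BLOCKSIZE 32K;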







Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL

2014-07-28 Thread Andres Freund
On 2014-07-28 11:19:48 -0500, Merlin Moncure wrote:
> Maybe step #1 is to get a buildfarm member set up.  Is there any
> policy against unsupported environments in the buildfarm? (I hope not)
> 
> You're going to have to run it against a git repository containing
> your custom patches.  It's a long and uncertain road to getting a new
> port (re-) accepted, but demonstrated commitment to support is a
> necessary first step. It will also advertise support for the platform.

I don't think a buildfarm animal that doesn't run the actual upstream
code is a good idea. That'll make it a lot harder to understand what's
going on when something breaks after a commit.  It'd also require the
custom patches being rebased on top of $branch before every run...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL

2014-07-28 Thread Merlin Moncure
On Fri, Jul 25, 2014 at 3:16 PM, Baker, Keith [OCDUS Non-J&J]
 wrote:
> I propose that a QNX 6.5 port be introduced to PostgreSQL.
>
> I am new to PostgreSQL development, so please bear with me.
>
>
>
> I have made good progress (with 1 outstanding issue, details below):
>
> · I created a QNX 6.5 port of PostgreSQL 9.3.4 which passes
> regression tests.
>
> · I merged my changes into 9.4beta2, and with a few minor changes,
> it passes regression tests.
>
> · QNX support states that QNX 6.5 SP1 binaries run on QNX 6.6
> without modification, which I confirmed with a few quick tests.
>
>
>
> Summary of changes required for PostgreSQL 9.3.4 on QNX 6.5:
>
> · Typical changes required for any new port (template, configure.in,
> dynloader, etc.)
>
> · QNX lacks System V shared memory: I created
> “src/backend/port/posix_shmem.c” which replaces System V calls (shmget,
> shmat, shmdt, …) with POSIX calls (shm_open, mmap, munmap, shm_unlink)
>
> · QNX lacks sigaction SA_RESTART: I modified “src/include/port.h” to
> define macros to retry system calls upon EINTR (open,read,write,…) when
> compiled on QNX
>
> · A few files required addition of #include  on QNX
> (for fd_set).
>
>
>
> Additional changes required for PostgreSQL9.4beta2 on QNX 6.5:
>
> · “DSM” changes introduced in 9.4 (R. Haas) required that I make
> minor updates to my new “posix_shmem.c” code.
>
> · src\include\replication\logical.h: struct LogicalDecodingContext
> field “write” interferes with my “write” retry macro.  Renaming field
> “write” to “do_write” solved this problem.
>
>
>
> Outstanding Issue #1:
>
> src/backend/commands/dbcommands.c :: createdb() complains when copying
> template1 to template0 (apparently a locale issue)
>
> “FATAL:  22023: new LC_CTYPE (C;collate:POSIX;ctype:POSIX) is incompatible
> with the LC_CTYPE of the template database (POSIX;messages:C)”
>
> I would appreciate help from an experienced PostgreSQL hacker to address
> this.
>
> I have temporarily disabled this check on QNX (I can live with the
> assumption/limitation that template0 and template1 contain strictly ASCII).
>
> I can work toward setting up a build farm member should this proposal be
> accepted.
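
(As an aside, an illustrative sketch of the two substitutions described
above -- not the submitted patch, all names invented -- might look like
this:

  #include <errno.h>
  #include <fcntl.h>
  #include <sys/mman.h>
  #include <unistd.h>

  /* POSIX replacement for shmget/shmat: create, size and map a segment. */
  static void *
  create_posix_segment(const char *name, size_t size)
  {
      void *addr;
      int   fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0600);

      if (fd < 0)
          return NULL;
      if (ftruncate(fd, (off_t) size) < 0)
      {
          close(fd);
          shm_unlink(name);
          return NULL;
      }
      addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      close(fd);              /* the mapping survives the close */
      return (addr == MAP_FAILED) ? NULL : addr;
  }

  /* The kind of EINTR retry loop the port.h macros would expand to. */
  static ssize_t
  read_retry(int fd, void *buf, size_t len)
  {
      ssize_t rc;

      do
          rc = read(fd, buf, len);
      while (rc < 0 && errno == EINTR);
      return rc;
  }

)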

Maybe step #1 is to get a buildfarm member set up.  Is there any
policy against unsupported environments in the buildfarm? (I hope not)

You're going to have to run it against a git repository containing
your custom patches.  It's a long and uncertain road to getting a new
port (re-) accepted, but demonstrated commitment to support is a
necessary first step. It will also advertise support for the platform.

merlin




Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2014-07-28 Thread Robert Haas
On Fri, Jul 25, 2014 at 3:39 AM, Albe Laurenz  wrote:
> Shigeru Hanada wrote:
>> * Naming of new behavior
>> You named this optimization "Direct Update", but I'm not sure that
>> this is intuitive enough to express this behavior.  I would like to
>> hear opinions of native speakers.
>
> How about "batch foreign update" or "batch foreign modification"?
> (Disclaimer: I'm not a native speaker either.)

I think direct update sounds pretty good.  "Batch" does not sound as
good to me, since it doesn't clearly describe what makes this patch
special as opposed to some other grouping of updates that happens to
produce a speedup.

Another term that might be used is "update pushdown", since we are
pushing the whole update to the remote server instead of having the
local server participate.  Without looking at the patch, I don't have
a strong opinion on whether that's better than "direct update" in this
context.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Least Active Transaction ID function

2014-07-28 Thread Robert Haas
On Thu, Jul 24, 2014 at 3:42 PM, Rohit Goyal  wrote:
> This was really -2 helpful.

I'm not sure what it means to be -2 helpful.  Hopefully it's a good thing.

> 1. Can I use this xmin variable directly anytime anywhere in my code as it
> is a global variable.

I don't really know what you're asking.  If you want a tutorial on how
global variables work in C, this is the wrong mailing list to ask
about that.

> 2. What is the difference b/w recentXmin and RecentGlobalXmin. I read the
> description but any small detail  can clear my mind. :)

RecentXmin is the oldest transaction ID that was still running as of
the last time it was updated.  RecentGlobalXmin is the oldest
transaction ID that was part of somebody's snapshot as of the last
time it was updated.  Transaction IDs older than RecentXmin can be
assumed not to be running, but there could be still-running
transactions that can't see the effects of some committed transaction
whose ID precedes RecentXmin.  Transaction IDs older than
RecentGlobalXmin are no longer running, and furthermore the effects
of any such transactions which went on to commit are guaranteed to
be visible to the snapshots of all currently-running transactions,
and all future transactions.
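
To make the distinction concrete, a minimal sketch in backend terms
(illustrative only, not the real visibility code):

  #include "access/transam.h"
  #include "utils/snapmgr.h"

  static void
  classify_xid(TransactionId xid)
  {
      if (TransactionIdPrecedes(xid, RecentGlobalXmin))
      {
          /* Not running; if it committed, its effects are visible to
           * all current and future snapshots. */
      }
      else if (TransactionIdPrecedes(xid, RecentXmin))
      {
          /* Not running, but some still-open snapshot may be unable
           * to see its effects even if it committed. */
      }
      else
      {
          /* May still be in progress. */
      }
  }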

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] gaussian distribution pgbench -- splits v4

2014-07-28 Thread Robert Haas
On Wed, Jul 23, 2014 at 12:39 PM, Fabien COELHO  wrote:
>> 3. Similarly, I suggest that the use of gaussian or uniform be an
>> error when argc < 6 OR argc > 6.  I also suggest that the
>> parenthesized distribution type be dropped from the error message in
>> all cases.
>
> I wish to agree, but my interpretation of the previous code is that they
> were ignored before, so ISTM that we are stuck with keeping the same
> unfortunate behavior.

I don't agree.  I'm not in a huge hurry to fix all the places where
pgbench currently lacks error checks just because I don't have enough
to do (hint: I do have enough to do), but when we're adding more
complicated syntax in one particular place, bringing the error checks
in that portion of the code up to scratch is an eminently sensible
thing to do, and we should do it.

Also, please stop changing the title of this thread every other post.
It breaks threading for me (and anyone else using gmail), and that
makes the thread hard to follow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-28 Thread Robert Haas
On Wed, Jul 23, 2014 at 7:35 PM, Peter Geoghegan  wrote:
>> It's certainly arguable whether you should INSERT and then turn failures
>> into an update or try to UPDATE and then turn failures into an INSERT;
>> we might even want to have both options available, though that smells
>> a little like airing too much of our dirty laundry.  But I think I
>> generally favor optimizing for the UPDATE case for more or less the
>> same reasons Kevin articulated.
>
> I don't see the connection between this and Kevin's remarks. And FWIW,
> I don't see a reason to favor inserts or updates. Fortunately, what I
> have balances both cases very well, and doesn't cause bloat. The work
> of descending the index to lock it isn't wasted if an update is
> required. My implementation decides to either insert or update at
> literally the latest possible moment.

AFAIUI, this is because your implementation uses lwlocks in a way that
Andres and I both find unacceptable.  My suspicion is that any version
of this that ends up getting committed is going to involve a risk of
bloat in cases involving retries, and I think it will be easier to
minimize bloat in an update-driven implementation.  But I suppose
that's speculative.

>> Here you seem to be suggesting that I intended to propose your existing
>> design rather than something else, which I didn't.  In this design,
>> you find the conflict (at most one) by scanning for the tuple to be
>> updated.
>
> Yes, but what if you don't see a conflict because it isn't visible to
> your snapshot, and then you insert, and only then (step 5), presumably
> with a dirty snapshot, you find a conflict? How does the loop
> terminate if that brings you back to step 1 with the same MVCC
> snapshot feeding the update?

Good point.  Maybe the syntax should be something like:

UPSERT table (keycol [, keycol] ...)
    { VALUES (val [, val] ...) [, ...] | select_query }

That would address both the concern about being able to pipe multiple
tuples through it and the point you just raised.  We look for a row
that matches each given tuple on the key columns; if one is found, we
update it; if none is found, we insert.
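
A hypothetical invocation, purely to illustrate the intended semantics
(this syntax does not exist anywhere yet):

  UPSERT accounts (id)
      VALUES (1, 'alice'), (2, 'bob');
  -- each tuple is matched on id: if a row with that id exists it is
  -- updated, otherwise the tuple is inserted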

> I agree that you want to uniquely identify each tuple. What I meant
> was, why should we not be able to upsert multiple rows in a single
> command? What's wrong with that?

Nothing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] delta relations in AFTER triggers

2014-07-28 Thread Kevin Grittner
Marti Raudsepp  wrote:
> On Sat, Jul 5, 2014 at 5:38 PM, Kevin Grittner  wrote:
>> it seems to me that we need the full tuple to support triggers on
>> FDWs, so the TID approach would be an optimization for a subset of
>> the cases, and would probably be more appropriate, if we do it at
>> all, in a follow-on patch
>> If you disagree with that assessment, now would be a good
>> time to explain your reasoning.
>
> Maybe I just have a limited imagination because I've never found a use
> for FDWs personally. But recording changes from a trigger on a FDW
> table doesn't seem that useful,

It's a matter of whether AFTER triggers on an FDW can see the 
modified data in table form.  We just recently added the ability 
for FDW triggers to see the data in *row* form; it seemed odd to 
immediately follow that with a new way to get at the data and 
cripple FDW triggers for it.

> since you can only capture changes
> done by the local node. I expect that in many situations there are
> multiple writers accessing the same underlying remote table. Certainly
> it can't guarantee the consistency of materialized views.

While I expect the techniques used here to help with development of 
incremental maintenance of materialized views, this seems like a 
useful feature in its own right.  I think the question is what the 
basis would be for supporting access to the changes in row format 
but not table format for FDWs, if we're supporting both formats for 
other tables.

>> I took a look at whether I could avoid making OLD and NEW
>> non-reserved keywords, but I didn't see how to do that without
>> making FOR at least partially reserved.  If someone sees a way to
>> do this without creating three new unreserved keywords
>> (REFERENCING, OLD, and NEW) I'm all ears.
>
> Sorry, I know I am very late to make this point, so feel free to ignore this.
>
> I'm not a fan of the SQL standard syntax for this feature. One nice
> thing about PostgreSQL's triggers is that you can declare the trigger
> function once and re-use it on many tables. It would make more sense
> if the same function declaration could say what variable/relation
> names it wants to use. They're more like function argument names, not
> some metadata about a table-function relationship.
>
> Putting these in the CREATE TRIGGER command means you have to repeat
> them for each table you want to apply the trigger to. It introduces
> the possibility of making more mistakes without any gain in
> flexibility.
>
> But then again, I understand that there's value in supporting standard
> syntax.

Do you have some other suggestion?  Keep in mind that it must allow 
the code which will *generate* the transition tables to know 
whether any of the attached triggers use a given transition table 
for the specific operation, regardless of the language of the 
trigger function.  Using the standard syntax has the advantage of 
making it pretty easy to put the information exactly where it is 
needed for easy access at run time.
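
For reference, the standard-syntax form under discussion would look
something like this (a sketch assuming the patch's REFERENCING support;
the names are illustrative):

  CREATE TRIGGER t_audit
      AFTER UPDATE ON t
      REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
      FOR EACH STATEMENT
      EXECUTE PROCEDURE t_audit_func();

Here the trigger definition itself tells the system that both transition
tables must be generated for UPDATEs on t, which is exactly the
information the generating code needs at run time.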

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Making joins involving ctid work for the benefit of UPSERT

2014-07-28 Thread Robert Haas
On Wed, Jul 23, 2014 at 7:32 PM, Alvaro Herrera
 wrote:
>> Because nobody wants an operation to either insert 1 tuple or update
>> n>=1 tuples.  The intention is that the predicate should probably be
>> something like WHERE unique_key = 'some_value', but you can use
>> something else.  So it's kinda like saying which index you care about
>> for a particular operation, except without having to explicitly name
>> an index.  But in any event you should use a predicate that uniquely
>> identifies the tuple you want to update.
>
> This seemed a nice idea when I first read it earlier today, but now I'm
> not so sure.  Are you saying that it wouldn't be allowed to use an
> UPSERT with some sort of join, such that each joined row would produce
> either one insert or one update?  To clarify: suppose I import some
> external data into a temp table, then run UPSERT "USING" that table so
> that the rows end up in a permanent table; some of the rows might be
> already in the permanent table, some others might not.  I would hope
> that by the time UPSERT is done, all the rows are in the permanent
> table.  Would that raise an error, with your proposed design?

Yeah, my syntax didn't have a mechanism for that.  I agree we should
have one.  I was just brainstorming.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] parametric block size?

2014-07-28 Thread Robert Haas
On Sat, Jul 26, 2014 at 1:37 PM, Fabien COELHO  wrote:
>> And I don't see that as being warranted at this point. But further
>> benchmarks sound like a good idea.
>
> Yep. A 10% potential performance impact looks worth the investigation.

I wonder, though, whether this isn't using a crowbar where some finer
instrument is called for.  If, for example, bigger heap blocks give
better performance because a bigger I/O size just plain works better,
well then that's interesting in its own right.  But if a bigger or
smaller block size yields better results on index scans, the right
solution might be to change the internal page structure used by that
index.  For example, I remember reading a paper a few years back where
the authors found that large page sizes were inefficient because you
had to do a linear scan of all the items on the page; so they added
some kind of btree-like structure within the page and got great
results.  So the page size itself wasn't the fundamental issue; it had
more to do with what kind of page layout made sense at one page size
vs. another page size.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED

2014-07-28 Thread Fabrízio de Royes Mello
On Wed, Jul 23, 2014 at 5:48 PM, Fabrízio de Royes Mello <
fabriziome...@gmail.com> wrote:
>
>
> On Tue, Jul 22, 2014 at 3:29 PM, Fabrízio de Royes Mello <
fabriziome...@gmail.com> wrote:
> >
> > On Tue, Jul 22, 2014 at 12:01 PM, Fabrízio de Royes Mello <
fabriziome...@gmail.com> wrote:
> > >
> > >
> > >
> > > On Thu, Jul 17, 2014 at 8:02 PM, Andres Freund wrote:
> > > >
> > > > > That means should I "FlushRelationBuffers(rel)" before changing the
> > > > > relpersistence?
> > > >
> > > > I don't think that'd help. I think what this means is that you simply
> > > > cannot change the relpersistence of the old relation before the heap
> > > > swap is successful. So I guess it has to be something like (pseudocode):
> > > >
> > > > OIDNewHeap = make_new_heap(..);
> > > > newrel = heap_open(OIDNewHeap, AEL);
> > > >
> > > > /*
> > > >  * Change the temporary relation to be unlogged/logged. We have to do
> > > >  * that here so buffers for the new relfilenode will have the right
> > > >  * persistency set while the original filenode's buffers won't get read
> > > >  * in with the wrong (i.e. new) persistency setting. Otherwise a
> > > >  * rollback after the rewrite would possibly result with buffers for the
> > > >  * original filenode having the wrong persistency setting.
> > > >  *
> > > >  * NB: This relies on swap_relation_files() also swapping the
> > > >  * persistency. That wouldn't work for pg_class, but that can't be
> > > >  * unlogged anyway.
> > > >  */
> > > > AlterTableChangeCatalogToLoggedOrUnlogged(newrel);
> > > > FlushRelationBuffers(newrel);
> > > > /* copy heap data into newrel */
> > > > finish_heap_swap();
> > > >
> > > > And then in swap_relation_files() also copy the persistency.
> > > >
> > > > That's the best I can come up with right now at least.
> > > >
> > >
> > > Isn't it better if we can set the relpersistence as an argument to
> > > "make_new_heap"?
> > >
> > > I'm thinking of changing make_new_heap:
> > >
> > > From:
> > >
> > >  make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp,
> > >                LOCKMODE lockmode)
> > >
> > > To:
> > >
> > >  make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
> > >                LOCKMODE lockmode)
> > >
> > > That way we can create the new heap with the appropriate relpersistence,
> > > so swap_relation_files() also copies the persistency, of course.
> > >
> > > And after copying the heap data to the new table (ATRewriteTable), change
> > > the relpersistence of the OldHeap's indexes, because they'll be rebuilt
> > > in finish_heap_swap().
> > >
> > > Thoughts?
> > >
> >
> > The attached patch implements my previous idea based on Andres' thoughts.
> >
>
> I didn't like the last version of the patch, especially this part:
>
> +    /* check if SetUnlogged or SetLogged exists in subcmds */
> +    for (pass = 0; pass < AT_NUM_PASSES; pass++)
> +    {
> +        List       *subcmds = tab->subcmds[pass];
> +        ListCell   *lcmd;
> +
> +        if (subcmds == NIL)
> +            continue;
> +
> +        foreach(lcmd, subcmds)
> +        {
> +            AlterTableCmd *cmd = (AlterTableCmd *) lfirst(lcmd);
> +
> +            if (cmd->subtype == AT_SetUnLogged || cmd->subtype == AT_SetLogged)
> +            {
> +                /*
> +                 * Change the temporary relation to be unlogged/logged. We
> +                 * have to do that here so buffers for the new relfilenode
> +                 * will have the right persistency set while the original
> +                 * filenode's buffers won't get read in with the wrong (i.e.
> +                 * new) persistency setting. Otherwise a rollback after the
> +                 * rewrite would possibly result with buffers for the
> +                 * original filenode having the wrong persistency setting.
> +                 *
> +                 * NB: This relies on swap_relation_files() also swapping
> +                 * the persistency. That wouldn't work for pg_class, but
> +                 * that can't be unlogged anyway.
> +                 */
> +                if (cmd->subtype == AT_SetUnLogged)
> +                    newrelpersistence = RELPERSISTENCE_UNLOGGED;
> +
> +                isSetLoggedUnlogged = true;
> +            }
> +        }
> +    }
>
>
> So I did a refactoring, adding new items to AlteredTableInfo to pass the
> information through the phases.
>

Hi all,

Is there anything I should still do on this patch?

Regards

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] Re: [GENERAL] pg_dump behaves differently for different archive formats

2014-07-28 Thread Tom Lane
Stephen Frost  writes:
> If we're going to change this, it seems to me that the only option would
> be to change the dump format...  Just off-the-cuff, I'm wondering if we
> could actually not change the real 'format' but simply promote each ACL
> entry (and similar cases..) to top-level objects and declare that TOC
> entries should be single statements.

I don't think we want even more TOC entries, but it would not be
unreasonable to insist that the statement(s) within a TOC entry be
subdivided somehow.  Essentially the payload of a TOC entry becomes
a list of strings rather than just one string.

That would mean that the problem could not be fixed for existing archive
files; but that seems OK, given the rather small number of complaints
so far.

If we had something like that, I'd be strongly inclined to get rid of
the existing convention whereby comments and ACL commands are separate
TOC entries, and make them part of the parent object's TOC entry (which'd
mean we'd want to label the sub-strings so we can tell whether they are
main object, comment, or ACL).  The fewer TOC entries we can have, the
better; there is no reason why comments/ACLs should be independently
sortable.
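
To illustrate the shape of that idea (hypothetical field names, not
pg_dump's actual structures):

  typedef enum TocDefnKind
  {
      TOC_DEFN_MAIN,        /* the object's own DDL */
      TOC_DEFN_COMMENT,     /* COMMENT ON ... */
      TOC_DEFN_ACL          /* GRANT/REVOKE ... */
  } TocDefnKind;

  typedef struct TocDefn
  {
      TocDefnKind kind;
      char       *sql;      /* exactly one SQL statement */
  } TocDefn;

  /* a TOC entry would then carry a list of TocDefn rather than one
   * opaque definition string */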

regards, tom lane




Re: [HACKERS] building pdfs

2014-07-28 Thread Tom Lane
Andrew Dunstan  writes:
> On 07/27/2014 11:28 AM, Tom Lane wrote:
>> Personally I find the PDF docs to be an anachronism: surely nobody
>> is printing them on dead trees any more, and for on-computer usage,
>> what do they offer that the HTML format doesn't?  So I'm unexcited
>> about making them slightly prettier.

> If they are then maybe there's no point in trying to build them in the 
> buildfarm constantly.

> One advantage that they have over the HTML docs is that they encapsulate 
> the docs in a single file. But then, so does the epub format, which, 
> unlike PDFs, can adapt to display dimensions.

Hm.  I recall Peter was experimenting with building epub format.  Maybe we
could start moving over to that, if there's a mature toolchain.  I keep
fearing we are going to run up against document-size limits in the
jadetex toolchain (and unlike the last time, it may not be so easy to
modify the limits --- there are hard-wired limits inside TeX).

regards, tom lane




Re: [HACKERS] delta relations in AFTER triggers

2014-07-28 Thread Marti Raudsepp
On Sat, Jul 5, 2014 at 5:38 PM, Kevin Grittner  wrote:
> it seems to me that we need the full tuple to support triggers on
> FDWs, so the TID approach would be an optimization for a subset of
> the cases, and would probably be more appropriate, if we do it at
> all, in a follow-on patch
> If you disagree with that assessment, now would be a good
> time to explain your reasoning.

Maybe I just have a limited imagination because I've never found a use
for FDWs personally. But recording changes from a trigger on a FDW
table doesn't seem that useful, since you can only capture changes
done by the local node. I expect that in many situations there are
multiple writers accessing the same underlying remote table. Certainly
it can't guarantee the consistency of materialized views.

> I took a look at whether I could avoid making OLD and NEW
> non-reserved keywords, but I didn't see how to do that without
> making FOR at least partially reserved.  If someone sees a way to
> do this without creating three new unreserved keywords
> (REFERENCING, OLD, and NEW) I'm all ears.

Sorry, I know I am very late to make this point, so feel free to ignore this.

I'm not a fan of the SQL standard syntax for this feature. One nice
thing about PostgreSQL's triggers is that you can declare the trigger
function once and re-use it on many tables. It would make more sense
if the same function declaration could say what variable/relation
names it wants to use. They're more like function argument names, not
some metadata about a table-function relationship.

Putting these in the CREATE TRIGGER command means you have to repeat
them for each table you want to apply the trigger to. It introduces
the possibility of making more mistakes without any gain in
flexibility.

But then again, I understand that there's value in supporting standard syntax.

Regards,
Marti




Re: [HACKERS] Re: [GENERAL] pg_dump behaves differently for different archive formats

2014-07-28 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Albe Laurenz  writes:
> > I thought that changing the dump format for this would be too
> > much trouble, so I came up with the attached.

If we're going to change this, it seems to me that the only option would
be to change the dump format...  Just off-the-cuff, I'm wondering if we
could actually not change the real 'format' but simply promote each ACL
entry (and similar cases..) to top-level objects and declare that TOC
entries should be single statements.

> While those specific problems could no doubt be fixed, I object to the
> entire concept of assuming that what pg_dump emits is always going to be
> trivially parsable.  If we are to go down this path, I think we have to
> replicate what psql is doing to identify statement boundaries ... and
> as I mentioned upthread, that's rather a lot of code :-(

Agreed.  If we want this, we should handle it on the pg_dump side, not
try and work it out on the pg_restore side.

Thanks,

Stephen




Re: [HACKERS] building pdfs

2014-07-28 Thread Andrew Dunstan


On 07/27/2014 11:28 AM, Tom Lane wrote:
> Personally I find the PDF docs to be an anachronism: surely nobody
> is printing them on dead trees any more, and for on-computer usage,
> what do they offer that the HTML format doesn't?  So I'm unexcited
> about making them slightly prettier.

If they are then maybe there's no point in trying to build them in the 
buildfarm constantly.


One advantage that they have over the HTML docs is that they encapsulate 
the docs in a single file. But then, so does the epub format, which, 
unlike PDFs, can adapt to display dimensions.


cheers

andrew





Re: [HACKERS] Re: [GENERAL] pg_dump behaves differently for different archive formats

2014-07-28 Thread Tom Lane
Albe Laurenz  writes:
> I thought that changing the dump format for this would be too
> much trouble, so I came up with the attached.

> It assumes that custom- or tar-format archives are written by pg_dump
> and cannot contain arbitrary SQL statements, which allows me to get away
> with very simple parsing.

I don't think this can be trusted in the least.  To begin with, where'd
you get the idea dumps cannot contain "arbitrary SQL statements"?  CREATE
RULE at least could contain some pretty weird stuff.  This thing doesn't
look like it's even bothering to count nested parentheses, so it will
certainly fail on a multi-statement rule.  I believe you're also at risk
of SQL injection attacks from failing to account for multibyte characters
in non-ASCII-safe client encodings.
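
For example, a rule of the sort being described here (names illustrative)
defeats any parser that merely splits on semicolons:

  CREATE RULE t_ins AS ON INSERT TO t DO INSTEAD (
      INSERT INTO t_log VALUES (new.id);
      NOTIFY t_changed;
  );

The semicolons inside the parenthesized action list are part of a single
statement.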

While those specific problems could no doubt be fixed, I object to the
entire concept of assuming that what pg_dump emits is always going to be
trivially parsable.  If we are to go down this path, I think we have to
replicate what psql is doing to identify statement boundaries ... and
as I mentioned upthread, that's rather a lot of code :-(

regards, tom lane




[HACKERS] Re: [GENERAL] pg_dump behaves differently for different archive formats

2014-07-28 Thread Albe Laurenz
Tom Lane wrote on Dec 16, 2013:
> Albe Laurenz  writes:
>> Restoring a "plain format" dump and a "custom format" dump of
>> the same database can lead to different results:

>> pg_dump organizes the SQL statements it creates in "TOC entries".
>> If a custom format dump is restored with pg_restore, all
>> SQL statements in a TOC entry will be executed as a single command
>> and thus in a single transaction.

> Yeah, this is a bug I think.  pg_dump was designed around the idea
> that the output would be executed as a simple script, and in a
> number of places there's an expectation that one SQL statement
> can fail without affecting following ones.  So if pg_restore can't
> provide that behavior it's not good.
> 
> On the other hand, I'm not sure how much enthusiasm there'd be for
> complex or fragile changes to fix this.  A lot of people invariably
> run restores in single-transaction mode and don't really care about
> fault-tolerant restores.  Also, it's easy enough to dodge the problem
> if you must: just pipe the output into psql rather than
> direct-to-database.
> 
> So to me the question is can we fix this without doing something like
> duplicating psql's lexer?  If we have to parse out the statements
> contained in each text blob, it's probably going to be too painful.
> Some cautionary history about this sort of thing can be read at
> http://www.postgresql.org/message-id/flat/18006.1325700...@sss.pgh.pa.us

I thought that changing the dump format for this would be too
much trouble, so I came up with the attached.

It assumes that custom- or tar-format archives are written by pg_dump
and cannot contain arbitrary SQL statements, which allows me to get away
with very simple parsing.

If this is not shot down immediately on account of fragility, I'd
add it to the next commitfest page.

The problem has been a pain point for my co-workers in the past;
using single-transaction mode doesn't work for us, since we have custom objects
in our template database that cause expected errors when a dump is restored.

Yours,
Laurenz Albe





Re: [HACKERS] Audit of logout

2014-07-28 Thread Fujii Masao
On Mon, Jul 28, 2014 at 12:22 PM, Amit Kapila  wrote:
> On Thu, Jul 3, 2014 at 1:13 AM, Fujii Masao  wrote:
>> On Wed, Jul 2, 2014 at 11:39 PM, Joe Conway  wrote:
>>
>> No. If we change it to PGC_SIGHUP, SHOW command does display
>> the changed value after a reload. It's the same behavior as other
>> PGC_SIGHUP parameters do. Attached patch just changes it to PGC_SIGHUP.
>> You can test that by using the patch.
>
> As this patch is marked as Needs Review, I went ahead and
> picked it up for review; however, after reading the mail chain, it seems to
> me that there is a general inclination to have a new category in
> GucContext for this feature.  I don't see a patch implementing that
> in this thread, so I think it is better to move this to the next CF
> (2014-08).

Yep, agreed. I just moved this to next CF.

Regards,

-- 
Fujii Masao




Re: [HACKERS] config.sgml referring to unix_socket_directories on older releases

2014-07-28 Thread Fujii Masao
On Sun, Jul 27, 2014 at 12:09 AM, Guillaume Lelarge
 wrote:
> Hi,
>
> While updating the french translation of the latest releases, I stumbled
> upon a small issue on the config.sgml file.
>
> It talks about unix_socket_directories whereas this parameter only appears
> with the 9.3 release. It should probably be replaced with
> unix_socket_directory for all releases where this has been committed (8.4 to
> 9.2). The patch attached does this. It applies cleanly on all releases (with
> a hunk though).

Thanks for the patch! Applied it to 9.2, 9.1 and 9.0. I didn't apply it to 8.4
because it's already EOL.

Regards,

-- 
Fujii Masao




Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-07-28 Thread MauMau

From: "MauMau" 
I must add one thing.  After some client processes closed the connection 
without any hang, their server processes were stuck with a stack trace 
like this (I'll look for and show the exact stack trace tomorrow):


I found two kinds of stack traces:

#0  0x003199ec488f in poll () from /lib64/libc.so.6
#1  0x00609f24 in WaitLatchOrSocket ()
#2  0x0063ad92 in SyncRepWaitForLSN ()
#3  0x004ad474 in CommitTransaction ()
#4  0x004aef53 in CommitTransactionCommand ()
#5  0x0064b547 in shmem_exit ()
#6  0x0064b625 in proc_exit_prepare ()
#7  0x0064b6a8 in proc_exit ()
#8  0x00668a94 in PostgresMain ()
#9  0x00617f2c in ServerLoop ()
#10 0x0061ae96 in PostmasterMain ()
#11 0x005b2ccf in main ()

#0  0x003f4badf258 in poll () from /lib64/libc.so.6
#1  0x00619b94 in WaitLatchOrSocket ()
#2  0x00640c4c in SyncRepWaitForLSN ()
#3  0x00491c18 in RecordTransactionCommit ()
#4  0x00491d98 in CommitTransaction ()
#5  0x00493135 in CommitTransactionCommand ()
#6  0x00653fc5 in ProcessCatchupEvent ()
#7  0x006540ed in HandleCatchupInterrupt ()
#8  0x006533e3 in procsignal_sigusr1_handler ()
#9  
#10 0x003f4bae96b0 in recv () from /lib64/libc.so.6
#11 0x005b75f6 in secure_read ()
#12 0x005c223b in pq_recvbuf ()
#13 0x005c263b in pq_getbyte ()
#14 0x0066e081 in PostgresMain ()
#15 0x00627d81 in PostmasterMain ()
#16 0x005c4803 in main ()


I'll try the fix tomorrow if possible.  What kind of problems do you think 
of for back-patching?


I could reproduce the problem with 9.2.8, but have not yet with 9.5dev. 
I'll try with 9.2.9, and create the fix.


Regards
MauMau






Re: [HACKERS] PL/PgSQL: EXIT USING ROLLBACK

2014-07-28 Thread Simon Riggs
On 28 July 2014 10:34, Marko Tiikkaja  wrote:
> On 7/28/14 11:27 AM, Simon Riggs wrote:
>>
>> On 26 July 2014 18:14, Marko Tiikkaja  wrote:
>>
>>> Today I'd like to present a way to get rid of code like this:
>>
>>
>> You haven't explained this very well... there is nothing that explains
>> WHY you want this.
>>
>> In the absence of a good explanation and a viable benefit, I would
>> vote -1 for this feature suggestion.
>
>
> Yes, I did a poor job in the original email, but I did explain my reasoning
> later:

With respect, I think you did a poor job the second time too. I can't
find a clearly explained reasoning behind the proposal, nor do I
understand what the problem was.

One of the things I do is work hard on my initial explanations and
reasoning. This helps me because I frequently end up not proposing
something because my reasoning was poor, but it also helps me focus on
whether I am solving a real problem by sharpening my understanding of
the actual problem. And it also helps Tom (or others) demolish things
more quickly with a well placed "indeed" ;-)

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Use unique index for longer pathkeys.

2014-07-28 Thread Kyotaro HORIGUCHI
Hello,

> > > I think there is one more disadvantage in the way current patch is
> > > done which is that you need to collect index path keys for all relations
> > > irrespective of whether they will be of any use to eliminate useless
> > > pathkeys from query_pathkeys.  One trivial case that comes to mind is
> > > when there are multiple relations involved in query and ORDER BY is
> > > base on columns of only part of the tables involved in query.
> >
> > Like this?
> >
> > select x.a, x.b, y.b from x, y where x.a = y.a order by x.a, x.b;
> >
> > The equivalence class consists of (x.a=y.a) and (x.b), so the index
> > pathkeys for i_y are (y.a=x.a). As a result, no common primary
> > pathkeys are found.
> 
> I think it will find common pathkey incase you have an unique index
> on x.a (please see the example below), but currently I am not clear
> why there is a need for a common index path key in such a case to
> eliminate useless keys in ORDER BY, why can't we do it based
> on individual table's path key.
> 
> Example:
> 
> create table t (a int not null, b int not null, c int, d text);
> create unique index i_t_pkey on t(a, b);
> insert into t (select a % 10, a / 10, a, 't' from generate_series(0,
> 10) a);
> analyze;
> 
> create table t1 (a int not null, b int not null, c int, d text);
> create unique index i_t1_pkey_1 on t1(a);
> create unique index i_t1_pkey_2 on t1(a, b);
> insert into t1 (select a * 2, a / 10, a, 't' from generate_series(0,
> 10) a);
> explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order by
> t1.a,t1.b,t1.c,t1.d;
> 
> QUERY PLAN
> --
>  Merge Join
>Merge Cond: (t.a = t1.a)
>->  Index Scan using i_t_pkey on t
>->  Index Scan using i_t1_pkey_1 on t1
> (4 rows)
> 
> Here we can notice that there is no separate sort key in plan.

Sure, 

> Now drop the i_t1_pkey_1 and check the query plan again.
> 
> drop index i_t1_pkey_1;
> explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order by
> t1.a,t1.b,t1.c,t1.d;
>QUERY PLAN
> 
>  Sort
>Sort Key: t.a, t1.b, t1.c, t1.d
>->  Merge Join
>  Merge Cond: (t.a = t1.a)
>  ->  Index Scan using i_t_pkey on t
>  ->  Index Scan using i_t1_pkey_2 on t1
> (6 rows)
> 
> Can't above plan eliminate Sort Key even after dropping index
> (i_t1_pkey_1)?

My patch doesn't do so, since there is no longer a 'common primary
pathkeys' in this query. Perhaps the query doesn't allow the sort to
be eliminated: since a is no longer a pkey, t1 can have duplicate rows
for the same a, so the joined relation may also have duplicate
values in the column a. Therefore the joined relation may be only
partially sorted by the column a, so the sort pathkeys cannot be
trimmed.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/PgSQL: EXIT USING ROLLBACK

2014-07-28 Thread Marko Tiikkaja

On 7/28/14 11:27 AM, Simon Riggs wrote:
> On 26 July 2014 18:14, Marko Tiikkaja  wrote:
>> Today I'd like to present a way to get rid of code like this:
>
> You haven't explained this very well... there is nothing that explains
> WHY you want this.
>
> In the absence of a good explanation and a viable benefit, I would
> vote -1 for this feature suggestion.


Yes, I did a poor job in the original email, but I did explain my 
reasoning later:


> Yes, you can already do this with RAISE but that seems more like an
> accident than anything else.  I feel a dedicated syntax is less error
> prone and makes the intent clearer to people reading the code.  But I
> realize I might be in the minority with this.
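
(For reference, a minimal sketch of that RAISE idiom, assuming a
user-chosen SQLSTATE:

  DO $$
  BEGIN
      BEGIN
          -- ... work that may need to be abandoned ...
          RAISE EXCEPTION 'leaving block' USING ERRCODE = 'UE001';
      EXCEPTION
          WHEN SQLSTATE 'UE001' THEN
              NULL;  -- the inner block's changes have been rolled back
      END;
  END;
  $$;
)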

I guess -3, +0 is enough that I'll be dropping the patch.  Thanks to 
everyone who had a look.



.marko




Re: [HACKERS] PL/PgSQL: EXIT USING ROLLBACK

2014-07-28 Thread Simon Riggs
On 26 July 2014 18:14, Marko Tiikkaja  wrote:

> Today I'd like to present a way to get rid of code like this:

You haven't explained this very well... there is nothing that explains
WHY you want this.

In the absence of a good explanation and a viable benefit, I would
vote -1 for this feature suggestion.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Introducing coarse grain parallelism by postgres_fdw.

2014-07-28 Thread Kyotaro HORIGUCHI
Hello,

> In order to minimize the impact, what can be done is to execute
> fetch_more_data() in asynchronous mode every time there are only a few rows
> left to be consumed. So in the current code below
> 1019 /*
> 1020  * Get some more tuples, if we've run out.
> 1021  */
> 1022 if (fsstate->next_tuple >= fsstate->num_tuples)
> 1023 {
> 1024     /* No point in another fetch if we already detected EOF, though. */
> 1025     if (!fsstate->eof_reached)
> 1026         fetch_more_data(node, false);
> 1027     /* If we didn't get any tuples, must be end of data. */
> 1028     if (fsstate->next_tuple >= fsstate->num_tuples)
> 1029         return ExecClearTuple(slot);
> 1030 }
> 
> replace the condition on line 1022, if (fsstate->next_tuple >= fsstate->num_tuples),
> with if (fsstate->next_tuple >= fsstate->num_tuples - SOME_BUFFER_NUMBER_ROWS).
> The other possibility is to call PQsendQuery(conn, sql) after line 2100, if
> eof_reached is false.
> 
> 2096 /* Must be EOF if we didn't get as many tuples as we asked
> for. */
> 2097 fsstate->eof_reached = (numrows < fetch_size);
> 2098
> 2099 PQclear(res);
> 2100 res = NULL;

I see, I'll consider it.  If late (lazy) error detection is
acceptable, single-row mode seems usable, too.
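
A minimal sketch of that idea against libpq (fetch_in_flight and
PREFETCH_MARGIN are invented names for illustration, not part of the
actual fsstate):

  /* dispatch the next FETCH before the current batch runs out */
  if (!fsstate->eof_reached && !fsstate->fetch_in_flight &&
      fsstate->next_tuple >= fsstate->num_tuples - PREFETCH_MARGIN)
  {
      if (!PQsendQuery(conn, fetch_sql))   /* e.g. "FETCH 100 FROM c1" */
          elog(ERROR, "%s", PQerrorMessage(conn));
      fsstate->fetch_in_flight = true;
  }

  /* later, once the batch is exhausted, collect the pending result */
  if (fsstate->fetch_in_flight && fsstate->next_tuple >= fsstate->num_tuples)
  {
      PGresult *res = PQgetResult(conn);   /* blocks only if rows not ready */

      /* ... check PQresultStatus(res), store the rows, PQclear(res) ... */
      while ((res = PQgetResult(conn)) != NULL)
          PQclear(res);                    /* drain until the NULL terminator */
      fsstate->fetch_in_flight = false;
  }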

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Introducing coarse grain parallelism by postgres_fdw.

2014-07-28 Thread Kyotaro HORIGUCHI
Hello, thank you for the comment.

> Hi Kyotaro,
> fetch_more_rows() always runs "FETCH 100 FROM " on the foreign
> server to get the next set of rows. The changes you have made seem to run
> only the first FETCHes from all the nodes but not the subsequent ones. The
> optimization will be helpful only when there are less than 100 rows per
> postgres connection in the query. If there are more than 100 rows from a
> single foreign server, the second onwards FETCHes will be serialized.
> 
> Is my understanding correct?

Yes, you're right.  So I wrote it as follows.

Me> it almost halves the response time because the remote queries
Me> take far longer startup time than running time.

Parallelizing all FETCHes would be effective if the connection
transfers bytes at a speed near the row fetch speed, but I
excluded that case because of my assumption that the chance of a
gain there is relatively low, and for simplicity as a PoC. If
this approach is not inappropriate and draws no objections,
I will work on a more complete implementation, including
cost estimation.

> On Fri, Jul 25, 2014 at 2:05 PM, Kyotaro HORIGUCHI <
> horiguchi.kyot...@lab.ntt.co.jp> wrote:
> 
> > Hello,
> >
> > I noticed that postgres_fdw can run in parallel with a very small
> > change. The attached patch lets scans by postgres_fdw on
> > different foreign servers run simultaneously. This seems a
> > convenient entry point to parallel execution.
> >
> > For the testing configuration which the attached sql script creates,
> > it almost halves the response time because the remote queries
> > take far longer in startup than in running. The two foreign
> > tables fvs1, fvs2 and fvs1_2 are defined on the same table, but
> > fvs1 and fvs1_2 are on the same foreign server pgs1 while fvs2 is
> > on another foreign server, pgs2.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] gaussian distribution pgbench

2014-07-28 Thread Heikki Linnakangas

On 07/17/2014 11:13 PM, Fabien COELHO wrote:



However, ISTM that it is not the purpose of pgbench documentation to be a
primer about what is an exponential or gaussian distribution, so the idea
would yet be to have a relatively compact explanation, and that the
interested but clueless reader would document h..self from wikipedia or a
text book or a friend or a math teacher (who could be a friend as well:-).


Well, I think it's a balance.  I agree that the pgbench documentation
shouldn't try to substitute for a text book or a math teacher, but I
also think that you shouldn't necessarily need to refer to a text book
or a math teacher in order to figure out how to use pgbench.  Saying
"it's complicated, so we don't have to explain it" would be a cop out;
we need to *make* it simple.  And if there's no way to do that, then
IMHO we should reject the patch in favor of some future patch that
implements something that will be easy for users to understand.


  [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=10
starting vacuum...end.
transaction type: Exponential distribution TPC-B (sort of)
scaling factor: 1
exponential threshold: 10.0

decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
highest/lowest percent of the range: 9.5% 0.0%


I don't have a clue what that means.  None.


Maybe we could add in front of the decile/percent

"distribution of increasing account key values selected by pgbench:"


I still wouldn't know what that meant.  And it misses the point
anyway: if the documentation is good, this will be unnecessary.  If
the documentation is bad, a printout that tries to illustrate it by
example is not an acceptable substitute.


The decile description is quite classic when discussing statistics.
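
As a point of reference, assuming the exponential mapping draws keys with
density proportional to exp(-threshold * x) over the unit interval (my
reading of the output above, not something stated explicitly in the
thread), the quoted "decile percents" line can be reproduced in SQL:

  SELECT d,
         round((100 * (exp(-d) - exp(-d - 1))
                / (1 - exp(-10)))::numeric, 1) AS percent
  FROM generate_series(0, 9) AS d;

The first row gives 63.2, matching the pgbench output for threshold 10.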


IMHO we should include a diagram for each distribution. A diagram would 
be much more easy to understand than a decile or verbal explanation.


The only problem is that the build infrastructure doesn't currently 
support including images in the docs. That's been discussed before, and 
I think we even used to have a couple of images there a long time ago. 
Now would be a good time to bite the bullet and add the support.
We got fairly close to a consensus on how to do it in this thread: 
www.postgresql.org/message-id/flat/20120712181636.gc11...@momjian.us. 
The biggest problem was choosing an editor that has a fairly stable file 
format, so that we don't get huge diffs every time someone moves a line 
in a diagram. One work-around for that is to use graphviz and/or gnuplot 
as the source format, instead of a graphical editor.


- Heikki


