Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2014-02-06 Thread Andres Freund
On 2014-02-05 12:36:42 -0500, Robert Haas wrote:
  It may well be that your proposal is spot on.  But I'd like to see some
  data-structure-by-data-structure measurements, rather than assuming that
  alignment must be a good thing.
 
  I am fine with just aligning BufferDescriptors properly. That has
  clearly shown massive improvements.
 
 I thought your previous idea of increasing BUFFERALIGN to 64 bytes had
 a lot to recommend it.

Good.

I wonder if we shouldn't move that bit of logic:
if (size >= BUFSIZ)
newStart = BUFFERALIGN(newStart);
out of ShmemAlloc() and instead have a ShmemAllocAligned() and
ShmemInitStructAligned() that do it, so we can sensibly control it
per struct.
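
For concreteness, the alloc side could look something like this (a
sketch only, with an invented name and signature; bookkeeping and error
handling omitted):

#include "postgres.h"
#include "storage/shmem.h"

/* Sketch only: overallocate, then round the start address up to the
 * requested boundary. */
void *
ShmemAllocAligned(Size size, Size alignto)
{
	char	   *ptr = (char *) ShmemAlloc(size + alignto - 1);

	return (void *) TYPEALIGN(alignto, (uintptr_t) ptr);
}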

 But that doesn't mean it doesn't need testing.

I feel the need here, to say that I never said it doesn't need testing
and never thought it didn't...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PoC: Partial sort

2014-02-06 Thread Marti Raudsepp
On Thu, Feb 6, 2014 at 5:31 AM, Robert Haas robertmh...@gmail.com wrote:
 Hmm, sounds a little steep.  Why is it so expensive?  I'm probably
 missing something here, because I would have thought that planner
 support for partial sorts would consist mostly of considering the same
 sorts we consider today, but with the costs reduced by the batching.

I guess it's because the patch undoes some optimizations in the
mergejoin planner wrt caching merge clauses and adds a whole lot of
code to find_mergeclauses_for_pathkeys. In other code paths the
overhead does seem to be negligible.

Notice the removal of:
/* Select the right mergeclauses, if we didn't already */
/*
 * Avoid rebuilding clause list if we already made one;
 * saves memory in big join trees...
 */

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Row-security on updatable s.b. views

2014-02-06 Thread Yeb Havinga

On 2014-02-06 05:43, Craig Ringer wrote:

Based on Tom's objections, another approach is presented in
rls-9.4-upd-sb-views-v5 on  g...@github.com:ringerc/postgres.git . The
Query node is used to record the recursive expansion parent list
instead, and copying is avoided.


Cannot fetch or clone.

github on web says "This repository is temporarily unavailable. The
backend storage is temporarily offline. Usually this means the storage
server is undergoing maintenance. Please contact support if the problem
persists."


regards,
Yeb



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: show xid and xmin in pg_stat_activity and pg_stat_replication

2014-02-06 Thread Andres Freund
On 2014-02-05 13:26:15 -0500, Robert Haas wrote:
 On Wed, Feb 5, 2014 at 1:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  It feels weird to me that the new columns are called transactionid and
  xmin.  Why not xid and xmin?
 
  Actually the part of that that bothers me is xmin, which conflicts
  with a reserved system column name.  While you can legally pick such
  conflicting names for view columns, it's not going to be so much fun
  when you try to join that view against some regular table.
 
 That's a fair point, too.  So maybe we should go with something like
 backend_xid and backend_xmin or some other prefix that we come up
 with.  My concern is more that I think they should be consistent
 somehow.

Those work for me.

We have a bit of a confusing situation atm: pg_prepared_xacts calls its
xid "transaction", pg_locks "transactionid"... So if we add a new
spelling, we should like it sufficiently to use it in the future.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [doc patch] extra_float_digits and casting from real to numeric

2014-02-06 Thread Christoph Berg
Re: Robert Haas 2014-02-05 
ca+tgmozy4kkaptkf2pxcoqnt-jb0ysp8w2php_qge62nilv...@mail.gmail.com
  <literal>0</literal>, the output is the same on every platform
  supported by PostgreSQL.  Increasing it will produce output that
  more accurately represents the stored value, but may be unportable.
  +   Casts to other numeric datatypes and the <literal>to_char</literal>
  +   function are not affected by this setting, it affects only the text
  +   representation.
 </para>
</note>
 
 
  Anyone for that patch?
 
 Well, the new text kinda recapitulates what the existing text already
 says.  If we're going to clarify, I'd do it like this:
 
  The <xref linkend="guc-extra-float-digits"> setting controls the
   number of extra significant digits included when a floating point
   value is converted to text for output.  It does not affect the results
   when a floating point number is converted to some other data type
   or formatted using <literal>to_char</literal>.
 
 But frankly I'm inclined to just leave it alone.  It says that it
 affects what happens when the value is converted to text for output.
  That's specific and accurate.  Granted, someone could misunderstand,
 but that's true of almost anything we might write, and being too
 long-winded has costs of its own.

Yes, the original text is correct. The point is that the original text
doesn't really make the reader (who might think the numbers he's
seeing on the screen will also be used for computation/inserts) aware
that he's looking at something very different from what will actually
be used internally. I'm sure we were not the only ones to stumble over
that problem; I even knew what extra_float_digits is for, yet I
didn't connect the dots in the beginning.

Including some explicit heads-up sentence in there solves that
problem, either your version, or mine.
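
To make the pitfall concrete: the setting changes only the text
rendering, never the stored binary value. A standalone C illustration
(the server's float8out works in the same spirit, printing with %.*g
and DBL_DIG + extra_float_digits, though this snippet is only an
illustration, not the server code):

#include <float.h>
#include <stdio.h>

/* Illustration only: more output digits reveal the stored binary value
 * more exactly, but the value itself is identical in every line. */
int
main(void)
{
	double	v = 0.1;
	int		extra;

	for (extra = 0; extra <= 3; extra++)
		printf("extra_float_digits=%d: %.*g\n", extra, DBL_DIG + extra, v);
	return 0;
}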

With kind regards,
Christoph Berg
-- 
Tel.: +49 (0)21 61 / 46 43-187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Hohenzollernstr. 133, 41061 Mönchengladbach
Managing directors: Dr. Michael Meskes, Jörg Folz, Sascha Heuer




Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-06 Thread Jeremy Harris

On 05/02/14 21:56, Robert Haas wrote:

On Tue, Feb 4, 2014 at 5:22 PM, Jeremy Harris j...@wizmail.org wrote:

The attached patch replaces the existing siftup method for heapify with
a siftdown method. Tested with random integers it does 18% fewer
compares and takes 10% less time for the heapify, over the work_mem
range 1024 to 1048576.

Both algorithms appear to be O(n) (contradicting Wikipedia's claim
that a siftup heapify is O(n log n)).


I think Wikipedia is right.  Inserting a tuple into a heap is O(lg
n); doing that n times is therefore O(n lg n).  Your patch likewise
implements an outer loop which runs O(n) times and an inner loop which
runs at most O(lg n) times, so a naive analysis of that algorithm also
comes out to O(n lg n).


The patch implements a siftdown.  Measurements attached.
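
For illustration, here is the shape of a bottom-up sift-down heapify on
a plain int array (a sketch only, not the patch). The naive bound is
indeed O(n lg n), but the usual tighter analysis gives O(n) for the
bottom-up build: a node at height h sifts down at most h levels, and the
sum over all nodes, roughly sum_h (n / 2^(h+1)) * h, is bounded by a
constant times n.

#include <stddef.h>

/* Sketch: max-heap sift-down, then bottom-up heapify. */
static void
sift_down(int *a, size_t root, size_t n)
{
	while (2 * root + 1 < n)
	{
		size_t	child = 2 * root + 1;	/* left child */
		int		tmp;

		if (child + 1 < n && a[child + 1] > a[child])
			child++;					/* take the larger child */
		if (a[root] >= a[child])
			return;						/* heap property holds */
		tmp = a[root];
		a[root] = a[child];
		a[child] = tmp;
		root = child;
	}
}

static void
heapify(int *a, size_t n)
{
	size_t	i;

	for (i = n / 2; i-- > 0;)			/* deepest internal node first */
		sift_down(a, i, n);
}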
--
Cheers,
  Jeremy


           ---- baseline ----    ---- siftdown ----    - cmps/work_mem -     - time/work_mem -
work_mem   cmps        time      cmps        time      base   sift   ratio   base       sift       ratio
1024       27271       0.001     22426       0.001     26.6   21.9   82%     6.76E-07   4.93E-07   73%
2048       52153       0.001     44557       0.001     25.5   21.8   85%     6.28E-07   4.47E-07   71%
4096       108660      0.003     89591       0.002     26.5   21.9   82%     6.47E-07   4.63E-07   72%
8192       217341      0.005     179191      0.004     26.5   21.9   82%     6.57E-07   4.88E-07   74%
16384      434101      0.011     358680      0.009     26.5   21.9   83%     6.55E-07   5.42E-07   83%
32768      871110      0.022     717542      0.019     26.6   21.9   82%     6.60E-07   5.74E-07   87%
65536      1740355     0.043     1434967     0.039     26.6   21.9   82%     6.59E-07   5.96E-07   90%
131072     3478497     0.087     2869190     0.078     26.5   21.9   82%     6.62E-07   5.98E-07   90%
262144     6962693     0.173     5740518     0.154     26.6   21.9   82%     6.58E-07   5.89E-07   89%
524288     13912236    0.345     11476660    0.311     26.5   21.9   82%     6.59E-07   5.93E-07   90%
1048576    27839260    0.690     22957368    0.622     26.5   21.9   82%     6.58E-07   5.93E-07   90%
(ratios are siftdown/baseline)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GiST support for inet datatypes

2014-02-06 Thread Emre Hasegeli
2014-01-19 12:10, Emre Hasegeli e...@hasegeli.com:
 2014-01-19 Andreas Karlsson andr...@proxel.se:

 I am a bit suspicious about your memcmp based optimization in bitncommon,
 but it could be good. Have you benchmarked it compared to doing the same
 thing with a loop?

 I did, when I was writing that part. I will be happy to do it again. I will
 post the results.

I was testing it by creating GiST indexes. I realized that these tests are
inconsistent when BUFFERING = AUTO. I repeated them with BUFFERING = ON.
The function without memcmp was faster in this case. I will change
the function in the next version of the patch.
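
For reference, what bitncommon computes -- the number of leading bits two
values share -- can be written as a plain loop like this (an illustrative
sketch, not the patch's code; the memcmp variant batches the whole-byte
part of the comparison, which is what the benchmark compares):

/* Sketch: count the leading bits that two bit strings share. */
static int
bitncommon_loop(const unsigned char *a, const unsigned char *b, int nbits)
{
	int		byte,
			bit;

	/* skip whole bytes that match (a memcmp-based variant batches this) */
	for (byte = 0; byte < nbits / 8; byte++)
		if (a[byte] != b[byte])
			break;

	/* then compare bit by bit within the first differing byte */
	for (bit = byte * 8; bit < nbits; bit++)
		if (((a[bit / 8] >> (7 - bit % 8)) & 1) !=
			((b[bit / 8] >> (7 - bit % 8)) & 1))
			break;

	return bit;
}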

The test case:

Create table Network as
select (a || '.' || b || '.' || c || '/24')::cidr
from generate_series(0, 255) as a,
generate_series(0, 255) as b,
generate_series(0, 255) as c;

Drop index if exists N;
Create index N on Network using gist(cidr) with (buffering = on);

Create table Network6 as
select ('::' || to_hex(a) || ':' || to_hex(b))::inet
from generate_series(0, 255) as a,
generate_series(0, 65535) as b;

Drop index if exists N6;
Create index N6 on Network6 using gist(inet) with (buffering = on);

What I could not understand is why the tests with IP version 6 were much
faster. The row count is the same, and the index size is bigger.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime

2014-02-06 Thread Kyotaro HORIGUCHI
Hello. I've understood how this works, and it seems to work as
expected.

 Anyway this is just a test module, so if things work for you by
 changing the above way, it's fine. However I wonder why it's not
 generating a .def file for you.

Surely.

Getting back on topic: using dsm_keep_segment, I saw the postmaster
keep the section handle for the dsm segment, and any session started
after the creating session had gone could still recall the segment;
dsm_keep_mapping, in contrast, could not retain it past the end of
the creating session. This exactly resembles the Linux version in
behavior, including the attach failure.

The orphan section handles on the postmaster are now just a matter of
documentation.

Besides all the above, I'd like to see a comment on the win32 code
about the 'DuplicateHandle hack'; specifically, a description that
DuplicateHandle pushes a copy of the section handle to the
postmaster so the section persists for the postmaster's lifetime.

By the way I have one additional comment.

All postgres processes already keep a section handle for
'Global/PostgreSQL:pgdata file path' aside from dsm; its name says
what it is for. The names of the dsm sections, on the other hand, are
'Global/PostgreSQL.%d'. This seems a bit unkind, as they don't say
what they are for. Would you mind changing it to, say,
'Global/PostgreSQL.dsm.%d' or something like that?
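
That is, roughly (an illustrative sketch only; the name and helper are
hypothetical):

#include <stdio.h>

/* Hypothetical: put "dsm" into the section name so that handle
 * listings show what the mapping is for. */
static void
make_dsm_section_name(char *buf, size_t buflen, unsigned int handle)
{
	snprintf(buf, buflen, "Global/PostgreSQL.dsm.%u", handle);
}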

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part2: fast scan

2014-02-06 Thread Heikki Linnakangas

On 02/05/2014 12:42 PM, Alexander Korotkov wrote:

Attached patch is a light version of fast scan. It does extra consistent
function calls only in startScanKey, with no extra calls during the scan
of the index.
It finds a subset of the rarest entries whose absence guarantees a false
consistent function result.


Sounds good. Since the extra consistent calls are only performed at 
startup, it's unlikely to cause any noticeable performance regression, 
even when it's not helping.



I've run real-life tests based on postgresql.org logs (33318 queries). Here
is a table with summary time of running whole test case.

=# select method, sum(time) from test_result group by method order by method;
        method       |     sum
---------------------+--------------
 fast-scan-11        | 126916.11199
 fast-scan-light     |   137321.211
 heikki              |   466751.693
 heikki-true-ternary | 454113.62397
 master              |   446976.288
(6 rows)

where 'heikki' is gin-ternary-logic binary-heap
preconsistent-only-on-new-page.patch and 'heikki-true-ternary' is version
with my catalog changes promoting ternary consistent function to opclass.


Looks good.


We can see fast-scan-light gives almost the same performance benefit on
these queries. However, I tested only the CPU effect, not the IO effect.
Any thoughts?


This test doesn't handle lossy-pages correctly:


/*
 * Check if all items marked as scanEntryUseForSkip is not 
present in
 * minItem. If so, we can correct advancePast.
 */
if (ginCompareItemPointers(&minItem, &minItemSkip) < 0)
{
advancePast = minItemSkip;
advancePast.ip_posid--;
continue;
}



If minItemSkip is a lossy page, we skip all exact items on the same 
page. That's wrong, here's a test case to demonstrate:


CREATE TABLE foo (ts tsvector);
INSERT INTO foo SELECT to_tsvector('foo bar'||g) from generate_series(1, 100) g;

CREATE INDEX i_foo ON foo USING gin (ts);

set work_mem='64'; -- to force lossy pages
-- should return 11
select count(*) from foo where ts @@ to_tsquery('foo & bar4:*');

After some fiddling (including fixing the above), I ended up with the 
attached version of your patch. I still left out the catalog changes, 
again not because I don't think we should have them, but to split this 
into smaller patches that can be reviewed separately. I extended the 
"both ways" shim function to work with more than one "maybe" input. Of
course, that is O(n^2), where n is the number of "maybe" arguments, so
that won't scale, but it's OK for testing purposes. And many if not most 
real world queries, too.


I had a very hard time understanding what it really means for an entry 
to be in the scanEntryUseForSkip array. What does it mean to use an 
entry for skipping?


So I refactored that logic, to hopefully make it more clear. In essence, 
the entries are divided into two groups, "required" and "other" items. If
none of the entries in the required set are true, then we know that the
overall qual cannot match. See the comments for a more detailed
explanation. I'm not wedded to the term "required" here; one might think
that *all* the entries in the required set must be TRUE for a match, 
while it's actually that at least one of them must be TRUE.


In keyGetItem, we fetch the minimum item among all the entries in the 
requiredEntries set. That's the next item we're going to return, 
regardless of any items in otherEntries set. But before calling the 
consistent function, we advance the other entries up to that point, so 
that we know whether to pass TRUE or FALSE to the consistent function. 
IOW, otherEntries only provide extra information for the consistent 
function to decide if we have a match or not - they don't affect which 
items we return to the caller.
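
As a toy model of that flow (plain sorted int arrays standing in for
entry streams; hypothetical code, not ginget.c):

#include <stdbool.h>
#include <stdio.h>

typedef struct Stream { const int *items; int len; int pos; } Stream;

/* advance a stream to the first item >= target; -1 when exhausted */
static int
stream_advance(Stream *s, int target)
{
	while (s->pos < s->len && s->items[s->pos] < target)
		s->pos++;
	return (s->pos < s->len) ? s->items[s->pos] : -1;
}

int
main(void)
{
	const int	r0[] = {2, 5, 9}, r1[] = {5, 7}, o0[] = {1, 5, 9};
	Stream		req[2] = {{r0, 3, 0}, {r1, 2, 0}};
	Stream		oth = {o0, 3, 0};

	for (;;)
	{
		int		cand = -1, i;
		bool	othmatch;

		/* candidate = minimum current item among the required streams */
		for (i = 0; i < 2; i++)
		{
			int		cur = (req[i].pos < req[i].len) ?
						req[i].items[req[i].pos] : -1;

			if (cur >= 0 && (cand < 0 || cur < cand))
				cand = cur;
		}
		if (cand < 0)
			break;				/* all required streams exhausted */

		/* advance "other" streams just far enough to answer true/false */
		othmatch = (stream_advance(&oth, cand) == cand);

		/* this is where the consistent function would be called */
		printf("item %d: other=%s\n", cand, othmatch ? "true" : "false");

		for (i = 0; i < 2; i++)
			if (req[i].pos < req[i].len && req[i].items[req[i].pos] == cand)
				req[i].pos++;
	}
	return 0;
}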


I think this is pretty close to committable state now. I'm slightly 
disappointed that we can't do the skipping in more scenarios. For 
example, in foo  bar, we can currently skip bar entry up to the 
next foo, but we cannot skip foo up to the next bar. But this is 
fairly simple, and since we sort the entries by estimated frequency, 
should already cover all the pathological rare  frequent type 
queries. So that's OK.


- Heikki
diff --git a/src/backend/access/gin/Makefile b/src/backend/access/gin/Makefile
index aabc62f..db4f496 100644
--- a/src/backend/access/gin/Makefile
+++ b/src/backend/access/gin/Makefile
@@ -14,6 +14,6 @@ include $(top_builddir)/src/Makefile.global
 
 OBJS = ginutil.o gininsert.o ginxlog.o ginentrypage.o gindatapage.o \
 	ginbtree.o ginscan.o ginget.o ginvacuum.o ginarrayproc.o \
-	ginbulk.o ginfast.o ginpostinglist.o
+	ginbulk.o ginfast.o ginpostinglist.o ginlogic.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index a45d722..b6f521c 100644
--- 

Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime

2014-02-06 Thread Kyotaro HORIGUCHI
Hello, let me say in passing, 

 However I wonder why it's not generating a .def file for you.

Is the 'it' Visual Studio IDE or CL? Mmm, as far as I know a
.def file is something that programmers should write by hand
as a matter of course.

I've found no way to automatically generate .def files. (That in
itself would be no problem, however.)


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part2: fast scan

2014-02-06 Thread Alexander Korotkov
On Thu, Feb 6, 2014 at 2:21 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

 On 02/05/2014 12:42 PM, Alexander Korotkov wrote:

 Attached patch is light version of fast scan. It does extra consistent
 function calls only on startScanKey, no extra calls during scan of the
 index.
 It finds subset of rarest entries absence of which guarantee false
 consistent function result.


 Sounds good. Since the extra consistent calls are only performed at
 startup, it's unlikely to cause any noticeable performance regression, even
 when it's not helping.


 I've run real-life tests based on postgresql.org logs (33318 queries).
 Here
 is a table with summary time of running whole test case.

 =# select method, sum(time) from test_result group by method order by method;
         method       |     sum
 ---------------------+--------------
  fast-scan-11        | 126916.11199
  fast-scan-light     |   137321.211
  heikki              |   466751.693
  heikki-true-ternary | 454113.62397
  master              |   446976.288
 (6 rows)

 where 'heikki' is gin-ternary-logic binary-heap
 preconsistent-only-on-new-page.patch and 'heikki-true-ternary' is version
 with my catalog changes promoting ternary consistent function to opclass.


 Looks good.


  We can see fast-scan-light gives almost same performance benefit on 
 queries. However, I test only CPU effect, not IO effect.
 Any thoughts?


 This test doesn't handle lossy-pages correctly:

  /*
  * Check if all items marked as scanEntryUseForSkip is
 not present in
  * minItem. If so, we can correct advancePast.
  */
 if (ginCompareItemPointers(&minItem, &minItemSkip) < 0)
 {
 advancePast = minItemSkip;
 advancePast.ip_posid--;
 continue;
 }


 If minItemSkip is a lossy page, we skip all exact items on the same page.
 That's wrong, here's a test case to demonstrate:

 CREATE TABLE foo (ts tsvector);
 INSERT INTO foo SELECT to_tsvector('foo bar'||g) from generate_series(1, 100) g;
 CREATE INDEX i_foo ON foo USING gin (ts);

 set work_mem='64'; -- to force lossy pages
 -- should return 11
 select count(*) from foo where ts @@ to_tsquery('foo & bar4:*');

 After some fiddling (including fixing the above), I ended up with the
 attached version of your patch. I still left out the catalog changes, again
 not because I don't think we should have them, but to split this into
 smaller patches that can be reviewed separately. I extended the "both ways"
 shim function to work with more than one "maybe" input. Of course, that is
 O(n^2), where n is the number of "maybe" arguments, so that won't scale,
 but it's OK for testing purposes. And many if not most real world queries,
 too.

 I had a very hard time understanding what it really means for an entry to
 be in the scanEntryUseForSkip array. What does it mean to use an entry
 for skipping?

 So I refactored that logic, to hopefully make it more clear. In essence,
 the entries are divided into two groups, required and other items. If none
 of the entries in the required set are true, then we know that the overall
 qual cannot match. See the comments for a more detailed explanation. I'm
 not wedded to the term required here; one might think that *all* the
 entries in the required set must be TRUE for a match, while it's actually
 that at least one of them must be TRUE.

 In keyGetItem, we fetch the minimum item among all the entries in the
 requiredEntries set. That's the next item we're going to return, regardless
 of any items in otherEntries set. But before calling the consistent
 function, we advance the other entries up to that point, so that we know
 whether to pass TRUE or FALSE to the consistent function. IOW, otherEntries
 only provide extra information for the consistent function to decide if we
 have a match or not - they don't affect which items we return to the caller.

 I think this is pretty close to committable state now. I'm slightly
 disappointed that we can't do the skipping in more scenarios. For example,
 in "foo & bar", we can currently skip "bar" entry up to the next "foo", but
 we cannot skip "foo" up to the next "bar". But this is fairly simple, and
 since we sort the entries by estimated frequency, should already cover all
 the pathological "rare & frequent" type queries. So that's OK.


Sorry for my scanty explanation. Your version of the patch looks good to me.
I've rerun my testcase:

=# select method, sum(time) from test_result group by method order by method;
         method         |     sum
------------------------+--------------
 fast-scan-11           | 126916.11199
 fast-scan-light        |   137321.211
 fast-scan-light-heikki | 138168.02801
 heikki                 |   466751.693
 heikki-tru-ternary     | 454113.62397
 master                 |   446976.288
 tri-state

[HACKERS] Small GIN optimizations (after 9.4)

2014-02-06 Thread Heikki Linnakangas
While hacking on the GIN patches, I've come up with a few different 
ideas for improving performance. It's too late for 9.4, but I'll list 
them here if someone wants to work on them later:


* Represent ItemPointers as uint64's, to speed up comparisons.
ginCompareItemPointers is inlined into only a few instructions, but it's
still more expensive than a single-instruction 64-bit comparison.
ginCompareItemPointers is called very heavily in a GIN scan, so even a
small improvement there would make for a noticeable speedup. It might be
an improvement in code clarity, too. (A sketch of this packing follows
below.)


* Keep the entry streams of a GinScanKey in a binary heap, to quickly 
find the minimum curItem among them.


I did this in various versions of the fast scan patch, but then I 
realized that the straightforward way of doing it is wrong, because a 
single GinScanEntry can be part of multiple GinScanKeys. If an entry's 
curItem is updated as part of advancing one key, and the entry is in a 
heap of another key, updating the curItem can violate the heap property 
of the other entry's heap.


* Build a truth table (or cache) of consistent-function's results, and 
use that instead of calling consistent for every item.


* Deduce AND or OR logic from the consistent function. Or have the 
opclass provide a tree of AND/OR/NOT nodes directly, instead of a 
consistent function. For example, if the query is "foo & bar", we could 
avoid calling consistent function altogether, and only return items that 
match both.


* Delay decoding segments during a scan. Currently, we decode all 
segments of a posting tree page into a single array at once. But with 
fast scan, we might be able to skip over all entries in some of the 
segments. So it would be better to copy the segments into 
backend-private memory in compressed format, and decode them one segment 
at a time (or maybe even one item at a time), when needed. That would 
avoid the unnecessary decoding of segments that can be skipped over, and 
would also reduce memory usage of a scan.


I'll add these to the TODO.
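
For the first of these (uint64 ItemPointers), the packing could look
roughly like this (a sketch only, with invented helper names; it assumes
the offset always fits in 16 bits):

#include "postgres.h"
#include "storage/itemptr.h"

/* Sketch: pack a 6-byte ItemPointer into a uint64 so that a full
 * comparison becomes a single integer compare. */
static inline uint64
itemptr_to_uint64(ItemPointer iptr)
{
	return ((uint64) ItemPointerGetBlockNumber(iptr) << 16) |
		ItemPointerGetOffsetNumber(iptr);
}

static inline int
itemptr_cmp64(ItemPointer a, ItemPointer b)
{
	uint64		ua = itemptr_to_uint64(a);
	uint64		ub = itemptr_to_uint64(b);

	return (ua > ub) - (ua < ub);
}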

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation

2014-02-06 Thread Amit Kapila
On Thu, Feb 6, 2014 at 9:13 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Wed, Feb 5, 2014 at 8:50 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 On 02/05/2014 04:48 PM, Amit Kapila wrote:

 I have done one test where there is a large suffix match, but
 not large enough that it can compress more than 75% of string,
 the CPU overhead with wal-update-prefix-suffix-encode-1.patch is
 not much, but there is no I/O reduction as well.


 Hmm, it's supposed to compress if you save at least 25%, not 75%. Apparently
 I got that backwards in the patch...

 So If I understand the code correctly, the new check should be

 if (prefixlen + suffixlen < (slen * need_rate) / 100)
 return false;

 rather than

 if (slen - prefixlen - suffixlen < (slen * need_rate) / 100)
 return false;

Taking the above change as correct (i.e., compress only when prefixlen +
suffixlen cover at least need_rate% of the string), I have tried to see
the worst-case overhead for this patch by constructing a new tuple in
which, after a 25% or so suffix/prefix match, there is a small change,
with the rest of the tuple kept the same as the old tuple; this shows
overhead for this patch as well.

Updated test script is attached.

Unpatched
             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     348843824 | 5.56866788864136
 ten long fields, 8 bytes changed |     348844800 | 5.84434294700623
 ten long fields, 8 bytes changed |         35050 | 5.92329406738281
(3 rows)



wal-update-prefix-suffix-encode-1.patch

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, 8 bytes changed |     348845624 | 6.92243480682373
 ten long fields, 8 bytes changed |     348847000 | 8.35828399658203
 ten long fields, 8 bytes changed |     350204752 | 7.61826491355896
(3 rows)

One minor point: can we avoid emitting the prefix tag if prefixlen is 0?

+ /* output prefix as a tag */
+ pgrb_out_tag(ctrlp, ctrlb, ctrl, bp, prefixlen, hlen);



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


wal-update-testsuite.sh
Description: Bourne shell script

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add CREATE support to event triggers

2014-02-06 Thread Robert Haas
On Thu, Feb 6, 2014 at 12:08 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Feb 5, 2014 at 2:26 PM, Alvaro Herrera alvhe...@2ndquadrant.com 
 wrote:
 Then again, why is the behavior of schema-qualifying absolutely
 everything even desirable?

 Well, someone could create a collation in another schema with the same
 name as a system collation and the command would become ambiguous.

 Hmm, good point.  I guess we don't worry much about this with pg_dump
 because we assume that we're restoring into an empty database (and if
 not, the user gets to keep both pieces).  You're applying a higher
 standard here.

 Robert, that's just horsepucky.  pg_dump is very careful about schemas.
 It's also careful to not schema-qualify names unnecessarily, which is an
 intentional tradeoff to improve readability of the dump --- at the cost
 that the dump might break if restored into a nonempty database with
 conflicting objects.  In the case of data passed to event triggers,
 there's a different tradeoff to be made: people will probably value
 consistency over readability, so always-qualify is probably the right
 choice here.  But in neither case are we being sloppy.

I didn't mean to imply otherwise.  I know that pg_dump tries to avoid
excess schema-qualification for readability, among other reasons; what
I meant was that Alvaro is applying a higher standard specifically in
regard to replayability.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation

2014-02-06 Thread Amit Kapila
On Wed, Feb 5, 2014 at 8:56 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Wed, Feb 5, 2014 at 5:13 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 On 02/05/2014 07:54 AM, Amit Kapila wrote:

 That's not the worst case, by far.

 First, note that the skipping while scanning new tuple is only performed in
 the first loop. That means that as soon as you have a single match, you fall
 back to hashing every byte. So for the worst case, put one 4-byte field as
 the first column, and don't update it.

 Also, I suspect the runtimes in your test were dominated by I/O. When I
 scale down the number of rows involved so that the whole test fits in RAM, I
 get much bigger differences with and without the patch. You might also want
 to turn off full_page_writes, to make the effect clear with less data.

 So with this test, the overhead is very significant.

 With the skipping logic, another kind of worst case is that you have
 a lot of similarity between the old and new tuple, but you miss it because
 you skip.

 This is exactly the reason why I have not kept the skipping logic in the
 second pass (loop), but I think maybe it would have been better to keep
 it, just not as aggressive as in the first pass.

I have tried merging pass-1 and pass-2 while keeping the skipping logic
the same, and it has reduced the overhead to a good extent, but not
completely for the new case you added. This change is to check whether it
can reduce the overhead; if we want to proceed, maybe we can limit the
skip factor so that the chance of skipping matching data is reduced.

A new version of the patch is attached to this mail.

Unpatched

           testname           | wal_generated |     duration
------------------------------+---------------+------------------
 ten long fields, all changed |     348842856 | 6.93688106536865
 ten long fields, all changed |     348843672 | 7.53063702583313
 ten long fields, all changed |     352662344 | 7.76640701293945
(3 rows)


pgrb_delta_encoding_v8.patch
             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, but one changed |     348848144 | 9.22694897651672
 ten long fields, but one changed |     348841376 | 9.11818099021912
 ten long fields, but one changed |     352963488 | 8.37875485420227
(3 rows)


pgrb_delta_encoding_v9.patch

             testname             | wal_generated |     duration
----------------------------------+---------------+------------------
 ten long fields, but one changed |     350166320 | 8.84561610221863
 ten long fields, but one changed |     348840728 | 8.45299792289734
 ten long fields, but one changed |     348846656 | 8.34846496582031
(3 rows)


It appears to me that it could be a good idea to merge the two patches
(prefix-suffix encoding + delta encoding) in such a way that if we
get reasonable compression (50% or so) with prefix-suffix, then we
can return without doing delta encoding, and if the compression is less
than that, we can do delta encoding for the rest of the tuple. The reason
I think this would be good is that by doing just prefix-suffix we might
miss many cases where good compression is possible.
If you think this is a viable way, then I can merge both patches and
check the results.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pgrb_delta_encoding_v9.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Row-security on updatable s.b. views

2014-02-06 Thread Craig Ringer
On 02/06/2014 04:54 PM, Yeb Havinga wrote:
 On 2014-02-06 05:43, Craig Ringer wrote:
 Based on Tom's objections, another approach is presented in
 rls-9.4-upd-sb-views-v5 on  g...@github.com:ringerc/postgres.git . The
 Query node is used to record the recursive expansion parent list
 instead, and copying is avoided.
 
 Cannot fetch or clone.
 
 github on web says This repository is temporarily unavailable. The
 backend storage is temporarily offline. Usually this means the storage
 server is undergoing maintenance. Please contact support if the problem
 persists.

If this persists, or someone else has the same issue, try the HTTPS url:

https://github.com/ringerc/postgres.git

per: http://github.com/ringerc/postgres/

(I need to chase up my git.postgresql.org account request, it seems to
have got lost).


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime

2014-02-06 Thread Amit Kapila
On Thu, Feb 6, 2014 at 4:10 PM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:
 Hello, let me say in passing,

 However I wonder why it's not generating a .def file for you.

 Is the 'it' Visual Studio IDE or CL? Mmm, as far as I know a
 .def file is something that programmers should write by hand
 as a matter of course.

Using MSVC, we have gendef.pl which can do it.

For example, in the Postgres project properties, under
Configuration Properties -> Build Events -> Pre-Link Event, there
is a Command Line like the one below which does the required work:
perl src\tools\msvc\gendef.pl Debug\postgres x64


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] updated emacs configuration

2014-02-06 Thread Peter Eisentraut
On 7/20/13, 9:06 PM, Greg Stark wrote:
 In theory this tool is promising though since it works by looking at
 the llvm bytecode to determine what the real syntax is. It should be
 able to handle the typedef issues we have with most of the tools.

Since clang-format has no options to specify include file paths, I doubt
that it actually analyzes the file to the extent that it knows about
typedefs.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] updated emacs configuration

2014-02-06 Thread Andres Freund
On 2013-07-21 02:06:02 +0100, Greg Stark wrote:
 On Thu, Jun 27, 2013 at 10:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 
  AFAIR, no one has ever done a serious comparison to anything except GNU
  indent, and (at least at the time) it seemed to have bugs as bad as
  pgindent's, just different ones.  I'm certainly open to another choice
  as long as we don't lose on portability of the tool.  But who will do
  the legwork to test something else?
 
 Fwiw I played with clang-format a bit the other day. But I couldn't
 get it to resemble our coding style. It was strongly of the opinion
 that spacing within lines should be completely consistent and that
 meant it eliminated all spaces that lined up initializers and such.

FWIW, there's some work going on to add more options (like getting the
differences between function definitions and declarations right for pg).
So it might develop into something usable for pg.

What I haven't seen in any tool yet is something to resemble our
indentation style for variable declarations. Do we feel that's a
dealbreaker for any tool?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Row-security on updatable s.b. views

2014-02-06 Thread Craig Ringer
On 02/06/2014 12:43 PM, Craig Ringer wrote:
 1. Try (again) to do row-security in the rewriter. This was previously
 impossible because of the definition of row-security behaviour around
 inheritance, but with the simplified inheritance model now proposed I
 think it's possible.

Thanks to the simplified requirements for inheritance, this turns out to
be fairly easy. There's a version rewritten to use the rewriter in the tag:

   rls-9.4-upd-sb-views-v6

on https://github.com/ringerc/postgres.git

The trickiest bit remaining is how to register the PlanInvalItem to
force plan invalidation when the user-id changes. This was easy in the
optimizer, but it's not obvious how to do it cleanly in the rewriter.
I've got a couple of ideas but don't much like either of them.
Recommendations from the experienced welcomed.

Other more minor open items, each of which look quite quick:

I haven't plugged in rewriter-based recursion detection yet, but that
looks fairly easy (famous last words?). It's 10pm and I have an early
start, so that won't happen tonight.

I haven't removed some cruft from the previous approach from the Query
node either; I need to trim the diff.

The regression test expected file needs adjustment to match the new
inheritance rules, and to cover a couple of new tests I've added.

Needs a better error when it gets an unacceptable rewrite product during
security qual rewriting (modeled on the errors for CTEs). Easy, just
some cookie cutter code.

Need to cherry-pick the docs patch back on top of the patch tree now
things are settling.



Amazingly, I think this has a hope.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-06 Thread Robert Haas
On Thu, Feb 6, 2014 at 4:22 AM, Jeremy Harris j...@wizmail.org wrote:
 On 05/02/14 21:56, Robert Haas wrote:
 On Tue, Feb 4, 2014 at 5:22 PM, Jeremy Harris j...@wizmail.org wrote:
 The attached patch replaces the existing siftup method for heapify with
 a siftdown method. Tested with random integers it does 18% fewer
 compares and takes 10% less time for the heapify, over the work_mem
 range 1024 to 1048576.

 Both algorithms appear to be O(n) (contradicting Wikipedia's claim
 that a siftup heapify is O(n log n)).


  I think Wikipedia is right.  Inserting a tuple into a heap is O(lg
 n); doing that n times is therefore O(n lg n).  Your patch likewise
 implements an outer loop which runs O(n) times and an inner loop which
 runs at most O(lg n) times, so a naive analysis of that algorithm also
 comes out to O(n lg n).

 The patch implements a siftdown.  Measurements attached.

Uh, this spreadsheet is useless.  You haven't explained how these
numbers were generated, or what the columns in the spreadsheet
actually mean.  Ideally, you should include enough information that
someone else can try to reproduce your results.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Row-security on updatable s.b. views

2014-02-06 Thread Craig Ringer
On 02/06/2014 10:19 PM, Craig Ringer wrote:
 On 02/06/2014 12:43 PM, Craig Ringer wrote:
 1. Try (again) to do row-security in the rewriter. This was previously
 impossible because of the definition of row-security behaviour around
 inheritance, but with the simplified inheritance model now proposed I
 think it's possible.
 
 Thanks to the simplified requirements for inheritance, this turns out to
 be fairly easy. There's a version rewritten to use the rewriter in the tag:
 
rls-9.4-upd-sb-views-v6
 
 on https://github.com/ringerc/postgres.git
 
 The trickiest bit remaining is how to register the PlanInvalItem to
 force plan invalidation when the user-id changes. This was easy in the
 optimizer, but it's not obvious how to do it cleanly in the rewriter.
 I've got a couple of ideas but don't much like either of them.
 Recommendations from the experienced welcomed.

Or, after thinking about it for a second with my tired brain, not so much.

We don't rerun rewrite on plan invalidation.

So that means the superuser exemption won't work properly with this patch.

So much for having a hope, that's not a small thing to fix.

So: either I invoke the rewriter from within the optimizer on the
security quals, or I make the rewriter re-run on plan invalidation.
Neither is small or simple.

Blast.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] open and close columns in the NEW record not allowed

2014-02-06 Thread Rafael Martinez Guerrero
Hello

One of our users is having a problem with a trigger in a system running
postgresql 9.3.

The problem is that pl/pgsql does not accept "open" and "close" as column
names when used in the NEW record in a trigger function.

This page:
http://www.postgresql.org/docs/9.3/static/sql-keywords-appendix.html
does not say that they are reserved words in postgresql (although they
are reserved words in the sql standard)

On the other hand, postgres allows creating and updating tables with
columns named open/close without problems.

We think the behavior should be consistent: either using them is
allowed or it is not, but not like it is today.

-
Test case:
-
CREATE TABLE test_open(id integer,open timestamp);
CREATE TABLE test_close(id integer,close timestamp);
CREATE TABLE test_close_trigger(id integer,close timestamp);
CREATE TABLE test_open_trigger(id integer,open timestamp);

CREATE OR REPLACE FUNCTION test_open()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
BEGIN
 INSERT INTO test_open_trigger (id, open)
 VALUES (NEW.id, NEW.open);
 RETURN NEW;
END;
$function$;

CREATE OR REPLACE FUNCTION test_close()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
BEGIN
 INSERT INTO test_close_trigger (id, close)
 VALUES (NEW.id, NEW.close);
 RETURN NEW;
END;
$function$;

# INSERT INTO test_open (id,open) VALUES (1,now());
INSERT 0 1
# INSERT INTO test_close (id,close) VALUES (1,now());
INSERT 0 1
# SELECT * FROM test_open;
 id |            open
----+----------------------------
  1 | 2014-02-06 15:17:52.654977
(1 row)

# SELECT * FROM test_close;
 id |           close
----+----------------------------
  1 | 2014-02-06 15:17:53.893911
(1 row)

CREATE TRIGGER test_open AFTER INSERT ON test_open FOR EACH ROW EXECUTE
PROCEDURE test_open();

CREATE TRIGGER test_close AFTER INSERT ON test_close FOR EACH ROW
EXECUTE PROCEDURE test_close();

# INSERT INTO test_open (id,open) VALUES (1,now());
ERROR:  record "new" has no field "open"
LINE 3:  VALUES (NEW.id, NEW.open)
 ^
QUERY:  INSERT INTO public.test_open_trigger
 (id, open)
 VALUES (NEW.id, NEW.open)
CONTEXT:  PL/pgSQL function test_open() line 3 at SQL statement
# INSERT INTO test_close (id,close) VALUES (1,now());
ERROR:  record "new" has no field "close"
LINE 3:  VALUES (NEW.id, NEW.close)
 ^
QUERY:  INSERT INTO public.test_close_trigger
 (id, close)
 VALUES (NEW.id, NEW.close)
CONTEXT:  PL/pgSQL function test_close() line 3 at SQL statement

-

Thanks in advance.

regards,
-- 
Rafael Martinez Guerrero
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extension_control_path

2014-02-06 Thread Greg Stark
On Tue, Feb 4, 2014 at 6:07 PM, David E. Wheeler da...@justatheory.com wrote:
 The install failed, of course, because extensions want to install in 
 $PGROOT/share/extensions.

Homebrew sounds kind of confused. Having a non-root user have access
to make global system changes sounds like a privilege escalation
vulnerability by design.

However putting that aside, it is fairly standard for software to
provide two directories for extensions/modules/plugins/etc. One for
distribution-built software such as /usr/share/emacs/site-lisp/ and
another for sysadmin customizations such as
/usr/local/share/emacs/site-lisp. The same idea as /usr/share/perl and
/usr/local/share/perl or with Python or anything else.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] updated emacs configuration

2014-02-06 Thread Alvaro Herrera
Andres Freund wrote:

 What I haven't seen in any tool yet is something to resemble our
 indentation style for variable declarations. Do we feel that's a
 dealbreaker for any tool?

I find our style more aesthetically pleasing than any other I've seen;
but anyway the changes if we used a tool that formatted them differently
would cause way too much havoc to even be considered, IMO.

You know what I wouldn't mind losing?  The indentation of our
switch/case blocks.  It wastes too much valuable space at the left.
But then there are ~1000 uses of it in the tree so perhaps it'd be too
much trouble as well.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] updated emacs configuration

2014-02-06 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 What I haven't seen in any tool yet is something to resemble our
 indentation style for variable declarations. Do we feel that's a
 dealbreaker for any tool?

Well, we're unlikely to change pgindent's rules to conform to the
behavior of some random emacs mode, if that's what you mean ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Row-security on updatable s.b. views

2014-02-06 Thread Tom Lane
Craig Ringer cr...@2ndquadrant.com writes:
 We don't rerun rewrite on plan invalidation.

Don't we?  plancache.c certainly does, in fact it starts from the raw
grammar output.  Skipping the rewriter would mean failing to respond
to CREATE OR REPLACE VIEW, for example.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] open and close columns in the NEW record not allowed

2014-02-06 Thread Adrian Klaver

On 02/06/2014 06:35 AM, Rafael Martinez Guerrero wrote:

Hello

One of our users is having a problem with a trigger in a system running
postgresql 9.3.

The problem is that pl/pgsql does not accept open and close as column
names when used in the NEW record in a trigger function.

This page:
http://www.postgresql.org/docs/9.3/static/sql-keywords-appendix.html
does not say that they are reserved words in postgresql (although they
are reserved words in the sql standard)

In the other hand, postgres allows to create and update tables with
columns named open/close without problems.

We think the behavior should be consistent, either it is allow to use
them or not, but not like it is today.



The catch all from here:

http://www.postgresql.org/docs/9.3/interactive/sql-keywords-appendix.html

is:

"As a general rule, if you get spurious parser errors for commands that
contain any of the listed key words as an identifier you should try to
quote the identifier to see if the problem goes away."



Which indeed solves the problem on my end at least:

test= CREATE OR REPLACE FUNCTION public.test_open()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
BEGIN
 INSERT INTO test_open_trigger (id, "open")
 VALUES (NEW.id, NEW."open");
 RETURN NEW;
END;
$function$
;

test= \d test_open
           Table "public.test_open"
 Column |            Type             | Modifiers
--------+-----------------------------+-----------
 id     | integer                     |
 open   | timestamp without time zone |
Triggers:
    test_open AFTER INSERT ON test_open FOR EACH ROW EXECUTE PROCEDURE test_open()


test= INSERT INTO test_open (id,open) VALUES (1,now());
INSERT 0 1




Thanks in advance.

regards,




--
Adrian Klaver
adrian.kla...@gmail.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] open and close columns in the NEW record not allowed

2014-02-06 Thread Tom Lane
Rafael Martinez Guerrero r.m.guerr...@usit.uio.no writes:
 The problem is that pl/pgsql does not accept open and close as column
 names when used in the NEW record in a trigger function.

Yup.  Those words (and other words that can start a plpgsql statement)
are reserved so far as plpgsql is concerned.

 This page:
 http://www.postgresql.org/docs/9.3/static/sql-keywords-appendix.html
 does not say that they are reserved words in postgresql (although they
 are reserved words in the sql standard)

It is not the business of that page to document the behavior of plpgsql.
Perhaps the plpgsql chapter should document what it considers to be
reserved words, but for now, you could look at the list in

http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/pl/plpgsql/src/pl_scanner.c

 We think the behavior should be consistent, either it is allow to use
 them or not, but not like it is today.

That would require giving plpgsql a privileged position over all other
PLs, which isn't going to happen ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] updated emacs configuration

2014-02-06 Thread Andres Freund
On 2014-02-06 10:04:54 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  What I haven't seen in any tool yet is something to resemble our
  indentation style for variable declarations. Do we feel that's a
  dealbreaker for any tool?
 
 Well, we're unlikely to change pgindent's rules to conform to the
 behavior of some random emacs mode, if that's what you mean ...

Nah, not for that ;). I mean I haven't seen any replacement for pgindent
that generates equivalent variable indentations.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] updated emacs configuration

2014-02-06 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 Nah, not for that ;). I mean I haven't seen any replacement for pgindent
 that generates equivalent variable indentations.

Seems odd.  I thought netbsd's indent was pretty much the granddaddy of
them all --- so you'd expect newer tools to be able to replicate that
style.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] open and close columns in the NEW record not allowed

2014-02-06 Thread Rafael Martinez Guerrero
On Thu, 2014-02-06 at 07:11 -0800, Adrian Klaver wrote:
 On 02/06/2014 06:35 AM, Rafael Martinez Guerrero wrote:

  We think the behavior should be consistent: either using them is
  allowed or it is not, but not like it is today.
 
 
  As a general rule, if you get spurious parser errors for commands that 
 contain any of the listed key words as an identifier you should try to 
 quote the identifier to see if the problem goes away.
 
 
 Which indeed solves the problem on my end at least:
 

Hello

Thanks for the feedback.

Our problem is that an application decides the names of the columns in
the tables, and XDB replication from EnterpriseDB decides the triggers.
We have no control over the code :-(

regards,
-- 
Rafael Martinez Guerrero
Center for Information Technology Services
University of Oslo, Norway




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] open and close columns in the NEW record not allowed

2014-02-06 Thread k...@rice.edu
On Thu, Feb 06, 2014 at 04:21:41PM +0100, Rafael Martinez Guerrero wrote:
 On Thu, 2014-02-06 at 07:11 -0800, Adrian Klaver wrote:
  On 02/06/2014 06:35 AM, Rafael Martinez Guerrero wrote:
 
   We think the behavior should be consistent: either using them is
   allowed or it is not, but not like it is today.
  
  
   As a general rule, if you get spurious parser errors for commands that 
  contain any of the listed key words as an identifier you should try to 
  quote the identifier to see if the problem goes away.
  
  
  Which indeed solves the problem on my end at least:
  
 
 Hello
 
 Thanks for the feedback.
 
 Our problem is that an application decides the name of the columns in
 the tables and XDB replication from EnterpriseDB decides the triggers.
 We have no control over the code :-( 
 
 regards,
 -- 
 Rafael Martinez Guerrero
 Center for Information Technology Services
 University of Oslo, Norway
 

Hi Rafael,

It sounds like a bug in the XDB trigger generation code. Maybe file a bug
report.

Regards,
Ken


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] updated emacs configuration

2014-02-06 Thread Andres Freund
On 2014-02-06 10:24:34 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  Nah, not for that ;). I mean I haven't seen any replacement for pgindent
  that generates equivalent variable indentations.
 
 Seems odd.  I thought netbsd's indent was pretty much the granddaddy of
 them all --- so you'd expect newer tools to be able to replicate that
 style.

I don't know, I guess not many projects use the "indented variable names
in declarations" style these days. But I guess it's not too hard to add.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extension_control_path

2014-02-06 Thread Stephen Frost
* Greg Stark (st...@mit.edu) wrote:
 On Tue, Feb 4, 2014 at 6:07 PM, David E. Wheeler da...@justatheory.com 
 wrote:
  The install failed, of course, because extensions want to install in 
  $PGROOT/share/extensions.
 
 Homebrew sounds kind of confused. Having a non-root user have access
 to make global system changes sounds like privilege escalation
 vulnerability by design.

I've not played w/ Homebrew myself, but it's installing into /usr/local
and presumably that includes installing things into /usr/local/bin, so
the notion that installing something from Homebrew isn't already (and
intended to be) making global system changes doesn't quite line up.

The end-admin would have to modify the system-installed postgresql.conf
anyway to enable this other directory.  David wasn't suggesting that
Homebrew *should* be able to do so, he was pointing out that it *can't*,
which all makes sense imv.

 However putting that aside, it is fairly standard for software to
 provide two directories for extensions/modules/plugins/etc. One for
 distribution-built software such as /usr/share/emacs/site-lisp/ and
 another for sysadmin customizations such as
 /usr/local/share/emacs/site-lisp. The same idea as /usr/share/perl and
 /usr/local/share/perl or with Python or anything else.

Agreed.

Thanks,

Stephen




adt Makefile, was Re: [HACKERS] jsonb and nested hstore

2014-02-06 Thread Andrew Dunstan


On 02/01/2014 05:20 PM, Andres Freund wrote:

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 1ae9fa0..fd93d9b 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -32,7 +32,8 @@ OBJS = acl.o arrayfuncs.o array_selfuncs.o array_typanalyze.o \
 	tsquery_op.o tsquery_rewrite.o tsquery_util.o tsrank.o \
 	tsvector.o tsvector_op.o tsvector_parser.o \
 	txid.o uuid.o windowfuncs.o xml.o rangetypes_spgist.o \
-	rangetypes_typanalyze.o rangetypes_selfuncs.o
+	rangetypes_typanalyze.o rangetypes_selfuncs.o \
+	jsonb.o jsonb_support.o

Odd, most OBJS lines are kept in alphabetical order, but that doesn't
seem to be the case here.



This whole list is a mess, and we don't even have all the range_types 
files following each other.


Worth cleaning up?

I'm actually wondering if it might be worth having some subgroups of 
object files and then combining them into $OBJS.


Or it could just be left more or less as is - it's hardly a breakthrough 
advance.


cheers

andrew





Re: [HACKERS] Small GIN optimizations (after 9.4)

2014-02-06 Thread PostgreSQL - Hans-Jürgen Schönig
i think there is one more thing which would be really good in GIN and which 
would solve a ton of issues.
atm GIN entries are sorted by item pointer.
if we could sort them by a column it would fix a couple of real-world issues 
such as ...

SELECT ... FROM foo WHERE tsearch_query ORDER BY price DESC LIMIT 10 

... or so.
in many cases you want to search for, say, a product and find the cheapest / 
most expensive one.
if the tsearch_query yields a high number of rows (which it often does) the 
subsequent sort will kill you.

many thanks,

hans


On Feb 6, 2014, at 12:36 PM, Heikki Linnakangas wrote:

 While hacking on the GIN patches, I've come up with a few different ideas for 
 improving performance. It's too late for 9.4, but I'll list them here if 
 someone wants to work on them later:
 
 * Represent ItemPointers as uint64's, to speed up comparisons. 
 ginCompareItemPointers is inlined into only a few instructions, but it's 
 still more expensive than a single-instruction 64-bit comparison. 
 ginCompareItemPointers is called very heavily in a GIN scan, so even a small 
 improvement there would make for a noticeable speedup. It might be an 
 improvement in code clarity, too.
 
 * Keep the entry streams of a GinScanKey in a binary heap, to quickly find 
 the minimum curItem among them.
 
 I did this in various versions of the fast scan patch, but then I realized 
 that the straightforward way of doing it is wrong, because a single 
 GinScanEntry can be part of multiple GinScanKeys. If an entry's curItem is 
 updated as part of advancing one key, and the entry is in a heap of another 
 key, updating the curItem can violate the heap property of the other entry's 
 heap.
 
 * Build a truth table (or cache) of consistent-function's results, and use 
 that instead of calling consistent for every item.
 
 * Deduce AND or OR logic from the consistent function. Or have the opclass 
 provide a tree of AND/OR/NOT nodes directly, instead of a consistent 
 function. For example, if the query is foo & bar, we could avoid calling 
 consistent function altogether, and only return items that match both.
 
 * Delay decoding segments during a scan. Currently, we decode all segments of 
 a posting tree page into a single array at once. But with fast scan, we 
 might be able to skip over all entries in some of the segments. So it would 
 be better to copy the segments into backend-private memory in compressed 
 format, and decode them one segment at a time (or maybe even one item at a 
 time), when needed. That would avoid the unnecessary decoding of segments 
 that can be skipped over, and would also reduce memory usage of a scan.
 
 I'll add these to the TODO.
 
 - Heikki
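
For illustration, the uint64 packing Heikki describes boils down to something
like the sketch below -- simplified types only; the real ItemPointerData
layout and ginCompareItemPointers live in the backend and are not shown here:

#include <stdint.h>

/* Simplified stand-in for ItemPointerData: block number plus offset
 * number, together identifying a heap tuple. */
typedef struct
{
    uint32_t blkno;     /* block number */
    uint16_t offnum;    /* offset within the block */
} SimpleItemPointer;

/* Pack (block, offset) into one integer whose ordering matches a
 * field-by-field comparison: block number in the high bits, offset
 * number in the low 16 bits. */
static inline uint64_t
pack_item_pointer(SimpleItemPointer p)
{
    return ((uint64_t) p.blkno << 16) | p.offnum;
}

A scan could then replace each three-way struct comparison with a single
integer compare, e.g. pack_item_pointer(a) < pack_item_pointer(b).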
 
 
 

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de





Re: adt Makefile, was Re: [HACKERS] jsonb and nested hstore

2014-02-06 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 02/01/2014 05:20 PM, Andres Freund wrote:
 Odd, most OBJS lines are kept in alphabetical order, but that doesn't
 seem to be the case here.

 This whole list is a mess, and we don't even have all the range_types 
 files following each other.

 Worth cleaning up?

+1.  It's just neatnik-ism, but isn't compulsive neatnik-ism pretty
much a job requirement for programmers?  It's hard enough dealing
with necessary complexities without having to wonder if some seemingly
arbitrary choice has hidden meanings.

 I'm actually wondering if it might be worth having some subgroups of 
 object files and then combining them into $OBJS.

Nah, let's just alphabetize them and be done.  The Makefile has no
reason to care about subgroups of those files.

regards, tom lane




Re: adt Makefile, was Re: [HACKERS] jsonb and nested hstore

2014-02-06 Thread Alvaro Herrera
Andrew Dunstan wrote:

 This whole list is a mess, and we don't even have all the
 range_types files following each other.
 
 Worth cleaning up?
 
 I'm actually wondering if it might be worth having some subgroups of
 object files and then combining them into $OBJS.

Doesn't the MSVC build stuff parse OBJS definitions?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] extension_control_path

2014-02-06 Thread David E. Wheeler
On Feb 6, 2014, at 6:51 AM, Greg Stark st...@mit.edu wrote:

 Homebrew sounds kind of confused. Having a non-root user have access
 to make global system changes sounds like privilege escalation
 vulnerability by design.

Well, the point is that it *doesn't* make global system changes. I got an error 
on OS X Server with my original formula, because there was no permission to 
install in $PGROOT/share/extensions.

 However putting that aside, it is fairly standard for software to
 provide two directories for extensions/modules/plugins/etc. One for
 distribution-built software such as /usr/share/emacs/site-lisp/ and
 another for sysadmin customizations such as
 /usr/local/share/emacs/site-lisp. The same idea as /usr/share/perl and
 /usr/local/share/perl or with Python or anything else.

Right. And you can also add additional paths for those applications to search.

Best,

David





Re: [HACKERS] extension_control_path

2014-02-06 Thread David E. Wheeler
On Feb 6, 2014, at 7:32 AM, Stephen Frost sfr...@snowman.net wrote:

 The end-admin would have to modify the system-installed postgresql.conf
 anyway to enable this other directory.  David wasn't suggesting that
 Homebrew *should* be able to do so, he was pointing out that it *can't*,
 which all makes sense imv.

Yeah, or be able to add a directory as a Postgres super user at runtime.

David





Re: adt Makefile, was Re: [HACKERS] jsonb and nested hstore

2014-02-06 Thread Andrew Dunstan


On 02/06/2014 11:38 AM, Alvaro Herrera wrote:

Andrew Dunstan wrote:


This whole list is a mess, and we don't even have all the
range_types files following each other.

Worth cleaning up?

I'm actually wondering if it might be worth having some subgroups of
object files and then combining them into $OBJS.

Doesn't the MSVC build stuff parse OBJS definitions?



Good point. At least in some cases it does.

cheers

andrew




Re: [HACKERS] jsonb and nested hstore

2014-02-06 Thread David E. Wheeler
On Feb 5, 2014, at 3:59 PM, Andrew Dunstan and...@dunslane.net wrote:

 I got a slightly earlier start ;-) For people wanting to play along, here's 
 what this change looks like: 
 https://github.com/feodor/postgres/commit/3fe899b3d7e8f806b14878da4a4e2331b0eb58e8

Man I love seeing all that red. :-)

D





Re: [HACKERS] extension_control_path

2014-02-06 Thread Greg Stark
On Thu, Feb 6, 2014 at 5:49 PM, David E. Wheeler da...@justatheory.com wrote:
 On Feb 6, 2014, at 6:51 AM, Greg Stark st...@mit.edu wrote:

 Homebrew sounds kind of confused. Having a non-root user have access
 to make global system changes sounds like privilege escalation
 vulnerability by design.

 Well, the point is that it *doesn't* make global system changes. I got an 
 error on OS X Server with my original formula, because there was no 
 permission to install in $PGROOT/share/extensions.

Installing into /usr/local is a global system change. Only root should
be able to do that and any user that can do that can easily acquire
root privileges.

 However putting that aside, it is fairly standard for software to
 provide two directories for extensions/modules/plugins/etc. One for
 distribution-built software such as /usr/share/emacs/site-lisp/ and
 another for sysadmin customizations such as
 /usr/local/share/emacs/site-lisp. The same idea as /usr/share/perl and
 /usr/local/share/perl or with Python or anything else.

 Right. And you can also add additional paths for those applications to search.

Well, users can do whatever they want at run-time but there are
blessed paths that are the correct place to install things that these
systems are configured to search automatically. My point was just that
there are generally two such blessed paths, one for the distribution
and one for the local sysadmin.

What you do not want is to have a different path for each piece of
software. That way lies the
/usr/local/kde/bin:/usr/local/gnome/bin:/usr/local/myfavouritehack/bin:...
madness. You can do this with Python or Perl but they won't do it
automatically and everyone who does this with environment variables or
command line flags eventually realizes what a mess it is. (Except Java
programmers)

-- 
greg




Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-06 Thread Jeff Janes
On Tue, Feb 4, 2014 at 2:22 PM, Jeremy Harris j...@wizmail.org wrote:

 The attached patch replaces the existing siftup method for heapify with
 a siftdown method. Tested with random integers it does 18% fewer
 compares and takes 10% less time for the heapify, over the work_mem
 range 1024 to 1048576.


Thanks for working on this.  Several times I've wondered why we didn't use
siftdown for heapifying, as it is a well known optimization.  How big of
sets were you sorting in each case?  Was it always just slightly bigger
than would fit in work_mem?  Did you try sorting already-sorted, reverse
sorted, or pipe-organ shaped data sets?  We will also need to test it on
strings.  I usually use md5(random()::text) to generate strings for such
purposes, at least for a first pass.

The name of the attached patch is a bit unfortunate.  And I'm not sure what
you are doing with tupindex, the change there doesn't seem to be necessary
to the rest of your work, so some explanation on that would be helpful.

The project coding style usually has comments in blocks before loops on
which they comment, rather than interspersed throughout the clauses of the
for statement.



 Both algorithms appear to be O(n) (contradicting Wikipedia's claim
 that a siftup heapify is O(n log n)).


Siftdown is linear in all cases.  Siftup may be linear in the typical case,
but is n log n in the worst case.
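
For reference, the siftdown heapify under discussion (Floyd's method) looks
roughly like the following in generic form -- a sketch over a plain int
min-heap, not the actual tuplesort code:

/* Sift the element at position i down until the min-heap property
 * holds for the subtree rooted at i. */
static void
sift_down(int *h, int n, int i)
{
    for (;;)
    {
        int smallest = i;
        int left = 2 * i + 1;
        int right = left + 1;

        if (left < n && h[left] < h[smallest])
            smallest = left;
        if (right < n && h[right] < h[smallest])
            smallest = right;
        if (smallest == i)
            break;

        int tmp = h[i];
        h[i] = h[smallest];
        h[smallest] = tmp;
        i = smallest;
    }
}

/* Heapify in O(n): sift each internal node down, starting from the
 * last parent.  Most nodes sit near the leaves and move only a
 * little, which is where the linear bound comes from. */
static void
heapify(int *h, int n)
{
    for (int i = n / 2 - 1; i >= 0; i--)
        sift_down(h, n, i);
}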

Another optimization that is possible is to do only one comparison at each
level (between the two children) all the way down the heap, and then
compare the parent to the chain of smallest children in reverse order.
 This can substantially reduce the number of comparisons on average,
because most tuples sift most of the way down the heap (because most of the
tuples are at the bottom of the heap).  Robert submitted a patch to do this
in the main siftdown routine (which for some reason is
named tuplesort_heap_siftup, rather than siftdown), and I found it a
substantial improvement but it never got pushed forward.  I think Robert
was concerned it might make some obscure cases worse rather than better,
and no one put it through enough serious testing to overcome that concern.
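
In the same generic int-array form as above, that one-comparison-per-level
variant looks roughly like this (illustrative only; not Robert's actual
patch):

/* Descend to a leaf following the smaller child (one comparison per
 * level), then walk back up the same path to find where the displaced
 * value belongs.  Cheap on average, because most values belong near
 * the bottom of the heap. */
static void
sift_down_one_cmp(int *h, int n, int i)
{
    int val = h[i];
    int j = i;
    int child;

    while ((child = 2 * j + 1) < n)
    {
        if (child + 1 < n && h[child + 1] < h[child])
            child++;
        h[j] = h[child];        /* pull the smaller child up */
        j = child;
    }

    while (j > i)
    {
        int parent = (j - 1) / 2;

        if (h[parent] <= val)
            break;
        h[j] = h[parent];       /* push the chain back down one step */
        j = parent;
    }
    h[j] = val;
}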

(Also, I think you should make your siftdown code look more like the
existing siftdown code.)

Cheers,

Jeff


[HACKERS] 'dml' value for log_statement

2014-02-06 Thread Sawada Masahiko
Hi all,

The attached patch provides a new value, 'dml', for log_statement.
Currently, the server logs modification statements AND data definition
statements if log_statement is set to 'mod'.
So if we would like to log only DML, we need to set log_statement to
'all' and filter out the unnecessary information.

This patch allows the user to configure the setting in more detail.
The server logs a statement only when an INSERT, UPDATE, DELETE or
TRUNCATE statement is executed
(the same as the 'mod' value, but without logging DDL).
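
To make that concrete, the classification would look roughly like this --
an illustrative sketch only; LOGSTMT_DML is the hypothetical new value and
statement_is_logged is a made-up helper, not the patch's actual code:

#include <stdbool.h>

typedef enum
{
    LOGSTMT_NONE,   /* log no statements */
    LOGSTMT_DDL,    /* log data definition statements only */
    LOGSTMT_DML,    /* proposed: log only INSERT/UPDATE/DELETE/TRUNCATE */
    LOGSTMT_MOD,    /* existing 'mod': log DML plus DDL */
    LOGSTMT_ALL     /* log all statements */
} LogStmtLevel;

static bool
statement_is_logged(LogStmtLevel setting, bool is_ddl, bool is_dml)
{
    switch (setting)
    {
        case LOGSTMT_ALL:
            return true;
        case LOGSTMT_MOD:
            return is_ddl || is_dml;
        case LOGSTMT_DML:
            return is_dml;
        case LOGSTMT_DDL:
            return is_ddl;
        default:
            return false;
    }
}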

Comments?

Regards,

---
Sawada Masahiko


log_statement_dml.patch
Description: Binary data



Re: [HACKERS] extension_control_path

2014-02-06 Thread David E. Wheeler
On Feb 6, 2014, at 9:14 AM, Greg Stark st...@mit.edu wrote:

 Installing into /usr/local is a global system change. Only root should
 be able to do that and any user that can do that can easily acquire
 root privileges.

I agree with you, but I don't think the Homebrew folks do. Or at least their 
current implementation doesn't. OT though.

 Well, users can do whatever they want at run-time but there are
 blessed paths that are the correct place to install things that these
 systems are configured to search automatically. My point was just that
 there are generally two such blessed paths, one for the distribution
 and one for the local sysadmin.

Yeah, two blessed paths would be very useful, but I think the ability to add 
any number of paths would be even better.

 What you do not want is to have a different path for each piece of
 software. That way lies the
 /usr/local/kde/bin:/usr/local/gnome/bin:/usr/local/myfavouritehack/bin:...
 madness. You can do this with Python or Perl but they won't do it
 automatically and everyone who does this with environment variables or
 command line flags eventually realizes what a mess it is. (Except Java
 programmers)

Agreed.

Best,

David






Re: [HACKERS] PoC: Partial sort

2014-02-06 Thread Robert Haas
On Thu, Feb 6, 2014 at 3:39 AM, Marti Raudsepp ma...@juffo.org wrote:
 On Thu, Feb 6, 2014 at 5:31 AM, Robert Haas robertmh...@gmail.com wrote:
 Hmm, sounds a little steep.  Why is it so expensive?  I'm probably
 missing something here, because I would have thought that planner
 support for partial sorts would consist mostly of considering the same
 sorts we consider today, but with the costs reduced by the batching.

 I guess it's because the patch undoes some optimizations in the
 mergejoin planner wrt caching merge clauses and adds a whole lot of
 code to find_mergeclauses_for_pathkeys. In other code paths the
 overhead does seem to be negligible.

 Notice the removal of:
 /* Select the right mergeclauses, if we didn't already */
 /*
  * Avoid rebuilding clause list if we already made one;
  * saves memory in big join trees...
  */

Yeah, I noticed that.  My feeling is that those optimizations got put
in there because someone found them to be important, so I'm skeptical
about removing them.  It may be that having the capability to do a
partial sort makes it seem worth spending more CPU looking for merge
joins, but I'd vote for making any such change a separate patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Fwd: [HACKERS] Proposal: variant of regclass

2014-02-06 Thread Marti Raudsepp
On Tue, Jan 28, 2014 at 10:38 AM, Yugo Nagata nag...@sraoss.co.jp wrote:
 I revised the patch. Could you please review this?

I didn't test the patch due to the duplicate OID compilation error.
But a few things stuck out from the diffs:
* You added some unnecessary spaces at the beginning of the line in
OpernameGetCandidates.
* regclass_guts and regtype_guts can be simplified further by moving
the ereport() code into regclassin, thereby saving the if
(missing_ok)
* I would rephrase the documentation paragraph as:

to_regclass, to_regproc, to_regoper and to_regtype are functions
similar to the regclass, regproc, regoper and regtype casts, except
that they return InvalidOid (0) when the object is not found, instead
of raising an error.

On Wed, Jan 22, 2014 at 1:04 PM, Tatsuo Ishii is...@postgresql.org wrote:
 I thought the consensus was that returning NULL is better than
 InvalidOid? From an earlier message:

 Not sure. There's already at least one counter example:

 pg_my_temp_schema()   oid   OID of session's temporary schema, or 0 if none

And there are pg_relation_filenode, pg_stat_get_backend_dbid,
pg_stat_get_backend_userid which return NULL::oid. In general I don't
like magic values like 0 in SQL. For example, this query would give
unexpected results because InvalidOid has some other meaning:

select * from pg_aggregate where aggfinalfn=to_regproc('typo');

Regards,
Marti




Re: [HACKERS] PoC: Partial sort

2014-02-06 Thread Marti Raudsepp
On Thu, Feb 6, 2014 at 9:15 PM, Robert Haas robertmh...@gmail.com wrote:
 It may be that having the capability to do a
 partial sort makes it seem worth spending more CPU looking for merge
 joins, but I'd vote for making any such change a separate patch.

Agreed.

Alexander, should I work on splitting up the patch in two, or do you
want to do it yourself?

Should I merge my coding style and enable_partialsort patches while at
it, or do you still have reservations about those?

Regards,
Marti




Re: [HACKERS] PoC: Partial sort

2014-02-06 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Feb 6, 2014 at 3:39 AM, Marti Raudsepp ma...@juffo.org wrote:
 I guess it's because the patch undoes some optimizations in the
 mergejoin planner wrt caching merge clauses and adds a whole lot of
 code to find_mergeclauses_for_pathkeys. In other code paths the
 overhead does seem to be negligible.

 Yeah, I noticed that.  My feeling is that those optimizations got put
 in there because someone found them to be important, so I'm skeptical
 about removing them.

I put them in, and yeah they are important.  Even with those, and even
with the rather arbitrary heuristic restrictions that joinpath.c puts on
what mergeclause lists to consider, the existing planner spends a whole
lot of effort on mergejoins --- possibly disproportionate to their actual
value.  I think that any patch that removes those optimizations is not
going to fly.  If anything, it'd be better to reduce the number of
mergejoins considered even further, because a lot of the possible plans
are not usefully different.

It's already the case that we expect indxpath.c to predict the useful
orderings (by reference to query_pathkeys and available mergejoin clauses)
and generate suitable paths, rather than trying to identify the orderings
at join time.  Can't that approach be extended to cover this technique?

In any case, the bottom line is that we don't want this patch to cause
the planner to consider large numbers of new but useless sort orderings.

regards, tom lane




[HACKERS] Re: [DOCS] Re: Viability of text HISTORY/INSTALL/regression README files (was Re: [COMMITTERS] pgsql: Document a few more regression test hazards.)

2014-02-06 Thread Stefan Kaltenbrunner
On 02/05/2014 07:27 PM, Robert Haas wrote:
 On Tue, Feb 4, 2014 at 11:43 PM, Noah Misch n...@leadboat.com wrote:
 Right.  I mean, a lot of the links say things like Section 26.2
 which obviously makes no sense in a standalone text file.

 For xrefs normally displayed that way, text output could emit a URL, either
 inline or in the form of a footnote.  For link targets (e.g. SQL commands)
 having a friendly text fragment for xref sites, use the normal fragment.
 
 True.  If we're going to keep these things around, something like that
 would avoid some annoyances for documentation authors.  But I still
 think we ought to just nuke them, because who cares?

I don't care about HISTORY, and even less so about regress_README, but I
would prefer to keep INSTALL because I know that people do look at that
one...


Stefan




Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Greg Stark
On Mon, Feb 3, 2014 at 12:02 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 What version were you running before 9.1.11 exactly?  I took a look
 through all the diffs from 9.1.9 up to 9.1.11, and couldn't find any
 changes that seemed even vaguely related to this.  There are some
 changes in known-transaction tracking, but it's hard to see a connection
 there.  Most of the other diffs are in code that wouldn't execute during
 WAL replay at all.


Both the primary and the standby were 9.1.11 from the get-go. The
database the primary was forked off of was 9.1.10 but as far as I can
tell the primary in the current pair has no problems.

What's worse is we created a new standby from the same base backup and
replayed the same records and it didn't reproduce the problem. This
means either it's a hardware problem -- but we've seen it on multiple
standbys on this database and at least one other database which is in
a different data centre -- or it's a race condition -- but that's hard
to credit in the recovery code, which is basically single-threaded.

And these records are from before the standby reaches consistency, so
it's hard to see how a connection from a hot standby client could
cause any kind of race condition. The only other thread that could
conceivably cause a heisenbug is the bgwriter. It's hard to imagine
how a race condition in there could be so easy to hit that it would
happen four times on one restore but otherwise go mostly unnoticed.
-- 
greg




Re: [HACKERS] mvcc catalo gsnapshots and TopTransactionContext

2014-02-06 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-02-02 15:16:45 -0500, Tom Lane wrote:
 I've been thinking about this for the past little while, and I believe
 that it's probably okay to have RelationClearRelation leave the relcache
 entry un-rebuilt, but with rd_isvalid = false so it will be rebuilt when
 next opened.  The rationale is explained in the comments in the attached
 patch.  I've checked that this fixes Noah's test case and still passes
 the existing regression tests.

 Hm, a bit scary, but I don't see an immediate problem.

 The following comment is now violated for nailed relations:
  * We assume that at the time we are called, we have at least AccessShareLock
  * on the target index.  (Note: in the calls from RelationClearRelation,
  * this is legitimate because we know the rel has positive refcount.)
 but that should be easy to fix.

Ah, good point.  So we should keep the refcnt test in the nailed-rel case.

 I wonder though, if we couldn't just stop doing the
 RelationReloadIndexInfo() for nailed indexes.

No; you're confusing nailed indexes with mapped indexes.  There are nailed
indexes that aren't on mapped catalogs, see the load_critical_index calls
in RelationCacheInitializePhase3.

 The corresponding comment says:
 * If it's a nailed index, then we need to re-read the pg_class row to see
 * if its relfilenode changed.

This comment could use some work I guess; needs to say nailed but not
mapped index.

 Do you plan to backpatch this? If so, even to 8.4?

I'm of two minds about that.  I think this is definitely a bug, but we
do not currently have any evidence that there's an observable problem
in practice.  On the other hand, we certainly get reports of
irreproducible issues from time to time, so I don't care to rule out
the theory that some of them might be caused by faulty cache reloads.
That possibility has to be balanced against the risk of introducing
new issues with this patch.

Thoughts?

(BTW, I see that the CLOBBER_CACHE_ALWAYS members of the buildfarm are
unhappy with the IsTransactionState check in RelationIdGetRelation.
Will look at that ... but it seems to be in initdb which breaks a lot
of these rules anyway, so I think it's probably not significant.)

regards, tom lane




Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 Both the primary and the standby were 9.1.11 from the get-go. The
 database the primary was forked off of was 9.1.10 but as far as I can
 tell the primary in the current pair has no problems.

 What's worse is we created a new standby from the same base backup and
 replayed the same records and it didn't reproduce the problem. This
 means either it's a hardware problem -- but we've seen it on multiple
 standbys on this database and at least one other database which is in
 a different data centre -- or it's a race condition -- but that's hard
 to credit in the recovery code, which is basically single-threaded.

 And these records are from before the standby reaches consistency, so
 it's hard to see how a connection from a hot standby client could
 cause any kind of race condition. The only other thread that could
 conceivably cause a heisenbug is the bgwriter. It's hard to imagine
 how a race condition in there could be so easy to hit that it would
 happen four times on one restore but otherwise go mostly unnoticed.

I had noticed that the WAL records that were mis-replayed seemed to
be bunched pretty close together (two of them even adjacent).  Could
you confirm that?  If so, it seems like we're looking for some condition
that makes mis-replay fairly probable for a period of time, but in
itself might be quite improbable.  Not that that helps much at
nailing it down.

You might well be on to something with the bgwriter idea, considering
that none of the WAL replay code was originally written with any
concurrent execution in mind.  We might've missed some place where
additional locking is needed.

regards, tom lane




Re: [HACKERS] mvcc catalo gsnapshots and TopTransactionContext

2014-02-06 Thread Andres Freund
On 2014-02-06 16:35:07 -0500, Tom Lane wrote:
  I wonder though, if we couldn't just stop doing the
  RelationReloadIndexInfo() for nailed indexes.
 
 No; you're confusing nailed indexes with mapped indexes.  There are nailed
 indexes that aren't on mapped catalogs, see the load_critical_index calls
 in RelationCacheInitializePhase3.

Oh. I'd somehow remembered nailed catalogs would be a subset of mapped
ones. But as you say, they are not. Not sure I like that, but it's
certainly set for now.

  Do you plan to backpatch this? If so, even to 8.4?

 I'm of two minds about that.  I think this is definitely a bug, but we
 do not currently have any evidence that there's an observable problem
 in practice.  On the other hand, we certainly get reports of
 irreproducible issues from time to time, so I don't care to rule out
 the theory that some of them might be caused by faulty cache reloads.
 That possibility has to be balanced against the risk of introducing
 new issues with this patch.
 
 Thoughts?

Let's let it stew a while in master; there certainly are enough
subtleties around this that I'd hesitate to add it to a point release in
the not too distant future. Doesn't seem that urgent to me. But after
that I'd say let's backpatch it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Release schedule for 9.3.3?

2014-02-06 Thread Sean Chittenden
Are there any tentative plans for the 9.3.3 release date? 9.3.2 was released in 
December and it's getting close to the two-month mark for another micro 
release, or at least it seems like one should be right around the corner.

It's a little early, but I haven't seen any movement or discussion, so I 
thought I'd prod and ask. Thanks in advance. -sc

--
Sean Chittenden
s...@chittenden.org





Re: [HACKERS] Release schedule for 9.3.3?

2014-02-06 Thread Greg Stark
On Thu, Feb 6, 2014 at 10:55 PM, Sean Chittenden s...@chittenden.org wrote:
 Are there any tentative plans for the 9.3.3 release date? 9.3.2 was released 
 in December and it's getting close to the two-month mark for another micro 
 release, or at least it seems like one should be right around the corner.


Bug fix releases are released when we find bugs, not on a schedule.
That said, there were some bugs fixed in the last few weeks including
one fairly serious one so I would expect one in not too long.

-- 
greg




Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-06 Thread Jeremy Harris

On 06/02/14 18:21, Jeff Janes wrote:

  How big of
sets were you sorting in each case?


Big enough to go external.  The timings and compare-counts given are
purely for the heapify stage, not the total for the sort, so they are
constrained by the work_mem, not by the sort size per se.

I'm limited to working on a small machine, so the work_mem value
of 1e+6 is approaching my top limit of sort-time pain unless I rip the
heapify out into a test harness.  What range of work_mem is it useful
to test over?


 Was it always just slightly bigger
than would fit in work_mem?


I didn't check.


 Did you try sorting already-sorted, reverse
sorted, or pipe-organ shaped data sets?


Not yet, but I can.  What variety of pipe-organ shape is of
interest (I can think of several that might be called that)?
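
For what it's worth, the shape usually meant in the sorting literature is
values that ramp up through the first half and back down through the
second, like the pipes of an organ -- e.g., as a sketch:

/* Fill an array with an "organ pipe" distribution: ascending through
 * the first half, descending through the second. */
static void
fill_pipe_organ(int *a, int n)
{
    for (int i = 0; i < n / 2; i++)
        a[i] = i;
    for (int i = n / 2; i < n; i++)
        a[i] = n - i;
}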


 We will also need to test it on
strings.  I usually use md5(random()::text) to generate strings for such
purposes, at least for a first pass.


OK.



The name of the attached patch is a bit unfortunate.


Is there a project standard for this?


 And I'm not sure what
you are doing with tupindex, the change there doesn't seem to be necessary
to the rest of your work, so some explanation on that would be helpful.


I'll add commentary.


The project coding style usually has comments in blocks before loops on
which they comment, rather than interspersed throughout the clauses of the
for statement.


I'll adjust that.


Another optimization that is possible is to do only one comparison at each
level (between the two children) all the way down the heap, and then
compare the parent to the chain of smallest children in reverse order.
  This can substantially reduce the number of comparisons on average,
because most tuples sift most of the way down the heap (because most of the
tuples are at the bottom of the heap).


Sounds interesting; I'll see if I can get that going.


(Also, I think you should make your siftdown code look more like the
existing siftdown code.)


A function called by the outer loop?  Can do; the existing code does that
because the function is called from multiple places, but this will only
be used the once.

Thanks for the review.
--
Jeremy





Re: [HACKERS] 'dml' value for log_statement

2014-02-06 Thread David Johnston
Sawada Masahiko wrote
 Hi all,
 
 The attached patch provides a new value, 'dml', for log_statement.
 Currently, the server logs modification statements AND data definition
 statements if log_statement is set to 'mod'.
 So if we would like to log only DML, we need to set log_statement to
 'all' and filter out the unnecessary information.

 This patch allows the user to configure the setting in more detail.
 The server logs a statement only when an INSERT, UPDATE, DELETE or
 TRUNCATE statement is executed
 (the same as the 'mod' value, but without logging DDL).
 
 Comments?
 
 Regards,
 
 ---
 Sawada Masahiko

-1

I'm not fully versed on what log levels provide what data, but if you care
about changes to data then ALTER TABLE table is just as important an
activity as UPDATE table, so mod exhibits the recommended behavior and
dml provides something that should be of minimal value.

DDL by itself has value, since monitoring schema changes without worrying
about transient data updates is understandable.  But a schema change, by
definition, alters data.

Maybe further description of why you find mod unsatisfactory would be
helpful.

Though it is simple enough to just let people use their own judgement...

David J.








Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-06 Thread Jeremy Harris

On 06/02/14 14:22, Robert Haas wrote:

On Thu, Feb 6, 2014 at 4:22 AM, Jeremy Harris j...@wizmail.org wrote:

On 05/02/14 21:56, Robert Haas wrote:

On Tue, Feb 4, 2014 at 5:22 PM, Jeremy Harris j...@wizmail.org wrote:

The attached patch replaces the existing siftup method for heapify with
a siftdown method. Tested with random integers it does 18% fewer
compares and takes 10% less time for the heapify, over the work_mem
range 1024 to 1048576.

Both algorithms appear to be O(n) (contradicting Wikipedia's claim
that a siftup heapify is O(n log n)).



I think Wikipedia is right.  Inserting a tuple into a heap is O(lg
n); doing that n times is therefore O(n lg n).  Your patch likewise
implements an outer loop which runs O(n) times and an inner loop which
runs at most O(lg n) times, so a naive analysis of that algorithm also
comes out to O(n lg n).


The patch implements a siftdown.  Measurements attached.


Uh, this spreadsheet is useless.  You haven't explained how these
numbers were generated, or what the columns in the spreadsheet
actually mean.  Ideally, you should include enough information that
someone else can try to reproduce your results.



I'm sorry I wasn't clear.

Test code was groups of the form:

set work_mem to 1024;
CREATE TABLE jgh (i integer);
INSERT INTO jgh (i) SELECT 2 * random() FROM generate_series(1, 2);
VACUUM ANALYZE jgh;
explain analyze SELECT * FROM jgh ORDER BY i;
set enable_siftdown_heapify to off;
explain analyze SELECT * FROM jgh ORDER BY i;
set enable_siftdown_heapify to on;
DROP TABLE jgh;

.. for the tabulated work_mem, and scaled values for the INSERT.

Compares were counted for the heapify stage (only), and timings
taken using pg_rusage_init() before and pg_rusage_show() after
it.  Times are in seconds.

The baseline values for compares and time used commit 858ec11858.
Sift-down compares and time are for the patch.

Baseline K cmps is compares divided by work_mem (which should
be a scaled version of compares-per-tuple being heapified) pre-patch.
Siftdown K cmps is likewise with the patch.

K ratio cmps is the ratio  (patched compares)/(unpatched compares).

Repeat for time measurements.

--
Cheers,
   Jeremy




Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Greg Stark
On Thu, Feb 6, 2014 at 10:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I had noticed that the WAL records that were mis-replayed seemed to
 be bunched pretty close together (two of them even adjacent).  Could
 you confirm that?  If so, it seems like we're looking for some condition
 that makes mis-replay fairly probable for a period of time, but in
 itself might be quite improbable.  Not that that helps much at
 nailing it down.

Well one thing that argues for is hardware problems. It could be that
the right memory mapping just happened to line up with that variable
on the stack for that short time and then it was mapped to something
else entirely. Or the machine was overheating for a short time and
then the temperature became more reasonable. Or the person with the
x-ray source walked by in that short time window.

That doesn't explain the other instance or the other copies of this
database. I think the most productive thing I can do is switch my
attention to the other database to see if it really looks like the
same problem.

 You might well be on to something with the bgwriter idea, considering
 that none of the WAL replay code was originally written with any
 concurrent execution in mind.  We might've missed some place where
 additional locking is needed.

Except that the bgwriter has been in there for a few years already.
Unless there's been some other change, possibly involving copying some
code that was safe in some context but not where it was copied to.

The problem with the bgwriter being at fault is that from what I can
see the bgwriter will never extend a file. That means the xlog
recovery code must have done it. That means even if the bgwriter came
along and looked at the buffer we just read in it would already be too
late to cause mischief. The xlog code extends the file *first* then
reads in the backup block into a buffer. I can't see how it could
corrupt the stack or the wal recovery buffer in the window between
reading in the wal buffer and deciding to extend the relation.


-- 
greg




Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Andres Freund
On 2014-02-06 23:41:19 +0100, Greg Stark wrote:
 The problem with the bgwriter being at fault is that from what I can
 see the bgwriter will never extend a file. That means the xlog
 recovery code must have done it. That means even if the bgwriter came
 along and looked at the buffer we just read in it would already be too
 late to cause mischief. The xlog code extends the file *first* then
 reads in the backup block into a buffer. I can't see how it could
 corrupt the stack or the wal recovery buffer in the window between
 reading in the wal buffer and deciding to extend the relation.

That's not necessarily true. If e.g. the buffer mapping would change
racily, the result write from the bgwriter could very well end up
increasing the file size, leaving a hole inbetween its write and the
original size.
Are there any other relations that are as big as the corrupted relations
got extended to?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] mvcc catalo gsnapshots and TopTransactionContext

2014-02-06 Thread Tom Lane
I wrote:
 (BTW, I see that the CLOBBER_CACHE_ALWAYS members of the buildfarm are
 unhappy with the IsTransactionState check in RelationIdGetRelation.
 Will look at that ... but it seems to be in initdb which breaks a lot
 of these rules anyway, so I think it's probably not significant.)

So what's happening is that in bootstrap mode, we run each BKI command
in a separate transaction.  (Since these transactions all use the same
XID, it's not clear what's the point of doing that rather than running
the whole BKI script as one transaction.  But it's been like that for
twenty years so I'm disinclined to mess with it.)  Bootstrap mode is
peculiar in that it expects relcache entries to stay open across these
transactions, which we implement by keeping their refcnts positive.
Throwing CLOBBER_CACHE_ALWAYS into the mix means we do a relcache flush
during transaction start, during which RelationClearRelation tries to
rebuild the cache entries right away because they have positive refcnt,
triggering the Assert because it has to access system relations such as
pg_class and we're not in TRANS_INPROGRESS state yet.

So in short, the patch I proposed upthread fixes this, since it postpones
the actual cache entry reload into the main part of the transaction.

So rather than lobotomize that Assert with an IsBootstrapMode exception,
I think I'll just go ahead and commit this patch, with the cleanups
we discussed earlier.

regards, tom lane




Re: [HACKERS] mvcc catalo gsnapshots and TopTransactionContext

2014-02-06 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-02-06 16:35:07 -0500, Tom Lane wrote:
 Thoughts?

 Let's let it stew a while in master, there certainly are enough
 subtleties around this that I'd hesitate to add it to a point release in
 the not too far away future. Doesn't seem that urgent to me. But after
 that I'd say lets backpatch it.

Seems reasonable to me.  The urgency would go up if we could positively
trace any field trouble reports to this.

regards, tom lane




Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Greg Stark
On Thu, Feb 6, 2014 at 11:48 PM, Andres Freund and...@2ndquadrant.com wrote:

 That's not necessarily true. If e.g. the buffer mapping would change
 racily, the result write from the bgwriter could very well end up
 increasing the file size, leaving a hole inbetween its write and the
 original size.

a) the segment isn't sparse and b) there were whole segments full of
nuls between the end of the tables and the final blocks.

So the file was definitely extended by Postgres, not the OS and the
bgwriter passes EXTENSION_FAIL which means it wouldn't create those
intervening segments.

 Are there any other relations that are as big as the corrupted relations
 got extended to?

I was wondering the same thing. But no, the extended file is 39GB
larger than the next largest relation.

Also, the final block in the first relation is clearly a version of
the same block from the correct location. Look at the ctids and the
xids on the rows for lp 3 in the attached file for example. That
second copy is from the location in the xlog record.


-- 
greg
=# select * from (select relfilenode,
                         (pg_relation_size(oid)/8192)::integer-1 as blockno,
                         ((pg_relation_size(oid)/8192)::integer-1)/1024/128 as segnum,
                         ((pg_relation_size(oid)/8192)::integer-1)%(1024*1024*128) as offset,
                         (page_header(get_raw_page(oid::regclass::text,'main',(pg_relation_size(oid)/8192)::integer-1))).*
                    from pg_class
                   where pg_relation_size(oid) > 4*8192
                     and (page_header(get_raw_page(oid::regclass::text,'main',(pg_relation_size(oid)/8192)::integer-2))).upper=0
                   order by lsn) as x;
 relfilenode | blockno  | segnum |  offset  |    lsn     | tli | flags | lower | upper | special | pagesize | version | prune_xid
-------------+----------+--------+----------+------------+-----+-------+-------+-------+---------+----------+---------+------------
      473158 | 18090059 |    138 | 18090059 | EA1/625E28 |   6 |     0 |   144 |   896 |    8192 |     8192 |       4 | 1401029863
     1366221 |  2208159 |     16 |  2208159 | EA1/62DDF0 |   6 |     0 |  1180 |  3552 |    8176 |     8192 |       4 |          0
     1261982 |  7141472 |     54 |  7141472 | EA1/638988 |   6 |     0 |  1240 |  3312 |    8176 |     8192 |       4 |          0
     1364767 |  3631161 |     27 |  3631161 | EA1/63A0A8 |   6 |     0 |  1180 |  3552 |    8176 |     8192 |       4 |          0
     1519286 |   311808 |      2 |   311808 | EA1/708B08 |   6 |     1 |   312 |  3040 |    8192 |     8192 |       4 |          0
(5 rows)

=# select (heap_page_items(get_raw_page(16531::regclass::text,'main',18090059))).*;
 lp | lp_off | lp_flags | lp_len |   t_xmin   |   t_xmax   | t_field3 |    t_ctid    | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
----+--------+----------+--------+------------+------------+----------+--------------+-------------+------------+--------+--------+-------
  1 |   7024 |        1 |    328 | 1358232426 | 1401029863 |        1 | (9386399,3)  |       49163 |       9475 |     32 | 1010   |
  2 |      1 |        2 |      0 |            |            |          |              |             |            |        |        |
  3 |   6560 |        1 |    464 | 1401029863 |          0 |        1 | (9386399,3)  |       32779 |      10499 |     32 | 1010   |
  4 |      5 |        2 |      0 |            |            |          |              |             |            |        |        |
  5 |   8008 |        1 |    184 |          2 |          0 |        0 | (9386399,5)  |       32779 |      10499 |     32 | 1010   |
  6 |   6416 |        1 |    144 | 1418089052 | 1418089053 |        0 | (9386399,7)  |       16395 |        259 |     32 | 1010   |
  7 |   6272 |        1 |    144 | 1418089053 |          0 |        0 | (9386399,7)  |       32779 |      10243 |     32 | 1010   |
  8 |   6128 |        1 |    144 | 1418089056 | 1418089060 |        0 | (9386399,11) |       16395 |       1283 |     32 | 1010   |
  9 |     10 |        2 |      0 |            |            |          |              |             |            |        |        |
 10 |   7864 |        1 |    144 |          2 |          0 |        0 | (9386399,10) |       32779 |      10499 |     32 |

Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 On Thu, Feb 6, 2014 at 11:48 PM, Andres Freund and...@2ndquadrant.com wrote:
 That's not necessarily true. If e.g. the buffer mapping would change
 racily, the result write from the bgwriter could very well end up
 increasing the file size, leaving a hole inbetween its write and the
 original size.

 a) the segment isn't sparse and b) there were whole segments full of
 nuls between the end of the tables and the final blocks.

 So the file was definitely extended by Postgres, not the OS and the
 bgwriter passes EXTENSION_FAIL which means it wouldn't create those
 intervening segments.

But ... when InRecovery, md.c will create such segments too.  We had
dismissed that on the grounds that the files would be sparse because
of the way md.c creates them.  However, it is real damn hard to see
how the loop in XLogReadBufferExtended could've accessed a bogus block,
other than hardware misfeasance which I don't believe any more than
you do.  The blkno that's passed to that function came directly out
of a WAL record that's in the private memory of the startup process
and recently passed a CRC check.  You'd have to believe some sort
of asynchronous memory clobber inside the startup process.

On the other hand, if _mdfd_getseg did the deed, there's a whole lot
more space for something funny to have happened, because now we're
talking about a buffer being written in preparation for eviction
from shared buffers, long after WAL replay filled it.

So I'm wondering if there's something wrong with our deduction from
file non-sparseness.  In this connection, google quickly found me
a report of XFS losing the sparse state of a file across multiple
writes:
http://oss.sgi.com/archives/xfs/2011-06/msg00225.html
I wonder whether that bug or a similar one exists in your production
kernel.

regards, tom lane




Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Andres Freund
On 2014-02-06 18:42:05 -0500, Tom Lane wrote:
 Greg Stark st...@mit.edu writes:
  On Thu, Feb 6, 2014 at 11:48 PM, Andres Freund and...@2ndquadrant.com 
  wrote:
  That's not necessarily true. If e.g. the buffer mapping would change
  racily, the result write from the bgwriter could very well end up
  increasing the file size, leaving a hole inbetween its write and the
  original size.
 
  a) the segment isn't sparse and b) there were whole segments full of
  nuls between the end of the tables and the final blocks.
 
  So the file was definitely extended by Postgres, not the OS and the
  bgwriter passes EXTENSION_FAIL which means it wouldn't create those
  intervening segments.
 
 But ... when InRecovery, md.c will create such segments too.  We had
 dismissed that on the grounds that the files would be sparse because
 of the way md.c creates them.  However, it is real damn hard to see
 how the loop in XLogReadBufferExtended could've accessed a bogus block,
 other than hardware misfeasance which I don't believe any more than
 you do.  The blkno that's passed to that function came directly out
 of a WAL record that's in the private memory of the startup process
 and recently passed a CRC check.  You'd have to believe some sort
 of asynchronous memory clobber inside the startup process.

That reminds me, not that I directly see how it could be responsible,
there's still 20131029011623.gj20...@awork2.anarazel.de ff. around. I
don't think we came to an agreement in that thread on how to fix the
problem.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: adt Makefile, was Re: [HACKERS] jsonb and nested hstore

2014-02-06 Thread Michael Paquier
On Fri, Feb 7, 2014 at 1:18 AM, Andrew Dunstan and...@dunslane.net wrote:

 On 02/01/2014 05:20 PM, Andres Freund wrote:

 diff --git a/src/backend/utils/adt/Makefile
 b/src/backend/utils/adt/Makefile
 index 1ae9fa0..fd93d9b 100644
 --- a/src/backend/utils/adt/Makefile
 +++ b/src/backend/utils/adt/Makefile
 @@ -32,7 +32,8 @@ OBJS = acl.o arrayfuncs.o array_selfuncs.o
  array_typanalyze.o \
tsquery_op.o tsquery_rewrite.o tsquery_util.o tsrank.o \
tsvector.o tsvector_op.o tsvector_parser.o \
txid.o uuid.o windowfuncs.o xml.o rangetypes_spgist.o \
 -  rangetypes_typanalyze.o rangetypes_selfuncs.o
 +  rangetypes_typanalyze.o rangetypes_selfuncs.o \
 +  jsonb.o jsonb_support.o

 Odd, most OBJS lines are kept in alphabetical order, but that doesn't
 seem to be the case here.



 This whole list is a mess, and we don't even have all the range_types files
 following each other.

 Worth cleaning up?
+1. Yes please.
-- 
Michael




Re: [HACKERS] Recovery inconsistencies, standby much larger than primary

2014-02-06 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 That reminds me, not that I directly see how it could be responsible,
 there's still 20131029011623.gj20...@awork2.anarazel.de ff. around. I
 don't think we came to a agreement in that thread how to fix the
 problem.

Hm, yeah.  I'm not sure I believe Heikki's argument that we need to avoid
closing the smgr relation.  If that stuff is being used in any
performance-critical paths then we've got trouble already.

regards, tom lane




Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime

2014-02-06 Thread Amit Kapila
On Thu, Feb 6, 2014 at 3:42 PM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:
 Hello. I've understood how this works, and it seems to work as
 expected.

  Anyway, this is just a test module, so if things work for you by
  changing it as above, it's fine. However, I wonder why it's not
  generating a .def file for you.

 Surely.

 Getting back on topic: using dsm_keep_segment, I saw the postmaster
 keep the section handle for the dsm segment, and any session even
 after the creating session had gone could recall the segment,
 whereas dsm_keep_mapping could not retain it after the end of the
 creating session. And this exactly resembles the Linux version in
 behavior, including the attach failure.

 The orphaned section handles in the postmaster have become a matter
 of documentation.

 Besides all the above, I'd like to see a comment in the win32 code
 about the 'DuplicateHandle hack' -- specifically, a description of how
 DuplicateHandle pushes a copy of the section handle to the
 postmaster so the section is retained for the postmaster's lifetime.

Thanks. I shall handle all comments raised by you and send
an updated patch tomorrow.
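
For reference, the core of that DuplicateHandle hack is roughly the
following -- a simplified sketch with error handling and the real dsm
plumbing omitted; push_handle_to_postmaster is an illustrative name, not
the patch's:

#include <windows.h>

/* Duplicate a file-mapping (section) handle into the postmaster
 * process, so the handle -- and hence the segment -- stays alive for
 * the postmaster's lifetime, not just the creating backend's. */
static HANDLE
push_handle_to_postmaster(HANDLE section, DWORD postmaster_pid)
{
    HANDLE pm_proc;
    HANDLE pm_copy = NULL;

    pm_proc = OpenProcess(PROCESS_DUP_HANDLE, FALSE, postmaster_pid);
    if (pm_proc == NULL)
        return NULL;

    if (!DuplicateHandle(GetCurrentProcess(), section,
                         pm_proc, &pm_copy,
                         0, FALSE, DUPLICATE_SAME_ACCESS))
        pm_copy = NULL;

    CloseHandle(pm_proc);
    return pm_copy;
}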

 By the way I have one additional comment.

 All postgres processes already keep a section handle for
 'Global/PostgreSQL:pgdata file path' aside from dsm. Its name tells
 that it is for that file. On the other hand, the names for the dsm
 sections are 'Global/PostgreSQL.%d'. This seems a bit unkind, as
 they don't tell what they are for. Do you mind changing it to,
 say, 'Global/PostgreSQL.dsm.%d' or something like that?

I am not sure there is any benefit in changing the name, as it
is used for identifying the segment internally, and the segment is
uniquely identified by its segment identifier (segment handle). Also, I
think something similar is used for the POSIX implementation in
dsm_impl_posix(), so we might need to change that as well.

Also, as this name is not introduced by this patch, I think it will
be better to keep this as a note to the committer for this patch,
and if there is agreement on changing it, I will update the patch.
What's your opinion?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Row-security on updatable s.b. views

2014-02-06 Thread Craig Ringer
On 02/06/2014 11:11 PM, Tom Lane wrote:
 Craig Ringer cr...@2ndquadrant.com writes:
 We don't rerun rewrite on plan invalidation.
 
 Don't we?  plancache.c certainly does, in fact it starts from the raw
 grammar output.  Skipping the rewriter would mean failing to respond
 to CREATE OR REPLACE VIEW, for example.

I was thinking about exactly that case as I went to sleep - especially
as it's things like CREATE OR REPLACE VIEW that prevent me from just
rewriting the security qual when it's initially added, before storage in
the catalogs.

I could've sworn discussion around row security in the past concluded
that it couldn't be properly done in the rewriter because of an
inability to correctly invalidate plans. Searching the archives, though,
I don't find anything to support that. I'll just say I'm glad to be
wrong, and proceed from there.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] narwhal and PGDLLIMPORT

2014-02-06 Thread Hiroshi Inoue
(2014/02/05 14:52), Tom Lane wrote:
 Craig Ringer cr...@2ndquadrant.com writes:
 On 02/05/2014 06:29 AM, Tom Lane wrote:
 I had been okay with the manual PGDLLIMPORT-sprinkling approach
 (not happy with it, of course, but prepared to tolerate it) as long
 as I believed the buildfarm would reliably tell us of the need for
 it.  That assumption has now been conclusively disproven, though.
 
 I'm kind of horrified that the dynamic linker doesn't throw its toys
 when it sees this.
 
 Indeed :-(.
 
 The truly strange part of this is that it seems that the one Windows
 buildfarm member that's telling the truth (or most nearly so, anyway)
 is narwhal, which appears to have the oldest and cruftiest toolchain
 of the lot.  I'd really like to come out the other end of this
 investigation with a clear understanding of why the newer toolchains
 are failing to report a link problem, and yet not building working
 executables.

Is it a link error?
Could you please show me the exact error message?

regards,
Hiroshi Inoue



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Failback without rebuild

2014-02-06 Thread James Sewell
I've just noticed that on PostgreSQL 9.3 I can do the following with a
master node A and a slave node B (as long as I have set
recovery_target_timeline = 'latest'):

   1. Stop Node A
   2. Promote Node B
   3. Attach Node A as slave

This is sufficient for my needs (I know it doesn't cover a crash). Can
anyone see any potential problems with this approach?
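
For step 3, a minimal recovery.conf sketch for node A (the host, port
and user below are placeholders, not anything from this thread):

standby_mode = 'on'
primary_conninfo = 'host=nodeB port=5432 user=replicator'
recovery_target_timeline = 'latest'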


Cheers,


James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


 Level 2, 50 Queen St, Melbourne VIC 3000

P (+61) 3 8370 8000   W www.lisasoft.com   F (+61) 3 8370 8099



On Wed, Feb 5, 2014 at 6:03 PM, Michael Paquier
michael.paqu...@gmail.com wrote:

 On Wed, Feb 5, 2014 at 3:14 PM, Amit Kapila amit.kapil...@gmail.com
 wrote:
  On Wed, Feb 5, 2014 at 10:30 AM, James Sewell james.sew...@lisasoft.com
 
  I've seen some proposals and a tool (pg_rewind), but all seem to have
  drawbacks.
 
  As far as I remember, one of the main drawbacks of pg_rewind was
  related to hint bits, which can be avoided with wal_log_hints.
  pg_rewind is not part of core PostgreSQL code; however, if you wish,
  you can try that tool to see if it can solve your purpose.
 For 9.3, pg_rewind is only safe with page checksums enabled. For 9.4,
 yes, either wal_log_hints or checksums is mandatory. The code also
 contains safety checks to ensure that a node not using those
 parameters cannot be rewound.
 Regards,
 --
 Michael




Re: [HACKERS] Inconsistency between pg_stat_activity and log_duration

2014-02-06 Thread Amit Kapila
On Tue, Feb 4, 2014 at 9:36 AM, Tatsuo Ishii is...@postgresql.org wrote:
 As you can see, at 2014-02-04 12:47:27.210981+09 the query SELECT
 count(*) FROM pg_catalog.pg_class... is active and it seems still
 running.

 On the other side, Here is an excerpt from PostgreSQL log:

 21850 2014-02-04 12:47:11.241 JST LOG:  execute pgpool21805/pgpool21805: 
 SELECT count(*) FROM pg_catalog.pg_class AS c WHERE c.oid = 
 pgpool_regclass('pgbench_accounts') AND c.relpersistence = 'u'
 21850 2014-02-04 12:47:11.241 JST LOG:  duration: 0.078 ms

 The duration was shown as 0.078 ms, so it seems the query had
 already finished.

 The reason why pg_stat_activity thinks that the query in question is
 still running is that the sync message has not been sent to the
 backend yet (at least from what I read of postgres.c).

I think that is the probable reason for the above-mentioned behaviour.
As I understand it, the problem is that the 'state' of the backend is
shown as active along with the 'query', which according to the docs
("If state is active this field shows the currently executing query.")
means that the query is executing.

This holds true for a simple query, but for a prepared statement (using
the messages 'P', 'B', 'D', 'E', 'S') it might not be completely right,
as we update the state only after the sync message, which can confuse
some users, as you have stated. However, I don't think it is a good
idea to change the state between messages, at least with the current
set of states.
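
To make the window concrete, here is a small self-contained toy
simulation of the extended-protocol message loop (deliberately reduced
to the point under discussion; it is not PostgreSQL source):

#include <stdio.h>

typedef enum { STATE_IDLE, STATE_RUNNING } BackendState;

static BackendState state = STATE_IDLE;

/* stand-in for the real pgstat_report_activity() */
static void report_activity(BackendState s) { state = s; }

int main(void)
{
    /* extended-protocol sequence: Parse, Bind, Describe, Execute, Sync */
    const char *msgs = "PBDES";
    const char *c;

    for (c = msgs; *c; c++)
    {
        if (*c == 'E')
            report_activity(STATE_RUNNING); /* query runs and completes */
        else if (*c == 'S')
            report_activity(STATE_IDLE);    /* only Sync flips it back */

        printf("after '%c': view would show %s\n", *c,
               state == STATE_RUNNING ? "active" : "idle");
    }
    /* Between 'E' and 'S' the state is still "active" even though the
     * query itself has already finished -- the reported inconsistency. */
    return 0;
}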

 I think this inconsistency is not very intuitive to users...

Do you think we can fix it in any easy way, or might updating the docs
help users understand the current situation better?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inconsistency between pg_stat_activity and log_duration

2014-02-06 Thread Amit Kapila
On Fri, Feb 7, 2014 at 10:43 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Tue, Feb 4, 2014 at 9:36 AM, Tatsuo Ishii is...@postgresql.org wrote:
 As you can see, at 2014-02-04 12:47:27.210981+09 the query SELECT
 count(*) FROM pg_catalog.pg_class... is active and it seems still
 running.

 On the other side, Here is an excerpt from PostgreSQL log:

 21850 2014-02-04 12:47:11.241 JST LOG:  execute pgpool21805/pgpool21805: 
 SELECT count(*) FROM pg_catalog.pg_class AS c WHERE c.oid = 
 pgpool_regclass('pgbench_accounts') AND c.relpersistence = 'u'
 21850 2014-02-04 12:47:11.241 JST LOG:  duration: 0.078 ms

 The duration was shown as 0.078 ms, so it seems the query had
 already finished.

 The reason why pg_stat_activity thinks that the query in question is
 still running is that the sync message has not been sent to the
 backend yet (at least from what I read of postgres.c).

 I think that is the probable reason for the above mentioned behaviour.

One more catch: I think it can happen for simple query statements as
well, because we first log the query-end duration and then update the
state, and the two are not tied together in the code, so there is
always a small window during which this can happen.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] narwhal and PGDLLIMPORT

2014-02-06 Thread Hiroshi Inoue
OK, I can see the error message in the PostgreSQL Build Farm log.

I see a similar problem in a simple test case.

gcc -DCALLPG=\"import1\" -c export.c -o export1.o
dlltool --export-all --output-def export1.def export1.o
dlltool --dllname export1.exe --def export1.def --output-lib export1.a
dlltool --dllname export1.exe --output-exp export1.exp --def export1.def
gcc -o export1.exe -Wl,--base-file,export1.base export1.exp export1.o
dlltool --dllname export1.exe --base-file export1.base --output-exp export1.exp --def export1.def
gcc   -Wl,--stack=65536 -o export1.exe export1.exp export1.o
rm -f export1.exp export1.base
gcc  -DCALLPG=\"import1\" -c import.c -o import1.o
dlltool --export-all --output-def import1.def import1.o
dllwrap -o import1.dll --dllname import1.dll  import1.o  export1.a --def import1.def
Info: resolving _intnum by linking to __imp__intnum (auto-import)
fu01.o:(.idata$2+0xc): undefined reference to `export1_a_iname'
nmth00.o:(.idata$4+0x0): undefined reference to `_nm__intnum'
collect2: ld returned 1 exit status


Adding __declspec(dllimport) fixes the problem.
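
To illustrate, the importing side's declaration then looks roughly like
this (a sketch using the intnum variable from the test case above; in
PostgreSQL terms this is essentially what PGDLLIMPORT supplies):

/* With the explicit dllimport the compiler references __imp__intnum
 * itself, instead of leaving the linker to attempt auto-import. */
#ifdef _WIN32
extern __declspec(dllimport) int intnum;
#else
extern int intnum;
#endif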

Using a gcc-only build also fixes the problem.
 gcc -DCALLPG=\"import2\" -c export.c -o export2.o
 gcc -Wl,--export-all-symbols -Wl,--out-implib=export2.a -o export2 export2.o
 Creating library file: export2.a
 gcc -DCALLPG=\"import2\" -c import.c -o import2.o
 gcc -shared -o import2.dll import2.o export2.a -Wl,--out-implib=import2.dll.a
 Info: resolving _intnum by linking to __imp__intnum (auto-import)
 Creating library file: import2.dll.a
Though I'm not a MinGW expert at all, I know dllwrap is a deprecated
tool and dlltool is nearly deprecated as well. The Cygwin port is
removing its use of dllwrap and dlltool now. Wouldn't it be better for
the MinGW port to follow suit?

Switching from the machine using gcc 3.4.5 to another one using 4.6.1
also fixes the problem.

regards,
Hiroshi Inoue

(2014/02/07 12:25), Hiroshi Inoue wrote:
 (2014/02/05 14:52), Tom Lane wrote:
 Craig Ringer cr...@2ndquadrant.com writes:
 On 02/05/2014 06:29 AM, Tom Lane wrote:
 I had been okay with the manual PGDLLIMPORT-sprinkling approach
 (not happy with it, of course, but prepared to tolerate it) as long
 as I believed the buildfarm would reliably tell us of the need for
 it.  That assumption has now been conclusively disproven, though.

 I'm kind of horrified that the dynamic linker doesn't throw its toys
 when it sees this.

 Indeed :-(.

 The truly strange part of this is that it seems that the one Windows
 buildfarm member that's telling the truth (or most nearly so, anyway)
 is narwhal, which appears to have the oldest and cruftiest toolchain
 of the lot.  I'd really like to come out the other end of this
 investigation with a clear understanding of why the newer toolchains
 are failing to report a link problem, and yet not building working
 executables.
 
 Is it a linkage error?
 Could you please show me the error message concretely?
 
 regards,
 Hiroshi Inoue


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Failback without rebuild

2014-02-06 Thread Michael Paquier
On Fri, Feb 7, 2014 at 1:57 PM, James Sewell james.sew...@lisasoft.com wrote:

 I've just noticed that on PostgreSQL 9.3 I can do the following with a
 master node A and a slave node B (as long as I have set
 recovery_target_timeline = 'latest'):

1. Stop Node A
2. Promote Node B
3. Attach Node A as slave

 This is sufficient for my needs (I know it doesn't cover a crash). Can
 anyone see any potential problems with this approach?

Yes, node A could get ahead of the point where the WAL forked when
promoting B. In this case you cannot reconnect A to B, and need to
either recreate the node from a fresh base backup or rewind it.
pg_rewind targets the latter; postgres core is able to do the former;
and depending on things like your environment and/or the size of your
server, you might prefer one or the other.
Regards,
-- 
Michael


Re: [HACKERS] PostgreSQL Failback without rebuild

2014-02-06 Thread James Sewell
Node A could get ahead even if it has been shut down cleanly BEFORE the
promotion?

I'd always assumed that if I shut down the master, the slave would be
at the same point after shutdown - is this incorrect?

Cheers,


James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


 Level 2, 50 Queen St, Melbourne VIC 3000

P (+61) 3 8370 8000   W www.lisasoft.com   F (+61) 3 8370 8099



On Fri, Feb 7, 2014 at 4:58 PM, Michael Paquier
michael.paqu...@gmail.com wrote:




 On Fri, Feb 7, 2014 at 1:57 PM, James Sewell james.sew...@lisasoft.com wrote:

 I've just noticed that on PostgreSQL 9.3 I can do the following with a
 master node A and a slave node B (as long as I have set
 recovery_target_timeline = 'latest'):

1. Stop Node A
2. Promote Node B
3. Attach Node A as slave

 This is sufficient for my needs (I know it doesn't cover a crash). Can
 anyone see any potential problems with this approach?

 Yes, node A could get ahead of the point where the WAL forked when
 promoting B. In this case you cannot reconnect A to B, and need to
 either recreate the node from a fresh base backup or rewind it.
 pg_rewind targets the latter; postgres core is able to do the former;
 and depending on things like your environment and/or the size of your
 server, you might prefer one or the other.
 Regards,
 --
 Michael




[HACKERS] Typo in a comment?

2014-02-06 Thread Amit Langote
In src/backend/storage/freespace/freespace.c,

 *
 * MaxFSMRequestSize depends on the architecture and BLCKSZ, but assuming
 * default 8k BLCKSZ, and that MaxFSMRequestSize is 24 bytes, the categories
 * look like this
 *

Is 24 bytes a typo considering that

#define MaxFSMRequestSize   MaxHeapTupleSize

?
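
A quick standalone check of the arithmetic (the constants below are
assumptions mirroring the usual 8kB-page definitions, so treat the
exact figure loosely) suggests it is indeed a typo -- the real value is
on the order of 8160 bytes, nowhere near 24:

#include <stdio.h>

/* Assumed stand-ins for the usual 8kB-page definitions. */
#define BLCKSZ 8192
#define SizeOfPageHeaderData 24
#define SizeOfItemId 4                      /* sizeof(ItemIdData) */
#define MAXALIGN(x) (((x) + 7) & ~7)        /* 8-byte alignment assumed */
#define MaxHeapTupleSize (BLCKSZ - MAXALIGN(SizeOfPageHeaderData + SizeOfItemId))
#define MaxFSMRequestSize MaxHeapTupleSize

int main(void)
{
    /* Prints 8160 under these assumptions. */
    printf("MaxFSMRequestSize = %d bytes\n", MaxFSMRequestSize);
    return 0;
}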

--
Amit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inconsistency between pg_stat_activity and log_duration

2014-02-06 Thread Tatsuo Ishii
 I think that is the probable reason for the above-mentioned behaviour.
 As I understand it, the problem is that the 'state' of the backend is
 shown as active along with the 'query', which according to the docs
 ("If state is active this field shows the currently executing query.")
 means that the query is executing.
 
 This holds true for a simple query, but for a prepared statement (using
 the messages 'P', 'B', 'D', 'E', 'S') it might not be completely right,
 as we update the state only after the sync message, which can confuse
 some users, as you have stated. However, I don't think it is a good
 idea to change the state between messages, at least with the current
 set of states.
 
 I think this inconsistency is not very intuitive to users...
 
 Do you think we can fix it in any easy way, or might updating the docs
 help users understand the current situation better?

One idea is calling pgstat_report_activity(STATE_IDLE) in
exec_execute_message() in postgres.c. The function already calls
pgstat_report_activity(STATE_RUNNING), which shows the active state in
the pg_stat_activity view. So why can't we call
pgstat_report_activity(STATE_IDLE) here as well?

Somebody might claim that idle is a transaction-state term. In that
case, I propose adding a new state name, say finished. The above
proposal would then call pgstat_report_activity(STATE_FINISHED) instead.
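
A hypothetical sketch of what the proposal amounts to (not a patch:
STATE_FINISHED is an assumed new BackendState member, and the function
tail is paraphrased rather than actual postgres.c source):

/* At the very end of exec_execute_message() in
 * src/backend/tcop/postgres.c -- sketch only. */
static void
exec_execute_message_tail(void)
{
    /* ...the portal has been run and its results sent to the client... */

    /*
     * Proposed addition: report that the query has finished, so that
     * pg_stat_activity stops claiming it is executing while the backend
     * merely waits for the client's Sync message.
     */
    pgstat_report_activity(STATE_FINISHED, NULL);
}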

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inconsistency between pg_stat_activity and log_duration

2014-02-06 Thread Amit Kapila
On Fri, Feb 7, 2014 at 11:46 AM, Tatsuo Ishii is...@postgresql.org wrote:
 I think that is the probable reason for the above-mentioned behaviour.
 As I understand it, the problem is that the 'state' of the backend is
 shown as active along with the 'query', which according to the docs
 ("If state is active this field shows the currently executing query.")
 means that the query is executing.

 This holds true for a simple query, but for a prepared statement (using
 the messages 'P', 'B', 'D', 'E', 'S') it might not be completely right,
 as we update the state only after the sync message, which can confuse
 some users, as you have stated. However, I don't think it is a good
 idea to change the state between messages, at least with the current
 set of states.

 I think this inconsistency is not very intuitive to users...

 Do you think we can fix it in any easy way, or might updating the docs
 help users understand the current situation better?

 One idea is calling pgstat_report_activity(STATE_IDLE) in
 exec_execute_message() in postgres.c. The function already calls
 pgstat_report_activity(STATE_RUNNING), which shows the active state in
 the pg_stat_activity view. So why can't we call
 pgstat_report_activity(STATE_IDLE) here as well?

 Somebody might claim that idle is a transaction-state term.

Idle means "The backend is waiting for a new client command.", which
is certainly not true, especially in the case of the 'E' message, since
sync message processing still remains.

 In that case, I propose adding a new state name, say finished. The
 above proposal would then call pgstat_report_activity(STATE_FINISHED)
 instead.

Okay, so the finished state would mean "The backend has finished
execution of a query.". In that case I think this might need to be set
at the end of exec_simple_query() as well, but then there would be very
little difference between idle and finished for a simple query.

The argument here could be: do we really need a new state for such a
short window between completion of the 'E' message and processing of
the 'S' sync message, considering that updating the state is not a very
cheap call to make between the processing of two messages? It might
make sense for cases where sync message processing can take a long time.

Would it not be sufficient if we just explained this in the docs? Do
users really face any inconvenience, or is it a matter of making the
current behaviour clearer to users?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Failback without rebuild

2014-02-06 Thread Michael Paquier
On Fri, Feb 7, 2014 at 3:02 PM, James Sewell james.sew...@lisasoft.com wrote:

 Node A could get ahead even if it has been shut down cleanly BEFORE the
 promotion?
 I'd always assumed that if I shut down the master, the slave would be
 at the same point after shutdown - is this incorrect?

Yes and no. A node will wait at shutdown until all the *connected*
slaves have flushed the necessary WAL to disk. But if the slave is not
connected at the moment of shutdown, that might not be the case.
-- 
Michael


Re: [HACKERS] PostgreSQL Failback without rebuild

2014-02-06 Thread Albe Laurenz
Michael Paquier wrote:
 On Fri, Feb 7, 2014 at 3:02 PM, James Sewell james.sew...@lisasoft.com 
 wrote:
 
 Node A could get ahead even if it has been shut down cleanly BEFORE the
 promotion?
 I'd always assumed that if I shut down the master, the slave would be
 at the same point after shutdown - is this incorrect?
 
 Yes and no. A node will wait at shutdown until all the *connected*
 slaves have flushed the necessary WAL to disk. But if the slave is not
 connected at the moment of shutdown, that might not be the case.

Even if the slave is connected, there was a bug:
http://www.postgresql.org/message-id/e1urwwa-0004lr...@gemulon.postgresql.org

So you need at least 9.3.0, 9.2.5 or 9.1.10 for this to work properly.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers