Re: [HACKERS] Triconsistent catalog declaration
On 09/15/2014 08:56 PM, Robert Haas wrote:
> On Mon, Sep 15, 2014 at 10:13 AM, Heikki Linnakangas wrote:
>> That makes for a bit awkward input and output from psql, when the
>> values used are 0, 1, 2, rather than ASCII characters. But that's OK,
>> because as you said these functions are not callable from psql anyway,
>> as they have "internal" arguments. Maybe we should change them to
>> something a bit more understandable.
>>
>> We can't change the return datatype to anything wider, or the values
>> from 0, 1, 2, because those values have been chosen so that they are
>> "compatible" with booleans. A boolean can be safely cast to a
>> GinTernaryValue. I'm not sure if we make use of that anywhere ATM, but
>> it's a useful property.
>>
>> This requires a catalog change to fix. Are we still planning to do a
>> catversion bump for 9.4 because of the jsonb changes?
>
> That was my understanding, although we seem to be proceeding at an
> inexplicably glacial pace.

Ok, committed.

- Heikki

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Anonymous code block with parameters
I'd like to propose support for IN and OUT parameters in 'DO' blocks.
Currently, anonymous code blocks (DO statements) cannot receive or
return parameters. I suggest:

1) Add a new clause to the DO statement for specifying the names, types,
directions and values of parameters:

DO [ LANGUAGE lang_name ] [ USING (argument_list) ]

where argument_list has the same syntax as in
'CREATE FUNCTION name (argument_list)'. Example:

do $$
begin
  z := x || y;
end;
$$ language plpgsql
using ( x text = '1', in out y int4 = 123, out z text );

2) Values for IN and IN OUT parameters are specified using the syntax
for default values of function arguments.

3) If a DO statement has at least one OUT or IN OUT parameter, then it
returns one tuple containing the values of the OUT and IN OUT
parameters.

Do you think that this feature would be useful? I have a
proof-of-concept patch in progress that I intend to publish soon.
Re: [HACKERS] WAL format and API changes (9.5)
On Mon, Sep 15, 2014 at 8:00 PM, Michael Paquier wrote:
> Alvaro got faster than me... I was just looking at the first patch and
> +1 on those comments. It is worth noting that the first patch, as it
> does only a large refactoring, does not impact performance or size of
> WAL records.

Some comments after a second and closer read of the first patch:

1) Wouldn't it be better to call GetFullPageWriteInfo directly in
XLogRecordAssemble, removing RedoRecPtr and doPageWrites from its
arguments?

2) XLogCheckBufferNeedsBackup is not used. It can be removed.

3) If I am following correctly, there are two code paths where
XLogInsertRecData can return InvalidXLogRecPtr, and then there is this
process in XLogInsert:

+	{
+		/*
+		 * Undo the changes we made to the rdata chain.
+		 *
+		 * XXX: This doesn't undo *all* the changes; the XLogRecData
+		 * entries for buffers that we had already decided to back up have
+		 * had their data-pointers cleared. That's OK, as long as we
+		 * decide to back them up on the next iteration as well. Hence,
+		 * don't allow "doPageWrites" value to go from true to false after
+		 * we've modified the rdata chain.
+		 */
+		bool		newDoPageWrites;
+
+		GetFullPageWriteInfo(&RedoRecPtr, &newDoPageWrites);
+		if (!doPageWrites && newDoPageWrites)
+			doPageWrites = true;
+		rdt_lastnormal->next = NULL;
+	}

Wouldn't it be better to keep that in XLogInsertRecData? This way
GetFullPageWriteInfo could be removed (I actually see little point in
keeping it as part of this patch, but does this change make sense in
patch 2?).

4) This patch is in DOS encoding (?!)

That's all I have for now...
Regards,
-- 
Michael
Re: [HACKERS] proposal: plpgsql - Assert statement
Jan Wieck wrote:
> On 09/14/2014 02:25 PM, Pavel Stehule wrote:
>> a) Isn't possible handle a assertion exception anywhere .. it enforce
>> ROLLBACK in 100%
>>
>> b) Assertions should be disabled globally .. I am not sure, it it is a
>> good idea, but I can understand so some tests based on queries to data
>> can be performance issue.
>>
>> Important question is a relation assertations and exceptions. Is it
>> only shortcut for exception or some different?
>
> I think that most data integrity issues can be handled by a well
> designed database schema that uses UNIQUE, NOT NULL, REFERENCES and
> CHECK constraints. Assertions are usually found inside of complex
> code constructs to check values of local variables. I don't think it
> is even a good idea to implement assertions that can query arbitrary
> data.

Actually Peter Eisentraut posted a patch for SQL assertions:
http://www.postgresql.org/message-id/1384486216.5008.17.ca...@vanquo.pezone.net

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] WAL format and API changes (9.5)
On Mon, Sep 15, 2014 at 7:03 PM, Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
>> Here we go. I've split this again into two patches. The first patch
>> is just refactoring the current code. It moves XLogInsert into a new
>> file, xloginsert.c, and the definition of XLogRecord to a new
>> xlogrecord.h header file. As a result, there is a lot of churn in
>> the #includes in C files that generate WAL records, or contain redo
>> routines. The number of files that pull in xlog.h - directly or
>> indirectly through other headers - is greatly reduced.
>
> I think you should push the first patch for now and continue to
> investigate the issues in the second patch. Some minor suggestions I
> have for the first patch are that since xlog.h is no longer used by
> other headers (but only by .c files), you should just #include
> xloginsert.h instead of doing the forward declaration of struct
> XLogRecData; and in xlog_internal.h, instead of
>
> - * XLogRecord is defined in xlog.h, but we avoid #including that to keep
> + * XLogRecord is defined in xlogrecord.h, but we avoid #including that to keep
>
> I would just suggest to #include xlogrecord.h, since it should just
> work to include that file from frontend programs. It didn't work for
> xlog.h because that one #includes fmgr.h IIRC and that one causes Datum
> to appear, which makes compilation blow up. Shouldn't be the case here.

Alvaro got there faster than me... I was just looking at the first
patch, and +1 on those comments. It is worth noting that the first
patch, as it does only a large refactoring, does not impact performance
or size of WAL records.

Btw, an #include of access/xlog.h in fd.c seems to have been forgotten.
Also, if somebody is interested in running the test suite, there are
comments on how to do it at the top of compare.sql.
-- 
Michael
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
(2014/09/13 0:13), Tom Lane wrote:
> Albe Laurenz writes:
>> Tom Lane wrote:
>>> I'm not sure offhand what the new plan tree ought to look like. We could
>>> just generate a ForeignScan node, but that seems like rather a misnomer.
>>> Is it worth inventing a new ForeignUpdate node type? Or maybe it'd still
>>> be ForeignScan but with a flag field saying "hey this is really an update
>>> (or a delete)". The main benefit I can think of right now is that the
>>> EXPLAIN output would be less strange-looking --- but EXPLAIN is hardly
>>> the only thing that ever looks at plan trees, so having an outright
>>> misleading plan structure is likely to burn us down the line.
>
> I was envisioning that the EXPLAIN output would look like
>
>   Foreign Scan on tab1
>     Remote SQL: SELECT ...
>
> for the normal case, versus
>
>   Foreign Update on tab1
>     Remote SQL: UPDATE ...
>
> for the pushed-down-update case (and similarly for DELETE). For a
> non-optimized update it'd still be a ForeignScan underneath a ModifyTable.
> As for the internal representation, I was thinking of adding a CmdType
> field to struct ForeignScan, with currently only CMD_SELECT, CMD_UPDATE,
> CMD_DELETE as allowed values, though possibly in future we'd think of a
> reason to allow CMD_INSERT there.

+1

> The only thing that's bothering me about this concept is that I'm not
> seeing how to scale it up to handling a pushed-down update on a join,
> ie, "UPDATE foo ... FROM bar ..." where both foo and bar are remote.
> Maybe it's silly to worry about that until join push-down is done;
> but in that case I'd vote for postponing this whole patch until we
> have join push-down.

OK.

Thanks,

PS: I'll help Hanada-san do the work if there is anything I can do.

Best regards,
Etsuro Fujita
Re: [HACKERS] WAL format and API changes (9.5)
Heikki Linnakangas wrote:
> Here we go. I've split this again into two patches. The first patch
> is just refactoring the current code. It moves XLogInsert into a new
> file, xloginsert.c, and the definition of XLogRecord to a new
> xlogrecord.h header file. As a result, there is a lot of churn in
> the #includes in C files that generate WAL records, or contain redo
> routines. The number of files that pull in xlog.h - directly or
> indirectly through other headers - is greatly reduced.

I think you should push the first patch for now and continue to
investigate the issues in the second patch. Some minor suggestions I
have for the first patch are that since xlog.h is no longer used by
other headers (but only by .c files), you should just #include
xloginsert.h instead of doing the forward declaration of struct
XLogRecData; and in xlog_internal.h, instead of

- * XLogRecord is defined in xlog.h, but we avoid #including that to keep
+ * XLogRecord is defined in xlogrecord.h, but we avoid #including that to keep

I would just suggest to #include xlogrecord.h, since it should just
work to include that file from frontend programs. It didn't work for
xlog.h because that one #includes fmgr.h IIRC and that one causes Datum
to appear, which makes compilation blow up. Shouldn't be the case here.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] jsonb format is pessimal for toast compression
On 09/16/2014 07:44 AM, Peter Geoghegan wrote:
> FWIW, I am slightly concerned about weighing use cases around very
> large JSON documents too heavily. Having enormous jsonb documents just
> isn't going to work out that well, but neither will equivalent designs
> in popular document database systems for similar reasons. For example,
> the maximum BSON document size supported by MongoDB is 16 megabytes,
> and that seems to be something that their users don't care too much
> about. Having 270 pairs in an object isn't unreasonable, but it isn't
> going to be all that common either.

Also, at a certain size the fact that Pg must rewrite the whole
document for any change to it starts to introduce other practical
problems.

Anyway - this is looking like the change will go in, and with it a
catversion bump. Introduction of a jsonb version/flags byte might be
worthwhile at the same time. It seems likely that there'll be more room
for improvement in jsonb, possibly even down to using different formats
for different data. Is it worth paying a byte per value to save on
possible upgrade pain?

-- 
Craig Ringer                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] jsonb format is pessimal for toast compression
I couldn't get my hands on the Twitter data, but I'm generating my own.
The json template is http://paste2.org/wJ1dfcjw and the data was
generated with http://www.json-generator.com/. It has 35 top level
keys, just in case someone is wondering. I generated 1 random objects
and I'm inserting them repeatedly until I got 320k rows.

Test query: SELECT data->>'name', data->>'email' FROM t_json
Test storage: EXTERNAL
Test jsonb lengths quartiles: {1278,1587,1731,1871,2231}

Tom's lengths+cache aware: 455ms
HEAD: 440ms

This is a realistic-ish workload in my opinion, and Tom's patch
performs within 4% of HEAD.

Due to the overall lengths I couldn't really test compressibility, so
I re-ran the test. This time I inserted an array of 2 objects in each
row, as in: [obj, obj]. The objects were taken in sequence from the 1
pool so contents match in both tests.

Test query: SELECT data #> '{0, name}', data #> '{0, email}',
data #> '{1, name}', data #> '{1, email}' FROM t_json
Test storage: EXTENDED

HEAD: 17mb table + 878mb toast
HEAD size quartiles: {2015,2500,2591,2711,3483}
HEAD query runtime: 15s

Tom's: 220mb table + 580mb toast
Tom's size quartiles: {1665,1984,2061,2142.25,2384}
Tom's query runtime: 13s

This is an intriguing edge case where Tom's patch actually outperforms
the base implementation for 3~4kb jsons.
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Mon, Sep 15, 2014 at 4:21 PM, Peter Geoghegan wrote:
> I attach a much simpler patch, that only adds an opportunistic
> "memcmp() == 0" before a possible strcoll(). Both
> bttextfastcmp_locale() and varstr_cmp() have the optimization added,
> since there is no point in leaving anyone out for this part.

FWIW, it occurs to me that this could be a big win for cases like
ExecMergeJoin(). Obviously, abbreviated keys will usually make your
merge join on text attributes a lot faster in the common case where a
sort is involved (if we consider that the sort is integral to the cost
of the join). However, when making sure that inner and outer tuples
match, the MJCompare() call will *very* frequently get away with a
cheap memcmp(). That could make a very significant additional
difference, I think.

-- 
Peter Geoghegan
Re: [HACKERS] jsonb format is pessimal for toast compression
On Mon, Sep 15, 2014 at 4:05 PM, Josh Berkus wrote:
> Actually, having the keys all at the same level *is* relevant for the
> issue we're discussing. If those 270 keys are organized in a tree,
> it's not the same as having them all on one level (and not as
> problematic).

I believe Robert meant that the 270 keys are not at the top level, but
are at some level (in other words, some object has 270 pairs). That is
equivalent to having them at the top level for the purposes of this
discussion.

FWIW, I am slightly concerned about weighing use cases around very
large JSON documents too heavily. Having enormous jsonb documents just
isn't going to work out that well, but neither will equivalent designs
in popular document database systems for similar reasons. For example,
the maximum BSON document size supported by MongoDB is 16 megabytes,
and that seems to be something that their users don't care too much
about. Having 270 pairs in an object isn't unreasonable, but it isn't
going to be all that common either.

-- 
Peter Geoghegan
Re: [HACKERS] Turning off HOT/Cleanup sometimes
On 14/09/14 17:37, Simon Riggs wrote:
> On 12 September 2014 18:19, Simon Riggs wrote:
>> On 12 September 2014 15:30, Tom Lane wrote:
>>> After a little bit I remembered there was already a function for this.
>>> So specifically, I'd suggest using ExecRelationIsTargetRelation()
>>> to decide whether to mark the scan as requiring pruning.
>>
>> Sounds cool. Thanks both, this is sounding like a viable route now.
>
> Yes, this is viable.
>
> Patch attached, using Alvaro's idea of use-case specific pruning and
> Tom's idea of aiming at target relations. Patch uses or extends
> existing infrastructure, so its shorter than it might have been, yet
> with all that bufmgr yuck removed.
>
> This is very, very good because while going through this I notice the
> dozen or more places where we were pruning blocks in annoying places I
> didn't even know about such as about 4-5 constraint checks. In more
> than a few DDL commands like ALTER TABLE and CLUSTER we were even
> pruning the old relation prior to rewrite.

A simple performance test with the following variables:

LOOP=50
CONN=60
TXSS=500
SCALE=30

Select only:
WITH PATCH Average: 20716.1 tps
NO PATCH Average: 19141.7 tps

With writes:
WITH PATCH Average: 2602.65
NO PATCH Average: 2565.32

TODO:
- Consistency check.
- ALTER and CLUSTER test.
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Mon, Sep 15, 2014 at 11:25 AM, Peter Geoghegan wrote:
> OK, I'll draft a patch for that today, including similar alterations
> to varstr_cmp() for the benefit of Windows and so on.

I attach a much simpler patch, that only adds an opportunistic
"memcmp() == 0" before a possible strcoll(). Both
bttextfastcmp_locale() and varstr_cmp() have the optimization added,
since there is no point in leaving anyone out for this part.

When this is committed, and I hear back from you on the question of
what to do about having an independent sortsupport state for
abbreviated tie-breakers (and possibly other issues of concern), I'll
produce a rebased patch with a single commit.

-- 
Peter Geoghegan

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
new file mode 100644
index 7549542..53be8ec
*** a/src/backend/utils/adt/varlena.c
--- b/src/backend/utils/adt/varlena.c
*** varstr_cmp(char *arg1, int len1, char *a
*** 1405,1410 ****
--- 1405,1438 ----
  #endif
  	}
  
+ 	/*
+ 	 * Fast path: Attempt an entirely opportunistic "memcmp() == 0".
+ 	 *
+ 	 * In general there is no reason to be optimistic about being able to
+ 	 * resolve the string comparison in this way. There are many usage
+ 	 * patterns in which this fast path will never be taken, and some in
+ 	 * which it will frequently be taken. When it is frequently taken,
+ 	 * clearly the optimization is worthwhile, since strcoll() is known to
+ 	 * be very expensive. When it is not taken, the overhead is completely
+ 	 * negligible, and perhaps even immeasurably small, even in the worst
+ 	 * case. We have nothing to lose and everything to gain.
+ 	 *
+ 	 * These counter-intuitive performance characteristics are likely
+ 	 * explained by the dominant memory latency cost when performing a
+ 	 * strcoll(); in practice, this may give the compiler and CPU
+ 	 * sufficient leeway to order things in a way that hides the overhead
+ 	 * of useless memcmp() calls. Sorting is very probably bottlenecked on
+ 	 * memory bandwidth, and so it's worth it to use a few arithmetic
+ 	 * instructions on the off chance that it results in a saving in memory
+ 	 * bandwidth. In effect, this makes use of a capacity that would
+ 	 * otherwise go unused. The same memory needed to be read into CPU
+ 	 * cache at this juncture anyway. Modern CPUs can execute hundreds of
+ 	 * instructions in the time taken to fetch a single cache line from
+ 	 * main memory.
+ 	 */
+ 	if (len1 == len2 && memcmp(arg1, arg2, len1) == 0)
+ 		return 0;
+ 
  #ifdef WIN32
  	/* Win32 does not have UTF-8, so we need to map to UTF-16 */
  	if (GetDatabaseEncoding() == PG_UTF8)
*** bttextfastcmp_locale(Datum x, Datum y, S
*** 1842,1847 ****
--- 1870,1885 ----
  	len1 = VARSIZE_ANY_EXHDR(arg1);
  	len2 = VARSIZE_ANY_EXHDR(arg2);
  
+ 	/*
+ 	 * Fast path: Attempt an entirely opportunistic memcmp() == 0, as
+ 	 * explained within varstr_cmp()
+ 	 */
+ 	if (len1 == len2 && memcmp(a1p, a2p, len1) == 0)
+ 	{
+ 		result = 0;
+ 		goto done;
+ 	}
+ 
  	if (len1 >= tss->buflen1)
  	{
  		pfree(tss->buf1);
*** bttextfastcmp_locale(Datum x, Datum y, S
*** 1875,1880 ****
--- 1913,1919 ----
  	if (result == 0)
  		result = strcmp(tss->buf1, tss->buf2);
  
+ done:
  	/* We can't afford to leak memory here. */
  	if (PointerGetDatum(arg1) != x)
  		pfree(arg1);
Re: [HACKERS] Suspicious check (src/backend/access/gin/gindatapage.c)
On Fri, Sep 12, 2014 at 5:42 PM, Heikki Linnakangas wrote:
> On 09/12/2014 11:38 AM, Heikki Linnakangas wrote:
>> Now that the logic is fixed, I hope we won't get complaints that the
>> indexes are bigger, if you fill a table by appending to the end. I
>> wonder if we should aim at an even more uneven split; the default
>> fillfactor for B-trees is 90%, for example. I didn't go that high
>> when I wrote that, because the code in previous versions always did a
>> 50/50 split. But it could be argued that a higher fillfactor makes
>> sense for a GIN index - they typically don't get as much random
>> updates as a B-tree.
>
> Actually, we should add a fillfactor reloption to GIN. But that's 9.5
> material.

Actually gin is the only method that does not have this parameter, no?
Then the following extra processing would be enough, I imagine:
1) Tune freespace using fillfactor when placing keys on a leaf data
page (dataPlaceToPageLeaf)
2) On split, instead of (BLCKSZ * 3) / 4, have the left leaf full at
(BLCKSZ * fillfactor) / 100
Regards,
-- 
Michael
Re: [HACKERS] jsonb format is pessimal for toast compression
On 09/15/2014 02:16 PM, Robert Haas wrote:
> On Mon, Sep 15, 2014 at 3:09 PM, Josh Berkus wrote:
>> On 09/15/2014 10:23 AM, Claudio Freire wrote:
>>> Now, large small keys could be 200 or 2000, or even 20k. I'd guess
>>> several should be tested to find the shape of the curve.
>>
>> Well, we know that it's not noticeable with 200, and that it is
>> noticeable with 100K. It's only worth testing further if we think
>> that having more than 200 top-level keys in one JSONB value is going
>> to be a use case for more than 0.1% of our users. I personally do not.
>
> FWIW, I have written one (1) application that uses JSONB and it has
> one sub-object (not the top-level object) that in the most typical
> configuration contains precisely 270 keys. Now, granted, that is not
> the top-level object, if that distinction is actually relevant here,
> but color me just a bit skeptical of this claim anyway. This was just
> a casual thing I did for my own use, not anything industrial strength,
> so it's hard to believe I'm stressing the system more than 99.9% of
> users will.

Actually, having the keys all at the same level *is* relevant for the
issue we're discussing. If those 270 keys are organized in a tree, it's
not the same as having them all on one level (and not as problematic).

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] Collation-aware comparisons in GIN opclasses
On Mon, Sep 15, 2014 at 12:45 PM, Tom Lane wrote:
> No. And we don't know how to change the default opclass without
> breaking things, either.

Is there a page on the wiki along the lines of "things that we would
like to change if ever there is a substantial change in on-disk format
that will break pg_upgrade"? ISTM that we should be intelligently
saving those some place, just as Redhat presumably saves up ABI
breakage over many years for the next major release of RHEL.
Alexander's complaint is a good example of such a change, IMV. Isn't it
more or less expected that the day will come when we'll make a clean
break?

-- 
Peter Geoghegan
Re: [HACKERS] jsonb format is pessimal for toast compression
On Mon, Sep 15, 2014 at 3:09 PM, Josh Berkus wrote:
> On 09/15/2014 10:23 AM, Claudio Freire wrote:
>> Now, large small keys could be 200 or 2000, or even 20k. I'd guess
>> several should be tested to find the shape of the curve.
>
> Well, we know that it's not noticeable with 200, and that it is
> noticeable with 100K. It's only worth testing further if we think that
> having more than 200 top-level keys in one JSONB value is going to be
> a use case for more than 0.1% of our users. I personally do not.

FWIW, I have written one (1) application that uses JSONB and it has
one sub-object (not the top-level object) that in the most typical
configuration contains precisely 270 keys. Now, granted, that is not
the top-level object, if that distinction is actually relevant here,
but color me just a bit skeptical of this claim anyway. This was just
a casual thing I did for my own use, not anything industrial strength,
so it's hard to believe I'm stressing the system more than 99.9% of
users will.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Turning off HOT/Cleanup sometimes
On 15 September 2014 17:09, Robert Haas wrote:
> Do we really want to disable HOT for all catalog scans?

The intention of the patch is that catalog scans are treated
identically to non-catalog scans. The idea here is that HOT cleanup
only occurs on scans on target relations, so only INSERT, UPDATE and
DELETE do HOT cleanup.

It's possible that many catalog scans don't follow the normal target
relation logic, so we might argue we should use HOT every time. OTOH,
since we now have separate catalog xmins we may find that using HOT on
catalogs is no longer effective.

So I could go either way on how to proceed; it's an easy change either
way.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] jsonb format is pessimal for toast compression
On 09/15/2014 12:25 PM, Claudio Freire wrote:
> On Mon, Sep 15, 2014 at 4:17 PM, Josh Berkus wrote:
>> On 09/15/2014 12:15 PM, Claudio Freire wrote:
>>> So while you're right that it's perhaps above what would be a common
>>> use case, the range "somewhere between 200 and 100K" for the tipping
>>> point seems overly imprecise to me.
>>
>> Well, then, you know how to solve that.
>
> I was hoping testing with other numbers was a simple hitting a key for
> someone else.

Nope. My test case has a fixed size.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] Collation-aware comparisons in GIN opclasses
Peter Geoghegan writes:
> On Mon, Sep 15, 2014 at 8:28 AM, Alexander Korotkov wrote:
>> Rename such opclasses and make them not default.
>> Create new default opclasses with bitwise comparison functions.
>> Write recommendation to re-create indexes with default opclasses into
>> documentation.
>
> I certainly think this should be fixed if at all possible, but I'm not
> sure about this plan. Can we really rename an opclass without
> consequence, including having that respected across pg_upgrade?

No. And we don't know how to change the default opclass without
breaking things, either. See previous discussions about how we might
fix the totally-broken default gist opclass that btree_gist creates for
the inet type [1]. The motivation for getting rid of that is *way*
stronger than "it might be slow", but there's no apparent way to make
something else be the default without creating havoc.

			regards, tom lane

[1] http://www.postgresql.org/message-id/flat/cae2gyzyuesd188j0b290gf16502h9b-lwnrs3rfi1swdb9q...@mail.gmail.com
Re: [HACKERS] jsonb format is pessimal for toast compression
On Mon, Sep 15, 2014 at 4:17 PM, Josh Berkus wrote:
> On 09/15/2014 12:15 PM, Claudio Freire wrote:
>> So while you're right that it's perhaps above what would be a common
>> use case, the range "somewhere between 200 and 100K" for the tipping
>> point seems overly imprecise to me.
>
> Well, then, you know how to solve that.

I was hoping testing with other numbers was a simple matter of hitting
a key for someone else.

But sure. I'll set something up.
Re: [HACKERS] jsonb format is pessimal for toast compression
On 09/15/2014 12:15 PM, Claudio Freire wrote:
> So while you're right that it's perhaps above what would be a common
> use case, the range "somewhere between 200 and 100K" for the tipping
> point seems overly imprecise to me.

Well, then, you know how to solve that.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] jsonb format is pessimal for toast compression
On Mon, Sep 15, 2014 at 4:09 PM, Josh Berkus wrote:
> On 09/15/2014 10:23 AM, Claudio Freire wrote:
>> Now, large small keys could be 200 or 2000, or even 20k. I'd guess
>> several should be tested to find the shape of the curve.
>
> Well, we know that it's not noticeable with 200, and that it is
> noticeable with 100K. It's only worth testing further if we think that
> having more than 200 top-level keys in one JSONB value is going to be
> a use case for more than 0.1% of our users. I personally do not.

Yes, but bear in mind that the worst case is exactly at the use case
jsonb was designed to speed up: element access within relatively big
json documents. Having them uncompressed is to be expected, because
people using jsonb will often favor speed over compactness if it's a
tradeoff (otherwise they'd use plain json).

So while you're right that it's perhaps above what would be a common
use case, the range "somewhere between 200 and 100K" for the tipping
point seems overly imprecise to me.
Re: [HACKERS] jsonb format is pessimal for toast compression
On 09/15/2014 10:23 AM, Claudio Freire wrote:
> Now, large small keys could be 200 or 2000, or even 20k. I'd guess
> several should be tested to find the shape of the curve.

Well, we know that it's not noticeable with 200, and that it is
noticeable with 100K. It's only worth testing further if we think that
having more than 200 top-level keys in one JSONB value is going to be a
use case for more than 0.1% of our users. I personally do not.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] Collation-aware comparisons in GIN opclasses
On Mon, Sep 15, 2014 at 10:51 PM, Peter Geoghegan wrote:
> On Mon, Sep 15, 2014 at 8:28 AM, Alexander Korotkov wrote:
> > some GIN opclasses uses collation-aware comparisons while they don't need to
> > do especially collation-aware comparison. Examples are text[] and hstore
> > opclasses. Depending on collation this may make them a much slower.
>
> I'm glad that I saw how pointless this was with the jsonb GIN default
> opclass during development.
>
> > Rename such opclasses and make them not default.
> > Create new default opclasses with bitwise comparison functions.
> > Write recommendation to re-create indexes with default opclasses into
> > documentation.
>
> I certainly think this should be fixed if at all possible, but I'm not
> sure about this plan. Can we really rename an opclass without
> consequence, including having that respected across pg_upgrade?

A simple rename doesn't seem to be safe. Since pg_upgrade uses pg_dump, all indexes are linked to opclasses by their names, so existing indexes would end up linked to the new opclasses. It's likely we'd need to execute an SQL script renaming the opclasses before pg_upgrade. Another option is not to rename the old opclasses at all, and just create new default opclasses with new names. Bruce, what is your opinion about pg_upgrade? Contrib opclasses would be safe to rename using a migration script. -- With best regards, Alexander Korotkov.
Re: [HACKERS] Support for N synchronous standby servers
On Sat, Sep 13, 2014 at 3:56 AM, Robert Haas wrote:
> I think so. If we make it a function, then it's either the kind of
> function that you access via pg_proc, or it's the kind that's written
> in C and installed by storing a function pointer in a hook variable
> from _PG_init(). The first approach is a non-starter because it would
> require walsender to be connected to the database where that function
> lives, which is a non-starter at least for logical replication where
> we need walsender to be connected to the database being replicated.
> Even if we found some way around that problem, and I'm not sure there
> is one, I suspect the overhead would be pretty high. The second
> approach - a hook that can be accessed directly by loadable modules -
> seems like it would work fine; the only problem is that you've
> got to write your policy function in C. But I have no issue with
> exposing it that way if someone wants to write a patch. There is no
> joy in getting between the advanced user and his nutty-complicated
> sync rep configuration. However, I suspect that most users will
> prefer a more user-friendly interface.

Reading both your answers, I'd tend to think that having a set of hooks to satisfy all the potential user requests would be enough. We could let the server code decide what the priority of the standbys is, using the information in synchronous_standby_names, then have the hooks interact with SyncRepReleaseWaiters and pg_stat_get_wal_senders. We would need two hooks:
- one able to get an array of WAL sender positions defining all the nodes considered as sync nodes. This is enough for pg_stat_get_wal_senders. SyncRepReleaseWaiters would need it as well...
- a second able to define the update policy of the write and flush positions in walsndctl (SYNC_REP_WAIT_FLUSH and SYNC_REP_WAIT_WRITE), as well as the next failover policy. This would be needed when a WAL sender calls SyncRepReleaseWaiters.
Perhaps that's overthinking it, but I am getting the impression that whatever decision is taken, it will involve modifications of the sync-standby parametrization at the GUC level, and whatever path is chosen (dedicated language, set of integer params), there will be complaints about what things can or cannot be done. At least a set of hooks has the merit of saying: do what you like with your synchronous node policy. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Collation-aware comparisons in GIN opclasses
On Mon, Sep 15, 2014 at 8:28 AM, Alexander Korotkov wrote: > some GIN opclasses uses collation-aware comparisons while they don't need to > do especially collation-aware comparison. Examples are text[] and hstore > opclasses. Depending on collation this may make them a much slower. I'm glad that I saw how pointless this was with the jsonb GIN default opclass during development. > Rename such opclasses and make them not default. > Create new default opclasses with bitwise comparison functions. > Write recommendation to re-create indexes with default opclasses into > documentation. I certainly think this should be fixed if at all possible, but I'm not sure about this plan. Can we really rename an opclass without consequence, including having that respected across pg_upgrade? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb contains behaviour weirdness
On Mon, Sep 15, 2014 at 11:29 AM, Tom Lane wrote: > It might be that the benefit is very close to nil; that would depend a lot > on workload, so it's hard to be sure. I'd say though that the cost is > also very close to nil, in the sense that we're considering two additional > compare-and-branch instructions in a function that will surely expend > hundreds or thousands of instructions if there's no such short-circuit. Fair enough. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb contains behaviour weirdness
Peter Geoghegan writes: > On Mon, Sep 15, 2014 at 11:12 AM, Tom Lane wrote: >> Personally I'd think that we should retain it for objects; Peter's >> main argument against that was that the comment would be too complicated, >> but that seems a bit silly from here. > I just don't see any point to it. My argument against the complexity > of explaining why the optimization is only used with objects is based > on the costs and the benefits. I think the benefits are very close to > nil. It might be that the benefit is very close to nil; that would depend a lot on workload, so it's hard to be sure. I'd say though that the cost is also very close to nil, in the sense that we're considering two additional compare-and-branch instructions in a function that will surely expend hundreds or thousands of instructions if there's no such short-circuit. I've certainly been on the side of "that optimization isn't worth its keep" many times before, but I don't think the case is terribly clear cut here. Since somebody (possibly you) thought it was worth having to begin with, I'm inclined to follow that lead. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb contains behaviour weirdness
On Mon, Sep 15, 2014 at 2:18 PM, Peter Geoghegan wrote: > On Mon, Sep 15, 2014 at 11:12 AM, Tom Lane wrote: >> Personally I'd think that we should retain it for objects; Peter's >> main argument against that was that the comment would be too complicated, >> but that seems a bit silly from here. > > I just don't see any point to it. My argument against the complexity > of explaining why the optimization is only used with objects is based > on the costs and the benefits. I think the benefits are very close to > nil. That seems pessimistic to me; I'm with Tom. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Mon, Sep 15, 2014 at 11:20 AM, Robert Haas wrote: > ...looks like about a 10-line patch. We have the data to show that > the loss is trivial even in the worst case, and we have or should be > able to get data showing that the best-case win is significant even > without the abbreviated key stuff. If you'd care to draft a patch for > just that, I assume we could get it committed in a day or two, whereas > I'm quite sure that considerably more work than that remains for the > main patch. OK, I'll draft a patch for that today, including similar alterations to varstr_cmp() for the benefit of Windows and so on. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Mon, Sep 15, 2014 at 1:55 PM, Peter Geoghegan wrote: > On Mon, Sep 15, 2014 at 10:53 AM, Robert Haas wrote: >> I think there's probably more than that to work out, but in any case >> there's no harm in getting a simple optimization done first before >> moving on to a complicated one. > > I guess we never talked about the abort logic in all that much detail. > I suppose there's that, too. Well, the real point is that from where I'm sitting, this... >> x = memcmp(); >> if (x == 0) >>return x; >> y = strcoll(); >> if (y == 0) >>return x; >> return y; ...looks like about a 10-line patch. We have the data to show that the loss is trivial even in the worst case, and we have or should be able to get data showing that the best-case win is significant even without the abbreviated key stuff. If you'd care to draft a patch for just that, I assume we could get it committed in a day or two, whereas I'm quite sure that considerably more work than that remains for the main patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Triconsistent catalog declaration
Robert Haas writes: > On Mon, Sep 15, 2014 at 10:13 AM, Heikki Linnakangas > wrote: >> This requires a catalog change to fix. Are we still planning to do a >> catversion bump for 9.4 because of the jsonb changes? > That was my understanding, although we seem to be proceeding at an > inexplicably glacial pace. We already had a post-beta2 catversion bump for 30c05261a. As long as this is a catalog fix and not an on-disk-data change, there's no reason not to make it in 9.4. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb contains behaviour weirdness
On Mon, Sep 15, 2014 at 11:12 AM, Tom Lane wrote: > Personally I'd think that we should retain it for objects; Peter's > main argument against that was that the comment would be too complicated, > but that seems a bit silly from here. I just don't see any point to it. My argument against the complexity of explaining why the optimization is only used with objects is based on the costs and the benefits. I think the benefits are very close to nil. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb contains behaviour weirdness
Josh Berkus writes: > On 09/12/2014 01:33 PM, Tom Lane wrote: >> No, it's a bug, not a documentation deficiency. > Hmmm? Are you proposing that we should change how ARRAY @> ARRAY works > for non-JSON data? No. > Or are you proposing that JSONARRAY @> JSONARRAY should work differently > from ARRAY @> ARRAY? And no. It's a bug that jsonb array containment works differently from regular array containment. We understand the source of the bug, ie a mistaken optimization. I don't see why there's much need for discussion about anything except whether removing the optimization altogether (as Peter proposed) is the best fix, or whether we want to retain some weaker form of it. Personally I'd think that we should retain it for objects; Peter's main argument against that was that the comment would be too complicated, but that seems a bit silly from here. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Mon, Sep 15, 2014 at 10:53 AM, Robert Haas wrote: > I think there's probably more than that to work out, but in any case > there's no harm in getting a simple optimization done first before > moving on to a complicated one. I guess we never talked about the abort logic in all that much detail. I suppose there's that, too. > I rather assume we could reuse the results of the first memcmp() > instead of doing it again. > > x = memcmp(); > if (x == 0) >return x; > y = strcoll(); > if (y == 0) >return x; > return y; Of course, but you know what I mean. (I'm sure the compiler will realize this if the programmer doesn't) -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Triconsistent catalog declaration
On Mon, Sep 15, 2014 at 10:13 AM, Heikki Linnakangas wrote: > That makes for a bit awkward input and output from psql, when the values > used are 0, 1, 2, rather than ascii characters. But that's OK, because as > you said these functions are not callable from psql anyway, as they have > "internal" arguments. Maybe we should change them to something a bit more understandable. > This requires a catalog change to fix. Are we still planning to do a > catversion bump for 9.4 because of the jsonb changes? That was my understanding, although we seem to be proceeding at an inexplicably glacial pace. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Mon, Sep 15, 2014 at 1:34 PM, Peter Geoghegan wrote: > On Mon, Sep 15, 2014 at 10:17 AM, Robert Haas wrote: >> It strikes me that perhaps we should make this change (rearranging >> things so that the memcmp tiebreak is run before strcoll) first, >> before dealing with the rest of the abbreviated keys infrastructure. >> It appears to be a separate improvement which is worthwhile >> independently of what we do about that patch. > > I guess we could do that, but AFAICT the only open item blocking the > commit of a basic version of abbreviated keys (the informally agreed > to basic version lacking support for single-attribute aggregates) is > what to do about the current need to create a separate sortsupport > state. I've talked about my thoughts on that question in detail now > [1]. I think there's probably more than that to work out, but in any case there's no harm in getting a simple optimization done first before moving on to a complicated one. > BTW, you probably realize this, but we still need a second memcmp() > after strcoll() too. hu_HU will care about that [2]. > > [1] > http://www.postgresql.org/message-id/cam3swzqcdcnfwd3qzoo4qmy4g8okhuqyrd26bbla7fl2x-n...@mail.gmail.com > [2] > http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=656beff59033ccc5261a615802e1a85da68e8fad I rather assume we could reuse the results of the first memcmp() instead of doing it again. x = memcmp(); if (x == 0) return x; y = strcoll(); if (y == 0) return x; return y; -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench throttling latency limit
> I'm not sure I like the idea of printing a percentage. It might be
> unclear what the denominator was if somebody feels the urge to work back
> to the actual number of skipped transactions. I mean, I guess it's
> probably just the value you passed to -R, so maybe that's easy enough,
> but then why bother dividing in the first place? The user can do that
> easily enough if they want the data that way.

Indeed, "skipped" and "late" per second may have an unclear denominator. If you divide by the time, the unit would be "tps", but a 120 tps performance including 20 late tps, plus 10 skipped tps... I do not think that is very clear. Reporting "tps" for transactions *not* performed looks strange. Maybe late transactions could be given as a percentage of all processed transactions in the interval. But for skipped, a percentage of what? The only number that would make sense is the total number of transactions scheduled in the interval, but that would mean that the denominator for late would be different from the denominator for skipped, which is basically incomprehensible.

> I agree with you that it would be good to get some statistics on
> late/skipped transactions, but it's not obvious what people will want.
> Late transactions, straight up? Late by more than a threshold value?

Yes. Under throttling, transactions are given a scheduled start time. When a transaction can actually start: (1) if it is already later than the latency limit (a threshold) before even starting, it is *NOT* started, but counted as "skipped"; (2) otherwise it is started. When it finishes, it may be (2a) outside the latency limit (scheduled time + limit) => it is counted as "late", or (2b) within the latency limit => all is well. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Mon, Sep 15, 2014 at 10:17 AM, Robert Haas wrote: > It strikes me that perhaps we should make this change (rearranging > things so that the memcmp tiebreak is run before strcoll) first, > before dealing with the rest of the abbreviated keys infrastructure. > It appears to be a separate improvement which is worthwhile > independently of what we do about that patch. I guess we could do that, but AFAICT the only open item blocking the commit of a basic version of abbreviated keys (the informally agreed to basic version lacking support for single-attribute aggregates) is what to do about the current need to create a separate sortsupport state. I've talked about my thoughts on that question in detail now [1]. BTW, you probably realize this, but we still need a second memcmp() after strcoll() too. hu_HU will care about that [2]. [1] http://www.postgresql.org/message-id/cam3swzqcdcnfwd3qzoo4qmy4g8okhuqyrd26bbla7fl2x-n...@mail.gmail.com [2] http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=656beff59033ccc5261a615802e1a85da68e8fad -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.5] Custom Plan API
On Mon, Sep 15, 2014 at 8:38 AM, Kouhei Kaigai wrote:
>> > The only reason why I put separate hooks here is that create_custom_scan()
>> > needs to know the exact size of the CustomScan node (including private
>> > fields); however, it is helpful for extensions to have their callback invoked
>> > to initialize the node next to the common initialization stuff.
>>
>> Why does it need to know that? I don't see that it's doing anything that
>> requires knowing the size of that node, and if it is, I think it shouldn't
>> be. That should get delegated to the callback provided by the custom plan
>> provider.
>>
> Sorry, my explanation might have been confusing. create_custom_scan() does not
> need to know the exact size of the CustomScan (or a type derived from it)
> precisely because of the two separate hooks: one at node allocation time,
> the other at the tail of the initialization sequence.
> If we had only one hook here, we would need a mechanism to inform
> create_custom_scan() of the exact size of the CustomScan node, including
> private fields managed by the provider, instead of the first hook at node
> allocation time. In that case, node allocation would be done by
> create_custom_scan() itself, so it would have to know the exact size of the
> node to be allocated.
>
> How should I implement the feature here? Is the combination of a static node
> size and a callback at the tail simpler than the existing design that takes
> two individual hooks in create_custom_scan()?

I still don't get it. Right now, the logic in create_custom_scan(), which I think should really be create_custom_plan() or create_plan_from_custom_path(), basically looks like this:

1. call hook function CreateCustomPlan
2. compute values for tlist and clauses
3. pass those values to hook function InitCustomScan()
4. call copy_path_costsize

What I think we should do is:

1. compute values for tlist and clauses
2. pass those values to hook function PlanCustomPath(), which will return a Plan
3.
call copy_path_costsize

create_custom_scan() does not need to allocate the node. You don't need the node to be allocated before computing tlist and clauses. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
On Mon, Sep 15, 2014 at 2:12 PM, Josh Berkus wrote:
> If not, I think the corner case is so obscure as to be not worth
> optimizing for. I can't imagine that more than a tiny minority of our
> users are going to have thousands of keys per datum.

Worst case is cost scaling linearly with the number of keys, so how expensive it is depends on the number of keys. It would have an effect only on uncompressed jsonb, since compressed jsonb already pays a linear cost for decompression. I'd suggest testing performance with a large number of small keys in uncompressed form. It's bound to show a noticeable regression there. Now, a "large number" of small keys could be 200 or 2000, or even 20k. I'd guess several should be tested to find the shape of the curve. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Sun, Sep 14, 2014 at 10:37 AM, Heikki Linnakangas wrote: > On 09/13/2014 11:28 PM, Peter Geoghegan wrote: >> Anyway, attached rough test program implements what you outline. This >> is for 30,000 32 byte strings (where just the final two bytes differ). >> On my laptop, output looks like this (edited to only show median >> duration in each case): > > Got to be careful to not let the compiler optimize away microbenchmarks like > this. At least with my version of gcc, the strcoll calls get optimized away, > as do the memcmp calls, if you don't use the result for anything. Clang was > even more aggressive; it ran both comparisons in 0.0 seconds. Apparently it > optimizes away the loops altogether. > > Also, there should be a setlocale(LC_ALL, "") call somewhere. Otherwise it > runs in C locale, and we don't use strcoll() at all for C locale. > > After fixing those, it runs much slower, so I had to reduce the number of > strings. Here's a fixed program. I'm now getting numbers like this: > > (baseline) duration of comparisons without useless memcmp()s: 6.007368 > seconds > > duration of comparisons with useless memcmp()s: 6.079826 seconds > > Both values vary in range 5.9 - 6.1 s, so it's fair to say that the useless > memcmp() is free with these parameters. > > Is this the worst case scenario? I can't see a worse one, and I replicated your results here on my MacBook Pro. I also tried with 1MB strings and, surprisingly, it was basically free there, too. It strikes me that perhaps we should make this change (rearranging things so that the memcmp tiebreak is run before strcoll) first, before dealing with the rest of the abbreviated keys infrastructure. It appears to be a separate improvement which is worthwhile independently of what we do about that patch. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
On 09/12/2014 01:30 PM, Heikki Linnakangas wrote:
>
> Performance was one argument for sure. It's not hard to come up with a
> case where the all-lengths approach is much slower: take a huge array
> with, say, million elements, and fetch the last element in a tight loop.
> And do that in a PL/pgSQL function without storing the datum to disk, so
> that it doesn't get toasted. Not a very common thing to do in real life,
> although something like that might come up if you do a lot of json
> processing in PL/pgSQL. but storing offsets makes that faster.

While I didn't post the results (because they were uninteresting), I did specifically test the "last element" in a set of 200 elements for all-lengths vs. original offsets for JSONB, and the results were not statistically different. I did not test against your patch; is there some reason why your patch would be faster for the "last element" case than the original offsets version? If not, I think the corner case is so obscure as to be not worth optimizing for. I can't imagine that more than a tiny minority of our users are going to have thousands of keys per datum. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb contains behaviour weirdness
On 09/12/2014 01:33 PM, Tom Lane wrote: > Josh Berkus writes: >> However, this better become a FAQ item, because it's not necessarily the >> behavior that folks used to JSON but not Postgres will expect. > > No, it's a bug, not a documentation deficiency. Hmmm? Are you proposing that we should change how ARRAY @> ARRAY works for non-JSON data? Or are you proposing that JSONARRAY @> JSONARRAY should work differently from ARRAY @> ARRAY? Or are you agreeing with Peter that it's the first case that's a bug, and the 2nd two tests are correct, and just being unclear? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench throttling latency limit
On Mon, Sep 15, 2014 at 6:34 AM, Heikki Linnakangas wrote: > Please have a look. I have not looked at the docs changes yet. > > One thing that needs some thinking and changing is the progress reporting. > It currently looks like this: > > progress: 1.0 s, 4863.0 tps, lat 3.491 ms stddev 2.487, lag 1.809 ms, 99 > skipped > progress: 2.0 s, 5042.8 tps, lat 3.265 ms stddev 2.264, lag 1.584 ms, 16 > skipped > progress: 3.0 s, 4926.1 tps, lat 2.731 ms stddev 2.371, lag 1.196 ms, 45 > skipped > progress: 4.0 s, 4963.9 tps, lat 1.904 ms stddev 1.212, lag 0.429 ms, 0 > skipped > progress: 5.0 s, 4971.2 tps, lat 2.592 ms stddev 1.722, lag 0.975 ms, 0 > skipped > > The absolute number of skipped transactions doesn't make much sense when all > the other numbers are averages, and tps is a 1/s value. If you don't know > the total number of transactions executed, the absolute number is > meaningless. (Although you can calculate the absolute number of transactions > executed by multiplying the TPS value with the interval). I think a > percentage would be better here. > > Should we also print the number of late transactions here? I think that > would be an even more important detail than the number of skipped > transactions. It might be better to print only the percentage of late > transactions, including skipped ones. Or both, but it's difficult to cram > everything on a single line. This needs some further thinking.. I'm not sure I like the idea of printing a percentage. It might be unclear what the denominator was if somebody feels the urge to work back to the actual number of skipped transactions. I mean, I guess it's probably just the value you passed to -R, so maybe that's easy enough, but then why bother dividing in the first place? The user can do that easily enough if they want the data that way. I agree with you that it would be good to get some statistics on late/skipped transactions, but it's not obvious what people will want. Late transactions, straight up? 
Late by more than a threshold value? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postgres code for a query intermediate dataset
On Sun, Sep 14, 2014 at 5:18 AM, Rohit Goyal wrote:
> Thanks for the reply. But I think I confused you. I am talking about access
> using indexes. So, I assume that the B+ tree stores key-value pairs where
> 'rohit' is the key and all the versions are its value.
>
> Another way to think of it: I have a secondary index on emp.name and there
> are 4 'rohit' rows in the DB. So, the B+ tree gives me 4 different tuple
> pointers for the key 'rohit'. I want to know the code portion where I can
> see all 4 tuple pointers before any I/O access is done to fetch the tuples.

You may want to look at index_getnext(), index_getnext_tid(), and/or heap_hot_search_buffer(). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Turning off HOT/Cleanup sometimes
On Sun, Sep 14, 2014 at 4:37 PM, Simon Riggs wrote: > On 12 September 2014 18:19, Simon Riggs wrote: >> On 12 September 2014 15:30, Tom Lane wrote: > >>> After a little bit I remembered there was already a function for this. >>> So specifically, I'd suggest using ExecRelationIsTargetRelation() >>> to decide whether to mark the scan as requiring pruning. >> >> Sounds cool. Thanks both, this is sounding like a viable route now. > > Yes, this is viable. > > Patch attached, using Alvaro's idea of use-case specific pruning and > Tom's idea of aiming at target relations. Patch uses or extends > existing infrastructure, so its shorter than it might have been, yet > with all that bufmgr yuck removed. > > This is very, very good because while going through this I notice the > dozen or more places where we were pruning blocks in annoying places I > didn't even know about such as about 4-5 constraint checks. In more > than a few DDL commands like ALTER TABLE and CLUSTER we were even > pruning the old relation prior to rewrite. Do we really want to disable HOT for all catalog scans? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] .ready files appearing on slaves
Hi hackers, An issue that seems related to this has been posted on pgsql-admin. See: http://www.postgresql.org/message-id/caas3tylnxaydz0+zhxlpdvtovhqovr+jsphp30o8kvwqqs0...@mail.gmail.com How can we help on this issue? Cheers, On Thu, 4 Sep 2014 17:50:36 +0200 Jehan-Guillaume de Rorthais wrote:
> Hi hackers,
>
> For a few months, we have occasionally seen .ready files appearing on some slave
> instances in various contexts. The two I have in mind are under 9.2.x.
>
> I tried to investigate a bit. These .ready files are created when a WAL file
> from pg_xlog has no corresponding file in pg_xlog/archive_status. I could
> easily reproduce this by deleting such a file: it is created again at the
> next restartpoint or checkpoint received from the master.
>
> Looking at the WAL files in the pg_xlog folder corresponding to these .ready
> files, they are all much older than the current WAL "cycle" in both mtime and
> name sequence. For instance, on one of these boxes we currently have 6 of
> those "ghost" WALs:
>
> 00021E5300FF
> 00021F1800FF
> 0002204700FF
> 000220BF00FF
> 0002214000FF
> 0002237000FF
> 0002255D00A8
> 0002255D00A9
> [...normal WAL sequence...]
> 0002255E009D
>
> And on another box:
>
> 0001040E00FF
> 0001041400DA
> 0001046E00FF
> 0001047000FF
> 00010485000F
> 000104850010
> [...normal WAL sequence...]
> 000104850052
>
> So it seems that for some reason these old WALs were "forgotten" by the
> restartpoint mechanism when they should have been recycled/deleted.
>
> For one of these servers, I could correlate this with some brutal
> disconnections of the streaming replication appearing in its logs. But there
> was no known SR disconnection on the second one.
>
> Any idea about this weird behaviour? What can we do to help you investigate
> further?
>
> Regards, -- Jehan-Guillaume de Rorthais Dalibo http://www.dalibo.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Collation-aware comparisons in GIN opclasses
Hackers, some GIN opclasses use collation-aware comparisons even though they don't actually need collation-aware comparison at all. Examples are the text[] and hstore opclasses. Depending on the collation, this can make them much slower. See this example:

# show lc_collate ;
 lc_collate
─────────────
 ru_RU.UTF-8
(1 row)

# create table test as (select array_agg(i::text) from generate_series(1,100) i group by (i-1)/10);
SELECT 10
# create index test_idx on test using gin(array_agg);
CREATE INDEX
Time: 26930,423 ms
# create index test_idx2 on test using gin(array_agg collate "C");
CREATE INDEX
Time: 5143,682 ms

Index creation with collation "ru_RU.UTF-8" is 5 times slower, even though the collation has absolutely no effect on the index's functionality. However, we can't simply replace the comparison function for those opclasses, because that would break binary compatibility for pg_upgrade. I see the following solution:

1. Rename such opclasses and make them non-default.
2. Create new default opclasses with bitwise comparison functions.
3. Add a recommendation to the documentation to re-create existing indexes with the default opclasses.

-- With best regards, Alexander Korotkov.
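The proposal above relies on the fact that GIN only needs some consistent total order over keys, not a linguistically correct one. A minimal, language-neutral sketch of the bitwise comparison idea (Python for brevity; a real opclass support function would be C using memcmp()):

```python
def bitwise_cmp(a: bytes, b: bytes) -> int:
    """Compare two values byte by byte, like memcmp() under the "C"
    collation: cheap, locale-independent, and still a perfectly valid
    total order for organizing and looking up index keys."""
    if a == b:
        return 0
    return -1 if a < b else 1

# Any consistent order works for key lookup; it just won't match a
# linguistic (e.g. ru_RU.UTF-8) sort order, which GIN never exposes anyway.
```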
Re: [HACKERS] delta relations in AFTER triggers
Tom Lane wrote:
> Heikki Linnakangas writes:
>> On 08/30/2014 12:15 AM, Kevin Grittner wrote:
>>> If we were to go with the hooks as you propose, we would still need
>>> to take the information from TriggerData and put it somewhere else
>>> for the hook to reference.
>
>> Sure.
>
> FWIW, I agree with Heikki on this point. It makes a lot more sense for
> the parser to provide hooks comparable to the existing hooks for resolving
> column refs, and it's not apparent that loading such functionality into
> SPI is sane at all.
>
> OTOH, I agree with Kevin that the things we're talking about are
> lightweight relations, not variables.

Try as I might, I was unable to find any sensible way to use hooks. If the issue were only the parsing, the route would be fairly obvious, but the execution side needs to access the correct tuplestore(s) for each run, too -- so the sort of information provided by relcache needed to be passed in based on the context within the process (i.e., if you have nested triggers firing, each needs to use a different tuplestore with the same name in the same function, even though it's using the same plan). On both sides it seemed easier to pass things through the same sorts of techniques as "normal" parameters; I couldn't find a way to use hooks that didn't just make things uglier.

I see the downside of making this a feature which can only be used from SPI, so I've updated the patch to allow it from other contexts. On the other hand, I see many uses for it where SPI *is* used, and the SPI interface needs to change to support that. The functions I had added are one way to do that on the parsing/planning side without breaking any existing code. The same information (i.e., the metadata) needs to be passed to the executor along with references to the actual tuplestores, and that needs to go through a different path -- at least for the plpgsql usage.
I broke out the changes from the previous patch into multiple commits in my repository on github:

commit 2520db8fbb41c68a82c2c750c8543154c6d85eb9
Author: Kevin Grittner
Date: Mon Sep 15 01:17:14 2014 -0500

    Use executor state rather than SPI to get named tuplestores.

    spi.c and trigger.c will need a little more clean-up to match this,
    and it's likely that not all places that need to pass the baton are
    doing so, but at least this passes regression tests.

commit de9067258125226d1625f160c3eee9aff90ca598
Author: Kevin Grittner
Date: Sun Sep 14 22:01:04 2014 -0500

    Pass the Tsrcache through ParserState to parse analysis.

commit e94779c1e22ec587446a7aa2593ba5f102b6a28b
Author: Kevin Grittner
Date: Sun Sep 14 19:23:45 2014 -0500

    Modify SPI to use Tsrcache instead of List.

commit 7af841881d9113eb4c8dca8e82dc1867883bf75d
Author: Kevin Grittner
Date: Sun Sep 14 18:45:54 2014 -0500

    Create context-specific Tsrcache. Not used yet.

commit 35790a4b6c236d09e0be261be9b0017d34eaf9c9
Author: Kevin Grittner
Date: Sun Sep 14 16:49:37 2014 -0500

    Create Tsrmd structure so tuplestore.h isn't needed in parser.

commit 93d57c580da095b71d9214f69fede71d2f8ed840
Author: Kevin Grittner
Date: Sun Sep 14 15:59:45 2014 -0500

    Improve some confusing naming for TuplestoreRelation node.

Anyone who has already reviewed the earlier patch may want to look at these individually: https://github.com/kgrittn/postgres/compare/transition

Attached is a new patch including all of that. Hopefully I haven't misunderstood what Heikki and Tom wanted; if I have, just let me know where I went wrong.

-- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

trigger-transition-tables-v4.patch.gz Description: application/tgz
Re: [HACKERS] orangutan seizes up during isolation-check
On Mon, Sep 15, 2014 at 10:11:57AM +0300, Heikki Linnakangas wrote: > On 09/15/2014 07:51 AM, Noah Misch wrote: > >libintl replaces setlocale(). Its setlocale(LC_x, "") uses OS-specific APIs > >to determine the default locale when $LANG and similar environment variables > >are empty, as they are during "make check NO_LOCALE=1". On OS X, it calls[1] > >CFLocaleCopyCurrent(), which in turn spins up a thread. See the end of this > >message for the postmaster thread stacks active upon hitting a breakpoint set > >at _dispatch_mgr_thread. > > Ugh. I'd call that a bug in libintl. setlocale() has no business to > make the process multi-threaded. Fair interpretation. libintl's options for mitigating the problem are even more constrained, unfortunately. > Do we have the same problem in backends? At a quick glance, aside > from postmaster we only use PG_SETMASK(&BlockSig) in signal > handlers, to prevent another signal handler from running > concurrently. In backends, we pass a specific value, not "", to pg_perm_setlocale(). (I expect the problem in EXEC_BACKEND builds, but I doubt anyone uses that as a production configuration on OS X.) Every frontend program does call setlocale(LC_ALL, "") via set_pglocale_pgservice(). None use sigprocmask(), but a few do use fork(). > >1. Fork, call setlocale(LC_x, "") in the child, pass back the effective > >locale > >name through a pipe, and pass that name to setlocale() in the original > >process. The short-lived child will get the extra threads, and the > >postmaster will remain clean. > >I'm skeptical of the value of looking up locale information using other OS X > >facilities when the usual environment variables are inconclusive, but I see > >no > >clear cause to reverse that decision now. I lean toward (1). > > Both of those are horrible hacks. And who's to say that calling > setlocale(LC_x, "foo") won't also call some function that makes the > process multi-threaded. 
If not in any current OS X release, it might > still happen in a future one. Right. Not just setlocale(); a large body of APIs could change that way. The CAVEATS section[1] of the OS X fork() manual page suggests Apple had that in mind at some point. To be 100% safe, we would write the postmaster as though it's always multithreaded, including making[2] EXEC_BACKEND the only supported configuration on OS X. Assuming we don't do that anytime soon, I did have in mind to make the postmaster raise an error if it becomes multithreaded. (Apple's pthread_is_threaded_np() is usable for this check.) Then, at least, we'll more-easily hear about new problems of this nature. > One idea would be to use an extra pthread mutex or similar, in > addition to PG_SETMASK(). Whenever you do PG_SETMASK(&BlockSig), > also grab the mutex, and release it when you do > PG_SETMASK(&UnBlockSig). I had considered and tested such a thing, and it sufficed to clear orangutan's present troubles. You still face undefined behavior after fork(). It's okay in practice if libdispatch threads are careful to register pthread_atfork() handlers for any relevant resources. It's a fair option, but running setlocale(LC_x, "") in a short-lived child assumes less about the system library implementation, seems no more of a hack to me, and isolates the overhead at postmaster start. [1] https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/fork.2.html [2] http://www.postgresql.org/message-id/1349280184-sup-...@alvh.no-ip.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
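Option (1) above, determining the default locale in a short-lived child so that any threads spawned by the lookup die with the child, can be sketched as follows. This is an illustrative POSIX sketch (in Python rather than the C the postmaster would use), not the actual proposal's code:

```python
import locale
import os

def default_locale_via_child() -> str:
    """Run setlocale(LC_ALL, "") in a forked child and pass the
    effective locale name back through a pipe.  Any threads spawned
    during the lookup (e.g. by CFLocaleCopyCurrent() on OS X) exist
    only in the child, which exits immediately; the parent then sets
    the locale by explicit name, which needs no OS-specific lookup."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                              # child
        os.close(r)
        name = locale.setlocale(locale.LC_ALL, "")
        os.write(w, name.encode())
        os._exit(0)
    os.close(w)                               # parent
    name = b""
    while True:
        chunk = os.read(r, 4096)
        if not chunk:
            break
        name += chunk
    os.close(r)
    os.waitpid(pid, 0)
    # Parent sets the locale by name, not "", so no extra threads appear.
    return locale.setlocale(locale.LC_ALL, name.decode())
```

The parent process stays single-threaded; only the throwaway child pays the cost, and only once at startup.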
Re: [HACKERS] Triconsistent catalog declaration
On 09/15/2014 04:58 PM, Alexander Korotkov wrote:
> Hi! 9.4 GIN introduces the new triconsistent method, which can return one of
> three values, unlike the plain consistent method. But in the catalog
> declaration triconsistent is still declared as returning bool. It doesn't
> affect anything, because nobody calls triconsistent from SQL.

Good point.

> But I think it would be correct to declare it as returning int2.

No, int2 is two bytes wide, while the return value is a single byte. I guess "char" would be the right type. That makes for somewhat awkward input and output from psql, since the values used are 0, 1, 2 rather than ASCII characters. But that's OK, because as you said these functions are not callable from psql anyway, as they have "internal" arguments.

This requires a catalog change to fix. Are we still planning to do a catversion bump for 9.4 because of the jsonb changes?

- Heikki
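The values 0, 1, 2 discussed above line up with the boolean encoding; as noted elsewhere in the thread, a boolean can be safely cast to the ternary type. A small illustration of that invariant (the names mirror the in-tree GIN_FALSE/GIN_TRUE/GIN_MAYBE definitions, but this is a sketch, not the C declarations):

```python
# Sketch of the GIN ternary logic values.  In C, GinTernaryValue is a
# single byte ("char") whose values line up with the boolean encoding.
GIN_FALSE = 0   # same representation as boolean false
GIN_TRUE  = 1   # same representation as boolean true
GIN_MAYBE = 2   # "don't know / don't care"; no boolean counterpart

def from_bool(b: bool) -> int:
    """Casting bool -> ternary is value-preserving, which is why the
    values 0, 1, 2 were chosen instead of ASCII characters."""
    return int(b)
```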
[HACKERS] Triconsistent catalog declaration
Hi!

9.4 GIN introduces the new triconsistent method, which can return one of three values, unlike the plain consistent method. But in the catalog declaration triconsistent still returns the bool type. It doesn't affect anything, because nobody calls triconsistent from SQL. But I think it would be correct to declare it as returning int2.

-- With best regards, Alexander Korotkov.

tri-consistent-catalog.patch Description: Binary data
Re: [HACKERS] implement subject alternative names support for SSL connections
On 09/15/2014 01:44 PM, Alexey Klyukin wrote:
>> Committed, with that change, ie. the CN is not checked if SANs are present.
>
> Actually, I disagree with the way the patch ignores the CN. Currently, it
> skips the CN unconditionally if the SubjectAltName section is present. But
> what RFC 6125 says is:
>
> "If a subjectAltName extension of type dNSName is present, that MUST be
> used as the identity. Otherwise, the (most specific) Common Name field in
> the Subject field of the certificate MUST be used."
>
> This means that we have to check that at least one dNSName resource is
> present before declining to examine the CN. Attached is a one-liner
> (excluding comments) that fixes this.

Ok, good catch. Fixed.

- Heikki
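The logic of the fix can be modeled simply: count the dNSName entries actually examined in the SAN section, and fall back to the CN only when that count is zero, so a certificate whose SANs contain only non-dNSName entries (say, IP addresses) still gets its CN checked. A simplified, hypothetical model of the decision, not the libpq code itself:

```python
def names_to_verify(dns_sans, other_sans, common_name):
    """Per RFC 6125: if any dNSName SAN is present, those names are the
    identity and the CN must be ignored; otherwise fall back to the
    (most specific) Common Name.  Non-dNSName SANs (other_sans) do not
    count toward suppressing the CN."""
    names_examined = list(dns_sans)   # only dNSName entries are examined
    if names_examined:                # i.e. names_examined != 0 in the patch
        return names_examined
    return [common_name]
```

The original commit's bug, in these terms, was falling back to the CN only when the whole SAN section was absent, rather than when no dNSName entries were examined.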
Re: [HACKERS] [v9.5] Custom Plan API
> On Thu, Sep 11, 2014 at 8:40 PM, Kouhei Kaigai wrote:
>> On Thu, Sep 11, 2014 at 11:24 AM, Kouhei Kaigai wrote:
>> >> Don't the changes to src/backend/optimizer/plan/createplan.c
>> >> belong in patch #2?
>> >>
>> > The borderline between #1 and #2 is a little bit fuzzy. So, I moved
>> > most of it into #1; however, the invocation of InitCustomScan
>> > (that is, a callback in CustomPlanMethod) in create_custom_plan() is still in #2.
>>
>> Eh, create_custom_scan() certainly looks like it is in #1 from here,
>> or at least part of it is. It calculates tlist and clauses and then
>> does nothing with them. That clearly can't be the right division.
>>
>> I think it would make sense to have create_custom_scan() compute
>> tlist and clauses first, and then pass those to CreateCustomPlan().
>> Then you don't need a separate InitCustomScan() - which is misnamed
>> anyway, since it has nothing to do with ExecInitCustomScan().
>>
> The only reason why I put separate hooks here is that create_custom_scan()
> needs to know the exact size of the CustomScan node (including private
> fields); however, it is helpful for extensions to have their callback invoked
> to initialize the node right after the common initialization stuff.

> Why does it need to know that? I don't see that it's doing anything that
> requires knowing the size of that node, and if it is, I think it shouldn't
> be. That should get delegated to the callback provided by the custom plan
> provider.

Sorry, my explanation may have been confusing. create_custom_scan() does not need to know the exact size of the CustomScan (or a node derived from it), because of the two separate hooks: one at node allocation time, the other at the tail of the series of initialization steps.
If we have only one hook here, we need a mechanism to inform create_custom_scan() of the exact size of the CustomScan node, including the private fields managed by the provider, instead of the first hook at node allocation time. In that case, node allocation would have to be performed by create_custom_scan(), and it would have to know the exact size of the node to be allocated. How would I implement the feature in that case? Is the combination of a static node size and a callback at the tail really simpler than the existing design that takes two individual hooks in create_custom_scan()?

> > Regarding the naming, how about GetCustomScan() instead of InitCustomScan()?
> > It follows the manner of create_foreignscan_plan().
>
> I guess that's a bit better, but come to think of it, I'd really like to
> avoid baking in the assumption that the custom path provider has to return
> any particular type of plan node. A good start would be to give it a name
> that doesn't imply that - e.g. PlanCustomPath().

OK, I'll use this naming.

> >> > OK, I revised. Now custom-scan assumes it has a particular valid
> >> > relation to be scanned, so there is no code path with scanrelid == 0 at this moment.
> >> >
> >> > Let us revisit this scenario when custom-scan replaces relation joins.
> >> > In that case, custom-scan will not be associated with a particular
> >> > base relation, thus it needs to admit a custom-scan node with
> >> > scanrelid == 0.
> >>
> >> Yeah, I guess the question there is whether we'll want to let CustomScan
> >> have scanrelid == 0 or require that CustomJoin be used there instead.
>
> Right now, I cannot imagine a use case that requires an individual
> CustomJoin node, because CustomScan with scanrelid == 0 (which actually
> performs like a custom plan rather than a custom scan) is sufficient.
>
> If a CustomScan gets chosen instead of the built-in join logic, it would
> look like a relation scan on a virtual relation that consists of the two
> underlying relations.
Callbacks of the CustomScan have the responsibility
> to join the underlying relations; that is invisible to the core executor.
>
> It seems to me CustomScan with scanrelid == 0 is sufficient to implement
> alternative logic for relation joins; we don't need an individual node
> from the standpoint of the executor.

> That's valid logic, but it's not the only way to do it. If we have CustomScan
> and CustomJoin, either of them will require some adaptation to handle this
> case. We can either allow a custom scan that isn't scanning any particular
> relation (i.e. scanrelid == 0), or we can allow a custom join that has no
> children. I don't know which way will come out cleaner, and I think it's
> good to leave that decision to one side for now.

Yep. I agree with you. It may not be productive to try to settle this design topic right now. Let's assume CustomScan scans a particular relation (scanrelid != 0) in the first revision.

> >> >> Why can't the Custom(GpuHashJoin) node build the hash table
> >> >> internally instead of using a separate node?
> >> >>
> >> > It's possible; however, it prevents checking sub-plans using
> >> > EXPLAIN if we manage inner plans internally. So, I'd like to have a separa
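The allocation question in this subthread (does the core need to know the provider's node size up front, or can allocation be delegated entirely to a provider callback?) can be modeled abstractly. A toy sketch of the single-callback approach Robert favors; apart from the names CustomScan and PlanCustomPath taken from the discussion, everything here is illustrative, not the actual API:

```python
class CustomScan:
    """Stand-in for the core CustomScan plan node: common fields only."""
    def __init__(self, tlist, clauses):
        self.tlist = tlist
        self.clauses = clauses

class GpuScan(CustomScan):
    """A provider's extended node.  The core never needs to know its
    size, because the provider's callback performs the allocation."""
    def __init__(self, tlist, clauses, device_id):
        super().__init__(tlist, clauses)
        self.device_id = device_id

def plan_custom_path(provider_callback, tlist, clauses):
    """Core computes tlist/clauses, then delegates node creation
    entirely to the provider (the PlanCustomPath() idea): one hook,
    no size negotiation."""
    node = provider_callback(tlist, clauses)
    assert isinstance(node, CustomScan)   # must extend the common node
    return node

node = plan_custom_path(lambda t, c: GpuScan(t, c, device_id=0),
                        tlist=['x'], clauses=[])
```

The two-hook design, by contrast, would split this into an allocation callback plus a later initialization callback; the sketch shows why a single callback can subsume both when the provider owns the allocation.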
Re: [HACKERS] PoC: Partial sort
On Sun, Sep 14, 2014 at 9:32 AM, Peter Geoghegan wrote:
> I think we might be better off if a tuplesort function was called
> shortly after tuplesort_begin_heap() is called. How top-n heap sorts
> work is something that largely lives in tuplesort's head. Today, we
> call tuplesort_set_bound() to hint to tuplesort "By the way, this is a
> top-n heap sort applicable sort". I think that with this patch, we
> should then hint (where applicable) "by the way, you won't actually be
> required to sort those first n indexed attributes; rather, you can
> expect to scan those in logical order. You may work the rest out
> yourself, and may be clever about exploiting the sorted-ness of the
> first few columns". The idea of managing a bunch of tiny sorts from
> within ExecSort(), and calling the new function tuplesort_reset(), seems
> questionable. tuplesortstate is supposed to be private/opaque to
> nodeSort.c, and the current design strains that.
>
> I'd like to keep nodeSort.c simple. I think it's pretty clear that the
> guts of this do not belong within ExecSort(), in any case. Also, the
> additions there should be much better commented, wherever they finally
> end up.

As I understand it, you propose to encapsulate the partial sort algorithm in tuplesort. However, in that case we need a significant change to its interface anyway: let tuplesort decide when it's able to return a tuple. Otherwise, we would miss a significant part of the LIMIT clause optimization. tuplesort_set_bound() can't solve all the cases: there could be other planner nodes between the partial sort and the LIMIT.

-- With best regards, Alexander Korotkov.
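For readers following the thread: in a partial sort, tuples arrive already ordered by the leading (indexed) columns, so only each group of equal leading keys needs sorting on the remaining columns, and a LIMIT can stop as soon as enough complete groups have been emitted. A hedged illustration of that algorithm, not the patch's code:

```python
from itertools import groupby

def partial_sort(rows, presorted_key, rest_key, limit=None):
    """rows are already sorted by presorted_key (e.g. coming from an
    index scan).  Sort each equal-key group by rest_key, emitting
    tuples as soon as a group is complete.  That is why LIMIT can stop
    early: later groups are never sorted at all -- the optimization
    the interface discussion above is about preserving."""
    out = []
    for _, group in groupby(rows, key=presorted_key):
        out.extend(sorted(group, key=rest_key))
        if limit is not None and len(out) >= limit:
            return out[:limit]
    return out
```

This corresponds to a query like ORDER BY v1, v2 LIMIT n with an index on v1 alone: the "bunch of tiny sorts" are the per-group sorted() calls.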
Re: [HACKERS] PoC: Partial sort
On Sun, Sep 14, 2014 at 7:39 AM, Peter Geoghegan wrote:
> On Fri, Sep 12, 2014 at 2:19 PM, Alexander Korotkov wrote:
>> Actually, higher cardinality skip columns is better. Sorting of smaller
>> groups is faster than sorting larger groups of the same total size. Also,
>> with smaller groups you reach the limit more accurately (on average), i.e.
>> you sort a smaller number of total rows.
>
> Higher cardinality leading key columns are better for what? Do you
> mean they're better in that those cases are more sympathetic to your
> patch, or better in the general sense that they'll perform better for
> the user? The first example query that you chose to demonstrate your
> patch had a leading, indexed column (column "v1") with much lower
> cardinality/n_distinct than the column that had to be sorted on
> (column "v2"). That was what my comments were based on.
>
> When this feature is used, there will often be a significantly lower
> cardinality in the leading, indexed column (as in your example).
> Otherwise, the user might well have been happy to just order on the
> indexed/leading column alone. That isn't definitely true, but it's
> often true.

I just meant that higher cardinality makes the partial sort itself cheaper. We may have misunderstood each other because the notions "high" and "low" are relative. When the cardinality is 1, partial sort seems to be useless. When the cardinality is low, it could still be cheaper to do a sequential scan + sort rather than an index scan + partial sort. When the cardinality is close to the number of rows, then, as you mentioned, the user probably doesn't need to sort by the rest of the columns; at least one exception is when the user needs to force uniqueness of the order. So, we need to target something in the middle of these two corner cases.
> >> I'm not sure if that's worth it to more or less
> >> duplicate heap_tuple_attr_equals() to save a "mere" n expensive
> >> comparisons, but it's something to think about (actually, there are
> >> probably less than even n comparisons in practice because there'll be
> >> a limit).
> >
> > Not correct. Smaller groups are not OK. Imagine that two representations
> > of the same skip column value exist. The index may return them in any
> > order, even alternating between them. In this case sorting on the other
> > column never takes place, while it should.
>
> I think I explained this badly - it wouldn't be okay to make the
> grouping smaller, but as you say we could tie-break with a proper
> B-Tree support function 1 comparison to make sure we really had
> reached the end of our grouping.
>
> FWIW I want all bttextcmp()/varstr_cmp() comparisons to try a memcmp()
> first, so the use of the equality operator with text in mind that you
> mention may soon not be useful at all. The evidence suggests that
> memcmp() is so much cheaper than special preparatory NUL-termination +
> strcoll() that we should always try it first when sorting text, even
> when we have no good reason to think it will work. The memcmp() is
> virtually free. And so, you see why it might be worth thinking about
> this when we already have reasonable confidence that many comparisons
> will indicate that datums are equal. Other datatypes will have
> expensive "real" comparators, not just text or numeric.

When strings are not equal, bttextcmp still needs to use strcoll, while texteq doesn't. So there would still be an advantage to using the equality operator over the comparison function: the equality operator doesn't have to order unequal values. However, the real cost of this advantage depends on the cardinality of the presorted columns as well.

-- With best regards, Alexander Korotkov.
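The memcmp()-before-strcoll() idea can be made concrete: byte-identical strings are equal with no locale work at all, and only unequal strings need the expensive collation-aware comparison to be ordered, with a bytewise tie-break in case the locale considers distinct strings equal. A language-neutral sketch (locale.strcoll stands in for the expensive C strcoll path):

```python
import locale

def text_cmp(a: bytes, b: bytes) -> int:
    """Try the cheap bytewise check first: byte-identical strings are
    equal with no NUL-termination or strcoll() work (the "virtually
    free" memcmp() fast path).  Only unequal strings fall through to
    the collation-aware comparison needed to order them."""
    if a == b:
        return 0
    r = locale.strcoll(a.decode(), b.decode())
    if r == 0:
        # Some locales consider distinct strings equal; tie-break
        # bytewise so the comparator stays a total order.
        return -1 if a < b else 1
    return -1 if r < 0 else 1
```

This also shows why the equality-operator advantage Alexander mentions shrinks: with the fast path in the comparator itself, equal values never reach strcoll either way.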
Re: [HACKERS] [bug fix or improvement?] Correctly place DLLs for ECPG apps in bin folder
This patch has been sitting in the commitfest with no activity since this email from July. What's the real status? Who's got the ball on this one? Please update the commitfest app accordingly. If nothing happens in the next few days, I'll mark this as Returned with Feedback.

On 07/09/2014 03:53 PM, MauMau wrote:
> From: "Alvaro Herrera"
>> I think the suggestion by Peter Eisentraut upthread was pretty reasonable
>> -- the Makefiles are already aware that they are building a shared
>> library, so scrape the info off them. The less hardcoded stuff in the
>> msvc framework, the better.
>
> I overlooked his mail. Do you mean something like this? (I don't know yet
> how to express this in Perl)
>
> for each Makefile under top_dir_in_source_tree or src/interfaces {
>     if (Makefile contains SO_MAJOR_VERSION) {
>         extract NAME value from Makefile
>         move lib/lib${NAME}.dll to bin/
>     }
> }
>
> But the source tree contains as many as 206 Makefiles. I'm worried that
> examining all of them will slow down installation. Should all these
> Makefiles be examined, or only the 16 Makefiles in src/interfaces/?
>
> Regards MauMau

- Heikki
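MauMau's pseudocode translates almost directly, and restricting the scan to a few root directories addresses the 206-Makefiles worry. A hypothetical sketch (in Python rather than the build system's Perl; the NAME/SO_MAJOR_VERSION patterns are assumptions about the Makefile format):

```python
import os
import re

def dlls_to_move(roots):
    """Scan Makefiles for shared libraries (marked by SO_MAJOR_VERSION),
    extract their NAME, and return the lib -> bin moves MauMau sketched.
    Passing only e.g. src/interfaces as a root keeps the scan cheap."""
    moves = []
    for root in roots:
        for dirpath, _, files in os.walk(root):
            if 'Makefile' not in files:
                continue
            with open(os.path.join(dirpath, 'Makefile')) as f:
                text = f.read()
            if 'SO_MAJOR_VERSION' not in text:
                continue
            m = re.search(r'^NAME\s*=\s*(\S+)', text, re.MULTILINE)
            if m:
                moves.append(('lib/lib%s.dll' % m.group(1), 'bin/'))
    return moves
```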
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On 09/14/2014 11:34 PM, Peter Geoghegan wrote:
> On Sun, Sep 14, 2014 at 7:37 AM, Heikki Linnakangas wrote:
>> Both values vary in the range 5.9 - 6.1 s, so it's fair to say that the
>> useless memcmp() is free with these parameters. Is this the worst case
>> scenario?
>
> Other than pushing the differences much, much later in the strings (which
> you surely thought of already), yes.

Please test the absolute worst case scenario you can think of. As I said earlier, if you can demonstrate that the slowdown of that is acceptable, we don't even need to discuss how likely or realistic the case is.

- Heikki
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 09/15/2014 02:42 AM, Arthur Silva wrote:
> Em 14/09/2014 12:21, "Andres Freund" escreveu:
>> On 2014-09-13 20:27:51 -0500, k...@rice.edu wrote:
>>> What we are looking for here is uniqueness, thus better error detection.
>>> Not avalanche effect, nor cryptographically secure, nor bit distribution.
>>> As far as I'm aware CRC32C is unbeaten collision-wise and time proven.
>
> I couldn't find tests with xxhash and crc32 on the same hardware, so I
> spent some time putting together a benchmark (see attachment; to run it,
> just start run.sh). I included a crc32 implementation using SSE4.2
> instructions (which work on pretty much any Intel processor built after
> 2008 and AMD built after 2012), a portable slice-by-8 software
> implementation, and xxhash, since it's the fastest software 32-bit hash I
> know of. Here are the results running the test program on my i5-4200M:
>
> crc sb8: 90444623 elapsed: 0.513688s speed: 1.485220 GB/s
> crc hw: 90444623 elapsed: 0.048327s speed: 15.786877 GB/s
> xxhash: 7f4a8d5 elapsed: 0.182100s speed: 4.189663 GB/s
>
> The hardware version is insanely fast and works on the majority of
> Postgres setups, and the fallback software implementation is 2.8x slower
> than the fastest 32-bit hash around. Hopefully it'll be useful in the
> discussion.
>
>> Note that all these numbers aren't fully relevant to the use case here.
>> For the WAL - which is what we're talking about and the only place where
>> CRC32 is used with high throughput - the individual parts of a record
>> are pretty darn small on average. So performance of checksumming small
>> amounts of data is more relevant. Mind, that's not likely to go for
>> CRC32, especially not slice-by-8. The cache footprint of the large
>> tables is likely going to be noticeable in non-micro benchmarks.
>
> Indeed, the small input sizes are something I was missing. Something more
> cache friendly would be better; it's just a matter of finding a better
> candidate.

It's worth noting that the extra tables that slicing-by-4 requires are *in addition to* the lookup table we already have.
And slicing-by-8 builds on the slicing-by-4 lookup tables. Our current algorithm uses a 1kB lookup table, slicing-by-4 a 4kB, and slicing-by-8 8kB. But the first 1kB of the slicing-by-4 lookup table is identical to the current 1kB lookup table, and the first 4kB of the slicing-by-8 are identical to the slicing-by-4 tables. It would be pretty straightforward to use the current algorithm when the WAL record is very small, and slicing-by-4 or slicing-by-8 for larger records (like FPWs), where the larger table is more likely to pay off. I have no idea where the break-even point is with the current algorithm vs. slicing-by-4 and a cold cache, but maybe we can get a handle on that with some micro-benchmarking. Although this is complicated by the fact that slicing-by-4 or -8 might well be a win even with very small records, if you generate a lot of them. - Heikki
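For reference, the 1kB table mentioned above corresponds to the classic one-byte-per-step (Sarwate) table-driven CRC-32: 256 four-byte entries. Slicing-by-4/-8 add further 256-entry tables so that 4 or 8 input bytes can be folded per iteration. A sketch of the base algorithm using the reflected polynomial 0xEDB88320 (the zlib/Ethernet CRC-32; shown for illustration rather than as PostgreSQL's exact WAL CRC variant):

```python
# Build the 256-entry (1 kB) lookup table for reflected CRC-32.
POLY = 0xEDB88320
TABLE = []
for n in range(256):
    c = n
    for _ in range(8):
        c = (c >> 1) ^ POLY if c & 1 else c >> 1
    TABLE.append(c)

def crc32(data: bytes, crc: int = 0) -> int:
    """One table lookup per input byte.  Slicing-by-N instead keeps N
    such tables and folds N bytes per iteration, trading table (cache)
    footprint for fewer iterations."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF
```

Because the extended slicing tables contain the base table as a prefix, a size-based switch between the algorithms, as suggested above, costs no extra memory for the small-record path.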
Re: [HACKERS] implement subject alternative names support for SSL connections
On Mon, Sep 15, 2014 at 10:23 AM, Alexey Klyukin wrote: > On Fri, Sep 12, 2014 at 4:20 PM, Heikki Linnakangas > wrote: > >>> Hmm. If that's what the browsers do, I think we should also err on the >>> side of caution here. Ignoring the CN is highly unlikely to cause anyone >>> a problem; a CA worth its salt should not issue a certificate with a CN >>> that's not also listed in the SAN section. But if you have such a >>> certificate anyway for some reason, it shouldn't be too difficult to get >>> a new certificate. Certificates expire every 1-3 years anyway, so there >>> must be a procedure to renew them anyway. >> >> >> Committed, with that change, ie. the CN is not checked if SANs are present. Actually, I disagree with the way the patch ignores the CN. Currently, it skips the CN unconditionally if the SubjectAltName section is present. But what RFC 6125 says is: "If a subjectAltName extension of type dNSName is present, that MUST be used as the identity. Otherwise, the (most specific) Common Name field in the Subject field of the certificate MUST be used." This means that we have to check that at least one dNSName resource is present before rejecting to examine the CN. Attached is a one-liner (excluding comments) that fixes this. Regards, Alexey diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c new file mode 100644 index 98d02b6..dd4fab8 *** a/src/interfaces/libpq/fe-secure-openssl.c --- b/src/interfaces/libpq/fe-secure-openssl.c *** verify_peer_name_matches_certificate(PGc *** 626,637 sk_GENERAL_NAME_free(peer_san); } /* !* If there is no subjectAltName extension, check the Common Name. * !* (Per RFC 2818 and RFC 6125, if the subjectAltName extension is present, * the CN must be ignored.) */ ! else { X509_NAME *subject_name; --- 626,637 sk_GENERAL_NAME_free(peer_san); } /* !* If there is no subjectAltName extension of type dNSName, check the Common Name. 
* !* (Per RFC 2818 and RFC 6125, if the subjectAltName extension of type dNSName is present, * the CN must be ignored.) */ ! if (names_examined == 0) { X509_NAME *subject_name; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench throttling latency limit
On 09/13/2014 11:25 AM, Fabien COELHO wrote:
> [about logging...] Here is an attempt at updating the log features,
> including the aggregate and sampling stuff, with skipped transactions
> under throttling. I moved the logging stuff into a function which is
> called when a transaction is skipped or finished.

Makes sense. I spent some time on this, and this is what I ended up with. Notable changes:

* I split this into two patches for easier review. The first patch contains the refactoring of the logging stuff, and the second patch contains the new functionality.
* I renamed many of the variables, to be more consistent with existing similar variables.
* Progress reporting was broken with !PTHREAD_FORK_EMULATION. Need to collect the number of skipped xacts from all threads.
* I renamed the long option to --latency-limit. --limit is too generic.

Please have a look. I have not looked at the docs changes yet.

One thing that needs some thinking and changing is the progress reporting. It currently looks like this:

progress: 1.0 s, 4863.0 tps, lat 3.491 ms stddev 2.487, lag 1.809 ms, 99 skipped
progress: 2.0 s, 5042.8 tps, lat 3.265 ms stddev 2.264, lag 1.584 ms, 16 skipped
progress: 3.0 s, 4926.1 tps, lat 2.731 ms stddev 2.371, lag 1.196 ms, 45 skipped
progress: 4.0 s, 4963.9 tps, lat 1.904 ms stddev 1.212, lag 0.429 ms, 0 skipped
progress: 5.0 s, 4971.2 tps, lat 2.592 ms stddev 1.722, lag 0.975 ms, 0 skipped

The absolute number of skipped transactions doesn't make much sense when all the other numbers are averages, and tps is a 1/s value. If you don't know the total number of transactions executed, the absolute number is meaningless. (Although you can calculate the absolute number of transactions executed by multiplying the TPS value with the interval.) I think a percentage would be better here.

Should we also print the number of late transactions here? I think that would be an even more important detail than the number of skipped transactions.
It might be better to print only the percentage of late transactions, including skipped ones. Or both, but it's difficult to cram everything on a single line. This needs some further thinking..

- Heikki

commit 6f1158e9a7e77710164d503b40d77fd9dcde7a08
Author: Heikki Linnakangas
Date:   Mon Sep 15 11:09:29 2014 +0300

    Refactor log-printing in pgbench to a separate function.

    The doCustom function was getting really unwieldy. Upcoming patches
    will add even more code there.

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 087e0d3..0daf92f 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -347,6 +347,9 @@ static char *select_only = {
 static void setalarm(int seconds);
 static void *threadRun(void *arg);
 
+static void doLog(TState *thread, CState *st, FILE *logfile, instr_time *now,
+                  AggVals *agg);
+
 static void
 usage(void)
 {
@@ -1016,6 +1019,16 @@ doCustom(TState *thread, CState *st, instr_time *conn_time, FILE *logfile, AggVa
     PGresult   *res;
     Command   **commands;
     bool        trans_needs_throttle = false;
+    instr_time  now;
+
+    /*
+     * gettimeofday() isn't free, so we get the current timestamp lazily the
+     * first time it's needed, we reuse the same value throughout this
+     * function after that. This also ensures that e.g. the calculated
+     * latency reported in the log file and in the totals are the same.
+     * Zero means "not set yet".
+     */
+    INSTR_TIME_SET_ZERO(now);
 
 top:
     commands = sql_files[st->use_file];
@@ -1049,10 +1062,10 @@ top:
 
     if (st->sleeping)
     {                           /* are we sleeping? */
-        instr_time  now;
         int64       now_us;
 
-        INSTR_TIME_SET_CURRENT(now);
+        if (INSTR_TIME_IS_ZERO(now))
+            INSTR_TIME_SET_CURRENT(now);
         now_us = INSTR_TIME_GET_MICROSEC(now);
         if (st->txn_scheduled <= now_us)
         {
@@ -1074,11 +1087,6 @@ top:
 
     if (st->listen)
     {                           /* are we receiver? */
-        instr_time  now;
-        bool        now_valid = false;
-
-        INSTR_TIME_SET_ZERO(now);   /* initialize to keep compiler quiet */
-
         if (commands[st->state]->type == SQL_COMMAND)
         {
             if (debug)
@@ -1100,181 +1108,40 @@ top:
             {
                 int         cnum = commands[st->state]->command_num;
 
-                if (!now_valid)
-                {
+                if (INSTR_TIME_IS_ZERO(now))
                     INSTR_TIME_SET_CURRENT(now);
-                    now_valid = true;
-                }
                 INSTR_TIME_ACCUM_DIFF(thread->exec_elapsed[cnum],
                                       now, st->stmt_begin);
                 thread->exec_count[cnum]++;
             }
 
-            /* transaction finished: record latency under progress or throttling */
-            if ((progress || throttle_delay) && commands[st->state + 1] == NULL)
+            /* transaction finished: calculate latency and log the transaction */
+            if (commands[st->state + 1] == NULL)
             {
-                int64       latency;
-
-                if (!now_valid)
+                /* only calculate latency if an option is used that needs it. */
+                if (progress || throttle_delay)
                 {
-                    INSTR_TIME_SET_CURRENT(now);
-                    now_valid = true;
-                }
-
-                latency = INSTR_TIME_GET_MICROSEC(now) - st->txn_scheduled;
+                    int64       latenc
Re: [HACKERS] implement subject alternative names support for SSL connections
On Fri, Sep 12, 2014 at 4:20 PM, Heikki Linnakangas wrote:
>> Hmm. If that's what the browsers do, I think we should also err on the
>> side of caution here. Ignoring the CN is highly unlikely to cause anyone
>> a problem; a CA worth its salt should not issue a certificate with a CN
>> that's not also listed in the SAN section. But if you have such a
>> certificate anyway for some reason, it shouldn't be too difficult to get
>> a new certificate. Certificates expire every 1-3 years anyway, so there
>> must be a procedure to renew them anyway.
>
> Committed, with that change, ie. the CN is not checked if SANs are present.
>
> Thanks for bearing through all these iterations!

Great news! Thank you very much for devoting your time and energy to the review and providing such useful feedback!

On the CN thing, I don't have particularly strong arguments for either of the possible behaviors, so sticking to the RFC makes sense here.

Sincerely,
--
Alexey Klyukin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] orangutan seizes up during isolation-check
On 09/15/2014 07:51 AM, Noah Misch wrote:
> libintl replaces setlocale(). Its setlocale(LC_x, "") uses OS-specific
> APIs to determine the default locale when $LANG and similar environment
> variables are empty, as they are during "make check NO_LOCALE=1". On OS
> X, it calls[1] CFLocaleCopyCurrent(), which in turn spins up a thread.
> See the end of this message for the postmaster thread stacks active upon
> hitting a breakpoint set at _dispatch_mgr_thread.

Ugh. I'd call that a bug in libintl; setlocale() has no business making the process multi-threaded.

Do we have the same problem in backends? At a quick glance, aside from the postmaster we only use PG_SETMASK(&BlockSig) in signal handlers, to prevent another signal handler from running concurrently.

> I see two options for fixing this in pg_perm_setlocale(LC_x, ""):
>
> 1. Fork, call setlocale(LC_x, "") in the child, pass back the effective
> locale name through a pipe, and pass that name to setlocale() in the
> original process. The short-lived child will get the extra threads, and
> the postmaster will remain clean.
>
> 2. On OS X, check for relevant environment variables. Finding none, set
> LC_x=C before calling setlocale(LC_x, ""). A variation is to raise
> ereport(FATAL) if sufficient environment variables aren't in place.
> Either way ensures the libintl setlocale() will never call
> CFLocaleCopyCurrent(). This is simpler than (1), but it entails a
> behavior change: "LANG= initdb" will use LANG=C or fail rather than use
> the OS X user account locale. I'm skeptical of the value of looking up
> locale information using other OS X facilities when the usual environment
> variables are inconclusive, but I see no clear cause to reverse that
> decision now.
>
> I lean toward (1).

Both of those are horrible hacks. And who's to say that calling setlocale(LC_x, "foo") won't also call some function that makes the process multi-threaded? If not in any current OS X release, it might still happen in a future one.
One idea would be to use an extra pthread mutex or similar, in addition to PG_SETMASK(): whenever you do PG_SETMASK(&BlockSig), also grab the mutex, and release it when you do PG_SETMASK(&UnBlockSig).

It would be nice to stop doing non-trivial things in the signal handler in the first place. It's pretty scary, even though it works when the process is single-threaded. I believe the reason it's currently implemented like that are the same problems that the latch code solves with the self-pipe trick: select() is not interrupted by a signal on all platforms, and even if it were, you would need pselect(), which is not available (or does not work correctly even if it exists) on all platforms. I think we could use a latch in the postmaster too.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers