date:20160111

Alvaro Herrera  writes:
>> +   
>> show_substring_limit()show_substring_limit
>> +   
>> set_substring_limit(real)set_substring_limit

> I don't quite understand why aren't we using a custom GUC variable here.

Presumably this is following the existing set_limit() precedent
in pg_trgm.  But I tend to agree that that's a crummy precedent
and we should not extend it.

Let's invent a custom GUC for the regular limit, mark show_limit()
and set_limit as deprecated, and then make just a custom GUC for
this other limit.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Making plpython 2 and 3 coexist a bit better

I wrote:
> In further experimentation, I found out that even with this patch in
> place, the plpython transform extensions still present a dump/reload
> hazard:
> ...
> But I wonder why we couldn't make it do
> LOAD 'plpython3';

I've verified that the attached additional patch makes everything behave
nicely in this corner of the world.  AFAICS this adds no assumptions
that aren't already built into pg_pltemplate.

Last call for objections ...

regards, tom lane

diff --git a/contrib/hstore_plperl/hstore_plperl--1.0.sql b/contrib/hstore_plperl/hstore_plperl--1.0.sql
index ea0ad76..a4fd7c2 100644
*** a/contrib/hstore_plperl/hstore_plperl--1.0.sql
--- b/contrib/hstore_plperl/hstore_plperl--1.0.sql
***
*** 1,5 
  -- make sure the prerequisite libraries are loaded
! DO '' LANGUAGE plperl;
  SELECT NULL::hstore;
  
  
--- 1,5 
  -- make sure the prerequisite libraries are loaded
! LOAD 'plperl';
  SELECT NULL::hstore;
  
  
diff --git a/contrib/hstore_plperl/hstore_plperlu--1.0.sql b/contrib/hstore_plperl/hstore_plperlu--1.0.sql
index 46ad35c..2c2e3e3 100644
*** a/contrib/hstore_plperl/hstore_plperlu--1.0.sql
--- b/contrib/hstore_plperl/hstore_plperlu--1.0.sql
***
*** 1,5 
  -- make sure the prerequisite libraries are loaded
! DO '' LANGUAGE plperlu;
  SELECT NULL::hstore;
  
  
--- 1,5 
  -- make sure the prerequisite libraries are loaded
! LOAD 'plperl';
  SELECT NULL::hstore;
  
  
diff --git a/contrib/hstore_plpython/hstore_plpython2u--1.0.sql b/contrib/hstore_plpython/hstore_plpython2u--1.0.sql
index c998de5..a793dc9 100644
*** a/contrib/hstore_plpython/hstore_plpython2u--1.0.sql
--- b/contrib/hstore_plpython/hstore_plpython2u--1.0.sql
***
*** 1,5 
  -- make sure the prerequisite libraries are loaded
! DO '1' LANGUAGE plpython2u;
  SELECT NULL::hstore;
  
  
--- 1,5 
  -- make sure the prerequisite libraries are loaded
! LOAD 'plpython2';
  SELECT NULL::hstore;
  
  
diff --git a/contrib/hstore_plpython/hstore_plpython3u--1.0.sql b/contrib/hstore_plpython/hstore_plpython3u--1.0.sql
index 61d0e47..a85c85d 100644
*** a/contrib/hstore_plpython/hstore_plpython3u--1.0.sql
--- b/contrib/hstore_plpython/hstore_plpython3u--1.0.sql
***
*** 1,5 
  -- make sure the prerequisite libraries are loaded
! DO '1' LANGUAGE plpython3u;
  SELECT NULL::hstore;
  
  
--- 1,5 
  -- make sure the prerequisite libraries are loaded
! LOAD 'plpython3';
  SELECT NULL::hstore;
  
  
diff --git a/contrib/hstore_plpython/hstore_plpythonu--1.0.sql b/contrib/hstore_plpython/hstore_plpythonu--1.0.sql
index 6acb97a..e2e4721 100644
*** a/contrib/hstore_plpython/hstore_plpythonu--1.0.sql
--- b/contrib/hstore_plpython/hstore_plpythonu--1.0.sql
***
*** 1,5 
  -- make sure the prerequisite libraries are loaded
! DO '1' LANGUAGE plpythonu;
  SELECT NULL::hstore;
  
  
--- 1,5 
  -- make sure the prerequisite libraries are loaded
! LOAD 'plpython2';  -- change to plpython3 if that ever becomes the default
  SELECT NULL::hstore;
  
  
diff --git a/contrib/ltree_plpython/ltree_plpython2u--1.0.sql b/contrib/ltree_plpython/ltree_plpython2u--1.0.sql
index 29a12d4..f040bd3 100644
*** a/contrib/ltree_plpython/ltree_plpython2u--1.0.sql
--- b/contrib/ltree_plpython/ltree_plpython2u--1.0.sql
***
*** 1,5 
  -- make sure the prerequisite libraries are loaded
! DO '1' LANGUAGE plpython2u;
  SELECT NULL::ltree;
  
  
--- 1,5 
  -- make sure the prerequisite libraries are loaded
! LOAD 'plpython2';
  SELECT NULL::ltree;
  
  
diff --git a/contrib/ltree_plpython/ltree_plpython3u--1.0.sql b/contrib/ltree_plpython/ltree_plpython3u--1.0.sql
index 1300a78..7afe51f 100644
*** a/contrib/ltree_plpython/ltree_plpython3u--1.0.sql
--- b/contrib/ltree_plpython/ltree_plpython3u--1.0.sql
***
*** 1,5 
  -- make sure the prerequisite libraries are loaded
! DO '1' LANGUAGE plpython3u;
  SELECT NULL::ltree;
  
  
--- 1,5 
  -- make sure the prerequisite libraries are loaded
! LOAD 'plpython3';
  SELECT NULL::ltree;
  
  
diff --git a/contrib/ltree_plpython/ltree_plpythonu--1.0.sql b/contrib/ltree_plpython/ltree_plpythonu--1.0.sql
index 1d1af28..50f35dd 100644
*** a/contrib/ltree_plpython/ltree_plpythonu--1.0.sql
--- b/contrib/ltree_plpython/ltree_plpythonu--1.0.sql
***
*** 1,5 
  -- make sure the prerequisite libraries are loaded
! DO '1' LANGUAGE plpythonu;
  SELECT NULL::ltree;
  
  
--- 1,5 
  -- make sure the prerequisite libraries are loaded
! LOAD 'plpython2';  -- change to plpython3 if that ever becomes the default
  SELECT NULL::ltree;
  
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PROPOSAL] Client Log Output Filtering

David Steele wrote:
> Currently log messages generated by pgaudit can be made visible to the
> client simply by altering client_min_messages.  While this has not been a
> showstopper for anyone it's ideal, either.
> 
> The client authentication code sets the global variable ClientAuthInProgress
> which causes ereport() to filter client output < ERROR while forcing client
> output >= ERROR.  This functionality would also work well for pgaudit.
> 
> The patch creates a new counter to separate the log filtering from the
> authentication functionality.  This makes it possible to get the same
> filtering in other parts of the code (or extensions) without abusing the
> ClientAuthInProgress variable or using the log hook.

Hmm, okay, but this would need a large blinking comment explaining that
the calling code have better set a PG_TRY block when incrementing, so
that on errors it resets to zero again.  Or maybe some handling in
AbortOutOfAnyTransaction/AbortCurrentTransaction.  or both.

> I also considered a new function for ereport (like errhidecontext()) but
> this mechanism would not have worked for authentication and so would not
> have been used anywhere in core.

But then, you already know that authentication phase is not exactly the
same use case as security auditing, so they don't have to work the same
way necessarily.  I think this needs more discussion.  If the audit code
calls something that throws an error (other than an audit message -- say
"out of memory"), then it would also be hidden, but you may not want it
to be hidden.  I think maybe it's better to have each individual error
message be marked as "don't show to client" rather than a global var.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Speedup twophase transactions

Hi,

On January 11, 2016 10:46:01 PM GMT+01:00, Simon Riggs  
wrote:
>On 11 January 2016 at 20:10, Andres Freund  wrote:
>
>> On January 11, 2016 8:57:58 PM GMT+01:00, Simon Riggs
>>  wrote:
>> >On 11 January 2016 at 18:43, Simon Riggs 
>wrote:
>>
>> >It's clear there are various additional tuning opportunities, but
>the
>> >objective of the current patch to improve performance is very, very
>> >clearly
>> >met, so I'm aiming to commit *this* patch soon.
>>
>> Again, the WAL read routine used doesn't deal with timeline changes.
>
>
>Not relevant: The direct WAL read routine is never used during replay,
>so
>your comment is not relevant since we don't change timelines on the
>master.

Hm, OK.   But, isn't this actually a bad sign? Currently recovery of 2pc often 
already is a bigger bottleneck than the workload on the master, because replay 
has to execute the fsyncs implied by statefile  re-creation serially, whereas 
on the master they'll usually be executed in parallel. So, if I understand 
correctly this patch would widen that gap?

Anyway, as evidenced here, review on a phone isn't efficient, and that's all i 
have access to right now. Please wait till at least tomorrow evening, so I can 
have a meaningful look.

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension

I gave a quick look through the patch and noticed a few minor things
while trying to understand it.

I think the test corpus isn't particularly interesting for how big it
is.  I'd rather have (a) a small corpus (say 100 words) with which to do
detailed regression testing, and (b) some larger document for more
extensive testing.  I'm not sure (b) is actually necessary.

Overall I think the new functions could stand a lot more commentary.


> --- 104,125 
>   #define GETARR(x)   ( (trgm*)( (char*)x+TRGMHDRSIZE ) )
>   #define ARRNELEM(x) ( ( VARSIZE(x) - TRGMHDRSIZE )/sizeof(trgm) )
>   
> + #ifdef DIVUNION
> + #define CALCSML(count, len1, len2) ((float4) (count)) / ((float4) ((len1) + 
> (len2) - (count)))
> + #else
> + #define CALCSML(count, len1, len2) ((float4) (count)) / ((float4) (((len1) 
> > (len2)) ? (len1) : (len2)))
> + #endif

This macro has a multiple evaluation gotcha.  Maybe we don't really care
because of the limited applicability but I'd put a comment on top.

   
>   /* strange bug at freebsd 5.2.1 and gcc 3.3.3 */
> ! res = (*(int *)  == *(int *) _limit 
> || tmpsml > trgm_limit) ? true : false;
>   }
>   else if (ISALLTRUE(key))
>   {   /* non-leaf 
> contains signature */
> --- 288,304 
>   switch (strategy)
>   {
>   case SimilarityStrategyNumber:
> ! case SubstringSimilarityStrategyNumber:
> ! /* Similarity search is exact. Substring similarity 
> search is inexact */
> ! *recheck = (strategy == 
> SubstringSimilarityStrategyNumber) ? true : false;
> ! nlimit = (strategy == SimilarityStrategyNumber) ? 
> trgm_limit : trgm_substring_limit;
>   
>   if (GIST_LEAF(entry))
>   {   /* all leafs 
> contains orig trgm */
> ! float4  tmpsml = cnt_sml(qtrg, key, 
> *recheck);
>   
>   /* strange bug at freebsd 5.2.1 and gcc 3.3.3 */
> ! res = (*(int *)  == *(int *)  || 
> tmpsml > nlimit) ? true : false;

What's the compiler bug about?  This code and comment were introduced in
cbfa4092bb (May 2004) without any explanation.  Do we still need to keep
it, if  is now a local variable instead of a global?  FWIW the
oldest GCC in the buildfarm is 3.4.2/3.4.3 (except for Gaur which uses
GCC 2.95 under HPPA)

BTW you don't need a ternary op just to say "? true : false" -- just
make it
res = foo == bar;
which immediately assigns the correct boolean value.  (This pattern
occurs in many places both in the original and in your submitted code.
I see no reason to perpetuate it.)

   
> + Datum
> + set_substring_limit(PG_FUNCTION_ARGS)
> + {
> + float4  nlimit = PG_GETARG_FLOAT4(0);
> + 
> + if (nlimit < 0 || nlimit > 1.0)
> + elog(ERROR, "wrong limit, should be between 0 and 1");

Should be an ereport().


> ! static void
> ! protect_out_of_mem(int slen)
> ! {
> ! /*
> !  * Guard against possible overflow in the palloc requests below.  (We
> !  * don't worry about the additive constants, since palloc can detect
> !  * requests that are a little above MaxAllocSize --- we just need to
> !  * prevent integer overflow in the multiplications.)
> !  */
> ! if ((Size) (slen / 2) >= (MaxAllocSize / (sizeof(trgm) * 3)) ||
> ! (Size) slen >= (MaxAllocSize / 
> pg_database_encoding_max_length()))
> ! ereport(ERROR,
> ! (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
> !  errmsg("out of memory")));
> ! }

I think the comment should now be above the function, not inside it.

> ***
> *** 254,259  generate_trgm(char *str, int slen)
> --- 292,534 
>   return trg;
>   }
>   
> + /* Trigram with position */
> + typedef struct
> + {
> + trgmt;
> + int i;
> + } ptrgm;

This struct definition could stand a better name, and better member
names.  Also, our policy (not always enforced) is to have these near the
top of the file rather than in the middle.  If the struct is used in
many places then IMO it's better that it go at the top; if it's used
only in one function I'd say it's okay to have it just after that
function.

> + /*
> +  * Make array of positional trigrams from two trigram arrays. The first 
> array
> +  * is required substing which positions don't matter and replaced with -1.
> +  * The second array is haystack where we search and have to store its
> +  * positions.
> +  */
> + static ptrgm *
> + make_positional_trgm(trgm *trg1, int len1, trgm *trg2, int len2)
> + {
> + ptrgm *result;
> + int i, len = len1 + len2;
> + 
> + result = (ptrgm *) palloc(sizeof(ptrgm) * len);
> + 
> + for (i = 0; i <

Re: [HACKERS] Spelling corrections

2016-01-11 Thread Peter Geoghegan

On Sun, Jan 10, 2016 at 2:17 AM, David Rowley
 wrote:
> Snap!
> http://www.postgresql.org/message-id/CAKJS1f-AGgQaurTwhY=wkJjspgDcmDUE8Yx03XTXCDz=kae...@mail.gmail.com

Great minds think alike.  :-)


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch: fix lock contention for HASHHDR.mutex

Andres, Robert, are you still reviewing this patch?

Aleksander Alekseev wrote:
> Here is a clean version of the patch. Step 1 is an optimization. Step 2
> refactors this:
> 
>  HTAB *
>  ShmemInitHash(const char *name, /* table string name for shmem index */
> - long init_size,   /* initial table size */
> + long init_size,   /* initial table size XXX ignored,
>  refactor! */

So you were saying some posts upthread that the new hash tables would
"borrow" AN elements from the freelist then put AN elements back when
too many got unused.  Is that idea embodied in this patch?  I'm nervous
about continuing a line of development that does away with the
"init_size" concept completely, but if this patch is keep the "borrow"
concept then maybe it doesn't matter.

> -/* Number of partitions of the shared buffer mapping hashtable */
> -#define NUM_BUFFER_PARTITIONS  128
> -
>  /* Number of partitions the shared lock tables are divided into */
> -#define LOG2_NUM_LOCK_PARTITIONS  4
> +#define LOG2_NUM_LOCK_PARTITIONS  7
>  #define NUM_LOCK_PARTITIONS  (1 << LOG2_NUM_LOCK_PARTITIONS)
>  
> +/* Number of partitions of the shared buffer mapping hashtable */
> +#define NUM_BUFFER_PARTITIONS NUM_LOCK_PARTITIONS
> +
>  /* Number of partitions the shared predicate lock tables are divided into */
>  #define LOG2_NUM_PREDICATELOCK_PARTITIONS  4
>  #define NUM_PREDICATELOCK_PARTITIONS  (1 << 
> LOG2_NUM_PREDICATELOCK_PARTITIONS)

I find this odd.  Why is it a good idea to define the bufMapping
partitions like this?  What's the explanation for the performance
numbers you saw?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] ExecGather() + nworkers

2016-01-11 Thread Peter Geoghegan

On Sun, Jan 10, 2016 at 5:45 PM, Robert Haas  wrote:
> Well, in general, the parallel sort code doesn't really get to pick
> whether or not a BackgroundWorkerSlot gets used or not.  Whoever
> created the parallel context decides how many workers to request, and
> then the context got as many of those as it could.  It then did
> arbitrary computation, which at some point in the middle involves one
> or more parallel sorts.  You can't just have one of those workers up
> and exit in the middle.  Now, in the specific case of parallel index
> build, you probably can do that, if you want to.  But to be honest,
> I'd be inclined not to include that in the first version.  If you get
> fewer workers than you asked for, just use the number you got.  Let's
> see how that actually works out before we decide that we need a lot
> more mechanism here.  You may find that it's surprisingly effective to
> do it this way.

I am inclined to just accept for the time being (until I post the
patch) that one worker and one leader may sometimes be all that we
get. I will put off dealing with the problem until I show code, in
other words.

>> Now, you might wonder why it is that the leader cannot also sort runs,
>> just as a worker would. It's possible, but it isn't exactly
>> straightforward.

> I am surprised that this is not straightforward.  I don't see why it
> shouldn't be, and it worries me that you think it isn't.

It isn't entirely straightforward because it requires special case
handling. For example, I must teach the leader not to try and wait on
itself to finish sorting runs. It might otherwise attempt that ahead
of its final on-the-fly merge.

>> More importantly, I have other, entirely general concerns. Other major
>> RDBMSs have settings that are very similar to max_parallel_degree,
>> with a setting of 1 effectively disabling all parallelism. Both Oracle
>> and SQL Server have settings that they both call the "maximum degree
>> or parallelism". I think it's a bit odd that with Postgres,
>> max_parallel_degree = 1 can still use parallelism at all. I have to
>> wonder: are we conflating controlling the resources used by parallel
>> operations with how shared memory is doled out?
>
> We could redefined things so that max_parallel_degree = N means use N
> - 1 workers, with a minimum value of 1 rather than 0, if there's a
> consensus that that's better.  Personally, I prefer it the way we've
> got it: it's real darned clear in my mind that max_parallel_degree=0
> means "not parallel".  But I won't cry into my beer if a consensus
> emerges that the other way would be better.

The fact that we don't do that isn't quite the issue, though. It may
or may not make sense to count the leader as an additional worker when
the leader has very little work to do. In good cases for parallel
sequential scan, the leader has very little leader-specific work to
do, because most of the time is spent on workers (including the
leader) determining that tuples must not be returned to the leader.
When that is less true, maybe the leader could reasonably count as a
fixed cost for a parallel operation. Hard to say.

I'm sorry that that's not really actionable, but I'm still working
this stuff out.

>> I could actually "give back" my parallel worker slots early if I
>> really wanted to (that would be messy, but the then-acquiesced workers
>> do nothing for the duration of the merge beyond conceptually owning
>> the shared tape temp files). I don't think releasing the slots early
>> makes sense, because I tend to think that hanging on to the workers
>> helps the DBA in managing the server's resources. The still-serial
>> merge phase is likely to become a big bottleneck with parallel sort.
>
> Like I say, the sort code better not know anything about this
> directly, or it's going to break when embedded in a query.

tuplesort.c knows very little. nbtsort.c manages workers, and their
worker tuplesort states, as well as the leader and its tuplesort
state. So tuplesort.c knows a little bit about how the leader
tuplesort state may need to reconstruct worker state in order to do
its final on-the-fly merge. It knows nothing else, though, and
provides generic hooks for assigning worker numbers to worker
processes, or logical run numbers (this keeps trace_sort output
straight, plus a few other things).

Parallel workers are all managed in nbtsort.c, which seems
appropriate. Note that I have introduced a way in which a single
tuplesort state doesn't perfectly encapsulate a single sort operation,
though.

> This seems dead wrong.  A max_parallel_degree of 8 means you have a
> leader and 8 workers.  Where are the other 7 processes coming from?
> What you should have is 8 processes each of which is participating in
> both the parallel seq scan and the parallel sort, not 8 processes
> scanning and 8 separate processes sorting.

I simply conflated max_parallel_degree and max_worker_processes for a
moment. The point is that

Re: [HACKERS] Speedup twophase transactions

On 11 January 2016 at 20:10, Andres Freund  wrote:

> On January 11, 2016 8:57:58 PM GMT+01:00, Simon Riggs
>  wrote:
> >On 11 January 2016 at 18:43, Simon Riggs  wrote:
>
> >It's clear there are various additional tuning opportunities, but the
> >objective of the current patch to improve performance is very, very
> >clearly
> >met, so I'm aiming to commit *this* patch soon.
>
> Again, the WAL read routine used doesn't deal with timeline changes.

Not relevant: The direct WAL read routine is never used during replay, so
your comment is not relevant since we don't change timelines on the master.

So no,  it's bit ready to be committed.
>

I will update the comment on that function to explain its usage and its
limitations for future usage, to make that clearer.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-11 Thread Amit Langote

On 2016/01/08 21:20, Rahila Syed wrote:
>> I suspect you need to create a new CF entry for this patch in CF 2016-01.
> 
> Unless I am missing something, there seems to be no entry for this patch
> into CF 2016-01 page: https://commitfest.postgresql.org/8/.
> Regrettably, we have exceeded the deadline to add the patch into this
> commitfest. Is there still some way to add it to the commitfest 2016-01? As
> this feature has received lot of feedback in previous commitfest , adding
> it to this commitfest will surely help in progressing it in order to make
> it ready for PostgreSQL 9.6.

I see that the patch has been added to the CF.

I'm slightly concerned that the latest patch doesn't incorporate any
revisions to the original pgstat interface per Robert's comments in [1].

Thanks,
Amit

[1]
http://www.postgresql.org/message-id/ca+tgmoz5q4n4t0c0_-xktencewoabfdktoppt8nujjjv5oh...@mail.gmail.com




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Avoid pin scan for replay of XLOG_BTREE_VACUUM

On 11 January 2016 at 01:46, Robert Haas  wrote:


> /me crawls into a hole.
>
> Thanks.


Far from it, thanks very much for thinking about it.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] PATCH: add pg_current_xlog_flush_location function

On Tue, Jan 12, 2016 at 6:35 AM, Simon Riggs wrote:
> Comments in xlog.c say
>
> "In addition to the shared variable, each backend has a private copy of
> LogwrtResult, which is updated when convenient."
> It is therefore valid to update the value of both Write and Flush positions
> at the same time, any time either is required.

Yes I saw this one yesterday when looking at this code. My comment
regarded the potential interactions between this field with XLogFlush,
but now I see that my concerns are not valid, updating more frequently
LogwrtResult may save some cycles though.

> My suggested commit pattern for this is...
> 1. Update existing function to maintain LogwrtResult more eagerly (separate
> patch)

The only place I see now that would benefit a bit from that is
UpdateMinRecoveryPoint when info_lck is taken, which can be called by
XLogFlush. Though I would expect this to have minimal impact.

> 2. Have the patch use the existing function name (main patch)

Yeah, we had better just use GetFlushRecPtr and be done with it. It
seems that there is little point to add a new function, and it is not
going to be called that much so its effects in updating LogwrtResult
would be minimized for a single backend.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] extend pgbench expressions with functions

On Thu, Jan 7, 2016 at 7:00 PM, Fabien COELHO wrote:
> Obviously all this is possible in the grand scale of things, but this is
> also significant work for a tiny benefit, if any. I rather see "debug" as a
> simple tool for debugging a script with "pgbench -t 1" (i.e. one or few
> transactions), so that the trace length is not an issue.
> Once the script is satisfactory, remove the debug and run it for real.
> Note that it does not make much sense to run with "debug" calls for many
> transactions because of the performance impact.

Yep. That's exactly what I did during my tests.

>> And more comments for example in pgbench.h would be welcome to explain
>> more the code.
>
> Ok. I'm generally in favor of comments.

OK, I added some where I thought it was adapted.

>> I am fine to take a final look at that before handling it to a committer
>> though. Does that sound fine as a plan, Fabien?
>
> I understand that you propose to add these comments?

Yep. I did as attached.

> I'm fine with that!:-)
> If I misuderstood, tell me:-)

I think you don't, but I found other issues:
1) When precising a negative parameter in the gaussian and exponential
functions an assertion is triggered:
Assertion failed: (parameter > 0.0), function getExponentialRand, file
pgbench.c, line 494.
Abort trap: 6 (core dumped)
An explicit error is better.
2) When precising a value between 0 and 2, pgbench will loop
infinitely. For example:
\set cid debug(random_gaussian(-100, 100, 0))
In both cases we just lack sanity checks with PGBENCH_RANDOM,
PGBENCH_RANDOM_EXPONENTIAL and PGBENCH_RANDOM_GAUSSIAN. I think that
those checks would be better if moved into getExponentialRand & co
with the assertions removed. I would personally have those functions
return a boolean status and have the result in a pointer as a function
argument.

Another thing itching me is that ENODE_OPERATOR is actually useless
now that there is a proper function layer. Removing it and replacing
it with a set of functions would be more adapted and make the code
simpler, at the cost of more functions and changing the parser so as
an operator found is transformed into a function expression.

I am attaching a patch where I tweaked a bit the docs and added some
comments for clarity. Patch is switched to "waiting on author" for the
time being.
Regards,
-- 
Michael
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 541d17b..9ddaabf 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -771,17 +771,21 @@ pgbench  options  dbname
   Sets variable varname to an integer value calculated
   from expression.
   The expression may contain integer constants such as 5432,
-  references to variables :variablename,
+  double constants such as 3.14156,
+  references to integer variables :variablename,
   and expressions composed of unary (-) or binary operators
-  (+, -, *, /, %)
-  with their usual associativity, and parentheses.
+  (+, -, *, /,
+  %) with their usual associativity, function calls and
+  parentheses.
+   shows the available
+  functions.
  
 
  
   Examples:
 
 \set ntellers 10 * :scale
-\set aid (1021 * :aid) % (10 * :scale) + 1
+\set aid (1021 * random(1, 10 * :scale)) % (10 * :scale) + 1
 
 

@@ -801,66 +805,35 @@ pgbench  options  dbname
  
 
  
-  By default, or when uniform is specified, all values in the
-  range are drawn with equal probability.  Specifying gaussian
-  or  exponential options modifies this behavior; each
-  requires a mandatory parameter which determines the precise shape of the
-  distribution.
- 
+  
+   
+
+ \setrandom n 1 10 or \setrandom n 1 10 uniform
+ is equivalent to \set n random(1, 10) and uses a uniform
+ distribution.
+
+   
 
- 
-  For a Gaussian distribution, the interval is mapped onto a standard
-  normal distribution (the classical bell-shaped Gaussian curve) truncated
-  at -parameter on the left and +parameter
-  on the right.
-  Values in the middle of the interval are more likely to be drawn.
-  To be precise, if PHI(x) is the cumulative distribution
-  function of the standard normal distribution, with mean mu
-  defined as (max + min) / 2.0, with
-
- f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
-(2.0 * PHI(parameter) - 1.0)
-
-  then value i between min and
-  max inclusive is drawn with probability:
-  f(i + 0.5) - f(i - 0.5).
-  Intuitively, the larger parameter, the more
-  frequently values close to the middle of the interval are drawn, and the
-  less frequently values close to the min and
-  max bounds. About 67% of values are drawn from the
-  middle 1.0 / parameter, that is a relative
-  0.5 / parameter around the mean, and 95% in the middle
-  2.0 / parameter, that is a relative
-  1.0 /

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-11 Thread Vinayak Pokale

On Jan 12, 2016 11:22 AM, "Amit Langote" 
wrote:
>
> On 2016/01/12 10:30, Amit Langote wrote:
> > I'm slightly concerned that the latest patch doesn't incorporate any
> > revisions to the original pgstat interface per Robert's comments in [1].
>
> I meant to say "originally proposed pgstat interface on this thread".

Yes.
Robert's comments related to pgstat interface needs to be address.
I will update it.

Regards,
Vinayak Pokale

Re: [HACKERS] Speedup twophase transactions

On 11 January 2016 at 22:24, Andres Freund  wrote:


> Please wait till at least tomorrow evening, so I can have a meaningful
> look.
>

No problem, make sure to look at 2pc_optimize.v4.patch

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Proposal: BSD Authentication support

2016-01-11 Thread Marisa Emerson

I've attached the latest version of this patch. I've fixed up an issue 
with the configuration scripts that I missed.


On 08/01/16 12:40, Marisa Emerson wrote:

There's a port for PAM, but we would prefer to use BSD Auth as its quite
a lot cleaner and is standard on OpenBSD.

I've attached an updated patch that includes documentation. It has been
tested against OpenBSD 5.8. I'll add this thread to the commitfest.

On 07/01/16 23:26, Greg Stark wrote:

This sounds like a sensible thing to me. I'm actually surprised, it
sounds like something we would have already seen. Do some people just
use PAM on OpenBSD? Are both supported?

You should add the patch to https://commitfest.postgresql.org to
ensure it doesn't slip through the cracks. It's too late for January
though there's nothing stopping people from commenting on or even
committing patches outside the commitfest.

diff --git a/configure b/configure
index 5772d0e..84c1c3e 100755
--- a/configure
+++ b/configure
@@ -826,6 +826,7 @@ with_python
 with_gssapi
 with_krb_srvnam
 with_pam
+with_bsd_auth
 with_ldap
 with_bonjour
 with_openssl
@@ -1514,6 +1515,7 @@ Optional Packages:
   --with-krb-srvnam=NAME  default service principal name in Kerberos (GSSAPI)
   [postgres]
   --with-pam  build with PAM support
+  --with-bsd-auth build with BSD Authentication support
   --with-ldap build with LDAP support
   --with-bonjour  build with Bonjour support
   --with-openssl  build with OpenSSL support
@@ -5557,6 +5559,41 @@ $as_echo "$with_pam" >&6; }
 
 
 #
+# BSD AUTH
+#
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with BSD support" >&5
+$as_echo_n "checking whether to build with BSD support... " >&6; }
+
+
+
+# Check whether --with-bsd-auth was given.
+if test "${with_bsd_auth+set}" = set; then :
+  withval=$with_bsd_auth;
+  case $withval in
+yes)
+
+$as_echo "#define USE_BSD_AUTH 1" >>confdefs.h
+
+  ;;
+no)
+  :
+  ;;
+*)
+  as_fn_error $? "no argument expected for --with-bsd-auth option" "$LINENO" 5
+  ;;
+  esac
+
+else
+  with_bsd_auth=no
+
+fi
+
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $with_bsd_auth" >&5
+$as_echo "$with_bsd_auth" >&6; }
+
+
+#
 # LDAP
 #
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with LDAP support" >&5
@@ -10475,6 +10512,23 @@ done
 
 fi
 
+if test "$with_bsd_auth" = yes ; then
+  for ac_header in bsd_auth.h
+do :
+  ac_fn_c_check_header_mongrel "$LINENO" "bsd_auth.h" "ac_cv_header_bsd_auth_h" "$ac_includes_default"
+if test "x$ac_cv_header_bsd_auth_h" = xyes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_BSD_AUTH_H 1
+_ACEOF
+
+else
+  as_fn_error $? "header file  is required for BSD Authentication support" "$LINENO" 5
+fi
+
+done
+
+fi
+
 if test "$with_libxml" = yes ; then
   ac_fn_c_check_header_mongrel "$LINENO" "libxml/parser.h" "ac_cv_header_libxml_parser_h" "$ac_includes_default"
 if test "x$ac_cv_header_libxml_parser_h" = xyes; then :
diff --git a/configure.in b/configure.in
index 44f832f..8eb98a8 100644
--- a/configure.in
+++ b/configure.in
@@ -663,6 +663,16 @@ AC_MSG_RESULT([$with_pam])
 
 
 #
+# BSD AUTH
+#
+AC_MSG_CHECKING([whether to build with BSD support])
+PGAC_ARG_BOOL(with, bsd-auth, no,
+  [build with BSD Authentication support],
+  [AC_DEFINE([USE_BSD_AUTH], 1, [Define to 1 to build with BSD support. (--with-bsd-auth)])])
+AC_MSG_RESULT([$with_bsd_auth])
+
+
+#
 # LDAP
 #
 AC_MSG_CHECKING([whether to build with LDAP support])
@@ -1249,6 +1259,10 @@ if test "$with_pam" = yes ; then
  [AC_MSG_ERROR([header file  or  is required for PAM.])])])
 fi
 
+if test "$with_bsd_auth" = yes ; then
+  AC_CHECK_HEADERS(bsd_auth.h, [], [AC_MSG_ERROR([header file  is required for BSD Authentication support])])
+fi
+
 if test "$with_libxml" = yes ; then
   AC_CHECK_HEADER(libxml/parser.h, [], [AC_MSG_ERROR([header file  is required for XML support])])
 fi
diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 3b2935c..ffb5178 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -522,6 +522,17 @@ hostnossl  database  user
  
 

+
+   
+ bsd
+ 
+  
+   Authenticate using BSD Authentication (BSD Auth) provided
+   by the operating system. See 
+   for details.
+  
+ 
+   
   
 
   
@@ -1647,6 +1658,30 @@ host ... ldap ldapurl="ldap://ldap.example.net/dc=example,dc=net?uid?sub;
 

   
+
+  
+   BSD Authentication
+
+   
+BSD
+   
+
+   
+This authentication method operates similarly to
+password except that it uses BSD
+Authentication as the authentication mechanism. BSD Authentication
+is used only to validate user name/password pairs.
+Therefore the user must already exist in the database before BSD
+Authentication can be

Re: [HACKERS] Speedup twophase transactions

On 11 January 2016 at 23:11, Stas Kelvich  wrote:

> >
>
> On 11 Jan 2016, at 21:43, Simon Riggs  wrote:
> >
> > Have you measured lwlocking as a problem?
> >
>
> Yes. GXACT locks that wasn’t even in perf top 10 on dual Xeon moves to the
> first places when running on 60 core system. But Jesper’s flame graph on 24
> core system shows different picture.

I think we can use a shmem hash table to identify the GID by name during
LockGxact and avoid duplicates during prepare. Hashing on the first 16
bytes of the GID should be sufficient in most cases; the worst case would
be the same as it is now, all depending on how people use the GID name
field. The hash value can be calculated outside of the lock. We can also
partition the lock without risk, just adds a little extra code.

We can also optimize final removal (sketch of how to do that attached).

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

2pc_remove_prepXacts.v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-11 Thread Amit Langote

On 2016/01/12 10:30, Amit Langote wrote:
> I'm slightly concerned that the latest patch doesn't incorporate any
> revisions to the original pgstat interface per Robert's comments in [1].

I meant to say "originally proposed pgstat interface on this thread".

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-11 Thread Vinayak Pokale

Hi Sudhir,

On Jan 7, 2016 3:02 AM, "Sudhir Lonkar-2"  wrote:
>
> Hello,
> >Please find attached patch addressing following comments
> I have tested this patch.
> It is showing empty (null) in phase column of pg_stat_vacuum_progress,
when
> I switched to a unprivileged user.
> In the previous patch, it is showing  in phase
> column.
Yes. I will update the patch.
Regards,
Vinayak Pokale

Re: [HACKERS] pg_conversion seems rather strangely defined

2016-01-11 Thread Tatsuo Ishii

>> I used to had a customer who needs to have different client and
>> database encoding than the default.  That is, a slightly different
>> mapping between Shift-JIS and other database encoding.  Due to
>> unfortunate historical reasons, there are several Shift-JIS variants
>> (in addition to the standard defined by government, there are IBM, NEC
>> and Microsoft versions).  This is the reason why I wanted to have the
>> functionality at that time.  I'm not sure the customer still uses the
>> functionality, but it is possible that there are other users who have
>> similar use cases, since the Shift-JIS variants are still used.
> 
> Hm.  Odd that we've not heard complaints about the removal of
> CONVERT(... USING ...), then.

Yeah, I'm not sure the customer updated to the newer version of
PostgreSQL.

> I think it would be a good idea at least to put back some equivalent
> of CONVERT(... USING ...), if for no other reason than that it would
> ease testing.  As I understood it, the argument to remove it was not
> that the functionality was bad, but that we were using a SQL-standard
> syntax for what we concluded was not SQL-standard functionality.
> I'd propose putting it back with a syntax of, say,
> 
>   convert_with(input bytea, conversion_name text) returns bytea
> 
> As for the client encoding conversion case, I still say a
> search-path-based lookup is a horrible idea, and furthermore there
> seems no very good reason why it has to be restricted to default
> conversions.  Aside from other arguments, that tends to push people
> to mark *every* conversion as default, which is outright silly if
> you have several competing ones.
> 
> As a sketch of an alternative, consider inventing a GUC named
> preferred_conversions or some such, which is a list of
> possibly-schema-qualified conversion names.  When establishing an
> original or new value of client_encoding, we look through the list
> for the first entry that exists and performs the desired encoding
> conversion (whether or not it is default).  If there is no match,
> look up the (unique) default conversion for the encoding pair, and
> use that.  (Obviously this has to be done twice, once for each
> direction, when setting up client encoding conversions.)  We could
> apply the same rules for identifying which specific conversion to use
> in convert() and siblings.

Sounds good to me.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Speedup twophase transactions

On Tue, Jan 12, 2016 at 4:57 AM, Simon Riggs  wrote:
> Performance tests for me show that the patch is effective; my results match
> Jesper's roughly in relative numbers.
>
> My robustness review is that the approach and implementation are safe.
>
> It's clear there are various additional tuning opportunities, but the
> objective of the current patch to improve performance is very, very clearly
> met, so I'm aiming to commit *this* patch soon.

-   /* initialize LSN to 0 (start of WAL) */
-   gxact->prepare_lsn = 0;
+   /* initialize LSN to InvalidXLogRecPtr */
+   gxact->prepare_start_lsn = InvalidXLogRecPtr;
+   gxact->prepare_end_lsn = InvalidXLogRecPtr;

I think that it would be better to explicitly initialize gxact->ondisk
to false here.

+   xlogreader = XLogReaderAllocate(_read_local_xlog_page, NULL);
+   if (!xlogreader)
+   ereport(ERROR,
+   (errcode(ERRCODE_OUT_OF_MEMORY),
+errmsg("out of memory"),
+errdetail("Failed while allocating an
XLog reading processor.")));
Depending on something that is part of logical decoding to decode WAL
is not a good idea. If you want to move on with this approach, you
should have a dedicated function. Even better, it would be nice to
come up with a generic function used by both 2PC and logical decoding.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Speedup twophase transactions

On Tue, Jan 12, 2016 at 3:35 PM, Michael Paquier
 wrote:
> On Tue, Jan 12, 2016 at 4:57 AM, Simon Riggs  wrote:
>> Performance tests for me show that the patch is effective; my results match
>> Jesper's roughly in relative numbers.
>>
>> My robustness review is that the approach and implementation are safe.
>>
>> It's clear there are various additional tuning opportunities, but the
>> objective of the current patch to improve performance is very, very clearly
>> met, so I'm aiming to commit *this* patch soon.
>
> -   /* initialize LSN to 0 (start of WAL) */
> -   gxact->prepare_lsn = 0;
> +   /* initialize LSN to InvalidXLogRecPtr */
> +   gxact->prepare_start_lsn = InvalidXLogRecPtr;
> +   gxact->prepare_end_lsn = InvalidXLogRecPtr;
>
> I think that it would be better to explicitly initialize gxact->ondisk
> to false here.
>
> +   xlogreader = XLogReaderAllocate(_read_local_xlog_page, NULL);
> +   if (!xlogreader)
> +   ereport(ERROR,
> +   (errcode(ERRCODE_OUT_OF_MEMORY),
> +errmsg("out of memory"),
> +errdetail("Failed while allocating an
> XLog reading processor.")));
> Depending on something that is part of logical decoding to decode WAL
> is not a good idea. If you want to move on with this approach, you
> should have a dedicated function. Even better, it would be nice to
> come up with a generic function used by both 2PC and logical decoding.


+   if (log_checkpoints && n > 0)
+   ereport(LOG,
+   (errmsg("%u two-phase state files were written "
+   "for long-running
prepared transactions",
+   n)));
This would be better as an independent change. That looks useful for
debugging, and I guess that's why you added it.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Weighted Stats

2016-01-11 Thread David Fetter

On Fri, Jan 08, 2016 at 04:37:36PM +1100, Haribabu Kommi wrote:
> On Mon, Dec 21, 2015 at 1:50 PM, David Fetter  wrote:
> > On Sun, Dec 20, 2015 at 06:13:33PM -0600, Jim Nasby wrote:
> >> On 11/2/15 5:46 PM, David Fetter wrote:
> >> >I'd like to add weighted statistics to PostgreSQL
> >>
> >> Anything happen with this? If community isn't interested, ISTM it'd be good
> >> to put this in PGXN.
> >
> > I think it's already in PGXN as an extension, and I'll get another
> > version out this early this week, as it involves mostly adding some
> > tests.
> >
> > I'll do the float8 ones for core this week, too, and unless there's a
> > really great reason to do more data types on the first pass, it should
> > be in committable shape.
> 
> I reviewed the patch, following are my observations.
> 
> 1. +   precision, numeric, or interval
> 
> with interval type it is giving problem. As interval data type is not 
> supported,
> so remove it in the list of supported inputs.

I'd meant to add more, but will make sure that the next version
documents exactly the types it supports.

> postgres=# select weighted_avg(f7,f1) from tbl;
> ERROR:  function weighted_avg(interval, smallint) does not exist at character 
> 8
> HINT:  No function matches the given name and argument types. You
> might need to add explicit type casts.
> 
> 
> 2. +float8_weighted_avg(PG_FUNCTION_ARGS)
> 
> It will be helpful, if you provide some information as a function header,
> how the weighted average is calculated similar like other weighted functions.

Will do.

> 3. + transvalues = check_float8_array(transarray,
> "float8_weighted_stddev_accum", 4);
> 
> The second parameter to check_float8_array should be "float8_weighted_accum".

Oops.  Will fix.

> 4. There is an OID conflict of 4066 with latest master code.

Will fix.

> 5.+ A += newvalW * ( newvalX - transvalues[2] ) / W;
> + CHECKFLOATVAL(A, isinf(newvalW) || isinf(newvalX - transvalues[2])
> || isinf(1.0/W), true);
> 
> + Q += newvalW * (newvalX - transvalues[2]) * (newvalX - A);
> + CHECKFLOATVAL(A, isinf(newvalX -  transvalues[3]) || isinf(newvalX -
> A) || isinf(1.0/W), true);
> 
> 
> Is the need of calculation also needs to be passed to CHECKFLOATVAL?
> Just passing
> the variables involved in the calculation isn't enough? If expressions
> are required then
> it should be something as follows?
> 
> CHECKFLOATVAL(A, isinf(transvalues[2]) || isinf(newvalW) ||
> isinf(newvalX - transvalues[2]) || isinf(1.0/W), true);
> 
> CHECKFLOATVAL(Q, isinf(transvalues[3]) || isinf(newvalX -
> transvalues[2]) || isinf(newvalX - A) || isinf(1.0/W), true);

Will fix.

> I verified the stddev transition and final function calculations
> according to wikipedia
> and they are fine.

Thanks for reviewing this!

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: add pg_current_xlog_flush_location function

On 11 January 2016 at 12:01, Tomas Vondra 
wrote:

>
> On 01/11/2016 06:30 AM, Michael Paquier wrote:
>
>>
>> Updating LogwrtResult directly when calling your new function
>> GetXLogFlushRecPtr() does not strike me as a particularly good idea
>> per this portion in XLogFlush():
>>
> >
>
>>  /* Quick exit if already known flushed */
>>  if (record <= LogwrtResult.Flush)
>>  return;
>>
>> The same counts for GetXLogWriteRecPtr, we had better allocate the
>> value in an independent variable as there are checks using it. For
>> now it does not matter much for the write position because all the
>> code paths doing the checks explicitly update again the pointer
>> before looking at it but it seems to me that using an independent
>> variable would make the code more robust.
>>
>
> Why? LogwrtResult only serves as a local cache of shared values, so there
> should be no danger of skipping something.


Comments in xlog.c say

"In addition to the shared variable, each backend has a private copy of
LogwrtResult, which is updated when convenient."

It is therefore valid to update the value of both Write and Flush positions
at the same time, any time either is required.


My suggested commit pattern for this is...

1. Update existing function to maintain LogwrtResult more eagerly (separate
patch)

2. Have the patch use the existing function name (main patch)

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Speedup twophase transactions

2016-01-11 Thread Stas Kelvich

> On 10 Jan 2016, at 12:15, Simon Riggs  wrote:
> 
> So we've only optimized half the usage? We're still going to cause 
> replication delays.

Yes, replica will go through old procedures of moving data to and from file.

> We can either
> 
> 1) Skip fsyncing the RecreateTwoPhaseFile and then fsync during restartpoints

>From what i’ve seen with old 2pc code main performance bottleneck was caused 
>by frequent creating of files. So better to avoid files if possible.

> 
> 2) Copy the contents to shmem and then write them at restartpoint as we do 
> for checkpoint
> (preferred)

Problem with shared memory is that we can’t really predict size of state data, 
and anyway it isn’t faster then reading data from WAL
(I have tested that while preparing original patch). 

We can just apply the same logic on replica that on master: do not do anything 
special on prepare, and just read that data from WAL.
If checkpoint occurs during recovery/replay probably existing code will handle 
moving data to files.

I will update patch to address this issue.

> I think padding will negate the effects of the additional bool.
> 
> If we want to reduce the size of the array GIDSIZE is currently 200, but XA 
> says maximum 128 bytes.
> 
> Anybody know why that is set to 200?

Good catch about GID size.

If we talk about further optimisations i see two ways:

1) Optimising access to GXACT. Here we can try to shrink it; introduce more 
granular locks, 
e.g. move GIDs out of GXACT and lock GIDs array only once while checking new 
GID uniqueness; try to lock only part of GXACT by hash; etc.

2) Be optimistic about consequent COMMIT PREPARED. In normal workload next 
command after PREPARE will be COMMIT/ROLLBACK, so we can save
transaction context and release it only if next command isn’t our designated 
COMMIT/ROLLBACK. But that is a big amount of work and requires
changes to whole transaction pipeline in postgres.

Anyway I suggest that we should consider that as a separate task.

---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Need help on pgcrypto

2016-01-11 Thread rajan

Trying to find a documentation which will make me understand how to secure a
password column. I want the encryption to be one way and it should not be
decrypted.

I am able to find discrete documents but nothing of them gets me there.

Please help.



-
--
Thanks,
Rajan.
--
View this message in context: 
http://postgresql.nabble.com/Need-help-on-pgcrypto-tp5881477.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] looking for an array-extract-item-as-it operator

2016-01-11 Thread Peter Krauss

(ops, sending to the pgsql-hackers, see the complete thread below)

Adding a formal suggestion after discussion: to  include a fast
array_getarray() function!


CREATE FUNCTION array_getarray( m anyarray, idx int ) RETURNS anyarray AS
$f$
-- this is a slow workaround for an (need for) internal operation
WITH item AS (SELECT unnest($1[$2:$2]) as x)
SELECT array_agg(x) FROM item;
$f$ LANGUAGE sql IMMUTABLE;

-- EXAMPLE:
  SELECT array_getarray(zz,2) as x, zz[2:2] as y  -- x is not same as y!
  FROM ( SELECT '{{1,2},{33,44}}'::int[][] as zz  ) as tt




2016-01-07 7:26 GMT-02:00 Peter Krauss :

>
>
> 2016-01-06 20:50 GMT-02:00 Tom Lane :
>
>> Peter Krauss  writes:
>> > I need to access an array-item from an array of arrays.
>>
>> Multi-dimensional arrays in Postgres are *not* "arrays of arrays".
>>
>
> Thanks,  you expressed in a little phrase something fundamental to think
> about pg-arrays (!), the pg-Guide need some notices like yours, to remember
> people like me ;-)  Well... The good answer closes the question.
>
> - - - -
>
> We can imagine that the "multidimensional array world" is like a data
> type, that is distinct from the "usual array" data type...
> I am starting other discussion...
>
> Let me explain how the question arrives for me: was after working with
> JSONB, where arrays are of "usual array" type.
> Now that PostgreSQL 9.4+ incorporated definitely JSONB, the SQL array data
> type is an important "intermediate" between JSONB and usual SQL structures
> (and type operation algebras).
>
> So, perhaps, PostgreSQL 9.4+ will need a kind of "usual array type", a new
> internal type, and a *cast* function: with this new type will be possible
> to simplify the work with JSONB, and do other things like
>  array_agg(array[x,y]).
> ... It is not for final user, perhaps only for developers, or library
> plugins: an "intermediate" type that not broken compatibility... Not very
> useful, a type only to formally express things like to eficient cast, etc.
>
>
>
>
>
>

Re: [HACKERS] strange CREATE INDEX tab completion cases

On Mon, Jan 11, 2016 at 2:16 AM, Peter Eisentraut  wrote:
> On 12/13/15 9:16 AM, Michael Paquier wrote:
>> Please see the attached to address those things (and others) with
>> extra fixes for a couple of comments.
>
> I have ported these changes to the new world order and divided them up
> into more logical changes that are more clearly documented.  Please
> check that this matches what you had in mind.

Thanks for the new patches, this fell of my radar. This new version
looks fine to me.

+   /* If we have CREATE|UNIQUE INDEX CONCURRENTLY, then add "ON"
and existing
+  indexes */
This comment format is not correct.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2016-01-11 Thread Magnus Hagander

On Mon, Jan 11, 2016 at 4:31 AM, Joshua D. Drake 
wrote:

> On 01/10/2016 06:19 PM, Robert Haas wrote:
>
> [snip]
>
> No arguments with what was written above.

+1. Very well written.

>
>
> If we had an "issue" tracker rather than a bug tracker, I'd expect a
>> lot more unproductive bickering.
>>
>
> This I disagree with. It wouldn't be any worse than we have now. An issue
> tracker (if it is a good one) becomes the forum, whether you use an email
> or web interface. So expect at least as much banter as there is on lists
> but with a much easier management interface.
>
>
We can argue about if it's actually an easier management interface ;)

I'd agree with Robert in that it will cause some more such bickering -- but
only because the discussions become visible to a new group of people as
well. That's not necessarily a bad thing. But thinking that having such an
issue tracker is actually going to help *get rid of* the pointless part of
things is a nice dream, but just a dream. The only advantage I can see of
it is to increase the visibility of them.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] Need help on pgcrypto

2016-01-11 Thread Vitaly Burovoy

On 1/11/16, Michael Paquier  wrote:
> On Mon, Jan 11, 2016 at 10:03 PM, rajan  wrote:
>> Trying to find a documentation which will make me understand how to secure
>> a
>> password column. I want the encryption to be one way and it should not be
>> decrypted.
>>
>> I am able to find discrete documents but nothing of them gets me there.
>
> Er, isn't that enough?
> http://www.postgresql.org/docs/devel/static/pgcrypto.html
> --
> Michael

I'd give more specific URL with examples of SELECT and UPDATE:
http://www.postgresql.org/docs/devel/static/pgcrypto.html#AEN170716

... but recommend to change just "gen_salt('md5')" to "gen_salt('bf', 8))"

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] checkpointer continuous flushing

On 2016-01-09 16:49:56 +0100, Fabien COELHO wrote:
> 
> Hello Andres,
> 
> >Hm. New theory: The current flush interface does the flushing inside
> >FlushBuffer()->smgrwrite()->mdwrite()->FileWrite()->FlushContextSchedule(). 
> >The
> >problem with that is that at that point we (need to) hold a content lock
> >on the buffer!
> 
> You are worrying that FlushBuffer is holding a lock on a buffer and the
> "sync_file_range" call occurs is issued at that moment.
> 
> Although I agree that it is not that good, I would be surprise if that was
> the explanation for a performance regression, because the sync_file_range
> with the chosen parameters is an async call, it "advises" the OS to send the
> file, but it does not wait for it to be completed.

I frequently see sync_file_range blocking - it waits till it could
submit the writes into the io queues. On a system bottlenecked on IO
that's not always possible immediately.

> Also, maybe you could answer a question I had about the performance
> regression you observed, I could not find the post where you gave the
> detailed information about it, so that I could try reproducing it: what are
> the exact settings and conditions (shared_buffers, pgbench scaling, host
> memory, ...), what is the observed regression (tps? other?), and what is the
> responsiveness of the database under the regression (eg % of seconds with 0
> tps for instance, or something like that).

I measured it in a different number of cases, both on SSDs and spinning
rust. I just reproduced it with:

postgres-ckpt14 \
-D /srv/temp/pgdev-dev-800/ \
-c maintenance_work_mem=2GB \
-c fsync=on \
-c synchronous_commit=off \
-c shared_buffers=2GB \
-c wal_level=hot_standby \
-c max_wal_senders=10 \
-c max_wal_size=100GB \
-c checkpoint_timeout=30s

Using a fresh cluster each time (copied from a "template" to save time)
and using
pgbench -M prepared -c 16 -j16 -T 300 -P 1
I get

My laptop 1 EVO 840, 1 i7-4800MQ, 16GB ram:
master:
scaling factor: 800
query mode: prepared
number of clients: 16
number of threads: 16
duration: 300 s
number of transactions actually processed: 1155733
latency average: 4.151 ms
latency stddev: 8.712 ms
tps = 3851.242965 (including connections establishing)
tps = 3851.725856 (excluding connections establishing)

ckpt-14 (flushing by backends disabled):
scaling factor: 800
query mode: prepared
number of clients: 16
number of threads: 16
duration: 300 s
number of transactions actually processed: 855156
latency average: 5.612 ms
latency stddev: 7.896 ms
tps = 2849.876327 (including connections establishing)
tps = 2849.912015 (excluding connections establishing)

My laptop 1 850 PRO, 1 i7-4800MQ, 16GB ram:
master:
transaction type: TPC-B (sort of)
scaling factor: 800
query mode: prepared
number of clients: 16
number of threads: 16
duration: 300 s
number of transactions actually processed: 2104781
latency average: 2.280 ms
latency stddev: 9.868 ms
tps = 7010.397938 (including connections establishing)
tps = 7010.475848 (excluding connections establishing)

ckpt-14 (flushing by backends disabled):
scaling factor: 800
query mode: prepared
number of clients: 16
number of threads: 16
duration: 300 s
number of transactions actually processed: 1930716
latency average: 2.484 ms
latency stddev: 7.303 ms
tps = 6434.785605 (including connections establishing)
tps = 6435.13 (excluding connections establishing)

In neither case there are periods of 0 tps, but both have times of <
1000 tps with noticeably increased latency.


The endresults are similar with a sane checkpoint timeout - the tests
just take much longer to give meaningful results. Constantly running
long tests on prosumer level SSDs isn't nice - I've now killed 5 SSDs
with postgres testing...

As you can see there's roughly a 30% performance regression on the
slower SSD and a ~9% on the faster one. HDD results are similar (but I
can't repeat on the laptop right now since the 2nd hdd is now an SSD).


My working copy of checkpoint sorting & flushing currently results in:
My laptop 1 EVO 840, 1 i7-4800MQ, 16GB ram:
transaction type: TPC-B (sort of)
scaling factor: 800
query mode: prepared
number of clients: 16
number of threads: 16
duration: 300 s
number of transactions actually processed: 1136260
latency average: 4.223 ms
latency stddev: 8.298 ms
tps = 3786.696499 (including connections establishing)
tps = 3786.778875 (excluding connections establishing)

My laptop 1 850 PRO, 1 i7-4800MQ, 16GB ram:
transaction type: TPC-B (sort of)
scaling factor: 800
query mode: prepared
number of clients: 16
number of threads: 16
duration: 300 s
number of transactions actually processed: 2050661
latency average: 2.339 ms
latency stddev: 7.708 ms
tps = 6833.593170 (including connections establishing)
tps = 6833.680391 (excluding connections establishing)

My version of the patch currently addresses various points, which need
to be separated and benchmarked separate:
* Different

Re: [HACKERS] PATCH: add pg_current_xlog_flush_location function

2016-01-11 Thread Tomas Vondra



On 01/11/2016 06:30 AM, Michael Paquier wrote:


Updating LogwrtResult directly when calling your new function
GetXLogFlushRecPtr() does not strike me as a particularly good idea
per this portion in XLogFlush():

>

 /* Quick exit if already known flushed */
 if (record <= LogwrtResult.Flush)
 return;

The same counts for GetXLogWriteRecPtr, we had better allocate the
value in an independent variable as there are checks using it. For
now it does not matter much for the write position because all the
code paths doing the checks explicitly update again the pointer
before looking at it but it seems to me that using an independent
variable would make the code more robust.


Why? LogwrtResult only serves as a local cache of shared values, so 
there should be no danger of skipping something.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Need help on pgcrypto

On Mon, Jan 11, 2016 at 10:03 PM, rajan  wrote:
> Trying to find a documentation which will make me understand how to secure a
> password column. I want the encryption to be one way and it should not be
> decrypted.
>
> I am able to find discrete documents but nothing of them gets me there.

Er, isn't that enough?
http://www.postgresql.org/docs/devel/static/pgcrypto.html
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Why doesn't src/backend/port/win32/socket.c implement bind()?

2016-01-11 Thread Magnus Hagander

On Mon, Jan 11, 2016 at 6:19 AM, Amit Kapila 
wrote:

> On Sun, Jan 10, 2016 at 11:55 PM, Tom Lane  wrote:
> >
> > Some of the Windows buildfarm members occasionally fail like this:
> >
> > LOG:  could not bind IPv4 socket: No error
> > HINT:  Is another postmaster already running on port 64470? If not, wait
> a few seconds and retry.
> > WARNING:  could not create listen socket for "127.0.0.1"
> > FATAL:  could not create any TCP/IP sockets
> >
> > (bowerbird, in particular, has a few recent examples)
> >
> > I think the reason why we're getting "No error" instead of a useful
> > strerror report is that socket.c doesn't provide an implementation
> > of bind() that includes TranslateSocketError().
> >
>
> listen also doesn't have such an implementation and probably few others.
>

The reason they don't is that when this compatibility layer was written, it
was to support the signal emulation. So the calls that were put in there
were the ones that we need(ed) to be able to interrupt with a signal. As
both bind() and listen() are not blocking commands (at least not normally),
there is no need to interrupt them, and thus there is no function in
socket.c for them.

I don't think anybody at the time was even considering the error handling.
Only insofar as handling the calls that were very clearly not the same as
the Unix variants. listen/bind were just missed.

> >  Why is that?
> >
>
> Not sure, but I could see that bind and listen doesn't have the equivalent
> Win sock API (checked in winsock2.h) and while googling on same,
> I found that there are reasons [1] why Win Sockets doesn't have the
> equivalent of some of the socket API's.
>
> I think here we should add a win32 wrapper over bind and listen
> API's which ensures TranslateSocketError() should be called for
> error cases.
>

Yeah, that seems like a good idea.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2016-01-11 Thread Joshua D. Drake


On 01/11/2016 03:31 AM, Magnus Hagander wrote:



We can argue about if it's actually an easier management interface ;)


(as long as it is manageable via email as well as web?)



I'd agree with Robert in that it will cause some more such bickering --
but only because the discussions become visible to a new group of people
as well. That's not necessarily a bad thing. But thinking that having
such an issue tracker is actually going to help *get rid of* the
pointless part of things is a nice dream, but just a dream. The only
advantage I can see of it is to increase the visibility of them.


Oh, I think you are right there. My point is that we have the ability to 
better manage that inforomation. We can mark something as a bug, feature 
request, approved feature, feature in development etc... Things become 
much more contextually aware.


JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-01-11 Thread Catalin Iacob

On Fri, Jan 8, 2016 at 7:56 AM, Pavel Stehule  wrote:
>> 3. PLy_error__init__ is used for BaseError but always sets an
>> attribute called spidata, I would expect that to only happen for
>> SPIError not for BaseError. You'll need to pick some other way of
>> attaching the error details to BaseError instances. Similarly,
>> PLy_get_spi_error_data became PLy_get_error_data and it's invoked on
>> other classes than SPIError but it still always looks at the spidata
>> attribute.
>
>
> I afraid of compatibility issues - so I am not changing internal format
> simply. Some legacy application can be based on detection of spidata
> attribute. So I cannot to change this structure for SPIError.

Indeed, I wasn't suggesting changing it for SPIError, that needs to
stay spidata.

> I can change it for BaseError, but this is question. Internally BaseError
> and SPIError share same data. So it can be strange if BaseError and SPIError
> uses different internal formats.
>
> I am interesting your opinion??

Yes, it's not great if BaseError and SPIError have different ways but
I think having BaseError have a member (spidata) named after one of
its subclasses is even worse.

I would say detail, hint etc belong to BaseError and I would make them
different attributes of BaseError instances instead of one big
BaseError.data tuple similar to what SPIError.spidata is now. SPIError
would keep its spidata for compatibility but would get the same
information into its detail, hint etc. attributes since it's derived
from BaseError.

So an equivalent to this Python (pseudo)code:

# base class for all exceptions raised by PL/Python
class BaseError:
def __init__(self, msg, detail=None, hint=None, and so on for
every accepted argument):
self.msg = msg
self.detail = detail
# and so on

class Error(BaseError):
 pass

class Fatal(BaseError):
pass

class SPIError(BaseError):
 def __init__(self, msg):
 # call BaseError.__init__ so that SPIError also gets .msg,
.detail and so on like all other BaseError instances
 # do what's done today to fill spidata to keep backward compatibility

If you don't like that spidata duplicates the other fields, it's
probably also ok to not make inherit from BaseError at all. I actually
like this better. Then I would call the base class something like
UserRaisedError (ugly name but can't think of something better to
emphasize that it's about errors that the PL/Python developer is
supposed to raise rather SPIError which is Postgres raising them) and
it would be:

# base class for exceptions raised by users in their PL/Python code
class UserRaisedError:
def __init__(self, msg, detail=None, hint=None, and so on for
every accepted argument):
self.msg = msg
self.detail = detail
# and so on

class Error(UserRaisedError):
 pass

class Fatal(UserRaisedError):
pass

# base class for exceptions raised by Postgres when executing sql code
class SPIError:
 # no change to this class

>
> a Fatal cannnot be raised - it is a session end. Personally - support of
> Fatal level is wrong, and it should not be propagated to user level, but it
> is now. And then due consistency I wrote support for fatal level. But hard
> to test it.
>

I see, then never mind.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Accessing non catalog table in backend

2016-01-11 Thread Atri Sharma

On Tue, Jan 5, 2016 at 11:09 AM, Tom Lane  wrote:

> Atri Sharma  writes:
> > I fail to see the relevance of which node is getting evaluated (its a
> Plan
> > node BTW) for this question. The concern I had was around using SPI
> inside
> > executor and its fail safety.
>
> The code path executor -> PL function -> SPI certainly works, so
> presumably omitting the intermediate PL function could still work.
> Whether it's a good idea is another question entirely.  I do not
> offhand see a good reason why knowledge of non-system tables should
> exist in the core executor; so what is the need to use SPI?
>
>
Thanks!

This was a weird requirement and managed to work around it but I will keep
this hack for future reference.

Regards,

Atri

Re: [HACKERS] Accessing non catalog table in backend

2016-01-11 Thread Atri Sharma

On Mon, Jan 11, 2016 at 10:48 PM, Atri Sharma  wrote:

> Sorry, I missed this email.
>
> I was specifically targeting accessing tables inside Node evaluation hence
> do not want to add new nodes.
>
> Thanks for your inputs!
>
> Regards,
>
> Atri
>
> On Tue, Jan 5, 2016 at 11:43 AM, Amit Langote <
> langote_amit...@lab.ntt.co.jp> wrote:
>
>> On 2016/01/05 14:30, Atri Sharma wrote:
>> > On Tue, Jan 5, 2016 at 9:54 AM, Amit Langote <
>> langote_amit...@lab.ntt.co.jp>
>> >> On 2016/01/05 3:53, Atri Sharma wrote:
>> >>> I was wary to use SPI inside the executor for node evaluation
>> functions.
>> >>> Does it seem safe?
>> >>
>> >> What is "node evaluation functions"? Is it "Plan" nodes or "Expr" nodes
>> >> that you are talking about? I guess you'd know to use ExecProcNode() or
>> >> ExecEvalExpr() for them, respectively.
>> >>
>> > I fail to see the relevance of which node is getting evaluated (its a
>> Plan
>> > node BTW) for this question. The concern I had was around using SPI
>> inside
>> > executor and its fail safety.
>>
>> Sorry, I may have misunderstood your question(s). Seeing your first
>> question in the thread, I see that you're looking to query non-system
>> tables within the executor. AFAIU, most of the processing within executor
>> takes the form of some node in some execution pipeline of a plan tree.
>> Perhaps, you're imagining some kind of node, subnode or some such. By the
>> way, some places like ATRewriteTable(), validateCheckConstraint() scan
>> user tables directly using low-level utilities within a dummy executor
>> context. I think Jim suggested something like that upthread.
>>
>>

Sorry for top posting.

Regards,

Atri

Re: [HACKERS] Accessing non catalog table in backend

2016-01-11 Thread Atri Sharma

Sorry, I missed this email.

I was specifically targeting accessing tables inside Node evaluation hence
do not want to add new nodes.

Thanks for your inputs!

Regards,

Atri

On Tue, Jan 5, 2016 at 11:43 AM, Amit Langote  wrote:

> On 2016/01/05 14:30, Atri Sharma wrote:
> > On Tue, Jan 5, 2016 at 9:54 AM, Amit Langote <
> langote_amit...@lab.ntt.co.jp>
> >> On 2016/01/05 3:53, Atri Sharma wrote:
> >>> I was wary to use SPI inside the executor for node evaluation
> functions.
> >>> Does it seem safe?
> >>
> >> What is "node evaluation functions"? Is it "Plan" nodes or "Expr" nodes
> >> that you are talking about? I guess you'd know to use ExecProcNode() or
> >> ExecEvalExpr() for them, respectively.
> >>
> > I fail to see the relevance of which node is getting evaluated (its a
> Plan
> > node BTW) for this question. The concern I had was around using SPI
> inside
> > executor and its fail safety.
>
> Sorry, I may have misunderstood your question(s). Seeing your first
> question in the thread, I see that you're looking to query non-system
> tables within the executor. AFAIU, most of the processing within executor
> takes the form of some node in some execution pipeline of a plan tree.
> Perhaps, you're imagining some kind of node, subnode or some such. By the
> way, some places like ATRewriteTable(), validateCheckConstraint() scan
> user tables directly using low-level utilities within a dummy executor
> context. I think Jim suggested something like that upthread.
>
> Thanks,
> Amit
>
>
>


-- 
Regards,

Atri
*l'apprenant*

Re: [HACKERS] Minor code improvements to create_foreignscan_plan/ExecInitForeignScan

I wonder,

> --- 2166,2213 
>   }
>   
>   /*
> !  * If rel is a base relation, detect whether any system columns are
> !  * requested from the rel.  (If rel is a join relation, rel->relid will 
> be
> !  * 0, but there can be no Var in the target list with relid 0, so we 
> skip
> !  * this in that case.  Note that any such system columns are assumed to 
> be
> !  * contained in fdw_scan_tlist, so we never need fsSystemCol to be true 
> in
> !  * the joinrel case.)  This is a bit of a kluge and might go away 
> someday,
> !  * so we intentionally leave it out of the API presented to FDWs.
>*/
> ! scan_plan->fsSystemCol = false;
> ! if (scan_relid > 0)
>   {
> ! Bitmapset  *attrs_used = NULL;
> ! ListCell   *lc;
> ! int i;
>   
> ! /*
> !  * First, examine all the attributes needed for joins or final 
> output.
> !  * Note: we must look at reltargetlist, not the attr_needed 
> data,
> !  * because attr_needed isn't computed for inheritance child 
> rels.
> !  */
> ! pull_varattnos((Node *) rel->reltargetlist, scan_relid, 
> _used);
>   
> ! /* Add all the attributes used by restriction clauses. */
> ! foreach(lc, rel->baserestrictinfo)
>   {
> ! RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
> ! 
> ! pull_varattnos((Node *) rinfo->clause, scan_relid, 
> _used);
>   }
>   
> ! /* Now, are any system columns requested from rel? */
> ! for (i = FirstLowInvalidHeapAttributeNumber + 1; i < 0; i++)
> ! {
> ! if (bms_is_member(i - 
> FirstLowInvalidHeapAttributeNumber, attrs_used))
> ! {
> ! scan_plan->fsSystemCol = true;
> ! break;
> ! }
> ! }
> ! 
> ! bms_free(attrs_used);
> ! }
>   
>   return scan_plan;
>   }

Would it make sense to call pull_varattnos(reltargetlist), then walk the
bitmapset and break if we see a system column, then call
pull_varattnos() on the rinfo->clause?  That way, if the targetlist
request a system column we don't have to walk the RestrictInfos.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 2016-01 Commitfest

These are the numbers after one week of commitfest work:

 Needs review: 65.
 Waiting on Author: 14.
 Ready for Committer: 6.
 Committed: 14.
Total: 99. 

The attentive reader might notice that we grew one more patch since last
week, which is the "VACUUM progress checker" thingy that has been under
active review.

We went from 8 committed patches to 14.  At the current rate of 6
patches per week it would take us 12 weeks to close the commitfest,
which doesn't sound very good.

There are a number of patches in Needs-review state which haven't seen
any pgsql-hackers activity in a long while.  I'm particularly concerned
about the following patches:

* Support multiple synchronous standby servers
* Access method extendability
* Partial sort
* add 'waiting for replication' to pg_stat_activity.state
* More stable query plans via more predictable column statistics
* Statistics for array types
* Declarative partitioning
* Table Partition + Join Pushdown
* multivariate statistics

It would be very helpful of a reviewer to look at those patches.


Some committers have assigned patches to themselves:

* Andres Freund
  Speedup timestamp/time/date output functions
* Peter Eisentraut
  remove wal_level archive
* Tom Lane
  Rework index access method interface
* Teodor Sigaev
  New gist vacuum
* Stephen Frost
  Default Roles
* Simon Riggs
  Fix handling on XLOG_RUNNING_XACTS generated by bgwriter on idle
  systems

I assume this means they intend to commit them in some reasonable
timeframe (possibly after some rework).  If this is not the case, please
let us know.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Making plpython 2 and 3 coexist a bit better

Commit 803716013dc1350f installed a safeguard against loading plpython2
and plpython3 at the same time, stating

+   It is not allowed to use PL/Python based on Python 2 and PL/Python
+   based on Python 3 in the same session, because the symbols in the
+   dynamic modules would clash, which could result in crashes of the
+   PostgreSQL server process.  There is a check that prevents mixing
+   Python major versions in a session, which will abort the session if
+   a mismatch is detected.  It is possible, however, to use both
+   PL/Python variants in the same database, from separate sessions.

It turns out though that the freedom promised in the last sentence
is fairly illusory, because if you have functions in both languages
in one DB and you try a dump-and-restore, it will fail.

But it gets worse: a report today in pgsql-general points out that
even if you have the two languages in use *in different databases*,
pg_upgrade will fail, because pg_upgrade rather overenthusiastically
loads every .so mentioned anywhere in the source installation into
one session.

I think we might be able to improve this if we don't do the check in
plpython's _PG_init(), but delay it until we have a need to call into
libpython; which we could skip doing at _PG_init() time, at the cost
of perhaps one more flag check per plpython function call to see if
we've initialized libpython yet.

The behavior would then be that if you load both language .so's
into one session, neither one would be willing to do anything
anymore --- not run a function, and not syntax-check one either,
because any attempted call into either libpython might crash.
But restoring a dump has no need to do either of those things;
it just wants to be able to LOAD the .so.  Also, the error for
not being able to do something wouldn't have to be FATAL.

Comments?  Am I being too optimistic about whether it's safe to
have both libpythons loaded if we're not calling either of them?

(If I am, we should at least try to improve pg_upgrade so that
it's not imposing a greater restriction than anything else in
the system.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] proposal: PL/Pythonu - function ereport


On 1/11/16 12:33 PM, Pavel Stehule wrote:

1. break compatibility and SPIError replace by Error


At this point I've lost track... what's the incompatibility between the two?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Making plpython 2 and 3 coexist a bit better


On 1/11/16 11:51 AM, Tom Lane wrote:

We could ameliorate the first of these cases by putting the can't-run-
with-two-pythons error back up to FATAL rather than ERROR, but I'm not
sure that that would be a net improvement --- FATAL errors aren't very
friendly.  In any case errors of the second type seem unpreventable
unless we stick with the immediate-FATAL-error approach.


Something that's always concerned me about functions in other languages 
is that any kind of snafu in the function/language can hose the backend, 
which you may or may not detect. I've used other databases that (by 
default) spin up a separate process for executing functions, maybe we 
could do something like that? If we treated 2 and 3 as different 
languages you could actually use both at the same time in a single 
backend. The only thing that's not clear to me is how you'd be able to 
re-enter the process during recursive/nested calls.


Obviously this is a lot more work than what you're proposing though. :(
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Speedup twophase transactions

On 11 January 2016 at 12:58, Stas Kelvich  wrote:

>
> > On 10 Jan 2016, at 12:15, Simon Riggs  wrote:
> >
> > So we've only optimized half the usage? We're still going to cause
> replication delays.
>
> Yes, replica will go through old procedures of moving data to and from
> file.
>
> > We can either
> >
> > 1) Skip fsyncing the RecreateTwoPhaseFile and then fsync during
> restartpoints
>
> From what i’ve seen with old 2pc code main performance bottleneck was
> caused by frequent creating of files. So better to avoid files if possible.
>
> >
> > 2) Copy the contents to shmem and then write them at restartpoint as we
> do for checkpoint
> > (preferred)
>
> Problem with shared memory is that we can’t really predict size of state
> data, and anyway it isn’t faster then reading data from WAL
> (I have tested that while preparing original patch).
>
> We can just apply the same logic on replica that on master: do not do
> anything special on prepare, and just read that data from WAL.
> If checkpoint occurs during recovery/replay probably existing code will
> handle moving data to files.
>
> I will update patch to address this issue.
>

I'm looking to commit what we have now, so lets do that as a separate but
necessary patch please.


> > I think padding will negate the effects of the additional bool.
> >
> > If we want to reduce the size of the array GIDSIZE is currently 200, but
> XA says maximum 128 bytes.
> >
> > Anybody know why that is set to 200?
>
> Good catch about GID size.
>

I'll apply that as a separate patch also.


> If we talk about further optimisations i see two ways:
>
> 1) Optimising access to GXACT. Here we can try to shrink it; introduce
> more granular locks,
> e.g. move GIDs out of GXACT and lock GIDs array only once while checking
> new GID uniqueness; try to lock only part of GXACT by hash; etc.
>

Have you measured lwlocking as a problem?


> 2) Be optimistic about consequent COMMIT PREPARED. In normal workload next
> command after PREPARE will be COMMIT/ROLLBACK, so we can save
> transaction context and release it only if next command isn’t our
> designated COMMIT/ROLLBACK. But that is a big amount of work and requires
> changes to whole transaction pipeline in postgres.
>

We'd need some way to force session pools to use that correctly, but yes,
agreed.


> Anyway I suggest that we should consider that as a separate task.


Definitely. From the numbers, I can see there is still considerable
performance gain to be had.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] 2016-01 Commitfest

2016-01-11 Thread Peter Geoghegan

On Mon, Jan 11, 2016 at 6:38 AM, Alvaro Herrera
 wrote:
> * Partial sort

That shouldn't have been in "needs review" state. Fixed.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Making plpython 2 and 3 coexist a bit better

I wrote:
> I think we might be able to improve this if we don't do the check in
> plpython's _PG_init(), but delay it until we have a need to call into
> libpython; which we could skip doing at _PG_init() time, at the cost
> of perhaps one more flag check per plpython function call to see if
> we've initialized libpython yet.

> The behavior would then be that if you load both language .so's
> into one session, neither one would be willing to do anything
> anymore --- not run a function, and not syntax-check one either,
> because any attempted call into either libpython might crash.
> But restoring a dump has no need to do either of those things;
> it just wants to be able to LOAD the .so.  Also, the error for
> not being able to do something wouldn't have to be FATAL.

Attached is a draft patch along this line.  It seems to work as intended.

I can think of at least a couple of ways in which this test might be
inadequate.  One is that a plpython2 function might call a plpython3
function or vice versa.  The inner plpython_call_handler would correctly
throw an error, but if the outer function trapped the error and continued
execution, it would be at risk.  Another, more hypothetical risk is that
once we've executed anything out of one libpython, it might have changed
process state (eg, installed hooks) in such a way that it can get control
back even without an explicit call to plpython.  In that case, a
subsequent load of another Python version would put things at risk for
such execution in the first libpython.

We could ameliorate the first of these cases by putting the can't-run-
with-two-pythons error back up to FATAL rather than ERROR, but I'm not
sure that that would be a net improvement --- FATAL errors aren't very
friendly.  In any case errors of the second type seem unpreventable
unless we stick with the immediate-FATAL-error approach.

Since plpython only comes in an untrusted (hence superuser-only) flavor,
holes like this wouldn't be security issues but only "well, you shouldn't
do that" problems.  I'm inclined to think it's a good tradeoff for being
able to dump/restore/upgrade mixed installations, which are surely
likely to get more popular.

Thoughts?

regards, tom lane

diff --git a/src/pl/plpython/plpy_main.c b/src/pl/plpython/plpy_main.c
index 3c2ebfa..059adfa 100644
*** a/src/pl/plpython/plpy_main.c
--- b/src/pl/plpython/plpy_main.c
*** static void PLy_init_interp(void);
*** 63,69 
  static PLyExecutionContext *PLy_push_execution_context(void);
  static void PLy_pop_execution_context(void);

! static const int plpython_python_version = PY_MAJOR_VERSION;

  /* initialize global variables */
  PyObject   *PLy_interp_globals = NULL;
--- 63,70 
  static PLyExecutionContext *PLy_push_execution_context(void);
  static void PLy_pop_execution_context(void);

! static int *plpython_version_bitmask_ptr = NULL;
! static int	plpython_version_bitmask = 0;

  /* initialize global variables */
  PyObject   *PLy_interp_globals = NULL;
*** static PLyExecutionContext *PLy_executio
*** 75,102 
  void
  _PG_init(void)
  {
! 	/* Be sure we do initialization only once (should be redundant now) */
! 	static bool inited = false;
  	const int **version_ptr;

! 	if (inited)
! 		return;

! 	/* Be sure we don't run Python 2 and 3 in the same session (might crash) */
  	version_ptr = (const int **) find_rendezvous_variable("plpython_python_version");
! 	if (!(*version_ptr))
! 		*version_ptr = _python_version;
! 	else
! 	{
! 		if (**version_ptr != plpython_python_version)
! 			ereport(FATAL,
! 	(errmsg("Python major version mismatch in session"),
! 	 errdetail("This session has previously used Python major version %d, and it is now attempting to use Python major version %d.",
! 			   **version_ptr, plpython_python_version),
! 	 errhint("Start a new session to use a different Python major version.")));
! 	}

! 	pg_bindtextdomain(TEXTDOMAIN);

  #if PY_MAJOR_VERSION >= 3
  	PyImport_AppendInittab("plpy", PyInit_plpy);
--- 76,144 
  void
  _PG_init(void)
  {
! 	int		  **bitmask_ptr;
  	const int **version_ptr;

! 	/*
! 	 * Set up a shared bitmask variable telling which Python version(s) are
! 	 * loaded into this process's address space.  If there's more than one, we
! 	 * cannot call into libpython for fear of causing crashes.  But postpone
! 	 * the actual failure for later, so that operations like pg_restore can
! 	 * load more than one plpython library so long as they don't try to do
! 	 * anything much with the language.
! 	 */
! 	bitmask_ptr = (int **) find_rendezvous_variable("plpython_version_bitmask");
! 	if (!(*bitmask_ptr))		/* am I the first? */
! 		*bitmask_ptr = _version_bitmask;
! 	/* Retain pointer to the agreed-on shared variable ... */
! 	plpython_version_bitmask_ptr = *bitmask_ptr;
! 	/* ... and announce my presence */
! 	*plpython_version_bitmask_ptr |= (1 << PY_MAJOR_VERSION);

! 	/*
! 	 * This should

Re: [HACKERS] Speedup twophase transactions

Hi,

On 2016-01-09 15:29:11 +, Simon Riggs wrote:
> Hmm, I was just preparing this for commit.

Just read downthread that you want to commit this soon. Please hold of
for a while, this doesn't really look ready to me. I don't have time for
a real review right now, but I'll try to get to it asap.

> +
> +/*
> + * Reads 2PC data from xlog. During checkpoint this data will be moved to
> + * twophase files and ReadTwoPhaseFile should be used instead.
> + */
> +static void
> +XlogReadTwoPhaseData(XLogRecPtr lsn, char **buf, int *len)
> +{
> + XLogRecord *record;
> + XLogReaderState *xlogreader;
> + char   *errormsg;
> +
> + xlogreader = XLogReaderAllocate(_read_local_xlog_page,
> NULL);

logical_read_local_xlog_page isn't really suitable for the use
here. Besides the naming issue, at the very least it'll be wrong during
WAL replay in the presence of promotions on an upstream node - it
doesn't dealwith timelines.

More generally, I'm doubtful that the approach of reading data from WAL
as proposed here is a very good idea. It seems better to "just" dump the
entire 2pc state into *one* file at checkpoint time.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Speedup twophase transactions

On 2016-01-11 20:03:18 +0100, Andres Freund wrote:
> More generally, I'm doubtful that the approach of reading data from WAL
> as proposed here is a very good idea. It seems better to "just" dump the
> entire 2pc state into *one* file at checkpoint time.

Or better: After determining the checkpoint redo location, insert a WAL
record representing the entire 2PC state as of that moment. That way it
can easily restored during WAL replay and nothing special has to be done
on a standby. This way we'll need no extra wal flushes and fsyncs.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 2016-01 Commitfest

On 11 January 2016 at 14:38, Alvaro Herrera 
wrote:


> * Simon Riggs
>   Fix handling on XLOG_RUNNING_XACTS generated by bgwriter on idle
>   systems
>
> I assume this means they intend to commit them in some reasonable
> timeframe (possibly after some rework).  If this is not the case, please
> let us know.


Confirmed for this CF.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-01-11 Thread Pavel Stehule

2016-01-11 17:05 GMT+01:00 Catalin Iacob :

> On Fri, Jan 8, 2016 at 7:56 AM, Pavel Stehule 
> wrote:
> >> 3. PLy_error__init__ is used for BaseError but always sets an
> >> attribute called spidata, I would expect that to only happen for
> >> SPIError not for BaseError. You'll need to pick some other way of
> >> attaching the error details to BaseError instances. Similarly,
> >> PLy_get_spi_error_data became PLy_get_error_data and it's invoked on
> >> other classes than SPIError but it still always looks at the spidata
> >> attribute.
> >
> >
> > I afraid of compatibility issues - so I am not changing internal format
> > simply. Some legacy application can be based on detection of spidata
> > attribute. So I cannot to change this structure for SPIError.
>
> Indeed, I wasn't suggesting changing it for SPIError, that needs to
> stay spidata.
>
> > I can change it for BaseError, but this is question. Internally BaseError
> > and SPIError share same data. So it can be strange if BaseError and
> SPIError
> > uses different internal formats.
> >
> > I am interesting your opinion??
>
> Yes, it's not great if BaseError and SPIError have different ways but
> I think having BaseError have a member (spidata) named after one of
> its subclasses is even worse.
>
> I would say detail, hint etc belong to BaseError and I would make them
> different attributes of BaseError instances instead of one big
> BaseError.data tuple similar to what SPIError.spidata is now. SPIError
> would keep its spidata for compatibility but would get the same
> information into its detail, hint etc. attributes since it's derived
> from BaseError.
>
> So an equivalent to this Python (pseudo)code:
>
> # base class for all exceptions raised by PL/Python
> class BaseError:
> def __init__(self, msg, detail=None, hint=None, and so on for
> every accepted argument):
> self.msg = msg
> self.detail = detail
> # and so on
>
> class Error(BaseError):
>  pass
>
> class Fatal(BaseError):
> pass
>
> class SPIError(BaseError):
>  def __init__(self, msg):
>  # call BaseError.__init__ so that SPIError also gets .msg,
> .detail and so on like all other BaseError instances
>  # do what's done today to fill spidata to keep backward
> compatibility
>
> If you don't like that spidata duplicates the other fields, it's
> probably also ok to not make inherit from BaseError at all. I actually
> like this better. Then I would call the base class something like
> UserRaisedError (ugly name but can't think of something better to
> emphasize that it's about errors that the PL/Python developer is
> supposed to raise rather SPIError which is Postgres raising them) and
> it would be:
>
> # base class for exceptions raised by users in their PL/Python code
> class UserRaisedError:
> def __init__(self, msg, detail=None, hint=None, and so on for
> every accepted argument):
> self.msg = msg
> self.detail = detail
> # and so on
>
> class Error(UserRaisedError):
>  pass
>
> class Fatal(UserRaisedError):
> pass
>
> # base class for exceptions raised by Postgres when executing sql code
> class SPIError:
>  # no change to this class
>
>
I see it.

it looks like distinguish between Error and SPIError is wrong way. And I
have any other ugly use case.

If I raise a Error from one PLPythonu function, then in other PLPython
function I'll trap a SPIError exception, because the information about
class was lost inside Postgres. And it should be pretty messy.
I have not information if any exception was User based or SPI based.

The differentiation between Error and SPIError is wrong, because there
isn't any difference in reality. There are two ways.

1. break compatibility and SPIError replace by Error

2. don't introduce Error class

-- @1 is better - the SPIError isn't the best name, but breaking
compatibility is pretty unhappy - so only one solution is @2 :(


Comments, notes ??

Regards

Pavel



> >
> > a Fatal cannnot be raised - it is a session end. Personally - support of
> > Fatal level is wrong, and it should not be propagated to user level, but
> it
> > is now. And then due consistency I wrote support for fatal level. But
> hard
> > to test it.
> >
>
> I see, then never mind.
>

Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-01-11 Thread Pavel Stehule

2016-01-11 19:41 GMT+01:00 Jim Nasby :

> On 1/11/16 12:33 PM, Pavel Stehule wrote:
>
>> 1. break compatibility and SPIError replace by Error
>>
>
> At this point I've lost track... what's the incompatibility between the
> two?
>

the name and internal format (but this structure can be visible to user
space)

Pavel


> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>

Re: [HACKERS] Making plpython 2 and 3 coexist a bit better

Jim Nasby  writes:
> Something that's always concerned me about functions in other languages 
> is that any kind of snafu in the function/language can hose the backend, 
> which you may or may not detect. I've used other databases that (by 
> default) spin up a separate process for executing functions, maybe we 
> could do something like that? If we treated 2 and 3 as different 
> languages you could actually use both at the same time in a single 
> backend. The only thing that's not clear to me is how you'd be able to 
> re-enter the process during recursive/nested calls.

There's at least one PL/Java implementation that does that.  The
interprocess communication overhead is pretty awful, IIRC.  Don't know
what they do about nested calls.

> Obviously this is a lot more work than what you're proposing though. :(

Yeah.  I think what I'm suggesting is a back-patchable fix, which that
certainly wouldn't be.

I realized that the patch I put up before would do the Wrong Thing if
an updated library were loaded followed by a not-updated library for
the other python version.  Ideally that couldn't happen, but considering
that the two plpythons are likely to get packaged in separate subpackages
(certainly Red Hat does things that way), it's not too hard to envision
cases where it would.  So the attached update corrects that and fixes the
docs too.  We can lose the whole test in HEAD, but I think it's necessary
in the back branches.

The question of whether to do ERROR or FATAL remains open.  I'm not sure
I have a strong preference either way.

regards, tom lane

diff --git a/doc/src/sgml/plpython.sgml b/doc/src/sgml/plpython.sgml
index 015bbad..03f6bcc 100644
*** a/doc/src/sgml/plpython.sgml
--- b/doc/src/sgml/plpython.sgml
***
*** 176,183 
 based on Python 3 in the same session, because the symbols in the
 dynamic modules would clash, which could result in crashes of the
 PostgreSQL server process.  There is a check that prevents mixing
!Python major versions in a session, which will abort the session if
!a mismatch is detected.  It is possible, however, to use both
 PL/Python variants in the same database, from separate sessions.

--- 176,183 
 based on Python 3 in the same session, because the symbols in the
 dynamic modules would clash, which could result in crashes of the
 PostgreSQL server process.  There is a check that prevents mixing
!Python major versions within a session.
!It is possible, however, to use both
 PL/Python variants in the same database, from separate sessions.

diff --git a/src/pl/plpython/plpy_main.c b/src/pl/plpython/plpy_main.c
index 3c2ebfa..6805148 100644
*** a/src/pl/plpython/plpy_main.c
--- b/src/pl/plpython/plpy_main.c
*** static void PLy_init_interp(void);
*** 63,68 
--- 63,71 
  static PLyExecutionContext *PLy_push_execution_context(void);
  static void PLy_pop_execution_context(void);

+ /* static state for Python library conflict detection */
+ static int *plpython_version_bitmask_ptr = NULL;
+ static int	plpython_version_bitmask = 0;
  static const int plpython_python_version = PY_MAJOR_VERSION;

  /* initialize global variables */
*** static PLyExecutionContext *PLy_executio
*** 75,102 
  void
  _PG_init(void)
  {
! 	/* Be sure we do initialization only once (should be redundant now) */
! 	static bool inited = false;
  	const int **version_ptr;

! 	if (inited)
! 		return;

! 	/* Be sure we don't run Python 2 and 3 in the same session (might crash) */
  	version_ptr = (const int **) find_rendezvous_variable("plpython_python_version");
  	if (!(*version_ptr))
  		*version_ptr = _python_version;
  	else
  	{
! 		if (**version_ptr != plpython_python_version)
  			ereport(FATAL,
  	(errmsg("Python major version mismatch in session"),
  	 errdetail("This session has previously used Python major version %d, and it is now attempting to use Python major version %d.",
  			   **version_ptr, plpython_python_version),
  	 errhint("Start a new session to use a different Python major version.")));
  	}

! 	pg_bindtextdomain(TEXTDOMAIN);

  #if PY_MAJOR_VERSION >= 3
  	PyImport_AppendInittab("plpy", PyInit_plpy);
--- 78,155 
  void
  _PG_init(void)
  {
! 	int		  **bitmask_ptr;
  	const int **version_ptr;

! 	/*
! 	 * Set up a shared bitmask variable telling which Python version(s) are
! 	 * loaded into this process's address space.  If there's more than one, we
! 	 * cannot call into libpython for fear of causing crashes.  But postpone
! 	 * the actual failure for later, so that operations like pg_restore can
! 	 * load more than one plpython library so long as they don't try to do
! 	 * anything much with the language.
! 	 */
! 	bitmask_ptr = (int **) find_rendezvous_variable("plpython_version_bitmask");
! 	if (!(*bitmask_ptr))		/* am I the first? */
! 		*bitmask_ptr = _version_bitmask;

Re: [HACKERS] Making plpython 2 and 3 coexist a bit better


On 1/11/16 1:00 PM, Tom Lane wrote:

There's at least one PL/Java implementation that does that.  The
interprocess communication overhead is pretty awful, IIRC.  Don't know
what they do about nested calls.


You'd think that pipes wouldn't be that much overhead...


>Obviously this is a lot more work than what you're proposing though.:(

Yeah.  I think what I'm suggesting is a back-patchable fix, which that
certainly wouldn't be.


Yeah, and it sounds like we need one.


The question of whether to do ERROR or FATAL remains open.  I'm not sure
I have a strong preference either way.


If they both get loaded is there risk of bad data happening? Personally, 
I'll take a traceable FATAL (or even PANIC) over data corruption every 
time. But I'm guessing that if you tried to use both you'd pretty 
immediately end up crashing the backend.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] strange CREATE INDEX tab completion cases

Alvaro Herrera  writes:
> One thing I just noticed is that CREATE INDEX CONCURRENTLY cannot be
> used within CREATE SCHEMA, so perhaps the lines that match the
> CONCURRENTLY keyword should use Matches() rather than TailMatches().
> Similarly (but perhaps this is not workable) the lines that TailMatch()
> but do not Match() should not offer CONCURRENTLY after INDEX.

This seems overcomplicated.  I don't think there's any expectation that
tab completion is 100% right all the time.  Let's just treat CREATE INDEX
CONCURRENTLY the same as CREATE INDEX.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Speedup twophase transactions

On 11 January 2016 at 19:07, Andres Freund  wrote:

> On 2016-01-11 20:03:18 +0100, Andres Freund wrote:
> > More generally, I'm doubtful that the approach of reading data from WAL
> > as proposed here is a very good idea. It seems better to "just" dump the
> > entire 2pc state into *one* file at checkpoint time.
>
> Or better: After determining the checkpoint redo location, insert a WAL
> record representing the entire 2PC state as of that moment. That way it
> can easily restored during WAL replay and nothing special has to be done
> on a standby. This way we'll need no extra wal flushes and fsyncs.
>

Feel free to submit a patch that does that.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Speedup twophase transactions

On 11 January 2016 at 18:51, Jesper Pedersen 
wrote:

> On 01/10/2016 04:15 AM, Simon Riggs wrote:
>
>> One concern that come into my mind while reading updated
>>> patch is about creating extra bool field in GlobalTransactionData
>>> structure. While this improves readability, it
>>> also increases size of that structure and that size have impact on
>>> performance on systems with many cores
>>> (say like 60-80). Probably one byte will not make measurable difference,
>>> but I think it is good idea to keep
>>> GXact as small as possible. As far as I understand the same logic was
>>> behind split of
>>> PGPROC to PGPROC+PGXACT in 9.2 (comment in proc.h:166)
>>>
>>
>>
>> I think padding will negate the effects of the additional bool.
>>
>> If we want to reduce the size of the array GIDSIZE is currently 200, but
>> XA
>> says maximum 128 bytes.
>>
>> Anybody know why that is set to 200?
>>
>>
> Even though GlobalTransactionId and BranchQualifer have a maximum of 64
> each, external clients may choose to encode the information, and thereby
> need more space,
>
>
> https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/xa/RecoveredXid.java#L66-L70
>
> http://docs.oracle.com/javaee/7/api/javax/transaction/xa/Xid.html
>
> which in this case adds up to a maximum of 189 characters.
>

OK, thanks for those references.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] strange CREATE INDEX tab completion cases

Peter Eisentraut wrote:
> On 12/13/15 9:16 AM, Michael Paquier wrote:
> > Please see the attached to address those things (and others) with
> > extra fixes for a couple of comments.
> 
> I have ported these changes to the new world order and divided them up
> into more logical changes that are more clearly documented.  Please
> check that this matches what you had in mind.

One thing I just noticed is that CREATE INDEX CONCURRENTLY cannot be
used within CREATE SCHEMA, so perhaps the lines that match the
CONCURRENTLY keyword should use Matches() rather than TailMatches().
Similarly (but perhaps this is not workable) the lines that TailMatch()
but do not Match() should not offer CONCURRENTLY after INDEX.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Speedup twophase transactions

On 2016-01-11 19:15:23 +, Simon Riggs wrote:
> On 11 January 2016 at 19:03, Andres Freund  wrote:
> 
> > Hi,
> >
> > On 2016-01-09 15:29:11 +, Simon Riggs wrote:
> > > Hmm, I was just preparing this for commit.
> >
> > Just read downthread that you want to commit this soon. Please hold of
> > for a while, this doesn't really look ready to me. I don't have time for
> > a real review right now, but I'll try to get to it asap.
> >
> 
> "A real review"? Huh.

All I meant was that my email didn't consist out of a real review, but
just was a quick scan

> > More generally, I'm doubtful that the approach of reading data from WAL
> > as proposed here is a very good idea. It seems better to "just" dump the
> > entire 2pc state into *one* file at checkpoint time.
> >
> 
>  I think you misunderstand the proposed approach. This isn't just to do
> with reading things back at checkpoint.

Sure, the main purpose is not to write 2pc state files in the common
path - or is that not the main purpose?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: Extending the HyperLogLog API a bit

Tomas Vondra wrote:

> Attached is v2 of the patch, adding the comments.

Looks pretty reasonable to me.  I'm not sure we want to push this ahead
of the bloom filter stuff, but it looks ready to commit otherwise.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Speedup twophase transactions

On 11 January 2016 at 18:43, Simon Riggs  wrote:

> I'm looking to commit what we have now.
>

Here is the patch in its "final" state after my minor additions, edits and
review.

Performance tests for me show that the patch is effective; my results match
Jesper's roughly in relative numbers.

My robustness review is that the approach and implementation are safe.

It's clear there are various additional tuning opportunities, but the
objective of the current patch to improve performance is very, very clearly
met, so I'm aiming to commit *this* patch soon.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

2pc_optimize.v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Speedup twophase transactions

On January 11, 2016 8:57:58 PM GMT+01:00, Simon Riggs  
wrote:
>On 11 January 2016 at 18:43, Simon Riggs  wrote:

>It's clear there are various additional tuning opportunities, but the
>objective of the current patch to improve performance is very, very
>clearly
>met, so I'm aiming to commit *this* patch soon.

Again, the WAL read routine used doesn't deal with timeline changes. So no,  
it's bit ready to be committed.

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] proposal: PL/Pythonu - function ereport

On 1/11/16 12:46 PM, Pavel Stehule wrote:

2016-01-11 19:41 GMT+01:00 Jim Nasby >:

On 1/11/16 12:33 PM, Pavel Stehule wrote:

1. break compatibility and SPIError replace by Error

At this point I've lost track... what's the incompatibility between
the two?

the name and internal format (but this structure can be visible to user
space)

Were Error and Fatal ever documented as classes? All I see is "raise
plpy.Error(msg) and raise plpy.Fatal(msg) are equivalent to calling
plpy.error and plpy.fatal, respectively." which doesn't lead me to
believe I should be trapping on those.

It's not clear to me why you'd want to handle error and fatal
differently anyway; an error is an error. Unless fatal isn't supposed to
be trappable? [1] leads me to believe that you shouldn't be able to trap
a FATAL because it's supposed to cause the entire session to abort.

Since spiexceptions and SPIError are the only documented exceptions
classes, I'd say we should stick with those and get rid of the others.
Worst-case, we can have a compatability GUC, but I think plpy.Error and
plpy.Fatal were just poorly thought out.

[1]
http://www.postgresql.org/docs/9.5/static/runtime-config-logging.html#RUNTIME-CONFIG-SEVERITY-LEVELS

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Speedup twophase transactions

On 11 January 2016 at 19:03, Andres Freund  wrote:

> Hi,
>
> On 2016-01-09 15:29:11 +, Simon Riggs wrote:
> > Hmm, I was just preparing this for commit.
>
> Just read downthread that you want to commit this soon. Please hold of
> for a while, this doesn't really look ready to me. I don't have time for
> a real review right now, but I'll try to get to it asap.
>

"A real review"? Huh.


> > +
> > +/*
> > + * Reads 2PC data from xlog. During checkpoint this data will be moved
> to
> > + * twophase files and ReadTwoPhaseFile should be used instead.
> > + */
> > +static void
> > +XlogReadTwoPhaseData(XLogRecPtr lsn, char **buf, int *len)
> > +{
> > + XLogRecord *record;
> > + XLogReaderState *xlogreader;
> > + char   *errormsg;
> > +
> > + xlogreader = XLogReaderAllocate(_read_local_xlog_page,
> > NULL);
>
> logical_read_local_xlog_page isn't really suitable for the use
> here. Besides the naming issue, at the very least it'll be wrong during
> WAL replay in the presence of promotions on an upstream node - it
> doesn't dealwith timelines.
>

I'm aware of that, though note that it isn't being used in that way here.


> More generally, I'm doubtful that the approach of reading data from WAL
> as proposed here is a very good idea. It seems better to "just" dump the
> entire 2pc state into *one* file at checkpoint time.
>

 I think you misunderstand the proposed approach. This isn't just to do
with reading things back at checkpoint.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] checkpointer continuous flushing

On 2016-01-11 14:45:16 +0100, Andres Freund wrote:
> On 2016-01-09 16:49:56 +0100, Fabien COELHO wrote:
> > >Hm. New theory: The current flush interface does the flushing inside
> > >FlushBuffer()->smgrwrite()->mdwrite()->FileWrite()->FlushContextSchedule().
> > > The
> > >problem with that is that at that point we (need to) hold a content lock
> > >on the buffer!
> > 
> > You are worrying that FlushBuffer is holding a lock on a buffer and the
> > "sync_file_range" call occurs is issued at that moment.
> > 
> > Although I agree that it is not that good, I would be surprise if that was
> > the explanation for a performance regression, because the sync_file_range
> > with the chosen parameters is an async call, it "advises" the OS to send the
> > file, but it does not wait for it to be completed.
> 
> I frequently see sync_file_range blocking - it waits till it could
> submit the writes into the io queues. On a system bottlenecked on IO
> that's not always possible immediately.
> 
> > Also, maybe you could answer a question I had about the performance
> > regression you observed, I could not find the post where you gave the
> > detailed information about it, so that I could try reproducing it: what are
> > the exact settings and conditions (shared_buffers, pgbench scaling, host
> > memory, ...), what is the observed regression (tps? other?), and what is the
> > responsiveness of the database under the regression (eg % of seconds with 0
> > tps for instance, or something like that).
> 
> I measured it in a different number of cases, both on SSDs and spinning
> rust. I just reproduced it with:
> 
> postgres-ckpt14 \
> -D /srv/temp/pgdev-dev-800/ \
> -c maintenance_work_mem=2GB \
> -c fsync=on \
> -c synchronous_commit=off \
> -c shared_buffers=2GB \
> -c wal_level=hot_standby \
> -c max_wal_senders=10 \
> -c max_wal_size=100GB \
> -c checkpoint_timeout=30s
> 
> Using a fresh cluster each time (copied from a "template" to save time)
> and using
> pgbench -M prepared -c 16 -j16 -T 300 -P 1
> I get
> 
> My laptop 1 EVO 840, 1 i7-4800MQ, 16GB ram:
> master:
> scaling factor: 800
> query mode: prepared
> number of clients: 16
> number of threads: 16
> duration: 300 s
> number of transactions actually processed: 1155733
> latency average: 4.151 ms
> latency stddev: 8.712 ms
> tps = 3851.242965 (including connections establishing)
> tps = 3851.725856 (excluding connections establishing)
> 
> ckpt-14 (flushing by backends disabled):
> scaling factor: 800
> query mode: prepared
> number of clients: 16
> number of threads: 16
> duration: 300 s
> number of transactions actually processed: 855156
> latency average: 5.612 ms
> latency stddev: 7.896 ms
> tps = 2849.876327 (including connections establishing)
> tps = 2849.912015 (excluding connections establishing)

Hm. I think I have an entirely different theory that might explain some
of this theory. I instrumented lwlocks to check for additional blocking
and found some. Admittedly not exactly where I thought it might
be. Check out what you can observe when adding/enabling an elog in
FlushBuffer() (and the progress printing from BufferSync()):

(sorry, a bit long, but it's necessary to understand)

[2016-01-11 20:15:02 CET][14957] CONTEXT:  writing block 0 of relation 
base/13000/16387
to_scan: 131141, scanned: 6, %processed: 0.00, %writeouts: 100.00
[2016-01-11 20:15:02 CET][14957] LOG:  xlog flush request 1F/D2FD7E0; write 
1F/D296000; flush 1F/D296000; insert: 1F/D33B418
[2016-01-11 20:15:02 CET][14957] CONTEXT:  writing block 2 of relation 
base/13000/16387
to_scan: 131141, scanned: 7, %processed: 0.01, %writeouts: 100.00
[2016-01-11 20:15:02 CET][14957] LOG:  xlog flush request 1F/D3B2E30; write 
1F/D33C000; flush 1F/D33C000; insert: 1F/D403198
[2016-01-11 20:15:02 CET][14957] CONTEXT:  writing block 3 of relation 
base/13000/16387
to_scan: 131141, scanned: 9, %processed: 0.01, %writeouts: 100.00
[2016-01-11 20:15:02 CET][14957] LOG:  xlog flush request 1F/D469990; write 
1F/D402000; flush 1F/D402000; insert: 1F/D4FDD00
[2016-01-11 20:15:02 CET][14957] CONTEXT:  writing block 5 of relation 
base/13000/16387
to_scan: 131141, scanned: 11, %processed: 0.01, %writeouts: 100.00
[2016-01-11 20:15:02 CET][14957] LOG:  xlog flush request 1F/D5663E8; write 
1F/D4FC000; flush 1F/D4FC000; insert: 1F/D5D1390
[2016-01-11 20:15:02 CET][14957] CONTEXT:  writing block 7 of relation 
base/13000/16387
to_scan: 131141, scanned: 14, %processed: 0.01, %writeouts: 100.00
[2016-01-11 20:15:02 CET][14957] LOG:  xlog flush request 1F/D673700; write 
1F/D5D; flush 1F/D5D; insert: 1F/D687E58
[2016-01-11 20:15:02 CET][14957] CONTEXT:  writing block 10 of relation 
base/13000/16387
to_scan: 131141, scanned: 15, %processed: 0.01, %writeouts: 100.00
[2016-01-11 20:15:02 CET][14957] LOG:  xlog flush request 1F/D76BEC8; write 
1F/D686000; flush 1F/D686000; insert: 1F/D7A83A0
[2016-01-11 20:15:02 CET][14957] CONTEXT:

Re: [HACKERS] Making plpython 2 and 3 coexist a bit better