Re: [HACKERS] Erroneous cost estimation for nested loop join

2015-11-09 Thread Tom Lane
kawami...@tkl.iis.u-tokyo.ac.jp writes:
>   - cost parameter calibration: random_page_cost = 92.89

TBH, you lost me there already.  I know of no hardware on which that would
be a sane depiction of reality, so I think you've probably overfitted the
model to some particular case it was already inaccurate on.  Any results
you're getting using this setting will likely fall into the category of
"garbage in, garbage out".

What led you to choose that number?

regards, tom lane




Re: [HACKERS] Some questions about the array.

2015-11-09 Thread Pavel Stehule
2015-11-09 14:44 GMT+01:00 YUriy Zhuravlev :

> On Monday 09 November 2015 13:50:20 Pavel Stehule wrote:
> > New symbols increase a complexity of our code and our documentation.
> >
> > If some functionality can be implemented via functions without
> performance
> > impacts, we should not to create new operators or syntax - mainly for
> > corner use cases.
> >
> > Regards
> >
> > Pavel
>
> Ok we can use {:} instead [:] for zero array access.
> The function is the solution half.
>

It isn't a solution. Any syntax/behavior change has to have a stronger
motivation. That discussion should have happened 20 years ago :(

Regards

Pavel


>
> --
> YUriy Zhuravlev
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
>


Re: [HACKERS] Some questions about the array.

2015-11-09 Thread Pavel Stehule
2015-11-09 13:50 GMT+01:00 Pavel Stehule :

>
>
> 2015-11-09 13:32 GMT+01:00 YUriy Zhuravlev :
>
>> On Monday 09 November 2015 13:29:30 you wrote:
>> > It is ugly, but you can wrap it to function - so still I don't see any
>> > reason, why it is necessary
>> For example, I'm writing a lot of queries by hands...
>> This functionality is available in many languages and it's just
>> convenient. Of
>> course it is possible and without it, but why?
>>
>
> New symbols increase a complexity of our code and our documentation.
>
> If some functionality can be implemented via functions without performance
> impacts, we should not to create new operators or syntax - mainly for
> corner use cases.
>

I can understand that the current system of accessing array data has some
disadvantages - the behavior was designed twenty years ago, when arrays
didn't support NULLs - but I am not sure that introducing a secondary
accessing system helps. Probably not.

Regards

Pavel


>
> Regards
>
> Pavel
>
>
>
>>
>> Thanks.
>> --
>> YUriy Zhuravlev
>> Postgres Professional: http://www.postgrespro.com
>> The Russian Postgres Company
>>
>>
>>
>
>


Re: [HACKERS] Some questions about the array.

2015-11-09 Thread Pavel Stehule
2015-11-09 13:07 GMT+01:00 YUriy Zhuravlev :

> On Monday 09 November 2015 12:48:54 you wrote:
> > I am sorry - it is looking pretty obscure. Really need this feature?
>
> IMHO yes.
> Now for write: array[~2:~-2] you need like:
> array[array_lower(array, 1)+3: array_upper(array, 1)-2]
>
> Worse when long names. Besides the extra functions calls.
>

It is ugly, but you can wrap it in a function - so I still don't see any
reason why it is necessary.

Regards

Pavel



>
> Thanks.
> --
> YUriy Zhuravlev
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
>


Re: [HACKERS] Some questions about the array.

2015-11-09 Thread Vitaly Burovoy
On 11/9/15, YUriy Zhuravlev  wrote:
> On Monday 09 November 2015 12:48:54 you wrote:
>> I am sorry - it is looking pretty obscure. Really need this feature?
>
> IMHO yes.
> Now for write: array[~2:~-2] you need like:
> array[array_lower(array, 1)+3: array_upper(array, 1)-2]
>
> Worse when long names. Besides the extra functions calls.

You can write it as a separate function instead of changing the current syntax.
A call would look like:
SELECT slice_abs('[-3:3]={1,2,3,4,5,6,7}'::int[], 2, -2)   == {3,4,5,6}
SELECT slice_abs('[-3:3]={1,2,3,4,5,6,7}'::int[], 2, NULL) == {3,4,5,6,7}  -- omitting boundaries
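A minimal sketch of how such a slice_abs() could be written today as a plain
SQL function (hypothetical, not part of any patch; the offset semantics are
only inferred from the two example calls above):

CREATE OR REPLACE FUNCTION slice_abs(arr anyarray, lo int, hi int)
RETURNS anyarray LANGUAGE sql IMMUTABLE AS $$
    -- positive offsets count from array_lower(), negative offsets from
    -- array_upper(); NULL means "up to the corresponding bound"
    SELECT arr[
        CASE WHEN lo IS NULL THEN array_lower(arr, 1)
             WHEN lo >= 0    THEN array_lower(arr, 1) + lo
             ELSE                 array_upper(arr, 1) + lo + 1 END
        :
        CASE WHEN hi IS NULL THEN array_upper(arr, 1)
             WHEN hi >= 0    THEN array_lower(arr, 1) + hi
             ELSE                 array_upper(arr, 1) + hi + 1 END];
$$;

SELECT slice_abs('[-3:3]={1,2,3,4,5,6,7}'::int[], 2, -2);    -- {3,4,5,6}
SELECT slice_abs('[-3:3]={1,2,3,4,5,6,7}'::int[], 2, NULL);  -- {3,4,5,6,7}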
-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] Some questions about the array.

2015-11-09 Thread Pavel Stehule
2015-11-09 13:32 GMT+01:00 YUriy Zhuravlev :

> On Monday 09 November 2015 13:29:30 you wrote:
> > It is ugly, but you can wrap it to function - so still I don't see any
> > reason, why it is necessary
> For example, I'm writing a lot of queries by hands...
> This functionality is available in many languages and it's just
> convenient. Of
> course it is possible and without it, but why?
>

New symbols increase the complexity of our code and our documentation.

If some functionality can be implemented via functions without a performance
impact, we should not create new operators or syntax - especially for
corner use cases.

Regards

Pavel



>
> Thanks.
> --
> YUriy Zhuravlev
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
>


Re: [HACKERS] Some questions about the array.

2015-11-09 Thread YUriy Zhuravlev
On Monday 09 November 2015 13:29:30 you wrote:
> It is ugly, but you can wrap it to function - so still I don't see any
> reason, why it is necessary
For example, I write a lot of queries by hand...
This functionality is available in many languages and it's just convenient. Of
course it is possible to do without it, but why?

Thanks.
-- 
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Some questions about the array.

2015-11-09 Thread Pavel Stehule
2015-11-09 13:38 GMT+01:00 YUriy Zhuravlev :

> On Monday 09 November 2015 04:33:28 you wrote:
> > You can write it as a separate function instead of changing current
> syntax.
> I do not think, because we have a multi-dimensional arrays.
> And why we have [:] syntax now?
>

A custom implementation of ":" can have a performance impact. It is better
on 9.5 now, but previously it was impossible to implement it effectively
in plpgsql.

Regards

Pavel


> --
> YUriy Zhuravlev
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
>


Re: [HACKERS] proposal: PL/Pythonu - function ereport

2015-11-09 Thread Pavel Stehule
2015-11-05 7:24 GMT+01:00 Catalin Iacob :

> On Wed, Nov 4, 2015 at 10:12 AM, Pavel Stehule 
> wrote:
> > It helped me lot of, thank you
>
> Welcome, I learned quite some from the process as well.
>
> >>
> >>
> >> There's just the doc part left then.
> >
> >
> > done
>
> We're almost there but not quite.
>
> There's still a typo in the docs: excation.
>
> A plpy.SPIError can be raised should be A
> plpy.SPIError can be raised right?
>
> And most importantly, for Python 3.5 there is a plpython_error_5.out
> which is needed because of an alternative exception message in that
> version. You didn't update this file, this makes the tests fail on
> Python3.5.
>
> Since you might not have Python 3.5 easily available I've attached a
> patch to plpython_error_5.out which makes the tests pass, you can fold
> this into your patch.
>

I needed to understand the support for Python 3.5.

The patch with the fix for the Python 3.5 regression tests is attached.

Regards

Pavel
commit f3ab75afe0d9b31118d461a346e3c56ceb09c238
Author: Pavel Stehule 
Date:   Sat Oct 17 20:11:56 2015 +0200

inital

diff --git a/doc/src/sgml/plpython.sgml b/doc/src/sgml/plpython.sgml
index 015bbad..51bc48e 100644
--- a/doc/src/sgml/plpython.sgml
+++ b/doc/src/sgml/plpython.sgml
@@ -1205,6 +1205,24 @@ $$ LANGUAGE plpythonu;
 approximately the same functionality

   
+
+  
+   Raising Errors
+
+   
+A plpy.SPIError can be raised from PL/Python, the constructor accepts
+keyword parameters:
+plpy.SPIError([ message [, detail [, hint [, sqlstate  [, schema  [, table  [, column  [, datatype  [, constraint ]).
+   
+   
+An example of raising custom exception could be written as:
+
+DO $$
+  raise plpy.SPIError('custom message', hint = 'It is test only');
+$$ LANGUAGE plpythonu;
+
+   
+  
  
 
  
diff --git a/src/pl/plpython/expected/plpython_error.out b/src/pl/plpython/expected/plpython_error.out
index 1f52af7..435a5c2 100644
--- a/src/pl/plpython/expected/plpython_error.out
+++ b/src/pl/plpython/expected/plpython_error.out
@@ -422,3 +422,65 @@ EXCEPTION WHEN SQLSTATE 'SILLY' THEN
 	-- NOOP
 END
 $$ LANGUAGE plpgsql;
+/* the possibility to set fields of custom exception
+ */
+DO $$
+raise plpy.SPIError('This is message text.',
+detail = 'This is detail text',
+hint = 'This is hint text.')
+$$ LANGUAGE plpythonu;
+ERROR:  plpy.SPIError: This is message text.
+DETAIL:  This is detail text
+HINT:  This is hint text.
+CONTEXT:  Traceback (most recent call last):
+  PL/Python anonymous code block, line 4, in 
+hint = 'This is hint text.')
+PL/Python anonymous code block
+\set VERBOSITY verbose
+DO $$
+raise plpy.SPIError('This is message text.',
+detail = 'This is detail text',
+hint = 'This is hint text.',
+sqlstate = 'SILLY',
+schema = 'any info about schema',
+table = 'any info about table',
+column = 'any info about column',
+datatype = 'any info about datatype',
+constraint = 'any info about constraint')
+$$ LANGUAGE plpythonu;
+ERROR:  SILLY: plpy.SPIError: This is message text.
+DETAIL:  This is detail text
+HINT:  This is hint text.
+CONTEXT:  Traceback (most recent call last):
+  PL/Python anonymous code block, line 10, in 
+constraint = 'any info about constraint')
+PL/Python anonymous code block
+SCHEMA NAME:  any info about schema
+TABLE NAME:  any info about table
+COLUMN NAME:  any info about column
+DATATYPE NAME:  any info about datatype
+CONSTRAINT NAME:  any info about constraint
+LOCATION:  PLy_elog, plpy_elog.c:122
+\set VERBOSITY default
+DO $$
+raise plpy.SPIError(detail = 'This is detail text')
+$$ LANGUAGE plpythonu;
+ERROR:  plpy.SPIError: 
+DETAIL:  This is detail text
+CONTEXT:  Traceback (most recent call last):
+  PL/Python anonymous code block, line 2, in 
+raise plpy.SPIError(detail = 'This is detail text')
+PL/Python anonymous code block
+DO $$
+raise plpy.SPIError();
+$$ LANGUAGE plpythonu;
+ERROR:  plpy.SPIError: 
+CONTEXT:  Traceback (most recent call last):
+  PL/Python anonymous code block, line 2, in 
+raise plpy.SPIError();
+PL/Python anonymous code block
+DO $$
+raise plpy.SPIError(sqlstate = 'wrong sql state');
+$$ LANGUAGE plpythonu;
+ERROR:  could not create SPIError object (invalid SQLSTATE code)
+CONTEXT:  PL/Python anonymous code block
diff --git a/src/pl/plpython/expected/plpython_error_5.out b/src/pl/plpython/expected/plpython_error_5.out
index 5ff46ca..8788bde 100644
--- a/src/pl/plpython/expected/plpython_error_5.out
+++ b/src/pl/plpython/expected/plpython_error_5.out
@@ -422,3 +422,65 @@ EXCEPTION WHEN SQLSTATE 'SILLY' THEN
 	-- NOOP
 END
 $$ LANGUAGE plpgsql;
+/* the possibility to set fields of custom exception
+ */
+DO $$
+raise plpy.SPIError('This is message text.',
+

Re: [HACKERS] Some questions about the array.

2015-11-09 Thread YUriy Zhuravlev
On Monday 09 November 2015 13:50:20 Pavel Stehule wrote:
> New symbols increase a complexity of our code and our documentation.
> 
> If some functionality can be implemented via functions without performance
> impacts, we should not to create new operators or syntax - mainly for
> corner use cases.
> 
> Regards
> 
> Pavel

OK, we can use {:} instead of [:] for zero-based array access.
A function is only half a solution.

-- 
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Some questions about the array.

2015-11-09 Thread YUriy Zhuravlev
On Monday 09 November 2015 04:33:28 you wrote:
> You can write it as a separate function instead of changing current syntax.
I do not think so, because we have multi-dimensional arrays.
And why do we have the [:] syntax now?
-- 
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2015-11-09 Thread Andres Freund
On 2015-11-09 11:54:59 -0500, Jesper Pedersen wrote:
> Hi,
> 
> On 11/06/2015 03:47 PM, Jesper Pedersen wrote:
> >>Did you initdb between tests? Pgbench -i? Restart the database?
> >
> >I didn't initdb / pgbench -i between the tests, so that it is likely it.
> >
> 
> Each graph has a full initdb + pgbench -i cycle now.

That looks about as we'd expect: the lock-free pinning doesn't matter
and asynchronous commit is beneficial. I think our bottlenecks in write
workloads are sufficiently elsewhere that it's unlikely that buffer pins
make a lot of difference.

You could try a readonly pgbench workload (i.e. -S), to see whether a
difference is visible there. For a pgbench -S workload it's more likely
that you only see significant contention on larger machines. If you've a
workload that touches more cached buffers, it'd be visible earlier.

> I know, I have a brown paper bag somewhere.

Why? This looks as expected, and the issues from the previous run were
easy mistakes to make.

Greetings,

Andres Freund




[HACKERS] Multixact slru doesn't force WAL flushes in SlruPhysicalWritePage()

2015-11-09 Thread Andres Freund
Hi,

By happenstance I just had slru.h open after lunch and noticed the
following comment:
/*
 * Optional array of WAL flush LSNs associated with entries in the SLRU
 * pages.  If not zero/NULL, we must flush WAL before writing pages (true
 * for pg_clog, false for multixact, pg_subtrans, pg_notify).  group_lsn[]
 * has lsn_groups_per_page entries per buffer slot, each containing the
 * highest LSN known for a contiguous group of SLRU entries on that slot's
 * page.
 */
XLogRecPtr *group_lsn;
int lsn_groups_per_page;

Uhm. multixacts historically didn't need to follow the
write-WAL-before-data rule because it was zapped at restart. But it's
now persistent.

There are no comments about this choice anywhere in multixact.c, leading
me to believe that this was not an intentional decision.

Right now I think the "missing" flushes are fairly unlikely to cause
problems in practice. Mostly because the multixact SLRUs are essentially
append only.

But I'd like some more brains to think about potential danger. If we
decide it's ok, let's update the comments.

Greetings,

Andres Freund




[HACKERS] storage/buffer/README docs about buffer replacement are out of date

2015-11-09 Thread Andres Freund
Hi,

$subject contains:

> The "clock hand" is a buffer index, nextVictimBuffer, that moves circularly
> through all the available buffers.  nextVictimBuffer is protected by the
> buffer_strategy_lock.
> 
> The algorithm for a process that needs to obtain a victim buffer is:
> 
> 1. Obtain buffer_strategy_lock.
> 
> 2. If buffer free list is nonempty, remove its head buffer.  Release
> buffer_strategy_lock.  If the buffer is pinned or has a nonzero usage count,
> it cannot be used; ignore it and go back to step 1.  Otherwise, pin the buffer,
> and return it.
> 
> 3. Otherwise, the buffer free list is empty.  Select the buffer pointed to by
> nextVictimBuffer, and circularly advance nextVictimBuffer for next time.
> Release buffer_strategy_lock.
> 
> 4. If the selected buffer is pinned or has a nonzero usage count, it cannot
> be used.  Decrement its usage count (if nonzero), reacquire
> buffer_strategy_lock, and return to step 3 to examine the next buffer.
> 
> 5. Pin the selected buffer, and return.


This is currently inaccurate on several levels:

a) nextVictimBuffer isn't protected by the buffer_strategy_lock
   anymore.
b) The buffer free list is first checked unlocked - which 2) above
   doesn't document.
c) The buffer isn't actually returned pinned - instead it's kept locked.

Now a) and b) are recent oversights of mine. I'd apparently not realized
that there's detailed docs on this in buffer/README. But c) is pretty
old - essentially 5d50873 from 2005.

I wonder if it's worthwhile to go into that level of detail - seems
kinda likely to get out of date, as evidenced by it being outdated for
~10 years.

If we want to keep it, how about something like:

The "clock hand" is a buffer index, nextVictimBuffer, that moves
circularly through all the available buffers.  nextVictimBuffer is
incremented atomically. The wraparound handling is protected by
buffer_strategy_lock.

The algorithm for a process that needs to obtain a victim buffer is:

1. Check, without a lock, whether the buffer free list is empty. If
so, jump to step four.

2. Obtain buffer_strategy_lock, check whether the free list is still
non-empty, and if so remove the head buffer. Release
buffer_strategy_lock. If the list is empty jump to step four.

3. If the buffer is pinned or has a nonzero usage count, it cannot be
used; go back to step 2. Otherwise, return the buffer with the header
spinlock held.

4. The buffer free list is empty.  Select the buffer pointed to by
nextVictimBuffer, and circularly advance nextVictimBuffer for next time.

5. If the selected buffer is pinned or has a nonzero usage count, it
cannot be used.  Decrement its usage count (if nonzero), and return to
step 4 to examine the next buffer.

6. Return the buffer with the header spinlock held.

Greetings,

Andres Freund




Re: [HACKERS] bootstrap pg_shseclabel in relcache initialization

2015-11-09 Thread Tom Lane
Adam Brightwell  writes:
>>> Need to bump this #define?  If you didn't get the error that this is
>>> supposed to throw, perhaps there's a bug somewhere worth investigating.

> Whoops!  It must be getting late... updated patch attached.

I'm with Alvaro: the most interesting question here is why that mistake
did not blow up on you immediately.  I thought we had enough safeguards
in place to catch this type of error.

regards, tom lane




Re: [HACKERS] CustomScan support on readfuncs.c

2015-11-09 Thread Kouhei Kaigai
> On Sat, Nov 7, 2015 at 6:38 AM, Kouhei Kaigai  wrote:
> > Are you suggesting to pass the object name on software build time?
> 
> Yes.  That's what test_shm_mq and worker_spi already do:
> 
> sprintf(worker.bgw_library_name, "test_shm_mq");
>
OK, I ripped out the portion around dfmgr.c.

> > If so, it may load incorrect libraries when DBA renamed its filename.
> > At least, PostgreSQL cannot prevent DBA to rename library filenames
> > even if they try to deploy "pg_strom.so", "pg_strom.20151106.so" and
> > "pg_strom.20151030_bug.so" on same directory.
> 
> I agree.  But that's not a new problem that needs to be solved by this
> patch.  It already exists - see above.
>
It still may be a potential problem, even if it is a corner case.
I'll try to implement an internal API to know "what is my name".

The attached main patch (custom-scan-on-readfuncs.v3.patch) removes
TextOutCustomScan, so all the serialized tokens become known to the
core backend, and adds _readCustomScan() in readfuncs.c.
It deserializes CustomScan nodes from cstring tokens and, in particular,
reloads the pointer to the CustomScanMethods table using a pair of
library name and symbol name.
Thus, it also requires custom scan providers to keep the methods
table visible to external objects.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 






Re: [HACKERS] clearing opfuncid vs. parallel query

2015-11-09 Thread Noah Misch
On Thu, Oct 22, 2015 at 05:14:29PM -0400, Robert Haas wrote:
> On Thu, Oct 22, 2015 at 5:09 PM, Tom Lane  wrote:
> > Robert Haas  writes:
> >> Despite everybody's best efforts, we mess this up more than is really
> >> desirable. Commit b8fe12a83622b350dc6849f8bb933bd8a86c1424 fixed bugs
> >> in a whole bunch of preceding commits, and I wonder if anybody else
> >> would have found those if Noah hadn't.  It's just too easy to miss
> >> these things today.  If generating the .c files isn't practical,
> >> another alternative worth exploring would be a tool to cross-check
> >> them against the .h files.

FWIW, such a tool found the bugs I fixed in 53bbc68, b5eccae, b8fe12a.  My
script generates {copy,equal,out,read}funcs.c functions from the headers and
diffs each function against its hand-maintained counterpart.  I read through
the diff for anything suspicious.  (Most functions with deliberate nonstandard
behavior carry a comment about it.)

> > Yeah, I could get on board with that.  It doesn't seem terribly hard:
> > just make sure that all fields mentioned in the struct declaration are
> > mentioned in each relevant routine.  (Cases where we intentionally skip
> > a field could be handled by adding a no-op macro, eg "COPY_IGNORE(foo);")
> >
> > It would be nice if we could also check that the macro type is sane for
> > the field datatype (eg, not use COPY_SCALAR_FIELD() for a pointer field).
> > But that would probably increase the difficulty very substantially for
> > just a small gain in error detection.
> 
> I actually think that's quite an easy mistake to make, so I'd be happy
> if we could catch it.  But anything would be better than nothing.

So far, type mismatch defects have been no less common than neglecting a field
entirely.




Re: [HACKERS] bootstrap pg_shseclabel in relcache initialization

2015-11-09 Thread Adam Brightwell
>> @@ -3365,6 +3370,8 @@ RelationCacheInitializePhase3(void)
>>   AuthIdRelationId);
>>   load_critical_index(AuthMemMemRoleIndexId,
>>   AuthMemRelationId);
>> + load_critical_index(SharedSecLabelObjectIndexId,
>> + 
>> SharedSecLabelRelationId);
>>
>>  #define NUM_CRITICAL_SHARED_INDEXES 5/* fix if you change list 
>> above */
>>
>
> Need to bump this #define?  If you didn't get the error that this is
> supposed to throw, perhaps there's a bug somewhere worth investigating.

Hmm... I thought that I had, are you not seeing the following change?

-#define NUM_CRITICAL_SHARED_RELS    3    /* fix if you change list above */
+#define NUM_CRITICAL_SHARED_RELS    4    /* fix if you change list above */

-Adam




Re: [HACKERS] bootstrap pg_shseclabel in relcache initialization

2015-11-09 Thread Andres Freund
On 2015-11-09 23:38:57 -0500, Adam Brightwell wrote:
> >> @@ -3365,6 +3370,8 @@ RelationCacheInitializePhase3(void)
> >>   AuthIdRelationId);
> >>   load_critical_index(AuthMemMemRoleIndexId,
> >>   AuthMemRelationId);
> >> + load_critical_index(SharedSecLabelObjectIndexId,
> >> + 
> >> SharedSecLabelRelationId);
> >>
> >>  #define NUM_CRITICAL_SHARED_INDEXES 5/* fix if you change list 
> >> above */
> >>
> >
> > Need to bump this #define?  If you didn't get the error that this is
> > supposed to throw, perhaps there's a bug somewhere worth investigating.
> 
> Hmm... I thought that I had, are you not seeing the following change?
> 
> -#define NUM_CRITICAL_SHARED_RELS3/* fix if you change list above */
> +#define NUM_CRITICAL_SHARED_RELS4/* fix if you change list above */

NUM_CRITICAL_SHARED_RELS != NUM_CRITICAL_SHARED_INDEXES




Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-11-09 Thread Amit Langote
On 2015/10/29 23:22, Syed, Rahila wrote:
> 
> Please find attached an updated patch.
> 

Thanks for the v6. A few quick comments:

- duplicate_oids error in HEAD.

- a compiler warning:

pgstat.c:2898: warning: no previous prototype for ‘pgstat_reset_activityflag’

To fix that use void for empty parameter list -

-extern void pgstat_reset_activityflag();
+extern void pgstat_reset_activityflag(void);

One more change you could do is 's/activityflag/activity_flag/g', which I
guess would be in line with the usual naming convention.

Thanks,
Amit





Re: [HACKERS] bootstrap pg_shseclabel in relcache initialization

2015-11-09 Thread Adam Brightwell
>> >>  #define NUM_CRITICAL_SHARED_INDEXES 5/* fix if you change list 
>> >> above */
>> >>
>> >
>> > Need to bump this #define?  If you didn't get the error that this is
>> > supposed to throw, perhaps there's a bug somewhere worth investigating.
>>
>> Hmm... I thought that I had, are you not seeing the following change?
>>
>> -#define NUM_CRITICAL_SHARED_RELS3/* fix if you change list above */
>> +#define NUM_CRITICAL_SHARED_RELS4/* fix if you change list above */
>
> NUM_CRITICAL_SHARED_RELS != NUM_CRITICAL_SHARED_INDEXES

Whoops!  It must be getting late... updated patch attached.

-Adam
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 9c3d096..b701d82 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -51,6 +51,7 @@
 #include "catalog/pg_opclass.h"
 #include "catalog/pg_proc.h"
 #include "catalog/pg_rewrite.h"
+#include "catalog/pg_shseclabel.h"
 #include "catalog/pg_tablespace.h"
 #include "catalog/pg_trigger.h"
 #include "catalog/pg_type.h"
@@ -98,6 +99,7 @@ static const FormData_pg_attribute Desc_pg_database[Natts_pg_database] = {Schema
 static const FormData_pg_attribute Desc_pg_authid[Natts_pg_authid] = {Schema_pg_authid};
 static const FormData_pg_attribute Desc_pg_auth_members[Natts_pg_auth_members] = {Schema_pg_auth_members};
 static const FormData_pg_attribute Desc_pg_index[Natts_pg_index] = {Schema_pg_index};
+static const FormData_pg_attribute Desc_pg_shseclabel[Natts_pg_shseclabel] = {Schema_pg_shseclabel};
 
 /*
  *		Hash tables that index the relation cache
@@ -3187,13 +3189,14 @@ RelationCacheInitialize(void)
 /*
  *		RelationCacheInitializePhase2
  *
- *		This is called to prepare for access to shared catalogs during startup.
- *		We must at least set up nailed reldescs for pg_database, pg_authid,
- *		and pg_auth_members.  Ideally we'd like to have reldescs for their
- *		indexes, too.  We attempt to load this information from the shared
- *		relcache init file.  If that's missing or broken, just make phony
- *		entries for the catalogs themselves.  RelationCacheInitializePhase3
- *		will clean up as needed.
+ *		This is called to prepare for access to shared catalogs during
+ *		startup.  We must at least set up nailed reldescs for
+ *		pg_database, pg_authid, pg_auth_members, and pg_shseclabel.
+ *		Ideally we'd like to have reldescs for their indexes, too.  We
+ *		attempt to load this information from the shared relcache init
+ *		file.  If that's missing or broken, just make phony entries for
+ *		the catalogs themselves.  RelationCacheInitializePhase3 will
+ *		clean up as needed.
  */
 void
 RelationCacheInitializePhase2(void)
@@ -3229,8 +3232,10 @@ RelationCacheInitializePhase2(void)
   true, Natts_pg_authid, Desc_pg_authid);
 		formrdesc("pg_auth_members", AuthMemRelation_Rowtype_Id, true,
   false, Natts_pg_auth_members, Desc_pg_auth_members);
+		formrdesc("pg_shseclabel", SharedSecLabelRelation_Rowtype_Id, true,
+  false, Natts_pg_shseclabel, Desc_pg_shseclabel);
 
-#define NUM_CRITICAL_SHARED_RELS	3	/* fix if you change list above */
+#define NUM_CRITICAL_SHARED_RELS	4	/* fix if you change list above */
 	}
 
 	MemoryContextSwitchTo(oldcxt);
@@ -3365,8 +3370,10 @@ RelationCacheInitializePhase3(void)
 			AuthIdRelationId);
 		load_critical_index(AuthMemMemRoleIndexId,
 			AuthMemRelationId);
+		load_critical_index(SharedSecLabelObjectIndexId,
+			SharedSecLabelRelationId);
 
-#define NUM_CRITICAL_SHARED_INDEXES 5	/* fix if you change list above */
+#define NUM_CRITICAL_SHARED_INDEXES 6	/* fix if you change list above */
 
 		criticalSharedRelcachesBuilt = true;
 	}
diff --git a/src/include/catalog/pg_shseclabel.h b/src/include/catalog/pg_shseclabel.h
index 0ff41f3..d8334bf 100644
--- a/src/include/catalog/pg_shseclabel.h
+++ b/src/include/catalog/pg_shseclabel.h
@@ -18,9 +18,10 @@
  *		typedef struct FormData_pg_shseclabel
  * 
  */
-#define SharedSecLabelRelationId		3592
+#define SharedSecLabelRelationId			3592
+#define SharedSecLabelRelation_Rowtype_Id	4066
 
-CATALOG(pg_shseclabel,3592) BKI_SHARED_RELATION BKI_WITHOUT_OIDS
+CATALOG(pg_shseclabel,3592) BKI_SHARED_RELATION BKI_ROWTYPE_OID(4066) BKI_WITHOUT_OIDS BKI_SCHEMA_MACRO
 {
 	Oid			objoid;			/* OID of the shared object itself */
 	Oid			classoid;		/* OID of table containing the shared object */
@@ -31,6 +32,8 @@ CATALOG(pg_shseclabel,3592) BKI_SHARED_RELATION BKI_WITHOUT_OIDS
 #endif
 } FormData_pg_shseclabel;
 
+typedef FormData_pg_shseclabel *Form_pg_shseclabel;
+
 /* 
  *		compiler constants for pg_shseclabel
  * 



Re: [HACKERS] Using quicksort for every external sort run

2015-11-09 Thread Corey Huinker
On Fri, Nov 6, 2015 at 8:08 PM, Peter Geoghegan  wrote:

> On Wed, Aug 19, 2015 at 7:24 PM, Peter Geoghegan  wrote:
> > I'll start a new thread for this, since my external sorting patch has
> > now evolved well past the original "quicksort with spillover"
> > idea...although not quite how I anticipated it would. It seems like
> > I've reached a good point to get some feedback.
>
> Corey Huinker has once again assisted me with this work, by doing some
> benchmarking on an AWS instance of his:
>
> 32 cores (c3.8xlarge, I suppose)
> MemTotal:   251902912 kB
>
> I believe it had one EBS volume.
>
> This testing included 2 data sets:
>
> * A data set that he happens to have that is representative of his
> production use-case. Corey had some complaints about the sort
> performance of PostgreSQL, particularly prior to 9.5, and I like to
> link any particular performance optimization to an improvement in an
> actual production workload, if at all possible.
>
> * A tool that I wrote, that works on top of sortbenchmark.org's
> "gensort" [1] data generation tool. It seems reasonable to me to drive
> this work in part with a benchmark devised by Jim Gray. He did after
> all receive a Turing award for this contribution to transaction
> processing. I'm certainly a fan of his work. A key practical advantage
> of that is that it has reasonable guarantees about determinism, making
> these results relatively easy to recreate independently.
>
> The modified "gensort" is available from
> https://github.com/petergeoghegan/gensort
>
> The python script postgres_load.py, which performs bulk-loading for
> Postgres using COPY FREEZE. It ought to be fairly self-documenting:
>
> $:~/gensort$ ./postgres_load.py --help
> usage: postgres_load.py [-h] [-w WORKERS] [-m MILLION] [-s] [-l] [-c]
>
> optional arguments:
>   -h, --helpshow this help message and exit
>   -w WORKERS, --workers WORKERS
> Number of gensort workers (default: 4)
>   -m MILLION, --million MILLION
> Generate n million tuples (default: 100)
>   -s, --skewSkew distribution of output keys (default: False)
>   -l, --logged  Use logged PostgreSQL table (default: False)
>   -c, --collate Use default collation rather than C collation
> (default: False)
>
> For this initial report to the list, I'm going to focus on a case
> involving 16 billion non-skewed tuples generated using the gensort
> tool. I wanted to see how a sort of a ~1TB table (1017GB as reported
> by psql, actually) could be improved, as compared to relatively small
> volumes of data (in the multiple gigabyte range) that were so improved
> by sorts on my laptop, which has enough memory to avoid blocking on
> physical I/O much of the time. How the new approach deals with
> hundreds of runs that are actually reasonably sized is also of
> interest. This server does have a lot of memory, and many CPU cores.
> It was kind of underpowered on I/O, though.
>
> The initial load of 16 billion tuples (with a sortkey that is "C"
> locale text) took about 10 hours. My tool supports parallel generation
> of COPY format files, but serial performance of that stage isn't
> especially fast. Further, in order to support COPY FREEZE, and in
> order to ensure perfect determinism, the COPY operations occur
> serially in a single transaction that creates the table that we
> performed a CREATE INDEX on.
>
> Patch, with 3GB maintenance_work_mem:
>
> ...
> LOG:  performsort done (except 411-way final merge): CPU
> 1017.95s/17615.74u sec elapsed 23910.99 sec
> STATEMENT:  create index on sort_test (sortkey );
> LOG:  external sort ended, 54740802 disk blocks used: CPU
> 2001.81s/31395.96u sec elapsed 41648.05 sec
> STATEMENT:  create index on sort_test (sortkey );
>
> So just over 11 hours (11:34:08), then. The initial sorting for 411
> runs took 06:38:30.99, as you can see.
>
> Master branch:
>
> ...
> LOG:  finished writing run 202 to tape 201: CPU 1224.68s/31060.15u sec
> elapsed 34409.16 sec
> LOG:  finished writing run 203 to tape 202: CPU 1230.48s/31213.55u sec
> elapsed 34580.41 sec
> LOG:  finished writing run 204 to tape 203: CPU 1236.74s/31366.63u sec
> elapsed 34750.28 sec
> LOG:  performsort starting: CPU 1241.70s/31501.61u sec elapsed 34898.63 sec
> LOG:  finished writing run 205 to tape 204: CPU 1242.19s/31516.52u sec
> elapsed 34914.17 sec
> LOG:  finished writing final run 206 to tape 205: CPU
> 1243.23s/31564.23u sec elapsed 34963.03 sec
> LOG:  performsort done (except 206-way final merge): CPU
> 1243.86s/31570.58u sec elapsed 34974.08 sec
> LOG:  external sort ended, 54740731 disk blocks used: CPU
> 2026.98s/48448.13u sec elapsed 55299.24 sec
> CREATE INDEX
> Time: 55299315.220 ms
>
> So 15:21:39 for master -- it's much improved, but this was still
> disappointing given the huge improvements on relatively small cases.
>
> Finished index was fairly large, which can be 

Re: [HACKERS] Multixid hindsight design

2015-11-09 Thread Robert Haas
On Mon, Nov 9, 2015 at 2:02 AM, Craig Ringer  wrote:
> On 25 June 2015 at 00:52, Robert Haas  wrote:
>> I agree that we can do much better at testing than we traditionally
>> have done, and I think pretty much everyone in the room for the
>> developer unconference session on testing at PGCon was also in
>> agreement with that.  I really like the idea of taking purpose-built
>> testing frameworks - like the one that Heikki created for the WAL
>> format changes - and polishing them to the point where they can go
>> into core.  That's more work, of course, but very beneficial.
>
> Something that'd really help with that, IMO, would be weakening the
> requirement that everything be C or be core Perl. Instead require that
> configure detect whether or not facilities for some tests are present,
> and have them fail with a clean warning indicating they were skipped
> for lack of pre-requisites at 'make' time.
>
> I don't see that as significantly different to having some buildfarm
> animals not bother to configure or test the PLs, SSL, etc. I
> understand why adding to the mix required for the core server build
> isn't acceptable, but hopefully separate test suites can be more
> flexible. A free-for-all of languages and tools doesn't make sense,
> but I'd like to see, say, python and the 'unittest' module added, and
> to see acceptance of tests using psycopg2 to stream and decode WAL
> from a slot.
>
> Thoughts?

I actually kind of hate this sort of thing.  It's already the case
that some of the TAP tests are skipped if you don't have the
prerequisites, and while that solves the problem of spurious test
failures, it also means that the tests which have those dependencies
are run in fewer and fewer places.

Now I'm not dead set against changing anything at all here, but I want
to point out that we just added the TAP framework and already there
are proposals (like Kevin's snapshot too old patch) to require DBD::Pg
for some tests, which a lot of people aren't going to have, and I know
you and probably a few other people would like python, which is fair
enough, but other people are going to want other things, and pretty
soon we end up with a situation - if we're not already there -
where for a developer or packager to get a machine to the point where
it runs "all the tests" is a major undertaking.

Accepting more tools also increases the breadth of knowledge expected
of committers and patch authors.  If I commit a patch and it turns out
something breaks, I'm expected to fix that.  Now if it's in C or Perl,
I can.  If it's in Python, I can't.  If we add Python to the list of
things we test with, there will (I'm sure) be other committers who can
fix the C and Python stuff, but have no clue about the Perl.  The more
frameworks we support, the worse this gets.  And woe betide a patch
author whose feature requires adjustment of existing tests - if that
person actually notices the problem before commit, as opposed to the
buildfarm finding it, that person now needs to learn enough of every
language in which we support tests to update the existing tests and
add any new ones which are needed.  I'm starting to get the hang of
isolation tests, but I still don't *really* understand TAP tests,
really, and anything new we add is a whole new obstacle.

I think this is sort of an inherent problem with long-running
projects.  Everybody knows, for example, that the toolchain we're
using to build our documentation is pretty crufty.  But if we switched
to something else that used a different input format, then every
back-patchable doc fix would have to be done twice, once for the old
toolchain and once for the new, which means that at least some people
would have to understand both, for a good 5-6 years.  We'd also have
to get that new toolchain working on every platform we support, and
all of our packagers would have to switch to it, and of course we'd
want it to support all the output formats we have today.  Yet, if we
were starting over, I think there's about zero chance we'd pick this
approach.  It's just that switching now is a job and a half, and we
don't want to take that on lightly.  Similarly with testing frameworks
- except those are even worse, because nobody's saying that any of the
stuff we have now will ever go away.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] eXtensible Transaction Manager API

2015-11-09 Thread Robert Haas
On Sun, Nov 8, 2015 at 6:35 PM, Michael Paquier
 wrote:
> Sure. Now imagine that the pg_twophase entry is corrupted for this
> transaction on one node. This would trigger a PANIC on it, and
> transaction would not be committed everywhere.

If the database is corrupted, there's no way to guarantee that
anything works as planned.  This is like criticizing
somebody's disaster recovery plan on the basis that it will be
inadequate if the entire planet earth is destroyed.

> I am aware of the fact
> that by definition PREPARE TRANSACTION ensures that a transaction will
> be committed with COMMIT PREPARED, just trying to see any corner cases
> with the approach proposed. The DTM approach is actually rather close
> to what a GTM in Postgres-XC does :)

Yes.  I think that we should try to learn as much as possible from the
XC experience, but that doesn't mean we should incorporate XC's fuzzy
thinking about 2PC into PG.  We should not.

One point I'd like to mention is that it's absolutely critical to
design this in a way that minimizes network roundtrips without
compromising correctness.  XC's GTM proxy suggests that they failed to
do that.  I think we really need to look at what's going to be on the
other sides of the proposed APIs and think about whether it's going to
be possible to have a strong local caching layer that keeps network
roundtrips to a minimum.  We should consider whether the need for such
a caching layer has any impact on what the APIs should look like.

For example, consider a 10-node cluster where each node has 32 cores
and 32 clients, and each client is running lots of short-running SQL
statements.  The demand for snapshots will be intense.  If every
backend separately requests a snapshot for every SQL statement from
the coordinator, that's probably going to be terrible.  We can make it
the problem of the stuff behind the DTM API to figure out a way to
avoid that, but maybe that's going to result in every DTM needing to
solve the same problems.

Another thing that I think we need to consider is fault-tolerance.
For example, suppose that transaction commit and snapshot coordination
services are being provided by a central server which keeps track of
the global commit ordering.  When that server gets hit by a freak bolt
of lightning and melted into a heap of slag, somebody else needs to
take over.  Again, this would live below the API proposed here, but I
think it really deserves some thought before we get too far down the
path.  XC didn't start thinking about how to add fault-tolerance until
quite late in the project, I think, and the same could be said of
PostgreSQL itself: some newer systems have easier-to-use fault
tolerance mechanisms because it was designed in from the beginning.
Distributed systems by nature need high availability to a far greater
degree than single systems, because when there are more nodes, node
failures are more frequent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Minor comment improvement to create_foreignscan_plan

2015-11-09 Thread Robert Haas
On Mon, Nov 9, 2015 at 5:34 AM, Etsuro Fujita
 wrote:
> Here is a small patch to update a comment in create_foreignscan_plan;
> add fdw_recheck_quals to the list of expressions that need the
> replace_nestloop_params processing.  I should have updated the comment
> when I proposed the patch for the fdw_recheck_quals.

OK, not a big deal, but thanks.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Extracting fields from 'infinity'::TIMESTAMP[TZ]

2015-11-09 Thread Torsten Zuehlsdorff



On 09.11.2015 17:41, Tom Lane wrote:
> Kevin Grittner  writes:
>> On Monday, November 9, 2015 9:37 AM, Robert Haas  wrote:
>>> That doesn't seem like enough consensus to commit this patch, which
>>> would change everything to +/-infinity.  That particular choice
>>> wouldn't bother me much, but it sounds like other people aren't sold.
>>> I think we need to try to hash that out a little more rather than
>>> rushing into a backward-incompatible change.
>
>> I agree that none of this should be back-patched.
>
> Definitely.
>
>> I agree that a timestamp[tz] of infinity should yield infinity for
>> epoch.
>
> I think everybody is sold on that much.
>
>> My first choice for other things would be NaN, but throwing an
>> error instead would be OK.
>
> Since the function hasn't thrown error for such cases in the past, making
> it do so now would likely break applications.  More than once, we've
> had to modify functions to avoid throwing errors so that you don't get
> incidental errors when blindly applying a function to all entries in a
> column.  I think going in the opposite direction would elicit protests.

An error will also break legit SQL statements.

> I could see using NaN except for one thing: it'd mean injecting a rather
> fundamental dependence on IEEE math into a basic function definition.  You
> can be just about 100% certain that if the SQL committee ever addresses
> this case, it won't be with NaN.

ACK.

> What about returning NULL for the ill-defined cases?  That seems to
> comport with SQL's notion of NULL as "unknown/undefined".

This is the first thing I would expect in such a case.

+1 for NULL.

Greetings,
Torsten
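For concreteness, a short illustration of the cases being discussed (the
comments describe the behavior proposed in this thread, not what released
versions currently return):

SELECT extract(epoch FROM 'infinity'::timestamptz);    -- proposed: infinity
SELECT extract(epoch FROM '-infinity'::timestamptz);   -- proposed: -infinity
SELECT extract(year  FROM 'infinity'::timestamptz);    -- ill-defined; NULL suggested above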




Re: [HACKERS] proposal: multiple psql option -c

2015-11-09 Thread Pavel Stehule
Hi

2015-11-05 22:23 GMT+01:00 Robert Haas :

> On Thu, Nov 5, 2015 at 3:53 PM, Catalin Iacob 
> wrote:
> > On Thu, Nov 5, 2015 at 5:27 PM, Robert Haas 
> wrote:
> >>> I wrote some text. But needs some work of native speaker.
> >>
> >> It does.  It would be nice if some kind reviewer could help volunteer
> >> to clean that up.
> >
> > I'll give it a go sometime next week.
>
> Thanks, that would be great!
>
> I recommend comparing the section on -c and the section on -C, and
> probably updating the former as well as adjusting the wording of the
> latter.  We don't want to repeat all the same details in both places,
> but we hopefully want to give people a little clue that if they're
> thinking about using -c, they may wish to instead consider -C.
>

The -g option was replaced by the -C option, plus some other required changes.

I have no idea about a good long name. For the moment I used
"multi-command". It can be changed freely.

The name of the patch stays the same (although it doesn't use "group-command"
internally anymore) for easier orientation.

Regards

Pavel


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
new file mode 100644
index 5899bb4..3ef32d2
*** a/doc/src/sgml/ref/psql-ref.sgml
--- b/doc/src/sgml/ref/psql-ref.sgml
*** EOF
*** 132,137 
--- 132,176 
  
  
  
+   -C command(s)
+   --multi-command=command(s)
+   
+   
+   Specifies that psql is to execute one or
+   more command strings, commands,
+   and then exit. This is useful in shell scripts. Start-up files
+   (psqlrc and ~/.psqlrc) are
+   ignored with this option.
+   
+ 
+   
+   There are a few differences between -c and
+   -C options. The option -c can be used
+   only once. The option -C can be used more times.
+   This can simplify writing some non trivial SQL commands. With the
+   -C option it is possible to call several psql
+   parametrized backslash commands. When you execute multiple SQL
+   commands via -c option, only result of last command
+   is returned. The execution started by -C option shows
+   result of all commands.
+   
+ 
+   
+   Another difference is in wrapping the transaction. The -c
+   option runs commands in one transaction. The -C option
+   uses autocommit mode by default. This allows running multiple commads
+   which would otherwise not be allowed to execute within one transaction.
+   This is typical for VACUUM command.
+ 
+ psql -Atq -C "VACUUM FULL foo; SELECT pg_relation_size('foo')"
+ psql -Atq -C "VACUUM FULL foo" -C "SELECT pg_relation_size('foo')"
+ 
+   
+ 
+   
+ 
+ 
+ 
-d dbname
--dbname=dbname

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
new file mode 100644
index 72c00c1..1bc20d3
*** a/src/bin/psql/command.c
--- b/src/bin/psql/command.c
*** process_file(char *filename, bool single
*** 2293,2299 
  	int			result;
  	char	   *oldfilename;
  	char		relpath[MAXPGPATH];
- 	PGresult   *res;
  
  	if (!filename)
  	{
--- 2293,2298 
*** process_file(char *filename, bool single
*** 2338,2374 
  	oldfilename = pset.inputfile;
  	pset.inputfile = filename;
  
! 	if (single_txn)
! 	{
! 		if ((res = PSQLexec("BEGIN")) == NULL)
! 		{
! 			if (pset.on_error_stop)
! 			{
! result = EXIT_USER;
! goto error;
! 			}
! 		}
! 		else
! 			PQclear(res);
! 	}
! 
! 	result = MainLoop(fd);
! 
! 	if (single_txn)
! 	{
! 		if ((res = PSQLexec("COMMIT")) == NULL)
! 		{
! 			if (pset.on_error_stop)
! 			{
! result = EXIT_USER;
! goto error;
! 			}
! 		}
! 		else
! 			PQclear(res);
! 	}
  
- error:
  	if (fd != stdin)
  		fclose(fd);
  
--- 2337,2344 
  	oldfilename = pset.inputfile;
  	pset.inputfile = filename;
  
! 	result = MainLoop(fd, single_txn);
  
  	if (fd != stdin)
  		fclose(fd);
  
*** error:
*** 2376,2383 
  	return result;
  }
  
- 
- 
  static const char *
  _align2string(enum printFormat in)
  {
--- 2346,2351 
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
new file mode 100644
index 5b63e76..7058ada
*** a/src/bin/psql/help.c
--- b/src/bin/psql/help.c
*** usage(unsigned short int pager)
*** 81,86 
--- 81,88 
  	if (!env)
  		env = user;
  	fprintf(output, _("  -c, --command=COMMANDrun only single command (SQL or internal) and exit\n"));
+ 	fprintf(output, _("  -C, --multi-command=COMMAND\n"
+ 	  "   run more multiple commands (SQL or internal) and exit\n"));
  	fprintf(output, _("  -d, --dbname=DBNAME  database name to connect to (default: \"%s\")\n"), env);
  	fprintf(output, _("  -f, --file=FILENAME  execute commands from file, then exit\n"));
  	fprintf(output, _("  -l, --list   list 

Re: [HACKERS] proposal: PL/Pythonu - function ereport

2015-11-09 Thread Catalin Iacob
On Mon, Nov 9, 2015 at 6:31 PM, Pavel Stehule  wrote:
> merged your patch

So, I just tested that version 08 is the same as the previous patch +
my change, and I already tested that on Python 2.4, 2.5, 2.6, 2.7 and
3.5.

All previous comments addressed so I'll mark this Ready for Committer.

Thanks Pavel




Re: [HACKERS] Some questions about the array.

2015-11-09 Thread Pavel Stehule
2015-11-09 17:55 GMT+01:00 Alexander Korotkov :

> On Mon, Nov 9, 2015 at 4:53 PM, Pavel Stehule 
> wrote:
>
>> 2015-11-09 14:44 GMT+01:00 YUriy Zhuravlev :
>>
>>> On Monday 09 November 2015 13:50:20 Pavel Stehule wrote:
>>> > New symbols increase a complexity of our code and our documentation.
>>> >
>>> > If some functionality can be implemented via functions without
>>> performance
>>> > impacts, we should not to create new operators or syntax - mainly for
>>> > corner use cases.
>>> >
>>> > Regards
>>> >
>>> > Pavel
>>>
>>> Ok we can use {:} instead [:] for zero array access.
>>> The function is the solution half.
>>>
>>
>> It isn't solution. The any syntax/behave change have to have stronger
>> motivation. We had so talk about it 20 years ago :(
>>
>
> Assuming array[~n] has a current meaning, could we give a try to new
> syntax which doesn't have current meaning? Not yet sure what exactly it
> could be...
>

Using this syntax can introduce compatibility issues -
http://www.postgresql.org/docs/9.1/static/sql-createoperator.html

Regards

Pavel


>
> --
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>


Re: [HACKERS] Multixid hindsight design

2015-11-09 Thread Tom Lane
Robert Haas  writes:
> On Mon, Nov 9, 2015 at 2:02 AM, Craig Ringer  wrote:
>> Something that'd really help with that, IMO, would be weakening the
>> requirement that everything be C or be core Perl. Instead require that
>> configure detect whether or not facilities for some tests are present,
>> and have them fail with a clean warning indicating they were skipped
>> for lack of pre-requisites at 'make' time.

> I actually kind of hate this sort of thing.

FWIW, I agree with Robert's position.  Expanding the infrastructure
required for tests may make life simpler for the initial author of a test,
but we pay for that over and over again when you consider the long run.

I think insisting on having e.g. DBD::Pg is simply exporting the author's
work to other people; there isn't anything that that enables that you
can't do without it.  We have added test infrastructure when there simply
wasn't any other way to test something.  Both the isolation framework and
the TAP framework represent substantial investments of that sort.  I'm
fine with both of those.  But the bar to adding new requirements ought to
be pretty darn high, and not just "I was too lazy to write it without".

regards, tom lane




Re: [HACKERS] bootstrap pg_shseclabel in relcache initialization

2015-11-09 Thread Alvaro Herrera
Adam Brightwell wrote:

> @@ -3365,6 +3370,8 @@ RelationCacheInitializePhase3(void)
>   AuthIdRelationId);
>   load_critical_index(AuthMemMemRoleIndexId,
>   AuthMemRelationId);
> + load_critical_index(SharedSecLabelObjectIndexId,
> + 
> SharedSecLabelRelationId);
>  
>  #define NUM_CRITICAL_SHARED_INDEXES 5/* fix if you change list above 
> */
>  

Need to bump this #define?  If you didn't get the error that this is
supposed to throw, perhaps there's a bug somewhere worth investigating.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] can we add SKIP LOCKED to UPDATE?

2015-11-09 Thread Jeff Janes
On Mon, Nov 9, 2015 at 9:06 AM, Tom Lane  wrote:
> 德哥 (Digoal)  writes:
>>PostgreSQL 9.5 added skip locked to select for update to improve 
>> concurrency performance, but why not add it to update sql?
>
> Seems like you'd have unpredictable results from the update then.

But with use of RETURNING, you could at least know what those results
were and so could deal with the unpredictability.

I don't understand Digoal's use case (Google Translate does a poor job
on the linked page), but this would be handy in conjunction with LIMIT
(which also doesn't exist for UPDATE right now).

update work_queue set assigned='Jeff' where assigned is null and
skills_needed <@ '{whining,"breaking things"}'  limit 1 skip locked
returning id, description

In 9.5 you will be able to do it with a subselect, but putting them
directly on the UPDATE would be easier to understand, and perhaps more
efficient to execute.
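
For illustration, the 9.5 form with a subselect would look something like
this (a sketch only, reusing the hypothetical work_queue example from above):

UPDATE work_queue
   SET assigned = 'Jeff'
 WHERE id = (SELECT id
               FROM work_queue
              WHERE assigned IS NULL
                AND skills_needed <@ '{whining,"breaking things"}'
              LIMIT 1
              FOR UPDATE SKIP LOCKED)
RETURNING id, description;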

Cheers,

Jeff




Re: [HACKERS] proposal: PL/Pythonu - function ereport

2015-11-09 Thread Pavel Stehule
2015-11-09 19:21 GMT+01:00 Catalin Iacob :

> On Mon, Nov 9, 2015 at 6:31 PM, Pavel Stehule 
> wrote:
> > merged your patch
>
> So, I just that tested version 08 is the same as the previous patch +
> my change and I already tested that on Python 2.4, 2.5, 2.6, 2.7 and
> 3.5.
>
> All previous comments addressed so I'll mark this Ready for Committer.
>

Thank you very much for help

Regards

Pavel


>
> Thanks Pavel
>


Re: [HACKERS] can we add SKIP LOCKED to UPDATE?

2015-11-09 Thread Simon Riggs
On 9 November 2015 at 17:06, Tom Lane  wrote:

> 德哥  writes:
> > PostgreSQL 9.5 added skip locked to select for update to improve
> > concurrency performance, but why not add it to update sql?
>
> Seems like you'd have unpredictable results from the update then.
>

True, but given the already restricted use case of SKIP LOCKED, the request
makes sense for the following

UPDATE ...
SKIP LOCKED
RETURNING xxx

would be better than

BEGIN

SELECT  xxx
FOR UPDATE
SKIP LOCKED

UPDATE

COMMIT

-- 
Simon Riggs                   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


[HACKERS] pg_receivexlog: spurious error message connecting to 9.3

2015-11-09 Thread Marco Nenciarini
Hi,

I was testing the compatibility of pg_receivexlog with previous PostgreSQL
versions, and I've discovered that 9.5 and 9.6, although compatible with 9.3,
print an ugly but harmless error message.

$ 9.5/bin/pg_receivexlog -D /tmp/testx -v -p 5493
pg_receivexlog: could not identify system: got 1 rows and 3 fields, expected 1 rows and 4 or more fields
column number 3 is out of range 0..2
pg_receivexlog: starting log streaming at 0/400 (timeline 1)

After the error, the streaming starts and continues without issue. The message
comes from the code that checks whether the connection is database-specific,
and that check is not needed on 9.3.

I've attached a little patch that removes the errors when connected to 9.3.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index 2c963b6..273b2cf 100644
*** a/src/bin/pg_basebackup/streamutil.c
--- b/src/bin/pg_basebackup/streamutil.c
*** RunIdentifySystem(PGconn *conn, char **s
*** 295,308 ****
  	if (db_name != NULL)
  	{
  		if (PQnfields(res) < 4)
! 			fprintf(stderr,
! 					_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d or more fields\n"),
! 					progname, PQntuples(res), PQnfields(res), 1, 4);
! 
! 		if (PQgetisnull(res, 0, 3))
! 			*db_name = NULL;
  		else
! 			*db_name = pg_strdup(PQgetvalue(res, 0, 3));
  	}
  
  	PQclear(res);
--- 295,315 ----
  	if (db_name != NULL)
  	{
  		if (PQnfields(res) < 4)
! 		{
! 			if (PQserverVersion(conn) >= 90400)
! 				fprintf(stderr,
! 						_("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d or more fields\n"),
! 						progname, PQntuples(res), PQnfields(res), 1, 4);
! 			else
! 				*db_name = NULL;
! 		}
  		else
! 		{
! 			if (PQgetisnull(res, 0, 3))
! 				*db_name = NULL;
! 			else
! 				*db_name = pg_strdup(PQgetvalue(res, 0, 3));
! 		}
  	}
  
  	PQclear(res);




Re: [HACKERS] bootstrap pg_shseclabel in relcache initialization

2015-11-09 Thread Joe Conway
On 11/08/2015 11:17 PM, Craig Ringer wrote:
> On 9 November 2015 at 12:40, Adam Brightwell
>  wrote:
>> Hi All,
>>
>> While working on an auth hook, I found that I was unable to access the
>> pg_shseclabel system table while processing the hook.  I discovered
>> that the only tables that were bootstrapped and made available at this
>> stage of the auth process were pg_database, pg_authid and
>> pg_auth_members.  Unfortunately, this is problematic if you have
>> security labels that are associated with a role which are needed to
>> determine auth decisions/actions.
>>
>> Given that the shared relations currently exposed can also have
>> security labels that can be used for auth purposes, I believe it makes
>> sense to make those available as well.  I have attached a patch that
>> adds this functionality for review/discussion.  If this functionality
>> makes sense I'll add it to the commitfest.
> 
> Your reasoning certainly makes sense to me. I'm a little surprised
> this didn't cause issues for SEPostgreSQL already.

Currently sepgsql at least does not support security labels on roles,
even though nominally postgres does. If the label provider does not
support them (as in sepgsql) you just get a "feature not supported" type
of error when trying to create the label. I'm not sure if there are any
other label providers in the wild other than sepgsql, but I should think
they would all benefit from this change.

+1 for adding to the next commitfest.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] proposal: PL/Pythonu - function ereport

2015-11-09 Thread Pavel Stehule
2015-11-09 16:46 GMT+01:00 Catalin Iacob :

> On Mon, Nov 9, 2015 at 2:58 PM, Pavel Stehule 
> wrote:
> > I needed to understand the support for Python 3.5.
> >
> > The patch with the fix is attached regress tests 3.5 Python
>
> I wanted to type a reply this morning to explain and then I noticed
> there's another file (_0.out) for Python2.4 and older (as explained by
> src/pl/plpython/expected/README) so tests there fail as well. I built
> Python 2.4.6 and then Postgres against it and updated the expected
> output for 2.4 as well. Find the changes for 2.4 attached. You can add
> those to your patch.
>

I forgot it


> While I was at it I tested 2.5 and 2.6 as well, they also pass all
> tests. So with the 2.4 changes attached I think we're done.
>

merged your patch


>
> BTW, the alternative output files for pg_regress are explained at
> http://www.postgresql.org/docs/devel/static/regress-variant.html


I didn't know it - nice, maybe I use it in orafce.

Thank you

Regards

Pavel
diff --git a/doc/src/sgml/plpython.sgml b/doc/src/sgml/plpython.sgml
new file mode 100644
index 015bbad..51bc48e
*** a/doc/src/sgml/plpython.sgml
--- b/doc/src/sgml/plpython.sgml
*** $$ LANGUAGE plpythonu;
*** 1205,1210 
--- 1205,1228 
  approximately the same functionality
 

+ 
+   
+Raising Errors
+ 
+
+ A plpy.SPIError can be raised from PL/Python, the constructor accepts
+ keyword parameters:
+ plpy.SPIError([ message [, detail [, hint [, sqlstate  [, schema  [, table  [, column  [, datatype  [, constraint ]).
+
+
+ An example of raising custom exception could be written as:
+ 
+ DO $$
+   raise plpy.SPIError('custom message', hint = 'It is test only');
+ $$ LANGUAGE plpythonu;
+ 
+
+   
   
  
   
diff --git a/src/pl/plpython/expected/plpython_error.out b/src/pl/plpython/expected/plpython_error.out
new file mode 100644
index 1f52af7..435a5c2
*** a/src/pl/plpython/expected/plpython_error.out
--- b/src/pl/plpython/expected/plpython_error.out
*** EXCEPTION WHEN SQLSTATE 'SILLY' THEN
*** 422,424 
--- 422,486 
  	-- NOOP
  END
  $$ LANGUAGE plpgsql;
+ /* the possibility to set fields of custom exception
+  */
+ DO $$
+ raise plpy.SPIError('This is message text.',
+ detail = 'This is detail text',
+ hint = 'This is hint text.')
+ $$ LANGUAGE plpythonu;
+ ERROR:  plpy.SPIError: This is message text.
+ DETAIL:  This is detail text
+ HINT:  This is hint text.
+ CONTEXT:  Traceback (most recent call last):
+   PL/Python anonymous code block, line 4, in 
+ hint = 'This is hint text.')
+ PL/Python anonymous code block
+ \set VERBOSITY verbose
+ DO $$
+ raise plpy.SPIError('This is message text.',
+ detail = 'This is detail text',
+ hint = 'This is hint text.',
+ sqlstate = 'SILLY',
+ schema = 'any info about schema',
+ table = 'any info about table',
+ column = 'any info about column',
+ datatype = 'any info about datatype',
+ constraint = 'any info about constraint')
+ $$ LANGUAGE plpythonu;
+ ERROR:  SILLY: plpy.SPIError: This is message text.
+ DETAIL:  This is detail text
+ HINT:  This is hint text.
+ CONTEXT:  Traceback (most recent call last):
+   PL/Python anonymous code block, line 10, in 
+ constraint = 'any info about constraint')
+ PL/Python anonymous code block
+ SCHEMA NAME:  any info about schema
+ TABLE NAME:  any info about table
+ COLUMN NAME:  any info about column
+ DATATYPE NAME:  any info about datatype
+ CONSTRAINT NAME:  any info about constraint
+ LOCATION:  PLy_elog, plpy_elog.c:122
+ \set VERBOSITY default
+ DO $$
+ raise plpy.SPIError(detail = 'This is detail text')
+ $$ LANGUAGE plpythonu;
+ ERROR:  plpy.SPIError: 
+ DETAIL:  This is detail text
+ CONTEXT:  Traceback (most recent call last):
+   PL/Python anonymous code block, line 2, in 
+ raise plpy.SPIError(detail = 'This is detail text')
+ PL/Python anonymous code block
+ DO $$
+ raise plpy.SPIError();
+ $$ LANGUAGE plpythonu;
+ ERROR:  plpy.SPIError: 
+ CONTEXT:  Traceback (most recent call last):
+   PL/Python anonymous code block, line 2, in 
+ raise plpy.SPIError();
+ PL/Python anonymous code block
+ DO $$
+ raise plpy.SPIError(sqlstate = 'wrong sql state');
+ $$ LANGUAGE plpythonu;
+ ERROR:  could not create SPIError object (invalid SQLSTATE code)
+ CONTEXT:  PL/Python anonymous code block
diff --git a/src/pl/plpython/expected/plpython_error_0.out b/src/pl/plpython/expected/plpython_error_0.out
new file mode 100644
index 5323906..19b1a9b
*** a/src/pl/plpython/expected/plpython_error_0.out
--- b/src/pl/plpython/expected/plpython_error_0.out
*** EXCEPTION WHEN SQLSTATE 'SILLY' THEN
*** 422,424 
--- 422,486 
  	-- NOOP
  END
  $$ LANGUAGE plpgsql;
+ /* the possibility to set 

Re: [HACKERS] Extracting fields from 'infinity'::TIMESTAMP[TZ]

2015-11-09 Thread Robert Haas
On Sat, Nov 7, 2015 at 9:47 AM, Vitaly Burovoy  wrote:
> I'd like to raise a topic about extracting fields from infinite
> timestamps, so much more that it is mentioned in the TODO list:
> "Determine how to represent date/time field extraction on infinite
> timestamps".
>
> Currently extracting any field from 'infinity'::TIMESTAMP[TZ] gives
> result "0" as a mark it has "special" input value.
>
> The most confusing case is 'epoch' field: returning "0" from
> "infinity" means the same thing as returning "0" from "1970-01-01+00".
>
> Returning zero in most other cases is only slightly less confusing
> (may be because for me they are less often used).
> For example, what about "SELECT EXTRACT(DOW FROM TIMESTAMP
> 'Infinity')" with result 0, as if it is Sunday?
> The same thing with fields: decade, hour, minute, seconds,
> microseconds, milliseconds, timezone, timezone_hour, timezone_minute.
> Also for "millennium" and "year" (with the note "Keep in mind there is
> no 0 AD") current returning value is _between_ allowed values, but
> disallowed.
> http://www.postgresql.org/docs/9.5/static/functions-datetime.html
>
>
> There was a discussion ended in nothing. It began at:
> http://www.postgresql.org/message-id/ca+mi_8bda-fnev9ixeubnqhvacwzbyhhkwoxpqfbca9edpp...@mail.gmail.com
>
> Discussants agreed change is necessary, but couldn't decide what
> behavior is preferred: throwing an error or returning NULL, NaN or +/-
> infinity.
>
> My thoughts about that cases:
> * Throwing an error: prefer to avoid it according to
> http://www.postgresql.org/message-id/73a5666e-2d40-457e-9dff-248895db7...@gmail.com
> * NULL: it is "absence of any value", i.e. it could be returned iff
> input value is NULL (in the other case it is not better than returning
> 0).
> * NaN: it could be returned if the value is outside the current axis (like
> complex value), but it is not the case.
>
> In a parallel discussion ("converting between infinity timestamp and
> float8 (epoch)")
> http://www.postgresql.org/message-id/cadakt-icuesh16ulocxbr-dkpcvwtuje4jwxnkdajaawp6j...@mail.gmail.com
> There was interesting thought to make difference between monotonic
> values (century, decade, epoch, isoyear, millennium and year) and
> oscillating values (day, dow, doy, hour, isodow, microseconds,
> milliseconds, minute, month, quarter, second and week).
> An argument is for monotonic values +/- infinity has a sense, but not
> for oscillating ones.
> But for oscillating values NULL was proposed, that (IMHO) is not a
> good idea (see above).
> I think changing current mark "input value is not finite" allows an
> app layer (which knows which field it tries to fetch from
> timestamp[tz]) to handle extracted value correctly. For oscillating
> values there can be the same values as for monotonic values, because
> you can't mix them up.
> The end of the parallel discussion (with the most important thoughts)
> at http://www.postgresql.org/message-id/4efcfd1c.8040...@archidevsys.co.nz
>
> So I think +/- infinity is the best returning value for all fields.
>
> The attached patch contains changes in timestamp_part and
> timestamptz_part and tests for them.
>
> I doubt whether it can be backpatched (according to team's rules) or
> not, but the patch can be applied down to 9.2 without conflicts and
> passes tests.
> Unfortunately, on 9.1 proposed test fails because "SELECT
> EXTRACT(EPOCH FROM DATE '1970-01-01')" gives "28800" instead of "0".
> Before 9.2 it was time zone-related.
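
To make the quoted complaint concrete, the confusing cases look like this
(a sketch; the outputs shown are the ones described above, not a proposal):

    select extract(epoch from timestamptz 'infinity');                -- 0
    select extract(epoch from timestamptz '1970-01-01 00:00:00+00');  -- also 0
    select extract(dow   from timestamp 'infinity');                  -- 0, as if Sunday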

We're definitely not going to back-patch this.  Let's tally up the
votes on that other thread:

Daniele Varrazzo: infinity
Bruce Momjian: infinity
Robert Haas: not sure we want to change anything, but if so let's
definitely NOT throw an error
Alvaro Herrera: infinity for epoch, but what about other things?
Brendan Jurd: infinity for epoch, error for other things
Tom Lane: infinity for epoch, error or NaN for other things
Josh Berkus: definitely change something, current behavior sucks

That doesn't seem like enough consensus to commit this patch, which
would change everything to +/-infinity.  That particular choice
wouldn't bother me much, but it sounds like other people aren't sold.
I think we need to try to hash that out a little more rather than
rushing into a backward-incompatible change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: PL/Pythonu - function ereport

2015-11-09 Thread Catalin Iacob
On Mon, Nov 9, 2015 at 2:58 PM, Pavel Stehule  wrote:
> I needed to understand the support for Python 3.5.
>
> The patch with the fix is attached regress tests 3.5 Python

I wanted to type a reply this morning to explain and then I noticed
there's another file (_0.out) for Python2.4 and older (as explained by
src/pl/plpython/expected/README) so tests there fail as well. I built
Python 2.4.6 and then Postgres against it and updated the expected
output for 2.4 as well. Find the changes for 2.4 attached. You can add
those to your patch.

While I was at it I tested 2.5 and 2.6 as well, they also pass all
tests. So with the 2.4 changes attached I think we're done.

BTW, the alternative output files for pg_regress are explained at
http://www.postgresql.org/docs/devel/static/regress-variant.html


adjust_py24_expected.patch
Description: binary/octet-stream



Re: [HACKERS] Getting sorted data from foreign server for merge join

2015-11-09 Thread Robert Haas
On Sat, Nov 7, 2015 at 5:42 PM, Greg Stark  wrote:
> On Fri, Nov 6, 2015 at 4:54 AM, Ashutosh Bapat
>  wrote:
>> PFA patch to get data sorted from the foreign server (postgres_fdw)
>> according to the pathkeys useful for merge join.
>
> An idle thought. There are going to be a lot of cases where different
> software systems actually disagree about collation rules. I wonder if
> it would be valuable to have a node that just checks that each row is
> in fact greater than the previous row and throws an error if not. That
> can be optional or a parameter of the FDW but it's probably cheap
> enough to have enabled by default. It would save a lot of difficult to
> heartache since the behaviour if the inputs aren't correctly sorted
> will be strangely incorrect join results. Often the results may just
> be missing or duplicated rows and that can be easy to miss and lead to
> corrupted databases or security problems.

It's not a bad thought, but it could come up even locally - we've had
more than one situation where indexes have gotten corrupted by
updating glibc.  The new glibc doesn't agree with the old one on what
the collation ordering is, and so the indexes are wrong with respect
to the new glibc version.  If we were going to design something like
this, rather than making it a separate node, I'd be inclined to create
it as a C-callable function that could be invoked anywhere we want to
check that the ordering is valid.  I suspect you're wrong about the
cost, though: I bet it wouldn't be too hard to find cases where it
imposes a really noticeable penalty.
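
Just to illustrate the shape of what I mean, such a check could be written on
top of the existing sortsupport machinery, roughly like this (a sketch only;
the function name is made up):

    /* Complain if cur sorts before prev under the ordering encoded in ssup. */
    static void
    check_ordering(Datum prev, bool prev_isnull,
                   Datum cur, bool cur_isnull,
                   SortSupport ssup)
    {
        if (ApplySortComparator(prev, prev_isnull, cur, cur_isnull, ssup) > 0)
            elog(ERROR, "rows are not in the expected sort order");
    }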

Also, to be clear, I don't think this patch needs to solve that problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Re: [COMMITTERS] pgsql: Modify tqueue infrastructure to support transient record types.

2015-11-09 Thread Robert Haas
On Mon, Nov 9, 2015 at 4:06 AM, Amit Kapila  wrote:
> On Sat, Nov 7, 2015 at 3:29 AM, Robert Haas  wrote:
>> Modify tqueue infrastructure to support transient record types.
>
> I am getting below compiler warning on Windows:
> tqueue.c(662): warning C4715: 'TupleQueueRemap' : not all control paths
> return a value

Well, that sucks.  Apparently, your Windows compiler thinks
elog(ERROR, ...) might return.  Apparently
b853eb97182079dcd30b4f52576bd5d6c275ee71 wasn't good enough for your
compiler; I wonder why not.

> Attached patch fixes the problem.

I don't want to do that, because adding a default: case to the switch
will prevent the compiler from complaining if somebody introduces
another enum value in the future.  Instead, we can just add a dummy
return at the end of the function.  I'll go do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Some questions about the array.

2015-11-09 Thread Alexander Korotkov
On Mon, Nov 9, 2015 at 4:53 PM, Pavel Stehule 
wrote:

> 2015-11-09 14:44 GMT+01:00 YUriy Zhuravlev :
>
>> On Monday 09 November 2015 13:50:20 Pavel Stehule wrote:
>> > New symbols increase a complexity of our code and our documentation.
>> >
>> > If some functionality can be implemented via functions without
>> performance
>> > impacts, we should not to create new operators or syntax - mainly for
>> > corner use cases.
>> >
>> > Regards
>> >
>> > Pavel
>>
>> Ok we can use {:} instead [:] for zero array access.
>> The function is the solution half.
>>
>
> It isn't solution. The any syntax/behave change have to have stronger
> motivation. We had so talk about it 20 years ago :(
>

Assuming array[~n] has a current meaning, could we give a try to new syntax
which doesn't have current meaning? Not yet sure what exactly it could be...

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2015-11-09 Thread Jesper Pedersen

Hi,

On 11/06/2015 03:47 PM, Jesper Pedersen wrote:

Did you initdb between tests? Pgbench -i? Restart the database?


I didn't initdb / pgbench -i between the tests, so that it is likely it.



Each graph has a full initdb + pgbench -i cycle now.

I know, I have a brown paper bag somewhere.

Best regards,
 Jesper





Re: [HACKERS] CustomScan support on readfuncs.c

2015-11-09 Thread Robert Haas
On Sat, Nov 7, 2015 at 6:38 AM, Kouhei Kaigai  wrote:
> Are you suggesting to pass the object name on software build time?

Yes.  That's what test_shm_mq and worker_spi already do:

sprintf(worker.bgw_library_name, "test_shm_mq");

> If so, it may load incorrect libraries when DBA renamed its filename.
> At least, PostgreSQL cannot prevent DBA to rename library filenames
> even if they try to deploy "pg_strom.so", "pg_strom.20151106.so" and
> "pg_strom.20151030_bug.so" on same directory.

I agree.  But that's not a new problem that needs to be solved by this
patch.  It already exists - see above.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Extracting fields from 'infinity'::TIMESTAMP[TZ]

2015-11-09 Thread Kevin Grittner
On Monday, November 9, 2015 9:37 AM, Robert Haas  wrote:
> On Sat, Nov 7, 2015 at 9:47 AM, Vitaly Burovoy  
> wrote:

>> I'd like to raise a topic about extracting fields from infinite
>> timestamps, so much more that it is mentioned in the TODO list:
>> "Determine how to represent date/time field extraction on infinite
>> timestamps".
>>
>> Currently extracting any field from 'infinity'::TIMESTAMP[TZ] gives
>> result "0" as a mark it has "special" input value.
>>
>> The most confusing case is 'epoch' field: returning "0" from
>> "infinity" means the same thing as returning "0" from "1970-01-01+00".
>>
>> Returning zero in most other cases is only slightly less confusing
>> (may be because for me they are less often used).
>> For example, what about "SELECT EXTRACT(DOW FROM TIMESTAMP
>> 'Infinity')" with result 0, as if it is Sunday?
>> The same thing with fields: decade, hour, minute, seconds,
>> microseconds, milliseconds, timezone, timezone_hour, timezone_minute.
>> Also for "millennium" and "year" (with the note "Keep in mind there is
>> no 0 AD") current returning value is _between_ allowed values, but
>> disallowed.

> We're definitely not going to back-patch this.  Let's tally up the
> votes on that other thread:
>
> Danielle Varrazzo: infinity
> Bruce Momjian: infinity
> Robert Haas: not sure we want to change anything, but if so let's
> definitely NOT throw an error
> Alvaro Herrera: infinity for epoch, but what about other things?
> Brendan Jurd: infinity for epoch, error for other things
> Tom Lane: infinity for epoch, error or NaN for other things
> Josh Berkus: definitely change something, current behavior sucks
>
> That doesn't seem like enough consensus to commit this patch, which
> would change everything to +/-infinity.  That particular choice
> wouldn't bother me much, but it sounds like other people aren't sold.
> I think we need to try to hash that out a little more rather than
> rushing into a backward-incompatible change.

I agree that none of this should be back-patched.

I agree that a timestamp[tz] of infinity should yield infinity for
epoch.

My first choice for other things would be NaN, but throwing an
error instead would be OK.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Extracting fields from 'infinity'::TIMESTAMP[TZ]

2015-11-09 Thread Kevin Grittner
> On Mon, Nov 9, 2015 at 8:22 AM, Kevin Grittner  wrote:
>> My first choice for other things would be NaN, but throwing an
>> error instead would be OK.


On Monday, November 9, 2015 10:41 AM, Tom Lane  wrote:
> What about returning NULL for the ill-defined cases?  That seems
> to comport with SQL's notion of NULL as "unknown/undefined".


On Monday, November 9, 2015 10:44 AM, Steve Crawford 
 wrote:
> Given that null is a "special value that is used to indicate the
> absence of any data value" and that attributes like month or
> day-of-week will have no value for a date of infinity I'd be OK
> with returning null.


NULL seems clearly better than NaN or an error; I wish that had
occurred to me before I posted.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] can we add SKIP LOCKED to UPDATE?

2015-11-09 Thread 德哥


Hi,
   PostgreSQL 9.5 added SKIP LOCKED to SELECT FOR UPDATE to improve concurrency,
but why not add it to UPDATE as well?
   Here is an application case: several sessions may update the same tuple at
the same time, so the response time for the waiters is long. Using a function
with SELECT FOR UPDATE NOWAIT or an advisory lock can improve concurrency, but
it's not very clean for the developer.
 http://blog.163.com/digoal@126/blog/static/16387704020158149538415/


--
Charity is a lifelong commitment. I'm Digoal, Just Do It.


Re: [HACKERS] can we add SKIP LOCKED to UPDATE?

2015-11-09 Thread Tom Lane
德哥  writes:
> PostgreSQL 9.5 added skip locked to select for update to improve
> concurrency performance, but why not add it to update sql?

Seems like you'd have unpredictable results from the update then.

regards, tom lane




Re: [HACKERS] Extracting fields from 'infinity'::TIMESTAMP[TZ]

2015-11-09 Thread Steve Crawford
I was unaware that we had +- infinity for numeric.

select pg_typeof(extract(epoch from current_date));
    pg_typeof
------------------
 double precision

Given that null is a "special value that is used to indicate the absence of
any data value" and that attributes like month or day-of-week will have no
value for a date of infinity I'd be OK with returning null.

I suppose the real question is what return value will cause the smallest
amount of breakage and surprising results. Throwing an error will
definitely break legit queries.

Cheers,
Steve


On Mon, Nov 9, 2015 at 8:22 AM, Kevin Grittner  wrote:

> On Monday, November 9, 2015 9:37 AM, Robert Haas 
> wrote:
> > On Sat, Nov 7, 2015 at 9:47 AM, Vitaly Burovoy 
> wrote:
>
> >> I'd like to raise a topic about extracting fields from infinite
> >> timestamps, so much more that it is mentioned in the TODO list:
> >> "Determine how to represent date/time field extraction on infinite
> >> timestamps".
> >>
> >> Currently extracting any field from 'infinity'::TIMESTAMP[TZ] gives
> >> result "0" as a mark it has "special" input value.
> >>
> >> The most confusing case is 'epoch' field: returning "0" from
> >> "infinity" means the same thing as returning "0" from "1970-01-01+00".
> >>
> >> Returning zero in most other cases is only slightly less confusing
> >> (may be because for me they are less often used).
> >> For example, what about "SELECT EXTRACT(DOW FROM TIMESTAMP
> >> 'Infinity')" with result 0, as if it is Sunday?
> >> The same thing with fields: decade, hour, minute, seconds,
> >> microseconds, milliseconds, timezone, timezone_hour, timezone_minute.
> >> Also for "millennium" and "year" (with the note "Keep in mind there is
> >> no 0 AD") current returning value is _between_ allowed values, but
> >> disallowed.
>
> > We're definitely not going to back-patch this.  Let's tally up the
> > votes on that other thread:
> >
> > Daniele Varrazzo: infinity
> > Bruce Momjian: infinity
> > Robert Haas: not sure we want to change anything, but if so let's
> > definitely NOT throw an error
> > Alvaro Herrera: infinity for epoch, but what about other things?
> > Brendan Jurd: infinity for epoch, error for other things
> > Tom Lane: infinity for epoch, error or NaN for other things
> > Josh Berkus: definitely change something, current behavior sucks
> >
> > That doesn't seem like enough consensus to commit this patch, which
> > would change everything to +/-infinity.  That particular choice
> > wouldn't bother me much, but it sounds like other people aren't sold.
> > I think we need to try to hash that out a little more rather than
> > rushing into a backward-incompatible change.
>
> I agree that none of this should be back-patched.
>
> I agree that a timestamp[tz] of infinity should yield infinity for
> epoch.
>
> My first choice for other things would be NaN, but throwing an
> error instead would be OK.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>


Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2015-11-09 Thread Alvaro Herrera
Ildus Kurbangaliev wrote:

> Thanks for the review. I've attached a new version of SLRU patch. I've
> removed add_postfix and fixed EXEC_BACKEND case.

Thanks.

Please do not use "committs" in commit_ts.c; people didn't like the
abbreviated name without the underscore.  But then, why are we
abbreviating here?  We could keep it complete and with a space instead
of underscore, so why not use just "commit timestamp", because it's just
a string, right?

In multixact.c, is there a reason to have underscore in the strings?  We
could substitute it with a space and it'd look prettier; but really, we
could also keep those names parallel to subdirectory names by using the
already existing string parameter as name here, and not add another one.

I also imagined that the Slru's ControlLock itself would be part of the
tranche, and not just the per-buffer locks.  That requires a bit more
churn, but seems reasonable.

Why do we have two per-buffer loops in SimpleLruInit?  I mean, why not
add the LWLockInitialize call to the second one?

I'm not really up to speed on how the LWLockTranche API works -- does assigning to
tranche_name a pstrdup string work okay?  Is the pstrdup really
necessary?

>   /* Initialize our shared state struct */
> diff --git a/src/backend/access/transam/slru.c 
> b/src/backend/access/transam/slru.c
> index 90c7cf5..868b35a 100644
> --- a/src/backend/access/transam/slru.c
> +++ b/src/backend/access/transam/slru.c
> @@ -157,6 +157,8 @@ SimpleLruShmemSize(int nslots, int nlsns)
>   if (nlsns > 0)
>   sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));/* 
> group_lsn[] */
>  
> + sz += MAXALIGN(nslots * sizeof(LWLockPadded)); /* lwlocks[] */
> +
>   return BUFFERALIGN(sz) + BLCKSZ * nslots;
>  }

What is the "lwlocks[]" comment supposed to mean?  I don't think there's
a struct member with that name, is there?

Uhm, actually, why do we keep buffer_locks[] at all?  This arrangement
seems pretty odd, where if I understand correctly we have one array
which is the tranche and another array which points to each item in the
tranche ...
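
To spell out the alternative being hinted at: the per-buffer locks could live
directly in an LWLockPadded array that doubles as the tranche, roughly like
this (a sketch assuming the 9.5-era LWLockTranche API; the shared-state field
names here are illustrative, not the patch's):

    shared->lwlock_tranche.name = tranche_name;    /* e.g. "commit timestamp" */
    shared->lwlock_tranche.array_base = shared->buffer_locks;
    shared->lwlock_tranche.array_stride = sizeof(LWLockPadded);
    LWLockRegisterTranche(shared->lwlock_tranche_id, &shared->lwlock_tranche);

    for (slotno = 0; slotno < nslots; slotno++)
        LWLockInitialize(&shared->buffer_locks[slotno].lock,
                         shared->lwlock_tranche_id);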

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Getting sorted data from foreign server for merge join

2015-11-09 Thread Robert Haas
On Fri, Nov 6, 2015 at 2:03 PM, Kevin Grittner  wrote:
> Has anyone taken a close look at what happens if the two sides of
> the merge join have different implementations of the same collation
> name?  Is there anything we should do to defend against the
> problem?

The issue of FDWs vs. collations has been thought about to some degree
(see FDW_COLLATE_NONE/SAFE/UNSAFE), but I don't quite understand that
stuff in detail.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Getting sorted data from foreign server for merge join

2015-11-09 Thread Robert Haas
On Sat, Nov 7, 2015 at 12:07 PM, Corey Huinker  wrote:
> Sorry to barge in late, but I was wondering if what we've learned with this
> patch can be applied to the case of asserting a sort order on a query
> returning from dblink().

Nope.

Sorry to be the bearer of bad news, but that would be a different and
much harder development project.  Set-returning functions have no way
to indicate anything about the ordering of their return values at
present.  We could invent something, but the work involved wouldn't
have much to do with this patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-11-09 Thread Amit Kapila
On Fri, Nov 6, 2015 at 10:13 AM, Robert Haas  wrote:
>
> On Thu, Nov 5, 2015 at 10:49 PM, Amit Kapila 
wrote:
> > On Thu, Nov 5, 2015 at 11:54 PM, Robert Haas 
wrote:
> >> I was thinking about this idea:
> >>
> >> 1. Add a parallel_aware flag to each plan.
> >
> > Okay, so shall we add it in generic Plan node or to specific plan nodes
> > like SeqScan, IndexScan, etc.  To me, it appears that parallelism is
> > a node specific property, so we should add it to specific nodes and
> > for now as we are parallelising seq scan, so we can add this flag in
> > SeqScan node.  What do you say?
>
> I think it should go in the Plan node itself.  Parallel Append is
> going to need a way to test whether a node is parallel-aware, and
> there's nothing simpler than if (plan->parallel_aware).  That makes
> life simple for EXPLAIN, too.
>

Okay, I have updated the patch to make seq scan node parallel aware.
To make that happen we need to have parallel_aware flag both in Plan
as well as Path, so that we can pass that information from Path to Plan.
I think the right place to copy the parallel_aware info from path to
plan is copy_path_costsize(), and ideally we should change the name
of the function to something like copy_generic_path_info(), but for
now I have retained its original name as it is used in many places;
let me know if you think we should go ahead and change the name
of the function as well.

I have changed Explain as well so that it adds Parallel for Seq Scan if
SeqScan node is parallel_aware.

I have not integrated it with the consider-parallel patch, so that this and
the Partial Seq Scan version of the patch can be compared without much
difficulty.

Thoughts?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


parallel_seqscan_partialseqscan_v25.patch
Description: Binary data



Re: [HACKERS] Extracting fields from 'infinity'::TIMESTAMP[TZ]

2015-11-09 Thread Tom Lane
Kevin Grittner  writes:
> On Monday, November 9, 2015 9:37 AM, Robert Haas  
> wrote:
>> That doesn't seem like enough consensus to commit this patch, which
>> would change everything to +/-infinity.  That particular choice
>> wouldn't bother me much, but it sounds like other people aren't sold.
>> I think we need to try to hash that out a little more rather than
>> rushing into a backward-incompatible change.

> I agree that none of this should be back-patched.

Definitely.

> I agree that a timestamp[tz] of infinity should yield infinity for
> epoch.

I think everybody is sold on that much.

> My first choice for other things would be NaN, but throwing an
> error instead would be OK.

Since the function hasn't thrown error for such cases in the past, making
it do so now would likely break applications.  More than once, we've
had to modify functions to avoid throwing errors so that you don't get
incidental errors when blindly applying a function to all entries in a
column.  I think going in the opposite direction would elicit protests.

I could see using NaN except for one thing: it'd mean injecting a rather
fundamental dependence on IEEE math into a basic function definition.  You
can be just about 100% certain that if the SQL committee ever addresses
this case, it won't be with NaN.

What about returning NULL for the ill-defined cases?  That seems to
comport with SQL's notion of NULL as "unknown/undefined".

regards, tom lane




Re: [HACKERS] eXtensible Transaction Manager API

2015-11-09 Thread Simon Riggs
On 9 November 2015 at 18:46, Robert Haas  wrote:


> > I am aware of the fact
> > that by definition PREPARE TRANSACTION ensures that a transaction will
> > be committed with COMMIT PREPARED, just trying to see any corner cases
> > with the approach proposed. The DTM approach is actually rather close
> > to what a GTM in Postgres-XC does :)
>
> Yes.  I think that we should try to learn as much as possible from the
> XC experience, but that doesn't mean we should incorporate XC's fuzzy
> thinking about 2PC into PG.  We should not.
>

Fuzzy thinking. Please explain.

> One point I'd like to mention is that it's absolutely critical to
> design this in a way that minimizes network roundtrips without
> compromising correctness.  XC's GTM proxy suggests that they failed to
> do that.  I think we really need to look at what's going to be on the
> other sides of the proposed APIs and think about whether it's going to
> be possible to have a strong local caching layer that keeps network
> roundtrips to a minimum.  We should consider whether the need for such
> a caching layer has any impact on what the APIs should look like.
>

You mean the caching layer that already exists in XL/XC?


> For example, consider a 10-node cluster where each node has 32 cores
> and 32 clients, and each client is running lots of short-running SQL
> statements.  The demand for snapshots will be intense.  If every
> backend separately requests a snapshot for every SQL statement from
> the coordinator, that's probably going to be terrible.  We can make it
> the problem of the stuff behind the DTM API to figure out a way to
> avoid that, but maybe that's going to result in every DTM needing to
> solve the same problems.


The whole purpose of that XTM API is to allow different solutions for that
to be created. Konstantin has made a very good case for such an API to
exist, based around 3 markedly different approaches.

Whether we allow the API into core to be accessible via extensions is a
different issue, but it looks fine for its purpose.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2015-11-09 Thread Ildus Kurbangaliev

> On Nov 9, 2015, at 7:56 PM, Alvaro Herrera  wrote:
> 
> Ildus Kurbangaliev wrote:
> 
>> Thanks for the review. I've attached a new version of SLRU patch. I've
>> removed add_postfix and fixed EXEC_BACKEND case.
> 
> Thanks.
> 
> Please do not use "committs" in commit_ts.c; people didn't like the
> abbreviated name without the underscore.  But then, why are we
> abbreviating here?  We could keep it complete and with a space instead
> of underscore, so why not use just "commit timestamp", because it's just
> a string, right?
> 
> In multixact.c, is there a reason to have underscore in the strings?  We
> could substitute it with a space and it'd look prettier; but really, we
> could also keep those names parallel to subdirectory names by using the
> already existing string parameter as name here, and not add another one.

I do not insist on concrete names or a case here, but I think that identifiers
are more useful when they don't contain spaces. For example that name will be
exposed later in other places and can be part of some longer string.

> 
> Why do we have two per-buffer loops in SimpleLruInit?  I mean, why not
> add the LWLockInitialize call to the second one?

Thanks. I didn't see that.

> 
> I'm not really up to speed on how the LWLockTranche API works -- does assigning to
> tranche_name a pstrdup string work okay?  Is the pstrdup really
> necessary?

I think pstrdup can be removed here.

> 
>>  /* Initialize our shared state struct */
>> diff --git a/src/backend/access/transam/slru.c 
>> b/src/backend/access/transam/slru.c
>> index 90c7cf5..868b35a 100644
>> --- a/src/backend/access/transam/slru.c
>> +++ b/src/backend/access/transam/slru.c
>> @@ -157,6 +157,8 @@ SimpleLruShmemSize(int nslots, int nlsns)
>>  if (nlsns > 0)
>>  sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));/* 
>> group_lsn[] */
>> 
>> +sz += MAXALIGN(nslots * sizeof(LWLockPadded)); /* lwlocks[] */
>> +
>>  return BUFFERALIGN(sz) + BLCKSZ * nslots;
>> }
> 
> What is the "lwlocks[]" comment supposed to mean?  I don't think there's
> a struct member with that name, is there?

It just means that we are allocating memory for an array of lwlocks;
I'll change it.

> 
> Uhm, actually, why do we keep buffer_locks[] at all?  This arrangement
> seems pretty odd, where if I understand correctly we have one array
> which is the tranche and another array which points to each item in the
> tranche ...

Actually yes, that is a good idea.


Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com 
The Russian Postgres Company



Re: [HACKERS] can we add SKIP LOCKED to UPDATE?

2015-11-09 Thread Craig Ringer
On 10 November 2015 at 01:38, Jeff Janes  wrote:

> this would be handy in conjunction with LIMIT
> (which also doesn't exist for UPDATE right now).

... and, in turn, UPDATE ... ORDER BY ..., since LIMIT without ORDER
BY is usually not a great choice.

I'd quite like to see UPDATE ... ORDER BY for deadlock avoidance
anyway. Right now doing it really reliably seems to require a SELECT
... FOR UPDATE, then an UPDATE on the SELECTed tuples only. If you're
in READ COMMITTED you can't assume the UPDATE won't see new tuples
since the SELECT so you need to supply a key-list to the UPDATE
directly or via a wCTE.
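
For the archives, the wCTE variant looks roughly like this (a sketch; the
table and column names are made up):

    with locked as (
        select id
          from accounts
         where balance < 0
         order by id            -- lock rows in a consistent order
           for update
    )
    update accounts a
       set flagged = true
      from locked
     where a.id = locked.id;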

I'm constantly surprised that people don't seem to hit deadlocks
between updates more often. I guess the number of cases where
multi-row updates on overlapping but non-identical sets of rows occur
concurrently must be fairly limited in practice.

Using SKIP LOCKED in a wCTE with an UPDATE is clunkier but not that
bad. So I don't think it's drastically important, but it would be
nice.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] eXtensible Transaction Manager API

2015-11-09 Thread Amit Kapila
On Mon, Nov 9, 2015 at 2:08 PM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:
>
>
> On 09.11.2015 07:46, Amit Kapila wrote:
>
> I think so.  Basically DLM should be responsible for maintaining
> all the lock information which inturn means that any backend process
> that needs to acquire/release lock needs to interact with DLM, without that
> I don't think even global deadlock detection can work (to detect deadlocks
> among all the nodes, it needs to know the lock info of all nodes).
>
>
> I hope that it will not needed, otherwise it will add significant
> performance penalty.
> Unless I missed something, locks can still managed locally, but we need
> DLM to detect global deadlocks.
>

How will you check lock conflicts?  For example, Process A on node-1
tries to acquire a lock on object-1, but Process B from node-2 already
holds a conflicting lock on object-1.  Now if we try to handle this by
keeping an entry for the lock request from node-2 on node-1, then
I think it will be difficult to manage and release locks.

>
>
>
>> 3. Should DLM be implemented by separate process or should it be part of
>> arbiter (dtmd).
>>
>
> That's important decision. I think it will depend on which kind of design
> we choose for distributed transaction manager (arbiter based solution or
> non-arbiter based solution, something like tsDTM).  I think DLM should be
> separate, else arbiter will become hot-spot with respect to contention.
>
>
> There are pros and contras.
> Pros for integrating DLM in DTM:
> 1. DTM arbiter has information about local to global transaction ID
> mapping which may be needed to DLM
> 2. If my assumptions about DLM are correct, then it will be accessed
> relatively rarely and should not cause significant
> impact on performance.
>
>
Yeah, if the usage of DLM is relatively low, then it can make sense
to combine them, but otherwise it doesn't make much sense.


> Contras:
> 1. tsDTM doesn't need centralized arbiter but still needs DLM
> 2. Logically DLM and DTM are independent components
>
>
>
> Can you please explain more about tsDTM approach, how timestamps
> are used, what is exactly CSN (is it Commit Sequence Number) and
> how it is used in prepare phase?  IS CSN used as timestamp?
> Is the coordinator also one of the PostgreSQL instance?
>
>
> In tsDTM approach system time (in microseconds) is used as CSN (commit
> sequence number).
> We also enforce that assigned CSN is unique and monotonic within
> PostgreSQL instance.
> CSN are assigned locally and do not require interaction with some other
> cluster nodes.
> This is why in theory tsDTM approach should provide good scalability.
>
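
As an illustration of the scheme described above (purely a sketch, not taken
from the tsDTM code):

    #include <stdint.h>
    #include <sys/time.h>

    typedef uint64_t CSN;

    static CSN last_csn = 0;    /* in real code, protected by a spinlock */

    /* Current time in microseconds, bumped if needed so that CSNs stay
     * unique and monotonic within this instance. */
    static CSN
    assign_csn(void)
    {
        struct timeval tv;
        CSN csn;

        gettimeofday(&tv, NULL);
        csn = (CSN) tv.tv_sec * 1000000 + tv.tv_usec;
        if (csn <= last_csn)
            csn = last_csn + 1;
        last_csn = csn;
        return csn;
    }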

Okay, but won't checking the visibility of tuples need a transaction-to-CSN
mapping?


>
> From my point of view there are two different scenarios:
> 1. When most transactions are local and only few of them are global (for
> example most operations in a branch of the bank are
> performed with accounts of clients of this branch,
> transactions involved accounts from different branches).
> 2. When most or all transactions are global.
>
> It seems to me that first approach is more popular in real life and
> actually good performance of distributed system can be achieved only in
> case when most transaction are local (involves only one node). There are
> several approaches allowing to optimize local transactions. For example
> once used in SAP HANA (
> http://pi3.informatik.uni-mannheim.de/~norman/dsi_jour_2014.pdf)
> We have also DTM implementation based on this approach but it is not yet
> working.
>
> If most of transaction are global, them affect random subsets of cluster
> nodes (so it is not possible to logically split cluster into groups of
> tightly coupled nodes) and number of nodes is not very large (<10) then I
> do not think that there can be better alternative (from performance point
> of view) than centralized arbiter.
>
>
I am slightly confused by the above statement; it seems that for both
scenarios you are suggesting a centralized arbiter.  I think when most
transactions are global, having a centralized arbiter might not be a
good solution, especially if the number of nodes in the cluster
is large.


> But it is only my speculations and it will be really very interesting for
> me to know access patterns of real customers, using distributed systems.
>
>
I think it is better if the solution is optimized for all kinds of
scenarios,
because once the solution is adopted by PostgreSQL, it will be very
difficult to change it.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-11-09 Thread Kyotaro HORIGUCHI
Hello, I have some random comments on this patch addition to
Amit's comments.

- Type of the flag of vacuum activity.

ACTIVITY_IS_VACUUM is the only entry in the enum, and the
variable to store it is named *flag. If you don't have any
plan to extend this information, it would seem better for this
variable to be named something like pgstat_report_vacuum_running
and to be of type boolean.

- Type of st_progress_param and so.

The variable st_progress_param has a very generic name, but looking at
pg_stat_get_vacuum_progress, every element of it has a definite role.
If so, the variable should be a struct.

st_progress_param_float is currently totally useless.

- Definition of progress_message.

The definition of progress_message in lazy_scan_heap is "char
[PROGRESS_MESSAGE_LENGTH][N_PROGRESS_PARAM]", which looks to be
inverted (the dimensions appear swapped).

| snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, "%s", schemaname);

certainly  destroys the data already stored in it if any.
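
In other words, the intended declaration is presumably (a sketch):

    char    progress_message[N_PROGRESS_PARAM][PROGRESS_MESSAGE_LENGTH];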

- snprintf()

You are careful to use snprintf,

+snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, "%s", schemaname);
+   strcat(progress_message[0],".");
+   strcat(progress_message[0],relname);

but the strcats that follow ruin it.
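
A single bounded call would avoid that, e.g. (a sketch using the same buffer
and variables as the quoted patch):

    snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, "%s.%s",
             schemaname, relname);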


- Calculation of total_heap_pages in lazy_scan_heap.

  The current code subtracts the number of blocks when
  skipping_all_visible_blocks is set in two places. But I think
  it is enough to decrement when skipping.

I'll be happy if this can be of any help.

regards,


At Tue, 10 Nov 2015 14:44:23 +0900, Amit Langote 
 wrote in <56418437.5080...@lab.ntt.co.jp>
> Thanks for the v6. A few quick comments:
> 
> - duplicate_oids error in HEAD.
> 
> - a compiler warning:
> 
> pgstat.c:2898: warning: no previous prototype for ‘pgstat_reset_activityflag’
> 
> To fix that use void for empty parameter list -
> 
> -extern void pgstat_reset_activityflag();
> +extern void pgstat_reset_activityflag(void);
> 
> One more change you could do is 's/activityflag/activity_flag/g', which I
> guess is a naming related guideline in place.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] pg_receivexlog: spurious error message connecting to 9.3

2015-11-09 Thread Craig Ringer
On 10 November 2015 at 01:47, Marco Nenciarini
 wrote:

> I've attached a little patch that removes the errors when connected to 9.3.

Looks good to me. No point confusing users.

The other callers of RunIdentifySystem are pg_basebackup and
pg_receivelogical.

pg_basebackup doesn't ask for the db_name (passes null).

pg_receivelogical handles it being null already (and if it didn't,
it'd die with or without this patch).

pg_receivexlog expects it to be null and fails gracefully if it isn't.

So this change just removes some pointless noise.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] storage/buffer/README docs about buffer replacement are out of date

2015-11-09 Thread Amit Kapila
On Tue, Nov 10, 2015 at 3:34 AM, Andres Freund  wrote:
>
> Hi,
>
> $subject contains:
>
> > The "clock hand" is a buffer index, nextVictimBuffer, that moves
circularly
> > through all the available buffers.  nextVictimBuffer is protected by the
> > buffer_strategy_lock.
> >
> > The algorithm for a process that needs to obtain a victim buffer is:
> >
> > 1. Obtain buffer_strategy_lock.
> >
> > 2. If buffer free list is nonempty, remove its head buffer.  Release
> > buffer_strategy_lock.  If the buffer is pinned or has a nonzero usage
count,
> > it cannot be used; ignore it go back to step 1.  Otherwise, pin the
buffer,
> > and return it.
> >
> > 3. Otherwise, the buffer free list is empty.  Select the buffer pointed
to by
> > nextVictimBuffer, and circularly advance nextVictimBuffer for next time.
> > Release buffer_strategy_lock.
> >
> > 4. If the selected buffer is pinned or has a nonzero usage count, it
cannot
> > be used.  Decrement its usage count (if nonzero), reacquire
> > buffer_strategy_lock, and return to step 3 to examine the next buffer.
> >
> > 5. Pin the selected buffer, and return.
>
>
> This is currently inaccurate on several levels:
>
> a) nextVictimBuffer isn't protectec by the buffer_strategy_lock
>anymore.
> b) The buffer free list is first checked unlocked - which 2) above
>doesn't document.
> c) The buffer isn't actually returned pinned - instead it's kept locked.
>
> Now a) and b) are recent oversights of mine. I'd apparently not realized
> that there's detailed docs on this in buffer/README. But c) is pretty
> old - essentially 5d50873 from 2005.
>

I think in point 5 above, it talks about the buffer returned by BufferAlloc.
Refer the point before commitid -
5d7962c6797c0baae9ffb3b5b9ac0aec7b598bc3
-5. Pin the selected buffer, release BufFreelistLock, and return the buffer.
+5. Pin the selected buffer, and return.

BufFreelistLock was released in BufferAlloc(), which indicates that it talks
about the pinning done in the BufferAlloc() function, so it seems we can retain
that part as it is, and the rest of the changes suggested by you seem fine.

Also, if we see what is written around these points, I think the intention
of the 5th point was to cover the buffer returned by BufferAlloc().


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Multixid hindsight design

2015-11-09 Thread Craig Ringer
On 10 November 2015 at 02:26, Robert Haas  wrote:
>> I'd like to see, say, python and the 'unittest' module added, and
>> to see acceptance of tests using psycopg2 to stream and decode WAL
>> from a slot.
>
> I actually kind of hate this sort of thing.

For what it's worth, I don't actually like it either. However, I
suspect that the current restricted tools are a significant impediment
to test tooling and infrastructure improvement.

> Now I'm not dead set against changing anything at all here, but I want
> to point out that we just added the TAP framework and already there
> are proposals (like Kevin's snapshot too old patch) to require DBD::Pg
> for some tests, which a lot of people aren't going to have

The alternative seems to be driving psql -qAtz as a coprocess over a
pipeline as a poor man's database driver. I really doubt that's any
kind of improvement when it comes to the complexity of maintaining
tests, fixing them when they break, etc. I'd rather just write tests
in C against libpq despite the verbosity, annoyance of maintaining
them, etc.

Part of why I'm somewhat interested in seeing python and psycopg2
added is that I'd personally quite like to see widely used client
drivers covered by the buildfarm to some extent specifically to see if
we break them. Though really, it'd make sense to add the most
important drivers like psqlODBC and PgJDBC first if doing that and
that'd be a whole new barrel of fun.

> and I know
> you and probably a few other people would like python, which is fair
> enough, but other people are going to want other things, and pretty
> soon we we end up with a situation - if we're not already there -
> where for a developer or packager to get a machine to the point where
> it runs "all the tests" is a major undertaking.

That's the bit I find odd, I guess. It's *trivial* to add DBD::Pg on
any remotely sane host. Similarly Python and psycopg2. One package
each, or a very simple compile. It's nothing in comparison to setting
up to building our documentation... or getting pretty much any of the
basic libraries in place on Windows.

(However, the streaming replication support for psycopg2 hasn't landed
in mainline yet, so you can't just grab it from PGDG repos).

Adding more tools does not make the existing ones easier, of course,
so that's not really an argument in favour. I just suspect the cost of
adding them is being somewhat overblown. That said ... I don't
maintain a 1990s-era MIPS box, an old ARM 6, or whatever.

It's a pity in a way that the Java toolchain and build process is so
heavy, because otherwise it'd make sense to just use Java and PgJDBC.
PgJDBC development happens closer to core Pg than most drivers, within
the main Pg community. It's also crucial for PostgreSQL's success,
with huge adoption and a lot of people relying on it. JUnit is one of
the least awful test frameworks I've worked with. Java is widely
understood and relatively easy to follow even if you don't know it
well. Unfortunately I see around zero chance of getting a JVM, maven
or ant, and a bajillion libraries onto buildfarm members.

> Accepting more tools also increases the breadth of knowledge expected
> of committers and patch authors.

That's the real problem IMO. The cognitive and knowledge load.

It's exactly the same reason why I just explained to someone else (the
not-UTF32 network run by the secret gov't conspiracy/aliens dude, as
it turns out) why the codebase requires English identifiers, variable
names, comments, etc. While I'd argue that the cost of adding Python
localized to some tests is lower than the cost of adding (say) code in
Russian to a contrib, it's not zero.

So I guess I've already argued against myself in another form.

Still, the alternative seems to be writing new bespoke tools on top of
libpq. Need something to analyse and decode a WAL stream? Write it in
C on top of libpq, add it to the test infrastructure. Probably write
new tests as plugins in C, too, or use an embedded Perl runtime, or
XS. Or concoct yet another custom language / test description, like
isolationtester, for the purpose. Those tools aren't much easier to
get rid of, once added, than new languages etc are.

Eventually it'd be good to be able to actually test things like
streaming replication in an automated manner. We're going to hit the
limitations of 'do some stuff and diff the output files' pretty
quickly there. Adding a new scripting language isn't exactly going to
fix that though.

> Similarly with testing frameworks
> - except those are even worse, because nobody's saying that any of the
> stuff we have now will ever go away.

Yeah. I guess that's the issue, isn't it. When something gets added
we're generally stuck with it forever. We still have PL/TCL!

Even if it were desirable (and ignoring the horrific backpatching pain
that'd result), nobody's going to leap up and volunteer to rewrite all
the Perl like the Windows build system so we could remove Perl. It's
there 

CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2015-11-09 Thread Kouhei Kaigai
It is a relevant topic of readfuncs support for custom-scan.

Unlike CustomPath and CustomScanState, we don't allow a custom-scan
provider to define its own larger structure that embeds CustomScan
at the head of the structure, because of copyObject() support.
Thus, a custom-scan provider needs its own logic to pack and
unpack private fields, like:
  https://github.com/pg-strom/devel/blob/master/src/gpujoin.c#L112

At present, only create_customscan_plan() and _copyCustomScan()
can produce a CustomScan node.
create_customscan_plan() invokes the PlanCustomPath callback,
so the CSP can return a pointer to the CustomScan field within
a larger structure; we don't need to adjust this interface.
On the other hand, _copyCustomScan() allocates a new node using
makeNode(CustomScan), so there is no space for the larger
structure if the CSP wants to use it to keep private values without
packing and unpacking.
Only the CSP knows the exact size of the larger structure, so all we can
do is ask the CSP to allocate a new node and copy its private fields.
This patch adds a new NodeCopyCustomScan callback to support
copyObject().
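
(For illustration only: a sketch of the embedding pattern described above.
The struct, the field names and the callback signature below are assumptions
made for the example, not the contents of the attached patch.)

/*
 * A provider-defined plan node that embeds CustomScan at its head, plus
 * the kind of copy callback the proposal asks the provider to supply.
 * Whether the core or the callback copies the common CustomScan fields
 * depends on the actual patch; here we assume the core copies them.
 */
#include "postgres.h"
#include "nodes/plannodes.h"

typedef struct MyCustomScan
{
    CustomScan  cscan;      /* must be the first field */
    int         my_flags;   /* provider-private values ... */
    List       *my_exprs;   /* ... that would otherwise need packing */
} MyCustomScan;

/* Hypothetical callback: allocate the larger node and copy private fields. */
static Node *
my_copy_custom_scan(const CustomScan *from)
{
    const MyCustomScan *src = (const MyCustomScan *) from;
    MyCustomScan *dst = (MyCustomScan *) newNode(sizeof(MyCustomScan),
                                                 T_CustomScan);

    dst->my_flags = src->my_flags;
    dst->my_exprs = copyObject(src->my_exprs);

    return (Node *) dst;
}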

Also, v9.6 will support nodeToString/stringToNode on plan nodes,
so we need to redefine the role of the TextOutCustomScan callback,
which is also defined in v9.5.
If the CSP extends CustomScan to store private values in the extra
area, only the CSP knows which values and data types are stored,
so all we can do is ask the CSP to serialize and deserialize its
private fields consistently.
Even though the TextOutCustomScan callback will first be ripped out
to support readfuncs.c, a pair of TextOut and TextRead callbacks
will then take over that responsibility: when the CSP defines
a larger structure that embeds CustomScan, this pair of callbacks
is used to serialize/deserialize the entire custom-defined object.

The attached patches are for v9.5 and v9.6. The v9.6 patch
assumes the patch for readfuncs support is already applied.
The v9.5 patch assumes the latest master.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai
> Sent: Tuesday, November 10, 2015 11:47 AM
> To: Robert Haas
> Cc: Amit Kapila; Andres Freund; pgsql-hackers
> Subject: Re: [HACKERS] CustomScan support on readfuncs.c
> 
> > On Sat, Nov 7, 2015 at 6:38 AM, Kouhei Kaigai  wrote:
> > > Are you suggesting to pass the object name on software build time?
> >
> > Yes.  That's what test_shm_mq and worker_spi already do:
> >
> > sprintf(worker.bgw_library_name, "test_shm_mq");
> >
> OK, I ripped out the portion around dfmgr.c.
> 
> > > If so, it may load incorrect libraries when DBA renamed its filename.
> > > At least, PostgreSQL cannot prevent DBA to rename library filenames
> > > even if they try to deploy "pg_strom.so", "pg_strom.20151106.so" and
> > > "pg_strom.20151030_bug.so" on same directory.
> >
> > I agree.  But that's not a new problem that needs to be solved by this
> > patch.  It already exists - see above.
> >
> It still may be a potential problem, even though a corner case.
> I'll try to implement an internal API to know "what is my name".
> 
> The attached main patch (custom-scan-on-readfuncs.v3.patch) once
> removes TextOutCustomScan, thus all the serialized tokens become
> known to the core backend, and add _readCustomScan() at readfuncs.c.
> It deserializes CustomScan nodes from cstring tokens, especially,
> reloads the pointer of CustomScanMethods tables using a pair of
> library name and symbol name.
> Thus, it also requires custom scan providers to keep the methods
> table visible from external objects.
> 
> Thanks,
> --
> NEC Business Creation Division / PG-Strom Project
> KaiGai Kohei 



custom-scan-on-copyfuncs-9.6devel.v3.patch
Description: custom-scan-on-copyfuncs-9.6devel.v3.patch


custom-scan-on-copyfuncs-9.5beta.v3.patch
Description: custom-scan-on-copyfuncs-9.5beta.v3.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2015-11-09 Thread Ashutosh Bapat
> Any distributed transaction management requires 2PC in some or other form.
> So, we should implement 2PC for FDW keeping in mind various forms of 2PC
> used practically. Use that infrastructure to build XTM like capabilities
> for restricted postgres_fdw uses. Previously, I have requested the authors
> of XTM to look at my patch and provide me feedback about their requirements
> for implementing 2PC part of XTM. But I have not heard anything from them.
>
> 1.
> https://domino.mpi-inf.mpg.de/intranet/ag5/ag5publ.nsf/1c0a12a383dd2cd8c125613300585c64/7684dd8109a5b3d5c1256de40051686f/$FILE/tdd99.pdf
>
>
>
> Sorry, may be I missed some message. but I have not received request from
> you to provide feedback concerning your patch.
>
>
See my mail on 31st August on hackers in the thread with subject
"Horizontal scalability/sharding".

>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>
>
>


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] eXtensible Transaction Manager API

2015-11-09 Thread Simon Riggs
On 8 November 2015 at 23:35, Michael Paquier 
wrote:


> > COMMIT PREPARED is a pretty thin layer; the work is all done in the
> PREPARE.
> > With a DTM, the main commit itself is done once only in the DTM, so all
> the
> > COMMIT PREPARED would do is release local node resources.
>
> Sure. Now imagine that the pg_twophase entry is corrupted for this
> transaction on one node. This would trigger a PANIC on it, and
> transaction would not be committed everywhere. I am aware of the fact
> that by definition PREPARE TRANSACTION ensures that a transaction will
> be committed with COMMIT PREPARED, just trying to see any corner cases
> with the approach proposed. The DTM approach is actually rather close
> to what a GTM in Postgres-XC does :)
>

This is wrong.

There is no "approach proposed", this is just normal 2PC feature that
PostgreSQL has supported since 8.1. All that is proposed here is that we
have an API that allows this to be exposed.

The whole point of PREPARE is that it allows a database to crash after that
step and it can still be recovered. That has nothing to do with Xanything.

In this case, the role of the GTM is to record the commit. Other nodes
consult the GTM, not local state to see whether the transaction has
committed, acting as the transaction manager in an XA sense.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Erroneous cost estimation for nested loop join

2015-11-09 Thread Shulgin, Oleksandr
On Mon, Nov 9, 2015 at 11:08 AM,  wrote:

>
>   - cost parameter calibration: random_page_cost = 92.89
>

This demands some explanation and raises the question of the value of the
seq_page_cost parameter -- I don't see anything about it in your mail.

--
Alex


[HACKERS] Minor comment improvement to create_foreignscan_plan

2015-11-09 Thread Etsuro Fujita
Hi,

Here is a small patch to update a comment in create_foreignscan_plan:
add fdw_recheck_quals to the list of expressions that need the
replace_nestloop_params processing.  I should have updated the comment
when I proposed the patch for fdw_recheck_quals.

Best regards,
Etsuro Fujita
*** a/src/backend/optimizer/plan/createplan.c
--- b/src/backend/optimizer/plan/createplan.c
***
*** 2141,2151  create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
  	scan_plan->fs_relids = best_path->path.parent->relids;
  
  	/*
! 	 * Replace any outer-relation variables with nestloop params in the qual
! 	 * and fdw_exprs expressions.  We do this last so that the FDW doesn't
! 	 * have to be involved.  (Note that parts of fdw_exprs could have come
! 	 * from join clauses, so doing this beforehand on the scan_clauses
! 	 * wouldn't work.)  We assume fdw_scan_tlist contains no such variables.
  	 */
  	if (best_path->path.param_info)
  	{
--- 2141,2152 
  	scan_plan->fs_relids = best_path->path.parent->relids;
  
  	/*
! 	 * Replace any outer-relation variables with nestloop params in the qual,
! 	 * fdw_exprs and fdw_recheck_quals expressions.  We do this last so that
! 	 * the FDW doesn't have to be involved.  (Note that parts of fdw_exprs
! 	 * or fdw_recheck_quals could have come from join clauses, so doing this
! 	 * beforehand on the scan_clauses wouldn't work.)  We assume
! 	 * fdw_scan_tlist contains no such variables.
  	 */
  	if (best_path->path.param_info)
  	{

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Erroneous cost estimation for nested loop join

2015-11-09 Thread Gavin Flower

On 09/11/15 23:08, kawami...@tkl.iis.u-tokyo.ac.jp wrote:

Hi guys,

I’ve been using Postgres for research at a university,

Great!

[...]

・Postgres 9.4.1

[..]

More knowledgeable people are sure to reply in more detail!

However, they would probably appreciate it if you could run with 9.4.5
(the latest released version).  Running it with the beta of 9.5 would be
a bonus!


Note that I don't know enough to say for sure that later versions would
make any difference in this case, but at least using later
versions would kill lots of red herrings!  :-)



Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Erroneous cost estimation for nested loop join

2015-11-09 Thread Simon Riggs
On 9 November 2015 at 10:08,  wrote:

>
> We guessed the cause of this error would be in the cost model of Postgres,
> and investigated the source code of the optimizer, and we found the cause of
> this problem. It was in the index cost estimation process. When scanning the
> inner table, if the loop count is greater than 1, its I/O cost is counted as
> random access. In the case of Query2, in one loop (i.e. one inner table
> scan), much data is read sequentially with the clustered index, so it seems
> wrong that the optimizer thinks its I/O workload is random access.
>

We don't have a clustered index in Postgres. We do store correlation stats
for the index, which are used in some places to reduce the cost.

Could you look some more at this and see if a model change that uses the
correlation can improve this?

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Getting sorted data from foreign server for merge join

2015-11-09 Thread Amit Langote
On 2015/11/10 0:56, Robert Haas wrote:
> On Sat, Nov 7, 2015 at 12:07 PM, Corey Huinker  
> wrote:
>> Sorry to barge in late, but I was wondering if what we've learned with this
>> patch can be applied to the case of asserting a sort order on a query
>> returning from dblink().
> 
> Nope.
> 
> Sorry to be the bearer of bad news, but that would be a different and
> much harder development project.  Set-returning functions have no way
> to indicate anything about the ordering of their return values at
> present.  We could invent something, but the work involved wouldn't
> have much to do with this patch.

There was a patch (which, it seems, was rejected [1]) to do this sort of
thing.

Thanks,
Amit

[1] https://commitfest.postgresql.org/4/88/



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uh-oh: documentation PDF output no longer builds in HEAD

2015-11-09 Thread Tom Lane
I wrote:
> Curiously though, that gets us down to this:
>  30615 strings out of 245828
>  397721 string characters out of 1810780
> which implies that indeed FlowObjectSetup *is* the cause of most of
> the strings being entered.  I'm not sure how that squares with the
> observation that there are less than 5000 \pagelabel entries in the
> postgres-US.aux file.  Time for more digging.

Well, after much digging, I've found what seems a workable answer.
It turns out that the original form of FlowObjectSetup is just
unbelievably awful when it comes to handling of hyperlink anchors:
it will put a hyperlink anchor into the PDF for every "flow object",
that is, everything in the document that could possibly have a link
to it, whether or not it actually is linked to.  And aside from bloating
the PDF file, it turns out that the hyperlink stuff also consumes some
control sequence names, which is why we're running out of strings.

There already is logic (probably way older than the hyperlink code)
in jadetex to avoid generating page-number labels for objects that have
no cross-references.  So what I did to fix this was to piggyback on
that code: with the attached jadetex.cfg, both a page-number label
and a hyperlink anchor will be generated for all and only those flow
objects that have either a page-number reference or a hyperlink reference.
(We could try to separate those things, but then we'd need two control
sequence names not one per object for tracking purposes, and anyway many
objects will have both kinds of reference if they have either.)

This gets us down to ~135000 strings to build HEAD, and not incidentally,
the resulting PDF is about half the size it was before.  I think I've
also fixed a number of formerly unexplainable broken hyperlinks in the
PDF; some are still broken, but they were that way before.  (It looks
like  with endterm doesn't work very well in jadetex; all the
remaining bad links seem to be associated with uses of that.)

Barring objection I'll commit this tomorrow.  I'm inclined to back-patch
it at least into 9.5, maybe further, because I'm afraid we may be closer
than we realized to exceeding the strings limit in the back branches too.

regards, tom lane

% doc/src/sgml/jadetex.cfg
%
% This file redefines FlowObjectSetup and some related macros to greatly
% reduce the number of control sequence names created, and also to avoid
% creation of many useless hyperlink anchors in PDF files.
%
% The original coding of FlowObjectSetup defined a control sequence x@LABEL
% for pretty nearly every flow object in the file, whether that object was
% cross-referenced or not.  Worse yet, it created a hyperlink anchor for
% every such object, which not only bloated the output PDF with useless
% anchors but consumed additional control sequence names internally.
%
% To fix, extend PageLabel's already-existing mechanism whereby a p@LABEL
% control sequence is filled in only for labels that are referenced by at
% least one \Pageref call.  We now also fill in p@LABEL for labels that are
% referenced by a \Link.  Then, we can drop x@LABEL entirely, and use
% p@LABEL to control emission of both a hyperlink anchor and a \PageLabel.
% Now, both of those things are emitted for all and only the flow objects
% that have either a hyperlink reference or a page-number reference.
%
% (With a more invasive patch, we could track the need for an anchor and a
% page-number label separately, but that would probably require more control
% sequences than this way does.)
%
%
% In addition to checking p@LABEL not x@LABEL, this version of FlowObjectSetup
% is fixed to clear \Label and \Element whether or not it emits an anchor
% and page label.  Failure to do that seems to explain some pre-existing bugs
% in which certain SGML constructs weren't correctly cross-referenced.
%
\def\FlowObjectSetup#1{%
\ifDoFOBSet
  \ifLabelElements
 \ifx\Label\@empty\let\Label\Element\fi
  \fi
  \ifx\Label\@empty\else
  \expandafter\ifx\csname p@\Label\endcsname\relax
  \else
   \bgroup
 \ifNestedLink
 \else
   \hyper@anchorstart{\Label}\hyper@anchorend
   \PageLabel{\Label}%
 \fi
   \egroup
  \fi
  \let\Label\@empty
  \let\Element\@empty
  \fi
\fi
}
%
% Adjust PageLabel so that the p@NAME control sequence acquires a correct
% value immediately; this seems to be needed to avoid scenarios wherein
% additional TeX runs are needed to reach a stable state of the .aux file.
%
\def\PageLabel#1{%
  \@bsphack
  \expandafter\ifx\csname p@#1\endcsname\relax
  \else
  \protected@write\@auxout{}%
 {\string\pagelabel{#1}{\thepage}}%
  % Ensure the p@NAME control sequence acquires correct value immediately
  \expandafter\xdef\csname p@#1\endcsname{\thepage}%
  \fi
  \@esphack}
%
% In \Link, add code to emit an aux-file entry if the p@NAME sequence isn't
% defined.  Much as in @Setref, this ensures we'll process the referenced
% item correctly on the next TeX 

Re: [HACKERS] Uh-oh: documentation PDF output no longer builds in HEAD

2015-11-09 Thread Andres Freund
On 2015-11-09 19:46:37 -0500, Tom Lane wrote:
> Well, after much digging, I've found what seems a workable answer.
> It turns out that the original form of FlowObjectSetup is just
> unbelievably awful [...].
>
> This gets us down to ~135000 strings to build HEAD, and not incidentally,
> the resulting PDF is about half the size it was before.  I think I've
> also fixed a number of formerly unexplainable broken hyperlinks in the
> PDF; some are still broken, but they were that way before.  (It looks
> like  with endterm doesn't work very well in jadetex; all the
> remaining bad links seem to be associated with uses of that.)

Nice work. On an ugly subject.

> Barring objection I'll commit this tomorrow.  I'm inclined to back-patch
> it at least into 9.5, maybe further, because I'm afraid we may be closer
> than we realized to exceeding the strings limit in the back branches too.

+1 for doing this in 9.5+. I think we will probably want this in all
branches at some point. I don't have a strong opinion on whether we want
to let this mature in 9.5 or not.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Git cartoon

2015-11-09 Thread Albe Laurenz
Fabrízio de Royes Mello wrote:
> On Sunday, 8 November 2015, Bruce Momjian  wrote:
>> This git cartoon was too funny not to share:
>>
>> http://xkcd.com/1597/
>> 
>> Maybe we need it on our git wiki page.  ;-)
> 
>  I think we need our own cartoon with a funny DB story.

What, like this?

http://xkcd.com/327/

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adjust errorcode in background worker code

2015-11-09 Thread Amit Langote
On 2015/11/07 3:55, Robert Haas wrote:
> On Sun, Jun 28, 2015 at 10:43 PM, Amit Langote
>  wrote:
>> On 2015-06-29 AM 11:36, Amit Langote wrote:
>>> Hi,
>>>
>>> How about the attached that adjusts errorcode for the error related to
>>> checking the flag bgw_flags in BackgroundWorkerInitializeConnection*()
>>> functions so that it matches the treatment in SanityCheckBackgroundWorker()?
>>>
>>> s/ERRCODE_PROGRAM_LIMIT_EXCEEDED/ERRCODE_INVALID_PARAMETER_VALUE/g
>>>
>>> There is already a "/* XXX is this the right errcode? */" there.
>>
>> Oops, a wrong thing got attached.
>>
>> Please find correct one attached this time.
> 
> Well, I'm just catching up on some old email and saw this thread.  I
> like the idea of trying to use the best possible error code, but I'm
> not so sure this is an improvement.  One problem is that
> ERRCODE_INVALID_PARAMETER_VALUE is that we use it, uh, a lot:
> 
> [rhaas pgsql]$ git grep ERRCODE_ | sed 's/.*ERRCODE_/ERRCODE_/;
> s/[^A-Z0-9_].*//;' | sort  | uniq -c | sort -n -r | head
>  540 ERRCODE_FEATURE_NOT_SUPPORTED
>  442 ERRCODE_INVALID_PARAMETER_VALUE
>  380 ERRCODE_SYNTAX_ERROR
>  194 ERRCODE_WRONG_OBJECT_TYPE
>  194 ERRCODE_UNDEFINED_OBJECT
>  181 ERRCODE_DATATYPE_MISMATCH
>  180 ERRCODE_INSUFFICIENT_PRIVILEGE
>  150 ERRCODE_INVALID_TEXT_REPRESENTATION
>  137 ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE
>  123 ERRCODE_PROGRAM_LIMIT_EXCEEDED
> 
> I wonder if we need to think about inventing some new error codes.  I
> can sort of understand that "feature not supported" is something that
> can come in a large number of different contexts and mean pretty much
> the same all the time, but I'm betting that things like "invalid
> parameter value" and "invalid text representation" and "object not in
> prerequisite state" cover an amazing breadth of errors that may not
> actually be that similar to each other.
> 

Now that I see it, ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE may be a
better choice there.

In general, I tend to agree with the observations expressed by Craig
down-thread. I'd think this particular error code (or more to the point,
the error site) does not concern the client app itself but rather suggests a
bug in a loadable module implementation, which the client app has no way to
fix itself. In general, it seems we should only ever report error codes
that a client app can do something about. But I guess that's a whole
different area and a vast project to investigate. For a rather
contrived example related to the OP, it would have made sense to send such
an error code if there were a parameter like worker_spi.can_connect_db which
the app set to false and then later did something that required a database
connection.

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-11-09 Thread Etsuro Fujita

On 2015/11/09 13:40, Kouhei Kaigai wrote:

Having said that, as I said previously, I don't see much value in adding
the callback routine, to be honest.  I know KaiGai-san considers that
that would be useful for custom joins, but I don't think that that would
be useful even for foreign joins, because I think that in case of
foreign joins, the practical implementation of that routine in FDWs
would be to create a secondary plan and execute that plan by performing
ExecProcNode, as my patch does [1].  Maybe I'm missing something, though.



I've never denied that alternative local sub-plan is one of the best
approach for postgres_fdw, however, I've also never heard why you can
say the best approach for postgres_fdw is definitely also best for
others.
If we would justify less flexible interface specification because of
comfort for a particular extension, it should not be an extension,
but a built-in feature.
My standpoint has been consistent through the discussion; we can never
predicate which feature shall be implemented on FDW interface, therefore,
we also cannot predicate which implementation is best for EPQ rechecks
also. Only FDW driver knows which is the "best" for them, not us.


What the RecheckForeignScan routine does for the foreign-join case would 
be the following for tuples stored in estate->es_epqTuple[]:


1. Apply relevant restriction clauses, including fdw_recheck_quals, to 
the tuples for the baserels involved in a foreign-join, and see if the 
tuples still pass the clauses.


2. If so, form a join tuple, while applying relevant join clauses to the 
tuples, and set the join tuple in the given slot.  Else set empty.


I think these would be more efficiently processed internally in core 
than externally in FDWs.  That's why I don't see much value in adding 
the routine.  I have to admit that that means no flexibility, though.


However, the routine as-is doesn't seem good enough, either.  For 
example, since the routine is called after each of the tuples was 
re-fetched from the remote end or re-computed from the whole-row var and 
stored in the corresponding estate->es_epqTuple[], the routine wouldn't 
allow for what Robert proposed in [2].  To do such a thing, I think we 
would probably need to change the existing EPQ machinery more 
drastically and rethink the right place for calling the routine.


Best regards,
Etsuro Fujita

[2] 
http://www.postgresql.org/message-id/ca+tgmozdpu_fcspozxxpd1xvyq3czcawd7-x3avwbkgsfoh...@mail.gmail.com




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Git cartoon

2015-11-09 Thread Fabrízio de Royes Mello
On Monday, 9 November 2015, Albe Laurenz <laurenz.a...@wien.gv.at> wrote:

> Fabrízio de Royes Mello wrote:
> > On Sunday, 8 November 2015, Bruce Momjian wrote:
> >> This git cartoon was too funny not to share:
> >>
> >> http://xkcd.com/1597/
> >>
> >> Maybe we need it on our git wiki page.  ;-)
> >
> >  I think we need our own cartoon with a funny DB story.
>
> What, like this?
>
> http://xkcd.com/327/
>

Hahahahah... Very funny...


-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2015-11-09 Thread Konstantin Knizhnik



On 06.11.2015 21:37, Robert Haas wrote:

On Wed, Aug 12, 2015 at 6:25 AM, Ashutosh Bapat
 wrote:

The previous patch would not compile on the latest HEAD. Here's updated
patch.

Perhaps unsurprisingly, this doesn't apply any more.  But we have
bigger things to worry about.

The recent eXtensible Transaction Manager and the slides shared at the
Vienna sharding summit, now posted at
https://drive.google.com/file/d/0B8hhdhUVwRHyMXpRRHRSLWFXeXc/view make
me think that some careful thought is needed here about what we want
and how it should work. Slide 10 proposes a method for the extensible
transaction manager API to interact with FDWs.  The FDW would do this:

select dtm_join_transaction(xid);
begin transaction;
update...;
commit;

I think the idea here is that the commit command doesn't really
commit; it just escapes the distributed transaction while leaving it
marked not-committed.  When the transaction subsequently commits on
the local server, the XID is marked committed and the effects of the
transaction become visible on all nodes.

I think that this API is intended to provide not only consistent
cross-node decisions about whether a particular transaction has
committed, but also consistent visibility.  If the API is sufficient
for that and if it can be made sufficiently performant, that's a
strictly stronger guarantee than what this proposal would provide.

On the other hand, I see a couple of problems:

1. The extensible transaction manager API is meant to be pluggable.
Depending on which XTM module you choose to load, the SQL that needs
to be executed by postgres_fdw on the remote node will vary.
postgres_fdw shouldn't have knowledge of all the possible XTMs out
there, so it would need some way to know what SQL to send.

2. If the remote server isn't running the same XTM as the local
server, or if it is running the same XTM but is not part of the same
group of cooperating nodes as the local server, then we can't send a
command to join the distributed transaction at all.  In that case, the
2PC for FDW approach is still, maybe, useful.

On the whole, I'm inclined to think that the XTM-based approach is
probably more useful and more general, if we can work out the problems
with it.  I'm not sure that I'm right, though, nor am I sure how hard
it will be.
Sorry, but we have currently considered only the case of a homogeneous
environment: when all cluster instances are using PostgreSQL with the same
XTM implementation.
I can imagine situations where it may be useful to coordinate transaction
processing in a heterogeneous cluster, but it seems to be quite an exotic
use case.
Combining several different databases in one cluster can be explained by
historical reasons or the specifics of a particular system architecture.
But I cannot imagine any reason for using different XTM implementations,
and especially mixing them in one transaction.




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2015-11-09 Thread Ildus Kurbangaliev

On 11/06/2015 08:53 PM, Robert Haas wrote:

On Fri, Nov 6, 2015 at 6:27 AM, Ildus Kurbangaliev
 wrote:

There is a patch that splits SLRU LWLocks to separate tranches and
moves them to SLRU Ctl. It does some work from the main patch from
this thread, but can be commited separately. It also simplifies
lwlock.c.

Thanks.  I like the direction this is going.

-   char   *ptr;
-   Sizeoffset;
-   int slotno;
+   char*ptr;
+   Size offset;
+   int  slotno;
+   int  tranche_id;
+   LWLockPadded*locks;

Please don't introduce this kind of churn.  pgindent will undo it.

This isn't going to work for EXEC_BACKEND builds, I think.  It seems
to rely on the LWLockRegisterTranche() performed !IsUnderPostmaster
being inherited by subsequent children, which won't work under
EXEC_BACKEND.  Instead, store the tranche ID in SlruSharedData.  Move
the LWLockRegisterTranche call out from the (!IsUnderPostmaster) case
and call it based on the tranche ID from SlruSharedData.

I would just drop the add_postfix stuff.  I think it's fine if the
names of the shared memory checks are just "CLOG" etc. rather than
"CLOG Slru Ctl", and similarly I think the locks can be registered
without the "Locks" suffix.  It'll be clear from context that they are
locks.  I suggest also that we just change all of these names to be
lower case, though I realize that's a debatable and cosmetic point.

Thanks for the review. I've attached a new version of the SLRU patch. I've
removed add_postfix and fixed the EXEC_BACKEND case.

--
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 3a58f1e..ea83655 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -456,7 +456,7 @@ void
 CLOGShmemInit(void)
 {
 	ClogCtl->PagePrecedes = CLOGPagePrecedes;
-	SimpleLruInit(ClogCtl, "CLOG Ctl", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
+	SimpleLruInit(ClogCtl, "clog", CLOGShmemBuffers(), CLOG_LSNS_PER_PAGE,
   CLogControlLock, "pg_clog");
 }
 
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index b21a313..d2c281f 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -478,7 +478,7 @@ CommitTsShmemInit(void)
 	bool		found;
 
 	CommitTsCtl->PagePrecedes = CommitTsPagePrecedes;
-	SimpleLruInit(CommitTsCtl, "CommitTs Ctl", CommitTsShmemBuffers(), 0,
+	SimpleLruInit(CommitTsCtl, "committs", CommitTsShmemBuffers(), 0,
   CommitTsControlLock, "pg_commit_ts");
 
 	commitTsShared = ShmemInitStruct("CommitTs shared",
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 7d97085..b66a2b6 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1838,10 +1838,10 @@ MultiXactShmemInit(void)
 	MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;
 
 	SimpleLruInit(MultiXactOffsetCtl,
-  "MultiXactOffset Ctl", NUM_MXACTOFFSET_BUFFERS, 0,
+  "multixact_offset", NUM_MXACTOFFSET_BUFFERS, 0,
   MultiXactOffsetControlLock, "pg_multixact/offsets");
 	SimpleLruInit(MultiXactMemberCtl,
-  "MultiXactMember Ctl", NUM_MXACTMEMBER_BUFFERS, 0,
+  "multixact_member", NUM_MXACTMEMBER_BUFFERS, 0,
   MultiXactMemberControlLock, "pg_multixact/members");
 
 	/* Initialize our shared state struct */
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 90c7cf5..868b35a 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -157,6 +157,8 @@ SimpleLruShmemSize(int nslots, int nlsns)
 	if (nlsns > 0)
 		sz += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));	/* group_lsn[] */
 
+	sz += MAXALIGN(nslots * sizeof(LWLockPadded)); /* lwlocks[] */
+
 	return BUFFERALIGN(sz) + BLCKSZ * nslots;
 }
 
@@ -174,9 +176,10 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 	if (!IsUnderPostmaster)
 	{
 		/* Initialize locks and shared memory area */
-		char	   *ptr;
-		Size		offset;
-		int			slotno;
+		char			*ptr;
+		Size			 offset;
+		int slotno;
+		LWLockPadded	*locks;
 
 		Assert(!found);
 
@@ -212,6 +215,17 @@ SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
 			offset += MAXALIGN(nslots * nlsns * sizeof(XLogRecPtr));
 		}
 
+		/* Initialize LWLocks */
+		locks = (LWLockPadded *) ShmemAlloc(sizeof(LWLockPadded) * nslots);
+
+		shared->locks_tranche_id = LWLockNewTrancheId();
+		shared->locks_tranche.name = pstrdup(name);
+		shared->locks_tranche.array_base = locks;
+		shared->locks_tranche.array_stride = sizeof(LWLockPadded);
+
+		for (slotno = 0; slotno < nslots; slotno++)
+			LWLockInitialize(&locks[slotno].lock, 

Re: [HACKERS] eXtensible Transaction Manager API

2015-11-09 Thread Konstantin Knizhnik



On 09.11.2015 07:46, Amit Kapila wrote:
On Sat, Nov 7, 2015 at 10:23 PM, Konstantin Knizhnik 
> wrote:



Lock manager is one of the tasks we are currently working on.
There are still a lot of open questions:
1. Should distributed lock manager (DLM) do something else except
detection of distributed deadlock?


I think so.  Basically DLM should be responsible for maintaining
all the lock information which inturn means that any backend process
that needs to acquire/release lock needs to interact with DLM, without 
that

I don't think even global deadlock detection can work (to detect deadlocks
among all the nodes, it needs to know the lock info of all nodes).


I hope that it will not be needed; otherwise it will add a significant
performance penalty.
Unless I missed something, locks can still be managed locally, but we need
the DLM to detect global deadlocks.
Deadlock detection in PostgreSQL is performed only after the lock-wait
timeout expires; only then is the lock graph analyzed for the presence of
loops. At that moment we can send our local lock graph to the DLM. If it
really is a distributed deadlock, then sooner or later all nodes involved
in it will send their local graphs to the DLM, and the DLM will be able to
build the global graph and detect the distributed deadlock. In this
scenario the DLM will be accessed very rarely - only when a lock cannot be
granted within deadlock_timeout.


One possible problem is false global deadlock detection: because the local
lock graphs correspond to different moments in time, we can find a "false"
loop. I am still thinking about the best solution to this problem.




This is somewhat inline with what currently we do during lock conflicts,
i.e if the backend incur a lock conflict and decides to sleep, before
sleeping it tries to detect an early deadlock, so if we make DLM 
responsible

for managing locks the same could be even achieved in distributed system.

2. Should DLM be part of XTM API or it should be separate API?


We might be able to do it either ways (make it part of XTM API or devise a
separate API). I think here more important point is to first get the 
high level

design for Distributed Transactions (which primarily includes consistent
Commit/Rollback, snapshots, distributed lock manager (DLM) and recovery).

3. Should DLM be implemented by separate process or should it be
part of arbiter (dtmd).


That's important decision. I think it will depend on which kind of design
we choose for distributed transaction manager (arbiter based solution or
non-arbiter based solution, something like tsDTM).  I think DLM should be
separate, else arbiter will become hot-spot with respect to contention.


There are pros and cons.
Pros of integrating the DLM into the DTM:
1. The DTM arbiter has information about the local-to-global transaction ID
mapping, which may be needed by the DLM.
2. If my assumptions about the DLM are correct, it will be accessed
relatively rarely and should not have a significant impact on performance.

Cons:
1. tsDTM doesn't need a centralized arbiter but still needs a DLM.
2. Logically the DLM and DTM are independent components.




Can you please explain more about tsDTM approach, how timestamps
are used, what is exactly CSN (is it Commit Sequence Number) and
how it is used in prepare phase?  IS CSN used as timestamp?
Is the coordinator also one of the PostgreSQL instance?


In the tsDTM approach the system time (in microseconds) is used as the CSN
(commit sequence number).
We also enforce that assigned CSNs are unique and monotonic within a
PostgreSQL instance.
CSNs are assigned locally and do not require interaction with other
cluster nodes.

This is why, in theory, the tsDTM approach should provide good scalability.
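
(For illustration, a minimal sketch -- names are hypothetical, not from the
tsDTM code -- of a timestamp-based CSN generator that stays unique and
monotonic within one instance.)

#include <stdint.h>
#include <pthread.h>
#include <sys/time.h>

typedef uint64_t CSN;

static CSN last_csn = 0;
static pthread_mutex_t csn_lock = PTHREAD_MUTEX_INITIALIZER;

/* Current system time in microseconds. */
static CSN
current_time_usec(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (CSN) tv.tv_sec * 1000000 + tv.tv_usec;
}

/*
 * Assign the next CSN: the current time in microseconds, bumped by one if
 * the clock has not advanced, so CSNs stay unique and monotonic within
 * this instance.
 */
CSN
assign_csn(void)
{
    CSN     csn;

    pthread_mutex_lock(&csn_lock);
    csn = current_time_usec();
    if (csn <= last_csn)
        csn = last_csn + 1;
    last_csn = csn;
    pthread_mutex_unlock(&csn_lock);

    return csn;
}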



I think in this patch, it is important to see the completeness of
all the
API's that needs to be exposed for the implementation of distributed
transactions and the same is difficult to visualize without
having complete
picture of all the components that has some interaction with the
distributed
transaction system.  On the other hand we can do it in
incremental fashion
as and when more parts of the design are clear.


That is exactly what we are going to do - we are trying to
integrate DTM with existed systems (pg_shard, postgres_fdw, BDR)
and find out what is missed and should be added. In parallel we
are trying to compare efficiency and scalability of different
solutions.


One thing, I have noticed that in DTM approach, you seems to
be considering a centralized (Arbiter) transaction manager and
centralized Lock manager which seems to be workable approach,
but I think such an approach won't scale in write-heavy or mixed
read-write workload.  Have you thought about distributing responsibility
of global transaction and lock management?


From my point of view there are two different scenarios:
1. When most transactions are local and 

Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-11-09 Thread Victor Wagner
On Mon, 9 Nov 2015 15:14:02 +0800
Craig Ringer  wrote:


> 
> I'd rather not see convoluted and complex connstrings, per my prior
> post. The JDBC extended URL style seems good enough.

It is what my patch implements.
 
> IMO the biggest problem is that client-side failover is way more
> complex than "got -ECONNREFUSED, try next host". I think we're all
> focusing on not being able to twiddle arbitrary settings while
> ignoring the real problem with client failover, i.e. making it
> actually work in the real world.
> 
I've tried to deal with some of these problems. 

My patch supports the following things:

1. Check whether the database instance is in recovery/standby mode and
try to find another one if so.
2. Let cluster management software have some time to promote one of
the standbys to master, i.e. a failover timeout can be specified to
allow a retry after some time if no working master is found.

There really is room for some improvement in the handling of the connect
timeout (which becomes a much more important thing once the ability to try
the next host appears). Right now it is handled only by the blocking-mode
connect functions, not by the async state machine. But I decided to publish
the patch without these improvements to get feedback from the community.
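
(For illustration only: this is not the patch itself, but an application-level
sketch, using only standard libpq calls, of the kind of check described in
point 1 above -- skip servers that report they are in recovery and keep
trying until a writable master is found. The host-list handling is
hypothetical.)

#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

/*
 * Try each candidate in turn; skip servers that are unreachable or report
 * they are in recovery, and return the first writable master found.
 */
static PGconn *
connect_to_master(const char *conninfos[], int n)
{
    int     i;

    for (i = 0; i < n; i++)
    {
        PGconn     *conn = PQconnectdb(conninfos[i]);
        PGresult   *res;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            PQfinish(conn);
            continue;           /* connection refused etc.: try next host */
        }

        res = PQexec(conn, "SELECT pg_is_in_recovery()");
        if (PQresultStatus(res) == PGRES_TUPLES_OK &&
            strcmp(PQgetvalue(res, 0, 0), "f") == 0)
        {
            PQclear(res);
            return conn;        /* found a writable master */
        }

        PQclear(res);
        PQfinish(conn);         /* standby or error: keep looking */
    }

    /* Caller may sleep and retry here, implementing the failover timeout. */
    return NULL;
}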




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2015-11-09 Thread Konstantin Knizhnik



On 09.11.2015 09:59, Ashutosh Bapat wrote:



Since the foreign server (referred to in the slides as secondary 
server) requires to call "create extension pg_dtm" and select 
dtm_join_transaction(xid);, I assume that the foreign server has to be 
a PostgreSQL server and one which has this extension installed and has 
a version that can support this extension. So, we can not use the 
extension for all FDWs and even for postgres_fdw it can be used only 
for a foreign server with above capabilities. The slides mention just 
FDW but I think they mean postgres_fdw and not all FDWs.


The DTM approach is based on sharing XIDs and snapshots between different
cluster nodes, so it really can be easily implemented only for
PostgreSQL. So I really have postgres_fdw in mind rather than an abstract FDW.
The approach with timestamps is more universal and in principle can be used
for any DBMS where visibility is based on CSNs.




I think that this API is intended to provide not only consistent
cross-node decisions about whether a particular transaction has
committed, but also consistent visibility.  If the API is sufficient
for that and if it can be made sufficiently performant, that's a
strictly stronger guarantee than what this proposal would provide.

On the other hand, I see a couple of problems:

1. The extensible transaction manager API is meant to be pluggable.
Depending on which XTM module you choose to load, the SQL that needs
to be executed by postgres_fdw on the remote node will vary.
postgres_fdw shouldn't have knowledge of all the possible XTMs out
there, so it would need some way to know what SQL to send.

2. If the remote server isn't running the same XTM as the local
server, or if it is running the same XTM but is not part of the same
group of cooperating nodes as the local server, then we can't send a
command to join the distributed transaction at all.  In that case, the
2PC for FDW approach is still, maybe, useful.


Elaborating more on this: Slide 11 shows arbiter protocol to start a 
transaction and next slide shows the same for commit. Slide 15 shows 
the transaction flow diagram for tsDTM. In DTM approach it doesn't 
specify how xids are communicated between nodes, but it's implicit in 
the protocol that xid space is shared by the nodes. Similarly for 
tsDTM it assumes that CSN space is shared by all the nodes (see 
synchronization for max(CSN)). This can not be assumed for FDWs (not 
even postgres_fdw) where foreign servers are independent entities with 
independent xid space.


The proposed architecture of DTM includes a "coordinator". The coordinator is
a process responsible for managing the logic of a distributed transaction. It
can be just a normal client application, or it can be an intermediate
master node (as in the case of pg_shard).
It can also be a PostgreSQL instance (as in the case of postgres_fdw), or not.
We try to put as few restrictions on the "coordinator" as possible.
It should just communicate with PostgreSQL backends using any
communication protocol it likes (e.g. libpq) and invoke some special
stored procedures which are part of the particular DTM extension. Such
functions also impose a protocol for exchanging data between the different
nodes involved in a distributed transaction. In this way we propagate
XIDs/CSNs between different nodes which may not even know
about each other.
In the DTM approach nodes only know the location of the "arbiter". In the
tsDTM approach there is not even an arbiter...






On the whole, I'm inclined to think that the XTM-based approach is
probably more useful and more general, if we can work out the problems
with it.  I'm not sure that I'm right, though, nor am I sure how hard
it will be.


2PC for FDW and XTM are trying to solve different problems with some 
commonality. 2PC for FDW is trying to solve problem of atomic commit 
(I am borrowing from the terminology you used in PGCon 2015) for FDWs 
in general (although limited to FDWs which can support 2 phase commit) 
and XTM tries to solve problems of atomic visibility, atomic commit 
and consistency for postgres_fdw where foreign servers support XTM. 
The only thing common between these two is atomic visibility.


If we accept XTM and discard 2PC for FDW, we will not be able to 
support atomic commit for FDWs in general. That, I think would be 
serious limitation for Postgres FDW, esp. now that DMLs are allowed. 
If we accept only 2PC for FDW and discard XTM, we won't be able to get 
atomic visibility and consistency for postgres_fdw with foreign 
servers supporting XTM. That would be again serious limitation for 
solutions implementing sharding, multi-master clusters etc.


There are approaches like [1] by which cluster of heterogenous servers 
(with some level of snapshot isolation) can be constructed. Ideally 
that will enable PostgreSQL users to maximize their utilization of FDWs.


Any distributed transaction management requires 2PC in some or 

Re: [HACKERS] Some questions about the array.

2015-11-09 Thread YUriy Zhuravlev
On Sunday 08 November 2015 16:49:20 you wrote:
> I'm not necessarily objecting to that, but it's not impossible that it
> could break something for some existing user.  We can decide not to
> care about that, though.

We had an idea. You can use ~ to convert the index to the array which always 
starts with 0. Then we can use negative indexes, and you can always find the 
beginning of the array.
Example:
we have array [-3:3]={1,2,3,4,5,6,7}
array[~0] == 1
array[~-1] == 7
array[~2:~-2] == {3,4,5,6}

What do you think?
-- 
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Some questions about the array.

2015-11-09 Thread Pavel Stehule
2015-11-09 12:36 GMT+01:00 YUriy Zhuravlev :

> On Sunday 08 November 2015 16:49:20 you wrote:
> > I'm not necessarily objecting to that, but it's not impossible that it
> > could break something for some existing user.  We can decide not to
> > care about that, though.
>
> We had an idea. You can use ~ to convert the index to the array which
> always
> starts with 0. Then we can use negative indexes, and you can always find
> the
> beginning of the array.
> Example:
> we have array [-3:3]={1,2,3,4,5,6,7}
> array[~0] == 1
> array[~-1] == 7
> array[~2:~-2] == {3,4,5,6}
>
> What do you think?
>

I am sorry - it is looking pretty obscure. Really need this feature?

Pavel


> --
> YUriy Zhuravlev
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-11-09 Thread Kouhei Kaigai
> On 2015/11/09 13:40, Kouhei Kaigai wrote:
> >> Having said that, as I said previously, I don't see much value in adding
> >> the callback routine, to be honest.  I know KaiGai-san considers that
> >> that would be useful for custom joins, but I don't think that that would
> >> be useful even for foreign joins, because I think that in case of
> >> foreign joins, the practical implementation of that routine in FDWs
> >> would be to create a secondary plan and execute that plan by performing
> >> ExecProcNode, as my patch does [1].  Maybe I'm missing something, though.
> 
> > I've never denied that alternative local sub-plan is one of the best
> > approach for postgres_fdw, however, I've also never heard why you can
> > say the best approach for postgres_fdw is definitely also best for
> > others.
> > If we would justify less flexible interface specification because of
> > comfort for a particular extension, it should not be an extension,
> > but a built-in feature.
> > My standpoint has been consistent through the discussion; we can never
> > predicate which feature shall be implemented on FDW interface, therefore,
> > we also cannot predicate which implementation is best for EPQ rechecks
> > also. Only FDW driver knows which is the "best" for them, not us.
> 
> What the RecheckForeignScan routine does for the foreign-join case would
> be the following for tuples stored in estate->es_epqTuple[]:
> 
> 1. Apply relevant restriction clauses, including fdw_recheck_quals, to
> the tuples for the baserels involved in a foreign-join, and see if the
> tuples still pass the clauses.
>
It depends on how the FDW driver keeps its restriction clauses, but you should
not use fdw_recheck_quals to recheck individual base relations, because it is
initialized to run on the joined tuple according to fdw_scan_tlist, so the
restriction clauses have to be kept in another private field.

> 2. If so, form a join tuple, while applying relevant join clauses to the
> tuples, and set the join tuple in the given slot.  Else set empty.
>
There is no need to form a joined tuple after the rechecks of the base
relations' clauses. If the FDW supports only inner joins, it can reconstruct
a joined tuple first, then run fdw_recheck_quals (via the caller), which
contains both relations' clauses and the join clause.
The FDW driver can choose whichever way is comfortable according to its
implementation and capability.

> I think these would be more efficiently processed internally in core
> than externally in FDWs.  That's why I don't see much value in adding
> the routine.  I have to admit that that means no flexibility, though.
>
Something like "efficiently", "better", "reasonable" and etc... are
your opinions from your standpoint. Things important is why you
thought X is better and Y is worse. It is what I've wanted to see for
three months, but never seen.

Discussion will become unproductive without understanding of the
reason of different conclusion. Please don't omit why you think it
is "efficient" that can justify to enforce all FDW drivers
a particular implementation manner, as a part of interface contract.

> However, the routine as-is doesn't seem good enough, either.  For
> example, since the routine is called after each of the tuples was
> re-fetched from the remote end or re-computed from the whole-row var and
> stored in the corresponding estate->es_epqTuple[], the routine wouldn't
> allow for what Robert proposed in [2].  To do such a thing, I think we
> would probably need to change the existing EPQ machinery more
> drastically and rethink the right place for calling the routine.
>
Please also see my message:
http://www.postgresql.org/message-id/9a28c8860f777e439aa12e8aea7694f801161...@bpxm15gp.gisp.nec.co.jp

And, why Robert thought here is a tough challenge:
http://www.postgresql.org/message-id/CA+TgmoY5Lf+vYy1Bha=u7__s3qtmqp7d+gssfd+ln4xz6fy...@mail.gmail.com

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Some questions about the array.

2015-11-09 Thread YUriy Zhuravlev
On Monday 09 November 2015 12:48:54 you wrote:
> I am sorry - it is looking pretty obscure. Really need this feature?

IMHO yes.
Now for write: array[~2:~-2] you need like:
array[array_lower(array, 1)+3: array_upper(array, 1)-2]

Worse when long names. Besides the extra functions calls.

Thanks.
-- 
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Some questions about the array.

2015-11-09 Thread Marc Mamin


>From: pgsql-hackers-ow...@postgresql.org 
>[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Pavel Stehule
>Sent: Montag, 9. November 2015 12:49
>To: YUriy Zhuravlev
>Cc: PostgreSQL Hackers
>Subject: Re: [HACKERS] Some questions about the array.
>
>
>
>2015-11-09 12:36 GMT+01:00 YUriy Zhuravlev :
>On Sunday 08 November 2015 16:49:20 you wrote:
>> I'm not necessarily objecting to that, but it's not impossible that it
>> could break something for some existing user.  We can decide not to
>> care about that, though.
>
>We had an idea. You can use ~ to convert the index to the array which always
>starts with 0. Then we can use negative indexes, and you can always find the
>beginning of the array.
>Example:
>we have array [-3:3]={1,2,3,4,5,6,7}
>array[~0] == 1
>array[~-1] == 7
>array[~2:~-2] == {3,4,5,6}
>
>What do you think?

Hi,

~ is the bitwise NOT operator,
so array[~n:~m] already has a meaning today -- not a very useful one, though.
It would be better to choose another character.

my 2 pence,

Marc Mamin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers