Re: [HACKERS] eXtensible Transaction Manager API (v2)

2016-03-11 Thread Konstantin Knizhnik

On 03/11/2016 11:35 PM, Tom Lane wrote:

Robert Haas  writes:

On Fri, Mar 11, 2016 at 1:11 PM, David Steele  wrote:

Is anyone willing to volunteer a review or make an argument for the
importance of this patch?

There's been a lot of discussion on another thread about this patch.
The subject is "The plan for FDW-based sharding", but the thread kind
of got partially hijacked by this issue.  The net-net of that is that
I don't think we have a clear enough idea about where we're going with
global transaction management to make it a good idea to adopt an API
like this.  For example, if we later decide we want to put the
functionality in core, will we keep the hooks around for the sake of
alternative non-core implementations?  I just don't believe this
technology is nearly mature enough to commit to at this point.
Konstantin does not agree with my assessment, perhaps unsurprisingly.

I re-read the original thread,

http://www.postgresql.org/message-id/flat/f2766b97-555d-424f-b29f-e0ca0f6d1...@postgrespro.ru

I think there is no question that this is an entirely immature patch.
Not coping with subtransactions is alone sufficient to make it not
credible for production.


Lack of subtransaction support is not a limitation of the XTM API.
It is a limitation of the current pg_dtm implementation. Another DTM
implementation, pg_tsdtm, does support subtransactions.



Even if the extension API were complete and clearly stable, I have doubts
that there's any great value in integrating it into 9.6, rather than some
later release series.  The above thread makes it clear that pg_dtm is very
much WIP and has easily a year's worth of work before anybody would think
of wanting to deploy it.  So end users don't need this patch in 9.6, and
developers working on pg_dtm shouldn't really have much of a problem
applying the patch locally --- how likely is it that they'd be using a
perfectly stock build of the database apart from this patch?


I agree with you that pg_dtm is very far from production.
But I want to note two things:

1. pg_dtm and pg_tsdtm are not complete cluster solutions; they are just one 
(relatively small) part of such a solution.
pg_tsdtm seems to be the more "mature" of the two, maybe because it is simpler 
and does not have many of the limitations pg_dtm has (such as lack of 
subtransaction support).

2. They can be integrated quite easily with other (existing) cluster solutions. 
We have integrated both of them with postgres_fdw and pg_shard.
postgres_fdw is also not a complete solution, just a mechanism which can also 
be used for sharding.
But pg_shard & CitusDB are quite popular solutions for distributed execution 
of queries, which provide good performance for analytic and single-node OLTP 
queries.
Integration with DTMs adds ACID semantics for distributed transactions and 
makes it possible to support more complex OLTP and OLAP queries involving 
multiple nodes.

Such integration is already done and its performance has been evaluated, so it 
is not quite correct to say that we need a year or more to make pg_dtm/pg_tsdtm 
ready to deploy.




But my real takeaway from that thread is that there's no great reason
to believe that this API definition *is* stable.  The single existing
use-case is very far from completion, and it's hardly unlikely that
what it needs will change.


Sorry, maybe I am completely wrong, but I do not think it is possible to 
develop a stable API if nobody is using it.
It is like filling the pool with water only after you have learned how to swim.



I also took a very quick look at the patch itself:

1. No documentation.  For something that purports to be an API
specification, really the documentation should have been written *first*.


Sorry, it was my fault. I have already written the documentation and it will be 
included in the next version of the patch.
But please note that when we started work on the DTM, we did not have a good 
understanding of which PostgreSQL TM features had to be changed.
The current view of XTM was formed only during the work on pg_dtm, pg_tsdtm, 
and multimaster.

And one more point: we have not introduced new abstractions in XTM.
We just override existing PostgreSQL functions.
Certainly, when internal functions become part of an API, they should be much 
better documented.



2. As noted in the cited thread, it's not clear that
Get/SetTransactionStatus are a useful cutpoint; they don't provide any
real atomicity guarantees.


I wonder how such guarantees can be provided at the API level?
Atomicity means that all other transactions see this transaction either as 
committed or as uncommitted.
So transaction commit has to be coordinated with the visibility check.
In the case of pg_dtm, atomicity is simply enforced by the fact that the 
decision whether to commit a transaction is taken by the central coordinator.
When it decides that a transaction is committed, it marks it as committed in 
all subsequently obtained snapshots.

In the case of pg_tsdtm there is no central arbiter, so we have to introduce 
an "in-doubt" state of 

Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-11 Thread Amit Kapila
On Sat, Mar 12, 2016 at 5:01 AM, Jim Nasby  wrote:
>
> On 3/10/16 8:36 PM, Robert Haas wrote:
>>
>> 1. We make it true only for heavyweight lock waits, and false for
>> other kinds of waits.  That's pretty strange.
>> 2. We make it true for all kinds of waits that we now know how to
>> report.  That still breaks compatibility.
>
>
> I would absolutely vote for 2 here. You could even argue that it's a bug
fix, since those were waits we technically should have been indicating.
>

I see it the other way around.  I think waiting=true for only heavyweight
locks makes sense in existing versions, as the user can still find what's
actually going on in the system either by looking at "query" in
pg_stat_activity or by referring to pg_locks; but OTOH if waiting is true for
all kinds of waits (lwlock, heavyweight lock, I/O, etc.) then I think it will
be difficult for the user to make any sense out of it.  So I see going for
option 2 as likely to confuse users rather than simplify anything.

>
> Another random thought... changes like this would probably be easier to
handle if we provided backwards compatibility extensions that created views
> that mimicked the catalog for a specific Postgres version.
>

That makes sense to me if other people agree to it; I think there will be
some maintenance overhead for it, but I see that as worth the effort in
terms of user convenience.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Perl's newSViv() versus 64-bit ints?

2016-03-11 Thread Salvador Fandiño

On 03/12/2016 12:49 AM, Tom Lane wrote:

Anybody know what will happen when passing a uint64 to newSViv()?


On 64-bit platforms, it is just interpreted as a signed integer; any 
number with the upper bit set will become negative. Perl provides 
newSVuv() for unsigned numbers.


On 32-bit platforms, and with Perls compiled with 32-bit IVs, the number is 
truncated. My module Math::Int64 can be used to add support for 64-bit 
numbers there. It has a C API[*] which allows calling it from C code 
easily. Another possibility is to just use newSVnv(), but NVs are not 
able to represent the full uint64 range precisely (IEEE doubles can 
represent integers exactly only up to 53 bits).


Well, and then you can always use some bigint module, but AFAIK, the 
ones distributed in the Perl core do not provide a C API.



* https://metacpan.org/pod/Math::Int64#C-API
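
A minimal sketch of the approach described above, assuming a Perl XS
compilation environment; uint64_to_sv is an illustrative name, not an
existing Perl or Math::Int64 API:

/* Sketch: convert a uint64 to a Perl SV without sign or truncation
 * problems.  Assumes a Perl XS environment (EXTERN.h/perl.h). */
#include "EXTERN.h"
#include "perl.h"
#include <stdint.h>
#include <stdio.h>

static SV *
uint64_to_sv(pTHX_ uint64_t val)
{
	/* On perls with 64-bit UVs, newSVuv() holds the full range. */
	if (sizeof(UV) >= sizeof(uint64_t))
		return newSVuv((UV) val);
	else
	{
		/* On 32-bit-UV perls, hand over a decimal string instead;
		 * Perl converts it numerically on demand without losing
		 * the exact textual value. */
		char		buf[24];

		snprintf(buf, sizeof(buf), "%llu", (unsigned long long) val);
		return newSVpv(buf, 0);
	}
}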




[HACKERS] [PATCH] Use MemoryContextAlloc() in the MemoryContextAllocZero() and MemoryContextAllocZeroAligned()

2016-03-11 Thread Alexander Kuleshov
Hello all,

The attached patch simplifies MemoryContextAllocZero() and
MemoryContextAllocZeroAligned().
These two functions do almost the same thing that MemoryContextAlloc() does;
additionally, they fill the allocated memory with zeros via MemSetAligned()
and MemSetLoop(). Let's call MemoryContextAlloc() in these functions instead
of duplicating its logic (setting isReset to false, calling the context's
alloc() callback, and so on), to prevent code duplication.
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 2bfd364..2845089 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -788,26 +788,7 @@ MemoryContextAllocZero(MemoryContext context, Size size)
 {
 	void	   *ret;
 
-	AssertArg(MemoryContextIsValid(context));
-	AssertNotInCriticalSection(context);
-
-	if (!AllocSizeIsValid(size))
-		elog(ERROR, "invalid memory alloc request size %zu", size);
-
-	context->isReset = false;
-
-	ret = (*context->methods->alloc) (context, size);
-	if (ret == NULL)
-	{
-		MemoryContextStats(TopMemoryContext);
-		ereport(ERROR,
-(errcode(ERRCODE_OUT_OF_MEMORY),
- errmsg("out of memory"),
- errdetail("Failed on request of size %zu.", size)));
-	}
-
-	VALGRIND_MEMPOOL_ALLOC(context, ret, size);
-
+	ret = MemoryContextAlloc(context, size);
 	MemSetAligned(ret, 0, size);
 
 	return ret;
@@ -825,26 +806,7 @@ MemoryContextAllocZeroAligned(MemoryContext context, Size size)
 {
 	void	   *ret;
 
-	AssertArg(MemoryContextIsValid(context));
-	AssertNotInCriticalSection(context);
-
-	if (!AllocSizeIsValid(size))
-		elog(ERROR, "invalid memory alloc request size %zu", size);
-
-	context->isReset = false;
-
-	ret = (*context->methods->alloc) (context, size);
-	if (ret == NULL)
-	{
-		MemoryContextStats(TopMemoryContext);
-		ereport(ERROR,
-(errcode(ERRCODE_OUT_OF_MEMORY),
- errmsg("out of memory"),
- errdetail("Failed on request of size %zu.", size)));
-	}
-
-	VALGRIND_MEMPOOL_ALLOC(context, ret, size);
-
+	ret = MemoryContextAlloc(context, size);
 	MemSetLoop(ret, 0, size);
 
 	return ret;



Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.

2016-03-11 Thread Amit Kapila
On Fri, Mar 11, 2016 at 3:34 PM, Mithun Cy  wrote:
>
> On Thu, Mar 10, 2016 at 9:39 PM, Robert Haas  wrote:
> >I guess there must not be an occurrence of this pattern in the
> >regression tests, or previous force_parallel_mode testing would have
> >found this problem.  Perhaps this patch should add one?
>
> I have added the test to select_into.sql. Added Explain select into
statement.
>

I don't see how this test will fail with force_parallel_mode=regress and
max_parallel_degree > 0, even without the patch proposed to fix the issue at
hand.  In short, I don't think this test would have caught the issue, so I
don't see much advantage in adding such a test.  Even if we want to add
such a test case, I think as proposed it will substantially increase the
timing of the "Select Into" test, which might not be an acceptable test
addition, especially for testing one corner case.

>
> Explain Analyze produces planning time and execution time even with
TIMING OFF
> so not adding the same to regress tests.
>

Yeah, that makes the addition of test for this functionality difficult.
Robert, do you have any idea what kind of test would have caught this issue?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] raw output from copy

2016-03-11 Thread Pavel Stehule
Hi

2016-03-09 18:41 GMT+01:00 Corey Huinker :

>
>>> The regression tests seem to adequately cover all new functionality,
>>> though I wonder if we should add some cases that highlight situations where
>>> BINARY mode is insufficient.
>>>
>>>
> One thing I tried to test RAW was to load an existing json file.
>
> My own personal test was to load an existing .json file into a 1x1 bytea
> table, which worked. From there I was able to
> select encode(col_name,'escape')::text::jsonb from test_table
> and the json was correctly converted.
>
> A similar test copying binary failed.
>
> A write up of the test looks like this:
>
>
> \copy (select '{"foo": "bar"}') to '/tmp/raw_test.jsonb' (format raw);
> COPY 1
> create temporary table raw_byte (b bytea);
> CREATE TABLE
> create temporary table raw_text (t text);
> CREATE TABLE
> \copy raw_jsonb from '/tmp/raw_test.blob' (format raw);
> psql:/home/ubuntu/raw_test.sql:9: ERROR:  relation "raw_jsonb" does not
> exist
> \copy raw_byte from '/tmp/raw_test.blob' (format raw);
> COPY 1
> select encode(b,'escape')::text::json from raw_byte;
>  encode
> 
>  {"foo": "bar"}
> (1 row)
>
> \copy raw_text from '/tmp/raw_test.blob' (format raw);
> COPY 1
> select t::jsonb from raw_text;
>t
> 
>  {"foo": "bar"}
> (1 row)
>
> create temporary table binary_byte (b bytea);
> CREATE TABLE
> create temporary table binary_text (t text);
> CREATE TABLE
> \copy binary_byte from '/tmp/raw_test.blob' (format binary);
> psql:/home/ubuntu/raw_test.sql:22: ERROR:  COPY file signature not
> recognized
> select encode(b,'escape')::jsonb from binary_byte;
>  encode
> 
> (0 rows)
>
> \copy binary_text from '/tmp/raw_test.blob' (format binary);
> psql:/home/ubuntu/raw_test.sql:26: ERROR:  COPY file signature not
> recognized
> select t::jsonb from binary_text;
>  t
> ---
> (0 rows)
>
>
> So, *if* we want to add a regression test to demonstrate to posterity why
> we need RAW for cases that BINARY can't handle, I offer the attached file.
>

I don't think the regression tests should do this demonstration. It is
clear that COPY BINARY will fail every time, so there is no benefit from
it in the regression tests. There has been a lot of discussion in this
thread, and we don't need to inject more "garbage" into the regression tests.


>
> Does anyone else see value in adding that to the regression tests?
>

>
>
>> Before I give my approval, I want to read it again more closely to make
>>> sure that no cases were skipped with regard to the  (binary || raw) and
>>> (binary || !raw) tests. Also, I want to use it on some of my problematic
>>> files. Maybe I'll find a good edge case. Probably not.
>>>
>>
> I don't know why I thought this, but when I looked at the patch, I assumed
> that the ( binary || raw ) tests were part of a large if/elseif/else
> waterfall. They are not. They stand alone. There are no edge cases to find.
>

This is organized into files out of the necessity of working with external
files. The regression tests for COPY RAW are about 100 lines, so there is no
need for special files and infrastructure; the COPY RAW and COPY BINARY tests
share infrastructure well.


>
> Review complete and passed. I can re-review if we want to add the
> additional test.
>
>
Great, thank you very much. I hope this feature will be really useful. It
allows simple export/import of XML docs in different encodings, and of JSON,
and it can be enhanced in the future via options. A nice feature (but not for
this release) would be additional cast info for import -- like "COPY
table(jsonb_column) FROM stdin (FORMAT RAW, CAST json_2_jsonb)". Because the
options are there, there is plenty of room for further enhancement.

Regards

Pavel


Re: [HACKERS] Relation extension scalability

2016-03-11 Thread Dilip Kumar
On Sat, Mar 12, 2016 at 8:37 AM, Amit Kapila  wrote:

> Can you post the numbers for 1, 5, 10, 15, 25 or whatever other multiplier
> you have tried, so that it is clear that 20 is best?


I tried with 1, 10, 20 and 50.

1. With 1, it was almost the same as the base code.

2. With 10, the data matches my previous group-extend patch data posted
upthread:
http://www.postgresql.org/message-id/CAFiTN-tyEu+Wf0-jBc3TGfCoHdEAjNTx=wvuxpoa1vddyst...@mail.gmail.com

3. Beyond 20 I did not see any extra benefit in the performance numbers
(I compared 50 against 20 with 4-byte COPY only; I did not test other data
sizes with 50).

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-11 Thread Joel Jacobson
On Sat, Mar 12, 2016 at 6:31 AM, Jim Nasby  wrote:
> On 3/10/16 8:36 PM, Robert Haas wrote:
>>
>> 1. We make it true only for heavyweight lock waits, and false for
>> other kinds of waits.  That's pretty strange.
>> 2. We make it true for all kinds of waits that we now know how to
>> report.  That still breaks compatibility.
>
>
> I would absolutely vote for 2 here. You could even argue that it's a bug
> fix, since those were waits we technically should have been indicating.
>
> The only way I can see #2 breaking anything is if you're using waiting=true
> to determine whether you look at pg_locks and your code will blow up if you
> get no rows back, but that seems like a pretty limited use case to me
> (Hello, LEFT JOIN).
>
> Dropping the column entirely though would break tons of things.

That was a very good point I hadn't thought about. In that case, +1 to keep it.

I also came to think of the renaming of pg_stat_activity.procpid ->
pg_stat_activity.pid,
which was renamed because "backwards compatibility was broken anyway"
(due to current_query -> query).

I wouldn't rule out there were users who didn't use the
"current_query" column but *did* use "procpid",
who wouldn't have been affected if the procpid column hadn't been
renamed as well.

Apparently in this case, it was OK to break even more things than
necessary, since another thing was already broken.

I don't have any opinion on whether doing so was a bad or good decision;
it all depends on what the project's objectives are.
Unfortunately, it feels like the case-by-case decision model
leads to conflicting arguments when they are compared side-by-side.

I really think the project needs a policy here on how, why and when
it's OK to break backwards compatibility.
Such a policy won't cover all cases of course, but if a consensus can
be reached on the basic principles,
then discussion can be focused on the details and cases not covered by
the basic principles,
instead of ending up "hashing it out on a case-by-case basis".




Re: [HACKERS] pl/pgSQL, get diagnostics and big data

2016-03-11 Thread Petr Jelinek

On 12/03/16 04:30, Tom Lane wrote:


1. I found two places (marked XXX in this patch) that are using strtoul()
to parse a tuple count back out of a command tag.  That won't do anymore.
pg_stat_statements has a messy hack for the same problem (look for
HAVE_STRTOULL), which is probably what we want to do, but not by
copy-and-pasting #ifdef HAVE_STRTOULL into multiple places.  I'd be
inclined to provide a utility function "pg_strtouint64" or some such
to encapsulate that.  (numutils.c might be a good place for it.)



Hmm, I thought that solution is not really portable for 64-bit numbers 
and is only allowed in pg_stat_statements because, in the worst case, it will 
cut the number to a 32-bit int and misreport, but won't break anything there. 
For example, Windows IIRC needs _strtoui64 for this.


I once wrote (well, copy-pasted from BSD plus some #define wrappers) a 
portable version of that; see the 0004 and the bottom of 0003 in 
http://www.postgresql.org/message-id/557d9ded.2080...@2ndquadrant.com (I 
think at minimum what the 0003 does in c.h is needed).
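
For illustration, such a wrapper might look roughly like this (a sketch
assuming the usual configure-provided macros; not the version from those
patches):

#include "postgres.h"

/* Sketch of a portable pg_strtouint64(); the #ifdef branches assume
 * configure's HAVE_STRTOULL/SIZEOF_LONG macros and MSVC's _strtoui64(). */
uint64
pg_strtouint64(const char *str, char **endptr, int base)
{
#ifdef _MSC_VER
	/* MSVC has no strtoull(), but offers an equivalent */
	return _strtoui64(str, endptr, base);
#elif defined(HAVE_STRTOULL) && SIZEOF_LONG < 8
	return strtoull(str, endptr, base);
#else
	/* on LP64 platforms, plain strtoul() already returns 64 bits */
	return strtoul(str, endptr, base);
#endif
}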


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Relation extension scalability

2016-03-11 Thread Petr Jelinek

On 12/03/16 01:01, Jim Nasby wrote:

On 3/11/16 5:14 PM, Petr Jelinek wrote:

I don't really understand this part about concurrent DDL.  If there
were concurrent DDL going on, presumably other backends would be
blocked on the relation lock, not the relation extension lock - and it
doesn't seem likely that you'd often have a huge pile-up of inserters
waiting on concurrent DDL.  But I guess it could happen.



Yeah, I was thinking about the latter part, and as I said it's a very rare
case, but I did see something similar a couple of times in the wild. It's
not an objection against committing this patch, though; in fact I think it
can be committed as is.


FWIW, this is definitely a real possibility in any shop that has very
high downtime costs and high transaction rates.

I also think some kind of clamp is a good idea. It's not that uncommon
to run max_connections significantly higher than 100, so the extension
could be way larger than 16MB. In those cases this patch could actually
make things far worse as everyone backs up waiting on the OS to extend
many MB when all you actually needed were a couple dozen more pages.



Well, yeah, I've seen 10k, but not everything will write to the same table; 
I wanted to stay in the realm of something that has a realistic chance of 
happening.



BTW, how was *20 arrived at? ISTM that if you have a lot of concurrent
demand for extension that means you're running lots of small DML
operations, not really big ones. I'd think that would make *1 more
appropriate.


The benchmarks I've seen showed you want at least *10, and *20 was better.

--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Relation extension scalability

2016-03-11 Thread Petr Jelinek

On 12/03/16 03:46, Dilip Kumar wrote:


On Sat, Mar 12, 2016 at 5:31 AM, Jim Nasby  wrote:

FWIW, this is definitely a real possibility in any shop that has
very high downtime costs and high transaction rates.

I also think some kind of clamp is a good idea. It's not that
uncommon to run max_connections significantly higher than 100, so
the extension could be way larger than 16MB. In those cases this
patch could actually make things far worse as everyone backs up
waiting on the OS to extend many MB when all you actually needed
were a couple dozen more pages.


I agree. We can have some max limit on the number of extra pages. What do
others think?



Well, that's what I meant by clamping originally. I don't know what a good 
value is, though.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Parallel Aggregate

2016-03-11 Thread David Rowley
On 11 March 2016 at 03:39, David Rowley  wrote:
> A couple of things which I'm not 100% happy with.
>
> 1. make_partialgroup_input_target() doing lookups to the syscache.
> Perhaps this job can be offloaded to a new function in a more suitable
> location. Ideally the Aggref would already store the required
> information, but I don't see a great place to be looking that up.

I've made some changes and moved this work off to a new function in tlist.c.

> 2. I don't really like how I had to add tlist to
> create_grouping_paths(), but I didn't want to go to the trouble of
> calculating the partial agg PathTarget if Parallel Aggregation is not
> possible, as this does do syscache lookups, so it's not free, so I'd
> rather only do it if we're actually going to add some parallel paths.

This is now fixed. The solution was much easier after 49635d7b.

> 3. Something about the force_parallel_mode GUC. I'll think about this
> when I start to think about how to test this, as likely I'll need
> that, else I'd have to create tables bigger than we'd want to in the
> regression tests.

On further analysis it seems that this GUC does not do what I thought
it did, which must be why Robert said that I don't need to think about
it here. The GUC just seems to add a Gather node at the base of the
plan tree, when possible. Which leaves me a bit lost when it comes to
how to write tests for this... It seems like I need to add at least
136k rows to a test table to get a Parallel Aggregate plan, which I
think is a no-go for the regression test suite... and that's with
parallel_setup_cost = 0.

It would be nice if that GUC just, when enabled, preferred the
cheapest parallel plan (when available), rather than hacking in a
Gather node into the plan's root. This should have the same result in
many cases anyway, and would allow me to test this without generating
oversized tables in the regression tests.

I've attached an updated patch which is based on commit 7087166. Things
are really changing fast in the grouping path area at the moment, but
hopefully the dust is starting to settle now.

This patch also fixes a couple of bugs: one in the cost estimates for
the number of groups that will be produced by the final aggregate; also,
there was a missing copyObject() in the setrefs.c code which
caused a "Var not found in targetlist" problem in setrefs.c for plans
with more than one partial aggregate node... I had to modify the planner
to get it to add an additional aggregate node to test this (a separate
test patch for this is attached).

Comments/Reviews/Testing all welcome.

Thanks

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


parallel_aggregation_8a26089_2016-03-12.patch
Description: Binary data


parallel_aggregation_extra_combine_node.patch
Description: Binary data



Re: [HACKERS] pl/pgSQL, get diagnostics and big data

2016-03-11 Thread Tom Lane
I wrote:
> I'll take it from here, unless I find bigger issues.

Hmm ... so the more I pulled on this string, the more stuff I found.
The attached updated patch fixes several additional significant
areas:

* call_cntr and max_calls in FuncCallContext are now uint64

* Result widths for PortalRunFetch and allied routines are now uint64

* count arguments to ExecutorRun and related routines are now uint64

* Maximum number of tuples in a SPITupleTable is now uint64

* Management of portal (cursor) positions fixed.  I promoted portalPos to
uint64, and got rid of posOverflow, which seems not especially necessary
if we're using 64-bit counters.

* A whole lot of places that were comparing integer-width variables
to SPI_processed, and would therefore go into infinite loops with
result sizes above 2G tuples, now use uint64 counters instead.

However, I drew the line at changing FetchStmt.howMany, which means that
the count-limit inputs to PortalRunFetch et al didn't change; they're
still signed long.  Changing that would require messing with the lexer
which I did not feel like doing.  So the maximum offset/position in a
FETCH or MOVE is (still) platform-dependent.  I'm not sure that this is
worth changing, as you will get a clean syntax error if you try to exceed
the limit.

BTW, if anyone thinks the above changes represent too much API churn,
speak up.  This is definitely going to cause some pain for extensions.

There's a fair amount of work yet to do:

1. I found two places (marked XXX in this patch) that are using strtoul()
to parse a tuple count back out of a command tag.  That won't do anymore.
pg_stat_statements has a messy hack for the same problem (look for
HAVE_STRTOULL), which is probably what we want to do, but not by
copy-and-pasting #ifdef HAVE_STRTOULL into multiple places.  I'd be
inclined to provide a utility function "pg_strtouint64" or some such
to encapsulate that.  (numutils.c might be a good place for it.)

2. As I was just complaining to -hackers, plpython plperl and pltcl
all now contain attempts to pass uint64 values (from SPI_processed)
into language-specific functions.  We need to figure out whether
that will overflow and whether it's worth doing something about.

3. This patch still needs a lot of review as I may have missed
something.

So I'm bouncing this back to Waiting on Author.

regards, tom lane

diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 76d1831..6708d81 100644
*** a/contrib/auto_explain/auto_explain.c
--- b/contrib/auto_explain/auto_explain.c
*** void		_PG_fini(void);
*** 61,67 
  static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
  static void explain_ExecutorRun(QueryDesc *queryDesc,
  	ScanDirection direction,
! 	long count);
  static void explain_ExecutorFinish(QueryDesc *queryDesc);
  static void explain_ExecutorEnd(QueryDesc *queryDesc);
  
--- 61,67 
  static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
  static void explain_ExecutorRun(QueryDesc *queryDesc,
  	ScanDirection direction,
! 	uint64 count);
  static void explain_ExecutorFinish(QueryDesc *queryDesc);
  static void explain_ExecutorEnd(QueryDesc *queryDesc);
  
*** explain_ExecutorStart(QueryDesc *queryDe
*** 257,263 
   * ExecutorRun hook: all we need do is track nesting depth
   */
  static void
! explain_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, long count)
  {
  	nesting_level++;
  	PG_TRY();
--- 257,263 
   * ExecutorRun hook: all we need do is track nesting depth
   */
  static void
! explain_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, uint64 count)
  {
  	nesting_level++;
  	PG_TRY();
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 9ce60e6..29e12cd 100644
*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
*** static void pgss_post_parse_analyze(Pars
*** 289,295 
  static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
  static void pgss_ExecutorRun(QueryDesc *queryDesc,
   ScanDirection direction,
!  long count);
  static void pgss_ExecutorFinish(QueryDesc *queryDesc);
  static void pgss_ExecutorEnd(QueryDesc *queryDesc);
  static void pgss_ProcessUtility(Node *parsetree, const char *queryString,
--- 289,295 
  static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
  static void pgss_ExecutorRun(QueryDesc *queryDesc,
   ScanDirection direction,
!  uint64 count);
  static void pgss_ExecutorFinish(QueryDesc *queryDesc);
  static void pgss_ExecutorEnd(QueryDesc *queryDesc);
  static void pgss_ProcessUtility(Node *parsetree, const char *queryString,
*** pgss_ExecutorStart(QueryDesc *queryDesc,
*** 866,872 
   * ExecutorRun hook: all we need do is track nesting depth
   */
  static void
! pgss_ExecutorRun(QueryDesc 

[HACKERS] Re: [COMMITTERS] pgsql: Only try to push down foreign joins if the user mapping OIDs mat

2016-03-11 Thread Andres Freund
Hi,

On 2016-01-28 19:09:01 +, Robert Haas wrote:
> Only try to push down foreign joins if the user mapping OIDs match.
> 
> Previously, the foreign join pushdown infrastructure left the question
> of security entirely up to individual FDWs, but it would be easy for
> a foreign data wrapper to inadvertently open up subtle security holes
> that way.  So, make it the core code's job to determine which user
> mapping OID is relevant, and don't attempt join pushdown unless it's
> the same for all relevant relations.
> 
> Per a suggestion from Tom Lane.  Shigeru Hanada and Ashutosh Bapat,
> reviewed by Etsuro Fujita and KaiGai Kohei, with some further
> changes by me.

I noticed that this breaks some citus regression tests in a minor
manner. Namely previously file_fdw worked without a user mapping, now it
doesn't appear to anymore.

This is easy enough to fix, and it's perfectly ok for us to fix this,
but I do wonder if that's not going to cause trouble for others.

Andres




Re: [HACKERS] Relation extension scalability

2016-03-11 Thread Amit Kapila
On Sat, Mar 12, 2016 at 8:16 AM, Dilip Kumar  wrote:
>
>
> On Sat, Mar 12, 2016 at 5:31 AM, Jim Nasby  wrote:
>>
>> FWIW, this is definitely a real possibility in any shop that has very
high downtime costs and high transaction rates.
>>
>> I also think some kind of clamp is a good idea. It's not that uncommon
to run max_connections significantly higher than 100, so the extension
could be way larger than 16MB. In those cases this patch could actually
make things far worse as everyone backs up waiting on the OS to extend many
MB when all you actually needed were a couple dozen more pages.
>
>
> I agree. We can have some max limit on the number of extra pages. What do
> others think?
>
>
>>
>> BTW, how was *20 arrived at? ISTM that if you have a lot of concurrent
demand for extension that means you're running lots of small DML
operations, not really big ones. I'd think that would make *1 more
appropriate.
>
>
> *1 will not solve this problem.  Here the main problem was that many
> backends sleep/wake up on the extension lock, and that was causing the
> bottleneck.  So if we do *1, this will satisfy only the current requesters
> that have already waited on the lock; but our goal is to stop backends
> from requesting this lock in the first place.
>
> The idea is to count the requesters to get statistics on this lock (the
> load on the lock) and extend by a multiple of the load, so that this
> situation is avoided for a long time; when it happens again, we again
> extend by a multiple of the load.
>
> How did 20 come about?
>   I tested with multiple client loads (1..64), with multiple record sizes
> (4-byte to 1KB records), using COPY/INSERT, and found 20 works best.
>

Can you post the numbers for 1, 5, 10, 15, 25 or whatever other multiplier
you have tried, so that it is clear that 20 is best?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Proposal: BSD Authentication support

2016-03-11 Thread Peter Eisentraut
On 3/11/16 4:38 PM, Thomas Munro wrote:
> It looks like this needs review from an OpenBSD user specifically.
> FreeBSD and NetBSD use PAM instead of BSD auth.

FreeBSD has man pages for this stuff, so maybe they also have it now.





Re: [HACKERS] Proposal: BSD Authentication support

2016-03-11 Thread Peter Eisentraut
On 1/7/16 9:40 PM, Marisa Emerson wrote:
> There's a port for PAM, but we would prefer to use BSD Auth as its quite
> a lot cleaner and is standard on OpenBSD.
> 
> I've attached an updated patch that includes documentation. It has been
> tested against OpenBSD 5.8. I'll add this thread to the commitfest.

(Not a BSD user, just reviewing the code.)

configure.in has "build with BSD support", which should be "build with
BSD Authentication support".

There should be some documentation of the new configure option in
installation.sgml.

The documentation in client-auth.sgml speaks of a postgresql user and an
auth group.  Maybe that's clear to users of BSD, but I don't know
whether these are OS entities or groups that I need to create or what.

The auth_userokay() call hardcodes a "type" of "pg-auth".  That seems
important and should probably be documented.  Extrapolating from PAM, I
think that should perhaps be an option in pg_hba.conf.
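
For reference, the OpenBSD call in question looks roughly like this (a
sketch, not the patch's actual code; the hardcoded "pg-auth" type is the
value suggested above for exposure in pg_hba.conf):

#include <sys/types.h>
#include <login_cap.h>
#include <bsd_auth.h>
#include <stdbool.h>

/* Sketch of a BSD Authentication check.  auth_userokay() runs the
 * authentication script for the given style; a NULL style means "use
 * the default style for this user". */
static bool
check_bsd_auth(const char *user, const char *password)
{
	return auth_userokay((char *) user, NULL, "pg-auth",
						 (char *) password) != 0;
}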





Re: [HACKERS] Relation extension scalability

2016-03-11 Thread Dilip Kumar
On Sat, Mar 12, 2016 at 5:31 AM, Jim Nasby  wrote:

> FWIW, this is definitely a real possibility in any shop that has very high
> downtime costs and high transaction rates.
>
> I also think some kind of clamp is a good idea. It's not that uncommon to
> run max_connections significantly higher than 100, so the extension could
> be way larger than 16MB. In those cases this patch could actually make
> things far worse as everyone backs up waiting on the OS to extend many MB
> when all you actually needed were a couple dozen more pages.
>

I agree. We can have some max limit on the number of extra pages. What do
others think?



> BTW, how was *20 arrived at? ISTM that if you have a lot of concurrent
> demand for extension that means you're running lots of small DML
> operations, not really big ones. I'd think that would make *1 more
> appropriate.
>

*1 will not solve this problem.  Here the main problem was that many
backends sleep/wake up on the extension lock, and that was causing the
bottleneck.  So if we do *1, this will satisfy only the current requesters
that have already waited on the lock; but our goal is to stop backends
from requesting this lock in the first place.

The idea is to count the requesters to get statistics on this lock (the
load on the lock) and extend by a multiple of the load, so that this
situation is avoided for a long time; when it happens again, we again
extend by a multiple of the load.

How did 20 come about?
  I tested with multiple client loads (1..64), with multiple record sizes
(4-byte to 1KB records), using COPY/INSERT, and found 20 works best.
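
For illustration, the multiplier-plus-clamp idea under discussion would look
something like this (the names GetExtensionLockWaiterCount and
MAX_EXTEND_BLOCKS are made up for this sketch, not taken from the patch):

#include "postgres.h"
#include "utils/rel.h"

#define EXTEND_MULTIPLIER	20
#define MAX_EXTEND_BLOCKS	512		/* example clamp: 4MB of 8kB blocks */

static int
blocks_to_extend_by(Relation relation)
{
	/* assumed helper returning the number of waiters on the lock */
	int			waiters = GetExtensionLockWaiterCount(relation);

	/* extend by a multiple of the observed load, but never unboundedly */
	return Min(waiters * EXTEND_MULTIPLIER, MAX_EXTEND_BLOCKS);
}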


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-11 Thread Tomas Vondra

On Fri, 2016-03-11 at 16:40 -0800, Peter Geoghegan wrote:
> On Fri, Mar 11, 2016 at 4:17 PM, Peter Geoghegan 
> wrote:
> > 
> > If you want the tool to limp on when it finds an error, that can be
> > done by changing the constant for the CORRUPTION macro in
> > amcheck.c.
> > But having that be dynamically configurable is not really
> > compatible
> > with the goal of having amcheck be run routinely.
> Also, it's just really hard to reason about what remains OK to check
> after the first problem is encountered in the general case. It's
> "unreasonable" for any of the checks to ever fail. So, by that
> standard, assuming that they might fail at all could be called
> paranoid. Who can say what "paranoid enough" should be? I think it's
> useful to have a low-impact, generic check function for B-Tree
> indexes, but I don't think we need to hold back on being exhaustive
> elsewhere.

Right, but isn't there a difference between the two functions in this
respect? Once you find corruption involving relationship between
multiple pages, then I agree it's complicated to do any reasoning about
what additional checks are safe.

But does that problem also apply to bt_index_check, which checks pages
independently?

Admittedly, this also depends on the use case. If all we need to do is
answer the question "Is the index corrupted?" then sure, bailing out
on the first error is perfectly appropriate.

But perhaps this would be useful for some recovery/forensics tasks?

From time to time we need to investigate corruption in a database, i.e.
see how much of the data is actually corrupted, list pages that need to
be zeroed to get the cluster up to salvage as much as possible, etc.
Currently this is tedious because we essentially find/fix the pages one
by one. It'd be very useful to list all broken pages in one go and then
fix all of them.

Obviously, that's about heapam checks, but perhaps it would be useful
for an index too?

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Perl's newSViv() versus 64-bit ints?

2016-03-11 Thread Andrew Dunstan



On 03/11/2016 06:49 PM, Tom Lane wrote:

Anybody know what will happen when passing a uint64 to newSViv()?





A perl IV is guaranteed large enough to hold a pointer, if that's any 
help. But for an unsigned value you might be better off calling newSVuv()


cheers

andrew





Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-11 Thread Peter Geoghegan
On Fri, Mar 11, 2016 at 4:30 PM, Jim Nasby  wrote:
> Right, but you still have the option to enable them if you don't want to
> swamp your IO system. That's why CIC obeys it too. If I was running a
> consistency check on a production system I'd certainly want the option to
> throttle it. Without that option, I don't see running this on production
> systems as being an option. If that's not a goal then fine, but if it is a
> goal I think it needs to be there.
>
> Isn't it just a few extra lines of code to support it?

I see your point.

I'll add that if people like the interface you propose. (Overloading
the VACUUM cost delay functions to cause a delay for amcheck
functions, too). Note that the functions already use an appropriate
buffer access strategy (it avoids blowing shared_buffers, much like
VACUUM itself).

-- 
Peter Geoghegan




Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-11 Thread Peter Geoghegan
On Fri, Mar 11, 2016 at 4:17 PM, Peter Geoghegan  wrote:
> If you want the tool to limp on when it finds an error, that can be
> done by changing the constant for the CORRUPTION macro in amcheck.c.
> But having that be dynamically configurable is not really compatible
> with the goal of having amcheck be run routinely.

Also, it's just really hard to reason about what remains OK to check
after the first problem is encountered in the general case. It's
"unreasonable" for any of the checks to ever fail. So, by that
standard, assuming that they might fail at all could be called
paranoid. Who can say what "paranoid enough" should be? I think it's
useful to have a low-impact, generic check function for B-Tree
indexes, but I don't think we need to hold back on being exhaustive
elsewhere.

-- 
Peter Geoghegan




Re: [HACKERS] Is there a way around function search_path killing SQL function inlining? - and backup / restore issue

2016-03-11 Thread Jim Nasby

On 3/10/16 3:29 PM, Regina Obe wrote:

Take, for example, tiger geocoder, which relies on fuzzystrmatch.  I have 
no idea where someone will install fuzzystrmatch, so I can't schema-qualify 
those calls.  And I use those dependent functions to build indexes on tables.


This is something I've thought about as well, and I think the real 
problem is search_path just isn't the right way to handle this. I think 
there needs to be some way to definitively reference something that's 
part of an extension; a method that doesn't depend on whatever schema 
the extension happens to be installed in.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-11 Thread Jim Nasby

On 3/11/16 6:17 PM, Peter Geoghegan wrote:

Not sure about the cost delay thing. Delays are disabled by default
for manually issued VACUUM, so have doubts that that's useful.


Right, but you still have the option to enable them if you don't want to 
swamp your IO system. That's why CIC obeys it too. If I was running a 
consistency check on a production system I'd certainly want the option 
to throttle it. Without that option, I don't see running this on 
production systems as being an option. If that's not a goal then fine, 
but if it is a goal I think it needs to be there.


Isn't it just a few extra lines of code to support it?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-11 Thread Peter Geoghegan
On Fri, Mar 11, 2016 at 3:50 PM, Jim Nasby  wrote:
> I also agree that the module name isn't very clear. If this is meant to be
> the start of a generic consistency checker, let's call it that. Otherwise, it
> should be marked as being specific to btrees, because presumably we might
> eventually want similar tools for GIN, etc. (FWIW I'd vote for a general
> consistency checker).

It's a generic consistency checker -- that's the intent. Robert wanted
to go that way about a year ago, and I think it makes sense (this
module started out life as btreecheck). As I said, I don't really care
what the name ends up being. amcheck seems fine to me, but I'll go
with any reasonable suggestion that reflects the scope of the tool as
a generic consistency checker for AMs, including heapam.

I don't know that I understand the concern about the naming of
bt_index_parent_check(). I called it something that accurately
reflects what it does. That particular SQL-callable function is more
orientated towards helping hackers than towards helping users detect
routine problems, as the docs say, but it could help users working
with hackers, and it might even help users acting on their own. I
don't want to make it sound scary, because it isn't. I don't know what
problems will be detected by this tool most often, and I don't think
it would be wise to try to predict how that will work out. And so,
concealing the internal workings of each function/check by giving them
all totally non-technical names seems like a bad idea. It's not that
hard to understand that a B-Tree has multiple levels, and checking the
levels against each other requires more restrictive/stronger
heavyweight locks. I've only added 2 SQL-callable functions, each of
which has one argument, and the first of which has a generic name and
very light locking requirements, like a SELECT. It's not that hard to
get the general idea, as an ordinary user.

> I know the vacuum race condition would be very rare, but I don't think it
> can be ignored. Last thing you want out of a consistency checker is false
> negatives/positives.

That's what I said. And the docs also say that there should never be a
false positive. That's definitely an important design goal here,
because it enables routine testing.

> I do think it would be reasonable to just wholesale
> block against concurrent vacuums, but I don't think there's any reasonable
> way to do that.

Actually, that's exactly what bt_index_parent_check() does already.
VACUUM requires a SHARE UPDATE EXCLUSIVE lock on the heap relation,
which cannot be concurrently acquired due to bt_index_parent_check()'s
acquisition of a SHARE lock. The locking for bt_index_parent_check()
is almost the same as REINDEX, except that that acquires an ACCESS
EXCLUSIVE lock on the index relation; bt_index_parent_check() only
requires an EXCLUSIVE lock on the index relation.
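
Sketched out, the locking described above is roughly the following
(illustrative, not the actual amcheck code):

#include "postgres.h"
#include "access/genam.h"
#include "access/heapam.h"
#include "storage/lock.h"

/* Sketch: acquire the locks bt_index_parent_check() needs, per the
 * description above.  ShareLock on the heap conflicts with VACUUM's
 * ShareUpdateExclusiveLock; ExclusiveLock on the index is one step
 * weaker than REINDEX's AccessExclusiveLock, so plain SELECTs can
 * still use the index. */
static void
lock_for_parent_check(Oid heapoid, Oid indoid,
					  Relation *heaprel, Relation *indrel)
{
	*heaprel = heap_open(heapoid, ShareLock);
	*indrel = index_open(indoid, ExclusiveLock);
}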

Not sure about the cost delay thing. Delays are disabled by default
for manually issued VACUUM, so have doubts that that's useful.

If you want the tool to limp on when it finds an error, that can be
done by changing the constant for the CORRUPTION macro in amcheck.c.
But having that be dynamically configurable is not really compatible
with the goal of having amcheck be run routinely.
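
The compile-time knob being referred to is roughly the following (an
illustrative sketch; see amcheck.c for the real definition):

/* Demoting the elevel makes the checks report each fault and limp on
 * instead of stopping at the first one. */
#define CORRUPTION	ERROR		/* default: stop at the first problem */
/* #define CORRUPTION	WARNING		-- to limp on instead */

/* each check site then reports through it, for example: */
ereport(CORRUPTION,
		(errcode(ERRCODE_INDEX_CORRUPTED),
		 errmsg("item order invariant violated for index \"%s\"",
				RelationGetRelationName(rel))));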

-- 
Peter Geoghegan




Re: [HACKERS] 2016-03 Commitfest

2016-03-11 Thread David Steele
We are now one-third of the way through the 2016-03 Commitfest.

There are still some patches left that need review but have no reviewer
(https://commitfest.postgresql.org/9/?reviewer=-2) though a lot have
been picked up in the last week.

Needs review: 56
Needs *reviewer*: 15 (was 58 last week)

There were a number of patches that were marked as "needs review" but
were actually "waiting for author" as far as I could see.  If your patch
is in this state it would be good if you could give an idea when you
will be able to supply a new patch or otherwise address concerns raised
in the thread.

Waiting on Author: 29

The closed patches are up significantly since last week and the good
news is that most of them were committed.  36% of the patches are now
closed.

Committed: 46
Rejected: 4
Returned with Feedback: 4

I will continue to respond individually to threads that seem idle or
have issues that need to be resolved.  Please do not hesitate to contact
me about any concerns that you might have regarding the Commitfest
process.

-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Relation extension scalability

2016-03-11 Thread Jim Nasby

On 3/11/16 5:14 PM, Petr Jelinek wrote:

I don't really understand this part about concurrent DDL.  If there
were concurrent DDL going on, presumably other backends would be
blocked on the relation lock, not the relation extension lock - and it
doesn't seem likely that you'd often have a huge pile-up of inserters
waiting on concurrent DDL.  But I guess it could happen.



Yeah I was thinking about the latter part and as I said it's very rare
case, but I did see something similar couple of times in the wild. It's
not objection against committing this patch though, in fact I think it
can be committed as is.


FWIW, this is definitely a real possibility in any shop that has very 
high downtime costs and high transaction rates.


I also think some kind of clamp is a good idea. It's not that uncommon 
to run max_connections significantly higher than 100, so the extension 
could be way larger than 16MB. In those cases this patch could actually 
make things far worse as everyone backs up waiting on the OS to extend 
many MB when all you actually needed were a couple dozen more pages.


BTW, how was *20 arrived at? ISTM that if you have a lot of concurrent 
demand for extension that means you're running lots of small DML 
operations, not really big ones. I'd think that would make *1 more 
appropriate.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] PREPARE dynamic SQL in plpgsql

2016-03-11 Thread David G. Johnston
On Fri, Mar 11, 2016 at 4:45 PM, Koichi Suzuki  wrote:

> Hi,
>
> Does someone know how to prepare a dynamic SQL statement in plpgsql?
>
> All the examples in the PG documents describe only preparing static SQL
> statements.
>
>
This is not an appropriate question for the -hackers list.  It is better
asked on either -general or, more correctly, -novice.

It is also somewhat lacking in detail - including an example of what you've
tried or read would be a nice touch.

David J.


Re: [HACKERS] Perl's newSViv() versus 64-bit ints?

2016-03-11 Thread Tom Lane
I wrote:
> Anybody know what will happen when passing a uint64 to newSViv()?

Oh, and how about Python's PyInt_FromLong()?
Or PyList_New()?
Or the second argument of PyList_SetItem()?

regards, tom lane




Re: [HACKERS] PREPARE dynamic SQL in plpgsql

2016-03-11 Thread David G. Johnston
On Fri, Mar 11, 2016 at 4:45 PM, Koichi Suzuki  wrote:

> Hi,
>
> Does someone know how to prepare a dynamic SQL statement in plpgsql?
>
> All the examples in the PG documents describe only preparing static SQL
> statements.
>
>
You might want to rephrase the question.  From the pl/pgsql documentation:

"Note: The PL/pgSQL EXECUTE statement is not related to the EXECUTE SQL
statement supported by the PostgreSQL server. The server's EXECUTE
statement cannot be used directly within PL/pgSQL functions (and is not
needed)."

http://www.postgresql.org/docs/9.5/interactive/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN

So even if you could SQL-PREPARE you wouldn't be able to execute it.

The linked section describes how to construct and execute dynamic queries
in pl/pgsql.  The short answer is that you create a variable of type text
and then EXECUTE it:

stmt := format('SELECT * FROM %I', in_tbl_name);
EXECUTE stmt [...]

David J.


Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-11 Thread Jim Nasby

On 3/11/16 3:31 PM, Peter Geoghegan wrote:

> Can we come up with names that more clearly identify the difference
> between those two functions? I mean, _parent_ does not make it
> particularly obvious that the second function acquires exclusive lock
> and performs more thorough checks.

Dunno about that. Its defining characteristic is that it checks child
pages against their parent IMV. Things are not often defined in terms
of their locking requirements.


First, thanks for your work on this. I've wanted it in the past.

I agree the name isn't very clear. Perhaps _recurse?

I also agree that the module name isn't very clear. If this is meant to 
be the start of a generic consistency checker, let's call it that. 
Otherwise, it should be marked as being specific to btrees, because 
presumably we might eventually want similar tools for GIN, etc. (FWIW 
I'd vote for a general consistency checker).


I know the vacuum race condition would be very rare, but I don't think 
it can be ignored. Last thing you want out of a consistency checker is 
false negatives/positives. I do think it would be reasonable to just 
wholesale block against concurrent vacuums, but I don't think there's 
any reasonable way to do that.


I would prefer the ability to do something other than raising an error 
when corruption is found, so that you could find all corruption in an 
index. Obviously could log to a different level. Another option would be 
SRFs that return info about all the corruption found, but that's 
probably overkill.


It'd be nice if you had the option to obey vacuum_cost_delay when 
running this, but that's clearly just a nice-to-have (or maybe just obey 
it all the time, since it defaults to 0).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




[HACKERS] Perl's newSViv() versus 64-bit ints?

2016-03-11 Thread Tom Lane
Anybody know what will happen when passing a uint64 to newSViv()?

regards, tom lane




Re: [HACKERS] Background Processes and reporting

2016-03-11 Thread Andres Freund
On 2016-03-12 02:24:33 +0300, Alexander Korotkov wrote:
> Idea of individual time measurement of every wait event met criticism
> because it might have high overhead [1].

Right. And that's actually one of the points I meant by "didn't
listen to criticism". There've been a lot of examples, on and off list,
where taking timings triggers significant slowdowns.  Yes, in some
bare-metal environments, with a coherent TSC, the overhead can be
low. But that doesn't make it OK to have a high overhead on a lot of
other systems.

Just claiming that that's not a problem will only lead to your position
not being taken seriously.


> This is really so at least for Windows [2].

Measuring timing overhead for a simplistic workload on a single system
doesn't show that.  Try doing such a test on a VMware ESX virtualized
Windows machine, on a multi-socket server; in a lot of instances you'll
see two to three orders of magnitude longer average times, with peaks going
into 4-5 orders of magnitude.  And, sad as it is, realistically most
postgres instances will run in virtualized environments.


> But accessing only current values wouldn't be very useful.  We
> anyway need to gather some statistics.  Gathering it by sampling would be
> both more expensive and less accurate for majority of systems.  This is why
> I proposed hooks to make possible platform dependent extensions.  Robert
> rejects hook because he is "not a big fan of hooks as a way of resolving
> disagreements about the design" [3].

I think I agree with Robert here. Providing hooks into very low level
places tends to lead to problems in my experience; tight control over
what happens is often important - I certainly don't want any external
code to run while we're waiting for an lwlock.


> Besides, those are actually not design issues but platform issues...

I don't see how that's the case.


> Another question is wait parameters.  We want to expose wait events with
> some parameters.  Robert rejects that because it *might* add additional
> overhead [3]. When I proposed to fit something useful into the hard-won
> 4 bytes, Robert claims that it is "too clever" [4].

I think stopping to treat this as "Robert/EDB vs. pgpro" would be a good
first step to make progress here.


It seems entirely possible to extend the current API in an incremental
fashion, either allowing the individual pieces to be disabled, or providing
sufficient measurements that disabling them isn't needed.


> So, the situation looks like a dead end.  I have no idea how to convince
> Robert to accept any kind of advanced wait monitoring functionality in
> PostgreSQL.  I'm thinking about implementing a sampling extension over the
> current infrastructure just to make the community see that it sucks. Andres,
> it would be very nice if you have any idea how to move this situation forward.

I've had my share of conflicts with Robert. But if I were in his shoes,
targeted by this kind of rhetoric, I'd be very tempted to just ignore
any further arguments from the origin.  So I think the way forward is
for everyone to cool off, and to see how we can incrementally make
progress from here on.


> Another aspect is that EnterpriseDB offers waits monitoring in its
> proprietary fork [5].

So?

Greetings,

Andres Freund




[HACKERS] PREPARE dynamic SQL in plpgsql

2016-03-11 Thread Koichi Suzuki
Hi,

Does someone know how to prepare a dynamic SQL statement in PL/pgSQL?

All the examples in the PG documentation describe only preparing static SQL
statements.
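
(For reference, a minimal hedged sketch of the usual approach: PL/pgSQL runs
dynamic SQL through EXECUTE, with format() for identifiers and USING for
parameter values; the table and column names here are made up.)

    CREATE OR REPLACE FUNCTION count_big_ids(tbl regclass, min_id int)
    RETURNS bigint
    LANGUAGE plpgsql AS $$
    DECLARE
        result bigint;
    BEGIN
        EXECUTE format('SELECT count(*) FROM %s WHERE id > $1', tbl)
           INTO result
          USING min_id;
        RETURN result;
    END;
    $$;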

Thank you;
--
Koichi Suzuki


Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-11 Thread Jim Nasby

On 3/10/16 8:36 PM, Robert Haas wrote:

1. We make it true only for heavyweight lock waits, and false for
other kinds of waits.  That's pretty strange.
2. We make it true for all kinds of waits that we now know how to
report.  That still breaks compatibility.


I would absolutely vote for 2 here. You could even argue that it's a bug 
fix, since those were waits we technically should have been indicating.


The only way I can see #2 breaking anything is if you're using 
waiting=true to determine whether you look at pg_locks and your code 
will blow up if you get no rows back, but that seems like a pretty 
limited use case to me (Hello, LEFT JOIN).
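
(A hedged sketch of such a monitoring query, with a LEFT JOIN so a waiting
backend without a matching pg_locks row still shows up:)

    SELECT a.pid, a.query, l.locktype, l.relation
      FROM pg_stat_activity a
      LEFT JOIN pg_locks l ON l.pid = a.pid AND NOT l.granted
     WHERE a.waiting;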


Dropping the column entirely though would break tons of things.

Another random thought... changes like this would probably be easier to 
handle if we provided backwards compatibility extensions that created 
views that mimicked the catalog for a specific Postgres version.
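
(A hedged sketch of that idea, assuming the new wait_event_type column; the
schema name is invented for illustration:)

    CREATE SCHEMA compat95;
    CREATE VIEW compat95.pg_stat_activity AS
    SELECT pid, usename, state, query,
           (wait_event_type IS NOT NULL) AS waiting
      FROM pg_stat_activity;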

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] WIP: Upper planner pathification

2016-03-11 Thread Andres Freund
On 2016-03-10 23:38:14 -0500, Tom Lane wrote:
> > Would you rather add back the exports or should I?
> 
> I'll do it ... just send me the list.

After exporting make_agg, make_limit, make_sort_from_sortclauses and
making some trivial adjustments due to the pull_var_clause changes,
Citus' tests pass on 9.6, bar some noise.  Pathification made
some plans switch from hash-agg to sort-agg, and the other way round;
but that's obviously ok.

Thanks,

Andres




Re: [HACKERS] Background Processes and reporting

2016-03-11 Thread Alexander Korotkov
On Sat, Mar 12, 2016 at 12:22 AM, Andres Freund  wrote:

> On 2016-03-11 23:53:15 +0300, Vladimir Borodin wrote:
> > It was many times stated in threads about waits monitoring [0, 1, 2]
> > and supported by different people, but ultimately waits information
> > was stored in PgBackendStatus.
>
> Only that it isn't. It's stored in PGPROC.


Yes, and this is good. Robert's original concept of a single-byte wait field
in PgBackendStatus looks incompatible with any further development of wait
monitoring.


> This criticism is true of
> the progress reporting patch, but a quick scan of the thread doesn't
> show authors of the wait events patch participating there.
>
>
> > Can’t we think one more time about the implementation provided by Ildus
> > and Alexander here [3]?
>
> I don't think so. Afaics the proposed patch tried to do too many things
> at once, and its authors didn't listen well to criticism.


The original patch really did too many things at once, but it was good as a
prototype.  We should move forward by splitting it into many smaller parts,
but we didn't because of disagreement about the design.

The idea of individually timing every wait event met criticism because it
might have high overhead [1].  This is really so, at least on Windows [2].
But accessing only current values wouldn't be very useful.  We need to gather
some statistics anyway.  Gathering them by sampling would be both more
expensive and less accurate on the majority of systems.  This is why I
proposed hooks to make platform-dependent extensions possible.  Robert
rejects hooks because he is "not a big fan of hooks as a way of resolving
disagreements about the design" [3].  Besides, those are actually not design
issues but platform issues...

Another question is wait parameters.  We want to expose wait events with
some parameters.  Robert rejects that because it *might* add additional
overhead [3]. When I proposed to fit something useful into the hard-won
4 bytes, Robert claims that it is "too clever" [4].

So, the situation looks like a dead end.  I have no idea how to convince
Robert to accept any kind of advanced wait monitoring functionality in
PostgreSQL.  I'm thinking about implementing a sampling extension over the
current infrastructure just to make the community see that it sucks. Andres,
it would be very nice if you have any idea how to move this situation forward.

Another aspect is that EnterpriseDB offers waits monitoring in its
proprietary fork [5].

> SQL Session/System wait tuning diagnostics
> The Dynamic Runtime Instrumentation Tools Architecture (DRITA) allows a
> DBA to query catalog views to determine the wait events that affect the
> performance of individual sessions or the system as a whole. DRITA records
> the *number of times each event occurs as well as the time spent waiting*;
> you can use this information to diagnose performance problems.

And they claim to measure the time of individual wait events.  This is
exactly the thing which Robert resists so much.  So, I suspect some kind of
hidden motivation here.  Probably, the EDB guys just don't want to lose a
major advantage of their proprietary product over open-source PostgreSQL.

1.
http://www.postgresql.org/message-id/ca+tgmobdc4fbfpa49j6ok-ikazyvhw71vssotpvmf4tdyzr...@mail.gmail.com
2.
http://www.postgresql.org/message-id/capphfdvflevfetzgjprnrexfliyrksgkdu98xo5xx0hw5gu...@mail.gmail.com
3.
http://www.postgresql.org/message-id/ca+tgmoz-8zpoum9bgtbup1u4duqhc-9epedlzyk0dg4pkmd...@mail.gmail.com
4.
http://www.postgresql.org/message-id/ca+tgmobyf2vchghvoks7yrxa2wde+wmmrmxyughvx_w-0wx...@mail.gmail.com
5.
http://www.enterprisedb.com/products-services-training/products/postgres-plus-advanced-server?quicktabs_advanceservertab=3#quicktabs-advanceservertab

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] pl/pgSQL, get diagnostics and big data

2016-03-11 Thread Tom Lane
"Andreas 'ads' Scherbaum"  writes:
> On 09.02.2016 20:32, Christian Ullrich wrote:
>> To fix this, I think it will be enough to change the format strings to
>> use "%zu" instead of "%lu".

> Attached is a new version of the patch, with %lu replaced by %zu.

Nonono ... that just moves the portability problem to a different set of
platforms.  "%z" means size_t, and sizeof(size_t) is not any more fixed
than sizeof(long).  The right thing to use is UINT64_FORMAT.
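
(Illustrative sketch of the point: UINT64_FORMAT expands to the correct
conversion specifier for uint64 on each platform, so a call site looks like
this; the variable and value are hypothetical:)

    char   buf[64];
    uint64 processed = 12345678901234;   /* hypothetical counter */

    snprintf(buf, sizeof(buf), UINT64_FORMAT " rows processed", processed);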

+  /* Int64GetDatum() instead of UInt64GetDatum(),
+     because there is no UInt64GetDatum() */
+  Int64GetDatum(estate->eval_processed),

Although in practice you'd get the same conversion anyway, I'm thinking
it's time to fix that omission rather than just averting our eyes.  The
use of int64 data isn't decreasing as time goes on.

These are small enough changes that there's no need for a new patch
submission.  I'll take it from here, unless I find bigger issues.

regards, tom lane




Re: [HACKERS] Relation extension scalability

2016-03-11 Thread Petr Jelinek

On 11/03/16 22:29, Robert Haas wrote:

On Thu, Mar 10, 2016 at 8:54 PM, Petr Jelinek  wrote:

I am not talking about extension locks; the lock queue can be long because
there is concurrent DDL, for example, and then once the DDL finishes, suddenly
100 connections that tried to insert into the table will try to get the
extension lock, and this will add 2000 new pages when far fewer were actually
needed. I guess that's fine, as it's a corner case and it's only 16MB even in
such an extreme case.


I don't really understand this part about concurrent DDL.  If there
were concurrent DDL going on, presumably other backends would be
blocked on the relation lock, not the relation extension lock - and it
doesn't seem likely that you'd often have a huge pile-up of inserters
waiting on concurrent DDL.  But I guess it could happen.



Yeah, I was thinking about the latter part, and as I said it's a very rare 
case, but I did see something similar a couple of times in the wild. It's 
not an objection against committing this patch though; in fact I think it 
can be committed as is.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-11 Thread Peter Geoghegan
On Fri, Mar 11, 2016 at 8:19 AM, Anastasia Lubennikova
 wrote:
> I do hope that my patch will be accepted in 9.6, so this conflict looks
> really bad.

I hope so too. I'll have to look into this issue.

> I think that error is caused by changes in page layout. To save some space,
> nonkey attributes are truncated when btree copies the indexed value into
> inner pages or into the high key. You can look at the index_reform_tuple()
> calls.

That seems like the kind of problem that would be expected when things
like that change. I think it's going to be hard to add new B-Tree
features without a tool like this, which was a big reason to work on
it; to make these new projects possible to test and review. I see many
opportunities to improve the B-Tree code, just as I imagine you do.
These projects are quite strategically important, because B-Trees are
so frequently used. I think that Postgres B-Trees produce excessive
random I/O for checkpoints for various reasons, and this disadvantages
Postgres when it is compared to other major systems. As things get
better in other areas of Postgres, these hard problems become more
important to solve.

> I wonder, whether you can add some check of catalog version to your module
> or this case requires more changes?

I think that it's just going to be tied to the Postgres version. So,
if your B-Tree patches are committed first, it's on me to make sure
they're handled correctly. Or vice-versa. Not worried that that will
be a problem.

We already take special steps to avoid the "minus infinity" item on
internal pages. I think that in the future, if Postgres B-Trees get
suffix truncation for internal page items, there are new problems for
amcheck (suffix truncation removes unneeded information from internal
page items, sometimes greatly increasing B-Tree fan-out; internal page
items need only be sufficient to guide index scans correctly).

Specifically, with suffix truncation there might be "minus infinity"
*attributes*, too (these could make it safe to completely remove
attributes/columns past the first distinguishing/distinct attribute on
each item on internal pages). That's a case that amcheck then needs to
care about, just like it currently cares about the existing concept of
minus infinity items. That's how it goes for amcheck.

-- 
Peter Geoghegan




Re: [HACKERS] Performance improvement for joins where outer side is unique

2016-03-11 Thread Tom Lane
I wrote:
> I wondered why, instead of inventing an extra semantics-modifying flag,
> we couldn't just change the jointype to *be* JOIN_SEMI when we've
> discovered that the inner side is unique.

BTW, to clarify: I'm not imagining that we'd make this change in the
query jointree, as for example prepjointree.c might do.  That would appear
to add join order constraints, which we don't want.  But I'm thinking that
at the instant where we form a join Path, we could change the Path's
jointype to be JOIN_SEMI or JOIN_SEMI_OUTER if we're able to prove the
inner side unique, rather than annotating the Path with a separate flag.
Then that representation is what propagates forward.

It seems like the major intellectual complexity here is to figure out
how to detect inner-side-unique at reasonable cost.  I see that for
LEFT joins you're caching that in the SpecialJoinInfos, which is probably
fine.  But for INNER joins it looks like you're just doing it over again
for every candidate join, and that seems mighty expensive.

regards, tom lane




Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-11 Thread Pavel Stehule
2016-03-11 23:22 GMT+01:00 Joel Jacobson :

> On Sat, Mar 12, 2016 at 5:09 AM, Pavel Stehule 
> wrote:
> >> What we need is more input on proposed changes from other companies
> >> who are also heavy users of PL/pgSQL.
> >>
> >> Only then can we move forward. It's like Robert is saying, there is a
> >> risk for bikeshedding here,
> >> we must widen our perspectives and get better understanding for how
> >> other heavy users are using PL/pgSQL.
> >
> >
> > I disagree with this opinion - this is community software, not commercial.
> > We can do nothing if we don't find an agreement.
>
> I disagree with your disagreement.
>
> The users are what matters, and many of them are of course not on this
> list (since this is a list for hackers), so we need to reach out to
> the users, and those are companies/websites/nonprofits/governments.
> So discussing proposed changes on this list will take us absolutely
> nowhere, without further input from actual heavy users.
> Once we do have input from the heavy users, then and only then can we
> continue discussing things on this list, but before then it's kind of
> pointless, because we don't know what the most commonly proposed
> changes are, not you, not me. The risk of bikeshedding is just too
> big, like Robert pointed out.
>

I sent a list of requested features. Really, I have not had any request from
the companies I worked for, trained, or consulted with for less verbosity or
significant changes in the language. The people miss features from Oracle and
MSSQL.


>
> >> Pavel, do you know of any such companies?
> > Probably the biggest company with a pretty large PL/pgSQL code base was
> > Skype, but I don't have any info about its current state.
>
> True! I had almost forgotten about them after Microsoft acquired them.
> Let's hope they are still on PostgreSQL. I'll check it out, thanks.
>


Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-11 Thread Joel Jacobson
On Sat, Mar 12, 2016 at 5:09 AM, Pavel Stehule  wrote:
>> What we need is more input on proposed changes from other companies
>> who are also heavy users of PL/pgSQL.
>>
>> Only then can we move forward. It's like Robert is saying, there is a
>> risk for bikeshedding here,
>> we must widen our perspectives and get better understanding for how
>> other heavy users are using PL/pgSQL.
>
>
> I disagree with this opinion - this is community software, not commercial. We
> can do nothing if we don't find an agreement.

I disagree with your disagreement.

The users are what matters, and many of them are of course not on this
list (since this is a list for hackers), so we need to reach out to
the users, and those are companies/websites/nonprofits/governments.
So discussing proposed changes on this list will take us absolutely
nowhere, without further input from actual heavy users.
Once we do have input from the heavy users, then and only then can we
continue discussing things on this list, but before then it's kind of
pointless, because we don't know what the most commonly proposed
changes are, not you, not me. The risk of bikeshedding is just too
big, like Robert pointed out.

>> Pavel, do you know of any such companies?
> Probably the biggest company with a pretty large PL/pgSQL code base was
> Skype, but I don't have any info about its current state.

True! I had almost forgotten about them after Microsoft acquired them.
Let's hope they are still on PostgreSQL. I'll check it out, thanks.




Re: [HACKERS] Background Processes and reporting

2016-03-11 Thread Andres Freund
On 2016-03-12 01:05:43 +0300, Vladimir Borodin wrote:
> > 12 марта 2016 г., в 0:22, Andres Freund  написал(а):
> > Only that it isn't. It's stored in PGPROC.  
> 
> Sorry, I missed that. So monitoring of wait events for auxiliary processes
> could still be implemented?

It's basically a question of where to report the information.

> >> Seems that the current implementation doesn’t give reasonable ways to
> >> implement all those features, and it is really sad.
> > 
> > Why is that?
> 
> Storing information about a wait event in 4 bytes gives the ability to
> store only the wait type and event. There is no way to store duration or
> extra information (e.g. the buffer number for I/O events or buffer manager
> LWLocks). Maybe I’m missing something...

Sure, but that's just incrementally building features?


Greetings,

Andres Freund




Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-11 Thread Pavel Stehule
2016-03-11 22:48 GMT+01:00 Joel Jacobson :

> On Sat, Mar 12, 2016 at 4:41 AM, Pavel Stehule 
> wrote:
> > I'm afraid you are trying to treat your use case as a global/generic
> > issue. PL/SQL, Ada, and PL/pgSQL are verbose languages, and too many
> > shortcuts make a language dirty. On this point we have different opinions.
> >
> > I proposed an enhanced PL/pgSQL API with the possibility to create an
> > extension that can enforce your requested behavior. The implementation
> > should not be hard, and it can cover some special/individual requests well.
>
> I'm not at all interested to discuss any of the proposed changes that
> have already been proposed,
> because we have already had lengthy discussions on them, and I doubt
> neither you nor me have nothing to add.
>
> What we need is more input on proposed changes from other companies
> who are also heavy users of PL/pgSQL.
>
> Only then can we move forward. It's like Robert is saying, there is a
> risk for bikeshedding here,
> we must widen our perspectives and get better understanding for how
> other heavy users are using PL/pgSQL.
>

I disagree with this opinion - this is community software, not commercial. We
can do nothing if we don't find an agreement.


>
> Pavel, do you know of any such companies?
>

I know companies with pretty heavy PostgreSQL loads and critical
applications, but with only tens of thousands of lines of PL/pgSQL. They have
only one request - stability. I talked with Oleg and with other people - and
the common requests are

* session (global) variables
* global temp tables
* better checking of embedded SQL
* usual possibility to specify left part of assign statement

But these requests depend on development style - they are related to
classical usage of stored procedures - and people are interested in them
because they are porting from DB2 or Oracle to Postgres.

Probably the biggest company with a pretty large PL/pgSQL code base was
Skype, but I don't have any info about its current state.

Regards

Pavel


Re: [HACKERS] Background Processes and reporting

2016-03-11 Thread Vladimir Borodin

> 12 марта 2016 г., в 0:22, Andres Freund  написал(а):
> 
> On 2016-03-11 23:53:15 +0300, Vladimir Borodin wrote:
>> It was many times stated in threads about waits monitoring [0, 1, 2]
>> and supported by different people, but ultimately waits information
>> was stored in PgBackendStatus.
> 
> Only that it isn't. It's stored in PGPROC.  

Sorry, I missed that. So monitoring of wait events for auxiliary processes
could still be implemented?

> This criticism is true of
> the progress reporting patch, but a quick scan of the thread doesn't
> show authors of the wait events patch participating there.
> 
> 
>> Can’t we think one more time about the implementation provided by Ildus
>> and Alexander here [3]?
> 
> I don't think so. Afaics the proposed patch tried to do too many things
> at once, and its authors didn't listen well to criticism.  Trying to go
> back to that seems like a surefire way to have nothing in 9.6.

The idea is not to try to implement all of that at once (much less in 9.6) but 
to give the ability to implement all those features eventually. If that is 
still possible, it’s great.

> 
> 
>> Seems that the current implementation doesn’t give reasonable ways to
>> implement all those features, and it is really sad.
> 
> Why is that?

Storing information about a wait event in 4 bytes gives the ability to store 
only the wait type and event. There is no way to store duration or extra 
information (e.g. the buffer number for I/O events or buffer manager LWLocks). 
Maybe I’m missing something...
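
(A hedged sketch of the arithmetic behind this point; the layout below is
illustrative only, not the actual PostgreSQL encoding:)

    /* 8 bits of wait class plus 24 bits of event id fill all 32 bits,
     * leaving no room for parameters such as a buffer number. */
    #define WAIT_CLASS(info)     ((uint32) (info) >> 24)
    #define WAIT_EVENT(info)     ((uint32) (info) & 0x00FFFFFF)
    #define MAKE_WAIT_INFO(c, e) (((uint32) (c) << 24) | ((e) & 0x00FFFFFF))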

> 
> 
> Andres Freund


--
May the force be with you…
https://simply.name



Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-11 Thread Joel Jacobson
On Sat, Mar 12, 2016 at 4:48 AM, Joel Jacobson  wrote:
> neither you nor me have nothing to add.

Correction: neither you nor me have anything to add.




Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-11 Thread Joel Jacobson
On Sat, Mar 12, 2016 at 4:41 AM, Pavel Stehule  wrote:
> I'm afraid you are trying to treat your use case as a global/generic issue.
> PL/SQL, Ada, and PL/pgSQL are verbose languages, and too many shortcuts make
> a language dirty. On this point we have different opinions.
>
> I proposed an enhanced PL/pgSQL API with the possibility to create an
> extension that can enforce your requested behavior. The implementation
> should not be hard, and it can cover some special/individual requests well.

I'm not at all interested to discuss any of the proposed changes that
have already been proposed,
because we have already had lengthy discussions on them, and I doubt
neither you nor me have nothing to add.

What we need is more input on proposed changes from other companies
who are also heavy users of PL/pgSQL.

Only then can we move forward. It's like Robert is saying, there is a
risk for bikeshedding here,
we must widen our perspectives and get better understanding for how
other heavy users are using PL/pgSQL.

Pavel, do you know of any such companies?




Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-11 Thread Peter Geoghegan
On Fri, Mar 11, 2016 at 1:31 PM, Peter Geoghegan  wrote:
> You could have a race, where
> there was a concurrent page deletion of the left sibling of the child
> page, then a concurrent insertion into the newly expanded keyspace of
> the parent. Therefore, the downlink in the parent (which is the
> "target", to use the patch's terminology) would not be a lower bound
> on items in the page.

Excuse me: I meant the newly expanded keyspace of the *child*. (The
parent's keyspace would have covered everything. It's naturally far
larger than either child's keyspace, since it typically has several
hundred pages.)


-- 
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-11 Thread Pavel Stehule
2016-03-11 22:32 GMT+01:00 Joel Jacobson :

> On Sat, Mar 12, 2016 at 4:08 AM, Robert Haas 
> wrote:
> >
> > I don't think my experience in this area is as deep as you seem to
> > think.  I can tell you that most of the requests EnterpriseDB gets for
> > PL/pgsql enhancements involve wanting it to be more like Oracle's
> > PL/SQL, which of course has very little overlap with the stuff that
> > you're interested in.
>
> Do you know who could possibly be more experienced
> with companies who are heavy users of PL/pgSQL in the community?
>
> and/or,
>
> Do you know of any companies who officially are heavy users of PL/pgSQL?
>
> The only other company I can think of is Zalando, but of course there
> are many more,
> I just wish I knew their names, because I want to compile a wish list with
> proposed changes from as many companies who are heavy users of
> PL/pgSQL as possible.
>
> That's the only way to push this forward. As you say, we need a
> consensus and input
> from a broad range of heavy users, not just from people on this list
> with feelings
> and opinions who might not actually be heavy users themselves.
>
> Of course almost everybody on this list uses PL/pgSQL from time to
> time or even daily, but it's a completely different thing to write an
> entire backend system in the language; it's only then that you start to
> become really excited about e.g. not having to type at least 30 characters
> of text every time you do an UPDATE/INSERT just to be sure you modified
> exactly one row.
>

I'm afraid you are trying to treat your use case as a global/generic issue.
PL/SQL, Ada, and PL/pgSQL are verbose languages, and too many shortcuts make
a language dirty. On this point we have different opinions.

I proposed an enhanced PL/pgSQL API with the possibility to create an
extension that can enforce your requested behavior. The implementation
should not be hard, and it can cover some special/individual requests well.

Regards

Pavel


Re: [HACKERS] Background Processes and reporting

2016-03-11 Thread Andres Freund

Hi,

On 2016-03-11 11:16:32 -0800, Andres Freund wrote:
> It seems rather worthwhile to think about how we can expand the coverage
> of progress tracking to other types of background processes.

WRT the progress reporting patch, I think we should split off (as afaics was
discussed in the thread for a while) the new part of PgBackendStatus
into its own structure.

That'd not just allow using this from non-backend processes, but would
also have the advantage that the normal PgBackendStatus' changecount
doesn't advance quite so rapidly. E.g. when reporting progress of a
vacuum, the changecount will probably change at quite a rapid rate, but
that's uninteresting for e.g. pg_stat_activity.
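
(A hedged sketch of that split; the struct and field names here are invented
for illustration, not taken from the patch:)

    typedef struct PgBackendProgress
    {
        int     st_changecount;        /* bumped before and after each update */
        Oid     st_progress_command;   /* which command is reporting */
        int64   st_progress_param[10]; /* command-specific counters */
    } PgBackendProgress;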


> Similarly for the wait event stuff - checkpointer, wal writer,
> background writer are in many cases processes that very often are
> blocked on locks, IO and such.  Thus restricting the facility to
> database connected processes seems like a loss.

I think one way to address this would be to not only report
PgBackendStatus type processes in pg_stat_activity. While that'd
obviously be a compatibility break, I think it'd be an improvement.

Regards,

Andres




Re: [HACKERS] Proposal: BSD Authentication support

2016-03-11 Thread Thomas Munro
On Sat, Mar 12, 2016 at 5:14 AM, David Steele  wrote:
> On 1/14/16 11:22 PM, Robert Haas wrote:
>> On Tue, Jan 12, 2016 at 2:27 AM, Marisa Emerson  wrote:
>>> I've attached the latest version of this patch. I've fixed up an issue with
>>> the configuration scripts that I missed.
>> Looks reasonable on a quick read-through.  Can anyone with access to a
>> BSD system review and test?
>
> Is anyone with access to/experience with BSD able to review and test
> this patch?  Seems like it would make a great addition to 9.6.

It looks like this needs review from an OpenBSD user specifically.
FreeBSD and NetBSD use PAM instead of BSD auth.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-11 Thread Pavel Stehule
2016-03-11 22:08 GMT+01:00 Robert Haas :

> On Fri, Mar 11, 2016 at 3:44 PM, Joel Jacobson  wrote:
> > On Fri, Mar 11, 2016 at 11:14 AM, Robert Haas 
> wrote:
> >> I'm not direly opposed to most of what's on that page,
> >> but I'm not excited about most of it, either.
> >
> > May I ask, what improvements of PL/pgSQL would you personally be most
> > excited about,
> > if you or someone else would have unlimited resources to hack on it?
> >
> >> I bet if we canvassed 10 different companies that made heavy use of
> PL/pgsql they'd all have
> >> a list of proposed changes like that, and I bet some of them would
> >> conflict with each other, and I bet if we did all that stuff the
> >> average PL/pgsql user's life would not be much better, but the manual
> >> would be much longer.
> >
> > You as a professional PostgreSQL consultant obviously have a lot of more
> > contact than me with other companies who make heavy use of PL/pgSQL.
> >
> > I'm assuming your bet on these proposed changes in conflict you talk
> about
> > are based on things you've picked up IRL from companies you've been
> > working with.
> >
> > What would you say are the top most commonly proposed changes
> > from companies that make heavy use of PL/pgSQL, and which of those are
> > in conflict?
>
> I don't think my experience in this area is as deep as you seem to
> think.  I can tell you that most of the requests EnterpriseDB gets for
> PL/pgsql enhancements involve wanting it to be more like Oracle's
> PL/SQL, which of course has very little overlap with the stuff that
> you're interested in.  But I'm not really commenting here based on
> that.  I'm just giving you my impression based on the discussion I've
> seen on the mailing list and my own personal feelings.  If there is an
> outcry for STRICT as you have proposed it, I'm not especially opposed
> to that.  I just think it needs a consensus that I haven't seen
> emerge.
>

This proposal is not bad from my perspective - but in detail these points
break compatibility, can have negative performance impacts, or are ugly. I
understand all the points, but I am not sure if the benefits outweigh the
costs (compatibility issues).

Can we talk about these points in a separate thread? I can imagine a few of
the points as extra checks. This is probably a theme for 9.6, so we can
discuss it in the 9.6 release cycle.

Regards

Pavel

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-11 Thread Joel Jacobson
On Sat, Mar 12, 2016 at 4:08 AM, Robert Haas  wrote:
>
> I don't think my experience in this area is as deep as you seem to
> think.  I can tell you that most of the requests EnterpriseDB gets for
> PL/pgsql enhancements involve wanting it to be more like Oracle's
> PL/SQL, which of course has very little overlap with the stuff that
> you're interested in.

Do you know who could possibly be more experienced
with companies who are heavy users of PL/pgSQL in the community?

and/or,

Do you know of any companies who officially are heavy users of PL/pgSQL?

The only other company I can think of is Zalando, but of course there
are many more,
I just wish I knew their names, because I want to compile a wish list with
proposed changes from as many companies who are heavy users of
PL/pgSQL as possible.

That's the only way to push this forward. As you say, we need a
consensus and input
from a broad range of heavy users, not just from people on this list
with feelings
and opinions who might not actually be heavy users themselves.

Of course almost everybody on this list uses PL/pgSQL from time to
time or even daily, but it's a completely different thing to write an
entire backend system in the language; it's only then that you start to
become really excited about e.g. not having to type at least 30 characters
of text every time you do an UPDATE/INSERT just to be sure you modified
exactly one row.
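
(A hedged sketch of the boilerplate being described; the table is
hypothetical:)

    DO $$
    DECLARE
        n bigint;
    BEGIN
        UPDATE accounts SET balance = balance - 100 WHERE id = 42;
        GET DIAGNOSTICS n = ROW_COUNT;
        IF n <> 1 THEN
            RAISE EXCEPTION 'expected exactly 1 row, got %', n;
        END IF;
    END
    $$;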




Re: [HACKERS] Performance improvement for joins where outer side is unique

2016-03-11 Thread Tom Lane
So I started re-reading this thread in preparation for looking at the
patch, and this bit in your initial message jumped out at me:

> In all of our join algorithms in the executor, if the join type is SEMI,
> we skip to the next outer row once we find a matching inner row. This is
> because we don't want to allow duplicate rows in the inner side to
> duplicate outer rows in the result set. Obviously this is required per SQL
> spec. I believe we can also skip to the next outer row in this case when
> we've managed to prove that no other row can possibly exist that matches
> the current outer row, due to a unique index or group by/distinct clause
> (for subqueries).
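
(A hedged SQL illustration of that case: with a primary key on the inner
side, each outer row can match at most one inner row, so scanning for further
matches is provably useless; the tables are made up.)

    CREATE TABLE orders    (customer_id int);
    CREATE TABLE customers (customer_id int PRIMARY KEY);

    SELECT *
      FROM orders o
      JOIN customers c ON c.customer_id = o.customer_id;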

I wondered why, instead of inventing an extra semantics-modifying flag,
we couldn't just change the jointype to *be* JOIN_SEMI when we've
discovered that the inner side is unique.

Now of course this only works if the join type was INNER to start with.
If it was a LEFT join, you'd need an "outer semi join" jointype which
we haven't got at the moment.  But I wonder whether inventing that
jointype wouldn't let us arrive at a less messy handling of things in
the executor and EXPLAIN.  I'm not very enamored of plastering this
"match_first_tuple_only" flag on every join, in part because it doesn't
appear to have sensible semantics for other jointypes such as JOIN_RIGHT.
And I'd really be happier to see the information reflected by join type
than a new line in EXPLAIN, also.

regards, tom lane




Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-11 Thread Peter Geoghegan
On Thu, Mar 10, 2016 at 9:18 AM, Tomas Vondra
 wrote:
> I've looked at this patch today, mostly to educate myself, so this
> probably should not count as a full review. Anyway, the patch seems in
> excellent shape - it'd be great if all patches (including those written
> by me) had this level of comments/docs.

Thanks. I try. This patch is something I've been working on slowly and
steadily for over 2 years. It was time to see it through.

>> Note that no function currently checks that the index is consistent
>> with the heap, which would be very useful (that's probably how I'd
>> eventually target the heapam, actually).
>
> I'm afraid I don't understand what "target the heapam" means. Could you
> elaborate?

Just that that's how I'd make amcheck verify that the heap was sane,
if I was going to undertake that project. Or, I'd start there.

> I agree. It'd be nice to have a tool that also checks for data
> corruption at a lower level (e.g. that varlena headers are not
> corrupted etc.) but that seems like a task for some other tool.

I'm not sure that that's a task for another tool. I think that this
tool is scoped at detecting corruption, and that could work well for
the heap, too. There might be fewer interesting things to test about
the heap when indexes aren't involved. Following HOT chains through an
index, and verifying things like that about the heap as we go could
detect a lot of problematic cases.

> Can we come up with names that more clearly identify the difference
> between those two functions? I mean, _parent_ does not make it
> particularly obvious that the second function acquires exclusive lock
> and performs more thorough checks.

Dunno about that. Its defining characteristic is that it checks child
pages against their parent, IMV. Things are not often defined in terms
of their locking requirements.

> This also applies to the name of the contrib module - it's not
> particularly obvious what "amcheck" means unless you're familiar with the
> concept of access methods. Which is quite unlikely for regular users.
> Maybe something like "check_index" would be more illustrative?

I think that given the heap may be targeted in the future, amcheck is
more future-proof. I think Robert might have liked that name a year
ago or something. In general, I'm not too worried about what the
contrib module is called in the end.

> Well, I wouldn't count myself as an expert here, but do we actually need
> such protection? I mean, we only do such checks when holding an
> exclusive lock on the parent relation, no? And even if we don't vacuum
> can only remove entries from the pages - why should that cause
> violations of any invariants?

I think that a more worked out explanation for why the ExclusiveLock
is needed is appropriate. I meant to do that.

Basically, a heavier lock is needed because of page deletion by
VACUUM, which is the major source of complexity (much more than page
splits, I'd say). In general, the key space can be consolidated by
VACUUM in a way that breaks child page checking because the downlink
we followed from our target page is no longer the current downlink.
Page deletion deletes the right sibling's downlink, and the deleted
page's downlink is used for its sibling. You could have a race, where
there was a concurrent page deletion of the left sibling of the child
page, then a concurrent insertion into the newly expanded keyspace of
the parent. Therefore, the downlink in the parent (which is the
"target", to use the patch's terminology) would not be a lower bound
on items in the page.

That's a very low probability race, because it involves deleting a
page representing a range in the keyspace that has no live items, but
then immediately getting one just in time for our check. But, I'm
pretty determined that false positives like that need to be impossible
(or else it's a bug).

I have a paranoid feeling that there is a similar very low probability
race with the left-right keyspace checks, which don't have relation
ExclusiveLock protection (IOW, I think that that might be buggy). I
need to think about that some more, but my current thinking is that it
would hardly matter if we used the high key from the right page rather than
the first data item, which definitely would be correct. And it would
be simpler.

> A few minor comments:

Thanks for catching those issues. Will fix.

-- 
Peter Geoghegan




Re: [HACKERS] Relation extension scalability

2016-03-11 Thread Robert Haas
On Thu, Mar 10, 2016 at 8:54 PM, Petr Jelinek  wrote:
> I am not talking about extension locks; the lock queue can be long because
> there is concurrent DDL, for example, and then once the DDL finishes, suddenly
> 100 connections that tried to insert into the table will try to get the
> extension lock, and this will add 2000 new pages when far fewer were actually
> needed. I guess that's fine, as it's a corner case and it's only 16MB even in
> such an extreme case.

I don't really understand this part about concurrent DDL.  If there
were concurrent DDL going on, presumably other backends would be
blocked on the relation lock, not the relation extension lock - and it
doesn't seem likely that you'd often have a huge pile-up of inserters
waiting on concurrent DDL.  But I guess it could happen.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Background Processes and reporting

2016-03-11 Thread Andres Freund
On 2016-03-11 23:53:15 +0300, Vladimir Borodin wrote:
> It was many times stated in threads about waits monitoring [0, 1, 2]
> and supported by different people, but ultimately waits information
> was stored in PgBackendStatus.

Only that it isn't. It's stored in PGPROC.  This criticism is true of
the progress reporting patch, but a quick scan of the thread doesn't
show authors of the wait events patch participating there.


> Can’t we think one more time about the implementation provided by Ildus
> and Alexander here [3]?

I don't think so. Afaics the proposed patch tried to do too many things
at once, and its authors didn't listen well to criticism.  Trying to go
back to that seems like a surefire way to have nothing in 9.6.


> Seems that the current implementation doesn’t give reasonable ways to
> implement all those features, and it is really sad.

Why is that?


Andres Freund




Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-11 Thread Peter Geoghegan
On Fri, Mar 11, 2016 at 1:09 PM, Peter Geoghegan  wrote:
> Or, you could add code like this to comparetup_index_btree(), to
> simulate a broken opclass:
>
> diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
> index 67d86ed..23712ff 100644
> --- a/src/backend/utils/sort/tuplesort.c
> +++ b/src/backend/utils/sort/tuplesort.c
> @@ -3562,6 +3562,9 @@ comparetup_index_btree(const SortTuple *a, const SortTuple *b,
>      compare = ApplySortComparator(a->datum1, a->isnull1,
>                                    b->datum1, b->isnull1,
>                                    sortKey);
> +
> +    if (random() <= (MAX_RANDOM_VALUE / 1000))
> +        compare = -compare;
>      if (compare != 0)
>          return compare;


Note that this patch that I sketched would make CREATE INDEX produce
corrupt indexes, but the tool's verification itself would not be
affected. Although even if it was (even if _bt_compare() gave the same
wrong answers), it would still very likely detect corruption.

-- 
Peter Geoghegan




Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-11 Thread Peter Geoghegan
On Fri, Mar 11, 2016 at 8:24 AM, Michael Paquier
 wrote:
> You can use for example dd in non-truncate mode to corrupt on-disk
> page data, say that for example:
> dd if=/dev/random bs=8192 count=1 \
> seek=$BLOCK_ID of=base/$DBOID/$RELFILENODE \
> conv=notrunc

Sure, but that would probably fail at the first hurdle -- the page
header would be corrupt. Which is a valid test, but not all that
interesting.

One testing workflow I tried is overwriting some page in a B-Tree
relfilenode with some other page in the same file:

$:~/pgdata/base/12413$ dd if=somefile of=somefile conv=notrunc bs=8192
count=1 skip=2 seek=3

That should fail due to the key space not being in order across pages,
which is slightly interesting. Or, you could selectively change one
item with a hex editor, as Anastasia did.

Or, you could add code like this to comparetup_index_btree(), to
simulate a broken opclass:

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 67d86ed..23712ff 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -3562,6 +3562,9 @@ comparetup_index_btree(const SortTuple *a, const SortTuple *b,
     compare = ApplySortComparator(a->datum1, a->isnull1,
                                   b->datum1, b->isnull1,
                                   sortKey);
+
+    if (random() <= (MAX_RANDOM_VALUE / 1000))
+        compare = -compare;
     if (compare != 0)
         return compare;

There are many options when you want to produce a corrupt B-Tree index!

-- 
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-11 Thread Robert Haas
On Fri, Mar 11, 2016 at 3:44 PM, Joel Jacobson  wrote:
> On Fri, Mar 11, 2016 at 11:14 AM, Robert Haas  wrote:
>> I'm not direly opposed to most of what's on that page,
>> but I'm not excited about most of it, either.
>
> May I ask, what improvements of PL/pgSQL would you personally be most
> excited about,
> if you or someone else would have unlimited resources to hack on it?
>
>> I bet if we canvassed 10 different companies that made heavy use of PL/pgsql 
>> they'd all have
>> a list of proposed changes like that, and I bet some of them would
>> conflict with each other, and I bet if we did all that stuff the
>> average PL/pgsql user's life would not be much better, but the manual
>> would be much longer.
>
> You as a professional PostgreSQL consultant obviously have a lot of more
> contact than me with other companies who make heavy use of PL/pgSQL.
>
> I'm assuming your bet on these proposed changes in conflict you talk about
> are based on things you've picked up IRL from companies you've been
> working with.
>
> What would you say are the top most commonly proposed changes
> from companies that make heavy use of PL/pgSQL, and which of those are
> in conflict?

I don't think my experience in this area is as deep as you seem to
think.  I can tell you that most of the requests EnterpriseDB gets for
PL/pgsql enhancements involve wanting it to be more like Oracle's
PL/SQL, which of course has very little overlap with the stuff that
you're interested in.  But I'm not really commenting here based on
that.  I'm just giving you my impression based on the discussion I've
seen on the mailing list and my own personal feelings.  If there is an
outcry for STRICT as you have proposed it, I'm not especially opposed
to that.  I just think it needs a consensus that I haven't seen
emerge.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Background Processes and reporting

2016-03-11 Thread Vladimir Borodin

> 11 марта 2016 г., в 22:16, Andres Freund  написал(а):
> 
> Hi,
> 
> We now have "Provide much better wait information in pg_stat_activity"
> and "Add a generic command progress reporting facility" making it easier
> to provide insight into the system.
> 
> 
> While working on the writeback control / checkpoint sorting patch I'd
> the following statement in BufferSync()'s main loop:
> 
>    fprintf(stderr, "\33[2K\rto_scan: %d, scanned: %d, %%processed: %.2f, %%writeouts: %.2f",
>            num_to_scan, num_processed,
>            (((double) num_processed) / num_to_scan) * 100,
>            ((double) num_written / num_processed) * 100);
> 
> which basically printed the progress of a checkpoint, and some
> additional detail to stderr. Quite helpful to see whether progress is
> "unsteady".
> 
> Obviously that's not something that could be committed.
> 
> So I'm wondering how we can make it possible to use the aforementioned
> "progress reporting facility" to monitor checkpoint progress. To which
> Robert replied on IM:
> "it wouldn't quite help with that because the checkpointer doesn't show
> up as a regular backend"
> 
> 
> It seems rather worthwhile to think about how we can expand the coverage
> of progress tracking to other types of background processes.
> 
> 
> Similarly for the wait event stuff - checkpointer, wal writer,
> background writer are in many cases processes that very often are
> blocked on locks, IO and such.  Thus restricting the facility to
> database connected processes seems like a loss.

It was many times stated in threads about waits monitoring [0, 1, 2] and 
supported by different people, but ultimately waits information was stored in 
PgBackendStatus. Can’t we think one more time about the implementation provided 
by Ildus and Alexander here [3]? That implementation included 1. waits 
monitoring for all processes, 2. the ability to trace waits of a particular 
process to a file, 3. a wait events history with sampling every N ms, and 4. 
configurable measurement of wait timings. It has many more features, has been 
used in production on 100+ databases (patched 9.4), and gives wide 
opportunities for further development. Of course, huge work would be needed to 
rebase it onto current master and clean it up, but IMHO it is a much better 
approach. Seems that the current implementation doesn’t give reasonable ways to 
implement all those features, and it is really sad.

[0] http://www.postgresql.org/message-id/55a3acc3.6020...@postgrespro.ru
[1] http://www.postgresql.org/message-id/55c3306b.5010...@postgrespro.ru
[2] 
http://www.postgresql.org/message-id/9a99c2a7-1760-419f-bdc9-a2cf99ecd...@simply.name
[3] http://www.postgresql.org/message-id/559d4729.9080...@postgrespro.ru

> 
> 
> Greetings,
> 
> Andres Freund
> 
> 


--
May the force be with you…
https://simply.name



Re: [HACKERS] [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.

2016-03-11 Thread Joel Jacobson
On Fri, Mar 11, 2016 at 11:14 AM, Robert Haas  wrote:
> I'm not direly opposed to most of what's on that page,
> but I'm not excited about most of it, either.

May I ask, what improvements of PL/pgSQL would you personally be most
excited about,
if you or someone else would have unlimited resources to hack on it?

> I bet if we canvassed 10 different companies that made heavy use of PL/pgsql 
> they'd all have
> a list of proposed changes like that, and I bet some of them would
> conflict with each other, and I bet if we did all that stuff the
> average PL/pgsql user's life would not be much better, but the manual
> would be much longer.

You as a professional PostgreSQL consultant obviously have a lot of more
contact than me with other companies who make heavy use of PL/pgSQL.

I'm assuming your bet on these proposed changes in conflict you talk about
are based on things you've picked up IRL from companies you've been
working with.

What would you say are the top most commonly proposed changes
from companies that make heavy use of PL/pgSQL, and which of those are
in conflict?




Re: [HACKERS] eXtensible Transaction Manager API (v2)

2016-03-11 Thread Tom Lane
Robert Haas  writes:
> On Fri, Mar 11, 2016 at 1:11 PM, David Steele  wrote:
>> Is anyone willing to volunteer a review or make an argument for the
>> importance of this patch?

> There's been a lot of discussion on another thread about this patch.
> The subject is "The plan for FDW-based sharding", but the thread kind
> of got partially hijacked by this issue.  The net-net of that is that
> I don't think we have a clear enough idea about where we're going with
> global transaction management to make it a good idea to adopt an API
> like this.  For example, if we later decide we want to put the
> functionality in core, will we keep the hooks around for the sake of
> alternative non-core implementations?  I just don't believe this
> technology is nearly mature enough to commit to at this point.
> Konstantin does not agree with my assessment, perhaps unsurprisingly.

I re-read the original thread,

http://www.postgresql.org/message-id/flat/f2766b97-555d-424f-b29f-e0ca0f6d1...@postgrespro.ru

I think there is no question that this is an entirely immature patch.
Not coping with subtransactions is alone sufficient to make it not
credible for production.

Even if the extension API were complete and clearly stable, I have doubts
that there's any great value in integrating it into 9.6, rather than some
later release series.  The above thread makes it clear that pg_dtm is very
much WIP and has easily a year's worth of work before anybody would think
of wanting to deploy it.  So end users don't need this patch in 9.6, and
developers working on pg_dtm shouldn't really have much of a problem
applying the patch locally --- how likely is it that they'd be using a
perfectly stock build of the database apart from this patch?

But my real takeaway from that thread is that there's no great reason
to believe that this API definition *is* stable.  The single existing
use-case is very far from completion, and it's hardly unlikely that
what it needs will change.


I also took a very quick look at the patch itself:

1. No documentation.  For something that purports to be an API
specification, really the documentation should have been written *first*.

2. As noted in the cited thread, it's not clear that
Get/SetTransactionStatus are a useful cutpoint; they don't provide any
real atomicity guarantees.

3. Uh, how can you hook GetNewTransactionId but not ReadNewTransactionId?

4. There seems to be an intention to encapsulate snapshots, but surely
wrapping hooks around GetSnapshotData and XidInMVCCSnapshot is not nearly
enough for that.  Look at all the knowledge snapmgr.c has about snapshot
representation, for example.  And is a function like GetOldestXmin even
meaningful with a different notion of what snapshots are?  (For that
matter, is TransactionId == uint32 still tenable for any other notion
of snapshots?)

5. BTW, why would you hook at XidInMVCCSnapshot rather than making use of
the existing capability to have a custom SnapshotSatisfiesFunc snapshot
checker function?


IMO this is not committable as-is, and I don't think that it's something
that will become committable during this 'fest.  I think we'd be well
advised to boot it to the 2016-09 CF and focus our efforts on other stuff
that has a better chance of getting finished this month.

regards, tom lane




Re: [HACKERS] [PATH] Correct negative/zero year in to_date/to_timestamp

2016-03-11 Thread Vitaly Burovoy
On 3/11/16, Tom Lane  wrote:
> [ catches up with thread... ]
>
> Yes.  It would be more reasonable IMO for to_date to throw an error
> because this is bad input.  On the other hand, to_date mostly doesn't
> throw an error no matter how bad the input is.  I think that may have
> been intentional, although its habit of producing garbage output
> instead (and not, say, NULL) is certainly not very happy-making.
>
> It's a bit schizophrenic for this patch to be both adding ereport's
> for year zero (thereby breaking the no-failure-on-bad-input policy)
> *and* trying to produce sane output for arguably-insane input.
>
> I don't really see an argument why '0001-00-00' should be accepted
> but '0000-01-01' should throw an error, but that would be the result
> if we take this patch.

Well, in the case of a zero year it could return the first year instead of
raising an exception, the same way "MM" and "DD" do it...

> And I quite agree with Robert that it's insane
> to consider '-2-06-01' as satisfying the format 'YYYY-MM-DD'.  The
> fact that it even appears to do something related to a BC year is
> an implementation artifact, and not a very nice one.
>
> I would be in favor of a ground-up rewrite of to_date and friends, with
> some better-stated principles (in particular, a rationale why they even
> exist when date_in and friends usually do it better)

I think they exist because date_in can't convert something like
"IYYY-IDDD" (I wonder whether date_in could do so), or parse
dates/timestamps independently of the "DateStyle" parameter.
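
For example (output from memory, so treat it as approximate):

postgres=# SELECT to_date('2016-001', 'IYYY-IDDD');
  to_date
------------
 2016-01-04
(1 row)

i.e. the first day of ISO year 2016, which date_in has no way to spell.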

> and crisper error
> detection.  But I'm not seeing the argument that hacking at the margins
> like this moves us forward on either point; what it does do is create
> another backward-compatibility hazard for any such rewrite.
>
> In short, I vote with Robert to reject this patch.

Accepted. Let's agree it is a case of "garbage in, garbage out" and "an
implementation artifact".

> BTW, the context for the original report wasn't clear,

The context was to make "extract" and "to_date"/"to_timestamp" be
consistently reversible for "year"/"YYYY" for negative values (since
both of them support them).

> but I wonder how
> much of the actual problem could be addressed by teaching make_date()
> and friends to accept negative year values as meaning BC.
>
>   regards, tom lane

Thanks to Thomas, Robert and Tom very much for an interesting (but short)
discussion.
-- 
Best regards,
Vitaly Burovoy




[HACKERS] Saving SRF context

2016-03-11 Thread Salvador Fandiño

Hi,

I have implemented a SRF[*] that returns rows one by one (using
ExprMultipleResult). But now I need to save some context information
between calls pertaining to the same result set, and I am unable to find
a proper place for it.


I have seen that fcinfo->flinfo->fn_extra is available, but if I
understand the mechanics of FmgrInfo correctly, that structure is not
unique per call frame. It could be shared between several fcinfo
structures and would be clobbered on recursion (I plan to use SPI from my
function).


So, is that right? Should I build a stack there? Is there any other
place to store context information relative to a SRF call frame?
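
For reference, the closest thing I have found is the stock FuncCallContext
pattern from funcapi.h, which keeps per-result-set state in user_fctx
(though the wrapper itself also hangs off fn_extra, hence my doubt). A
minimal, self-contained sketch with a hypothetical function, not my
plswipl code:

#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"

PG_FUNCTION_INFO_V1(three_ints);

/* Returns the integers 1..3, one row per call. */
Datum
three_ints(PG_FUNCTION_ARGS)
{
	FuncCallContext *funcctx;
	int		   *counter;

	if (SRF_IS_FIRSTCALL())
	{
		funcctx = SRF_FIRSTCALL_INIT();
		/* per-result-set state must live in multi_call_memory_ctx */
		funcctx->user_fctx =
			MemoryContextAllocZero(funcctx->multi_call_memory_ctx,
								   sizeof(int));
	}

	funcctx = SRF_PERCALL_SETUP();
	counter = (int *) funcctx->user_fctx;

	if (*counter < 3)
	{
		(*counter)++;
		SRF_RETURN_NEXT(funcctx, Int32GetDatum(*counter));
	}

	SRF_RETURN_DONE(funcctx);
}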


Regards


* Code is here: https://github.com/salva/plswipl, I am trying to provide 
a procedural language using SWI-Prolog.





Re: [HACKERS] Change error code for hstore syntax error

2016-03-11 Thread Tom Lane
Sherrylyn Branchaw  writes:
> The hstore module uses elog() to default to ERRCODE_INTERNAL_ERROR
> (SQLSTATE XX000) when the error message reads "Syntax error near '%c' at
> position %d".

Yeah, that is entirely bogus.  No user-facing error report should ever
return ERRCODE_INTERNAL_ERROR.

> I propose to switch to ereport() to return ERRCODE_SYNTAX_ERROR (SQLSTATE
> 42601), on the grounds that it's more transparent.

Hm, class 42 is generally meant for SQL-level syntax errors.  Are these
errors not coming from subroutines of hstore_in()?  I think our usual
convention is to use ERRCODE_INVALID_TEXT_REPRESENTATION for complaints
that a data value does not meet its type's conventions.  In any case
it seems closer to class 22 than 42.

While you're at it, please make the error message texts conform to
our style guidelines:
http://www.postgresql.org/docs/devel/static/error-style-guide.html

These seem to have multiple problems, starting with incorrect
capitalization and extending to failure to quote the whole string
being complained of.
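
For illustration only, not a prescription of the final wording, a report
per those guidelines with the class-22 code might look about like this:

ereport(ERROR,
        (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
         errmsg("syntax error in hstore, near \"%c\" at position %d",
                *(state->ptr), (int32) (state->ptr - state->begin))));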

In short, though, +1 for improving this stuff.

regards, tom lane




[HACKERS] Change error code for hstore syntax error

2016-03-11 Thread Sherrylyn Branchaw
The hstore module uses elog() to default to ERRCODE_INTERNAL_ERROR
(SQLSTATE XX000) when the error message reads "Syntax error near '%c' at
position %d".

I propose to switch to ereport() to return ERRCODE_SYNTAX_ERROR (SQLSTATE
42601), on the grounds that it's more transparent. It took me longer to
figure out what error code was being returned than to write both the patch
and my originally intended function using the patch.

I also propose to raise the same error code for "Unexpected end of string"
in hstore, since it serves the same purpose, at least in my mind:
differentiating between valid and invalid hstore syntax.

Any objections to my submitting the attached patch to the September
commitfest?
diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c
index 0c1d99a..0cbc865 100644
--- a/contrib/hstore/hstore_io.c
+++ b/contrib/hstore/hstore_io.c
@@ -81,7 +81,10 @@ get_val(HSParser *state, bool ignoreeq, bool *escaped)
 		}
 		else if (*(state->ptr) == '=' && !ignoreeq)
 		{
-			elog(ERROR, "Syntax error near '%c' at position %d", *(state->ptr), (int32) (state->ptr - state->begin));
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 (errmsg("Syntax error near '%c' at position %d",
+							 *(state->ptr), (int32) (state->ptr - state->begin)))));
 		}
 		else if (*(state->ptr) == '\\')
 		{
@@ -138,7 +141,9 @@ get_val(HSParser *state, bool ignoreeq, bool *escaped)
 		}
 		else if (*(state->ptr) == '\0')
 		{
-			elog(ERROR, "Unexpected end of string");
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 (errmsg("Unexpected end of string"))));
 		}
 		else
 		{
@@ -150,7 +155,9 @@ get_val(HSParser *state, bool ignoreeq, bool *escaped)
 	else if (st == GV_WAITESCIN)
 	{
 		if (*(state->ptr) == '\0')
-			elog(ERROR, "Unexpected end of string");
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 (errmsg("Unexpected end of string"))));
 		RESIZEPRSBUF;
 		*(state->cur) = *(state->ptr);
 		state->cur++;
@@ -159,7 +166,9 @@ get_val(HSParser *state, bool ignoreeq, bool *escaped)
 	else if (st == GV_WAITESCESCIN)
 	{
 		if (*(state->ptr) == '\0')
-			elog(ERROR, "Unexpected end of string");
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 (errmsg("Unexpected end of string"))));
 		RESIZEPRSBUF;
 		*(state->cur) = *(state->ptr);
 		state->cur++;
@@ -216,11 +225,16 @@ parse_hstore(HSParser *state)
 			}
 			else if (*(state->ptr) == '\0')
 			{
-				elog(ERROR, "Unexpected end of string");
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 (errmsg("Unexpected end of string"))));
 			}
 			else if (!isspace((unsigned char) *(state->ptr)))
 			{
-				elog(ERROR, "Syntax error near '%c' at position %d", *(state->ptr), (int32) (state->ptr - state->begin));
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 (errmsg("Syntax error near '%c' at position %d",
+								 *(state->ptr), (int32) (state->ptr - state->begin)))));
 			}
 		}
 		else if (st == WGT)
@@ -231,17 +245,24 @@ parse_hstore(HSParser *state)
 			}
 			else if (*(state->ptr) == '\0')
 			{
-				elog(ERROR, "Unexpected end of string");
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 (errmsg("Unexpected end of string"))));

Re: [HACKERS] [PATH] Correct negative/zero year in to_date/to_timestamp

2016-03-11 Thread Tom Lane
Robert Haas  writes:
> On Sun, Feb 28, 2016 at 9:38 PM, Vitaly Burovoy
>  wrote:
>>> However, I'm not sure we ought to tinker with the behavior in this
>>> area.  If YYYY-MM-DD is going to accept things that are not of the
>>> format YYYY-MM-DD, and I'd argue that -1-06-01 is not in that format,

>> It is not about format, it is about values.

> I disagree.  In a format like "-1-06-01", you want the first minus to
> indicate negation and the other two to be a separator.  That's not
> very far away from wanting the database to read your mind.

[ catches up with thread... ]

Yes.  It would be more reasonable IMO for to_date to throw an error
because this is bad input.  On the other hand, to_date mostly doesn't
throw an error no matter how bad the input is.  I think that may have
been intentional, although its habit of producing garbage output
instead (and not, say, NULL) is certainly not very happy-making.

It's a bit schizophrenic for this patch to be both adding ereport's
for year zero (thereby breaking the no-failure-on-bad-input policy)
*and* trying to produce sane output for arguably-insane input.

I don't really see an argument why '0001-00-00' should be accepted
but '0000-01-01' should throw an error, but that would be the result
if we take this patch.  And I quite agree with Robert that it's insane
to consider '-2-06-01' as satisfying the format 'YYYY-MM-DD'.  The
fact that it even appears to do something related to a BC year is
an implementation artifact, and not a very nice one.

I would be in favor of a ground-up rewrite of to_date and friends, with
some better-stated principles (in particular, a rationale why they even
exist when date_in and friends usually do it better) and crisper error
detection.  But I'm not seeing the argument that hacking at the margins
like this moves us forward on either point; what it does do is create
another backward-compatibility hazard for any such rewrite.

In short, I vote with Robert to reject this patch.

BTW, the context for the original report wasn't clear, but I wonder how
much of the actual problem could be addressed by teaching make_date()
and friends to accept negative year values as meaning BC.
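
Hypothetically, something like:

SELECT make_date(-44, 3, 15);   -- would yield '0044-03-15 BC'

though to be clear, nothing does that today.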

regards, tom lane




[HACKERS] Background Processes and reporting

2016-03-11 Thread Andres Freund
Hi,

We now have "Provide much better wait information in pg_stat_activity"
and "Add a generic command progress reporting facility" making it easier
to provide insight into the system.


While working on the writeback control / checkpoint sorting patch I had
the following statement in BufferSync()'s main loop:

fprintf(stderr, "\33[2K\rto_scan: %d, scanned: %d, %%processed: %.2f, %%writeouts: %.2f",
        num_to_scan, num_processed,
        (((double) num_processed) / num_to_scan) * 100,
        ((double) num_written / num_processed) * 100);

which basically printed the progress of a checkpoint, and some
additional detail to stderr. Quite helpful to see whether progress is
"unsteady".

Obviously that's not something that could be committed.

So I'm wondering how we can make it possible to use the aforementioned
"progress reporting facility" to monitor checkpoint progress. To which
Robert replied on IM:
"it wouldn't quite help with that because the checkpointer doesn't show
up as a regular backend"


It seems rather worthwhile to think about how we can expand the coverage
of progress tracking to other types of background processes.


Similarly for the wait event stuff - checkpointer, wal writer,
background writer are in many cases processes that very often are
blocked on locks, IO and such.  Thus restricting the facility to
database connected processes seems like a loss.
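
Just to make it concrete: with the committed facility, backend code
publishes counters via pgstat_progress_update_param(), so hypothetically
(the parameter slots here are made up) BufferSync() could push the same
numbers the fprintf above prints:

pgstat_progress_update_param(0, num_to_scan);
pgstat_progress_update_param(1, num_processed);
pgstat_progress_update_param(2, num_written);

But as said, that doesn't work today because the checkpointer isn't a
regular backend.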


Greetings,

Andres Freund




Re: [HACKERS] insufficient qualification of some objects in dump files

2016-03-11 Thread David Steele
Hi Peter,

On 2/26/16 1:30 AM, Tom Lane wrote:
> Michael Paquier  writes:
>> On Fri, Feb 26, 2016 at 9:47 AM, Peter Eisentraut  wrote:
>>> Tom thought this might require an archive version dump, but I'm not
>>> sure.  The tags are more of an informational string for human
>>> consumption, not strictly part of the archive format.
> 
>> Hm, the TOC entry, with its tag changed, is part of the dump, and this
>> is written in the archive, but the shape of TocEntry does not change
>> so this is really debatable.
> 
> I had in mind that we would add a separate field for tag's schema name to
> TocEntry, which surely would require an archive format number bump.
> As the patch is presented, I agree with Peter that it does not really
> need a format number bump.  The question that has to be answered is
> whether this solution is good enough?  You could not trust it for
> automated processing of tags --- it's easy to think of cases in which the
> schema/object name separation would be ambiguous.  So is the tag really
> "strictly for human consumption"?  I'm not sure about that.

It looks like there is still some discussion to be had here about
whether a "human readable" solution is enough.

Until that's resolved I've marked this patch "Waiting on Author".

-- 
-David
da...@pgmasters.net





Re: [HACKERS] eXtensible Transaction Manager API (v2)

2016-03-11 Thread David Steele
On 3/11/16 2:00 PM, Oleg Bartunov wrote:
> On Fri, Mar 11, 2016 at 7:11 PM, David Steele wrote:
>> I'm concerned about the lack of response or reviewers for this patch.
>> It may be because everyone believes they had their say on the original
>> thread, or because it seems like a big change to go into the last CF, or
>> for other reasons altogether.
> 
> 
> We'll prepare an easy setup to play with our solutions, so any developer
> can see how it works.  We hope to post something about it this weekend.

OK, then for now I'm marking this "waiting for author."  You can switch
it back to "needs review" once you have posted additional material.

-- 
-David
da...@pgmasters.net





Re: [HACKERS] eXtensible Transaction Manager API (v2)

2016-03-11 Thread Oleg Bartunov
On Fri, Mar 11, 2016 at 7:11 PM, David Steele  wrote:

> On 2/10/16 12:50 PM, Konstantin Knizhnik wrote:
>
> > The Postgres Professional cluster team wants to propose a new version of
> > the eXtensible Transaction Manager API.
> > Previous discussion concerning this patch can be found here:
> >
> >
> http://www.postgresql.org/message-id/f2766b97-555d-424f-b29f-e0ca0f6d1...@postgrespro.ru
>
> I see a lot of discussion on this thread but little in the way of
> consensus.
>
> > The API patch itself is small enough, but we think that it will be
> > strange to provide just API without examples of its usage.
>
> It's not all that small, though it does apply cleanly even after a few
> months.  At least that indicates there is not a lot of churn in this area.
>
> I'm concerned about the lack of response or reviewers for this patch.
> It may be because everyone believes they had their say on the original
> thread, or because it seems like a big change to go into the last CF, or
> for other reasons altogether.
>

We'll prepare an easy setup to play with our solutions, so any developer
can see how it works.  We hope to post something about it this weekend.



>
> I think you should try to make it clear why this patch would be a win
> for 9.6.
>

It looks like the discussion shifted to a different thread; we'll answer here.



>
> Is anyone willing to volunteer a review or make an argument for the
> importance of this patch?
>
> --
> -David
> da...@pgmasters.net
>
>


Re: [HACKERS] proposal: make NOTIFY list de-duplication optional

2016-03-11 Thread David Steele
Hi Filip,

On 2/20/16 8:00 AM, Filip Rembiałkowski wrote:
> On Fri, Feb 19, 2016 at 10:09 PM, Catalin Iacob wrote:
>> On 2/9/16, Tom Lane wrote:
>>> FWIW, I think it would be a good thing if the NOTIFY statement syntax were
>>> not remarkably different from the syntax used in the pg_notify() function
>>> call.  To do otherwise would certainly be confusing.  So on the whole
>>> I'd go with the "NOTIFY channel [ , payload [ , mode ] ]" option.
>>
>> Filip, do you agree with Tom's proposal? Do you plan to rework the
>> patch on these lines? If you are I'll try to review it, if not I could
>> give it a shot as I'm interested in having this in 9.6.
>
> I see that Tom's remarks give more flexibility, and your refinement
> makes sense.

It looks like we are waiting on a new patch from you before this can be
reviewed.  Are you close to having that done?

Meanwhile, I have marked it "Waiting on Author".

-- 
-David
da...@pgmasters.net





Re: [HACKERS] eXtensible Transaction Manager API (v2)

2016-03-11 Thread David Steele
On 3/11/16 1:30 PM, Robert Haas wrote:

> There's been a lot of discussion on another thread about this patch.
> The subject is "The plan for FDW-based sharding", but the thread kind
> of got partially hijacked by this issue.  The net-net of that is that
> I don't think we have a clear enough idea about where we're going with
> global transaction management to make it a good idea to adopt an API
> like this.  For example, if we later decide we want to put the
> functionality in core, will we keep the hooks around for the sake of
> alternative non-core implementations?  I just don't believe this
> technology is nearly mature enough to commit to at this point.

Ah yes, I forgot about the related discussion on that thread.  Pasting
here for reference:

http://www.postgresql.org/message-id/20160223164335.ga11...@momjian.us

> Konstantin does not agree with my assessment, perhaps unsurprisingly.

I'm certainly no stranger to feeling strongly about a patch!

-- 
-David
da...@pgmasters.net





Re: [HACKERS] eXtensible Transaction Manager API (v2)

2016-03-11 Thread Robert Haas
On Fri, Mar 11, 2016 at 1:11 PM, David Steele  wrote:
> On 2/10/16 12:50 PM, Konstantin Knizhnik wrote:
>> The Postgres Professional cluster team wants to propose a new version of
>> the eXtensible Transaction Manager API.
>> Previous discussion concerning this patch can be found here:
>>
>> http://www.postgresql.org/message-id/f2766b97-555d-424f-b29f-e0ca0f6d1...@postgrespro.ru
>
> I see a lot of discussion on this thread but little in the way of consensus.
>
>> The API patch itself is small enough, but we think that it will be
>> strange to provide just API without examples of its usage.
>
> It's not all that small, though it does apply cleanly even after a few
> months.  At least that indicates there is not a lot of churn in this area.
>
> I'm concerned about the lack of response or reviewers for this patch.
> It may be because everyone believes they had their say on the original
> thread, or because it seems like a big change to go into the last CF, or
> for other reasons altogether.
>
> I think you should try to make it clear why this patch would be a win
> for 9.6.
>
> Is anyone willing to volunteer a review or make an argument for the
> importance of this patch?

There's been a lot of discussion on another thread about this patch.
The subject is "The plan for FDW-based sharding", but the thread kind
of got partially hijacked by this issue.  The net-net of that is that
I don't think we have a clear enough idea about where we're going with
global transaction management to make it a good idea to adopt an API
like this.  For example, if we later decide we want to put the
functionality in core, will we keep the hooks around for the sake of
alternative non-core implementations?  I just don't believe this
technology is nearly mature enough to commit to at this point.

Konstantin does not agree with my assessment, perhaps unsurprisingly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] remove wal_level archive

2016-03-11 Thread David Steele
On 2/8/16 2:34 PM, Alvaro Herrera wrote:
> Peter Eisentraut wrote:
>> On 1/26/16 10:56 AM, Simon Riggs wrote:
>>> Removing one of "archive" or "hot standby" will just cause confusion and
>>> breakage, so neither is a good choice for removal.
>>>
>>> What we should do is 
>>> 1. Map "archive" and "hot_standby" to one level with a new name that
>>> indicates that it can be used for both/either backup or replication.
>>>   (My suggested name for the new level is "replica"...)
>>> 2. Deprecate "archive" and "hot_standby" so that those will be removed
>>> in a later release.
>>
>> Updated patch to reflect these suggestions.
> 
> I wonder if the "keep one / keep both" argument is running in circles as
> new reviewers arrive at the thread.  Perhaps somebody could read the
> whole thread(s) and figure out a way to find consensus so that we move
> forward on this.

There was a lot of argument upstream about whether to keep 'hot_standby'
or 'archive' but after the proposal to change it to 'replica' came up
everybody seemed to fall in line with that.

+1 from me for using 'replica' as the WAL level to replace 'hot_standby'
and 'archive'.

+1 from me for removing the 'hot_standby' and 'archive' options entirely
in 9.6 rather than deprecating.
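
For illustration, that would leave a single spelling in postgresql.conf
(hypothetical until the rename is actually committed):

wal_level = replica		# replaces both 'archive' and 'hot_standby'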

Unless anyone has objections I would like to mark this 'ready for
committer'.

-- 
-David
da...@pgmasters.net





Re: [HACKERS] WIP: Access method extendability

2016-03-11 Thread Oleg Bartunov
On Wed, Mar 9, 2016 at 8:31 PM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:

> Hi!
>
> On Wed, Mar 9, 2016 at 3:27 PM, Alvaro Herrera wrote:
>
>> Hi.  As I just said to Tomas Vondra: since your patch creates a new
>> object type, please make sure to add a case to cover it in the
>> object_address.sql test.  That verifies some things such as
>> pg_identify_object and related.
>>
>
> Good catch, thanks! Tests were added.
> I also introduced numbering into patch names to make evident the order to
> their application.
>
>
Nice to see progress! I hope to see Alexander's work in 9.6.

Teodor and I will show at PGCon a new GIN AM as an extension, optimized for
full text search (millisecond FTS), which we gave up on pushing into core.


> --
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>


Re: [HACKERS] Inconsistent error handling in START_REPLICATION command

2016-03-11 Thread David Steele
On 3/11/16 1:11 PM, Tom Lane wrote:
> Robert Haas  writes:
>> On Fri, Mar 11, 2016 at 11:36 AM, David Steele  wrote:
>>> It looks like a decision needs to be made here whether to apply this patch,
>>> send it back to the author, or reject it so I'm marking it "Ready for
>>> Committer".
>>>
>>> Robert, since you were participating in this conversation can you have a
>>> look?
> 
>> Who, me?  *looks around*
>>
>> OK, I had a look.  I don't think this is a bug.
> 
> I agree, and Craig's complaint that this change reduces flexibility
> for plugins seems to tilt the balance against it.  I see that Alex
> himself accepted that argument in his last message in the thread
> ().
>
> So let's mark it rejected and move on.

+1 and done.

-- 
-David
da...@pgmasters.net





Re: [HACKERS] Inconsistent error handling in START_REPLICATION command

2016-03-11 Thread Tom Lane
Robert Haas  writes:
> On Fri, Mar 11, 2016 at 11:36 AM, David Steele  wrote:
>> It looks like a decision needs to be made here whether to apply this patch,
>> send it back to the author, or reject it so I'm marking it "Ready for
>> Committer".
>> 
>> Robert, since you were participating in this conversation can you have a
>> look?

> Who, me?  *looks around*
> 
> OK, I had a look.  I don't think this is a bug.

I agree, and Craig's complaint that this change reduces flexibility
for plugins seems to tilt the balance against it.  I see that Alex
himself accepted that argument in his last message in the thread
().
So let's mark it rejected and move on.

> unexpected PQresultStatus: 8

It is a bit annoying that psql doesn't print "COPY_BOTH" here.
But fixing that is not this patch, either.

regards, tom lane




Re: [HACKERS] eXtensible Transaction Manager API (v2)

2016-03-11 Thread David Steele
On 2/10/16 12:50 PM, Konstantin Knizhnik wrote:

> The Postgres Professional cluster team wants to propose a new version of
> the eXtensible Transaction Manager API.
> Previous discussion concerning this patch can be found here:
> 
> http://www.postgresql.org/message-id/f2766b97-555d-424f-b29f-e0ca0f6d1...@postgrespro.ru

I see a lot of discussion on this thread but little in the way of consensus.

> The API patch itself is small enough, but we think that it will be
> strange to provide just API without examples of its usage.

It's not all that small, though it does apply cleanly even after a few
months.  At least that indicates there is not a lot of churn in this area.

I'm concerned about the lack of response or reviewers for this patch.
It may be because everyone believes they had their say on the original
thread, or because it seems like a big change to go into the last CF, or
for other reasons altogether.

I think you should try to make it clear why this patch would be a win
for 9.6.

Is anyone willing to volunteer a review or make an argument for the
importance of this patch?

-- 
-David
da...@pgmasters.net





Re: [HACKERS] Freeze avoidance of very large table.

2016-03-11 Thread Joshua D. Drake

On 03/11/2016 09:48 AM, Masahiko Sawada wrote:
> Thank you so much!
> What I wanted to deal with in this thread is almost done. I'm going to
> test the feature more for the 9.6 release.

Nicely done!





--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] GinPageIs* don't actually return a boolean

2016-03-11 Thread Yury Zhuravlev

Andres Freund wrote:
> I plan to commit something like this, unless there's very loud protest
> from Peter's side.

I agree. Peter's proposal can be considered in a separate thread.

Thanks.
--

Yury Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Freeze avoidance of very large table.

2016-03-11 Thread Masahiko Sawada
On Sat, Mar 12, 2016 at 2:37 AM, Robert Haas  wrote:
> On Thu, Mar 10, 2016 at 10:47 PM, Masahiko Sawada  
> wrote:
>>> Thanks.  I adopted some of your suggestions, rejected others, fixed a
>>> few minor things that I missed previously, and committed this.  If you
>>> think any of the changes that I rejected still have merit, please
>>> resubmit those changes as separate patches.
>>
>> Thank you for your effort to this feature and committing it.
>> I guess that I couldn't do good work to this feature at final stage,
>> but I really appreciate all your advice and suggestion.
>
> Don't feel bad, you put a lot of work into this, and if you were getting
> a little tired towards the end, that's very understandable.  This
> extremely important feature was largely driven by you, and that's a
> big accomplishment.
>
>> I got your point.
>> Attached latest patch can skip to write the last part of last old page
>> if it's empty.
>> Please review it.
>
> Committed.
>
> Which I think just about brings us to the end of this epic journey,
> except for any cleanup of what's already been committed that needs to
> be done.  Thanks so much for your hard work!
>

Thank you so much!
What I wanted to deal with in this thread is almost done. I'm going to
test the feature more for the 9.6 release.

Regards,

--
Masahiko Sawada




Re: [HACKERS] Freeze avoidance of very large table.

2016-03-11 Thread Robert Haas
On Thu, Mar 10, 2016 at 10:47 PM, Masahiko Sawada  wrote:
>> Thanks.  I adopted some of your suggestions, rejected others, fixed a
>> few minor things that I missed previously, and committed this.  If you
>> think any of the changes that I rejected still have merit, please
>> resubmit those changes as separate patches.
>
> Thank you for your effort to this feature and committing it.
> I guess that I couldn't do good work to this feature at final stage,
> but I really appreciate all your advice and suggestion.

Don't feel bad, you put a lot of work into this, and if you were getting
a little tired towards the end, that's very understandable.  This
extremely important feature was largely driven by you, and that's a
big accomplishment.

> I got your point.
> Attached latest patch can skip to write the last part of last old page
> if it's empty.
> Please review it.

Committed.

Which I think just about brings us to the end of this epic journey,
except for any cleanup of what's already been committed that needs to
be done.  Thanks so much for your hard work!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Combining Aggregates

2016-03-11 Thread David Steele
On 1/20/16 11:42 PM, David Rowley wrote:
> On 21 January 2016 at 08:06, Robert Haas wrote:
>> On Wed, Jan 20, 2016 at 7:38 AM, David Rowley wrote:
>>> Agreed. So I've attached a version of the patch which does not have any
>>> of the serialise/deserialise stuff in it.
>>
>> I re-reviewed this and have committed most of it with only minor
>> kibitzing.  A few notes:
>
> I've attached the re-based remainder, which includes the serial/deserial
> stuff again.
>
> I'll submit this part to the March 'fest, where hopefully we'll also have
> something to utilise it.

As far as I can tell this is the most recent version of the patch but it
doesn't apply on master.  I'm guessing that's mostly because of Tom's
UPP patch.

This is a long and confusing thread and I should know since I just read
through the entire thing.  I'm setting this back to "waiting for author"
and I'd like to see a rebased patch and a summary of what's in the patch
before setting it back to "needs review".

-- 
-David
da...@pgmasters.net





Re: [HACKERS] Tsvector editing functions

2016-03-11 Thread Teodor Sigaev

I saw errors on Windows; here is the fix:

Thank you, pushed

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] [PATH] Correct negative/zero year in to_date/to_timestamp

2016-03-11 Thread Vitaly Burovoy
On 3/11/16, Robert Haas  wrote:
> On Sun, Feb 28, 2016 at 9:38 PM, Vitaly Burovoy
>  wrote:
>>> However, I'm not sure we ought to tinker with the behavior in this
>>> area.  If YYYY-MM-DD is going to accept things that are not of the
>>> format YYYY-MM-DD, and I'd argue that -1-06-01 is not in that format,
>>
>> It is not about format, it is about values.
>
> I disagree.  In a format like "-1-06-01", you want the first minus to
> indicate negation and the other two to be a separator.  That's not
> very far away from wanting the database to read your mind.

It is not my wish; the database already does it:
postgres=# SELECT to_date('-1-06-01', 'YYYY');
    to_date
---------------
 0002-01-01 BC
(1 row)

>> Because it is a little inconvenient. If one value ("-2345") is passed,
>> another one ("2346 BC") is got. In the other case a programmer must
>> check for negative value, and if so change a sign and add "BC" to the
>> format. Moreover the programmer must keep in mind that it is not
>> enough to have the usual date format "DD/MM/YYYY", because sometimes
>> there can be a "BC" part.
>
> Yeah, well, that's life.  You can write an alternative function to
> construct dates that works the way you like, and that may well be a
> good idea.  But I think *this* change is not a good idea, and
> accordingly I vote we reject this patch.

My wish is to make the behavior consistent.
Since there are two reverse functions ("extract" and "to_date"
["to_timestamp" in fact is the same]), I expect that what is described as
"year" ("year"-"YYYY") means the same thing in both of them, the same
with the pairs "isoyear"-"IYYY", "dow"-"DDD", "isodow"-"IDDD", etc.

Now "year" is _not_ the same as "YYYY" (though it can be so according to
the documentation: there is no mention of any ISO standard), whereas
"isoyear" _is_ the same:
postgres=# SELECT y, to_date(y, 'YYYY')YYYY,to_date(y, 'IYYY')IYYY
postgres-# FROM(VALUES('-1-06-01'))t(y);
    y     |     yyyy      |     iyyy
----------+---------------+---------------
 -1-06-01 | 0002-01-01 BC | 0002-01-01 BC
(1 row)

and
postgres=# SELECT y, date_part('year', y)YYYY,date_part('isoyear', y)IYYY
postgres-# FROM(VALUES('0002-06-01 BC'::date))t(y);
       y       | yyyy | iyyy
---------------+------+------
 0002-06-01 BC |   -2 |   -1
(1 row)

P.S.: the proposed patch changes IYYY as well, but that is easy to fix
and I'm ready to do it once a consensus is found.
-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] Tsvector editing functions

2016-03-11 Thread Stas Kelvich
> 
> On 11 Mar 2016, at 16:13, Stas Kelvich  wrote:
> 
> 
>> On 10 Mar 2016, at 20:29, Teodor Sigaev  wrote:
>> 
>> I would like to suggest renaming both functions to array_to_tsvector and
>> tsvector_to_array to have consistent names. Later we could add
>> to_tsvector([regconfig, ], text[]) with morphological processing.
>> 
>> Thoughts?
>> 

Hi, thanks for commit.

I saw errors on Windows; here is the fix:



tsvector.fix.patch

---
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] auto_explain sample rate

2016-03-11 Thread Robert Haas
On Fri, Mar 11, 2016 at 11:33 AM, Tomas Vondra
 wrote:
> A bit late, but I think we should rename the GUC variable to
> "sampling_rate" (instead of sample_ratio) as that's what pgbench uses
> for the same thing. That'd be more consistent.

I like that idea.  It seems like slightly better terminology.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Inconsistent error handling in START_REPLICATION command

2016-03-11 Thread Robert Haas
On Fri, Mar 11, 2016 at 11:36 AM, David Steele  wrote:
> It looks like a decision needs to be made here whether to apply this patch,
> send it back to the author, or reject it so I'm marking it "Ready for
> Committer".
>
> Robert, since you were participating in this conversation can you have a
> look?

Who, me?  *looks around*

OK, I had a look.  I don't think this is a bug.  The original report
is complaining about this:

=
psycopg2test=# START_REPLICATION SLOT "test1" LOGICAL 0/0 ("invalid" 'value');
unexpected PQresultStatus: 8

The last command results in the following output sent to the server log:

ERROR:  option "invalid" = "value" is unknown
CONTEXT:  slot "test1", output plugin "test_decoding", in the startup callback
=

But that's just because psql knows utterly zero about copy-both mode.
So the problem here isn't that the server is doing anything it
shouldn't be doing; the server code is totally legitimate.  It's just
that the client is stupid.

Now, we could commit this patch anyway.  It won't hurt anything.  But
it won't help very much, either.  The fundamental issue is that
issuing a START_REPLICATION SLOT command in psql can't be expected to
work.  If you do that, it's either going to fail with a server error,
or it's going to fail with "unexpected PQresultStatus: 8".  Committing
this patch would cause this particular example to fail in the first
way rather than the second one, and I'm not direly opposed to that if
somebody really feels strongly about it.  But it seems like the wrong
thing to focus on.  I think the real question is: why are you
insisting on issuing a command in psql that can NEVER succeed there?
If you want that to work, you're going to need to add copy-both
support to psql, which might be a fun project but has nothing to do
with this patch.

So I am inclined to think it's fine to reject this, unless somebody
wants to argue that there's some advantage to changing it than I am
missing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal: RETURNING primary_key()

2016-03-11 Thread Igal @ Lucee.org

On 3/11/2016 12:40 AM, Craig Ringer wrote:

> That's why (sorry, Igal) I'd like to see some more tests for cases
> other than identity columns. How is GENERATED ALWAYS handled, if
> supported? What about if it's on a UNIQUE column? How about a PRIMARY
> KEY whose value is assigned by a DEFAULT or by a trigger?

I was using Oracle 11g XE, where GENERATED ALWAYS was not available.
This is the code I used for Oracle:

  CREATE TABLE jdbc (j_name VARCHAR2(64) NOT NULL, j_id NUMBER(10) NOT NULL);

  CREATE SEQUENCE jdbc_seq;

  CREATE OR REPLACE TRIGGER jdbc_seq_trigger
  BEFORE INSERT ON jdbc
  FOR EACH ROW
  WHEN (new.j_id IS NULL)
  BEGIN
    SELECT jdbc_seq.NEXTVAL
    INTO   :new.j_id
    FROM   dual;
  END;
  /

For DB2 the type is indeed GENERATED ALWAYS AS IDENTITY:

  j_id INT GENERATED ALWAYS AS IDENTITY

Originally the name was ID, but when both DB2 and MS/jTDS returned a
column named "ID" I realized that it might come from the column name, so
I modified the column name.  DB2 was indeed returning the column name,
while MS/jTDS returns a column named "ID" regardless of the actual
column name.

> Based on the rather funky behaviour Igal found I suspect the answer
> will be "nothing much" for all of those, i.e. it just doesn't work
> with other drivers/vendors. But I'd like to know.

I agree, but I can test it if you give me the SQL commands.  I do want
to remove all of that horrible software from my workstation as soon as
possible, but it can wait if more testing is required.

>> 2) Same for multicolumn keys:  Pg just returns (col1, col2) ==
>> (42, 146). Then client would be able to locate the row via "where
>> col1=42 and col2=146"
>
> Yeah, I was wondering about composite PKs.  I think Igal focused only
> on generated synthetic keys, which are after all the overwhelmingly
> common case when getting generated keys.

If you give me the code that you want to test I will test it.

>> 3) If multiple unique keys are present, it is fine if Pg returns one
>> or the other depending on the phase of the moon. Yet a more
>> compact key would be preferable to save on bandwidth.
>
> I disagree there. Behaviour must be well-defined and predictable unless
> it's really unavoidable.

I agree with Craig.

>> I think naming the resulting column(s) like "generated_key" /
>> "generated_keys" does not make much sense. Especially, for
>> multi-column keys.
>
> Yeah. At least in PgJDBC where it's a separate resultset (IIRC), so
> you have metadata that means you don't have to guess column names etc.

I'm not sure how multi-column keys work.  In both MySQL and SQL Server,
for example, you can not have more than one SEQUENCE column, so perhaps
that's their "solution".


Igal



Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-11 Thread Tomas Vondra
Hi,

On Fri, 2016-03-11 at 17:24 +0100, Michael Paquier wrote:
> On Fri, Mar 11, 2016 at 5:19 PM, Anastasia Lubennikova wrote:
> > 
> > BTW, if you know a good way to corrupt index (and do it
> > reproducible) I'd be
> > very glad to see it.
> You can use for example dd in non-truncate mode to corrupt on-disk
> page data, say that for example:
> dd if=/dev/random bs=8192 count=1 \
> seek=$BLOCK_ID of=base/$DBOID/$RELFILENODE \
> conv=notrunc

I guess /dev/random is not very compatible with the "reproducible"
requirement. I mean, it will reproducibly break the page, but pretty
much completely, which is mostly what checksums are for.

I think the main goal of the amcheck module is to identify issues that
are much more subtle - perhaps a coding error or an unhandled corner
case which results in a page with a valid structure and checksum, but
broken contents. Or perhaps issues coming from outside of PostgreSQL,
like a subtle change in the locales.

Simulating those issues requires a tool that allows minor tweaks to the
page, dd is a bit too dull for that.
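
To make the damage at least deterministic and small, a single-byte
variant of the same dd recipe could be used instead, e.g. (the in-page
offset of 128 is arbitrary):

dd if=/dev/zero bs=1 count=1 \
   seek=$(( BLOCK_ID * 8192 + 128 )) of=base/$DBOID/$RELFILENODE \
   conv=notrunc

Still, simulating a plausible coding error usually needs something
smarter than either.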

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Speedup twophase transactions

2016-03-11 Thread Jesper Pedersen

On 01/26/2016 07:43 AM, Stas Kelvich wrote:

Thanks for reviews and commit!

   As Simon and Andres already mentioned in this thread replay of twophase 
transaction is significantly slower than the same operations in normal mode.
Major reason is that each state file is fsynced during replay and while it is 
not a problem for recovery, it is a problem for replication. Under high 2pc 
update load lag between master and async replica is constantly increasing (see 
graph below).

   One way to improve things is to move fsyncs to restartpoints, but as we saw 
previously it is a half-measure and just frequent calls to fopen can cause 
bottleneck.

   Other option is to use the same scenario for replay that was used already 
for non-recovery mode: read state files to memory during replay of prepare, and 
if checkpoint/restartpoint occurs between prepare and commit move data to 
files. On commit we can read xlog or files. So here is the patch that 
implements this scenario for replay.

   Patch is quite straightforward. During replay of prepare records 
RecoverPreparedFromXLOG() is called to create memory state in GXACT, PROC, 
PGPROC; on commit XlogRedoFinishPrepared() is called to clean up that state. 
Also there are several functions (PrescanPreparedTransactions, 
StandbyTransactionIdIsPrepared) that were assuming that during replay all 
prepared xacts have files in pg_twophase, so I have extended them to check 
GXACT too.
   Side effect of that behaviour is that we can see prepared xacts in 
pg_prepared_xacts view on slave.

While this patch touches quite sensible part of postgres replay and there is 
some rarely used code paths, I wrote shell script to setup master/slave 
replication and test different failure scenarios that can happened with 
instances. Attaching this file to show test scenarios that I have tested and 
more importantly to show what I didn’t tested. Particularly I failed to 
reproduce the situation where StandbyTransactionIdIsPrepared() is called; maybe
somebody can suggest a way to force its usage. Also I’m not too sure about the
necessity of calling cache invalidation callbacks during 
XlogRedoFinishPrepared(), I’ve marked this place in patch with 2REVIEWER 
comment.

Tests shows that this patch increases speed of 2pc replay to the level when 
replica can keep pace with master.

Graph: replica lag under a pgbench run for a 200 seconds with 2pc update transactions (80 
connections, one update per 2pc tx, two servers with 12 cores each, 10GbE interconnect) 
on current master and with suggested patch. Replica lag measured with "select 
sent_location-replay_location as delay from pg_stat_replication;" each second.



Some comments:

* The patch needs a rebase against the latest TwoPhaseFileHeader change
* Rework the check.sh script into a TAP test case (src/test/recovery), 
as suggested by Alvaro and Michael down thread

* Add documentation for RecoverPreparedFromXLOG

+* that xlog record. We need just to clen up memmory state.

'clean' + 'memory'

+* This is usually called after end-of-recovery checkpoint, so all 2pc
+* files moved xlog to files. But if we restart slave when master is
+* switched off this function will be called before checkpoint ans we 
need
+* to check PGXACT array as it can contain prepared transactions that
+* didn't created any state files yet.

=>

"We need to check the PGXACT array for prepared transactions that 
doesn't have any state file in case of a slave restart with the master 
being off."


+* prepare xlog resords in shared memory in the same way as it 
happens

'records'

+* We need such behaviour because speed of 2PC replay on 
replica should
+* be at least not slower than 2PC tx speed on master.

=>

"We need this behaviour because the speed of the 2PC replay on the 
replica should be at least the same as the 2PC transaction speed of the 
master."


I'll leave the 2REVIEWER section to Simon.

Best regards,
 Jesper





Re: [HACKERS] Inconsistent error handling in START_REPLICATION command

2016-03-11 Thread David Steele
On 1/21/16 9:53 AM, Shulgin, Oleksandr wrote:
> On Thu, Jan 21, 2016 at 3:25 PM, Robert Haas wrote:
>
>
> So it's true that the client can't unilaterally terminate COPY BOTH
> mode; it can only send CopyDone.  But an error on the server side
> should do so.
>
>
> Hm, you're right.  Even though the server sends COPY_BOTH message
> before the plugin startup, an attempt by the client to actually read
> the data messages results in ErrorResponse handled on the client, then
> the client can re-submit the corrected START_REPLICATION command and
> continue without the need to reconnect.  This cannot be actually
> tested in psql, but I can verify the behavior with my python scripts.
>
> Then I don't have a preference for the early error reporting in this
> case.  If the current behavior potentially allows for more flexible
> error reporting, I'm for keeping it.

It looks like a decision needs to be made here whether to apply this
patch, send it back to the author, or reject it so I'm marking it "Ready
for Committer".

Robert, since you were participating in this conversation can you have a
look?

-- 
-David
da...@pgmasters.net





Re: [HACKERS] auto_explain sample rate

2016-03-11 Thread Tomas Vondra
Hi,

On Fri, 2016-03-11 at 15:11 +0100, Magnus Hagander wrote:
> Applied with a minor word-fix in the documentation and removal of
> some unrelated whitespace changes.

A bit late, but I think we should rename the GUC variable to
"sampling_rate" (instead of sample_ratio) as that's what pgbench uses
for the same thing. That'd be more consistent.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-11 Thread Michael Paquier
On Fri, Mar 11, 2016 at 5:19 PM, Anastasia Lubennikova wrote:
> BTW, if you know a good way to corrupt an index (and do it reproducibly)
> I'd be very glad to see it.

You can use for example dd in non-truncate mode to corrupt on-disk
page data, say that for example:
dd if=/dev/random bs=8192 count=1 \
seek=$BLOCK_ID of=base/$DBOID/$RELFILENODE \
conv=notrunc
-- 
Michael




Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-11 Thread Anastasia Lubennikova

01.03.2016 03:09, Peter Geoghegan:

I was assigned an "action point" during the FOSDEM developer meeting:
"Post new version of btree consistency checker patch". I attach a new
WIP version of my consistency checker tool, amcheck. This patch is
proposed for 9.6, as an extension in contrib -- maybe we can still get
it in. This is the first time I've added any version of this to a
commitfest, although I've posted a couple of rough versions of this in
the past couple of years. The attached version has received a major
overhaul, and is primarily aimed at finding corruption in production
systems, although I think it will still have significant value for
debugging too. Maybe it can help with some of the B-Tree patches in
the final commitfest, for example. I also have some hope that it will
become a learning tool for people interested in how B-Tree indexes
work.

To recap, the extension adds some SQL-callable functions that verify
certain invariant conditions hold within some particular B-Tree index.
These are the conditions that index scans rely on always being true.
The tool's scope may eventually cover other AMs, including heapam, but
nbtree seems like the best place to start.

Note that no function currently checks that the index is consistent
with the heap, which would be very useful (that's probably how I'd
eventually target the heapam, actually).


Hi,
thank you for the contrib module. The patch itself looks great.
It was really useful to have such a tool while working on b-tree patches.


Invariants


nbtree invariants that the tool verifies with just an AccessShareLock
on the relation are:

* That all items are in the correct, opclass order on each page.

* That the page "high key", if any, actually bounds the items on the page.

* That the last item on a page is less than or equal to the first item
on the next page (the page to its right). The idea here is that the
key space spans multiple pages, not just one page, so it make sense to
check the last item where we can.

With an ExclusiveLock + ShareLock, some addition invariants are verified:

* That child pages actually have their parent's downlink as a lower bound.

* Sane right links and left links at each level.

Obviously, this tool is all about distrusting the structure of a
B-Tree index. That being the case, it's not always totally clear where
to draw the line. I think I have the balance about right, though.

Interface
===

There are only 2 SQL callable functions in the extension, which are
very similar:

bt_index_check(index regclass)

bt_index_parent_check(index regclass)

The latter is more thorough than the former -- it performs all checks,
including those checks that I mentioned required an ExclusiveLock. So,
bt_index_check() only requires an AccessShareLock.
bt_index_parent_check() requires an ExclusiveLock on the index
relation, and a ShareLock on its heap relation, almost like REINDEX.
bt_index_parent_check() performs verification that is a superset of
the verification performed by bt_index_check() -- mostly, the extra
verification/work is that it verifies downlinks against child pages.

Both functions raise an error in the event of observing that an
invariant in a B-Tree was violated, such as items being out of order
on a page. I've written extensive documentation, which goes into
practical aspects of using amcheck effectively. It doesn't go into
significant detail about every invariant that is checked, but gives a
good idea of roughly what checks are performed.

I could almost justify only having one function with an argument about
the downlink/child verification, but that would be a significant
footgun given the varying locking requirements that such a function
would have.
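
So a basic session is just (lock levels as described above, "idx" being
whatever index you want to check):

CREATE EXTENSION amcheck;
SELECT bt_index_check('idx'::regclass);        -- AccessShareLock only
SELECT bt_index_parent_check('idx'::regclass); -- ExclusiveLock + ShareLock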


I've done some testing, and I can say that both announced functions work
well.
BTW, if you know a good way to corrupt an index (and do it reproducibly)
I'd be very glad to see it.


I created an index, then opened the file and changed the order of elements.
And bt_index_check() found the error successfully.

/* Ensure that the data is changed. */
SELECT * FROM bt_page_items('idx', 1);

 itemoffset | ctid  | itemlen | nulls | vars | data
+---+-+---+--+-
  1 | (0,3) |  16 | f | f| 03 00 00 00 00 00 00 00
  2 | (0,2) |  16 | f | f| 02 00 00 00 00 00 00 00
  3 | (0,1) |  16 | f | f| 01 00 00 00 00 00 00 00
(3 rows)


/* Great, amcheck found an error. */
select bt_index_check('idx');
ERROR:  page order invariant violated for index "idx"
DETAIL:  Lower index tid=(1,1) (points to heap tid=(0,3)) higher index 
tid=(1,2) (points to heap tid=(0,2)) page lsn=0/17800D0.



Locking
==

We never rely on something like holding on to a buffer pin as an
interlock for correctness (the vacuum interlock thing isn't generally
necessary, because we don't look at the heap at all). We simply pin +
BT_READ lock a buffer, copy it into local memory allocated by

Re: [HACKERS] Proposal: BSD Authentication support

2016-03-11 Thread David Steele
On 1/14/16 11:22 PM, Robert Haas wrote:
> On Tue, Jan 12, 2016 at 2:27 AM, Marisa Emerson  wrote:
>> I've attached the latest version of this patch. I've fixed up an issue with
>> the configuration scripts that I missed.
> Looks reasonable on a quick read-through.  Can anyone with access to a
> BSD system review and test?

Is anyone with access to/experience with BSD able to review and test
this patch?  Seems like it would make a great addition to 9.6.

Thanks,

-- 
-David
da...@pgmasters.net






Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-03-11 Thread Gilles Darold
On 11/03/2016 15:22, Tom Lane wrote:
> Gilles Darold  writes:
>> Le 11/03/2016 10:49, Shulgin, Oleksandr a écrit :
>>> Would it make sense to have it as a symlink instead?
>> The only cons I see is that it can be more "difficult" with some
>> language to gather the real path, but do we really need it? There is
>> also little time where the symlink doesn't exist, this is when it needs
>> to be removed before being recreated to point to the new log file.
> Yeah, a symlink is impossible to update atomically (on most platforms
> anyway).  Plain file containing the pathname seems better.
>
> Another point is that we might not necessarily want *only* the pathname in
> there.  postmaster.pid has accreted more stuff over time, and this file
> might too.  I can see wanting the syslogger PID in there, for example,
> so that onlookers can detect a totally stale file.  (Not proposing this
> right now, just pointing out that it's a conceivable future feature.)
>
>   regards, tom lane

Here is the patch rewritten to use the alternate file
$PGDATA/pg_log_filename to store the current log filename used by the
syslogger. All examples used in the first mail of this thread work
exactly the same way. If there are no other remarks, I will add the
patch to the next commitfest.


Here are some additional examples with this feature.

To obtain the filling percentage of the log file when log_rotation_size
is used:

postgres=# SELECT pg_current_logfile(), (select setting::int*1000 from
pg_settings where name='log_rotation_size'), a.size size,
((a.size*100)/(select setting::int*1000 from pg_settings where
name='log_rotation_size')) percent_used FROM
pg_stat_file(pg_current_logfile())
a(size,access,modification,change,creation,isdir);

-[ RECORD 1 ]------+----------------------------------------
pg_current_logfile | pg_log/postgresql-2016-03-11_160817.log
log_rotation_size  | 10240000
size               | 1246000
percent_used       | 12

This can help to know if the file is near to be rotated. Or if you use
time based rotation:

postgres=# select pg_current_logfile(), (select setting::int*60 from
pg_settings where name='log_rotation_age')
log_rotation_age,a.access,a.modification, (((extract(epoch from
a.modification) - extract(epoch from a.access)) * 100) / (select
setting::int*60 from pg_settings where name='log_rotation_age'))
percent_used FROM pg_stat_file(pg_current_logfile())
a(size,access,modification,change,creation,isdir);
-[ RECORD 1 ]--+
pg_current_logfile | pg_log/postgresql-2016-03-11_162143.log
log_rotation_age   | 3600
access | 2016-03-11 16:21:43+01
modification   | 2016-03-11 16:33:12+01
percent_used   | 19.13889

Best regards,

-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4b5ee81..e6a18fc 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15026,6 +15026,11 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
channel names that the session is currently listening on
   
 
+  pg_current_logfile()
+   text
+   current log file used by the logging collector
+  
+
   
pg_notification_queue_usage()
double
@@ -15264,6 +15269,16 @@ SET search_path TO schema , schema, ..

 

+pg_current_logfile
+   
+
+   
+pg_current_logfile returns the name of the current log
+file used by the logging collector, as a text. Log collection
+must be active.
+   
+
+   
 pg_postmaster_start_time

 
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index e7e488a..4769dc5 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -145,6 +145,7 @@ static char *logfile_getname(pg_time_t timestamp, const char *suffix);
 static void set_next_rotation_time(void);
 static void sigHupHandler(SIGNAL_ARGS);
 static void sigUsr1Handler(SIGNAL_ARGS);
+static void store_current_log_filename(char *filename);
 
 
 /*
@@ -571,6 +572,9 @@ SysLogger_Start(void)
 
 	syslogFile = logfile_open(filename, "a", false);
 
+	/* store information about current log filename used by log collection */
+	store_current_log_filename(filename);
+
 	pfree(filename);
 
 #ifdef EXEC_BACKEND
@@ -1209,6 +1213,9 @@ logfile_rotate(bool time_based_rotation, int size_rotation_for)
 		fclose(syslogFile);
 		syslogFile = fh;
 
+		/* store information about current log filename used by log collection */
+		store_current_log_filename(filename);
+
 		/* instead of pfree'ing filename, remember it for next time */
 		if (last_file_name != NULL)
 			pfree(last_file_name);
@@ -1253,6 +1260,9 @@ logfile_rotate(bool time_based_rotation, int size_rotation_for)
 		fclose(csvlogFile);
 		csvlogFile = fh;
 
+		/* store information about current log filename used by log collection */
+		
