Re: [HACKERS] Pipelining executions to postgresql server

2014-11-03 Thread Craig Ringer
On 11/04/2014 07:56 AM, Mikko Tiihonen wrote:
> I also think the async I/O is the way to go. Luckily that has already been 
> done
> in the pgjdbc-ng  (https://github.com/impossibl/pgjdbc-ng), built on top
> of netty java NIO library. It has quite good feature parity with the original
> pgjdbc driver. 

Huh, it does seem to now. The big omission, unless I'm blind, is support
for COPY. (It also lacks support for JDBC3 and JDBC4, requiring JDK 7
and JDBC 4.1, but that's reasonable enough.)

It now has LISTEN/NOTIFY by the looks, and of course it's built around
binary protocol support.

I freely admit I haven't looked at it too closely. I'm a tad frustrated
by the parallel development of a different driver when the "official"
driver has so much in the way of low hanging fruit to improve.

I have to say I like some aspects of how pgjdbc-ng is being done - in
particular use of Maven and being built around non-blocking I/O.

OTOH, the use of Google Guava I find pretty inexplicable in a JDBC
driver, and since it hard-depends on JDK 7 I don't understand why Netty
was used instead of the async / non-blocking features of the core JVM.
That may simply be my inexperience with writing non-blocking socket I/O
code on Java though.

I'm also concerned about how it'll handle new JDBC versions, as it seems
to lack the JDBC interface abstraction of the core driver.

> I do not think the JDBC batch interface even allows doing updates to multiple
> tables when using prepared statements?

Ah, I missed this before.

That's correct. You get prepared statements _or_ multiple different
statements.

That's a more understandable reason to concoct a new API, and explains
why you're not just solving the issues with batches. Though I still
think you're going to have to fix the buffer management code before you
do anything like this.
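For reference, here's a minimal sketch of the two mutually exclusive batch
styles in stock JDBC; the connection URL, credentials, and table names are
made up purely for illustration:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class BatchStyles {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "secret")) {
            conn.setAutoCommit(false);

            // Style 1: one prepared statement, many parameter sets.
            // Every queued execution must target the same SQL statement.
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO t_a (id, val) VALUES (?, ?)")) {
                ps.setInt(1, 1); ps.setString(2, "one"); ps.addBatch();
                ps.setInt(1, 2); ps.setString(2, "two"); ps.addBatch();
                ps.executeBatch();
            }

            // Style 2: one plain Statement, many *different* SQL strings,
            // but no bind parameters and no server-prepared statements.
            try (Statement st = conn.createStatement()) {
                st.addBatch("INSERT INTO t_a (id, val) VALUES (3, 'three')");
                st.addBatch("UPDATE t_b SET flag = true WHERE id = 3");
                st.executeBatch();
            }

            conn.commit();
        }
    }
}

You can queue parameter sets for a single statement, or queue arbitrary
parameterless statements, but not both at once, which is the gap a new
pipelining API would be trying to fill.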

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [JDBC] [HACKERS] Pipelining executions to postgresql server

2014-11-03 Thread Craig Ringer
On 11/03/2014 07:05 AM, Scott Harrington wrote:
> This avoids the need for a Future, and avoids the client having to
> loop/sleep until done.

A Future is the logical way to represent an asynchronous operation in
Java. Why implement something else that doesn't fit into existing
libraries and tools when there's already a standard interface?

If you're breaking outside the stock JDBC model and thinking
asynchronously, then using the existing Java APIs for asynchrony makes
sense to me.

In terms of looping until done ... what we really need is a reliably
non-blocking PGConnection.consumeInput() so apps can call that in their
main loops, or via a helper thread. Then we'd be able to add async
notification callbacks too. To do that we need to make PGStream use a
java.nio.Channel instead of a java.net.Socket.
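As a purely hypothetical sketch of where that leads (none of these methods
exist in PgJDBC today; the interface and method names are invented for
illustration only):

import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.concurrent.CompletableFuture;

// Hypothetical extension interface; nothing like this exists in PgJDBC yet.
public interface AsyncPGConnection {

    // Submit a query without blocking; completion is signalled through a
    // standard Future, so it composes with java.util.concurrent and with
    // existing libraries and tools.
    CompletableFuture<ResultSet> executeAsync(String sql) throws SQLException;

    // Reliably non-blocking: read whatever input is available on the socket,
    // dispatch any pending result or notification callbacks, and return
    // immediately.  Meant to be called from an application main loop or a
    // driver-managed helper thread.
    void consumeInput() throws SQLException;

    // Register a callback fired from consumeInput() when a NOTIFY arrives.
    void addNotificationListener(NotificationListener listener);

    interface NotificationListener {
        void onNotification(String channel, int backendPid, String payload);
    }
}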

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Pipelining executions to postgresql server

2014-11-03 Thread Craig Ringer
On 11/04/2014 07:56 AM, Mikko Tiihonen wrote:
> I do not quite grasp why not sending Sync is so important.

Well, at the moment the PgJDBC driver relies on the following flow to
manage its buffers and avoid a logjam where both server and client are
waiting for the other to consume input:

* Send some stuff
* Sync
* Consume input until Sync response received

... since at this point it knows the server's send buffer should be
empty, or near enough.
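In rough pseudo-Java the pattern looks something like this; all of the type
and method names below are invented for illustration and do not correspond
to the driver's real internals:

import java.io.IOException;
import java.util.List;

// Sketch of the flow described above, not real PgJDBC code.
abstract class ProtocolFlowSketch {
    interface Query {}
    interface Message { boolean isReadyForQuery(); }

    abstract void sendParse(Query q) throws IOException;    // 'P'
    abstract void sendBind(Query q) throws IOException;     // 'B'
    abstract void sendDescribe(Query q) throws IOException; // 'D'
    abstract void sendExecute(Query q) throws IOException;  // 'E'
    abstract void sendSync() throws IOException;            // 'S'
    abstract Message receiveMessage() throws IOException;
    abstract void handle(Message m);                         // '1','2','T','D','C', ...

    void runChunk(List<Query> queries) throws IOException {
        // Send some stuff ...
        for (Query q : queries) {
            sendParse(q);
            sendBind(q);
            sendDescribe(q);
            sendExecute(q);
        }
        // ... then one Sync for the whole pipelined chunk.
        sendSync();

        // Consume input until the ReadyForQuery that answers our Sync.
        // Only at that point does the driver know the server's send buffer
        // is (roughly) empty, so writing more queries cannot deadlock with
        // both sides blocked on full buffers.
        Message m;
        do {
            m = receiveMessage();
            handle(m);
        } while (!m.isReadyForQuery());
    }
}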

There's no fundamental reason why you *can't* send a Sync after each
query, AFAIK. You just have to change how the driver handles syncs so it
knows that there might be more work pipelined. Or fix the driver's
buffer management, as described in the github issues I linked.

It doesn't make much sense to do so unless you're running in autocommit
mode, though, and I'm not convinced pipelining work in autocommit mode
makes a lot of sense. The biggest problem is exception handling - you
can't throw an exception at some statement execution in the past. So you
need a new interface for pipelining... or to just use the existing batch
interface.
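For what it's worth, the existing batch interface already answers the
error-handling question in the standard JDBC way: a mid-batch failure is
reported after the fact through a single BatchUpdateException rather than
thrown "at" the failing statement. A minimal sketch (the table name is
illustrative only):

import java.sql.BatchUpdateException;
import java.sql.Connection;
import java.sql.Statement;

public class BatchErrorHandling {
    // Assumes an already-open Connection.
    static void runBatch(Connection conn) throws Exception {
        conn.setAutoCommit(false);
        try (Statement st = conn.createStatement()) {
            st.addBatch("INSERT INTO demo (id) VALUES (1)");
            st.addBatch("INSERT INTO demo (id) VALUES (1)"); // fails if id is unique
            st.addBatch("INSERT INTO demo (id) VALUES (2)");
            st.executeBatch();
            conn.commit();
        } catch (BatchUpdateException e) {
            // Per-statement outcomes (update counts, EXECUTE_FAILED, ...)
            // arrive here only after the whole batch has been processed,
            // not as an exception thrown at the failing statement itself.
            int[] counts = e.getUpdateCounts();
            System.err.println("batch failed; " + counts.length + " results reported");
            conn.rollback();
        }
    }
}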

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [JDBC] [HACKERS] Pipelining executions to postgresql server

2014-11-03 Thread Craig Ringer
On 11/03/2014 07:05 AM, Scott Harrington wrote:

> I looked over your patch. Your list of ResultHandlerHolders seems to be
> the right direction, but as Tom Lane mentioned there may need to be some
> way to ensure the statements are all in the same transaction.

Why go down this track, when we already have batching?

Just use the standard JDBC API batching facilities and add the missing
support for batching queries that return result sets in PgJDBC.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-03 Thread CK Tan
Josh,

Do you have a list of what needs to be done to keep the MONEY type?
What is wrong with it?

Thanks,
-cktan

On Mon, Nov 3, 2014 at 10:30 PM, Feng Tian  wrote:
> Hi,
>
> This is Feng from Vitesse.   Performance different between Money and Numeric
> is *HUGE*.   For TPCH Q1, the performance difference is 5x for stock
> postgres, and ~20x for vitesse.
>
> Stock postgres, for my laptop, TPCH 1G, Q1, use money type ~ 9s, use Numeric
> (15, 2) is ~53s.
>
> Kevin,
>  test=# do $$ begin perform sum('1.01'::numeric) from
> generate_series(1,1000); end; $$;
>
> This may not reflect the difference of the two data type.   One aggregate is
> not where most of the time is spent.  TPCH Q1 has many more computing.
>
> On Mon, Nov 3, 2014 at 4:54 AM, Michael Banck 
> wrote:
>>
>> Am Sonntag, den 02.11.2014, 12:41 -0500 schrieb Tom Lane:
>> > BTW, after reflecting a bit more I'm less than convinced that this
>> > datatype is completely useless.  Even if you prefer to store currency
>> > values in numeric columns, casting to or from money provides a way to
>> > accept or emit values in whatever monetary format the LC_MONETARY locale
>> > setting specifies.  That seems like a useful feature, and it's one you
>> > could not easily duplicate using to_char/to_number (not to mention that
>> > those functions aren't without major shortcomings of their own).
>>
>> As an additional datapoint, Vitesse Data changed the DB schema from
>> NUMERIC to MONEY for their TPCH benchmark for performance reasons: "The
>> modification to data types is easy to understand -- money and double
>> types are faster than Numeric (and no one on this planet has a bank
>> account that overflows the money type, not any time soon)."[1] And
>> "Replaced NUMERIC fields representing currency with MONEY"[2].
>>
>> Not sure whether they modified/optimized PostgreSQL with respect to the
>> MONEY data type and/or how much performance that gained, so CCing CK Tan
>> as well.
>>
>>
>> Michael
>>
>> [1]
>> http://vitesse-timing-on.blogspot.de/2014/10/running-tpch-on-postgresql-part-1.html
>> [2] http://vitessedata.com/benchmark/
>>
>> --
>> Michael Banck
>> Projektleiter / Berater
>> Tel.: +49 (2161) 4643-171
>> Fax:  +49 (2161) 4643-100
>> Email: michael.ba...@credativ.de
>>
>> credativ GmbH, HRB Mönchengladbach 12080
>> USt-ID-Nummer: DE204566209
>> Hohenzollernstr. 133, 41061 Mönchengladbach
>> Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
>>
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>
>




Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-03 Thread Craig Ringer
On 11/03/2014 03:41 AM, Tom Lane wrote:
> Nothing that I recall at the moment, but there is certainly plenty of
> stuff of dubious quality in there.  I'd argue that chkpass, intagg,
> intarray, isn, spi, and xml2 are all in worse shape than the money type.

What's wrong with intarray?

I'm aware of issues with the rest (and I think citext has a few of its
own too) but I didn't think intarray had any real issues.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-03 Thread Feng Tian
Hi,

This is Feng from Vitesse.   The performance difference between Money and
Numeric is *HUGE*.   For TPCH Q1, the difference is 5x for stock postgres,
and ~20x for Vitesse.

With stock postgres on my laptop, TPCH 1G Q1 takes ~9s using the money type
and ~53s using Numeric(15,2).

Kevin,
 test=# do $$ begin perform sum('1.01'::numeric) from
generate_series(1,1000); end; $$;

This may not reflect the difference between the two data types.   One
aggregate is not where most of the time is spent; TPCH Q1 involves much
more computation.

On Mon, Nov 3, 2014 at 4:54 AM, Michael Banck 
wrote:

> Am Sonntag, den 02.11.2014, 12:41 -0500 schrieb Tom Lane:
> > BTW, after reflecting a bit more I'm less than convinced that this
> > datatype is completely useless.  Even if you prefer to store currency
> > values in numeric columns, casting to or from money provides a way to
> > accept or emit values in whatever monetary format the LC_MONETARY locale
> > setting specifies.  That seems like a useful feature, and it's one you
> > could not easily duplicate using to_char/to_number (not to mention that
> > those functions aren't without major shortcomings of their own).
>
> As an additional datapoint, Vitesse Data changed the DB schema from
> NUMERIC to MONEY for their TPCH benchmark for performance reasons: "The
> modification to data types is easy to understand -- money and double
> types are faster than Numeric (and no one on this planet has a bank
> account that overflows the money type, not any time soon)."[1] And
> "Replaced NUMERIC fields representing currency with MONEY"[2].
>
> Not sure whether they modified/optimized PostgreSQL with respect to the
> MONEY data type and/or how much performance that gained, so CCing CK Tan
> as well.
>
>
> Michael
>
> [1]
> http://vitesse-timing-on.blogspot.de/2014/10/running-tpch-on-postgresql-part-1.html
> [2] http://vitessedata.com/benchmark/
>
> --
> Michael Banck
> Projektleiter / Berater
> Tel.: +49 (2161) 4643-171
> Fax:  +49 (2161) 4643-100
> Email: michael.ba...@credativ.de
>
> credativ GmbH, HRB Mönchengladbach 12080
> USt-ID-Nummer: DE204566209
> Hohenzollernstr. 133, 41061 Mönchengladbach
> Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-11-03 Thread Rahila Syed
Hello ,

Please find attached an updated patch with the review comments given above
implemented. The compressed data now includes all backup blocks and their
headers except the header of the first backup block in the WAL record. The
first backup block header in the WAL record is used to store the compression
information. This is done in order to avoid adding compression information
to the WAL record header.

Memory allocation on SIGHUP in autovacuum remains to be done. I am working on it.


Thank you,
Rahila Syed





On Tue, Oct 28, 2014 at 8:51 PM, Rahila Syed 
wrote:

>
> >>>Do we release the buffers for compressed data when fpw is changed from
> "compress" to "on"?
> >> The current code does not do this.
> >Don't we need to do that?
> Yes this needs to be done in order to avoid memory leak when compression is
> turned off at runtime while the backend session is running.
>
> >You don't need to make the processes except the startup process allocate
> >the memory for uncompressedPages when fpw=on. Only the startup process
> >uses it for the WAL decompression
> I see. fpw != on check can be put at the time of memory allocation of
> uncompressedPages in the backend code . And at the time of recovery
> uncompressedPages can be allocated separately if not already allocated.
>
> >BTW, what happens if the memory allocation for uncompressedPages for
> >the recovery fails?
> The current code does not handle this. This will be rectified.
>
> >Which would prevent the recovery at all, so PANIC should
> >happen in that case?
> IIUC, instead of reporting  PANIC , palloc can be used to allocate memory
> for uncompressedPages at the time of recovery which will throw ERROR and
> abort startup process in case of failure.
>
>
> Thank you,
> Rahila Syed
>
>
>
> --
> View this message in context:
> http://postgresql.1045698.n5.nabble.com/Compression-of-full-page-writes-tp5769039p5824613.html
> Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


compress_fpw_v3.patch
Description: Binary data



Re: [HACKERS] TAP test breakage on MacOS X

2014-11-03 Thread Noah Misch
On Mon, Nov 03, 2014 at 04:40:39PM -0500, Tom Lane wrote:
> Peter Eisentraut  writes:
> > I thank you for this research, but I suggest that we ship 9.4 as is,
> > that is with requiring IPC::Run and --enable-* option.  All the possible
> > alternatives will clearly need more rounds of portability testing.  We
> > can then evaluate these changes for 9.5 in peace.
> 
> I concur --- it's time to get 9.4 out, and the current state of affairs
> is Good Enough imo.

Agreed; any release-blocking aspects of this test suite are behind us.




Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-03 Thread Tom Lane
Michael Paquier  writes:
> There is as well another way: finally support WAL-logging for hash indexes.

Sure, but that's a bit easier said than done.  I think Robert looked into
that a year or two back and found some stumbling blocks that it wasn't
obvious how to surmount.  I do hope it will happen eventually, but we
should assume that it's a long-term answer not a short-term one.

regards, tom lane




Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-03 Thread Tom Lane
Josh Berkus  writes:
> On 11/02/2014 11:41 AM, Tom Lane wrote:
>> Nothing that I recall at the moment, but there is certainly plenty of
>> stuff of dubious quality in there.  I'd argue that chkpass, intagg,
>> intarray, isn, spi, and xml2 are all in worse shape than the money type.

> Why are we holding on to xml2 again?

IIRC, there's some xpath-related functionality in there that's not
yet available in core (and needs some redesign before it'd ever get
accepted into core, so there's not a real quick fix to be had).

regards, tom lane




Re: [HACKERS] [BUGS] BUG #11867: Strange behaviour with composite types after resetting database tablespace

2014-11-03 Thread Tom Lane
m...@bloodnok.com writes:
> I have a script (below) that sets and resets the tablespace for a database
> and drops and recreates a composite type.  The script fails when trying to
> re-create the previously dropped composite type because the type has
> apparently been magically resuscitated by changing the tablespace for the
> database.  I assume this is a bug.  

Fascinating.

I believe what is happening is:

* You create the composite type.  This results in entries in the
database's pg_class and pg_type tables.  These entries are in blocks
of those tables that are in shared buffers, with hashtag RelFileNodes
that mention the database's original tablespace (tbs3).

* You move the database to another tablespace.  We faithfully flush
the dirty catalog blocks to disk and then copy them somewhere else
on disk.

* You drop the composite type.  This entails fetching in relevant
blocks of the database's pg_class and pg_type tables from their
new tablespace and marking the catalog rows deleted.  This happens in
shared buffers that have hashtags mentioning the database's new
tablespace (pg_default, in this example).

* You move the database back to its original tablespace.  We faithfully
flush the dirty catalog blocks to disk and then copy them back to the
original on-disk location.

However: at no point in this sequence did we flush the original catalog
blocks out of shared buffers.  Therefore, a read of the database's
pg_class or pg_type at this point is likely to find the previous states of
those pages (pre-composite-type-drop) as valid, matching entries, so that
we skip reading the up-to-date page contents back from disk.

A quick fix for this would be to issue DropDatabaseBuffers during
movedb(), though I'm not sure offhand where in that sequence is the
safest place for it.  We could be more restrictive and only drop
buffers that belong to the source tablespace, but I'm not sure it's
worth any extra code to do so.

> This is not mission-critical for me but I'd be grateful for suggestions for
> work-arounds.

I don't see any workaround that's much easier than fixing the bug.
But what's your use case that involves flipping databases from one
tablespace to another and then back again within the life expectancy of
unused shared-buffers pages?  It doesn't seem terribly surprising that
nobody noticed this before ...

regards, tom lane




Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index

2014-11-03 Thread Etsuro Fujita

(2014/10/30 21:30), Fujii Masao wrote:

On Thu, Oct 30, 2014 at 7:30 PM, Etsuro Fujita
 wrote:



Here are my review comments.

* The patch applies cleanly and make and make check run successfully.  I
think that the patch is mostly good.


Thanks! Attached is the updated version of the patch.


Thank you for updating the patch!


* In src/backend/utils/misc/guc.c:
+   {
+   {"pending_list_cleanup_size", PGC_USERSET,
CLIENT_CONN_STATEMENT,
+   gettext_noop("Sets the maximum size of the pending
list for GIN index."),
+NULL,
+   GUC_UNIT_KB
+   },
+   &pending_list_cleanup_size,
+   4096, 0, MAX_KILOBYTES,
+   NULL, NULL, NULL
+   },

ISTM it'd be better to use RESOURCES_MEM, not CLIENT_CONN_STATEMENT. No?


Yes if the pending list always exists in the memory. But not, IIUC. Thought?


Exactly.  But I think we can expect that in many cases, since users will
often set the GUC to a value small enough that most of the pending-list
pages are cached in shared buffers, in order to maintain *search*
performance.


I'd like to hear the opinions of others about the category for the GUC.


Also why not set min to 64, not to 0, in accordance with that of work_mem?


I'm OK to use 64. But I just chose 0 because I could not think of any reasonable
reason why 64k is suitable as the minimum size of the pending list.
IOW, I have no idea about whether it's reasonable to use the min value of
work_mem as the min size of the pending list.


IIUC, I think that min = 0 disables fast update, so ISTM that it'd be 
appropriate to set min to some positive value.  And ISTM that the idea 
of using the min value of work_mem is not so bad.



* In doc/src/sgml/ref/create_index.sgml:
+ PENDING_LIST_CLEANUP_SIZE

IMHO, it seems to me better for this variable to be in lowercase in
accordance with the GUC version.


Using lowercase only for pending_list_cleanup_size and uppercase for
other options
looks strange to me. What about using lowercase for all the storage options?


+1


I changed the document in that way.


*** 356,361  CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ class="parameter">name
--- 356,372 
  
 
 
+
+
+ PENDING_LIST_CLEANUP_SIZE

The above is still in uppercase.

Thanks,

Best regards,
Etsuro Fujita




Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-03 Thread Michael Paquier
On Sun, Nov 2, 2014 at 2:30 AM, Tom Lane  wrote:

> In the case of hash indexes, because we still have to have the hash
> opclasses in core, there's no way that it could be pushed out as an
> extension module even if we otherwise had full support for AMs as
> extensions.  So what I hear you proposing is "let's break this so
> thoroughly that it *can't* be fixed".  I'm not on board with that.
> I think the WARNING will do just fine to discourage novices who are
> not familiar with the state of the hash AM.  In the meantime, we
> could push forward with the idea of making hash indexes automatically
> unlogged, so that recovering from a crash wouldn't be quite so messy/
> dangerous.
>
There is as well another way: finally support WAL-logging for hash indexes.
-- 
Michael


Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-03 Thread Josh Berkus
On 11/02/2014 11:41 AM, Tom Lane wrote:
> Nothing that I recall at the moment, but there is certainly plenty of
> stuff of dubious quality in there.  I'd argue that chkpass, intagg,
> intarray, isn, spi, and xml2 are all in worse shape than the money type.

Why are we holding on to xml2 again?

FWIW, I'd be fine with moving ISN, intarray and intagg to PGXN.  In
fact, I think it would be better for them to be there.  And they're not
core types, so there shouldn't be an issue with that.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] [PG-XC] how to pass a value(String) from coordinator node to all data nodes

2014-11-03 Thread Michael Paquier
On Tue, Nov 4, 2014 at 10:40 AM, Demai Ni  wrote:

> hi, hackers,
>
> I am looking for a simple way to pass a value(ideally a hashtable, and use
> a string for easy implementation for now), that is generated by coordinator
> node, and pass it to all data nodes. Plans may be one approach, I am just
> wondering whether there is a simpler hook?
>
> Many thanks for your input.
>
This is a message for the Postgres-XC mailing lists. Feel free to subscribe
and send messages here:
https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers
-- 
Michael


Re: [HACKERS] Pipelining executions to postgresql server

2014-11-03 Thread Craig Ringer
On 11/04/2014 09:10 AM, Tom Lane wrote:
> Mikko Tiihonen  writes:
>> I do not quite grasp why not sending Sync is so important. My proof of 
>> concept setup was for queries with autocommit enabled.
> 
> [snip] It'll be very much like
> sending a fixed (predetermined) SQL script to the server using a scripting
> language that lacks any conditionals.

... which is part of why I think the batch interface for JDBC is the
appropriate UI, and the existing support in PgJDBC just needs to be
extended to support returning result sets.

The JDBC driver supports multiple result sets, and a batch execution
looks just like a procedure that returned multiple result sets (or
rather, a mix of result sets, affected rowcounts, and no-info success
messages).

See

http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#addBatch(java.lang.String)

http://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#executeBatch()



The docs are admittedly mostly oriented toward DML, but nothing stops us
returning SUCCESS_NO_INFO and a resultset. Or if we choose, returning a
rowcount and resultset.

It'd be worth testing what other drivers do before doing that, but
nothing in the spec:

https://jcp.org/aboutJava/communityprocess/mrel/jsr221/index.html

seems to stop us doing it.

The batch interface doesn't offer any way to set scrollable/editable
resultset options, but we don't want to allow that for batches anyway.
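To illustrate how a client might walk such a mixed stream if PgJDBC did
return result sets from batches: this is a hypothetical sketch built only
on the standard Statement API (the table name is made up), not something
the driver supports today.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class MixedBatchResults {
    // Hypothetical: assumes a future PgJDBC where executeBatch() can leave
    // result sets behind, walked with the standard multi-result API.
    static void consume(Connection conn) throws Exception {
        try (Statement st = conn.createStatement()) {
            st.addBatch("UPDATE demo SET n = n + 1");
            st.addBatch("SELECT id, n FROM demo ORDER BY id");
            int[] counts = st.executeBatch();   // rowcounts and/or SUCCESS_NO_INFO
            System.out.println(counts.length + " batch entries reported");

            // Walk the mixed stream: each step is either a ResultSet,
            // an update count, or the end of all results.
            boolean isResultSet = st.getMoreResults();
            while (true) {
                if (isResultSet) {
                    try (ResultSet rs = st.getResultSet()) {
                        while (rs.next()) {
                            System.out.println(rs.getInt(1) + " " + rs.getInt(2));
                        }
                    }
                } else if (st.getUpdateCount() == -1) {
                    break;  // no more results of any kind
                }
                isResultSet = st.getMoreResults();
            }
        }
    }
}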


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-03 Thread David Fetter
On Tue, Nov 04, 2014 at 07:51:06AM +0900, Tatsuo Ishii wrote:
> > The performance of our numeric vs Oracle's was a common complaint when
> > I was at EnterpriseDB (in 2007).
> > 
> > Perhaps numeric's performance could be greatly improved in cases where
> > the precision is low enough to map to an int/bigint. That would get us
> > closer to eliminating money as well as give other uses a big win. Of
> > course, how to do that and not break pg_upgrade is a huge question...
> 
> Just out of curiosity, why is Oracle's NUMBER (I assume you are
> talking about this) so fast?

I suspect that what happens is that NUMBER is stored as a native type
(int2, int4, int8, int16) that depends on its size and then cast to
the next upward thing as needed, taking any performance hits at that
point.  The documentation hints (38 decimal places) at a 128-bit
internal representation as the maximum.  I don't know what happens
when you get past what 128 bits can represent.
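To make the intuition concrete, here is a small sketch of the general
technique; it is emphatically not a description of Oracle's actual
implementation, just an illustration of keeping low-precision values in a
native machine integer and promoting to arbitrary precision only on
overflow:

import java.math.BigDecimal;

public class ScaledFixedPoint {
    // Illustration only: a 2-decimal-place currency value kept as a
    // scaled long (i.e. cents).
    // Fast path: a plain 64-bit add plus an overflow check; only when the
    // native type overflows do we pay for arbitrary-precision arithmetic.
    static Number addCents(long aCents, long bCents) {
        try {
            return Math.addExact(aCents, bCents);   // stays a long
        } catch (ArithmeticException overflow) {
            return BigDecimal.valueOf(aCents).add(BigDecimal.valueOf(bCents));
        }
    }

    public static void main(String[] args) {
        System.out.println(addCents(101, 250));          // 351 (cents, as a long)
        System.out.println(addCents(Long.MAX_VALUE, 1)); // promoted to BigDecimal
    }
}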

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




[HACKERS] [PG-XC] how to pass a value(String) from coordinator node to all data nodes

2014-11-03 Thread Demai Ni
hi, hackers,

I am looking for a simple way to pass a value (ideally a hashtable; a
string for now, for ease of implementation) that is generated by the
coordinator node and passed to all data nodes. Plans may be one approach;
I am just wondering whether there is a simpler hook?

Many thanks for your input.

Demai


Re: [HACKERS] pg_multixact not getting truncated

2014-11-03 Thread Josh Berkus
On 11/03/2014 05:24 PM, Josh Berkus wrote:
> BTW, the reason I started poking into this was a report from a user that
> they have a pg_multixact directory which is 21GB in size, and is 2X the
> size of the database.
> 
> Here's XID data:
> 
> Latest checkpoint's NextXID:  0/1126461940
> Latest checkpoint's NextOID:  371135838
> Latest checkpoint's NextMultiXactId:  162092874
> Latest checkpoint's NextMultiOffset:  778360290
> Latest checkpoint's oldestXID:945761490
> Latest checkpoint's oldestXID's DB:   370038709
> Latest checkpoint's oldestActiveXID:  1126461940
> Latest checkpoint's oldestMultiXid:   123452201
> Latest checkpoint's oldestMulti's DB: 370038709
> 
> Oldest mxid file is 29B2, newest is 3A13
> 
> No tables had a relminmxid of 1 (some of the system tables were 0,
> though), and the data from pg_class and pg_database is consistent.

More tidbits:

I just did a quick check on customer systems (5 of them).  This only
seems to be happening on customer systems where I specifically know
there is a high number of FK lock waits (the system above gets around
1000 per minute that we know of).  Other systems with higher transaction
rates don't exhibit this issue; I checked a 9.3.5 database which
normally needs to do XID wraparound once every 10 days, and its
pg_multixact is only 48K (it has one file, ).

Note that pg_clog on the bad machine is only 64K in size.

How many IDs are there per mxid file?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] pg_multixact not getting truncated

2014-11-03 Thread Josh Berkus
On 11/03/2014 05:06 PM, Alvaro Herrera wrote:
> Josh Berkus wrote:
>> Hackers,
>>
>> I'm looking at a couple of high-transaction-rate and high-FK-conflict
>> rate servers where pg_multixact has grown to be more than 1GB in size.
>> One such server doesn't appear to be having any notable issues with
>> vacuuming, and the oldest mxid on the system is about 47m old. VACUUM
>> FREEZEing the oldest databases did not cause the pg_multixact dir to get
>> smaller --- it may have even caused it to get larger.
>>
>> Why would pg_multixact not be truncating?  Does it never truncate files
>> with aborted multixacts in them?  Might we have another multixact bug?
> 
> Have a look at the MultiXactId values in pg_controldata, datminmxid in
> pg_database, and relminmxid in pg_class.  They should advance as you
> VACUUM FREEZE.  If it's stuck at 1, you might be in pg_upgrade trouble
> if you used 9.3.4 or earlier to upgrade.

They're advancing.  I also know for a fact that this system was not
pg_upgraded, because I set it up.  It's always been on 9.3.5.

So after some vacuum freezing and a checkpoint, the oldestMultiXid
advanced by 9 million, about 20% of the gap with current multiXid.
However, that did not delete 20% of the files; before the Freeze there
were 4414 files, and now there are 4195.  So advancing 9m mxids resulted
in only deleting ~~130 files.  So I think we definitely have an issue here.

BTW, the reason I started poking into this was a report from a user that
they have a pg_multixact directory which is 21GB in size, and is 2X the
size of the database.

Here's XID data:

Latest checkpoint's NextXID:  0/1126461940
Latest checkpoint's NextOID:  371135838
Latest checkpoint's NextMultiXactId:  162092874
Latest checkpoint's NextMultiOffset:  778360290
Latest checkpoint's oldestXID:945761490
Latest checkpoint's oldestXID's DB:   370038709
Latest checkpoint's oldestActiveXID:  1126461940
Latest checkpoint's oldestMultiXid:   123452201
Latest checkpoint's oldestMulti's DB: 370038709

Oldest mxid file is 29B2, newest is 3A13

No tables had a relminmxid of 1 (some of the system tables were 0,
though), and the data from pg_class and pg_database is consistent.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Pipelining executions to postgresql server

2014-11-03 Thread Tom Lane
Mikko Tiihonen  writes:
> I do not quite grasp why not sending Sync is so important. My proof of 
> concept setup was for queries with autocommit enabled.

The point is that that will be very, very much harder to use than doing
it the other way.  It's fairly easy to reason about the results of
single-transaction pipelined queries: they're all or nothing.  If you
fire off queries that are going to autocommit independently, then you
have to worry about all combinations of success/failure, and you won't
have the opportunity to adjust on the fly.  It'll be very much like
sending a fixed (predetermined) SQL script to the server using a scripting
language that lacks any conditionals.  People certainly do use fixed
scripts sometimes, but they usually find out they want them wrapped into
single transactions, because otherwise they're left with a horrible mess
if the script goes off the rails at all.

regards, tom lane




Re: [HACKERS] pg_multixact not getting truncated

2014-11-03 Thread Alvaro Herrera
Josh Berkus wrote:
> Hackers,
> 
> I'm looking at a couple of high-transaction-rate and high-FK-conflict
> rate servers where pg_multixact has grown to be more than 1GB in size.
> One such server doesn't appear to be having any notable issues with
> vacuuming, and the oldest mxid on the system is about 47m old. VACUUM
> FREEZEing the oldest databases did not cause the pg_multixact dir to get
> smaller --- it may have even caused it to get larger.
> 
> Why would pg_multixact not be truncating?  Does it never truncate files
> with aborted multixacts in them?  Might we have another multixact bug?

Have a look at the MultiXactId values in pg_controldata, datminmxid in
pg_database, and relminmxid in pg_class.  They should advance as you
VACUUM FREEZE.  If it's stuck at 1, you might be in pg_upgrade trouble
if you used 9.3.4 or earlier to upgrade.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-11-03 Thread Alvaro Herrera
Jeff Janes wrote:
> On Mon, Nov 3, 2014 at 2:18 PM, Alvaro Herrera 
> wrote:

> I get a couple compiler warnings with this:
> 
> brin.c: In function 'brininsert':
> brin.c:97: warning: 'tupcxt' may be used uninitialized in this function
> brin.c:98: warning: 'oldcxt' may be used uninitialized in this function

Ah, that's easily fixed.  My compiler (gcc 4.9 from Debian Jessie
nowadays) doesn't complain, but I can see that it's not entirely
trivial.

> Also, I think it is missing a cat version bump.  It let me start the
> patched server against an unpatched initdb run, but once started it didn't
> find the index method.

Sure, that's expected (by me at least).  I'm too lazy to maintain
catversion bumps in the patch before pushing, since that generates
constant conflicts as I rebase.

> What would it take to make CLUSTER work on a brin index?  Now I just added
> a btree index on the same column, clustered on that, then dropped that
> index.

Interesting question.  What's the most efficient way to pack a table to
minimize the intervals covered by each index entry?  One thing that
makes this project a bit easier, I think, is that CLUSTER has already
been generalized so that it supports either an indexscan or a
seqscan+sort.  If anyone wants to work on this, be my guest; I'm
certainly not going to add it to the initial commit.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-11-03 Thread Fabrízio de Royes Mello
On Mon, Nov 3, 2014 at 7:25 PM, Alexey Vasiliev  wrote:
>
>
>
>
Mon, 3 Nov 2014 19:18:51 -0200 from Fabrízio de Royes Mello <
fabriziome...@gmail.com>:
> >
> > >
> > > Should I change my patch and send it by email? And also as I
understand I should change message ID for
https://commitfest.postgresql.org/action/patch_view?id=1636 , isn't it?
> > >
> >
> > You should just send another version of your patch by email and add a
new "comment" to commit-fest using the "Patch" comment type.
>
>
> Added new patch.
>

And you should add the patch to an open commit-fest, not to one that is in
progress. The next open one is 2014-12 [1].


Regards,


[1] https://commitfest.postgresql.org/action/commitfest_view?id=25

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


[HACKERS] pg_multixact not getting truncated

2014-11-03 Thread Josh Berkus
Hackers,

I'm looking at a couple of high-transaction-rate and high-FK-conflict
rate servers where pg_multixact has grown to be more than 1GB in size.
One such server doesn't appear to be having any notable issues with
vacuuming, and the oldest mxid on the system is about 47m old. VACUUM
FREEZEing the oldest databases did not cause the pg_multixact dir to get
smaller --- it may have even caused it to get larger.

Why would pg_multixact not be truncating?  Does it never truncate files
with aborted multixacts in them?  Might we have another multixact bug?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-11-03 Thread Jeff Janes
On Mon, Nov 3, 2014 at 2:18 PM, Alvaro Herrera 
wrote:

> Robert Haas wrote:
> > [lots]
>
> I have fixed all these items in the attached, thanks -- most
> user-visible change was the pageinspect 1.3 thingy.  pg_upgrade from 1.2
> works fine now.  I also fixed some things Heikki noted, mainly avoid
> retail pfree where possible, and renumber the support procs to leave
> room for future expansion of the framework.  XLog replay code is updated
> too.
>
> Also, I made the summarization step callable directly from SQL without
> having to invoke VACUUM.
>
> So here's v21.  I also attach a partial diff from v20, just in case
> anyone wants to give it a look.
>

I get a couple compiler warnings with this:

brin.c: In function 'brininsert':
brin.c:97: warning: 'tupcxt' may be used uninitialized in this function
brin.c:98: warning: 'oldcxt' may be used uninitialized in this function

Also, I think it is missing a cat version bump.  It let me start the
patched server against an unpatched initdb run, but once started it didn't
find the index method.

What would it take to make CLUSTER work on a brin index?  Now I just added
a btree index on the same column, clustered on that, then dropped that
index.

Thanks,

Jeff


Re: [HACKERS] Pipelining executions to postgresql server

2014-11-03 Thread Mikko Tiihonen
> Craig Ringer wrote:
> On 11/02/2014 09:27 PM, Mikko Tiihonen wrote:
> > Is the following summary correct:
> > - the network protocol supports pipelinings
> Yes.
> 
> All you have to do is *not* send a Sync message and be aware that the
> server will discard all input until the next Sync, so pipelining +
> autocommit doesn't make a ton of sense for error handling reasons.

I do not quite grasp why not sending Sync is so important. My proof-of-concept 
setup was for queries with autocommit enabled.
When looking with Wireshark I could observe that the client sent 3-10 
P/B/D/E/S messages to the server before the server started sending the 
corresponding 1/2/T/D/C/Z replies for each request. Every 10 requests the test 
application waited for all the replies to arrive, so as not to overflow the 
network buffers (overflowing them is known to cause deadlocks with the current 
PgJDBC driver).

If I want separate error handling for each execution, then shouldn't I be 
sending a separate Sync for each pipelined operation?

> > - the server handles operations in order, starting the processing of next 
> > operation only after fully processing the previous one 
> >- thus pipelining is invisible to the server
> 
> As far as I know, yes. The server just doesn't care.
> 
> > - libpq driver does not support pipelining, but that is due to internal 
> > limitations
> 
> Yep.
> 
> > - if proper error handling is done by the client then there is no reason 
> > why pipelining could be supported by any pg client
> 
> Indeed, and most should support it. Sending batches of related queries
> would make things a LOT faster.
> 
> PgJDBC's batch support is currently write-oriented. There is no
> fundamental reason it can't be expanded for reads. I've already written
> a patch to do just that for the case of returning generated keys.
>
> https://github.com/ringerc/pgjdbc/tree/batch-returning-support
> 
> and just need to rebase it so I can send a pull for upstream PgJDBC.
> It's already linked in the issues documenting the limitatations in batch
>support.

Your code looked good. Returning inserts are an important thing to 
optimize.

> If you want to have more general support for batches that return rowsets
> there's no fundamental technical reason why it can't be added. It just
> requires some tedious refactoring of the driver to either:
> 
> - Sync and wait before it fills its *send* buffer, rather than trying
>   to manage its receive buffer (the server send buffer), so it can
>   reliably avoid deadlocks; or
> 
> - Do async I/O in a select()-like loop over a protocol state machine,
>   so it can simultaneously read and write on the wire.

I also think the async I/O is the way to go. Luckily that has already been done
in pgjdbc-ng (https://github.com/impossibl/pgjdbc-ng), built on top of the
Netty Java NIO library. It has quite good feature parity with the original
pgjdbc driver. Next I'll try whether I can enable pipelining with it, now that
I have tried the proof of concept with the original pgjdbc driver.

> I might need to do some of that myself soon, but it's a big (and
> therefore error-prone) job I've so far avoided by making smaller, more
> targeted changes.
> 
> For JDBC the JDBC batch interface is the right place to do this, and you
> should not IMO attempt to add pipelining outside that interface.
> (Multiple open resultsets from portals, yes, but not pipelining of queries).

I do not think the JDBC batch interface even allows doing updates to multiple
tables when using prepared statements?

--
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start

2014-11-03 Thread Álvaro Hernández Tortosa


On 03/11/14 22:19, Tom Lane wrote:

=?ISO-8859-1?Q?=C1lvaro_Hern=E1ndez_Tortosa?=  writes:

  IMHO, from a user perspective the transaction begins when the BEGIN
command is issued. If I really want to see a "frozen" view of the
database before any real SELECT, I have to issue another query like
"SELECT 1". This seems odd to me. I understand tx snapshot may be
deferred until real execution for performance reasons, but it is
confusing from a user perspective. Is this really expected, or is it a
bug? Am I having a bad day and missing some point here? ^_^

It's expected.  Without this behavior, you could not take out any locks
before freezing the transaction snapshot, which would be a bad thing.


Thank you for your comment, Tom. However, I think this behavior, as 
seen from a user perspective, is not the expected one. There may be 
some internal reasons for it, but the user executing the transaction 
naturally expects the freezing to happen right after the BEGIN, rather 
than *after* the first query.


If this is the intended behavior, I think it should be clearly 
documented as such, and a recommendation such as "issue a 'SELECT 1' 
right after BEGIN to freeze the data before running your own queries" 
should be added. Again, as I said in my email, the documentation 
clearly says that a transaction "only sees data committed before the 
transaction began", and this is clearly not the real behavior.




I think there are some examples in the "concurrency control" chapter
of the manual.


Sure, there are; that was the link I pointed out. But I found no 
explicit mention of the issue I'm raising here.



Regards,

Álvaro

--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] tracking commit timestamps

2014-11-03 Thread Merlin Moncure
On Mon, Nov 3, 2014 at 3:26 PM, Peter Eisentraut  wrote:
> On 11/1/14 8:04 AM, Petr Jelinek wrote:
>> On second thought, maybe those should be pg_get_transaction_committs,
>> pg_get_transaction_committs_data, etc.
>
> Please don't name anything "committs".  That looks like a misspelling of
> something.
>
> There is nothing wrong with
>
> pg_get_transaction_commit_timestamp()
>
> If you want to reduce the length, lose the "get".

+1: all non void returning functions 'get' something.

>> For me the commit time thing feels problematic in the way I perceive it
>> - I see commit time as a point in time, where I see commit timestamp (or
>> committs for short) as something that can recorded. So I would prefer to
>> stick with commit timestamp/committs.
>
> In PostgreSQL, it is pretty clearly established that time is hours,
> minutes, seconds, and timestamp is years, months, days, hours, minutes,
> seconds.  So unless this feature only records the hour, minute, and
> second of a commit, it should be "timestamp".

Elsewhere, for example, we have: "pg_last_xact_replay_timestamp()".
So, in keeping with that, maybe,

pg_xact_commit_timestamp(xid)

merlin




Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-03 Thread Kevin Grittner
Alvaro Herrera  wrote:

> There is a real advantage of money over numeric in the performance
> front.  I haven't measured it, but suffice to say that money uses
> integer operations which map almost directly to CPU instructions,
> whereas numeric needs to decode from our varlena base-1 digit
> format, operate on digits at a time, then encode back.  No matter how
> much you optimize numeric, it's never going to outperform stuff that
> runs practically on bare electrons.  As far as I recall, some TPCH
> queries run aggregations on currency columns.
>
> Now, whether this makes a measurable difference or not in TPCH terms, I
> have no idea.

Best of 3 tries on each statement...

A count(*) as a baseline:

test=# do $$ begin perform count(*) from generate_series(1,1000); end; $$;
DO
Time: 3260.920 ms

A sum of money:

test=# do $$ begin perform sum('1.01'::money) from 
generate_series(1,1000); end; $$;
DO
Time: 3572.709 ms

A sum of numeric:

test=# do $$ begin perform sum('1.01'::numeric) from 
generate_series(1,1000); end; $$;
DO
Time: 4805.559 ms

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-03 Thread Tatsuo Ishii
> The performance of our numeric vs Oracle's was a common complaint when
> I was at EnterpriseDB (in 2007).
> 
> Perhaps numeric's performance could be greatly improved in cases where
> the precision is low enough to map to an int/bigint. That would get us
> closer to eliminating money as well as give other uses a big win. Of
> course, how to do that and not break pg_upgrade is a huge question...

Just out of curiosity, why is Oracle's NUMBER (I assume you are
talking about this) so fast?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] tracking commit timestamps

2014-11-03 Thread Petr Jelinek

On 03/11/14 22:26, Peter Eisentraut wrote:

On 11/1/14 8:04 AM, Petr Jelinek wrote:

On second thought, maybe those should be pg_get_transaction_committs,
pg_get_transaction_committs_data, etc.


Please don't name anything "committs".  That looks like a misspelling of
something.

There is nothing wrong with

pg_get_transaction_commit_timestamp()

If you want to reduce the length, lose the "get".



I am fine with that, I only wonder if your definition of "anything" only 
concerns the SQL interfaces or also the internals.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TAP test breakage on MacOS X

2014-11-03 Thread Andrew Dunstan


On 11/03/2014 04:44 PM, Peter Eisentraut wrote:

On 10/29/14 8:42 AM, Robert Haas wrote:

I'm sympathetic to that line of reasoning, but I really think that if
you want to keep this infrastructure, it needs to be made portable.

Let me clarify that this was my intention.  I have looked at many test
frameworks, many of which are much nicer than what we have, but the
portability and dependency implications for this project would have been
between shocking and outrageous.  I settled for what I felt was the
absolute minimum: Perl + IPC::Run.  It was only later on that I learned
that 1) subtests don't work in Perl 5.10, and 2) subtests are broken in
Perl 5.12.  So we removed the use of subtests and now we are back at the
baseline I started with.

The irony in this whole story is that if we had thrown this onto the
build farm right away, we might have found and fixed these problems
within a week instead of five months later.





An email note to me would probably have done the trick. I've missed one 
or two things lately, since my mind has been to some extent elsewhere, 
and don't mind a little memory jog.



cheers

andrew




Re: [HACKERS] TAP test breakage on MacOS X

2014-11-03 Thread Peter Eisentraut
On 10/29/14 8:42 AM, Robert Haas wrote:
> I'm sympathetic to that line of reasoning, but I really think that if
> you want to keep this infrastructure, it needs to be made portable.

Let me clarify that this was my intention.  I have looked at many test
frameworks, many of which are much nicer than what we have, but the
portability and dependency implications for this project would have been
between shocking and outrageous.  I settled for what I felt was the
absolute minimum: Perl + IPC::Run.  It was only later on that I learned
that 1) subtests don't work in Perl 5.10, and 2) subtests are broken in
Perl 5.12.  So we removed the use of subtests and now we are back at the
baseline I started with.

The irony in this whole story is that if we had thrown this onto the
build farm right away, we might have found and fixed these problems
within a week instead of five months later.





Re: [HACKERS] TAP test breakage on MacOS X

2014-11-03 Thread Tom Lane
Peter Eisentraut  writes:
> I thank you for this research, but I suggest that we ship 9.4 as is,
> that is with requiring IPC::Run and --enable-* option.  All the possible
> alternatives will clearly need more rounds of portability testing.  We
> can then evaluate these changes for 9.5 in peace.

I concur --- it's time to get 9.4 out, and the current state of affairs
is Good Enough imo.

regards, tom lane




Re: [HACKERS] TAP test breakage on MacOS X

2014-11-03 Thread Peter Eisentraut
On 11/2/14 2:00 PM, Noah Misch wrote:
>> Ick; I concur with your judgment on those aspects of the IPC::Cmd design.
>> Thanks for investigating.  So, surviving options include:
>>
>> 1. Require IPC::Run.
>> 2. Write our own run() that reports the raw exit code.
>> 3. Distill the raw exit code from the IPC::Cmd::run() error string.
>> 4. Pass IPC::Run::run_forked() a subroutine that execs an argument list.
> 
> FWIW, (3) looks most promising to me.  That is to say, implement a reverse of
> IPC::Cmd::_pp_child_error().  Ugly to be sure, but the wart can be small and
> self-contained.

I thank you for this research, but I suggest that we ship 9.4 as is,
that is with requiring IPC::Run and --enable-* option.  All the possible
alternatives will clearly need more rounds of portability testing.  We
can then evaluate these changes for 9.5 in peace.





Re: [HACKERS] tracking commit timestamps

2014-11-03 Thread Peter Eisentraut
On 11/1/14 8:04 AM, Petr Jelinek wrote:
> On second thought, maybe those should be pg_get_transaction_committs,
> pg_get_transaction_committs_data, etc.

Please don't name anything "committs".  That looks like a misspelling of
something.

There is nothing wrong with

pg_get_transaction_commit_timestamp()

If you want to reduce the length, lose the "get".

> For me the commit time thing feels problematic in the way I perceive it
> - I see commit time as a point in time, where I see commit timestamp (or
> committs for short) as something that can recorded. So I would prefer to
> stick with commit timestamp/committs.

In PostgreSQL, it is pretty clearly established that time is hours,
minutes, seconds, and timestamp is years, months, days, hours, minutes,
seconds.  So unless this feature only records the hour, minute, and
second of a commit, it should be "timestamp".





[HACKERS] Re[3]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-11-03 Thread Alexey Vasiliev



Mon, 3 Nov 2014 19:18:51 -0200 from Fabrízio de Royes Mello 
:
> 
> >
> > Should I change my patch and send it by email? And also as I understand I 
> > should change message ID for  
> > https://commitfest.postgresql.org/action/patch_view?id=1636 , isn't it?
> >
> 
> You should just send another version of your patch by email and add a new 
> "comment" to commit-fest using the "Patch" comment type.


Added new patch.

-- 
Alexey Vasiliev
From a5d941717df71e4c16941b8a02f0dca40a1107a0 Mon Sep 17 00:00:00 2001
From: Alexey Vasiliev 
Date: Mon, 3 Nov 2014 00:21:14 +0200
Subject: [PATCH] add recovery_timeout to controll timeout between
 restore_command nonzero

---
 doc/src/sgml/recovery-config.sgml   | 24 
 src/backend/access/transam/recovery.conf.sample |  5 +
 src/backend/access/transam/xlog.c   | 20 ++--
 3 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 0f1ff34..98eb451 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -145,6 +145,30 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
   
  
 
+ 
+  restore_command_retry_interval (integer)
+  
+restore_command_retry_interval recovery parameter
+  
+  
+  
+   
+By default, if restore_command returns a nonzero status,
+the server retries the command after 5 seconds. This parameter
+allows that interval to be changed. It is optional, and can be
+used to increase or decrease the frequency of restore_command calls.
+   
+   
+This is useful when WAL segments are restored from external
+storage (such as AWS S3), where the standby is expected to lag
+behind the master anyway and every request costs money: with N
+standby nodes, polling for the next WAL segment every 30 seconds
+is considerably cheaper than polling every 5 seconds, at the cost
+of some extra replication delay.
+   
+  
+ 
+
 
 
   
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index 7657df3..4eee8f4 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -58,6 +58,11 @@
 #
 #recovery_end_command = ''
 #
+# specifies an optional retry interval used after restore_command returns a
+# nonzero exit code; it can increase or decrease the frequency of retries.
+#
+#restore_command_retry_interval = 5s
+#
 #---
 # RECOVERY TARGET PARAMETERS
 #---
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3c9aeae..c26101e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -233,6 +233,7 @@ static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
 static int	recovery_min_apply_delay = 0;
 static TimestampTz recoveryDelayUntilTime;
+static int 	restore_command_retry_interval = 0;
 
 /* options taken from recovery.conf for XLOG streaming */
 static bool StandbyModeRequested = false;
@@ -5245,6 +5246,20 @@ readRecoveryCommandFile(void)
 	(errmsg_internal("trigger_file = '%s'",
 	 TriggerFile)));
 		}
+		else if (strcmp(item->name, "restore_command_retry_interval") == 0)
+		{
+			const char *hintmsg;
+
+			if (!parse_int(item->value, &restore_command_retry_interval, GUC_UNIT_MS,
+		   &hintmsg))
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("parameter \"%s\" requires a temporal value",
+"restore_command_retry_interval"),
+		 hintmsg ? errhint("%s", _(hintmsg)) : 0));
+			ereport(DEBUG2,
+	(errmsg_internal("restore_command_retry_interval = '%s'", item->value)));
+		}
 		else if (strcmp(item->name, "recovery_min_apply_delay") == 0)
 		{
 			const char *hintmsg;
@@ -11104,13 +9,14 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	}
 
 	/*
-	 * Wait for more WAL to arrive. Time out after 5 seconds,
+	 * Wait for more WAL to arrive. Time out after
+	 * restore_command_retry_interval (5 seconds by default),
 	 * like when polling the archive, to react to a trigger
 	 * file promptly.
 	 */
 	WaitLatch(&XLogCtl->recoveryWakeupLatch,
 			  WL_LATCH_SET | WL_TIMEOUT,
-			  5000L);
+			  restore_command_retry_interval > 0 ? restore_command_retry_interval : 5000L);
 	ResetLatch(&XLogCtl->recoveryWakeupLatch);
 	break;
 }
-- 
1.9.3 (Apple Git-50)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start

2014-11-03 Thread Tom Lane
Álvaro Hernández Tortosa writes:
>  IMHO, from a user perspective the transaction begins when the BEGIN 
> command is issued. If I really want to see a "frozen" view of the 
> database before any real SELECT, I have to issue another query like 
> "SELECT 1". This seems odd to me. I understand tx snapshot may be 
> deferred until real execution for performance reasons, but it is 
> confusing from a user perspective. Is this really expected, or is it a 
> bug? Am I having a bad day and missing some point here? ^_^

It's expected.  Without this behavior, you could not take out any locks
before freezing the transaction snapshot, which would be a bad thing.
I think there are some examples in the "concurrency control" chapter
of the manual.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-11-03 Thread Fabrízio de Royes Mello
>
> Should I change my patch and send it by email? And also as I understand I
should change message ID for
https://commitfest.postgresql.org/action/patch_view?id=1636, isn't it?
>

You should just send another version of your patch by email and add a new
"comment" to commit-fest using the "Patch" comment type.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] how to handle missing "prove"

2014-11-03 Thread Tom Lane
Peter Eisentraut  writes:
> On 11/3/14 3:11 PM, Tom Lane wrote:
>> If there's a commit that goes with this statement, you neglected to push 
>> it...

> Are you not seeing this in configure.in:

>   AC_CHECK_PROGS(PROVE, prove)
>   if test -z "$PROVE"; then
> AC_MSG_ERROR([prove not found])

Oh, duh.  I read your message to mean that you'd just pushed a change
for that, not that it already did it.  Nevermind ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] how to handle missing "prove"

2014-11-03 Thread Peter Eisentraut
On 11/3/14 3:11 PM, Tom Lane wrote:
> Peter Eisentraut  writes:
>> On 11/2/14 11:36 AM, Tom Lane wrote:
>>> Committed patch looks good, but should we also add the stanza we discussed
>>> in Makefile.global.in concerning defining $(prove) in terms of "missing"
>>> if we didn't find it?  I think the behavior of HEAD when you ask for
>>> --enable-tap-tests but don't have "prove" might be less than ideal.
> 
>> configure will now fail when "prove" is not found.
> 
> If there's a commit that goes with this statement, you neglected to push it...

Are you not seeing this in configure.in:

#
# Check for test tools
#
if test "$enable_tap_tests" = yes; then
  AC_CHECK_PROGS(PROVE, prove)
  if test -z "$PROVE"; then
AC_MSG_ERROR([prove not found])
  fi
  if test -z "$PERL"; then
AC_MSG_ERROR([Perl not found])
  fi
fi



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-11-03 Thread Tom Lane
Abhijit Menon-Sen  writes:
> Earlier, I was using a combination of check and assign hooks to convert
> names to OIDs, but (as Andres pointed out) that would have problems with
> cache invalidations. I was even playing with caching membership lookups,
> but I ripped out all that code.

> In the attached patch, role_is_audited does all the hard work to split
> up the list of roles, look up the corresponding OIDs, and check if the
> user is a member of any of those roles. It works fine, but it doesn't
> seem desirable to repeat all that work for every statement.

> So does anyone have suggestions about how to make this faster?

Have you read the code in acl.c that caches lookup results for
role-is-member-of checks?  Sounds pretty closely related.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-11-03 Thread Abhijit Menon-Sen
Hi.

I could actually use some comments on the approach. I've attached
a prototype I've been working on (which is a cut down version of
my earlier code; but it's not terribly interesting and you don't
need to read it to comment on my questions below). The attached
patch does the following:

1. Adds a pgaudit.roles = 'role1, role2' GUC setting.

2. Adds a role_is_audited() function that returns true if the given
   role OID is mentioned in (or inherits from a role mentioned in)
   pgaudit.roles.

3. Adds a call to role_is_audited from log_audit_event with the current
   user id (GetSessionUserId in the patch, though it may be better to
   use GetUserId; but that's a minor detail).

Earlier, I was using a combination of check and assign hooks to convert
names to OIDs, but (as Andres pointed out) that would have problems with
cache invalidations. I was even playing with caching membership lookups,
but I ripped out all that code.

In the attached patch, role_is_audited does all the hard work to split
up the list of roles, look up the corresponding OIDs, and check if the
user is a member of any of those roles. It works fine, but it doesn't
seem desirable to repeat all that work for every statement.

So does anyone have suggestions about how to make this faster?

-- Abhijit
diff --git a/pgaudit.c b/pgaudit.c
index 30f729a..5ba9886 100644
--- a/pgaudit.c
+++ b/pgaudit.c
@@ -58,6 +58,13 @@ PG_FUNCTION_INFO_V1(pgaudit_func_ddl_command_end);
 PG_FUNCTION_INFO_V1(pgaudit_func_sql_drop);
 
 /*
+ * pgaudit_roles_str is the string value of the pgaudit.roles
+ * configuration variable, which is a list of role names.
+ */
+
+char *pgaudit_roles_str = NULL;
+
+/*
  * pgaudit_log_str is the string value of the pgaudit.log configuration
  * variable, e.g. "read, write, user". Each token corresponds to a flag
  * in enum LogClass below. We convert the list of tokens into a bitmap
@@ -117,6 +124,40 @@ typedef struct {
 } AuditEvent;
 
 /*
+ * Takes a role OID and returns true or false depending on whether
+ * auditing is enabled for that role according to the pgaudit.roles
+ * setting.
+ */
+
+static bool
+role_is_audited(Oid roleid)
+{
+	List *roles;
+	ListCell *lt;
+
+	if (!SplitIdentifierString(pgaudit_roles_str, ',', &roles))
+		return false;
+
+	foreach(lt, roles)
+	{
+		char *name = (char *)lfirst(lt);
+		HeapTuple roleTup;
+
+		roleTup = SearchSysCache1(AUTHNAME, PointerGetDatum(name));
+		if (HeapTupleIsValid(roleTup))
+		{
+			Oid parentrole = HeapTupleGetOid(roleTup);
+
+			ReleaseSysCache(roleTup);
+			if (is_member_of_role(roleid, parentrole))
+return true;
+		}
+	}
+
+	return false;
+}
+
+/*
  * Takes an AuditEvent and returns true or false depending on whether
  * the event should be logged according to the pgaudit.log setting. If
  * it returns true, it also fills in the name of the LogClass which it
@@ -289,18 +330,24 @@ should_be_logged(AuditEvent *e, const char **classname)
 static void
 log_audit_event(AuditEvent *e)
 {
+	Oid userid;
 	const char *timestamp;
 	const char *database;
 	const char *username;
 	const char *eusername;
 	const char *classname;
 
+	userid = GetSessionUserId();
+
+	if (!role_is_audited(userid))
+		return;
+
 	if (!should_be_logged(e, &classname))
 		return;
 
 	timestamp = timestamptz_to_str(GetCurrentTimestamp());
 	database = get_database_name(MyDatabaseId);
-	username = GetUserNameFromId(GetSessionUserId());
+	username = GetUserNameFromId(userid);
 	eusername = GetUserNameFromId(GetUserId());
 
 	/*
@@ -328,7 +375,7 @@ log_executor_check_perms(List *rangeTabls, bool abort_on_violation)
 {
 	ListCell *lr;
 
-	foreach (lr, rangeTabls)
+	foreach(lr, rangeTabls)
 	{
 		Relation rel;
 		AuditEvent e;
@@ -1127,6 +1174,32 @@ pgaudit_object_access_hook(ObjectAccessType access,
 }
 
 /*
+ * Take a pgaudit.roles value such as "role1, role2" and verify that
+ * the string consists of comma-separated tokens.
+ */
+
+static bool
+check_pgaudit_roles(char **newval, void **extra, GucSource source)
+{
+	List *roles;
+	char *rawval;
+
+	/* Make sure we have a comma-separated list of tokens. */
+
+	rawval = pstrdup(*newval);
+	if (!SplitIdentifierString(rawval, ',', &roles))
+	{
+		GUC_check_errdetail("List syntax is invalid");
+		list_free(roles);
+		pfree(rawval);
+		return false;
+	}
+	pfree(rawval);
+
+	return true;
+}
+
+/*
  * Take a pgaudit.log value such as "read, write, user", verify that
  * each of the comma-separated tokens corresponds to a LogClass value,
  * and convert them into a bitmap that log_audit_event can check.
@@ -1232,6 +1305,23 @@ _PG_init(void)
  errmsg("pgaudit must be loaded via shared_preload_libraries")));
 
 	/*
+	 * pgaudit.roles = "role1, role2"
+	 *
+	 * This variable defines a list of roles for which auditing is
+	 * enabled.
+	 */
+
+	DefineCustomStringVariable("pgaudit.roles",
+			   "Enable auditing for certain roles",
+			   NULL,
+			   &pgaudit_roles_str,
+			   "",
+			   PGC_SUSET,
+			   GUC_LIST_INPUT | GUC_NOT_IN_SAMPL

Re: [HACKERS] Maximum number of WAL files in the pg_xlog directory

2014-11-03 Thread Jeff Janes
On Wed, Oct 15, 2014 at 1:11 PM, Jeff Janes  wrote:

> On Fri, Aug 8, 2014 at 12:08 AM, Guillaume Lelarge wrote:
>
>> Hi,
>>
>> As part of our monitoring work for our customers, we stumbled upon an
>> issue with our customers' servers who have a wal_keep_segments setting
>> higher than 0.
>>
>> We have a monitoring script that checks the number of WAL files in the
>> pg_xlog directory, according to the setting of three parameters
>> (checkpoint_completion_target, checkpoint_segments, and wal_keep_segments).
>> We usually add a percentage to the usual formula:
>>
>> greatest(
>>   (2 + checkpoint_completion_target) * checkpoint_segments + 1,
>>   checkpoint_segments + wal_keep_segments + 1
>> )
>>
>
> I think the first bug is even having this formula in the documentation to
> start with, and in trying to use it.
>
> "and will normally not be more than..."
>
> This may be "normal" for a toy system.  I think that the normal state for
> any system worth monitoring is that it has had load spikes at some point in
> the past.
>
> So it is the next part of the doc, which describes how many segments it
> climbs back down to upon recovering from a spike, which is the important
> one.  And that doesn't mention wal_keep_segments at all, which surely
> cannot be correct.
>
> I will try to independently derive the correct formula from the code, as
> you did, without looking too much at your derivation  first, and see if we
> get the same answer.
>

It looked to me that the formula, when descending from a previously
stressed state, would be:

greatest((1 + checkpoint_completion_target) * checkpoint_segments,
         wal_keep_segments) + 1 +
2 * checkpoint_segments + 1

This assumes logs are filled evenly over a checkpoint cycle, which is
probably not true because there is a spike in full page writes right after
a checkpoint starts.

But I didn't have a great deal of confidence in my analysis.

The first line reflects the number of WAL that will be retained as-is, the
second is the number that will be recycled for future use before starting
to delete them.

My reading of the code is that wal_keep_segments is computed from the
current end of WAL (i.e the checkpoint record), not from the checkpoint
redo point.  If I distribute the part outside the 'greatest' into both
branches of the 'greatest', I don't get the same answer as you do for
either branch.

Then I started wondering if the number we keep for recycling is a good
choice, anyway.  2 * checkpoint_segments + 1 seems pretty large.  But then
again, given that we've reached the high-water-mark once, how unlikely are
we to reach it again?

Cheers,

Jeff


Re: [HACKERS] tracking commit timestamps

2014-11-03 Thread Alvaro Herrera
Jim Nasby wrote:
> On 11/1/14, 8:41 AM, Petr Jelinek wrote:
> >Well this is not BDR specific thing, the idea is that with logical 
> >replication, commit timestamp is not enough for conflict handling, you also 
> >need to have additional info in order to identify some types of conflicts 
> >conflicts (local update vs remote update for example). So the extradata 
> >field was meant as something that could be used to add the additional info 
> >to the xid.
> 
> Related to this... is there any way to deal with 2 transactions that commit 
> in the same microsecond? It seems silly to try and handle that for every 
> commit since it should be quite rare, but perhaps we could store the LSN as 
> extradata if we detect a conflict?

Well, two things.  One, LSN is 8 bytes and extradata (at least in this
patch when I last saw it) is only 4 bytes.  But secondly and more
important is that detecting a conflict is done in node B *after* node A
has recorded the transaction's commit time; there is no way to know at
commit time that there is going to be a conflict caused by that
transaction in the future.  (If there was a way to tell, you could just
as well not commit the transaction in the first place.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-11-03 Thread Alexey Vasiliev
Mon, 03 Nov 2014 14:36:33 -0500 from Peter Eisentraut:
> On 11/3/14 6:04 AM, Alexey Vasiliev wrote:
> >  3. What the patch does in a short paragraph: This patch should add
> > option recovery_timeout, which help to control timeout of
> > restore_command nonzero status code. Right now default value is 5
> > seconds. This is useful, if I using for restore of wal logs some
> > external storage (like AWS S3) and no matter what the slave database
> > will lag behind the master. The problem, what for each request to
> > AWS S3 need to pay, what is why for N nodes, which try to get next
> > wal log each 5 seconds will be bigger price, than for example each
> > 30 seconds.
> 
> That seems useful.  I would include something about this use case in the
> documentation.

Ok, I will add this in patch.

> 
> > This is my first patch. I am not sure about name of option. Maybe it
> > should called "recovery_nonzero_timeout".
> 
> The option name had me confused.  At first I though this is the time
> after which a running restore_command invocation gets killed.  I think a
> more precise description might be restore_command_retry_interval.

"restore_command_retry_interval" - I like this name!

Should I change my patch and send it by email? And also as I understand I 
should change message ID for 
https://commitfest.postgresql.org/action/patch_view?id=1636, isn't it?

Thanks

-- 
Alexey Vasiliev
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-03 Thread Jim Nasby

On 11/3/14, 1:34 PM, Alvaro Herrera wrote:

David Fetter wrote:

On Mon, Nov 03, 2014 at 01:54:09PM +0100, Michael Banck wrote:

As an additional datapoint, Vitesse Data changed the DB schema from
NUMERIC to MONEY for their TPCH benchmark for performance reasons: "The
modification to data types is easy to understand -- money and double
types are faster than Numeric (and no one on this planet has a bank
account that overflows the money type, not any time soon)."[1] And
"Replaced NUMERIC fields representing currency with MONEY"[2].

Not sure whether they modified/optimized PostgreSQL with respect to the
MONEY data type and/or how much performance that gained, so CCing CK Tan
as well.


How does our NUMERIC type's performance compare to other systems'
precise types?  I realize that some of those systems might have
restrictions on publishing numbers they don't authorize, but they
might have pushed some authorized numbers if those numbers make them
look good.


There is a real advantage of money over numeric in the performance
front.  I haven't measured it, but suffice to say that money uses
integer operations which map almost directly to CPU instructions,
whereas numeric needs to decode from our varlena base-10000 digit
format, operate on digits at a time, then encode back.  No matter how
much you optimize numeric, it's never going to outperform stuff that
runs practically on bare electrons.  As far as I recall, some TPCH
queries run aggregations on currency columns.

Now, whether this makes a measurable difference or not in TPCH terms, I
have no idea.


The performance of our numeric vs Oracle's was a common complaint when I was at 
EnterpriseDB (in 2007).

Perhaps numeric's performance could be greatly improved in cases where the 
precision is low enough to map to an int/bigint. That would get us closer to 
eliminating money as well as give other uses a big win. Of course, how to do 
that and not break pg_upgrade is a huge question...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] how to handle missing "prove"

2014-11-03 Thread Tom Lane
Peter Eisentraut  writes:
> On 11/2/14 11:36 AM, Tom Lane wrote:
>> Committed patch looks good, but should we also add the stanza we discussed
>> in Makefile.global.in concerning defining $(prove) in terms of "missing"
>> if we didn't find it?  I think the behavior of HEAD when you ask for
>> --enable-tap-tests but don't have "prove" might be less than ideal.

> configure will now fail when "prove" is not found.

If there's a commit that goes with this statement, you neglected to push it...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Missing FIN_CRC32 calls in logical replication code

2014-11-03 Thread Heikki Linnakangas

On 10/31/2014 03:46 PM, Andres Freund wrote:

On 2014-10-27 09:30:33 -0400, Tom Lane wrote:

Andres Freund  writes:

On 2014-10-27 12:51:44 +0200, Heikki Linnakangas wrote:

replication/slot.c and replication/logical/snapbuild.c use a CRC on the
physical slot and snapshot files. It uses the same algorithm as used e.g.
for the WAL. However, they are not doing the finalization step, FIN_CRC32()
on the calculated checksums. Not that it matters much, but it's a bit weird
and inconsistent, and was probably an oversight.



Hm. Good catch - that's stupid. I wonder what to do about it. I'm
tempted to just add a comment about it to 9.4 and fix it on master as
changing it is essentially a catversion bump. Any objections to that?


Yeah, I think you should get it right the first time.  It hardly seems
likely that any 9.4 beta testers are depending on those files to be stable
yet.


Since both state files have the version embedded it'd be trivial to just
do the FIN_CRC32() when loading a version 2 file. Does anybody object to
the relevant two lines of code + docs?


No objection, if you feel the backwards-compatibility with beta3 is 
worth it.


Looking at slot.c again, a comment in ReplicationSlotOnDisk contradicts 
the code:



/*
 * Replication slot on-disk data structure.
 */
typedef struct ReplicationSlotOnDisk
{
/* first part of this struct needs to be version independent */

/* data not covered by checksum */
uint32  magic;
pg_crc32checksum;

/* data covered by checksum */
uint32  version;
uint32  length;

ReplicationSlotPersistentData slotdata;
} ReplicationSlotOnDisk;

/* size of the part of the slot that is version independent */
#define ReplicationSlotOnDiskConstantSize \
offsetof(ReplicationSlotOnDisk, slotdata)
/* size of the slots that is not version indepenent */
#define ReplicationSlotOnDiskDynamicSize \
sizeof(ReplicationSlotOnDisk) - ReplicationSlotOnDiskConstantSize


The code that calculates the checksum skips over the constant size, i.e. 
up to ReplicationSlotOnDiskConstantSize, or slotdata. That means that the
version and length are in fact not covered by the checksum, contrary to 
the comment.


Looking at the similar code in snapbuild.c, I think the code was 
supposed to do what the comment says, and include 'version' and 'length' 
in the checksum. It would be nice to fix that too in slot.c, to be 
consistent with snapbuild.c.


PS. I find the name "ReplicationSlotOnDiskDynamicSize" confusing, as it 
is in fact a fixed size struct. I gather it's expected that the size of 
that part might change across versions, but by that definition nothing 
is constant.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-03 Thread Jeff Janes
On Sat, Nov 1, 2014 at 10:26 AM, Andres Freund 
wrote:

> On 2014-11-01 10:18:03 -0700, Josh Berkus wrote:
> > On 10/31/2014 03:07 PM, Tom Lane wrote:
> > > I don't care one way or the other about the money type, but I will
> defend
> > > hash indexes, especially seeing that we've already added a pretty
> > > in-your-face warning as of 9.5:
> > >
> > > regression=# create table foo(f1 int);
> > > CREATE TABLE
> > > regression=# create index on foo using hash (f1);
> > > WARNING:  hash indexes are not WAL-logged and their use is discouraged
> > > CREATE INDEX
> >
> > Yes, and I'm arguing that is the wrong decision.  If hash indexes are
> > "discouraged", then they shouldn't be in core in the first place.
>
> Last time we discussed it there were people (IIRC Andrew was one of
> them) commenting that they use hash indexes *precisely* because they're
> not WAL logged and that they can live with the dangers that creates. I
> don't think that's sufficient justification for introducing the feature
> at all. But it's nothing new that removing a feature has to fit quite
> different criteria than adding one.
>
> So, by that argument we could remove hash indexes once we have unlogged
> indexes on logged tables. But then there's no need to remove them
> anymore...
>

I would object to removing hash indexes as long as the alternative way to
index oversized values is to write all my SQL to look like:

select count(*) from foo where substr(x,1,2700)=substr($1,1,2700) and x=$1

Now, if the planner were smart enough to realize that x=$1 implies
substr(x,1,2700)=substr($1,1,2700), that might be a different matter.  But
it is not.

Or, if there were a way to create a view on foo which would do this
implication automatically, but again as far as I know there is not a way to
do that either.
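For what it's worth, here is a minimal sketch of the two alternatives being
compared, assuming a table foo with an oversized text column x (all names
are illustrative):

  CREATE TABLE foo (x text);

  -- without hash indexes: index a prefix and repeat the prefix qual in every query
  CREATE INDEX foo_x_prefix_idx ON foo (substr(x, 1, 2700));
  PREPARE count_prefix(text) AS
    SELECT count(*) FROM foo
    WHERE substr(x, 1, 2700) = substr($1, 1, 2700) AND x = $1;

  -- with a hash index the natural equality qual is enough
  -- (though the index is not WAL-logged, as discussed elsewhere in this thread)
  CREATE INDEX foo_x_hash_idx ON foo USING hash (x);
  PREPARE count_hash(text) AS
    SELECT count(*) FROM foo WHERE x = $1;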

Cheers,

Jeff


Re: [HACKERS] tracking commit timestamps

2014-11-03 Thread Jim Nasby

On 11/1/14, 8:41 AM, Petr Jelinek wrote:

Well this is not BDR specific thing, the idea is that with logical replication, 
commit timestamp is not enough for conflict handling, you also need to have 
additional info in order to identify some types of conflicts conflicts (local 
update vs remote update for example). So the extradata field was meant as 
something that could be used to add the additional info to the xid.


Related to this... is there any way to deal with 2 transactions that commit in 
the same microsecond? It seems silly to try and handle that for every commit 
since it should be quite rare, but perhaps we could store the LSN as extradata 
if we detect a conflict?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] how to handle missing "prove"

2014-11-03 Thread Peter Eisentraut
On 11/2/14 11:36 AM, Tom Lane wrote:
> Committed patch looks good, but should we also add the stanza we discussed
> in Makefile.global.in concerning defining $(prove) in terms of "missing"
> if we didn't find it?  I think the behavior of HEAD when you ask for
> --enable-tap-tests but don't have "prove" might be less than ideal.

configure will now fail when "prove" is not found.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-11-03 Thread Peter Eisentraut
On 11/3/14 6:04 AM, Alexey Vasiliev wrote:
>  3. What the patch does in a short paragraph: This patch should add
> option recovery_timeout, which help to control timeout of
> restore_command nonzero status code. Right now default value is 5
> seconds. This is useful, if I using for restore of wal logs some
> external storage (like AWS S3) and no matter what the slave database
> will lag behind the master. The problem, what for each request to
> AWS S3 need to pay, what is why for N nodes, which try to get next
> wal log each 5 seconds will be bigger price, than for example each
> 30 seconds.

That seems useful.  I would include something about this use case in the
documentation.

> This is my first patch. I am not sure about name of option. Maybe it
> should called "recovery_nonzero_timeout".

The option name had me confused.  At first I thought this was the time
after which a running restore_command invocation gets killed.  I think a
more precise description might be restore_command_retry_interval.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-03 Thread Alvaro Herrera
David Fetter wrote:
> On Mon, Nov 03, 2014 at 01:54:09PM +0100, Michael Banck wrote:

> > As an additional datapoint, Vitesse Data changed the DB schema from
> > NUMERIC to MONEY for their TPCH benchmark for performance reasons: "The
> > modification to data types is easy to understand -- money and double
> > types are faster than Numeric (and no one on this planet has a bank
> > account that overflows the money type, not any time soon)."[1] And
> > "Replaced NUMERIC fields representing currency with MONEY"[2].
> > 
> > Not sure whether they modified/optimized PostgreSQL with respect to the
> > MONEY data type and/or how much performance that gained, so CCing CK Tan
> > as well.
> 
> How does our NUMERIC type's performance compare to other systems'
> precise types?  I realize that some of those systems might have
> restrictions on publishing numbers they don't authorize, but they
> might have pushed some authorized numbers if those numbers make them
> look good.

There is a real advantage of money over numeric in the performance
front.  I haven't measured it, but suffice to say that money uses
integer operations which map almost directly to CPU instructions,
whereas numeric needs to decode from our varlena base-10000 digit
format, operate on digits at a time, then encode back.  No matter how
much you optimize numeric, it's never going to outperform stuff that
runs practically on bare electrons.  As far as I recall, some TPCH
queries run aggregations on currency columns.

Now, whether this makes a measurable difference or not in TPCH terms, I
have no idea.
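For anyone who wants to measure it, a rough sketch of such a comparison; the
table names and row count below are invented for illustration:

  CREATE TABLE amounts_numeric AS
    SELECT (random() * 1000)::numeric(12,2) AS amount
    FROM generate_series(1, 1000000);
  CREATE TABLE amounts_money AS
    SELECT amount::money AS amount FROM amounts_numeric;

  -- aggregation over each type, roughly what the TPCH-style queries exercise
  EXPLAIN ANALYZE SELECT sum(amount) FROM amounts_numeric;
  EXPLAIN ANALYZE SELECT sum(amount) FROM amounts_money;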

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-03 Thread David Fetter
On Mon, Nov 03, 2014 at 01:54:09PM +0100, Michael Banck wrote:
> Am Sonntag, den 02.11.2014, 12:41 -0500 schrieb Tom Lane:
> > BTW, after reflecting a bit more I'm less than convinced that this
> > datatype is completely useless.  Even if you prefer to store currency
> > values in numeric columns, casting to or from money provides a way to
> > accept or emit values in whatever monetary format the LC_MONETARY locale
> > setting specifies.  That seems like a useful feature, and it's one you
> > could not easily duplicate using to_char/to_number (not to mention that
> > those functions aren't without major shortcomings of their own).
> 
> As an additional datapoint, Vitesse Data changed the DB schema from
> NUMERIC to MONEY for their TPCH benchmark for performance reasons: "The
> modification to data types is easy to understand -- money and double
> types are faster than Numeric (and no one on this planet has a bank
> account that overflows the money type, not any time soon)."[1] And
> "Replaced NUMERIC fields representing currency with MONEY"[2].
> 
> Not sure whether they modified/optimized PostgreSQL with respect to the
> MONEY data type and/or how much performance that gained, so CCing CK Tan
> as well.

How does our NUMERIC type's performance compare to other systems'
precise types?  I realize that some of those systems might have
restrictions on publishing numbers they don't authorize, but they
might have pushed some authorized numbers if those numbers make them
look good.

Also, just a general three-state comparison, "we beat them handily,"
"we're somewhere near them" or "we're not even close" would be enough
to work from.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Repeatable read and serializable transactions see data committed after tx start

2014-11-03 Thread Álvaro Hernández Tortosa

Hi!

Given a transaction started with "BEGIN (REPEATABLE READ | 
SERIALIZABLE)", if a concurrent session commits some data before *any* 
query within the first transaction, that committed data is seen by the 
transaction. This is not what I'd expect. Specifically, the 
documentation states that:


"The Repeatable Read isolation level only sees data committed before the 
transaction began;" [1]


IMHO, from a user perspective the transaction begins when the BEGIN 
command is issued. If I really want to see a "frozen" view of the 
database before any real SELECT, I have to issue another query like 
"SELECT 1". This seems odd to me. I understand tx snapshot may be 
deferred until real execution for performance reasons, but it is 
confusing from a user perspective. Is this really expected, or is it a 
bug? Am I having a bad day and missing some point here? ^_^


Regards,

Álvaro


[1] http://www.postgresql.org/docs/devel/static/transaction-iso.html


P.S. In case it wasn't clear what I meant, here's an example:


Session 1                                      Session 2

CREATE TABLE i (i integer);
BEGIN ISOLATION LEVEL REPEATABLE READ;
                                               INSERT INTO i VALUES (1);
SELECT i FROM i;  -- returns 1 row, value 1
                  -- should return empty set
                                               INSERT INTO i VALUES (2);
SELECT i FROM i;  -- returns 1 row, value 1
                  -- returns, as it should, the same as the previous query

In the first select, I'd have expected to have no rows. If a "SELECT 1" 
is issued after BEGIN, there are no rows found.
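In other words, the only workaround I see (reusing the table from the example
above) is to issue a dummy query right after BEGIN so the snapshot is taken
before the other session commits:

  BEGIN ISOLATION LEVEL REPEATABLE READ;
  SELECT 1;          -- any query freezes the transaction snapshot here
  -- an INSERT committed by another session after this point is not visible
  SELECT i FROM i;   -- returns the empty set, as originally expected
  COMMIT;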


--
Álvaro Hernández Tortosa


---
8Kdata



Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-03 Thread Jim Nasby

On 11/1/14, 1:45 PM, Andrew Dunstan wrote:

On 11/01/2014 02:34 PM, Andres Freund wrote:

Yeah, if we were trying to duplicate the behavior of indisvalid, there'd
need to be a way to detect the invalid index at plan time and not use it.
But I'm not sure that that's actually an improvement from the user's
standpoint: what they'd see is queries suddenly, and silently, performing
a lot worse than they expect.  An explicit complaint about the necessary
REINDEX seems more user-friendly from where I sit.

A REINDEX is imo unlikely to be acceptable. It takes long (why would you
bother on a small table?) and locks the relation/indexes.



It's a bit of a pity we don't have REINDEX CONCURRENTLY.


Reviews welcome: https://commitfest.postgresql.org/action/patch_view?id=1563
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-03 Thread Robert Haas
On Mon, Nov 3, 2014 at 1:02 PM, Simon Riggs  wrote:
> OK, I think I'm happy with this as a general point.

Cool!

> How will we automatically test this code?

Good question.  I can see two possible approaches:

1. We write a test_group_locking harness along the lines of
test_shm_mq and test_decoding and put that in contrib.

2. We wait until higher-level facilities built on top of this are
available and get regression test coverage of this code via those
higher-level modules.

Personally, I can't imagine debugging this code without writing some
kind of test harness that only does locking; I don't want my locking
bugs to be mixed with my planner and executor bugs.  But I could go
either way on actually putting that code in contrib.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-03 Thread Simon Riggs
On 3 November 2014 17:00, Robert Haas  wrote:
> On Sun, Nov 2, 2014 at 7:31 AM, Simon Riggs  wrote:
>> The main question is still why any of this is necessary? If we are
>> only acquiring fast path locks in workers, what need is there for any
>> of this? The leader and workers never conflict with each other, so why
>> would they ever wait for each other (except at a point where they are
>> actively transferring data), which is the necessary pre-condition for
>> a deadlock?
>>
>> Yes, running a command with a conflicting lock would disrupt the start
>> of a parallel operation, but as I said above, the deadlock detector
>> will eventually see that and re-arrange us so there is no deadlock.
>>
>> The only need I can see is if one of the workers/leader requests a
>> Strong lock on an object, someone else requests a conflicting lock on
>> that object and then we perform a parallel operation ourselves. We
>> could easily solve that by saying that you can't go parallel if you
>> hold a strong lock AND that workers can't take anything higher than
>> RowShareLock, like HS backends. That's good enough for now.
>
> That would be enough to keep the parallel group from deadlocking with
> itself, but it wouldn't be enough to prevent deadlock with other
> processes, which is the scenario that actually worries me a lot more.
> In particular, I'm worried about deadlocks where some other process
> gets sandwiched between two processes from the same parallel group.
> That is, let A1 be a parallel group leader and A2 be a process from
> the same parallel group, and let B be an unrelated process.  I'm
> worried about the situation where A2 waits for B which waits for A1
> which (outside the view of the deadlock detector) waits for A2.  All
> that can happen even if the parallel backends hold no strong locks,
> because B can hold strong locks.

OK, I think I'm happy with this as a general point.

How will we automatically test this code?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-11-03 Thread Abhijit Menon-Sen
At 2014-11-03 17:31:37 +0000, si...@2ndquadrant.com wrote:
>
> Stephen's suggestion allows auditing to be conducted without the
> users/apps being aware it is taking place.

OK, that makes sense. I'm working on a revised patch that does things
this way, and will post it in a few days.

-- Abhijit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2014-11-03 Thread Simon Riggs
On 14 October 2014 20:33, Abhijit Menon-Sen  wrote:
> At 2014-10-14 20:09:50 +0100, si...@2ndquadrant.com wrote:
>>
>> I think that's a good idea.
>>
>> We could have pg_audit.roles = 'audit1, audit2'
>
> Yes, it's a neat idea, and we could certainly do that. But why is it any
> better than "ALTER ROLE audit_rw SET pgaudit.log = …" and granting that
> role to the users whose actions you want to audit?

That would make auditing visible to the user, who may then be able to
do something about it.

Stephen's suggestion allows auditing to be conducted without the
users/apps being aware it is taking place.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] COPY TO returning empty result with parallel ALTER TABLE

2014-11-03 Thread Sven Wegener
Hi all,

we experienced what seems to be a bug in the COPY TO implementation. When a
table is being rewritten by an ALTER TABLE statement, a parallel COPY TO
returns an empty result.

Consider the following table data:

  CREATE TABLE test (id INTEGER NOT NULL, PRIMARY KEY (id));
  INSERT INTO test (id) SELECT generate_series(1, 1000);

One session does:

  ALTER TABLE test ADD COLUMN dummy BOOLEAN NOT NULL DEFAULT FALSE;

This acquires an exclusive lock to the table.

Another session now runs, in parallel:

  COPY test TO STDOUT;

This blocks on the exclusive lock held by the ALTER statement. When the ALTER
statement is finished, the COPY statement returns with an empty result. Same
goes for COPY (SELECT ...) TO, whereas the same SELECT executed without COPY
blocks and returns the correct result as expected.

This is my analysis of this issue:

The backend opens the rewritten data files, but it ignores the complete
content, which indicates the data is being ignored because of XIDs. For direct
SELECT statements the top-level query parsing acquires locks on involved tables
and creates a new snapshot, but for COPY statements the parsing and locking is
done later in COPY code. After locking the tables in COPY code, the data
is read with an old snapshot, effectively ignoring all data from the rewritten
table.

I've checked git master and 9.x and all show the same behaviour. I came up with
the patch below, which is against current git master. The patch modifies the
COPY TO code to create a new snapshot, after acquiring the necessary locks on
the source tables, so that it sees any modification commited by other backends.

Despite thinking this is the correct solution, another solution or
optimization would be to have ALTER TABLE, which holds the highest lock level,
set the XID of rewritten tuples to the FrozenXID, as no other backend should
access the table before the ALTER TABLE is committed.

Sven

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6b83576..fe2d157 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1344,6 +1344,13 @@ BeginCopy(bool is_from,
 						(errcode(ERRCODE_UNDEFINED_COLUMN),
 						 errmsg("table \"%s\" does not have OIDs",
 								RelationGetRelationName(cstate->rel))));
+
+   /*
+* Use a new snapshot to ensure this query sees
+* results of any previously executed queries.
+*/
+   if (!is_from)
+   PushActiveSnapshot(GetTransactionSnapshot());
}
else
{
@@ -1394,11 +1401,10 @@ BeginCopy(bool is_from,
plan = planner(query, 0, NULL);
 
/*
-		 * Use a snapshot with an updated command ID to ensure this query sees
+* Use a new snapshot to ensure this query sees
 * results of any previously executed queries.
 */
-   PushCopiedSnapshot(GetActiveSnapshot());
-   UpdateActiveSnapshotCommandId();
+   PushActiveSnapshot(GetTransactionSnapshot());
 
/* Create dest receiver for COPY OUT */
dest = CreateDestReceiver(DestCopyOut);
@@ -1741,9 +1747,11 @@ EndCopyTo(CopyState cstate)
ExecutorFinish(cstate->queryDesc);
ExecutorEnd(cstate->queryDesc);
FreeQueryDesc(cstate->queryDesc);
-   PopActiveSnapshot();
}
 
+   /* Discard snapshot */
+   PopActiveSnapshot();
+
/* Clean up storage */
EndCopy(cstate);
 }


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Additional role attributes && superuser review

2014-11-03 Thread Stephen Frost
* Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote:
> > That said, I don't feel very strongly about that position, so if you and
> > Robert (and others on the thread) agree that's the right approach then
> > I'll see about getting it done.

Thanks for trying to make progress on this.

> We haven't reached consensus on this one yet and I didn't want it to fall
> too far off the radar.
> 
> Here is what I summarize as the current state of the discussion:

For my 2c- I don't think we'd be able to drop the old syntax and
therefore I'm not sure that I really see the point in adding new syntax.
I don't hold that position strongly, but I don't like the idea that
we're just going to be sitting on our thumbs because we can't agree on
the color of the bike shed.

If there is agreement to move forward with using the syntax below,
great, that can be implemented in relatively short order.  If there
isn't, then let's move forward with the existing syntax and change it
later, if there is agreement to do so at some point in the future.

I don't like the idea of tying it into GRANT.  GRANT is SQL-spec
defined, quite overloaded already, and role attributes don't act like
GRANTs anyway (there's no ADMIN option and they aren't inheirited).

> 2. Catalog Representation:
> 
> Condense all attributes in pg_authid to single int64 column and create
> bitmasks accordingly.

I'd suggest we do this regardless and the only question is if we feel
there is need to provide a view to split them back out again or not.
I'm inclined to say 'no' to another view and instead suggest we provide
SQL-level "has_whatever_privilege" for these which match the existing C
functions, update our clients to use those, and encourage others to do
the same.  Probably also helpful would be a single function that returns
a simple list of the non-default role attributes which a user has and
we'd simplify psql's describeRoles by having it use that instead.
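Purely as an illustration of that shape (both function names below are
hypothetical and do not exist anywhere yet):

  SELECT pg_has_role_attribute('backup_operator', 'replication');  -- per-attribute boolean check
  SELECT pg_role_attributes('backup_operator');                    -- list of non-default attributes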

> Obviously there is some concern for upgrading and whether to do both at
> once or to do them incrementally.  IMHO, I think if the changes are going
> to be made, then we should go ahead and do them at the same time.  Though,
> would it be beneficial to separate in to two distinct patches?

I agree that doing these changes at the same time makes sense, if we can
get agreement on what exactly to do.  I've shared my thoughts, but we
really need input from others, ideally of the form of "I prefer X over
Y" rather than "well, we could do X or Y".

Thanks!

Stephen




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-03 Thread Robert Haas
On Sun, Nov 2, 2014 at 7:31 AM, Simon Riggs  wrote:
> The main question is still why any of this is necessary? If we are
> only acquiring fast path locks in workers, what need is there for any
> of this? The leader and workers never conflict with each other, so why
> would they ever wait for each other (except at a point where they are
> actively transferring data), which is the necessary pre-condition for
> a deadlock?
>
> Yes, running a command with a conflicting lock would disrupt the start
> of a parallel operation, but as I said above, the deadlock detector
> will eventually see that and re-arrange us so there is no deadlock.
>
> The only need I can see is if one of the workers/leader requests a
> Strong lock on an object, someone else requests a conflicting lock on
> that object and then we perform a parallel operation ourselves. We
> could easily solve that by saying that you can't go parallel if you
> hold a strong lock AND that workers can't take anything higher than
> RowShareLock, like HS backends. That's good enough for now.

That would be enough to keep the parallel group from deadlocking with
itself, but it wouldn't be enough to prevent deadlock with other
processes, which is the scenario that actually worries me a lot more.
In particular, I'm worried about deadlocks where some other process
gets sandwiched between two processes from the same parallel group.
That is, let A1 be a parallel group leader and A2 be a process from
the same parallel group, and let B be an unrelated process.  I'm
worried about the situation where A2 waits for B which waits for A1
which (outside the view of the deadlock detector) waits for A2.  All
that can happen even if the parallel backends hold no strong locks,
because B can hold strong locks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Additional role attributes && superuser review

2014-11-03 Thread Adam Brightwell
All,


> That said, I don't feel very strongly about that position, so if you and
> Robert (and others on the thread) agree that's the right approach then
> I'll see about getting it done.


We haven't reached consensus on this one yet and I didn't want it to fall
too far off the radar.

Here is what I summarize as the current state of the discussion:

1. Syntax:

ALTER ROLE <role> { ADD | DROP } CAPABILITY <capability>

* I think this is the most straightforward approach, as it stays close to
the current syntax (a sketch of both alternatives follows this summary).
* Perhaps keeping the current syntax around as deprecated to be removed in
a scheduled future version. (provide a "deprecated" message to the user?)

or

GRANT EXECUTE PRIVILEGES ON <capability> TO <role>

* Though, this will be tricky since EXECUTE is not reserved and the
current state of GRANT in the grammar would require either refactoring or
making it RESERVED... neither of which I think is acceptable at this point
in time for obvious reasons.

2. Catalog Representation:

Condense all attributes in pg_authid to single int64 column and create
bitmasks accordingly.

Obviously there is some concern for upgrading and whether to do both at
once or to do them incrementally.  IMHO, I think if the changes are going
to be made, then we should go ahead and do them at the same time.  Though,
would it be beneficial to separate them into two distinct patches?
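To make the point-1 alternatives concrete, a quick sketch (the syntax is
hypothetical and nothing here is implemented):

  -- current attribute-keyword syntax, which would be deprecated:
  ALTER ROLE admin_user CREATEDB CREATEROLE;

  -- proposed ADD/DROP CAPABILITY syntax:
  ALTER ROLE admin_user ADD CAPABILITY createdb;
  ALTER ROLE admin_user DROP CAPABILITY createrole;

  -- proposed GRANT-based alternative:
  GRANT EXECUTE PRIVILEGES ON createdb TO admin_user;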

-Adam

-- 
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-03 Thread Robert Haas
On Mon, Nov 3, 2014 at 10:18 AM, Greg Stark  wrote:
> On Sat, Nov 1, 2014 at 9:09 PM, Robert Haas  wrote:
>> 1. Any non-trivial piece of PostgreSQL code is likely to contain
>> syscache lookups.
>> 2. Syscache lookups had better work in parallel workers, or they'll be
>> all but useless.
>
> I've been using parallel sorts and index builds in my mental model of
> how this will be used. I note that sorts go out of their way to look
> up all the syscache entries in advance precisely so that tuplesort
> doesn't start doing catalog lookups in the middle of the sort.

This isn't really true.  Routines like varstr_cmp() do syscache
lookups and de-TOASTing.  We've recently tried to reduce that with the
sortsupport stuff, but it hasn't eliminated it completely, and there
are other cases.  This is really the key point that I think everybody
needs to understand: a lot of code that you think doesn't do anything
complicated actually reads from the database - generally either in the
form of syscache lookups, or in the form of de-TOASTing.  Most of that
code doesn't perform those operations *frequently*, but that's because
there are backend-private caches that minimize the number of times we
actually hit the database.  But restructuring all of that code to do
no database access whatsoever would be a very difficult proposition
and would involve painful sacrifices, which is, I imagine, why it is
the way it is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-03 Thread Tom Lane
Greg Stark  writes:
> On Sat, Nov 1, 2014 at 9:09 PM, Robert Haas  wrote:
>> 2. Syscache lookups had better work in parallel workers, or they'll be
>> all but useless.

> I've been using parallel sorts and index builds in my mental model of
> how this will be used. I note that sorts go out of their way to look
> up all the syscache entries in advance precisely so that tuplesort
> doesn't start doing catalog lookups in the middle of the sort.

Well, they try to avoid doing the lookups more than once, but that does
not by any means imply that no lookups happen after the sort starts.
In particular, if you're sorting a container type (array, record)
there will be lookups during the first call of the sort type's comparison
function.  Enums could cause catalog lookups much later than the first
call, too.

I'm with Robert: if you cannot do catalog lookups in a worker process,
the feature will be so crippled as to be essentially useless.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-03 Thread Greg Stark
On Sat, Nov 1, 2014 at 9:09 PM, Robert Haas  wrote:
> 1. Any non-trivial piece of PostgreSQL code is likely to contain
> syscache lookups.
> 2. Syscache lookups had better work in parallel workers, or they'll be
> all but useless.


I've been using parallel sorts and index builds in my mental model of
how this will be used. I note that sorts go out of their way to look
up all the syscache entries in advance precisely so that tuplesort
doesn't start doing catalog lookups in the middle of the sort. In
general I think what people are imagining is that the parallel workers
will be running low-level code like tuplesort that has all the
databasey stuff like catalog lookups done in advance and just operates
on C data structures like function pointers. And I think that's a
valuable coding discipline to enforce, it avoids having low level
infrastructure calling up to higher level abstractions which quickly
becomes hard to reason about.

However in practice I think you're actually right -- but not for the
reasons you've been saying. I think the parallel workers *should* be
written as low level infrastructure and not be directly doing syscache
lookups or tuple locking etc. However there are a million ways in
which Postgres is extensible which causes loops in the call graph that
aren't apparent in the direct code structure. For instance, what
happens if the index you're building is an expression index or partial
index? Worse, what happens if those expressions have a plpython
function that does queries using SPI?

But those kinds of user code exploiting extensibility are the
situations where we need a deadlock detector and where you might need
this infrastructure. We wouldn't and shouldn't need a deadlock
detector for our own core server code. In an ideal world some sort of
compromise that enforces careful locking rules where all locks are
acquired in advance and parallel workers are prohibited from obtaining
locks in the core code while still allowing users to a free-for-all
and detecting deadlocks at runtime for them would be ideal. But I'm
not sure there's any real middle ground here.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pipelining executions to postgresql server

2014-11-03 Thread Craig Ringer
On 11/02/2014 09:27 PM, Mikko Tiihonen wrote:
> Is the following summary correct:
> - the network protocol supports pipelinings

Yes.

All you have to do is *not* send a Sync message, and be aware that after
an error the server will discard all input until the next Sync, so
pipelining + autocommit doesn't make a ton of sense for error-handling
reasons.
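
To illustrate, a pipelined batch looks roughly like this on the wire
(the statement names S1 and S2 are just placeholders):

    Parse(S1) Bind Describe Execute    -- first query, no Sync yet
    Parse(S2) Bind Describe Execute    -- second query queued behind it
    Sync                               -- one Sync for the whole batch

If the first Execute fails, the server emits an ErrorResponse, skips
everything up to the Sync, and then sends ReadyForQuery - so the later
queries in the batch never run at all.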

> - the server handles operations in order, starting the processing of the next 
> operation only after fully processing the previous one - thus pipelining is 
> invisible to the server

As far as I know, yes. The server just doesn't care.

> - libpq driver does not support pipelining, but that is due to internal 
> limitations

Yep.

> - if proper error handling is done by the client then there is no reason why 
> pipelining could not be supported by any pg client

Indeed, and most should support it. Sending batches of related queries
would make things a LOT faster.

PgJDBC's batch support is currently write-oriented. There is no
fundamental reason it can't be expanded for reads. I've already written
a patch to do just that for the case of returning generated keys.

https://github.com/ringerc/pgjdbc/tree/batch-returning-support

and just need to rebase it so I can send a pull request to upstream PgJDBC.
It's already linked in the issues documenting the limitations in batch
support.
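
To give a rough idea of what that enables at the API level, the usage
would be something like the sketch below (the items table and the
names collection are made up, conn is an open java.sql.Connection, and
whether getGeneratedKeys() works after a batch is exactly the
driver-side support that branch adds):

    PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO items (name) VALUES (?)",
            Statement.RETURN_GENERATED_KEYS);
    for (String name : names) {
        ps.setString(1, name);
        ps.addBatch();
    }
    ps.executeBatch();
    // one generated-key row per batch entry, e.g. a serial primary key
    try (ResultSet keys = ps.getGeneratedKeys()) {
        while (keys.next()) {
            long id = keys.getLong(1);
        }
    }
    ps.close();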


If you want to have more general support for batches that return rowsets
there's no fundamental technical reason why it can't be added. It just
requires some tedious refactoring of the driver to either:

- Sync and wait before it fills its *send* buffer, rather than trying
  to manage its receive buffer (the server's send buffer), so it can
  reliably avoid deadlocks; or

- Do async I/O in a select()-like loop over a protocol state machine,
  so it can simultaneously read and write on the wire.

I might need to do some of that myself soon, but it's a big (and
therefore error-prone) job I've so far avoided by making smaller, more
targeted changes.

Doing async I/O using Java nio channels is by far the better approach,
but also the more invasive one. The driver currently sends data on the
wire where it generates it and blocks to receive expected data.
Switching to send-side buffer management doesn't have the full
performance gains that doing bidirectional I/O via channels does,
though, and may be a significant performance _loss_ if you're sending
big queries but getting small replies.
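
For what it's worth, the shape I have in mind for the channel-based
approach is the usual selector loop, roughly like the sketch below
(heavily simplified; host, port and the two ByteBuffers are
placeholders, the java.nio / java.net imports are omitted, and all the
real work happens in the protocol state machine that fills sendBuffer
and drains recvBuffer):

    static void ioLoop(String host, int port,
                       ByteBuffer sendBuffer, ByteBuffer recvBuffer)
            throws IOException {
        SocketChannel ch = SocketChannel.open();
        ch.configureBlocking(false);
        ch.connect(new InetSocketAddress(host, port));

        Selector sel = Selector.open();
        SelectionKey key = ch.register(sel, SelectionKey.OP_CONNECT);

        while (key.isValid()) {  // real code: stop when the state machine is done
            sel.select();
            sel.selectedKeys().clear();
            if (key.isConnectable() && ch.finishConnect())
                key.interestOps(SelectionKey.OP_READ | SelectionKey.OP_WRITE);
            if (key.isWritable()) {
                ch.write(sendBuffer);            // drain queued protocol messages
                if (!sendBuffer.hasRemaining())
                    key.interestOps(SelectionKey.OP_READ);  // nothing left to send
            }
            if (key.isReadable())
                ch.read(recvBuffer);             // feed the protocol state machine
        }
    }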

For JDBC the JDBC batch interface is the right place to do this, and you
should not IMO attempt to add pipelining outside that interface.
(Multiple open resultsets from portals, yes, but not pipelining of queries).


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pipelining executions to postgresql server

2014-11-03 Thread Craig Ringer
On 11/01/2014 10:04 PM, Mikko Tiihonen wrote:
> Hi,
> 
> I created a proof of concept patch for the PostgreSQL JDBC driver that allows 
> the caller to do pipelining of requests within a transaction. Pipelining 
> here means the same as for HTTP: the client can send the next execution 
> before waiting for the response of the previous request to be fully processed.

... but ... it already does that.

Use batches, and that's exactly what it'll do.

Statement.addBatch(String)
or
PreparedStatement.addBatch()
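
i.e. something along these lines (sketch only - the table is made up
and conn is an open Connection with autocommit turned off):

    try (PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO my_table (id, val) VALUES (?, ?)")) {
        for (int i = 0; i < 1000; i++) {
            ps.setInt(1, i);
            ps.setString(2, "row " + i);
            ps.addBatch();      // queued client-side, nothing sent yet
        }
        ps.executeBatch();      // the driver pipelines these executions
    }
    conn.commit();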

> What kind of problems could pipelining cause (assuming we limit it to rather 
> simple use cases only)?

Client/server pipeline deadlocks, due to how PgJDBC manages its buffering.

See the details:

https://github.com/pgjdbc/pgjdbc/issues/195

and

https://github.com/pgjdbc/pgjdbc/issues/194

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-03 Thread Michael Banck
Am Sonntag, den 02.11.2014, 12:41 -0500 schrieb Tom Lane:
> BTW, after reflecting a bit more I'm less than convinced that this
> datatype is completely useless.  Even if you prefer to store currency
> values in numeric columns, casting to or from money provides a way to
> accept or emit values in whatever monetary format the LC_MONETARY locale
> setting specifies.  That seems like a useful feature, and it's one you
> could not easily duplicate using to_char/to_number (not to mention that
> those functions aren't without major shortcomings of their own).

As an additional datapoint, Vitesse Data changed the DB schema from
NUMERIC to MONEY for their TPCH benchmark for performance reasons: "The
modification to data types is easy to understand -- money and double
types are faster than Numeric (and no one on this planet has a bank
account that overflows the money type, not any time soon)."[1] And
"Replaced NUMERIC fields representing currency with MONEY"[2].

Not sure whether they modified/optimized PostgreSQL with respect to the
MONEY data type and/or how much performance that gained, so CCing CK Tan
as well.


Michael

[1] 
http://vitesse-timing-on.blogspot.de/2014/10/running-tpch-on-postgresql-part-1.html
[2] http://vitessedata.com/benchmark/

-- 
Michael Banck
Projektleiter / Berater
Tel.: +49 (2161) 4643-171
Fax:  +49 (2161) 4643-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CINE in CREATE TABLE AS ... and CREATE MATERIALIZED VIEW ...

2014-11-03 Thread Fabrízio de Royes Mello
On Mon, Nov 3, 2014 at 3:12 AM, Rushabh Lathia wrote:
>
> Patch looks good, assigning to committer.
>

Thanks for your review!

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


[HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-11-03 Thread Alexey Vasiliev

Hello everyone.

*  Project name: Add recovery_timeout option to control timeout of 
restore_command nonzero status code
*  Uniquely identifiable file name, so we can tell difference between your v1 
and v24: 0001-add-recovery_timeout-to-controll-timeout-between-res.patch
*  What the patch does in a short paragraph: This patch adds an option, 
recovery_timeout, which controls how long the server waits before retrying 
restore_command after it returns a nonzero status code. Right now the default 
value is 5 seconds. This is useful if I restore WAL logs from some external 
storage (like AWS S3) and it does not matter how far the slave database lags 
behind the master. The problem is that every request to AWS S3 costs money, 
so N nodes each trying to fetch the next WAL log every 5 seconds cost more 
than, for example, fetching every 30 seconds. Until now I have worked around 
this like so: " if ! (/usr/local/bin/envdir /etc/wal-e.d/env 
/usr/local/bin/wal-e wal-fetch "%f" "%p"); then sleep 60; fi ". But in that 
case restarting/stopping the database is slower.
*  Whether the patch is for discussion or for application: No such thing.
*  Which branch the patch is against: master branch
*  Whether it compiles and tests successfully, so we know nothing obvious is 
broken: compiles and passes tests on my local machine.
*  Whether it contains any platform-specific items and if so, has it been 
tested on other platforms: I hope not.
*  Confirm that the patch includes regression tests to check the new feature 
actually works as described: No, it doesn't have tests; I haven't found out 
how to test new config variables.
*  Include documentation: added.
*  Describe the effect your patch has on performance, if any: it shouldn't 
affect database performance.
This is my first patch. I am not sure about the name of the option. Maybe it 
should be called "recovery_nonzero_timeout".

-- 
Alexey Vasiliev

From 35abe56b2497f238a6888fe98c54aa9cb5300866 Mon Sep 17 00:00:00 2001
From: Alexey Vasiliev 
Date: Mon, 3 Nov 2014 00:21:14 +0200
Subject: [PATCH] add recovery_timeout to controll timeout between
 restore_command nonzero

---
 doc/src/sgml/recovery-config.sgml               | 16 ++++++++++++++++
 src/backend/access/transam/recovery.conf.sample |  5 +++++
 src/backend/access/transam/xlog.c               | 17 ++++++++++++++++-
 3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 0f1ff34..bc410b3 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -145,6 +145,22 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
   
  
 
+     <varlistentry id="restore-timeout" xreflabel="restore_timeout">
+      <term><varname>restore_timeout</varname> (<type>integer</type>)
+      <indexterm>
+        <primary><varname>restore_timeout</varname> recovery parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        By default, if restore_command return nonzero status,
+        server will retry command again after 5 seconds. This parameter
+        allow to change this time. This parameter is optional. This can
+        be useful to increase/decrease number of a restore_command calls.
+       </para>
+      </listitem>
+     </varlistentry>
+
 
 
   
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index 7657df3..282e898 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -58,6 +58,11 @@
 #
 #recovery_end_command = ''
 #
+# specifies an optional timeout after nonzero code of restore_command.
+# This can be useful to increase/decrease number of a restore_command calls.
+#
+#restore_timeout = ''
+#
 #---
 # RECOVERY TARGET PARAMETERS
 #---
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3c9aeae..98f0fca 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -233,6 +233,7 @@ static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
 static int	recovery_min_apply_delay = 0;
 static TimestampTz recoveryDelayUntilTime;
+static int 	restore_timeout = 0;
 
 /* options taken from recovery.conf for XLOG streaming */
 static bool StandbyModeRequested = false;
@@ -5245,6 +5246,20 @@ readRecoveryCommandFile(void)
 	(errmsg_internal("trigger_file = '%s'",
 	 TriggerFile)));
 		}
+		else if (strcmp(item->name, "restore_timeout") == 0)
+		{
+			const char *hintmsg;
+
+			if (!parse_int(item->value, &restore_timeout, GUC_UNIT_MS,
+		   &hintmsg))
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("parameter \"%s\" requires a temporal value",
+"restore_timeout"),
+		 hintmsg ? errhint("%s", _(hintmsg)) : 0));
+			ereport(DEBUG2,
+	(errmsg_internal("restore_timeout = '%s'", item->value)));
+		}
 		else if (strcmp(item->name, "recovery_min_apply_delay") == 0)
 		{
 			const char *hintmsg;
@@ -0,7 +11125,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr Re