Re: [HACKERS] tracking commit timestamps
On 2014-11-02 19:27:25 +0100, Petr Jelinek wrote:
> Well, Michael has a point that the extradata is pretty much useless
> currently; perhaps it would help to add an interface to set extradata?

"Only accessible via C" and "useless" aren't the same thing. But sure,
add it.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start
On 11/04/2014 07:31 AM, Álvaro Hernández Tortosa wrote:
> Thank you for your comment, Tom. However I think this behavior, as seen
> from a user perspective, is not the expected one.

That may be the case, but I think it's the SQL-standard behaviour, so we
can't really mess with it. The spec requires SET TRANSACTION ISOLATION,
and you can't implement that if you take a snapshot at BEGIN.

> If it is still the intended behavior, I think it should be clearly
> documented as such, and a recommendation similar to "issue a
> 'SELECT 1' right after BEGIN to freeze the data before any own query"
> or a similar comment should be added.
>
> Again, as I said in my email, the documentation clearly says that the
> transaction only sees data committed before the transaction began. And
> this is clearly not the real behavior.

It's more a difference in when the transaction begins. Arguably, BEGIN
says "I intend to begin a new transaction with the next query" rather
than "immediately begin executing a new transaction". This concept could
be clearer in the docs.

> Sure, there are, that was the link I pointed out, but I found no
> explicit mention of the fact that I'm raising here.

I'm sure it's documented *somewhere*, in that I remember reading about
this detail in the docs, but I can't find _where_ in the docs. It
doesn't seem to be in:

http://www.postgresql.org/docs/current/static/transaction-iso.html

where I'd expect it.

In any case, we simply cannot take the snapshot at BEGIN time, because
it's permitted to run:

    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

in a DB that has default serializable isolation, or that has a SET
SESSION CHARACTERISTICS isolation mode of serializable. Note that SET
TRANSACTION is SQL-standard. AFAIK deferring the snapshot is consistent
with other RDBMSes that use snapshots, too.

The docs of that command allude to, but don't explicitly state, the
behaviour you mention.
http://www.postgresql.org/docs/current/static/sql-set-transaction.html

--
 Craig Ringer                      http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] tracking commit timestamps
On Sat, Nov 1, 2014 at 10:00 PM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> More comments:

I have done a couple of tests on my laptop with pgbench like this, to
generate a maximum of transaction commits:

    $ pgbench --no-vacuum -f ~/Desktop/commit.sql -T 60 -c 24 -j 24
    $ cat ~/Desktop/commit.sql
    SELECT txid_current();

Here is an average of 5 runs (tps):
- master: 49842.44
- GUC off, patched: 49688.71
- GUC on, patched: 49459.73
So the differences are within the noise.

Here are also more comments about the code that I found while focusing
on committs.c:

1) When the GUC is not enabled, and when the transaction ID provided is
not in a correct range, a dummy value based on InvalidTransactionId (?!)
is returned, like this:

    +   /* Return empty if module not enabled */
    +   if (!commit_ts_enabled)
    +   {
    +       if (ts)
    +           *ts = InvalidTransactionId;
    +       if (data)
    +           *data = (CommitExtraData) 0;
    +       return;
    +   }

This leads to some incorrect results:

    =# select pg_get_transaction_committime('1');
     pg_get_transaction_committime
    -------------------------------
     2000-01-01 09:00:00+09
    (1 row)

Or for example:

    =# SELECT txid_current();
     txid_current
    --------------
             1006
    (1 row)
    =# select pg_get_transaction_committime('1006');
     pg_get_transaction_committime
    -------------------------------
     2014-11-04 14:54:37.589381+09
    (1 row)
    =# select pg_get_transaction_committime('1007');
     pg_get_transaction_committime
    -------------------------------
     2000-01-01 09:00:00+09
    (1 row)
    =# SELECT txid_current();
     txid_current
    --------------
             1007
    (1 row)

And at other times the error is not that helpful for the user:

    =# select pg_get_transaction_committime('1');
    ERROR:  XX000: could not access status of transaction 1
    DETAIL:  Could not read from file pg_committs/ at offset 114688:
    Undefined error: 0.
    LOCATION:  SlruReportIOError, slru.c:896

I think that you should simply return an error in
TransactionIdGetCommitTsData when the xid is not in the supported range.
That would be more transparent for the user.

2) You may as well want to return a different error if the GUC
track_commit_timestamps is disabled.
3) This comment should be updated in committs.c; we are not talking
about CLOG here:

    +/*
    + * Link to shared-memory data structures for CLOG control
    + */

4) Similarly, I think more comments should be put in here. It is OK to
truncate the commit timestamp stuff similarly to CLOG so as to have a
consistent status context available, but let's explain it:

         * multixacts; that will be done by the next checkpoint.
         */
        TruncateCLOG(frozenXID);
    +   TruncateCommitTs(frozenXID)

5) Reading the code, TransactionTreeSetCommitTimestamp is always called
with do_xlog = false, meaning that actually no timestamps are WAL'd...
Hence WriteSetTimestampXlogRec is just dead code with the latest version
of the patch. IMO, this should always be WAL-logged when
track_commit_timestamp is on.

6) Shouldn't any value update of track_commit_timestamp be tracked in
XLogReportParameters? That's thinking about making the commit timestamp
available on standbys as well.

7) pg_xlogdump has no support for RM_COMMITTS_ID, something that would
be useful for developers.

8) The redo and xlog routines of committs should be moved out into a
dedicated file, like committsxlog.c or similar. The other rmgrs do so,
so let's be consistent.

Regards,
--
Michael
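As a sketch of the validation Michael is asking for in points 1) and 2)
- all names and return codes here are invented for illustration, not
taken from the patch - the lookup would report distinct errors instead
of silently returning a zeroed timestamp (which renders as the
2000-01-01 epoch):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical return codes for this sketch */
enum { TS_OK = 0, TS_ERR_DISABLED = -1, TS_ERR_OUT_OF_RANGE = -2 };

/*
 * Sketch of TransactionIdGetCommitTsData-style validation: error out
 * both when the module is disabled and when the xid is outside the
 * tracked range, rather than handing back a dummy value.
 */
static int
get_commit_ts(uint32_t xid, uint32_t oldest_xid, uint32_t next_xid,
              int track_enabled, int64_t *ts)
{
    if (!track_enabled)
        return TS_ERR_DISABLED;        /* distinct error, per point 2 */
    if (xid < oldest_xid || xid >= next_xid)
        return TS_ERR_OUT_OF_RANGE;    /* no dummy epoch, per point 1 */
    *ts = 42;                          /* stand-in for the SLRU lookup */
    return TS_OK;
}
```

With this shape, asking for xid 1007 while txid_current() is still 1006
would raise an out-of-range error instead of returning 2000-01-01.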
Re: [HACKERS] tracking commit timestamps
On 2014-11-04 17:19:18 +0900, Michael Paquier wrote:
> 5) Reading the code, TransactionTreeSetCommitTimestamp is always called
> with do_xlog = false, meaning that actually no timestamps are WAL'd...
> Hence WriteSetTimestampXlogRec is just dead code with the latest
> version of the patch. IMO, this should always be WAL-logged when
> track_commit_timestamp is on.

It's callable via an 'extern' function. So I'd not consider it dead. And
the WAL logging is provided by xact.c's own WAL logging - it always does
the corresponding committs calls.

> 6) Shouldn't any value update of track_commit_timestamp be tracked in
> XLogReportParameters? That's thinking about making the commit timestamp
> available on standbys as well.

Yes, it should.

> 7) pg_xlogdump has no support for RM_COMMITTS_ID, something that would
> be useful for developers.

What do you mean by that? There's the corresponding rmgrdesc.c support,
I think?

> 8) The redo and xlog routines of committs should be moved out into a
> dedicated file, like committsxlog.c or similar. The other rmgrs do so,
> so let's be consistent.

Seems pointless to me. The file isn't that big, and the other SLRUs
don't do it.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] tracking commit timestamps
On Tue, Nov 4, 2014 at 5:05 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> On 2014-11-02 19:27:25 +0100, Petr Jelinek wrote:
>> Well, Michael has a point that the extradata is pretty much useless
>> currently; perhaps it would help to add an interface to set extradata?
>
> "Only accessible via C" and "useless" aren't the same thing. But sure,
> add it.

I'm still at -1 for that. You mentioned that there is perhaps no reason
to delay a decision on this matter, but IMO there is no reason to rush
into doing something we may regret either. And I am not the only one on
this thread expressing concern about this extra data thingy.

If this extra data field is going to be used to identify from which node
a commit comes, then it is another feature than what is written on the
subject of this thread. In that case let's discuss it in the thread
dedicated to replication identifiers, or come up with an extra patch
once the feature for commit timestamps is done.
--
Michael
Re: [HACKERS] tracking commit timestamps
On Tue, Nov 4, 2014 at 5:23 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> On 2014-11-04 17:19:18 +0900, Michael Paquier wrote:
>> 5) Reading the code, TransactionTreeSetCommitTimestamp is always
>> called with do_xlog = false, meaning that actually no timestamps are
>> WAL'd... Hence WriteSetTimestampXlogRec is just dead code with the
>> latest version of the patch. IMO, this should always be WAL-logged
>> when track_commit_timestamp is on.
>
> It's callable via an 'extern' function. So I'd not consider it dead.
> And the WAL logging is provided by xact.c's own WAL logging - it always
> does the corresponding committs calls.

The code path is unused. We'd better make the XLOG record mandatory if
tracking is enabled, as this information is useful on standbys as well.

>> 7) pg_xlogdump has no support for RM_COMMITTS_ID, something that would
>> be useful for developers.
>
> What do you mean by that? There's the corresponding rmgrdesc.c support,
> I think?

Oops, sorry. I thought there was some big switch in pg_xlogdump when
writing this comment. Yeah, that's fine.
--
Michael
Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices
David Fetter wrote:
> On Tue, Nov 04, 2014 at 07:51:06AM +0900, Tatsuo Ishii wrote:
>> Just out of curiosity, why is Oracle's NUMBER (I assume you are
>> talking about this) so fast?
>
> I suspect that what happens is that NUMBER is stored as a native type
> (int2, int4, int8, int16) that depends on its size and then cast to the
> next upward thing as needed, taking any performance hits at that point.
> The documentation hints (38 decimal places) at a 128-bit internal
> representation as the maximum. I don't know what happens when you get
> past what 128 bits can represent.

No, Oracle stores NUMBERs as a variable-length field (up to 22 bytes),
where the first byte encodes the sign and the decimal point position,
and the remaining bytes encode the digits, each byte representing two
digits in base-100 notation (see Oracle Metalink note 1007641.6).

So it's not so different from PostgreSQL. No idea why their arithmetic
should be faster.

Yours,
Laurenz Albe
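The base-100 packing Laurenz describes can be sketched in a few lines of
C. This is a toy encoder for non-negative integers only; a real Oracle
NUMBER also folds the sign and a base-100 exponent into a leading byte,
which is omitted here:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy base-100 packing: each output byte holds two decimal digits
 * (a value 0..99), most-significant pair first.  Returns the number
 * of bytes used.
 */
static int
encode_base100(uint64_t value, uint8_t *out, int outlen)
{
    uint8_t tmp[16];
    int n = 0;

    do {
        tmp[n++] = value % 100;   /* peel off two decimal digits */
        value /= 100;
    } while (value > 0 && n < (int) sizeof(tmp));

    /* reverse so the most-significant pair comes first */
    for (int i = 0; i < n && i < outlen; i++)
        out[i] = tmp[n - 1 - i];
    return n;
}
```

For example, 123456 packs into the three bytes {12, 34, 56}, so a
38-digit value fits comfortably in the 22-byte limit mentioned above.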
Re: [HACKERS] BRIN indexes - TRAP: BadArgument
On 3 November 2014 22:18, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> So here's v21. I also attach a partial diff from v20, just in case
> anyone wants to give it a look.

Looks really good.

I'd like to reword this sentence in the README, since one of the main
use cases would be tables without btrees:

    It's unlikely that BRIN would be the only
    + indexes in a table, though, because primary keys can be btrees only, and so
    + we don't implement this optimization.

I don't see a regression test. Create, use, VACUUM - just so we know it
hasn't regressed after commit.

--
 Simon Riggs                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] tracking commit timestamps
On 2014-11-04 17:29:04 +0900, Michael Paquier wrote:
> On Tue, Nov 4, 2014 at 5:23 PM, Andres Freund <and...@2ndquadrant.com> wrote:
>> On 2014-11-04 17:19:18 +0900, Michael Paquier wrote:
>>> 5) Reading the code, TransactionTreeSetCommitTimestamp is always
>>> called with do_xlog = false, meaning that actually no timestamps are
>>> WAL'd... Hence WriteSetTimestampXlogRec is just dead code with the
>>> latest version of the patch. IMO, this should always be WAL-logged
>>> when track_commit_timestamp is on.
>>
>> It's callable via an 'extern' function. So I'd not consider it dead.
>> And the WAL logging is provided by xact.c's own WAL logging - it
>> always does the corresponding committs calls.
>
> The code path is unused.

No, it is not. It can be called by extensions.

> We'd better make the XLOG record mandatory if tracking is enabled, as
> this information is useful on standbys as well.

Did you read what I wrote? To quote: "And the WAL logging is provided by
xact.c's own WAL logging - it always does the corresponding committs
calls."

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] What exactly is our CRC algorithm?
On 10/27/2014 06:02 PM, Heikki Linnakangas wrote:
> I came up with the attached patches. They do three things:
>
> 1. Get rid of the 64-bit CRC code. It's not used for anything, and
> hasn't been for years, so it doesn't seem worth spending any effort to
> fix it.
>
> 2. Switch to CRC-32C (Castagnoli) for WAL and other places that don't
> need to remain compatible across major versions.
>
> 3. Use the same lookup table for hstore and ltree as is used for the
> legacy almost-CRC-32 algorithm. The tables are identical, so we might
> as well.
>
> Any objections?

I hear none, so committed with some small fixes.

- Heikki
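For reference, the Castagnoli variant the WAL code switched to differs
from plain CRC-32 only in its polynomial (reflected form 0x82F63B78).
Below is a bit-at-a-time sketch, not the actual pg_crc code: real
implementations use a lookup table (the slice-by-N idea discussed
later in the thread) or the SSE4.2 crc32 instruction, all of which
compute the same value:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Reference CRC-32C (Castagnoli): reflected polynomial 0x82F63B78,
 * initial value and final XOR of 0xFFFFFFFF.  One bit per loop
 * iteration, so correct but slow.
 */
static uint32_t
crc32c(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFF;

    while (len--)
    {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0x82F63B78 & -(crc & 1));
    }
    return crc ^ 0xFFFFFFFF;
}
```

The standard check value for CRC-32C is 0xE3069283 over the ASCII bytes
"123456789", which makes a handy self-test for any faster variant.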
Re: [HACKERS] BRIN indexes - TRAP: BadArgument
On Mon, Nov 3, 2014 at 2:18 PM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> So here's v21. I also attach a partial diff from v20, just in case
> anyone wants to give it a look.

This needs a bump to 1.3, or the extension won't install:
contrib/pageinspect/pageinspect.control

During crash recovery, I am getting a segfault:

    #0  0x0067ee35 in LWLockRelease (lock=0x0) at lwlock.c:1161
    #1  0x00664f4a in UnlockReleaseBuffer (buffer=0) at bufmgr.c:2888
    #2  0x00465a88 in brin_xlog_revmap_extend (lsn=<value optimized out>,
        record=<value optimized out>) at brin_xlog.c:261
    #3  brin_redo (lsn=<value optimized out>, record=<value optimized out>)
        at brin_xlog.c:284
    #4  0x004ce505 in StartupXLOG () at xlog.c:6795

I failed to preserve the data directory; I'll try to repeat this later
this week if needed.

Cheers,

Jeff
[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Mon, 3 Nov 2014 22:55:02 -0200 от Fabrízio de Royes Mello
<fabriziome...@gmail.com>:
> On Mon, Nov 3, 2014 at 7:25 PM, Alexey Vasiliev <leopard...@inbox.ru> wrote:
>> Should I change my patch and send it by email? And also, as I
>> understand, I should change the message ID for
>> https://commitfest.postgresql.org/action/patch_view?id=1636 , isn't it?
>
> You should just send another version of your patch by email and add a
> new comment to the commitfest using the "Patch" comment type.

Added new patch.

> And you should add the patch to an open commitfest, not to one in
> progress. The next open one is 2014-12 [1].

Moved. Thanks.

> --
> Fabrízio de Royes Mello
> Consultoria/Coaching PostgreSQL
> Timbira: http://www.timbira.com.br
> Blog: http://fabriziomello.github.io
> Linkedin: http://br.linkedin.com/in/fabriziomello
> Twitter: http://twitter.com/fabriziomello
> Github: http://github.com/fabriziomello

--
Alexey Vasiliev
Re: [HACKERS] Sequence Access Method WIP
On 10/13/2014 01:01 PM, Petr Jelinek wrote:
> Hi,
>
> I rewrote the patch with a different API along the lines of what was
> discussed.

Thanks, that's better. It would be good to see an alternative seqam
implement this API, to see how it really works. The "local" one is too
dummy to expose any possible issues.

> The API now consists of the following functions:
>
> sequence_alloc - allocating a range of new values
> The function receives the sequence relation, current value, number of
> requested values, amdata and relevant sequence options like min/max,
> and returns new amdata, the new current value, the number of values
> allocated, and also whether it needs a WAL write (that should be
> returned if amdata has changed, plus other reasons the AM might have
> to force the WAL update).
>
> sequence_setval - notification that setval is happening
> This function gets the sequence relation, previous value and new value
> plus the amdata, and returns amdata (I can imagine some complex
> sequence AMs will want to throw an error that setval can't be done on
> them).
>
> sequence_request_update/sequence_update - used for background processing
> Basically the AM can call sequence_request_update, and the backend will
> then call the sequence_update method of the AM with current amdata and
> will write the updated amdata to disk.
>
> sequence_seqparams - function to process/validate the standard sequence
> options like start position, min/max, increment by etc. in the AM; it's
> called in addition to the standard processing.
>
> sequence_reloptions - this is the only thing that remained unchanged
> from the previous patch; it's meant to pass custom options to the AM.
>
> Only the alloc and reloptions methods are required (and implemented by
> the "local" AM). The caching, xlog writing, updating the page, etc. are
> handled by the backend; the AM does not see the tuple at all.
> I decided to not pass even the struct around and just pass the relevant
> options, because I think if we want to abstract the storage properly
> then the AM should not care about how pg_sequence looks at all, even if
> it means that the sequence_alloc parameter list is a bit long.

Hmm. The division of labour between the seqam and commands/sequence.c
still feels a bit funny. sequence.c keeps track of how many values have
been WAL-logged, and thus are usable immediately, but we still call
sequence_alloc even when using up those already WAL-logged values. If
you think of using this for something like a centralized sequence server
in a replication cluster, you certainly don't want to make a call to the
remote server for every value - you'll want to cache them.

With the local seqam, there are two levels of caching. Each backend
caches some values (per the CACHE option in CREATE SEQUENCE). In
addition to that, the server WAL-logs 32 values at a time. If you have a
remote seqam, it would most likely add a third cache, but it would
interact in strange ways with the second cache.

Considering a non-local seqam, the locking is also a bit strange. The
server keeps the sequence page locked throughout nextval(). But if the
actual state of the sequence is maintained elsewhere, there's no need to
serialize the calls to the remote allocator, i.e. the sequence_alloc()
calls.

I'm not exactly sure what to do about that. One option is to completely
move the maintenance of the current value, i.e. sequence.last_value, to
the seqam. That makes sense from an abstraction point of view. For
example with a remote server managing the sequence, storing the current
value in the local catalog table makes no sense as it's always going to
be out-of-date. The local seqam would store it as part of the AM-private
data. However, you would need to move the responsibility of locking and
WAL-logging to the seqam. Maybe that's OK, but we'll need to provide an
API that the seqam can call to do that.
Perhaps just let the seqam call heap_inplace_update on the sequence
relation.

> For the amdata handling (which is the AM's private data variable) the
> API assumes that (Datum) 0 is NULL; this seems to work well for
> reloptions, so it should work here also, and it simplifies things a
> little compared to passing pointers to pointers around and making sure
> everything is allocated, etc.
>
> Sadly, the fact that amdata is not fixed-size and can be NULL made the
> page updates of the sequence relation quite a bit more complex than
> they used to be.

It would be nice if the seqam could define exactly the columns it needs,
with any datatypes. There would be a set of common attributes:
sequence_name, start_value, cache_value, increment_by, max_value,
min_value, is_cycled. The local seqam would add last_value, log_cnt and
is_called to that. A remote seqam that calls out to some other server
might store the remote server's hostname etc. There could be a seqam
function that returns a TupleDesc with the required columns, for
example.

- Heikki
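The "WAL-logs 32 values at a time" batching Heikki describes can be
sketched as a tiny range allocator. This is an invented illustration of
the sequence_alloc contract, not the patch's actual code: the AM hands
out values one at a time but only requests a WAL write when the
pre-reserved batch is exhausted:

```c
#include <assert.h>
#include <stdint.h>

/* How many values get reserved ("WAL-logged") ahead per batch */
#define SEQ_LOG_VALS 32

/* Hypothetical AM-private state */
typedef struct
{
    int64_t last_value;    /* last value handed to a caller */
    int64_t reserved_upto; /* values below this are already logged */
    int64_t increment;
} SeqState;

/*
 * Sketch of sequence_alloc(): most calls are served from the reserved
 * range with no WAL activity; crossing the range boundary reserves a
 * new batch and asks the caller for exactly one WAL record.
 */
static int64_t
seq_alloc(SeqState *s, int *need_wal)
{
    s->last_value += s->increment;
    *need_wal = 0;
    if (s->last_value >= s->reserved_upto)
    {
        s->reserved_upto = s->last_value + SEQ_LOG_VALS * s->increment;
        *need_wal = 1;     /* one WAL record per 32 values */
    }
    return s->last_value;
}
```

A remote seqam would replace the reservation step with a network call to
the sequence server, which is exactly where a third cache layer - and
the interaction Heikki worries about - would appear.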
Re: [HACKERS] tracking commit timestamps
Michael Paquier wrote:
> I'm still at -1 for that. You mentioned that there is perhaps no reason
> to delay a decision on this matter, but IMO there is no reason to rush
> into doing something we may regret either. And I am not the only one on
> this thread expressing concern about this extra data thingy.
>
> If this extra data field is going to be used to identify from which
> node a commit comes, then it is another feature than what is written on
> the subject of this thread. In that case let's discuss it in the thread
> dedicated to replication identifiers, or come up with an extra patch
> once the feature for commit timestamps is done.

Introducing the extra data field in a later patch would mean an on-disk
representation change, i.e. pg_upgrade trouble.

--
 Álvaro Herrera                    http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] TAP test breakage on MacOS X
On Mon, Nov 3, 2014 at 4:44 PM, Peter Eisentraut <pete...@gmx.net> wrote:
> On 10/29/14 8:42 AM, Robert Haas wrote:
>> I'm sympathetic to that line of reasoning, but I really think that if
>> you want to keep this infrastructure, it needs to be made portable.
>
> Let me clarify that this was my intention. I have looked at many test
> frameworks, many of which are much nicer than what we have, but the
> portability and dependency implications for this project would have
> been between shocking and outrageous. I settled for what I felt was the
> absolute minimum: Perl + IPC::Run.
>
> It was only later on that I learned that 1) subtests don't work in Perl
> 5.10, and 2) subtests are broken in Perl 5.12. So we removed the use of
> subtests, and now we are back at the baseline I started with.

Thanks. At last check, which I think was approximately last week, the
tests were running and passing on my machine.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] What exactly is our CRC algorithm?
On Tue, Nov 4, 2014 at 4:47 AM, Heikki Linnakangas
<hlinnakan...@vmware.com> wrote:
> I hear none, so committed with some small fixes.

Are you going to get the slice-by-N stuff working next, to speed this
up?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] tracking commit timestamps
On 2014-11-04 10:01:00 -0300, Alvaro Herrera wrote:
> Michael Paquier wrote:
>> I'm still at -1 for that. You mentioned that there is perhaps no
>> reason to delay a decision on this matter, but IMO there is no reason
>> to rush into doing something we may regret either. And I am not the
>> only one on this thread expressing concern about this extra data
>> thingy. If this extra data field is going to be used to identify from
>> which node a commit comes, then it is another feature than what is
>> written on the subject of this thread. In that case let's discuss it
>> in the thread dedicated to replication identifiers, or come up with an
>> extra patch once the feature for commit timestamps is done.
>
> Introducing the extra data field in a later patch would mean an on-disk
> representation change, i.e. pg_upgrade trouble.

It's also simply not necessarily related to replication identifiers.
This is useful whether replication identifiers make it in or not. It
allows implementing something like replication identifiers outside of
core (albeit with a hefty overhead in OLTP workloads).

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
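Alvaro's pg_upgrade point can be made concrete with a toy layout
calculation. The struct names and the 4-byte extradata width here are
invented for illustration, not taken from the patch: once fixed-width
per-transaction entries are packed into SLRU pages, widening the entry
later moves every transaction's slot, so old data files become
unreadable without conversion:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-transaction entry layouts */
typedef struct { int64_t commit_time; } EntryV1;                       /* ts only */
typedef struct { int64_t commit_time; uint32_t extradata; } EntryV2;   /* ts + extra */

/* Entries that fit on one 8 kB SLRU page under a given layout */
static int
entries_per_page(int entry_size)
{
    return 8192 / entry_size;
}
```

Because an xid is mapped to (page, offset) by dividing by
entries_per_page, the two layouts put the same xid in different places
on disk - hence shipping extradata from the start, even if unused,
avoids a format break later.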
Re: [HACKERS] What exactly is our CRC algorithm?
On 2014-11-04 08:21:13 -0500, Robert Haas wrote:
> On Tue, Nov 4, 2014 at 4:47 AM, Heikki Linnakangas
> <hlinnakan...@vmware.com> wrote:
>> I hear none, so committed with some small fixes.
>
> Are you going to get the slice-by-N stuff working next, to speed this
> up?

I don't plan to do anything serious with it, but I've hacked up the CRC
code to use the hardware instruction. The results are quite good - CRC
vanishes entirely from the profile for most workloads. It's still
visible for bulk copy, but that's pretty much it.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Hi,

On 2014-11-03 14:04:00 +0300, Alexey Vasiliev wrote:
> * What the patch does in a short paragraph: This patch adds the option
> recovery_timeout, which controls the wait time after restore_command
> returns a nonzero status code. Right now the default value is 5
> seconds. This is useful if I am restoring WAL logs from some external
> storage (like AWS S3), and no matter what, the slave database will lag
> behind the master. The problem is that each request to AWS S3 costs
> money, so for N nodes which try to get the next WAL log every 5
> seconds the price will be higher than for, say, every 30 seconds.
>
> Before, I did this as:
>
>     if ! (/usr/local/bin/envdir /etc/wal-e.d/env /usr/local/bin/wal-e
>         wal-fetch "%f" "%p"); then sleep 60; fi
>
> But in this case database restart/stop is slower.

Without saying that the feature is unnecessary: wouldn't this better be
solved by using streaming rep most of the time?

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices
On Tue, Nov 04, 2014 at 11:44:22AM +0900, Michael Paquier wrote:
> On Sun, Nov 2, 2014 at 2:30 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> In the case of hash indexes, because we still have to have the hash
>> opclasses in core, there's no way that it could be pushed out as an
>> extension module even if we otherwise had full support for AMs as
>> extensions. So what I hear you proposing is "let's break this so
>> thoroughly that it *can't* be fixed". I'm not on board with that. I
>> think the WARNING will do just fine to discourage novices who are not
>> familiar with the state of the hash AM. In the meantime, we could push
>> forward with the idea of making hash indexes automatically unlogged,
>> so that recovering from a crash wouldn't be quite so messy/dangerous.
>
> There is as well another way: finally support WAL-logging for hash
> indexes.

+1

Ken
Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices
Feng Tian <ft...@vitessedata.com> wrote:
> Performance difference between money and numeric is *HUGE*. For TPC-H
> Q1, the performance difference is 5x for stock postgres, and ~20x for
> Vitesse. Stock postgres, on my laptop, TPC-H 1G, Q1: using the money
> type takes ~9s; using numeric(15, 2) takes ~53s.
>
>> test=# do $$ begin perform sum('1.01'::numeric) from
>> generate_series(1,1000); end; $$;
>
> This may not reflect the difference between the two data types. One
> aggregate is not where most of the time is spent. TPC-H Q1 has much
> more computation.

That's why I gave the count(*) baseline. If you subtract that from the
sum() timings for money and numeric, numeric takes 5x as long as money
in my quick-and-dirty benchmark. I have no problem believing that some
applications can see that level of performance hit from using numeric
instead of money.

It's a little surprising to me to see more than a factor of five
slowdown from using numeric instead of money; it leaves me a little
curious how that happens. Perhaps that workload is more conducive to
keeping those money amounts in registers and doing register arithmetic.

In any event, I'm against removing or re-deprecating the money type.
There are some problems with money; there are other problems with
numeric. If the docs are failing to make the trade-offs clear, we should
fix the docs. And we can always look at improving either or both types.
The fact that numeric is 5x to 20x slower than money in important
applications, combined with the fact that it performs far worse than a
similar type in another product, suggests that numeric could stand a
serious optimization pass.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices
On 11/02/2014 06:41 PM, Tom Lane wrote:
> I wrote:
>> Either way, requiring a dump/reload for upgrade is surely a better
>> answer for users of the type than just summarily screwing them.
>
> BTW, after reflecting a bit more I'm less than convinced that this
> datatype is completely useless. Even if you prefer to store currency
> values in numeric columns, casting to or from money provides a way to
> accept or emit values in whatever monetary format the LC_MONETARY
> locale setting specifies. That seems like a useful feature, and it's
> one you could not easily duplicate using to_char/to_number (not to
> mention that those functions aren't without major shortcomings of their
> own).

I personally find that one of the money type's misfeatures, since the
currency symbol varies not just per language/locale but also with which
currency you are working. For databases with multiple currencies, the
output format of money is just confusing.

Andreas
Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices
On Tue, Nov 4, 2014 at 8:16 AM, Kevin Grittner <kgri...@ymail.com> wrote:
> In any event, I'm against removing or re-deprecating the money type.
> There are some problems with money; there are other problems with
> numeric. If the docs are failing to make the trade-offs clear, we
> should fix the docs. And we can always look at improving either or
> both types. The fact that numeric is 5x to 20x slower than money in
> important applications, combined with the fact that it performs far
> worse than a similar type in another product, suggests that numeric
> could stand a serious optimization pass.

Money should stay. It is the only fixed-point, integer-based numeric
type that comes with the database. Should a non-locale-dependent
fixed-point type that offers the same performance come along, then you
would have a good argument to deprecate it.

numeric is definitely a dog, which is the fundamental problem IMNSHO.
I'm guessing optimization paths are going to revolve around utilizing
binary integer ops in cases where it's known to be safe to do so.

merlin
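The performance gap Kevin and Merlin are discussing comes down to
representation: money is stored as a 64-bit integer count of fractional
currency units, so summing it is one native CPU add per row, while
numeric works through variable-length digit groups. A minimal sketch of
the fixed-point approach (invented names, not PostgreSQL's cash type
itself):

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-point currency: $1.01 is stored as 101 cents */
typedef int64_t money_cents;

/*
 * Summing fixed-point values is a plain integer loop - the kind of
 * code the compiler can keep in registers and vectorize, which is why
 * money-style types beat digit-by-digit decimal arithmetic.
 */
static money_cents
sum_prices(const money_cents *vals, int n)
{
    money_cents total = 0;

    for (int i = 0; i < n; i++)
        total += vals[i];   /* single integer add per element */
    return total;
}
```

A numeric-style implementation has to loop over each value's digit
groups with carries instead, which is where the 5x-20x difference in
Q1-style aggregation plausibly comes from.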
Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices
On 11/03/2014 10:11 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: On 11/02/2014 11:41 AM, Tom Lane wrote: Nothing that I recall at the moment, but there is certainly plenty of stuff of dubious quality in there. I'd argue that chkpass, intagg, intarray, isn, spi, and xml2 are all in worse shape than the money type. Why are we holding on to xml2 again? IIRC, there's some xpath-related functionality in there that's not yet available in core (and needs some redesign before it'd ever get accepted into core, so there's not a real quick fix to be had). Yes, xpath_table is badly broken, as Robert Haas documented and I expanded on a while back. See for example http://www.postgresql.org/message-id/4d6bcc1e.3010...@dunslane.net I don't have any time of my own to work on this any time soon. But I know that it has users, so just throwing it out would upset some people. cheers andrew
[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Tue, 4 Nov 2014 14:41:56 +0100 from Andres Freund and...@2ndquadrant.com: Hi, On 2014-11-03 14:04:00 +0300, Alexey Vasiliev wrote: * What the patch does in a short paragraph: This patch adds an option, recovery_timeout, which controls how long to wait before retrying restore_command after it returns a nonzero status code. Right now the default value is 5 seconds. This is useful if I am using some external storage (like AWS S3) to restore WAL logs, and it does not matter that the slave database lags behind the master. The problem is that each request to AWS S3 costs money, which is why N nodes each trying to get the next WAL log every 5 seconds will cost more than, for example, every 30 seconds. Before, I did it this way: if ! (/usr/local/bin/envdir /etc/wal-e.d/env /usr/local/bin/wal-e wal-fetch %f %p); then sleep 60; fi . But in this case database restart/stop is slower. Without saying that the feature is unnecessary, wouldn't this better be solved by using streaming rep most of the time? But we don't need streaming rep. The master database does not need to know about the slaves (and there is no need to add this little overhead to the master). Slaves read WAL logs from S3, and the same S3 WAL logs are used as backups. -- Alexey Vasiliev
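The workaround quoted above can be generalized into a small restore_command wrapper. A minimal sketch under stated assumptions: retry_fetch and its arguments are illustrative names, not part of wal-e or PostgreSQL, and the recovery.conf line shows only where such a script would plug in:

```shell
# retry_fetch CMD DELAY
# Run the WAL-fetch command; if it fails (e.g. the segment is not
# archived yet), sleep DELAY seconds before reporting failure, so that
# PostgreSQL's immediate retry of restore_command does not poll a
# paid-per-request archive (like S3) every few seconds.
retry_fetch() {
    fetch_cmd=$1
    delay=$2
    if $fetch_cmd; then
        return 0
    fi
    sleep "$delay"
    return 1
}

# In recovery.conf one would point restore_command at a script built
# around this idea, e.g. (hypothetical path):
#   restore_command = '/usr/local/bin/fetch_wal.sh %f %p'
```

The patch under discussion moves this sleep into the server itself, which avoids the drawback Alexey mentions: a shell-side sleep delays shutdown/restart of the standby.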
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
On 7/11/14 4:36 AM, Ali Akbar wrote: But I found a bug in libxml2's code: because we call xmlCopyNode with 1 as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode recursively via xmlStaticCopyNodeList. And xmlStaticCopyNodeList doesn't check whether the return of xmlStaticCopyNode is NULL. So if someday the recursion returns NULL (maybe because of memory exhaustion), it will SEGFAULT. Knowing this bug is in libxml2's code, is this patch still acceptable? This problem was fixed in libxml2 2.9.2 (see https://bugzilla.gnome.org/show_bug.cgi?id=708681).
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
On 7/8/14 6:03 AM, Ali Akbar wrote: If we put 1 as the parameter, the resulting Node will link to the existing children. In this case, the namespace problem for the children might still exist. If any of the children uses different namespace(s) than the parent, the namespace definition will not be copied correctly. This analysis might very well be right, but I think we should prove it with a test case.
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
On 10/6/14 10:24 PM, Ali Akbar wrote: While reviewing the patch myself, I spotted some formatting problems in the code. Fixed in this v5 patch. Also, this patch uses context patch format (in the first versions, because of the small modification, context patch format obfuscated the changes; after the reimplementation this isn't the case anymore). I think the problem this patch is addressing is real, and your approach is sound, but I'd ask you to go back to the xmlCopyNode() version, and try to add a test case for why the second argument = 1 is necessary. I don't see any other problems.
Re: [HACKERS] GIN pageinspect functions
On Tue, Oct 7, 2014 at 10:33 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Some time ago, when debugging a GIN bug, I wrote these pageinspect functions to inspect GIN indexes. They were very useful; we should add them. I think these functions will be quite useful for debugging purposes, and we already have similar functions for another index type (btree). A few suggestions for the patch:

1. Documentation seems to be missing; the other APIs exposed via pageinspect are documented at: http://www.postgresql.org/docs/devel/static/pageinspect.html

2.
+CREATE FUNCTION gin_metapage(IN page bytea,
+    OUT pending_head bigint,
+    OUT pending_tail bigint,
+    OUT tail_free_size int4,
+    OUT n_pending_pages bigint,
+    OUT n_pending_tuples bigint,
+    OUT n_total_pages bigint,
+    OUT n_entry_pages bigint,
+    OUT n_data_pages bigint,
+    OUT n_entries bigint,
+    OUT version int4)
+AS 'MODULE_PATHNAME', 'gin_metapage'
+LANGUAGE C STRICT;

a. Isn't it better to name the function gin_metap(..), similar to the existing function bt_metap(..)?
b. Can this function have a similar signature to bt_metap(), which means it should take relname as input?

3. Can the gin_dataleafpage() API have a similar name and signature to the bt_page_items() API exposed for btree?

4. Can we have a better name for gin_pageopaq (the other API names in this module are self-explanatory)?

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
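For comparison, the existing btree inspection API that the review suggests mirroring takes the index name rather than a raw page image. A psql sketch of that convention (columns abbreviated from pageinspect's documented bt_metap output; the table and index names are invented for illustration):

```sql
-- The btree convention suggested as a model for the GIN functions:
-- pass the index's relname and let the function read the metapage.
CREATE EXTENSION IF NOT EXISTS pageinspect;

CREATE TABLE t (x int);
CREATE INDEX t_x_idx ON t (x);

SELECT magic, version, root FROM bt_metap('t_x_idx');
```

The proposed gin_metapage(page bytea, ...) instead requires the caller to fetch the page with get_raw_page() first, which is the usability gap suggestion 2b points at.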
Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices
Craig Ringer cr...@2ndquadrant.com writes: On 11/03/2014 03:41 AM, Tom Lane wrote: Nothing that I recall at the moment, but there is certainly plenty of stuff of dubious quality in there. I'd argue that chkpass, intagg, intarray, isn, spi, and xml2 are all in worse shape than the money type. What's wrong with intarray? The single biggest practical problem with it is that it creates new definitions of the @> and <@ operators, which cause confusion with the core operators of those names. For example, we've repeatedly seen bug reports along the lines of "why does this query not use this index" because of people trying to use the contrib @> operator with a core GIN index or vice versa. (In fairness, I think intarray might be older than the core operators, but that doesn't make this less of a problem.) Another thing that grates on me is the mostly-arbitrary restriction to non-null-containing arrays. That's an implementation artifact rather than something semantically required by the operations. More generally, it seems like a grab bag of not terribly well designed features, and the features that do seem well designed seem like they ought to be more generic than just for int4 arrays. So to me it feels like proof-of-concept experimentation rather than a production-grade thing that we could feel good about moving into core. regards, tom lane
[HACKERS] Re: [BUGS] BUG #11867: Strange behaviour with composite types after resetting database tablespace
On Mon, 2014-11-03 at 22:08 -0500, Tom Lane wrote: <spock>Fascinating.</spock> :-) I believe what is happening is: [ . . . ] This is not mission-critical for me but I'd be grateful for suggestions for work-arounds. I don't see any workaround that's much easier than fixing the bug. But what's your use case that involves flipping databases from one tablespace to another and then back again within the life expectancy of unused shared-buffers pages? It doesn't seem terribly surprising that nobody noticed this before ... It's a completely unreal use case. I am working on a diff tool, and this was found as part of my test suite: build db x, drop db x, build db y, apply diff y-x, compare with original x, apply diff x-y, compare with original y. So this is a fairly minor inconvenience for me. Marc
Re: [HACKERS] alter user set local_preload_libraries.
On 10/9/14 1:58 PM, Fujii Masao wrote: Also I think that it's useful to allow ALTER ROLE/DATABASE SET to set PGC_BACKEND and PGC_SU_BACKEND parameters. So, what about applying the attached patch? This patch allows that and changes the context of session_preload_libraries to PGC_SU_BACKEND. After looking through this again, I wonder whether there is any reason why ignore_system_indexes cannot be plain PGC_USERSET. With this change, we'd allow setting it via ALTER ROLE, but the access to pg_db_role_setting happens before it. So if there is anything unsafe about changing ignore_system_indexes, then this would be a problem, but I don't see anything.
Re: [JDBC] [HACKERS] Pipelining executions to postgresql server
On Nov 4, 2014, at 12:55 AM, Craig Ringer cr...@2ndquadrant.com wrote: On 11/04/2014 07:56 AM, Mikko Tiihonen wrote: I also think the async I/O is the way to go. Luckily that has already been done in the pgjdbc-ng (https://github.com/impossibl/pgjdbc-ng), built on top of netty java NIO library. It has quite good feature parity with the original pgjdbc driver. Huh, it does seem to now. The big omission, unless I'm blind, is support for COPY. (It also lacks support for JDBC3 and JDBC4, requiring JDK 7 and JDBC 4.1, but that's reasonable enough.) I focused on 4.1 compliance instead of legacy support. I am proud to say I believe pgjdbc-ng is 100% feature compliant for JDBC 4.1. Legacy support is still not, and likely will never be, planned. It does lack COPY support as I have never used it myself from the driver, only from tools outside the JVM. It now has LISTEN/NOTIFY by the looks, and of course it's built around binary protocol support. I freely admit I haven't looked at it too closely. I'm a tad frustrated by the parallel development of a different driver when the official driver has so much in the way of low hanging fruit to improve. I’ll only nibble this bait…. I outlined my reasons when I started the project. They were all valid then and still are now. My apologies for causing any frustration with this new path. It’s a lot cleaner, simpler and provides more JDBC features than the original driver because of it. Although I must say, without the original driver and its exhausting battery of unit tests, building a new driver would seem impossible. I have to say I like some aspects of how pgjdbc-ng is being done - in particular use of Maven and being built around non-blocking I/O. OTOH, the use of Google Guava I find pretty inexplicable in a JDBC driver, and since it hard-depends on JDK 7 I don't understand why Netty was used instead of the async / non-blocking features of the core JVM. 
That may simply be my inexperience with writing non-blocking socket I/O code on Java though. It confuses me as to why you consider using stable, well implemented, well tested and well cared for libraries as inexplicable. Just because we are building a “driver” means we have to write every line of code ourselves? No thanks. You can imagine our differences on this philosophy are one of the reasons why I consider pgjdbc-ng’s parallel development to be a godsend rather than hacking on the original code. Concerning Guava… A great library with an amazing number of classes that make everyday Java easier. The requirement for JDK 7 was chosen before Guava was considered, not because of it. Using it seemed obvious after that decision. Also, we have internalized the classes we use out of Guava to remove it as a dependency. This is more work to maintain on our part but makes it worth it when deploying a single JAR. Concerning Netty… All options were entertained at the beginning. The original version actually used a basic NIO socket. After I realized I had to basically write my own framework to work with this socket correctly, I searched for an alternative and found Netty. The deciding factor was that implementing SSL on top of the NIO API was considered next to impossible to get right, according to all prevailing wisdom at the time. Whereas with Netty, SSL support is basically a single-line change. I'm also concerned about how it'll handle new JDBC versions, as it seems to lack the JDBC interface abstraction of the core driver. My plan is to handle adding support for 4.2 and beyond by using a Maven based conditional preprocessor. If that fails, or seems just too ugly, I’ll probably have to look at an abstract class based method like that of the original driver. I do not think the JDBC batch interface even allows doing updates to multiple tables when using prepared statements? Ah, I missed this before. That's correct. 
You get prepared statements _or_ multiple different statements. That's a more understandable reason to concoct a new API, and explains why you're not just solving the issues with batches. Though I still think you're going to have to fix the buffer management code before you do anything like this. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-jdbc mailing list (pgsql-j...@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-jdbc
Re: [HACKERS] [BUGS] BUG #11867: Strange behaviour with composite types after resetting database tablespace
I wrote: However: at no point in this sequence did we flush the original catalog blocks out of shared buffers. Therefore, a read of the database's pg_class or pg_type at this point is likely to find the previous states of those pages (pre-composite-type-drop) as valid, matching entries, so that we skip reading the up-to-date page contents back from disk. A quick fix for this would be to issue DropDatabaseBuffers during movedb() BTW, I wondered whether the exact same hazard didn't exist for ALTER TABLE SET TABLESPACE. At first glance it looks bad, because ATExecSetTableSpace doesn't forcibly drop old shared buffers for the moved table either. Closer examination says we're safe though: * The buffers will be dropped during smgrdounlink at end of transaction, so you could only be at risk if you moved a table, modified it, and moved it back in a single transaction. * A new relfilenode will be assigned during each movement. * When assigning a relfilenode during the move-back, we'd be certain to choose a relfilenode different from the original one, because the old relation still exists on-disk at this point. So it's safe as things stand; but this seems a bit, um, rickety. Should we insert a DropRelFileNodesAllBuffers call into ATExecSetTableSpace to make it safer? It's kind of annoying to have to scan the buffer pool twice, but I'm afraid that sometime in the future somebody will try to get clever about optimizing away the end-of-transaction buffer pool scan, and break this case. Or maybe I'm just worrying too much. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] alter user set local_preload_libraries.
Peter Eisentraut pete...@gmx.net writes: On 10/9/14 1:58 PM, Fujii Masao wrote: Also I think that it's useful to allow ALTER ROLE/DATABASE SET to set PGC_BACKEND and PGC_SU_BACKEND parameters. So, what about applying the attached patch? This patch allows that and changes the context of session_preload_libraries to PGC_SU_BACKEND. After looking through this again, I wonder whether there is any reason why ignore_system_indexes cannot be plain PGC_USERSET. With this change, we'd allow setting it via ALTER ROLE, but the access to pg_db_role_setting happens before it. So if there is anything unsafe about changing ignore_system_indexes, then this would be a problem, but I don't see anything. There are some, um, interesting consequences of setting ignore_system_indexes; AFAIK, none that put data integrity at risk, but it can destroy performance in ways beyond the obvious ones. See for example the comments for get_mergejoin_opfamilies and get_ordering_op_properties. I don't particularly want to answer user bug reports about such behaviors, nor do I care to put any effort into making the behavior without system indexes smarter than it is now. (We should also consider the risk that there might be as-yet-unrecognized dependencies on catalog scan order that would amount to actual bugs.) So I'm -1 on any change that might make it look like we were encouraging people to use ignore_system_indexes except as a very last resort. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] superuser() shortcuts
On 10/27/14 11:40 AM, Stephen Frost wrote: * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: As I started looking at this, there are multiple other places where these types of error messages occur (opclasscmds.c, user.c, postinit.c, miscinit.c are just a few), not just around the changes in this patch. If we change them in one place, wouldn't it be best to change them in the rest? If that is the case, I'm afraid that might distract from the purpose of this patch. Perhaps, if we want to change them, then that should be submitted as a separate patch? Yeah. I'm just saying that maybe this patch should adopt whatever wording we agree to, not that we need to change other places. On the other hand, since so many other places have adopted the different wording, maybe there's a reason for it and if so, does anybody know what it is. But I have to say that it does look inconsistent to me. Updated patch attached. Comments welcome. The changes in src/backend/commands/alter.c src/backend/commands/foreigncmds.c src/backend/commands/tablecmds.c src/backend/commands/typecmds.c are OK, because the superuser() calls are redundant, and cleaning this up is clearly in line with planned future work. I suggest moving the rest of the changes into separate patches. The error message wording changes might well be worth considering, but there are tons more must be $someone to do $something error messages, and changing two of them isn't making anything more or less consistent. The ha*_something_privilege() changes are also not very consistent. We already have have_createrole_privilege(), which does include a superuser check, and you add has_replication_privilege() with a superuser check, but has_catupdate_privilege() and has_inherit_privilege() don't include a superuser check. That's clearly a mess. Btw., why rename have_createrole_privilege()? Also, your patch has spaces between tabs. Check for whitespace errors with git. 
[HACKERS] get_cast_func syscache utility function
here's a patch for a utility function to look up the cast function for a from/to pair of types, as recently suggested by Alvaro. Although it only contains one use (in json.c), the upcoming jsonb generators would also use it twice. I'd like to get this committed fairly quickly so I can prepare an updated patch for the jsonb generators. cheers andrew
Re: [HACKERS] get_cast_func syscache utility function
missing patch Regards Pavel 2014-11-04 18:57 GMT+01:00 Andrew Dunstan and...@dunslane.net: here's a patch for a utility function to look up the cast function for a from/to pair of types, as recently suggested by Alvaro. Although it only contains one use (in json.c), the upcoming jsonb generators would also use it twice. I'd like to get this committed fairly quickly so I can prepare an updated patch for the jsonb generators. cheers andrew
Re: [HACKERS] get_cast_func syscache utility function
On 11/04/2014 12:57 PM, Andrew Dunstan wrote: here's a patch for a utility function to look up the cast function for a from/to pair of types, as recently suggested by Alvaro. Although it only contains one use (in json.c), the upcoming jsonb generators would also use it twice. I'd like to get this committed fairly quickly so I can prepare an updated patch for the jsonb generators. This time with patch. cheers andrew

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index d2bf640..fe9cf83 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -1299,22 +1299,12 @@ json_categorize_type(Oid typoid,
 			/* but let's look for a cast to json, if it's not built-in */
 			if (typoid >= FirstNormalObjectId)
 			{
-				HeapTuple	tuple;
+				Oid			castfunc = get_cast_func(typoid, JSONOID);
 
-				tuple = SearchSysCache2(CASTSOURCETARGET,
-										ObjectIdGetDatum(typoid),
-										ObjectIdGetDatum(JSONOID));
-				if (HeapTupleIsValid(tuple))
+				if (OidIsValid(castfunc))
 				{
-					Form_pg_cast castForm = (Form_pg_cast) GETSTRUCT(tuple);
-
-					if (castForm->castmethod == COERCION_METHOD_FUNCTION)
-					{
-						*tcategory = JSONTYPE_CAST;
-						*outfuncoid = castForm->castfunc;
-					}
-
-					ReleaseSysCache(tuple);
+					*tcategory = JSONTYPE_CAST;
+					*outfuncoid = castfunc;
 				}
 			}
 		}
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 552e498..0da08fe 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -21,6 +21,7 @@
 #include "bootstrap/bootstrap.h"
 #include "catalog/pg_amop.h"
 #include "catalog/pg_amproc.h"
+#include "catalog/pg_cast.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_namespace.h"
@@ -2889,3 +2890,36 @@ get_range_subtype(Oid rangeOid)
 	else
 		return InvalidOid;
 }
+
+/*
+ *
+ * get_cast_func(fromtype, totype) -> funcoid
+ *
+ * find a cast function if any from fromtype to totype.
+ *
+ * return the Oid of the function found or InvalidOid otherwise.
+ */
+
+Oid
+get_cast_func(Oid from_type, Oid to_type)
+{
+	HeapTuple	tuple;
+	Oid			castfunc = InvalidOid;
+
+	tuple = SearchSysCache2(CASTSOURCETARGET,
+							ObjectIdGetDatum(from_type),
+							ObjectIdGetDatum(to_type));
+	if (HeapTupleIsValid(tuple))
+	{
+		Form_pg_cast castForm = (Form_pg_cast) GETSTRUCT(tuple);
+
+		if (castForm->castmethod == COERCION_METHOD_FUNCTION)
+		{
+			castfunc = castForm->castfunc;
+		}
+
+		ReleaseSysCache(tuple);
+	}
+
+	return castfunc;
+}
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 07d24d4..0291b96 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -151,6 +151,7 @@
 extern void free_attstatsslot(Oid atttype, float4 *numbers, int nnumbers);
 extern char *get_namespace_name(Oid nspid);
 extern Oid	get_range_subtype(Oid rangeOid);
+extern Oid	get_cast_func(Oid from_type, Oid to_type);
 
 #define type_is_array(typid)	(get_element_type(typid) != InvalidOid)
 /* type_is_array_domain accepts both plain arrays and domains over arrays */
Re: [HACKERS] COPY TO returning empty result with parallel ALTER TABLE
--On 3. November 2014 18:15:04 +0100 Sven Wegener sven.wege...@stealer.net wrote: I've checked git master and 9.x and all show the same behaviour. I came up with the patch below, which is against current git master. The patch modifies the COPY TO code to create a new snapshot, after acquiring the necessary locks on the source tables, so that it sees any modification committed by other backends. Well, I have the feeling that there's nothing wrong with it. The ALTER TABLE command has rewritten all tuples with its own XID, thus the current snapshot does not see these tuples anymore. I suppose that in SERIALIZABLE or REPEATABLE READ transaction isolation your proposal still doesn't return the tuples you'd like to see. -- Thanks Bernd
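Bernd's explanation can be illustrated with a toy model of snapshot visibility. This is purely illustrative Python, a drastic simplification of PostgreSQL's real visibility rules (which live in the server's MVCC code), with all names invented for the sketch:

```python
# Toy model: a tuple is visible to a snapshot only if the transaction
# that wrote it (its xmin) committed before the snapshot was taken.
# Here we treat every XID below the snapshot's XID as committed.
def visible(tuple_xmin: int, snapshot_xid: int) -> bool:
    return tuple_xmin < snapshot_xid

copy_snapshot = 100           # COPY took its snapshot, then waited on the lock
original_rows = [50, 60, 70]  # rows written long before the snapshot
rewritten_rows = [120] * 3    # same rows after ALTER TABLE rewrote them (XID 120)

# Before the rewrite, all rows are visible to COPY's snapshot:
assert all(visible(x, copy_snapshot) for x in original_rows)
# After the rewrite, none are -- hence the empty COPY result:
assert not any(visible(x, copy_snapshot) for x in rewritten_rows)
```

The proposed fix (take the snapshot only after the lock is acquired) makes COPY's snapshot XID exceed the rewriting transaction's XID, so the rewritten tuples become visible, matching what plain SELECT already does.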
Re: [HACKERS] get_cast_func syscache utility function
Andrew Dunstan and...@dunslane.net writes: here's a patch for a utility function to look up the cast function for a from/to pair of types, as recently suggested by Alvaro. Although it only contains one use (in json.c), the upcoming jsonb generators would also use it twice. I'd like to get this committed fairly quickly so I can prepare an updated patch for the jsonb generators. I'm not exactly convinced that this is an appropriate utility function, because casts for which a COERCION_METHOD_FUNCTION entry exists in pg_cast are by no means the whole universe of casts. I'm concerned that creating this function will encourage patch authors to blow off other possibilities when they should not. In the particular context at hand, it seems like you might be better advised to think about how to refactor json_categorize_type to be more helpful to your new use-case. A concrete example of what I'm worried about is that json_categorize_type already ignores the possibility of binary-compatible (WITHOUT FUNCTION) casts to json. Since it's eliminated the domain case earlier, that's perhaps not too horridly broken; but it'd be very easy for new uses of this get_cast_func() function to overlook such considerations. In short, I'd rather see this addressed through functions with slightly higher-level APIs that are capable of covering more cases. In most cases it'd be best if callers were using find_coercion_pathway() rather than taking shortcuts. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY TO returning empty result with parallel ALTER TABLE
Bernd Helmle maili...@oopsware.de writes: --On 3. November 2014 18:15:04 +0100 Sven Wegener sven.wege...@stealer.net wrote: I've check git master and 9.x and all show the same behaviour. I came up with the patch below, which is against curent git master. The patch modifies the COPY TO code to create a new snapshot, after acquiring the necessary locks on the source tables, so that it sees any modification commited by other backends. Well, i have the feeling that there's nothing wrong with it. The ALTER TABLE command has rewritten all tuples with its own XID, thus the current snapshot does not see these tuples anymore. I suppose that in SERIALIZABLE or REPEATABLE READ transaction isolation your proposal still doesn't return the tuples you'd like to see. Not sure. The OP's point is that in a SELECT, you do get unsurprising results, because a SELECT will acquire its execution snapshot after it's gotten AccessShareLock on the table. Arguably COPY should behave likewise. Or to be even more concrete, COPY (SELECT * FROM tab) TO ... probably already acts like he wants, so why isn't plain COPY equivalent to that? What concerns me more is whether there are other utility statements that have comparable issues. We should not fix just COPY if there are more cases. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY TO returning empty result with parallel ALTER TABLE
On 11/04/2014 01:51 PM, Tom Lane wrote: Bernd Helmle maili...@oopsware.de writes: --On 3. November 2014 18:15:04 +0100 Sven Wegener sven.wege...@stealer.net wrote: I've check git master and 9.x and all show the same behaviour. I came up with the patch below, which is against curent git master. The patch modifies the COPY TO code to create a new snapshot, after acquiring the necessary locks on the source tables, so that it sees any modification commited by other backends. Well, i have the feeling that there's nothing wrong with it. The ALTER TABLE command has rewritten all tuples with its own XID, thus the current snapshot does not see these tuples anymore. I suppose that in SERIALIZABLE or REPEATABLE READ transaction isolation your proposal still doesn't return the tuples you'd like to see. Not sure. The OP's point is that in a SELECT, you do get unsurprising results, because a SELECT will acquire its execution snapshot after it's gotten AccessShareLock on the table. Arguably COPY should behave likewise. Or to be even more concrete, COPY (SELECT * FROM tab) TO ... probably already acts like he wants, so why isn't plain COPY equivalent to that? Yes, that seems like an outright bug. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY TO returning empty result with parallel ALTER TABLE
On Tue, 4 Nov 2014, Bernd Helmle wrote: --On 3. November 2014 18:15:04 +0100 Sven Wegener sven.wege...@stealer.net wrote: I've checked git master and 9.x and all show the same behaviour. I came up with the patch below, which is against current git master. The patch modifies the COPY TO code to create a new snapshot, after acquiring the necessary locks on the source tables, so that it sees any modification committed by other backends. Well, I have the feeling that there's nothing wrong with it. The ALTER TABLE command has rewritten all tuples with its own XID, thus the current snapshot does not see these tuples anymore. I suppose that in SERIALIZABLE or REPEATABLE READ transaction isolation your proposal still doesn't return the tuples you'd like to see. No, but with REPEATABLE READ and SERIALIZABLE the plain SELECT case is currently broken too. The issue I'm fixing is that COPY (SELECT ...) TO and SELECT behave differently. So, how should ALTER TABLE behave transaction-wise? PostgreSQL claims transactional DDL support. As mentioned above, a parallel SELECT with SERIALIZABLE returns an empty result, but it also sees the schema change; at least the change shows up in the result tuple description. The change doesn't show up in pg_catalog, so that bit is handled correctly. The schema change transaction can be rolled back the way it is now, so it's kind of transactional, but other transactions seeing the schema change in query results is broken. The empty result might be fixed by just keeping the old XID of rewritten tuples, as the exclusive lock ALTER TABLE holds should be enough to make sure nobody is actively accessing the rewritten table. But I'm pondering whether there is a solution for the visible schema change that doesn't involve keeping the old data files around, or whether to just raise a serialization error. Coming back to my mentioned solution of setting the XID of rewritten tuples to the FrozenXID. 
That's broken as in the REPEATABLE READ/SERIALIZABLE case there might be tuples that should not be seen by older transactions and moving the tuples to the FrozenXID would make them visible. Sven -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY TO returning empty result with parallel ALTER TABLE
On Tue, 4 Nov 2014, Tom Lane wrote: Bernd Helmle maili...@oopsware.de writes: --On 3. November 2014 18:15:04 +0100 Sven Wegener sven.wege...@stealer.net wrote: I've checked git master and 9.x and all show the same behaviour. I came up with the patch below, which is against current git master. The patch modifies the COPY TO code to create a new snapshot, after acquiring the necessary locks on the source tables, so that it sees any modification committed by other backends. Well, I have the feeling that there's nothing wrong with it. The ALTER TABLE command has rewritten all tuples with its own XID, thus the current snapshot does not see these tuples anymore. I suppose that in SERIALIZABLE or REPEATABLE READ transaction isolation your proposal still doesn't return the tuples you'd like to see. Not sure. The OP's point is that in a SELECT, you do get unsurprising results, because a SELECT will acquire its execution snapshot after it's gotten AccessShareLock on the table. Arguably COPY should behave likewise. Or to be even more concrete, COPY (SELECT * FROM tab) TO ... probably already acts like he wants, so why isn't plain COPY equivalent to that? No, both plain COPY and COPY (SELECT ...) TO are broken. Sven -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices
On 11/04/2014 07:33 AM, Tom Lane wrote: More generally, it seems like a grab bag of not terribly well designed features, and the features that do seem well designed seem like they ought to be more generic than just for int4 arrays. So to me it feels like proof-of-concept experimentation rather than a production-grade thing that we could feel good about moving into core. Yah, as a user of intarray, I think it belongs as an external extension. Heck, I think it might get more attention as an external extension, and if it were external then folks could fork it and create versions that don't have operator conflicts. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] superuser() shortcuts
Thanks for looking at this patch. I suggest moving the rest of the changes into separate patches. Hmmm... perhaps the following? * superuser-cleanup - contains above mentioned superuser shortcuts only. * has_privilege-cleanup - contains has_*_privilege cleanup only. Would that also require a separate commitfest entry? The has_*_privilege() changes are also not very consistent. We already have have_createrole_privilege(), which does include a superuser check, and you add has_replication_privilege() with a superuser check, but has_catupdate_privilege() and has_inherit_privilege() don't include a superuser check. That's clearly a mess. Good catch. Though, according to the documentation, not even a superuser is allowed to bypass CATUPDATE. http://www.postgresql.org/docs/9.4/static/catalog-pg-authid.html. However, I can't think of a reason why inherit wouldn't need the superuser check. Obviously a superuser is considered a member of every role, but is there a reason that a superuser would not be allowed to bypass this? I only ask because it did not have a check previously, so I figure there might have been a good reason for it? Btw., why rename have_createrole_privilege()? Well, actually it wasn't necessarily a rename. It was a removal of that function altogether, as all it did was simply return the result of has_createrole_privilege. That seemed rather redundant and unnecessary, IMO. Also, your patch has spaces between tabs. Check for whitespace errors with git. Yikes. -Adam -- Adam Brightwell - adam.brightw...@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com
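For illustration, the consistent shape being asked for in these helpers might look like the following sketch (hypothetical signature; the real PostgreSQL functions take a role OID and consult the catalogs, and per the docs CATUPDATE is deliberately not bypassed by superuser):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical consistent shape for a has_*_privilege() helper:
 * the superuser check short-circuits first, then the specific role
 * attribute (e.g. rolreplication) is consulted. Parameters stand in
 * for the catalog lookups the real code would perform.
 */
static bool has_replication_privilege(bool is_superuser, bool rolreplication)
{
    if (is_superuser)
        return true;            /* superuser bypasses the attribute check */
    return rolreplication;
}
```

The inconsistency in the patch under review is exactly that some helpers include the short-circuit and others don't.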
Re: [BUGS] [HACKERS] COPY TO returning empty result with parallel ALTER TABLE
On 2014-11-04 13:51:23 -0500, Tom Lane wrote: Bernd Helmle maili...@oopsware.de writes: --On 3. November 2014 18:15:04 +0100 Sven Wegener sven.wege...@stealer.net wrote: I've checked git master and 9.x and all show the same behaviour. I came up with the patch below, which is against current git master. The patch modifies the COPY TO code to create a new snapshot, after acquiring the necessary locks on the source tables, so that it sees any modification committed by other backends. Well, I have the feeling that there's nothing wrong with it. The ALTER TABLE command has rewritten all tuples with its own XID, thus the current snapshot does not see these tuples anymore. I suppose that in SERIALIZABLE or REPEATABLE READ transaction isolation your proposal still doesn't return the tuples you'd like to see. Not sure. The OP's point is that in a SELECT, you do get unsurprising results, because a SELECT will acquire its execution snapshot after it's gotten AccessShareLock on the table. Arguably COPY should behave likewise. Or to be even more concrete, COPY (SELECT * FROM tab) TO ... probably already acts like he wants, so why isn't plain COPY equivalent to that? Even a plain SELECT essentially acts that way, if I recall correctly, if you use REPEATABLE READ+ and force a snapshot to be acquired beforehand. It's imo not very surprising. All ALTER TABLE rewrites just disregard visibility of existing tuples. Only CLUSTER/VACUUM FULL jump through the hoops needed to keep all the necessary tuples + ctid chains around. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL replay bugs
Michael Paquier wrote: On Mon, Jul 14, 2014 at 6:14 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Although I doubt the necessity of the flexibility given the current testing framework, I don't have a strong objection to it. Nevertheless, it would perhaps be appreciated if you put a notice in the README or somewhere. Hm, well... Fine, I added it in this updated series. Did this go anywhere? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [JDBC] [HACKERS] Pipelining executions to postgresql server
Kevin Wooten kd...@me.com wrote: On Nov 4, 2014, at 12:55 AM, Craig Ringer cr...@2ndquadrant.com wrote: I have to say I like some aspects of how pgjdbc-ng is being done - in particular the use of Maven and being built around non-blocking I/O. OTOH, the use of Google Guava I find pretty inexplicable in a JDBC driver, and since it hard-depends on JDK 7 I don't understand why Netty was used instead of the async / non-blocking features of the core JVM. That may simply be my inexperience with writing non-blocking socket I/O code on Java though. Java 6 has been EOL since 2011 and Java 7 is EOL in less than 6 months. I think that depending on Java 7, a version that soon should not even be used in production (without paying for extra support), can hardly be too hard a requirement. It confuses me as to why you consider using stable, well implemented, well tested and well cared for libraries as inexplicable. Just because we are building a “driver” means we have to write every line of code ourselves? No thanks. Embedding parts of other projects into the code base during the build with renamed packages is nowadays common practice in Java projects: Spring does it, Elasticsearch embeds the whole of Netty and more, and even the JRE embeds for example Xerces and ASM. It might not be the optimal solution, but it is still definitely better than writing everything from scratch or copy-pasting code from other projects. If pgjdbc-ng provides both a thin Maven version (with external versioned dependencies) and a fat jar with the external versions repackaged inside, users can choose either the old way (just drop a JDBC jar into the project) or the new way, with their chosen build tool automatically managing the dependencies. -Mikko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On 10/20/14 4:51 PM, Peter Eisentraut wrote: On 10/20/14 2:59 PM, Tom Lane wrote: What do we want to do about this? I think a minimum expectation would be for pg_basebackup to notice and complain when it's trying to create an unworkably long symlink entry, but it would be far better if we found a way to cope instead. Isn't it the backend that should error out before sending truncated file names? src/port/tar.c: /* Name 100 */ sprintf(&h[0], "%.99s", filename); Here are patches to address that. First, it reports errors when attempting to create a tar header that would truncate file or symlink names. Second, it works around the problem in the tests by creating a symlink from the short-name tempdir that we had set up for the Unix-socket directory case. The first patch can be backpatched to 9.3. The tar code before that is different and would need manual adjustments. If someone has a too-long tablespace path, I think they can work around that after this patch by creating a shorter symlink and updating the pg_tblspc symlinks to point there. From 16d5eb9514c89391e6c2361212363a7ac2f16e0b Mon Sep 17 00:00:00 2001 From: Peter Eisentraut peter_e@gmx.net Date: Tue, 4 Nov 2014 15:19:18 -0500 Subject: [PATCH 1/2] Error when creating names too long for tar format The tar format (at least the version we are using) does not support file names or symlink targets longer than 99 bytes. Until now, the tar creation code would silently truncate any names that are too long. (Its original application was pg_dump, where this never happens.) This creates problems when running base backups over the replication protocol. The most important problem is when a tablespace path is longer than 99 bytes, which will result in a truncated tablespace path being backed up. Less importantly, the basebackup protocol also promises to back up any other files it happens to find in the data directory, which would also lead to file name truncation if someone put a file with a long name in there.
Now both of these cases result in an error during the backup. --- src/backend/replication/basebackup.c | 21 ++++++++++++++++++++- src/include/pgtar.h | 10 +++++++++- src/port/tar.c | 10 +++++++++- 3 files changed, 38 insertions(+), 3 deletions(-) diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index fbcecbb..5835725 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -1234,11 +1234,30 @@ _tarWriteHeader(const char *filename, const char *linktarget, struct stat * statbuf) { char h[512]; + enum tarError rc; - tarCreateHeader(h, filename, linktarget, statbuf->st_size, + rc = tarCreateHeader(h, filename, linktarget, statbuf->st_size, statbuf->st_mode, statbuf->st_uid, statbuf->st_gid, statbuf->st_mtime); + switch (rc) + { + case TAR_OK: + break; + case TAR_NAME_TOO_LONG: + ereport(ERROR, + (errmsg("file name too long for tar format: \"%s\"", + filename))); + break; + case TAR_SYMLINK_TOO_LONG: + ereport(ERROR, + (errmsg("symbolic link target too long for tar format: file name \"%s\", target \"%s\"", + filename, linktarget))); + break; + default: + elog(ERROR, "unrecognized tar error: %d", rc); + } + pq_putmessage('d', h, 512); } diff --git a/src/include/pgtar.h b/src/include/pgtar.h index 2dd6f6b..6b6c1e1 100644 --- a/src/include/pgtar.h +++ b/src/include/pgtar.h @@ -11,5 +11,13 @@ * *- */ -extern void tarCreateHeader(char *h, const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime); + +enum tarError +{ + TAR_OK = 0, + TAR_NAME_TOO_LONG, + TAR_SYMLINK_TOO_LONG +}; + +extern enum tarError tarCreateHeader(char *h, const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime); extern int tarChecksum(char *header); diff --git a/src/port/tar.c b/src/port/tar.c index 09fd6c1..74168e8 100644 --- a/src/port/tar.c +++ b/src/port/tar.c @@ -49,10 +49,16 @@ tarChecksum(char *header) * must always have space for 512 characters, which is a requirement by *
the tar format. */ -void +enum tarError tarCreateHeader(char *h, const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime) { + if (strlen(filename) > 99) + return TAR_NAME_TOO_LONG; + + if (linktarget && strlen(linktarget) > 99) + return TAR_SYMLINK_TOO_LONG; + /* * Note: most of the fields in a tar header are not supposed to be * null-terminated. We use sprintf, which will write a null after the @@ -141,4 +147,6 @@ tarCreateHeader(char *h, const char *filename, const char *linktarget, * 6 digits, a space, and a null, which is legal per POSIX. */ sprintf(&h[148], "%06o ", tarChecksum(h)); + + return TAR_OK; } -- 2.1.3 From 25f6daded3a4be8182af2d185f4826708f7a1b76 Mon
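The core of the patch's check can be condensed into a standalone sketch (the function name is mine; the real tarCreateHeader() goes on to fill in the rest of the 512-byte ustar header): the ustar name and linkname fields are 100 bytes wide, and since the code writes them with sprintf, which appends a NUL, only 99 bytes are usable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum tarError
{
    TAR_OK = 0,
    TAR_NAME_TOO_LONG,
    TAR_SYMLINK_TOO_LONG
};

/*
 * Reject names that would be silently truncated when stored in the
 * 100-byte ustar name/linkname fields (99 usable bytes + NUL).
 * linktarget may be NULL for regular files.
 */
static enum tarError check_tar_names(const char *filename,
                                     const char *linktarget)
{
    if (strlen(filename) > 99)
        return TAR_NAME_TOO_LONG;
    if (linktarget && strlen(linktarget) > 99)
        return TAR_SYMLINK_TOO_LONG;
    return TAR_OK;
}
```

A 100-byte tablespace path, for example, now fails up front instead of producing a corrupt base backup.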
Re: [HACKERS] Sequence Access Method WIP
On 04/11/14 13:11, Heikki Linnakangas wrote: On 10/13/2014 01:01 PM, Petr Jelinek wrote: Hi, I rewrote the patch with a different API along the lines of what was discussed. Thanks, that's better. It would be good to see an alternative seqam implementing this API, to see how it really works. The local one is too much of a dummy to expose any possible issues. Yeah, I don't know what that would be; it's hard for me to find something that would serve purely demonstration purposes. I guess I could port BDR sequences to this if it would help (once we have a bit more solid agreement that the proposed API at least theoretically seems ok, so that I don't have to rewrite it 10 times, if at all possible). ... Only the alloc and reloptions methods are required (and implemented by the local AM). The caching, xlog writing, updating the page, etc. are handled by the backend; the AM does not see the tuple at all. I decided not to pass even the struct around and just pass the relevant options, because I think if we want to abstract the storage properly then the AM should not care about what pg_sequence looks like at all, even if it means that the sequence_alloc parameter list is a bit long. Hmm. The division of labour between the seqam and commands/sequence.c still feels a bit funny. sequence.c keeps track of how many values have been WAL-logged, and thus usable immediately, but we still call sequence_alloc even when using up those already WAL-logged values. If you think of using this for something like a centralized sequence server in a replication cluster, you certainly don't want to make a call to the remote server for every value - you'll want to cache them. With the local seqam, there are two levels of caching. Each backend caches some values (per the CACHE value option in CREATE SEQUENCE). In addition to that, the server WAL-logs 32 values at a time. If you have a remote seqam, it would most likely add a third cache, but it would interact in strange ways with the second cache.
Considering a non-local seqam, the locking is also a bit strange. The server keeps the sequence page locked throughout nextval(). But if the actual state of the sequence is maintained elsewhere, there's no need to serialize the calls to the remote allocator, i.e. the sequence_alloc() calls. I'm not exactly sure what to do about that. One option is to completely move the maintenance of the current value, i.e. sequence.last_value, to the seqam. That makes sense from an abstraction point of view. For example with a remote server managing the sequence, storing the current value in the local catalog table makes no sense as it's always going to be out-of-date. The local seqam would store it as part of the am-private data. However, you would need to move the responsibility of locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to provide an API that the seqam can call to do that. Perhaps just let the seqam call heap_inplace_update on the sequence relation. My idea of how this works is: sequence_next handles the allocation and the level-2 caching (the WAL-logged cache) via amdata if it supports it, or returns a single value if it doesn't - then the WAL will always just record the one value and there will basically be no level-2 cache, since it is sequence_next that controls how much will be WAL-logged; what the backend asks for is just a suggestion. The third-level caching, as you say, should be solved by the sequence_request_update and sequence_update callbacks - that will enable the sequence AM to handle this kind of caching asynchronously without blocking sequence_next unnecessarily. That way it's possible to implement many different strategies, from no cache at all to three levels of caching, with and without blocking of calls to sequence_next.
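As a concrete (and purely hypothetical - the API is still under discussion) illustration of the allocation-callback shape described above, the AM could return both the next value and the size of the reservation it wants WAL-logged, so that the AM, not the backend, controls the second-level cache:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative result of a sequence AM allocation call: the first value
 * to hand out plus how many consecutive values were reserved (i.e. how
 * much the backend may consume before calling back into the AM).
 * Names and fields are mine, not the proposed API.
 */
typedef struct SeqAllocResult
{
    int64_t value;              /* first value to hand to the caller */
    int64_t nvalues;            /* size of the reserved/WAL-logged batch */
} SeqAllocResult;

/* A "local"-style allocator that reserves log_cnt values per call. */
static SeqAllocResult local_seq_alloc(int64_t last_value, int64_t increment,
                                      int64_t log_cnt)
{
    SeqAllocResult r;
    r.value = last_value + increment;
    r.nvalues = log_cnt;
    return r;
}
```

A remote AM could instead return a batch fetched asynchronously from a central server, which is where the sequence_request_update/sequence_update callbacks would come in.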
For the amdata handling (which is the AM's private data variable) the API assumes that (Datum) 0 is NULL; this seems to work well for reloptions, so it should work here also, and it simplifies things a little compared to passing pointers to pointers around and making sure everything is allocated, etc. Sadly, the fact that amdata is not fixed-size and can be NULL made the page updates of the sequence relation quite a bit more complex than they used to be. It would be nice if the seqam could define exactly the columns it needs, with any datatypes. There would be a set of common attributes: sequence_name, start_value, cache_value, increment_by, max_value, min_value, is_cycled. The local seqam would add last_value, log_cnt and is_called to that. A remote seqam that calls out to some other server might store the remote server's hostname, etc. There could be a seqam function that returns a TupleDesc with the required columns, for example. Wouldn't that somewhat bloat the catalog if we had a new catalog table for each sequence AM? It also does not really solve the issue of amdata being dynamically sized. I think just a dynamic field where the AM stores whatever it wants is enough (i.e. the current solution), it's
Re: [HACKERS] tracking commit timestamps
On 04/11/14 09:05, Andres Freund wrote: On 2014-11-02 19:27:25 +0100, Petr Jelinek wrote: Well, Michael has a point that the extradata is pretty much useless currently, perhaps it would help to add the interface to set extradata? Only accessible via C and useless aren't the same thing. But sure, add it. I actually meant a nicer C API - one that will make it possible to say "for this transaction, use this extradata" (or "for all transactions from now on done by this backend, use this extradata"), instead of the current API where you have to overwrite what RecordCommit already wrote. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL replay bugs
Michael Paquier wrote: On Mon, Jul 14, 2014 at 6:14 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Although I doubt the necessity of the flexibility given the current testing framework, I don't have a strong objection to it. Nevertheless, it would perhaps be appreciated if you put a notice in the README or somewhere. Hm, well... Fine, I added it in this updated series. FWIW I gave this a trial run and found I needed some tweaks to test.sh and the Makefile in order to make it work on VPATH; mainly replace ./ with `dirname $0` in test.sh in a couple of places, and something similar in the Makefile. Also you have $PG_ROOT_DIR somewhere, which doesn't work. Also you have the Makefile checking for -DBUFFER_CAPTURE exactly, but for some reason I used -DBUFFER_CAPTURE=1, which wasn't well received by your $(filter) stuff. Instead of checking CFLAGS it might make more sense to expose it as a read-only GUC and grep `postmaster -C buffer_capture` or similar. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
On 04/11/14 09:19, Michael Paquier wrote: On Sat, Nov 1, 2014 at 10:00 PM, Michael Paquier michael.paqu...@gmail.com mailto:michael.paqu...@gmail.com wrote: More comments: Here are also more comments about the code that I found while focusing on committs.c: 1) When the GUC is not enabled, and when the transaction ID provided is not in a correct range, a dummy value based on InvalidTransactionId (?!) is returned, like that: + /* Return empty if module not enabled */ + if (!commit_ts_enabled) + { + if (ts) + *ts = InvalidTransactionId; + if (data) + *data = (CommitExtraData) 0; + return; + } This leads to some incorrect results: =# select pg_get_transaction_committime('1'); pg_get_transaction_committime --- 2000-01-01 09:00:00+09 (1 row) Oh, I had this on my TODO and somehow forgot about it, I am leaning towards throwing an error when calling any committs get function while the tracking is disabled. Or for example: =# SELECT txid_current(); txid_current -- 1006 (1 row) =# select pg_get_transaction_committime('1006'); pg_get_transaction_committime --- 2014-11-04 14:54:37.589381+09 (1 row) =# select pg_get_transaction_committime('1007'); pg_get_transaction_committime --- 2000-01-01 09:00:00+09 (1 row) =# SELECT txid_current(); txid_current -- 1007 (1 row) And at other times error is not that helpful for user: =# select pg_get_transaction_committime('1'); ERROR: XX000: could not access status of transaction 1 DETAIL: Could not read from file pg_committs/ at offset 114688: Undefined error: 0. LOCATION: SlruReportIOError, slru.c:896 I think that you should simply return an error in TransactionIdGetCommitTsData when xid is not in the range supported. That will be more transparent for the user. Agreed. 5) Reading the code, TransactionTreeSetCommitTimestamp is always called with do_xlog = false, making that actually no timestamps are WAL'd... Hence WriteSetTimestampXlogRec is just dead code with the latest version of the patch. 
IMO, this should be always WAL-logged when track_commit_timestamp is on. As Andres explained, this is always WAL-logged when called by the current caller, so we don't want it to be double-logged; that's why do_xlog = false. But when an extension needs to call it, it will most likely need do_xlog = true. 8) The redo and xlog routines of committs should be out in a dedicated file, like committsxlog.c or similar. The other rmgrs do so, so let's be consistent. Most actually don't AFAICS. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
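The behaviour Petr leans towards earlier in this message - raising an error instead of returning a dummy epoch timestamp when tracking is disabled or the xid is out of range - can be sketched as a plain guard (names and the return-code style are illustrative only; the real code would use ereport(ERROR, ...) and read the commit-timestamp SLRU):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { TS_OK, TS_ERR_DISABLED, TS_ERR_RANGE } TsStatus;

/*
 * Hypothetical guard for a commit-timestamp lookup: refuse outright when
 * the module is disabled, and refuse xids outside the tracked window
 * [oldest_xid, next_xid), instead of handing back a dummy value.
 */
static TsStatus check_commit_ts_lookup(bool track_enabled, uint32_t xid,
                                       uint32_t oldest_xid, uint32_t next_xid)
{
    if (!track_enabled)
        return TS_ERR_DISABLED; /* instead of returning the epoch */
    if (xid < oldest_xid || xid >= next_xid)
        return TS_ERR_RANGE;    /* instead of a bogus timestamp or SLRU I/O error */
    return TS_OK;               /* real code would now read the SLRU page */
}
```

This would turn both the misleading "2000-01-01 09:00:00+09" result and the raw SLRU read error shown above into clean, user-facing errors.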
Re: [HACKERS] tracking commit timestamps
On 11/3/14 5:17 PM, Petr Jelinek wrote: Please don't name anything committs. That looks like a misspelling of something. There is nothing wrong with pg_get_transaction_commit_timestamp() If you want to reduce the length, lose the get. I am fine with that, I only wonder if your definition of anything only concerns the SQL interfaces or also the internals. I'd be fine with commit_ts for internals, but not committs. One day, you'll need a function or data structure that works with multiple of these, and then you'll really be in naming trouble. ;-) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
On 04/11/14 09:25, Michael Paquier wrote: On Tue, Nov 4, 2014 at 5:05 PM, Andres Freund and...@2ndquadrant.com mailto:and...@2ndquadrant.com wrote: On 2014-11-02 19:27:25 +0100, Petr Jelinek wrote: Well, Michael has a point that the extradata is pretty much useless currently, perhaps it would help to add the interface to set extradata? Only accessible via C and useless aren't the same thing. But sure, add it. I'm still on a -1 for that. You mentioned that there is perhaps no reason to delay a decision on this matter, but IMO there is no reason to rush either in doing something we may regret. And I am not the only one on this thread expressing concern about this extra data thingy. If this extra data field is going to be used to identify from which node a commit comes from, then it is another feature than what is written on the subject of this thread. In this case let's discuss it in the thread dedicated to replication identifiers, or come up with an extra patch once the feature for commit timestamps is done. The issue as I see it is that both of those features are closely related, and just one without the other has very limited use. What I learned from working on UDR is that for conflict handling, I was actually missing the extradata more than the timestamp itself - in other words I have an extension for 9.4 where I have use for this API already, so the argument about dead code or forks or whatever does not really hold. The other problem is that if we add extradata later we will either break upgradeability or will have to write essentially the same code again, which will store just the extradata instead of the timestamp; I don't really like either of those options to be honest. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL replay bugs
On 11/4/14 3:21 PM, Alvaro Herrera wrote: FWIW I gave this a trial run and found I needed some tweaks to test.sh and the Makefile in order to make it work on VPATH; mainly replace ./ with `dirname $0` in a couple test.sh in a couple of places, and something similar in the Makefile. Also you have $PG_ROOT_DIR somewhere which doesn't work. I also saw some bashisms in the script. Maybe the time for shell-based test scripts has passed? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
On 04/11/14 22:20, Peter Eisentraut wrote: On 11/3/14 5:17 PM, Petr Jelinek wrote: Please don't name anything committs. That looks like a misspelling of something. There is nothing wrong with pg_get_transaction_commit_timestamp() If you want to reduce the length, lose the get. I am fine with that, I only wonder if your definition of anything only concerns the SQL interfaces or also the internals. I'd be fine with commit_ts for internals, but not committs. One day, you'll need a function or data structure that works with multiple of these, and then you'll really be in naming trouble. ;-) Hmm, we use CommitTs in interfaces that use CamelCase naming, so I guess commit_ts is indeed a natural expansion of that. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_receivelog completion command
On 11/2/14 8:26 AM, Magnus Hagander wrote: The idea is to have pg_receivexlog fire off an external command at the end of each segment - for example a command to gzip the file, or to archive it off into a Magic Cloud (TM) or something like that. A simple facility to allow gzipping after the file is complete might be OK, but the cloud use case is probably too abstract to be useful. I'd rather write my own consumer for that, or go back to archive_command, which has the queuing logic built in already. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [GENERAL] Re: [BUGS] [HACKERS] COPY TO returning empty result with parallel ALTER TABLE
Andres Freund and...@2ndquadrant.com writes: On 2014-11-04 13:51:23 -0500, Tom Lane wrote: Not sure. The OP's point is that in a SELECT, you do get unsurprising results, because a SELECT will acquire its execution snapshot after it's gotten AccessShareLock on the table. Arguably COPY should behave likewise. Or to be even more concrete, COPY (SELECT * FROM tab) TO ... probably already acts like he wants, so why isn't plain COPY equivalent to that? Even a plain SELECT essentially acts that way, if I recall correctly, if you use REPEATABLE READ+ and force a snapshot to be acquired beforehand. It's imo not very surprising. "It doesn't fail in a non-default isolation mode" is hardly much of an argument for this being okay in READ COMMITTED. All ALTER TABLE rewrites just disregard visibility of existing tuples. Only CLUSTER/VACUUM FULL jump through the hoops needed to keep all the necessary tuples + ctid chains around. Yeah, and I think that it's entirely reasonable for rewriting ALTER TABLEs to update the xmin of the rewritten tuples; after all, the output data could be arbitrarily different from what the previous transactions put into the table. But that is not the question here. If the COPY blocks until the ALTER completes --- as it must --- why is its execution snapshot not taken *after* the lock is acquired? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [GENERAL] Re: [BUGS] [HACKERS] COPY TO returning empty result with parallel ALTER TABLE
--On 4. November 2014 17:18:14 -0500 Tom Lane t...@sss.pgh.pa.us wrote: Yeah, and I think that it's entirely reasonable for rewriting ALTER TABLEs to update the xmin of the rewritten tuples; after all, the output data could be arbitrarily different from what the previous transactions put into the table. But that is not the question here. If the COPY blocks until the ALTER completes --- as it must --- why is its execution snapshot not taken *after* the lock is acquired? COPY waits in DoCopy() but its snapshot gets acquired in PortalRunUtility() earlier. SELECT has its lock already during the transform/analyse phase and its snapshot is taken much later. Looks like we need something that analyses a utility statement to get its lock before executing. -- Thanks Bernd -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
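Bernd's ordering observation reduces to a simple timeline argument (a toy sketch, not the executor's actual code paths): whether the reader sees the rewritten tuples depends only on whether its snapshot was taken before or after the rewriting transaction committed - and COPY currently takes it in PortalRunUtility(), before DoCopy() blocks on the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy timeline model: an ALTER TABLE rewrite commits at time
 * rewrite_commit_time, restamping all tuples with that commit's XID.
 * A reader's snapshot only sees tuples committed strictly before the
 * snapshot was taken.
 */
static bool reader_sees_rewritten_tuples(uint32_t snapshot_time,
                                         uint32_t rewrite_commit_time)
{
    return rewrite_commit_time < snapshot_time;
}
```

With the rewrite committing at time 20: a COPY whose snapshot was taken at time 10 (before waiting on the lock) gets an empty result, while a reader that takes its snapshot at time 30 (after the lock is granted, hence after the commit) sees the table, which is the SELECT behaviour Tom describes.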
Re: [HACKERS] [BUGS] ltree::text not immutable?
I wrote: An alternative that just occurred to me is to put the no-volatile-I/O-functions check into CREATE TYPE, but make it just WARNING not ERROR. That would be nearly as good as an ERROR in terms of nudging people who'd accidentally omitted a volatility marking from their custom types. But we could leave chkpass as-is and wait to see if anyone complains "hey, why am I getting this warning?". If we don't hear anyone complaining, maybe that means we can get away with changing the type's behavior in 9.6 or later. Attached is a complete patch along these lines. As I suggested earlier, this just makes the relevant changes in ltree--1.0.sql and pg_trgm--1.1.sql without bumping their extension version numbers, since it doesn't seem important enough to justify a version bump. I propose that we could back-patch the immutability additions in ltree and pg_trgm, since they won't hurt anything and might make life a little easier for future adopters of those modules. The WARNING additions should only go into HEAD though. regards, tom lane diff --git a/contrib/ltree/ltree--1.0.sql b/contrib/ltree/ltree--1.0.sql index 5a2f375..7d55fc6 100644 *** a/contrib/ltree/ltree--1.0.sql --- b/contrib/ltree/ltree--1.0.sql *** *** 6,17 CREATE FUNCTION ltree_in(cstring) RETURNS ltree AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT; CREATE FUNCTION ltree_out(ltree) RETURNS cstring AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT; CREATE TYPE ltree ( INTERNALLENGTH = -1, --- 6,17 CREATE FUNCTION ltree_in(cstring) RETURNS ltree AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT IMMUTABLE; CREATE FUNCTION ltree_out(ltree) RETURNS cstring AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT IMMUTABLE; CREATE TYPE ltree ( INTERNALLENGTH = -1, *** CREATE OPERATOR CLASS ltree_ops *** 303,314 CREATE FUNCTION lquery_in(cstring) RETURNS lquery AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT; CREATE FUNCTION lquery_out(lquery) RETURNS cstring AS 'MODULE_PATHNAME' !
LANGUAGE C STRICT; CREATE TYPE lquery ( INTERNALLENGTH = -1, --- 303,314 CREATE FUNCTION lquery_in(cstring) RETURNS lquery AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT IMMUTABLE; CREATE FUNCTION lquery_out(lquery) RETURNS cstring AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT IMMUTABLE; CREATE TYPE lquery ( INTERNALLENGTH = -1, *** CREATE OPERATOR ^? ( *** 414,425 CREATE FUNCTION ltxtq_in(cstring) RETURNS ltxtquery AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT; CREATE FUNCTION ltxtq_out(ltxtquery) RETURNS cstring AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT; CREATE TYPE ltxtquery ( INTERNALLENGTH = -1, --- 414,425 CREATE FUNCTION ltxtq_in(cstring) RETURNS ltxtquery AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT IMMUTABLE; CREATE FUNCTION ltxtq_out(ltxtquery) RETURNS cstring AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT IMMUTABLE; CREATE TYPE ltxtquery ( INTERNALLENGTH = -1, *** CREATE OPERATOR ^@ ( *** 481,492 CREATE FUNCTION ltree_gist_in(cstring) RETURNS ltree_gist AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT; CREATE FUNCTION ltree_gist_out(ltree_gist) RETURNS cstring AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT; CREATE TYPE ltree_gist ( internallength = -1, --- 481,492 CREATE FUNCTION ltree_gist_in(cstring) RETURNS ltree_gist AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT IMMUTABLE; CREATE FUNCTION ltree_gist_out(ltree_gist) RETURNS cstring AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT IMMUTABLE; CREATE TYPE ltree_gist ( internallength = -1, diff --git a/contrib/pg_trgm/pg_trgm--1.1.sql b/contrib/pg_trgm/pg_trgm--1.1.sql index 1fff7af..34b37e4 100644 *** a/contrib/pg_trgm/pg_trgm--1.1.sql --- b/contrib/pg_trgm/pg_trgm--1.1.sql *** CREATE OPERATOR - ( *** 53,64 CREATE FUNCTION gtrgm_in(cstring) RETURNS gtrgm AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT; CREATE FUNCTION gtrgm_out(gtrgm) RETURNS cstring AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT; CREATE TYPE gtrgm ( INTERNALLENGTH = -1, --- 53,64 CREATE FUNCTION gtrgm_in(cstring) RETURNS gtrgm AS 'MODULE_PATHNAME' ! 
LANGUAGE C STRICT IMMUTABLE; CREATE FUNCTION gtrgm_out(gtrgm) RETURNS cstring AS 'MODULE_PATHNAME' ! LANGUAGE C STRICT IMMUTABLE; CREATE TYPE gtrgm ( INTERNALLENGTH = -1, diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c index 55a6881..cbb65f8 100644 *** a/src/backend/commands/typecmds.c --- b/src/backend/commands/typecmds.c *** DefineType(List *names, List *parameters *** 547,552 --- 547,598 NameListToString(analyzeName)); #endif + /* + * Print warnings if any of the type's I/O functions are marked volatile. + * There is a general assumption that I/O functions are stable or + * immutable; this allows us for example to mark record_in/record_out + * stable
[HACKERS] to_char_at_timezone()?
Hi,

9.4 FINALLY added the UTC offset formatting pattern to to_char(). However, it falls a bit short in the sense that the offset is always computed according to the effective TimeZone value. This has a few issues as far as I can tell:

1) It's not truly controlled by the query which produces the timestamptz values, e.g. in the case of functions
2) Having to SET LOCAL before a query is quite ugly
3) It supports only a single TimeZone value per query

So I got into thinking whether it would make sense to provide a new function, say, to_char_at_timezone(), to solve this problem. For example:

local:marko=#* select now();
              now
-------------------------------
 2014-11-05 00:43:47.954662+01
(1 row)

local:marko=#* select to_char_at_timezone(now(), 'YYYY-MM-DD HH24:MI:SSOF', 'Etc/UTC');
  to_char_at_timezone
------------------------
 2014-11-04 23:43:47+00
(1 row)

local:marko=#* select to_char_at_timezone(now(), 'YYYY-MM-DD HH24:MI:SSOF', 'America/Los_Angeles');
  to_char_at_timezone
------------------------
 2014-11-04 15:43:47-08
(1 row)

Any thoughts? The patch is quite trivial.

.marko
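The semantics being proposed — rendering one absolute instant at several explicitly named zones, without touching any session-level TimeZone setting — can be illustrated outside PostgreSQL. A rough Python analogy (to_char_at_timezone is only a proposal here, so this is not its implementation):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# One absolute instant, matching the email's example (23:43:47 UTC).
instant = datetime(2014, 11, 4, 23, 43, 47, tzinfo=timezone.utc)

# Render it at per-call zones; no session-wide setting is involved.
for zone in ("Etc/UTC", "America/Los_Angeles"):
    print(instant.astimezone(ZoneInfo(zone)).strftime("%Y-%m-%d %H:%M:%S%z"))
# prints 2014-11-04 23:43:47+0000
#        2014-11-04 15:43:47-0800
```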
Re: [HACKERS] tracking commit timestamps
On Tue, Nov 4, 2014 at 10:01 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Michael Paquier wrote: I'm still on a -1 for that. You mentioned that there is perhaps no reason to delay a decision on this matter, but IMO there is no reason to rush either in doing something we may regret. And I am not the only one on this thread expressing concern about this extra data thingy. If this extra data field is going to be used to identify which node a commit came from, then it is a different feature from what is written in the subject of this thread. In that case let's discuss it in the thread dedicated to replication identifiers, or come up with an extra patch once the feature for commit timestamps is done. Introducing the extra data field in a later patch would mean an on-disk representation change, i.e. pg_upgrade trouble. Then why 4 bytes specifically for the extra field? Why not 8 or 16? -- Michael
Re: [HACKERS] to_char_at_timezone()?
Marko Tiikkaja ma...@joh.to writes: So I got into thinking whether it would make sense to provide a new function, say, to_char_at_timezone() to solve this problem. For example: ... Any thoughts? The patch is quite trivial. I'm not convinced that it's all that trivial. Is the input timestamp or timestamptz, and what's the semantics of that exactly (ie what timezone rotation happens)? One's first instinct is often wrong in this area. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] to_char_at_timezone()?
On 11/5/14, 12:59 AM, Tom Lane wrote: Marko Tiikkaja ma...@joh.to writes: So I got into thinking whether it would make sense to provide a new function, say, to_char_at_timezone() to solve this problem. For example: ... Any thoughts? The patch is quite trivial. I'm not convinced that it's all that trivial. Is the input timestamp or timestamptz, and what's the semantics of that exactly (ie what timezone rotation happens)? One's first instinct is often wrong in this area. In my example, the input is a timestamptz, and the output is converted to the target time zone the same way timestamptz_out() does, except based on the input timezone instead of TimeZone. Not sure whether it would make sense to do this for timestamp, or whether there's even a clear intuitive behaviour there. .marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start
On 04/11/14 09:07, Craig Ringer wrote: On 11/04/2014 07:31 AM, Álvaro Hernández Tortosa wrote: Thank you for your comment, Tom. However I think this behavior, as seen from a user perspective, it's not the expected one. That may be the case, but I think it's the SQL-standard behaviour, so we can't really mess with it. The spec requires SET TRANSACTION ISOLATION, and you can't implement that if you take a snapshot at BEGIN. It's true that the standard mandates SET TRANSACTION rather than setting the isolation level with the BEGIN statement, and in any case you can raise/lower the isolation level with SET regardless of what the session or the begin command said. However, is it really a problem taking a snapshot at BEGIN time --only if the tx is started with BEGIN ... (REPEATABLE READ | SERIALIZABLE)? AFAIK, and I may be missing some internal details here, the worst that can happen is that you took one extra, unnecessary snapshot. I don't see that as a huge problem. The standard (92) says that transaction is initiated when a transaction-initiating SQL-statement is executed. To be fair, that sounds to me more of a SELECT rather than a BEGIN, but I may be wrong. If it is still the intended behavior, I think it should be clearly documented as such, and a recommendation similar to issue a 'SELECT 1' right after BEGIN to freeze the data before any own query or similar comment should be added. Again, as I said in my email, the documentation clearly says that only sees data committed before the transaction began. And this is clearly not the real behavior. It's more of a difference in when the transaction begins. Arguably, BEGIN says I intend to begin a new transaction with the next query rather than immediately begin executing a new transaction. This concept could be clearer in the docs. 
If this is really how it should behave, I'd +1000 to make it clearer in the docs, and to explicitly suggest that the user perform a query discarding the results early after BEGIN if they want the state frozen and time may elapse between BEGIN and the real queries to be executed (like doing a SELECT 1). Sure, there are, that was the link I pointed out, but I found no explicit mention of the issue that I'm raising here. I'm sure it's documented *somewhere*, in that I remember reading about this detail in the docs, but I can't find _where_ in the docs. It doesn't seem to be in: http://www.postgresql.org/docs/current/static/transaction-iso.html where I'd expect. Yepp, there's no mention there. In any case, we simply cannot take the snapshot at BEGIN time, because it's permitted to: SET TRANSACTION ISOLATION LEVEL READ COMMITTED; in a DB that has default serializable isolation or has a SET SESSION CHARACTERISTICS isolation mode of serializable. Note that SET TRANSACTION is SQL-standard. As I said, AFAIK it shouldn't matter a lot to take the snapshot at BEGIN. The worst that can happen is that you end up in read committed and you need to take more snapshots, one per query. AFAIK deferring the snapshot that's consistent with other RDBMSes that use snapshots, too. I tried Oracle and SQL Server. SQL Server seems to behave as PostgreSQL, but just because it locks the table if accessed in a serializable transaction, so it definitely waits until select to lock it. However, Oracle behaved as I expected: data is frozen at BEGIN time. I haven't tested others. The docs of that command allude to, but doesn't explicitly state, the behaviour you mention. http://www.postgresql.org/docs/current/static/sql-set-transaction.html Should we then improve the docs to state this more clearly? Any objection to do this?
Regards, Álvaro -- Álvaro Hernández Tortosa --- 8Kdata
Re: [HACKERS] tracking commit timestamps
On 2014-11-05 08:57:07 +0900, Michael Paquier wrote: On Tue, Nov 4, 2014 at 10:01 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Michael Paquier wrote: I'm still on a -1 for that. You mentioned that there is perhaps no reason to delay a decision on this matter, but IMO there is no reason to rush either in doing something we may regret. And I am not the only one on this thread expressing concern about this extra data thingy. If this extra data field is going to be used to identify which node a commit came from, then it is a different feature from what is written in the subject of this thread. In that case let's discuss it in the thread dedicated to replication identifiers, or come up with an extra patch once the feature for commit timestamps is done. Introducing the extra data field in a later patch would mean an on-disk representation change, i.e. pg_upgrade trouble. Then why 4 bytes specifically for the extra field? Why not 8 or 16? It's sufficiently long that you can build infrastructure for storing more transaction metadata on top. I could live with making it 8 bytes, but I don't see a clear advantage. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
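To put rough numbers on the 4-vs-8-byte question: assuming a per-transaction record holding an 8-byte commit timestamp plus the extra field (a deliberate simplification of the actual SLRU layout), the widths compare as follows.

```python
import struct

# Hypothetical packed per-transaction records: an 8-byte timestamp plus
# a 4-byte vs. an 8-byte extra field ('=' disables padding).
rec_4byte_extra = struct.calcsize("=qi")   # 12 bytes
rec_8byte_extra = struct.calcsize("=qq")   # 16 bytes

mb_per_million = lambda rec: rec * 1_000_000 / (1024 * 1024)
print(f"{mb_per_million(rec_4byte_extra):.1f} MB vs "
      f"{mb_per_million(rec_8byte_extra):.1f} MB per million xacts")
# prints 11.4 MB vs 15.3 MB per million xacts
```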
[HACKERS] Preferring MemSet or memset?
Hi all, MemSet, defined in src/include/c.h, is an internal macro that is faster than the system memset for zeroing small word-aligned structures. Both are being used here and there with no real preference. An example of that is that in many code paths we have nulls and values arrays used to build tuples, which are sometimes initialized with memset, other times with MemSet. Wouldn't we gain a bit of performance by switching the initializations of nulls and values currently done with memset over to MemSet? Opinions? Regards, -- Michael
Re: [HACKERS] Preferring MemSet or memset?
Michael Paquier michael.paqu...@gmail.com writes: MemSet is an internal macro faster than the system memset for zeroing small word-aligned structures defined in src/include/c.h. Both are being used here and there with no real preference. An example of that is that in many code paths we have for example nulls and values used to build tuples, that are sometimes initialized with memset, other times with MemSet. Wouldn't we gain a bit of performance by switching to MemSet the initializations of nulls and values currently done with memset? MemSet is *not* a win unless the target area is word-sized+word-aligned, so it'd be a pretty unlikely thing for it to win on zeroing bool arrays. In principle it could win for zeroing Datum arrays, but I don't think I want to read code that is zeroing a Datum array with MemSet and the adjacent bool array with memset; that's just weird, and it's unlikely that we get enough win out of it to justify the notational inconsistency. I keep expecting that we'll find that on modern platforms memset is a winner across the board. There is no reason that gcc couldn't optimize memset calls into code as good as or better than MemSet, and I have the impression that the compiler guys have actually put some effort into optimizing it lately. I'm not sure if anyone's recently redone the tests that showed MemSet was worthwhile ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Convert query plan to sql query
Hello, I would like to transform the query plan (the output of the planner, debug_print_plan) into an SQL query. I know that there are pieces of the query plan that might be machine dependent (in Var nodes, for example). So I wanted to have your suggestions or thoughts before I put effort into it. Basically, if I have: query1 -> parser -> rewriter -> planner, the process would be: query_plan -> planner -> parser -> query2. query1 and query2 are not necessarily the same due to rewriting, stats, etc. Thanks!
Re: [HACKERS] Add CREATE support to event triggers
On Fri, Oct 31, 2014 at 11:27 AM, Michael Paquier michael.paqu...@gmail.com wrote: So, what I think is missing is really a friendly interface to manipulate JsonbContainers directly, and I think that we are not far from it with something like this set, roughly:
- Initialization of an empty container
- Set of APIs to directly push a value to a container (boolean, array, null, string, numeric or other jsonb object)
- Initialization of JsonbValue objects
Here are more thoughts along those lines, looking at the current state of patch 4, which introduces the infrastructure of the whole feature. This would make in-memory manipulation of jsonb containers possible without relying on a third-party set of APIs like what this patch is doing with ObjTree to deparse the DDL parse trees.
1) Set of constructor functions for JsonbValue: null, bool, string, array, JSONB object for nested values. Note that keys can be represented as Jsonb string objects.
2) Lookup functions for values in a JsonbContainer. Patch 4 is introducing that with find_string_in_jsonbcontainer and find_bool_in_jsonbcontainer. We may as well extend it to be able to look for another Jsonb object, for nested searches for example.
3) Functions to push a JsonbValue within a container, using a key and a value. This is where most of the work would be necessary, for bool, null, string, Jsonb object and numeric.
This infrastructure would allow in-memory manipulation of jsonb containers. Containers can then easily be converted back to strings, and used for value lookups by key. Regards, -- Michael
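As a shape-only analogy for the three API groups above (constructors, lookups, push), here is what the workflow looks like on a plain Python dict. The function names are invented and mirror only the outline of the proposed C interface, not any real jsonb API:

```python
# Invented names mirroring the proposed API groups; not real jsonb calls.
def jsonb_container():
    """Group 1: construct an empty container."""
    return {}

def jsonb_push(container, key, value):
    """Group 3: push a key/value pair (value may itself be a container)."""
    container[key] = value
    return container

def jsonb_find(container, key):
    """Group 2: look up a value (possibly a nested container) by key."""
    return container.get(key)

obj = jsonb_container()
jsonb_push(obj, "name", "pg_trgm")                                       # string value
jsonb_push(obj, "options", jsonb_push(jsonb_container(), "gin", True))   # nested object
print(jsonb_find(jsonb_find(obj, "options"), "gin"))  # True
```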
Re: [HACKERS] WAL replay bugs
Thanks for the tests. On Wed, Nov 5, 2014 at 5:21 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Michael Paquier wrote: On Mon, Jul 14, 2014 at 6:14 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Although I doubt necessity of the flexibility seeing the current testing framework, I don't have so strong objection about that. Nevertheless, perhaps you are appreciated to put a notice on.. README or somewhere. Hm, well... Fine, I added it in this updated series. FWIW I gave this a trial run and found I needed some tweaks to test.sh and the Makefile in order to make it work on VPATH; mainly replace ./ with `dirname $0` in a couple test.sh in a couple of places, and something similar in the Makefile. Also you have $PG_ROOT_DIR somewhere which doesn't work. Ah thanks, forgot that. Also you have the Makefile checking for -DBUFFER_CAPTURE exactly but for some reason I used -DBUFFER_CAPTURE=1 which wasn't well received by your $(filter) stuff. Instead of checking CFLAGS it might make more sense to expose it as a read-only GUC and grep `postmaster -C buffer_capture` or similar. Yes that's a good idea. Now, do we really want this feature in-core? That's somewhat a duplicate of what is mentioned here: http://www.postgresql.org/message-id/CAB7nPqQMq=4ejak317mxz4has0i+1rslbqu29zx18jwlb2j...@mail.gmail.com Of course both things do not have the same coverage, as the former is for buildfarm and dev while the latter is dedicated to production systems, but could be used for development as well. The patch sent there is a bit outdated, but a potential implementation gets simpler with XLogReadBufferForRedo able to return flags about each block state during redo. I am still planning to come back to it for this cycle, though I stopped for now, waiting for the WAL format patches to finish shaping the APIs this feature would rely on. Regards, -- Michael
Re: [HACKERS] WAL replay bugs
On Wed, Nov 5, 2014 at 6:29 AM, Peter Eisentraut pete...@gmx.net wrote: On 11/4/14 3:21 PM, Alvaro Herrera wrote: FWIW I gave this a trial run and found I needed some tweaks to test.sh and the Makefile in order to make it work on VPATH; mainly replace ./ with `dirname $0` in a couple test.sh in a couple of places, and something similar in the Makefile. Also you have $PG_ROOT_DIR somewhere which doesn't work. I also saw some bashisms in the script. Maybe the time for shell-based test scripts has passed? Except pg_upgrade, are there other tests using bash? -- Michael
Re: [HACKERS] Convert query plan to sql query
Maybe you want to check how it's done in Postgres-XC. Postgres-XC works on plans created by PostgreSQL and reverse-engineers queries (for the parts of the plans which are shippable). The notion of shippability may not be of interest to you, but the code to reverse-engineer most of the plan nodes is there in Postgres-XC. On Wed, Nov 5, 2014 at 8:47 AM, mariem mariem.benfad...@gmail.com wrote: Hello, I would like to transform the query plan (output of the planner, debug_print_plan) into an sql query. I know that there are pieces of the query plan that might be machine dependent (in var for example). So I wanted to have your suggestions or thoughts before I put efforts into it. Basically, if I have: query1 -> parser -> rewriter -> planner, the process would be: query_plan -> planner -> parser -> query2. query1 and query2 are not necessarily the same due to rewrite, stats.. Thanks! -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices
On Tue, Nov 04, 2014 at 08:30:21AM +, Laurenz Albe wrote: David Fetter wrote: On Tue, Nov 04, 2014 at 07:51:06AM +0900, Tatsuo Ishii wrote: Just out of curiosity, why is Oracle's NUMBER (I assume you are talking about this) so fast? I suspect that what happens is that NUMBER is stored as a native type (int2, int4, int8, int16) that depends on its size and then cast to the next upward thing as needed, taking any performance hits at that point. The documentation hints (38 decimal places) at a 128-bit internal representation as the maximum. I don't know what happens when you get past what 128 bits can represent. No, Oracle stores NUMBERs as variable length field (up to 22 bytes), where the first byte encodes the sign and the comma position and the remaining bytes encode the digits, each byte representing two digits in base-100 notation (see Oracle Metalink note 1007641.6). Thanks for clearing that up, and sorry for spreading misinformed guesses. So it's not so different from PostgreSQL. No idea why their arithmetic should be faster. I have an idea, but this time, I think it's right. They have at least one team of people whose job it is to make sure that it is fast. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
On 11/3/14, 2:36 PM, Alvaro Herrera wrote: Jim Nasby wrote: On 11/1/14, 8:41 AM, Petr Jelinek wrote: Well this is not BDR specific thing, the idea is that with logical replication, commit timestamp is not enough for conflict handling, you also need to have additional info in order to identify some types of conflicts conflicts (local update vs remote update for example). So the extradata field was meant as something that could be used to add the additional info to the xid. Related to this... is there any way to deal with 2 transactions that commit in the same microsecond? It seems silly to try and handle that for every commit since it should be quite rare, but perhaps we could store the LSN as extradata if we detect a conflict? Well, two things. One, LSN is 8 bytes and extradata (at least in this patch when I last saw it) is only 4 bytes. But secondly and more important is that detecting a conflict is done in node B *after* node A has recorded the transaction's commit time; there is no way to know at commit time that there is going to be a conflict caused by that transaction in the future. (If there was a way to tell, you could just as well not commit the transaction in the first place.) I'm worried about 2 commits in the same microsecond on the same system, not on 2 different systems. Or, put another way, if we're going to expose this I think it should also provide a guaranteed unique commit ordering for a single cluster. Presumably, this shouldn't be that hard since we do know the exact order in which things committed. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
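The "guaranteed unique commit ordering" idea can be sketched quickly: microsecond timestamps alone can collide, but sorting by (timestamp, LSN) — or any strictly increasing per-cluster sequence as the second key — yields a total order. The LSN values below are made up for illustration:

```python
# Two commits in the same microsecond; the LSN (made-up values)
# breaks the tie, giving a unique commit ordering for the cluster.
commits = [
    {"xid": 101, "commit_ts_us": 1_415_142_227_954_662, "lsn": 0x16B3D88},
    {"xid": 102, "commit_ts_us": 1_415_142_227_954_662, "lsn": 0x16B3C10},
]

# The timestamp key alone cannot distinguish the two transactions;
# the (timestamp, LSN) pair can.
by_ts_lsn = sorted(commits, key=lambda c: (c["commit_ts_us"], c["lsn"]))
print([c["xid"] for c in by_ts_lsn])  # [102, 101]
```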
Re: [HACKERS] Proposal for better support of time-varying timezone abbreviations
On Wed, Oct 22, 2014 at 11:32:41AM -0400, Tom Lane wrote: I don't have a strong opinion about which of the above things to do ... what's your preference? I think it's better for the future if we make a clean cut. Yes, the option keeps compatibility a little bit higher, but that doesn't matter much as the code won't be used by the server anyway. Besides, it may eventually change and then we have to make sure to remember ecpg's copy again. Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org Jabber: michael.meskes at gmail dot com VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
Re: [HACKERS] WAL format and API changes (9.5)
On Tue, Nov 4, 2014 at 10:03 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 10/30/2014 06:02 PM, Andres Freund wrote: On 2014-10-29 10:24:20 +0200, Heikki Linnakangas wrote: On 10/06/2014 02:29 PM, Andres Freund wrote: On 2014-10-06 14:19:39 +0300, Heikki Linnakangas wrote: Barring objections, I'll commit this, and then continue benchmarking the second patch with the WAL format and API changes. I'd like to have a look at it beforehand. Ping? Here's a rebased patch. I'd like to proceed with this. Doing so. Here's a new version of this refactoring patch. It fixes all the concrete issues you pointed out. I've not yet really looked, but on a quick readthrough XLogInsertRecData() staying in xlog.c doesn't make me happy... Ok.. Can you elaborate? To me the split between xloginsert.c doing some of the record assembly, and xlog.c doing the lower level part of the assembly is just wrong. I moved the checks for bootstrap mode and xl_len == 0, from the current XLogInsert to the new XLogInsert() function. I don't imagine that to be enough to address your feelings about the split between xlog.c and xloginsert.c, but it makes it a tiny bit clearer IMHO. I don't know what else to do about it, as it feels very sensible to me as it is now. So unless you object more loudly, I'm going to just go ahead with this and commit, probably tomorrow. A few observations while reading the latest patch:

1.
+XLogRecPtr
+XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
 {
..
+	/* info's high bits are reserved for use by me */
+	if (info & XLR_INFO_MASK)
+		elog(PANIC, "invalid xlog info mask %02X", info);
..
 }

Earlier, before this check, we used to check XLogInsertAllowed(), which is moved to XLogInsertRecord(); isn't it better to keep that check at the beginning of the function XLogInsert()?

2.
XLogRecPtr
XLogInsertRecord(XLogRecData *rdata, XLogRecPtr fpw_lsn)
{
..
	if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
	{
		/*
		 * Oops, some buffer now needs to be backed up that the caller
		 * didn't back up.  Start over.
		 */
		WALInsertLockRelease();
		END_CRIT_SECTION();
		return InvalidXLogRecPtr;
	}
..
}

IIUC, there can be 4 states for doPageWrites w.r.t. when the record is getting assembled (XLogRecordAssemble) and just before the actual insert (XLogInsertRecord). I think the behaviour for the state when doPageWrites is true during XLogRecordAssemble and false during XLogInsertRecord (just before the actual insert) is different as compared to HEAD. In the patch, it will not retry if doPageWrites is false when we are about to insert even though fpw_lsn <= RedoRecPtr, whereas in HEAD it will detect this during assembly of the record and retry; isn't this a problem?

3. There are a couple of places where *XLogInsert* is used in wal.sgml, and it seems to me some of those need changes w.r.t. this patch.

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
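A toy model (invented names, not PostgreSQL code) of the assemble/insert retry contract discussed above: the caller keeps reassembling the record whenever the insert step reports that its full-page-write decisions are stale, mirroring the InvalidXLogRecPtr return in the patch.

```python
INVALID_LSN = 0

def insert_record(fpw_lsn, redo_ptr, do_page_writes):
    # Mirrors the check quoted above: a buffer whose newest full-page
    # image predates the redo pointer must be re-imaged, so reject.
    if fpw_lsn != INVALID_LSN and fpw_lsn <= redo_ptr and do_page_writes:
        return None                 # "start over"
    return 1234                     # pretend end-LSN of the inserted record

def xlog_insert(assemble, redo_ptr, do_page_writes):
    while True:
        fpw_lsn = assemble()        # stands in for XLogRecordAssemble
        lsn = insert_record(fpw_lsn, redo_ptr, do_page_writes)
        if lsn is not None:
            return lsn

# First assembly skipped a needed full-page image (fpw_lsn 100 <= redo
# pointer 150); the second assembly includes it, reports no fpw_lsn,
# and the insert succeeds.
attempts = iter([100, INVALID_LSN])
print(xlog_insert(lambda: next(attempts), redo_ptr=150, do_page_writes=True))  # 1234
```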
Re: [HACKERS] Convert query plan to sql query
mariem mariem.benfad...@gmail.com wrote: Hello, I would like to transform the query plan (output of the planner, debug_print_plan) into an sql query. I don't think SQL can express the information the plan contains. For example, join methods (hash, nest loop, merge). -- Antonin Houska Cybertec Schönig Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
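To make both the idea and Antonin's objection concrete, here is a toy, purely illustrative deparser over an invented plan-shaped tree. Real plan trees carry far more state (Vars, RTEs, qual expressions), and none of these node shapes exist in PostgreSQL; they are made up for the example.

```python
# Toy "deparse a plan back to SQL" sketch; node shapes are invented.
def deparse(node):
    kind = node["kind"]
    if kind == "SeqScan":
        return node["relation"]
    if kind == "Join":
        # Note: the join *method* (hash, merge, nestloop) has no SQL
        # spelling, which is one reason the conversion is lossy.
        left = deparse(node["outer"])
        right = deparse(node["inner"])
        return f"{left} JOIN {right} ON {node['cond']}"
    raise ValueError(f"unhandled node: {kind}")

plan = {
    "kind": "Join",
    "cond": "a.id = b.a_id",
    "outer": {"kind": "SeqScan", "relation": "a"},
    "inner": {"kind": "SeqScan", "relation": "b"},
}
sql = "SELECT * FROM " + deparse(plan)
print(sql)  # SELECT * FROM a JOIN b ON a.id = b.a_id
```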