Re: [HACKERS] Parallel Seq Scan

2015-01-02 Thread Thom Brown
On 1 January 2015 at 17:59, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Jan 1, 2015 at 12:00 PM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  Can we check the number of free bgworkers slots to set the max workers?

 The real solution here is that this patch can't throw an error if it's
 unable to obtain the desired number of background workers.  It needs
 to be able to smoothly degrade to a smaller number of background
 workers, or none at all.  I think a lot of this work will fall out
 quite naturally if this patch is reworked to use the parallel
 mode/parallel context stuff, the latest version of which includes an
 example of how to set up a parallel scan in such a manner that it can
 run with any number of workers.
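
(For illustration only: a minimal sketch of the kind of graceful degradation described above, assuming the 9.4 dynamic background worker API. The function name and worker setup are invented; only RegisterDynamicBackgroundWorker() and its flags are real API.)

#include "postgres.h"
#include "miscadmin.h"
#include "postmaster/bgworker.h"

/* Try to launch nrequested workers; return how many we actually got. */
static int
launch_scan_workers(int nrequested, BackgroundWorkerHandle **handles)
{
	BackgroundWorker worker;
	int			nstarted = 0;
	int			i;

	MemSet(&worker, 0, sizeof(worker));
	worker.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
	worker.bgw_start_time = BgWorkerStart_ConsistentState;
	worker.bgw_restart_time = BGW_NEVER_RESTART;
	worker.bgw_notify_pid = MyProcPid;
	snprintf(worker.bgw_name, BGW_MAXLEN, "parallel seq scan worker");
	/* bgw_main / bgw_main_arg would point at the real worker entry point */

	for (i = 0; i < nrequested; i++)
	{
		/* No free bgworker slot?  Just run with what we already have. */
		if (!RegisterDynamicBackgroundWorker(&worker, &handles[i]))
			break;
		nstarted++;
	}
	return nstarted;	/* may be zero: the leader must be able to scan alone */
}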


+1

That sounds like exactly what's needed.

Thom


Re: [HACKERS] Parallel Seq Scan

2015-01-02 Thread Thom Brown
On 2 January 2015 at 11:13, Amit Kapila amit.kapil...@gmail.com wrote:


 On Fri, Jan 2, 2015 at 4:09 PM, Thom Brown t...@linux.com wrote:
 
  On 1 January 2015 at 10:34, Amit Kapila amit.kapil...@gmail.com wrote:
 
   Running it again, I get the same issue.  This is with
 parallel_seqscan_degree set to 8, and the crash occurs with 4 and 2 too.
  
   This doesn't happen if I set the pgbench scale to 50.  I suspect this
 is an OOM issue.  My laptop has 16GB RAM, the table is around 13GB at scale
 100, and I don't have swap enabled.  But I'm concerned it crashes the whole
 instance.
  
 
  Isn't this a backend crash due to OOM?
  And after that, the server will restart automatically.
 
 
  Yes, I'm fairly sure it is.  I guess what I'm confused about is that 8
 parallel sequential scans in separate sessions (1 per session) don't cause
 the server to crash, but in a single session (8 in 1 session), they do.
 

 It could be that the master backend retains some memory for a longer
 period, which causes it to hit the OOM error.  By the way, in your test,
 is it always the master backend that hits OOM, or is it random (either
 master or worker)?


Just ran a few tests, and it appears to always be the master that hits OOM,
or at least I don't seem to be able to get an example of the worker hitting
it.



 
  Will there be a GUC to influence parallel scan cost?  Or does it take
 into account effective_io_concurrency in the costs?
 
  And will the planner be able to decide whether or not it'll choose to
 use background workers or not?  For example:
 

 Yes, we are planning to introduce a cost model for parallel
 communication (there is some discussion about this upthread),
 but it's not there yet, and that's why you are seeing it choose
 a parallel plan when it shouldn't.  Currently, with the patch, if
 you set parallel_seqscan_degree, it will most probably choose a
 parallel plan.


Ah, okay.  Great.

Thanks.

Thom


Re: [HACKERS] Parallel Seq Scan

2015-01-02 Thread Amit Kapila
On Fri, Jan 2, 2015 at 4:09 PM, Thom Brown t...@linux.com wrote:

 On 1 January 2015 at 10:34, Amit Kapila amit.kapil...@gmail.com wrote:

  Running it again, I get the same issue.  This is with
parallel_seqscan_degree set to 8, and the crash occurs with 4 and 2 too.
 
  This doesn't happen if I set the pgbench scale to 50.  I suspect this
is an OOM issue.  My laptop has 16GB RAM, the table is around 13GB at scale
100, and I don't have swap enabled.  But I'm concerned it crashes the whole
instance.
 

 Isn't this a backend crash due to OOM?
 And after that, the server will restart automatically.


 Yes, I'm fairly sure it is.  I guess what I'm confused about is that 8
parallel sequential scans in separate sessions (1 per session) don't cause
the server to crash, but in a single session (8 in 1 session), they do.


It could be that the master backend retains some memory for a longer
period, which causes it to hit the OOM error.  By the way, in your test,
is it always the master backend that hits OOM, or is it random (either
master or worker)?


 Will there be a GUC to influence parallel scan cost?  Or does it take
into account effective_io_concurrency in the costs?

 And will the planner be able to decide whether or not it'll choose to use
background workers or not?  For example:


Yes, we are planning to introduce a cost model for parallel
communication (there is some discussion about this upthread),
but it's not there yet, and that's why you are seeing it choose
a parallel plan when it shouldn't.  Currently, with the patch, if
you set parallel_seqscan_degree, it will most probably choose a
parallel plan.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] What exactly is our CRC algorithm?

2015-01-02 Thread Heikki Linnakangas

On 01/01/2015 09:17 AM, Abhijit Menon-Sen wrote:

Hi.

OK, here are the patches with the various suggestions applied.

I found that the alignment didn't seem to make much difference for the
CRC32* instructions, so I changed it to process (len/8)*8 bytes followed
by (len%8)*1 bytes, the way the Linux kernel does.
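
(A minimal illustration of that chunking pattern, using the SSE4.2 CRC32C intrinsics directly rather than the patch's actual macros; the function name is invented.)

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <nmmintrin.h>	/* SSE4.2 intrinsics; compile with -msse4.2 */

/*
 * Process (len/8)*8 bytes with the 64-bit CRC32 instruction, then the
 * remaining (len%8) bytes one at a time, as the Linux kernel does.
 * This computes CRC-32C (Castagnoli); 'crc' is the running value.
 */
static uint32_t
crc32c_sse42(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;
	uint64_t	crc64 = crc;

	while (len >= 8)
	{
		uint64_t	chunk;

		memcpy(&chunk, p, 8);	/* avoid unaligned-access issues */
		crc64 = _mm_crc32_u64(crc64, chunk);
		p += 8;
		len -= 8;
	}
	crc = (uint32_t) crc64;
	while (len > 0)
	{
		crc = _mm_crc32_u8(crc, *p);
		p++;
		len--;
	}
	return crc;
}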


Ok.

In the slicing-by-8 version, I wonder if it would be better to do 
single-byte loads to c0-c7, instead of two 4-byte loads and shifts. 
4-byte loads are presumably faster than single byte loads, but then 
you'd avoid the shifts. And then you could go straight into the 
8-bytes-at-a-time loop, without the initial single-byte processing to 
get the start address aligned. (the Linux implementation doesn't do 
that, so maybe it's a bad idea, but might be worth testing..)


Looking at the Linux implementation, I think it only does the bswap once 
per call, not inside the hot loop. Would it even make sense to keep the 
crc variable in different byte order, and only do the byte-swap once in 
END_CRC32() ?


The comments need some work. I note that there is no mention of the 
slicing-by-8 algorithm anywhere in the comments (in the first patch).


Instead of checking for defined(__GNUC__) || defined(__clang__), we
should add an explicit configure test for __builtin_bswap32().
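
(For concreteness, a hedged sketch of what the fallback could look like; HAVE__BUILTIN_BSWAP32 stands in for whatever symbol that configure test would define, and the real code would use the project's own uint32 typedef.)

#include <stdint.h>

#ifdef HAVE__BUILTIN_BSWAP32
#define BSWAP32(x)	__builtin_bswap32(x)
#else
/* Portable fallback when the builtin is not available. */
static inline uint32_t
bswap32(uint32_t x)
{
	return ((x << 24) & 0xff000000) |
		   ((x <<  8) & 0x00ff0000) |
		   ((x >>  8) & 0x0000ff00) |
		   ((x >> 24) & 0x000000ff);
}

#define BSWAP32(x)	bswap32(x)
#endif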


- Heikki





Re: [HACKERS] parallel mode and parallel contexts

2015-01-02 Thread Amit Kapila
On Thu, Dec 18, 2014 at 1:23 AM, Robert Haas robertmh...@gmail.com wrote:


 In the meantime, I had a good chat with Heikki on IM yesterday which
 gave me some new ideas on how to fix up the transaction handling in
 here, which I am working on implementing.  So hopefully I will have
 that by then.

 I am also hoping Amit will be adapting his parallel seq-scan patch to
 this framework.


While working on the parallel seq-scan patch to adapt it to this
framework, I noticed a few things and have questions regarding them.

1.
Currently the parallel worker just attaches to the error queue.  For the
tuple queue, do you expect that to be done in the same place or in the
caller-supplied function?  If the latter, then we need the segment address
as an input to that function so that it can attach the queue to the
segment (shm_mq_attach()).
Another question I have in this regard is: if we have redirected messages
to the error queue by using pq_redirect_to_shm_mq, then how can we set up
a tuple queue for the same purpose?  Similarly, I think more handling is
needed for the tuple queue in the master backend, and the answer to the
above will dictate the best way to do it.  (A sketch follows point 2
below.)

2.
Currently there is no interface for wait_for_workers_to_become_ready()
in your patch.  Don't you think it is important that, before we start
fetching tuples, we make sure all the workers have started?  What if some
worker fails to start?
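
Regarding the tuple queue in point 1, here is a rough sketch of how a worker might attach to a queue living in the DSM segment, using the existing shm_mq API. The offset-based layout and the names are invented for illustration; the real patch would presumably use whatever layout the parallel context code settles on.

#include "postgres.h"
#include "storage/dsm.h"
#include "storage/proc.h"
#include "storage/shm_mq.h"

/* Invented layout: the error queue at offset 0, the tuple queue right after it. */
#define PARALLEL_ERROR_QUEUE_SIZE	0x4000

/*
 * In the worker, after dsm_attach() has mapped the segment: find the tuple
 * queue and attach to it as the sender.  The master side would have called
 * shm_mq_create() and shm_mq_set_receiver() on the same address at setup time.
 */
static shm_mq_handle *
attach_tuple_queue(dsm_segment *seg)
{
	char	   *base = dsm_segment_address(seg);
	shm_mq	   *tuple_mq = (shm_mq *) (base + PARALLEL_ERROR_QUEUE_SIZE);

	shm_mq_set_sender(tuple_mq, MyProc);
	return shm_mq_attach(tuple_mq, seg, NULL);
}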

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

2015-01-02 Thread Dennis Kögel
Am 31.12.2014 um 11:40 schrieb Michael Paquier michael.paqu...@gmail.com:
 As long as master is fixed, I don't actually care. But I agree with Dennis
 that it's hard to see what's been committed with all the different issues
 found, and if any commits were done, in which branch. I'd like to be able to
 tell my customers: update to this minor release to see if it's fixed, but I
 can't even do that.
 This bug does not endanger at all data consistency as only old WAL
 files remain on the standby, so I'm fine as well with just a clean fix
 on master, and nothing done on back-branches.

I’d like to point out that this issue is causing severe disk space bloat over 
time, as described in my previous posting. I’d assume that, as time goes by, 
this issue will pop up more and more, as long as back-branches remain unfixed.

And as the general recommendation is to never ever fiddle with pg_xlog/ 
contents, I’m not even sure what to do about it.

- D.



Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2015-01-02 Thread Amit Kapila
On Fri, Jan 2, 2015 at 11:47 AM, Dilip kumar dilip.ku...@huawei.com wrote:

 On 31 December 2014 18:36, Amit Kapila wrote:

  The patch looks good to me.  I have made a couple of cosmetic changes
  (spelling mistakes, improved some comments, etc.); please check them
  once and, if you are okay, we can move ahead.

 Thanks for the review and the changes; they look fine to me.


Okay, I have marked this patch as Ready For Committer

Notes for Committer -
There is one behavioural difference in the handling of the --analyze-in-stages
switch: when individual tables (by using the -t option) are analyzed by
using this switch, the patch will process (in the case of concurrent jobs) all
the given tables for stage-1 and then for stage-2 and so on, whereas the
unpatched code will process all three stages table by table
(table-1 all three stages, table-2 all three stages, and so on).  I think
the new behaviour is okay, as the same is done when this utility does a
vacuum of the whole database.  As there was no input from any committer
on this point, I thought it better to move ahead rather than wait longer
just for this one point.
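
To make the ordering difference concrete, a sketch (the -j/--jobs switch is the one this patch adds, so its exact spelling is subject to the final patch; table names are illustrative):

$ vacuumdb --analyze-in-stages -j 2 -t t1 -t t2 mydb

patched (stage by stage):    t1 stage-1, t2 stage-1, t1 stage-2, t2 stage-2, t1 stage-3, t2 stage-3
unpatched (table by table):  t1 stage-1, t1 stage-2, t1 stage-3, t2 stage-1, t2 stage-2, t2 stage-3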


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Compression of full-page-writes

2015-01-02 Thread Andres Freund
On 2014-12-31 16:09:31 -0500, Bruce Momjian wrote:
 I still don't understand the value of adding WAL compression, given the
 high CPU usage and minimal performance improvement.  The only big
 advantage is WAL storage, but again, why not just compress the WAL file
 when archiving.

before: pg_xlog is 800GB
after: pg_xlog is 600GB.

I'm damned sure that many people would be happy with that, even if the
*per backend* overhead is a bit higher. And no, compression of archives
when archiving helps *zap* with that (streaming, wal_keep_segments,
checkpoint_timeout). As discussed before.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] compress method for spgist - 2

2015-01-02 Thread Heikki Linnakangas

On 12/23/2014 03:02 PM, Teodor Sigaev wrote:

I think we'll need a separate SpGistTypeDesc for the input type. Or perhaps a
separate SpGistTypeDesc for the reconstructed value and an optional decompress
method to turn the reconstructedValue back into an actual reconstructed input
datum. Or something like that.

I suppose that compress and reconstruct are mutually exclusive options.


I would rather not assume that. You might well want to store something 
in the leaf nodes that's different from the original Datum, but 
nevertheless contains enough information to reconstruct the original Datum.


- Heikki





Re: [HACKERS] Compression of full-page-writes

2015-01-02 Thread Bruce Momjian
On Fri, Jan  2, 2015 at 10:15:57AM -0600, k...@rice.edu wrote:
 On Fri, Jan 02, 2015 at 01:01:06PM +0100, Andres Freund wrote:
  On 2014-12-31 16:09:31 -0500, Bruce Momjian wrote:
   I still don't understand the value of adding WAL compression, given the
   high CPU usage and minimal performance improvement.  The only big
   advantage is WAL storage, but again, why not just compress the WAL file
   when archiving.
  
  before: pg_xlog is 800GB
  after: pg_xlog is 600GB.
  
  I'm damned sure that many people would be happy with that, even if the
  *per backend* overhead is a bit higher. And no, compression of archives
  when archiving helps *zap* with that (streaming, wal_keep_segments,
  checkpoint_timeout). As discussed before.
  
  Greetings,
  
  Andres Freund
  
 
 +1
 
 On an I/O constrained system assuming 50:50 table:WAL I/O, in the case
 above you can process 100GB of transaction data at the cost of a bit
 more CPU.

OK, so given your stats, the feature gives a 12.5% reduction in I/O.  If
that is significant, shouldn't we see a performance improvement?  If we
don't see a performance improvement, is I/O reduction worthwhile?  Is it
valuable in that it gives non-database applications more I/O to use?  Is
that all?

I suggest we at least document this feature as mostly useful for
I/O reduction, and maybe say that CPU usage and performance might be
negatively impacted.

OK, here is the email I remember from Fujii Masao in this same thread that
showed a performance improvement for WAL compression:


http://www.postgresql.org/message-id/CAHGQGwGqG8e9YN0fNCUZqTTT=hnr7ly516kft5ffqf4pp1q...@mail.gmail.com

Why are we not seeing the 33% compression and 15% performance
improvement he saw?  What am I missing here?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Compression of full-page-writes

2015-01-02 Thread Andres Freund
On 2015-01-02 12:06:33 -0500, Bruce Momjian wrote:
 On Fri, Jan  2, 2015 at 05:55:52PM +0100, Andres Freund wrote:
  On 2015-01-02 11:52:42 -0500, Bruce Momjian wrote:
   Why are we not seeing the 33% compression and 15% performance
   improvement he saw?  What am I missing here?
  
  To see performance improvements something needs to be the bottleneck. If
  WAL writes/flushes aren't that in the tested scenario, you won't see a
  performance benefit. Amdahl's law and all that.
  
  I don't understand your negativity about the topic.
 
 I remember the initial post from Masao in August 2013 showing a
 performance boost, so I assumed, while we had the concurrent WAL insert
 performance improvement in 9.4, this was going to be our 9.5 WAL
 improvement.

I don't think it makes sense to compare features/improvements that way.

 While the WAL insert performance improvement required no tuning and
 was never a negative

It's actually a negative in some cases.

 , I now see the compression patch as something that has negatives, so
 has to be set by the user, and only wins in certain cases.  I am
 disappointed, and am trying to figure out how this became such a
 marginal win for 9.5.  :-(

I find the notion that a multi digit space reduction is a marginal win
pretty ridiculous and way too narrow focused. Our WAL volume is a
*significant* problem in the field. And it mostly consists out of FPWs
spacewise.

 My negativity is not that I don't want it, but I want to understand why
 it isn't better than I remembered.  You are basically telling me it was
 always a marginal win.  :-(  Boohoo!

No, I didn't. I told you that *IN ONE BENCHMARK* wal writes apparently
are not the bottleneck.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2015-01-02 Thread Kevin Grittner
Amit Kapila amit.kapil...@gmail.com wrote:

 Notes for Committer -
 There is one behavioural difference in the handling of --analyze-in-stages
 switch, when individual tables (by using -t option) are analyzed by
 using this switch, patch will process (in case of concurrent jobs) all the
 given tables for stage-1 and then for stage-2 and so on whereas in the
 unpatched code it will process all the three stages table by table
 (table-1 all three stages, table-2 all three stages and so on).  I think
 the new behaviour is okay as the same is done when this utility does
 vacuum for whole database.

IMV, the change is for the better.  The whole point of
--analyze-in-stages is to get minimal statistics so that good
enough plans will be built for most queries to allow a production
database to be brought back on-line quickly, followed by generating
increasing granularity (which takes longer but should help ensure
best plan) while the database is in use with the initial
statistics.  Doing all three levels for one table before generating
the rough statistics for the others doesn't help with that, so I
see this change as fixing a bug.  Is it feasible to break that part 
out as a separate patch?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Compression of full-page-writes

2015-01-02 Thread Claudio Freire
On Fri, Jan 2, 2015 at 2:11 PM, Andres Freund and...@2ndquadrant.com wrote:
 , I now see the compression patch as something that has negatives, so
 has to be set by the user, and only wins in certain cases.  I am
 disappointed, and am trying to figure out how this became such a
 marginal win for 9.5.  :-(

 I find the notion that a multi digit space reduction is a marginal win
 pretty ridiculous and way too narrow focused. Our WAL volume is a
 *significant* problem in the field. And it mostly consists out of FPWs
 spacewise.

One thing I'd like to point out, is that in cases where WAL I/O is an
issue (ie: WAL archiving), usually people already compress the
segments during archiving. I know I do, and I know it's recommended on
the web, and by some consultants.

So, I wouldn't want this FPW compression, which is desirable in
replication scenarios if you can spare the CPU cycles (because of
streaming), adversely affecting WAL compression during archiving.

Has anyone tested the compressibility of WAL segments with FPW compression on?

AFAIK, both pglz and lz4 output should still be compressible with
deflate, but I've never tried.
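
(A quick way to check, for anyone curious: compare an archived segment's gzip size with FPW compression off and on; the segment name below is illustrative.)

$ ls -l pg_xlog/000000010000000000000042
$ gzip -c pg_xlog/000000010000000000000042 | wc -c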




Re: [HACKERS] Compression of full-page-writes

2015-01-02 Thread Bruce Momjian
On Fri, Jan  2, 2015 at 05:55:52PM +0100, Andres Freund wrote:
 On 2015-01-02 11:52:42 -0500, Bruce Momjian wrote:
  Why are we not seeing the 33% compression and 15% performance
  improvement he saw?  What am I missing here?
 
 To see performance improvements something needs to be the bottleneck. If
 WAL writes/flushes aren't that in the tested scenario, you won't see a
 performance benefit. Amdahl's law and all that.
 
 I don't understand your negativity about the topic.

I remember the initial post from Masao in August 2013 showing a
performance boost, so I assumed, while we had the concurrent WAL insert
performance improvement in 9.4, this was going to be our 9.5 WAL
improvement.   While the WAL insert performance improvement required no
tuning and was never a negative, I now see the compression patch as
something that has negatives, so has to be set by the user, and only
wins in certain cases.  I am disappointed, and am trying to figure out
how this became such a marginal win for 9.5.  :-(

My negativity is not that I don't want it, but I want to understand why
it isn't better than I remembered.  You are basically telling me it was
always a marginal win.  :-(  Boohoo!

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Publish autovacuum informations

2015-01-02 Thread Jim Nasby

On 1/1/15, 4:17 PM, Noah Misch wrote:

I'd be all right with putting the data structure declarations in a file
named something like autovacuum_private.h, especially if it carried an
annotation that if you depend on this, don't be surprised if we break
your code in future.

Such an annotation would be no more true than it is for the majority of header
files.  If including it makes you feel better, I don't object.


We need to be careful with that. Starting to segregate things into _private 
headers implies that stuff in non-private headers *is* locked down. We'd need 
to set clear expectations.

I do think more clarity would be good here. Right now the only distinction we have is that things
like SPI are spelled out in the docs. Other than that, there really isn't anything to
indicate how safe it is to rely on what's in the headers. For example, I've got some code
that's looking at fcinfo->flinfo->fn_expr, and I have no idea how likely that is to get
broken. Since it's a parse node, my guess is likely, but I'm just guessing.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Compression of full-page-writes

2015-01-02 Thread k...@rice.edu
On Fri, Jan 02, 2015 at 01:01:06PM +0100, Andres Freund wrote:
 On 2014-12-31 16:09:31 -0500, Bruce Momjian wrote:
  I still don't understand the value of adding WAL compression, given the
  high CPU usage and minimal performance improvement.  The only big
  advantage is WAL storage, but again, why not just compress the WAL file
  when archiving.
 
 before: pg_xlog is 800GB
 after: pg_xlog is 600GB.
 
 I'm damned sure that many people would be happy with that, even if the
 *per backend* overhead is a bit higher. And no, compression of archives
 when archiving helps *zap* with that (streaming, wal_keep_segments,
 checkpoint_timeout). As discussed before.
 
 Greetings,
 
 Andres Freund
 

+1

On an I/O constrained system assuming 50:50 table:WAL I/O, in the case
above you can process 100GB of transaction data at the cost of a bit
more CPU.

Regards,
Ken




Re: [HACKERS] Compression of full-page-writes

2015-01-02 Thread Bruce Momjian
On Fri, Jan  2, 2015 at 06:11:29PM +0100, Andres Freund wrote:
  My negativity is not that I don't want it, but I want to understand why
  it isn't better than I remembered.  You are basically telling me it was
  always a marginal win.  :-(  Boohoo!
 
 No, I didn't. I told you that *IN ONE BENCHMARK* wal writes apparently
 are not the bottleneck.

What I have not seen is any recent benchmarks that show it as a win,
while the original email did, so I was confused.  I tried to explain
exactly how I viewed things --- you may not like it, but that is how I
look at upcoming features and where we should focus our time.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Compression of full-page-writes

2015-01-02 Thread Andres Freund
On 2015-01-02 11:52:42 -0500, Bruce Momjian wrote:
 Why are we not seeing the 33% compression and 15% performance
 improvement he saw?  What am I missing here?

To see performance improvements something needs to be the bottleneck. If
WAL writes/flushes aren't that in the tested scenario, you won't see a
performance benefit. Amdahl's law and all that.

I don't understand your negativity about the topic.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Transactions involving multiple postgres foreign servers

2015-01-02 Thread Tom Lane
Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes:
 While looking at the patch for supporting inheritance on foreign tables, I
 noticed that if a transaction makes changes to more than two foreign
 servers the current implementation in postgres_fdw doesn't make sure that
 either all of them rollback or all of them commit their changes, IOW there
 is a possibility that some of them commit their changes while others
 rollback theirs.

 PFA patch which uses 2PC to solve this problem. In pgfdw_xact_callback(), at
 the XACT_EVENT_PRE_COMMIT event, it prepares the transaction on all the
 foreign PostgreSQL servers, and at the XACT_EVENT_COMMIT or XACT_EVENT_ABORT
 event it commits or aborts those transactions respectively.

TBH, I think this is a pretty awful idea.

In the first place, this does little to improve the actual reliability
of a commit occurring across multiple foreign servers; and in the second
place it creates a bunch of brand new failure modes, many of which would
require manual DBA cleanup.

The core of the problem is that this doesn't have anything to do with
2PC as it's commonly understood: for that, you need a genuine external
transaction manager that is aware of all the servers involved in a
transaction, and has its own persistent state (or at least a way to
reconstruct its own state by examining the per-server states).
This patch is not that; in particular it treats the local transaction
asymmetrically from the remote ones, which doesn't seem like a great
idea --- ie, the local transaction could still abort after committing
all the remote ones, leaving you no better off in terms of cross-server
consistency.

As far as failure modes go, one basic reason why this cannot work as
presented is that the remote servers may not even have prepared
transaction support enabled (in fact max_prepared_transactions = 0
is the default in all supported PG versions).  So this would absolutely
have to be a not-on-by-default option.  But the bigger issue is that
leaving it to the DBA to clean up after failures is not a production
grade solution, *especially* not for prepared transactions, which are
performance killers if not closed out promptly.  So I can't imagine
anyone wanting to turn this on without a more robust answer than that.

Basically I think what you'd need for this to be a credible patch would be
for it to work by changing the behavior only in the PREPARE TRANSACTION
path: rather than punting as we do now, prepare the remote transactions,
and report their server identities and gids to an external transaction
manager, which would then be responsible for issuing the actual commits
(along with the actual commit of the local transaction).  I have no idea
whether it's feasible to do that without having to assume a particular
2PC transaction manager API/implementation.
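
(For concreteness, the SQL such an external transaction manager would issue against each remote server; the GID format here is invented.)

-- at local pre-commit, on every participating foreign server:
PREPARE TRANSACTION 'pgfdw_1234_server1';

-- once every participant (and the local transaction) has prepared successfully:
COMMIT PREPARED 'pgfdw_1234_server1';

-- or, if any participant failed to prepare:
ROLLBACK PREPARED 'pgfdw_1234_server1';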

It'd be interesting to hear from people who are using 2PC in production
to find out if this would solve any real-world problems for them, and
what the details of the TM interface would need to look like to make it
work in practice.

In short, you can't force 2PC technology on people who aren't using it
already; while for those who are using it already, this isn't nearly
good enough as-is.

regards, tom lane




Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-01-02 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 01/02/2015 11:41 PM, Tom Lane wrote:
 What might be worth trying is establishing a hard-and-fast boundary
 between C land and SQL land, with bitwise names in C and bytewise names
 in SQL.  This would mean, for example, that int4pl() would be renamed to
 int32pl() so far as the C function goes, but the function's SQL name would
 remain the same.

 I don't like that. I read int4pl as the function implementing plus 
 operator for the SQL-visible int4 datatype, so int4pl makes perfect sense.

I agree with that so far as the SQL name for the function goes, which is
part of why I don't think we should rename anything at the SQL level.
But right now at the C level, it's unclear how things should be named,
and I think we don't really want a situation where the most appropriate
name is so unclear and potentially confusing.  We're surviving fine with
int32 in C meaning int4 in SQL so far as the type names go, so why not
copy that naming approach for function names?

 That would introduce visible inconsistency between such
 functions' pg_proc.proname and pg_proc.prosrc fields, but at least the
 inconsistency would follow a very clear pattern.  And I doubt that very
 many user applications are depending on the contents of pg_proc.prosrc.

 Someone might be doing
 DirectFunctionCall2(int4pl, ...)
 in an extension. Well, probably not too likely for int4pl, as you could 
 just use the native C + operator, but it's not inconceivable for 
 something like int4recv().

We don't have a lot of hesitation about breaking individual function calls
in extensions, so long as (1) you'll get a compile error and (2) it's
clear how to update the code.  See for instance the multitude of cases
where we've added new arguments to existing C functions.  So I don't think
this objection holds a lot of water, especially when (as you note) there's
not that much reason to be calling most of these functions directly from C.

regards, tom lane




Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-01-02 Thread Tom Lane
Jim Nasby jim.na...@bluetreble.com writes:
 On 12/31/14, 8:13 AM, Andres Freund wrote:
 Note that the C datatype has been int32/int64 for a while now, it's just
 the SQL datatype and the names of its support functions. Given that,
 afaiu, we're talking about the C datatype it seems pretty clear that it
 should be named int128.

I don't think there was any debate about calling the C typedef int128.
The question at hand was that there are some existing C functions whose
names follow the int2/int4/int8 convention, and it's not very clear what
to do with their 128-bit analogues.  Having said that, I'm fine with
switching to int128 for the C names of the new functions; but it is
definitely less than consistent with what we're doing elsewhere.

 Should we start down this road with SQL too, by creating int32, 64 and 128 
 (if needed?), and changing docs as needed?

No.  The situation is messy enough without changing our conventions at
the SQL level; that will introduce breakage into user applications,
for no very good reason because we've never had any inconsistency there.

What might be worth trying is establishing a hard-and-fast boundary
between C land and SQL land, with bitwise names in C and bytewise names
in SQL.  This would mean, for example, that int4pl() would be renamed to
int32pl() so far as the C function goes, but the function's SQL name would
remain the same.  That would introduce visible inconsistency between such
functions' pg_proc.proname and pg_proc.prosrc fields, but at least the
inconsistency would follow a very clear pattern.  And I doubt that very
many user applications are depending on the contents of pg_proc.prosrc.
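
(To illustrate the proposed proname/prosrc split; the second row shows hypothetical catalog contents under that scheme.)

SELECT proname, prosrc FROM pg_proc WHERE proname = 'int4pl';

--  proname | prosrc
-- ---------+----------
--  int4pl  | int4pl      (today)
--  int4pl  | int32pl     (under the proposed C-level renaming)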

regards, tom lane




Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2015-01-02 Thread Andres Freund
On 2014-12-29 16:59:05 -0500, Bruce Momjian wrote:
 diff --git a/src/backend/storage/buffer/buf_init.c 
 b/src/backend/storage/buffer/buf_init.c
 new file mode 100644
 index ff6c713..c4dce5b
 *** a/src/backend/storage/buffer/buf_init.c
 --- b/src/backend/storage/buffer/buf_init.c
 *** InitBufferPool(void)
 *** 67,72 
 --- 67,73 
   bool        foundBufs,
   foundDescs;
   
 + fprintf(stderr, "Buffer Descriptors size = %ld\n", sizeof(BufferDesc));
   BufferDescriptors = (BufferDesc *)
   ShmemInitStruct("Buffer Descriptors",
   NBuffers * sizeof(BufferDesc), 
 &foundDescs);
 diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c
 new file mode 100644
 index 2ea2216..669c07f
 *** a/src/backend/storage/ipc/shmem.c
 --- b/src/backend/storage/ipc/shmem.c
 *** ShmemInitStruct(const char *name, Size s
 *** 327,332 
 --- 327,335 
   ShmemIndexEnt *result;
   void   *structPtr;
   
 + if (strcmp(name, "Buffer Descriptors") == 0)
 + size = BUFFERALIGN(size) + 64;
 + 
   LWLockAcquire(ShmemIndexLock, LW_EXCLUSIVE);
   
   if (!ShmemIndex)
 *** ShmemInitStruct(const char *name, Size s
 *** 413,418 
 --- 416,432 
\"%s\" (%zu bytes 
 requested)",
  name, size)));
   }
 + if (strcmp(name, "Buffer Descriptors") == 0)
 + {
 + /* align on 32 */
 + if ((int64)structPtr % 32 != 0)
 + structPtr = (void *)((int64)structPtr + 32 - 
 (int64)structPtr % 32);
 + /* align on 32 but not 64 */
 + if ((int64)structPtr % 64 == 0)
 + structPtr = (void *)((int64)structPtr + 32);
 + }
 + fprintf(stderr, "shared memory alignment of %s:  %ld-byte\n", 
 name,
 + (int64)structPtr % 64 == 0 ? 64 : (int64)structPtr % 
 64);
   result->size = size;
   result->location = structPtr;
   }

 diff --git a/src/backend/storage/buffer/buf_init.c 
 b/src/backend/storage/buffer/buf_init.c
 new file mode 100644
 index ff6c713..c4dce5b
 *** a/src/backend/storage/buffer/buf_init.c
 --- b/src/backend/storage/buffer/buf_init.c
 *** InitBufferPool(void)
 *** 67,72 
 --- 67,73 
   bool        foundBufs,
   foundDescs;
   
 + fprintf(stderr, "Buffer Descriptors size = %ld\n", sizeof(BufferDesc));
   BufferDescriptors = (BufferDesc *)
   ShmemInitStruct("Buffer Descriptors",
   NBuffers * sizeof(BufferDesc), 
 &foundDescs);
 diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c
 new file mode 100644
 index 2ea2216..50f836e
 *** a/src/backend/storage/ipc/shmem.c
 --- b/src/backend/storage/ipc/shmem.c
 *** ShmemInitStruct(const char *name, Size s
 *** 327,332 
 --- 327,335 
   ShmemIndexEnt *result;
   void   *structPtr;
   
 + if (strcmp(name, "Buffer Descriptors") == 0)
 + size = BUFFERALIGN(size) + 64;
 + 
   LWLockAcquire(ShmemIndexLock, LW_EXCLUSIVE);
   
   if (!ShmemIndex)
 *** ShmemInitStruct(const char *name, Size s
 *** 413,418 
 --- 416,429 
\"%s\" (%zu bytes 
 requested)",
  name, size)));
   }
 + if (strcmp(name, "Buffer Descriptors") == 0)
 + {
 + /* align on 64 */
 + if ((int64)structPtr % 64 != 0)
 + structPtr = (void *)((int64)structPtr + 64 - 
 (int64)structPtr % 64);
 + }
 + fprintf(stderr, "shared memory alignment of %s:  %ld-byte\n", 
 name,
 + (int64)structPtr % 64 == 0 ? 64 : (int64)structPtr % 
 64);
   result->size = size;
   result->location = structPtr;
   }

I can't run tests right now...

What exactly do you want to see with these tests?  That's essentially
what I've already benchmarked, plus some fprintfs?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-01-02 Thread Jim Nasby

On 12/31/14, 8:13 AM, Andres Freund wrote:

On 2015-01-01 03:00:50 +1300, David Rowley wrote:

2. References to int16 meaning 16 bytes. I'm really in two minds about this,
it's quite nice to keep the natural flow, int4, int8, int16, but I can't
help think that this will confuse someone one day. I think it'll be a long
time before it confused anyone if we called it int128 instead, but I'm not
that excited about seeing it renamed either. I'd like to hear what others
have to say... Is there a chance that some sql standard in the distant
future will have HUGEINT and we might regret not getting the internal names
nailed down?


Yeah, I think using int16 to mean 16-bytes will be confusing to
someone almost immediately.



hmm, I think it should be changed to int128 then.  Pity int4 was selected
as a name instead of int32 back in the day...


Note that the C datatype has been int32/int64 for a while now, it's just
the SQL datatype and the names of its support functions. Given that,
afaiu, we're talking about the C datatype it seems pretty clear that it
should be named int128.


Should we start down this road with SQL too, by creating int32, 64 and 128 (if 
needed?), and changing docs as needed?

Presumably that would be best as a separate patch...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Compression of full-page-writes

2015-01-02 Thread Bruce Momjian
On Fri, Jan  2, 2015 at 02:18:12PM -0300, Claudio Freire wrote:
 On Fri, Jan 2, 2015 at 2:11 PM, Andres Freund and...@2ndquadrant.com wrote:
  , I now see the compression patch as something that has negatives, so
  has to be set by the user, and only wins in certain cases.  I am
  disappointed, and am trying to figure out how this became such a
  marginal win for 9.5.  :-(
 
  I find the notion that a multi digit space reduction is a marginal win
  pretty ridiculous and way too narrow focused. Our WAL volume is a
  *significant* problem in the field. And it mostly consists out of FPWs
  spacewise.
 
 One thing I'd like to point out, is that in cases where WAL I/O is an
 issue (ie: WAL archiving), usually people already compress the
 segments during archiving. I know I do, and I know it's recommended on
 the web, and by some consultants.
 
 So, I wouldn't want this FPW compression, which is desirable in
 replication scenarios if you can spare the CPU cycles (because of
 streaming), adversely affecting WAL compression during archiving.

To be specific, desirable in streaming replication scenarios that don't
use SSL compression.  (What percentage is that?)  Is it something we
should mention in the docs for this feature?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Final Patch for GROUPING SETS

2015-01-02 Thread Jim Nasby

On 12/31/14, 3:05 PM, Noah Misch wrote:

On Wed, Dec 31, 2014 at 05:33:43PM +, Andrew Gierth wrote:

 Noah == Noah Misch n...@leadboat.com writes:


  Noah Suppose one node orchestrated all sorting and aggregation.

Well, that has the downside of making it into an opaque blob, without
actually gaining much.

The opaque-blob criticism is valid.  As for not gaining much, well, the gain I
sought was to break this stalemate.  You and Tom have expressed willingness to
accept the read I/O multiplier of the CTE approach.  You and I are willing to
swallow an architecture disruption, namely a tuplestore acting as a side
channel between executor nodes.  Given your NACK, I agree that it fails to
move us toward consensus and therefore does not gain much.  Alas.


I haven't read the full discussion in depth, but is what we'd want here the
ability to feed tuples to more than one node simultaneously? That would allow
things like:

GroupAggregate
  -> Sort(a)   \
  -> Sort(a,b) -+-- SeqScan
  -> Hash(b)   /

That would allow the planner to trade off things like total memory consumption 
vs IO.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-01-02 Thread Heikki Linnakangas

On 01/02/2015 11:41 PM, Tom Lane wrote:

What might be worth trying is establishing a hard-and-fast boundary
between C land and SQL land, with bitwise names in C and bytewise names
in SQL.  This would mean, for example, that int4pl() would be renamed to
int32pl() so far as the C function goes, but the function's SQL name would
remain the same.


I don't like that. I read int4pl as the function implementing plus 
operator for the SQL-visible int4 datatype, so int4pl makes perfect sense.



 That would introduce visible inconsistency between such
functions' pg_proc.proname and pg_proc.prosrc fields, but at least the
inconsistency would follow a very clear pattern.  And I doubt that very
many user applications are depending on the contents of pg_proc.prosrc.


Someone might be doing

DirectFunctionCall2(int4pl, ...)

in an extension. Well, probably not too likely for int4pl, as you could 
just use the native C + operator, but it's not inconceivable for 
something like int4recv().


- Heikki





Re: [HACKERS] Compression of full-page-writes

2015-01-02 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 To be specific, desirable in streaming replication scenarios that don't
 use SSL compression.  (What percentage is that?)  It is something we
 should mention in the docs for this feature?

Considering how painful the SSL renegotiation problems were and the CPU
overhead, I'd be surprised if many high-write-volume replication
environments use SSL at all.

There's a lot of win to be had from compression of FPWs, but it's like
most compression in that there are trade-offs to be had and environments
where it won't be a win, but I believe those cases to be the minority.

Thanks,

Stephen




[HACKERS] Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)

2015-01-02 Thread Peter Geoghegan
On Thu, Jan 1, 2015 at 11:17 PM, Peter Geoghegan p...@heroku.com wrote:
 I've been working on fixing the bugs that Jeff Janes found [1] with
 approach #2 to value locking [2]. Approach #1 was unaffected.

Unfortunately, exclusion constraints (which only value locking
approach #2 has support for, for the IGNORE variant) are broken, even
with my fix for the largely unrelated issues also described in my
opening mail. I now believe that it is inherently impossible to
support exclusion constraints with INSERT ... ON CONFLICT IGNORE, if
we are insistent that there cannot be livelocks, unprincipled
deadlocks or spurious exclusion violations, and I think you all know
how I feel about those  :-)  . I think that we should drop support for
exclusion constraints, regardless of whatever else happens with value
locking.

Previously, my mental model of approach #2 was that it more or less
handled B-Trees and exclusion constraint related indexes equivalently.
This is untrue, though.

Today, in general, exclusion constraints more or less work by
physically inserting a heap tuple and index tuple relating to the
exclusion constraint (typically this is with a GiST index). There
could well be a conflict due to a concurrent insertion, or even due to
the fact that there was a conflicting tuple all along. The
implementation tries to find a conclusively committed conflicting
tuple. If it does not find such a tuple, it knows that its own already
physically inserted tuple is enough of a reason for others to have to
worry about it, and so it may proceed and commit the xact -- once it
does a whole index scan with only its own tuple found (and not any
overlapping tuple from an unfinished xact), it's clear that it can
proceed.

Approach #2 to value locking makes exclusion constraints do a pre-check
for the IGNORE case, but that isn't really significant
here. The reason that #2 is broken for exclusion constraints but not
for unique indexes is that unique index insertion reports a conflict
in the process of inserting. In contrast, exclusion constraints insert
optimistically, and re-check. What's the difference? Well, even though
(since the implementation passes UNIQUE_CHECK_PARTIAL for unique index
insertion) we still insert when there is a conflict, there is an
important distinction: Value locking. That is to say, even though we
still insert, we also still use buffer locks as value locks in the
usual way that unique indexes do (we hold an exclusive buffer lock on
the first B-Tree leaf page the value could be on, as always). We still
decide whether there is a conflict while that exclusive buffer lock is
held, which is a sufficient choke point to make that determination
conclusive. This implies that even with many concurrent insertions
into a newly created table, there will definitely be one inserter that
conclusively does *not* have a conflict, while the others might
(depends on the status of the guy who conclusively had *no* conflict,
or his line of successor xacts that may or may not themselves commit).

In a private e-mail to Heikki, I pointed out that he was mistaken when
he said that there is a pre-existing problem with exclusion
constraints: it is not possible for concurrent inserters to both
spuriously get exclusion violations when only one needs to. Heikki
accepted this. However, I now realize that Heikki was closer to being
right about it than I was; you can't get spurious exclusion
violations, but you can get something that's about the same, which is
a deadlock. That usually doesn't matter that much, because people
usually don't worry about the issue with exclusion constraints. But in
a world where they support INSERT ... ON CONFLICT IGNORE, that isn't
acceptable, IMV.

I have a test case:
https://github.com/petergeoghegan/upsert/blob/master/exclusion.sh

I see both spurious exclusion violations, and unprincipled deadlocks
with this simple IGNORE-based testcase, that uses a GiST + btree_gist
exclusion constraint. I didn't see livelocks, which would have been
really bad, because a mutual need to wait on each other's xact causes
a deadlock, and the deadlock detector detects those. I think that the
spurious exclusion violations are merely an artifact of the
implementation, as opposed to an inherent problem with exclusion
constraints (in other words, I don't withdraw my remarks from the last
paragraph - only (arguably) unprincipled deadlocks occur with ordinary
exclusion constraint enforcement with lots of concurrency).
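
(For readers who want to reproduce something similar, a minimal schema of the kind such a test exercises; this is not the exact script, and ON CONFLICT IGNORE is of course the patch's proposed syntax, not released SQL.)

CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE TABLE upsert_race (val int4, EXCLUDE USING gist (val WITH =));

-- run from many concurrent sessions:
INSERT INTO upsert_race (val) VALUES (1) ON CONFLICT IGNORE;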

-- 
Peter Geoghegan




Re: [HACKERS] Detecting backend failures via libpq / DBD::Pg

2015-01-02 Thread Craig Ringer

On 12/30/2014 08:43 AM, Greg Sabino Mullane wrote:
 I am working on enhancing the ping() method of DBD::Pg. The goal of
 that is for a user to be able to determine if the connection to the
 database is still valid.

This entire concept is flawed. IMO the documentation for it should be
marked with a warning pointing out that use of this method to validate
a connection before use implies a race condition; it is never safe to
validate a connection then assume it'll actually be valid and
working for the following statement(s).

Correctly written applications must instead tolerate failures,
trapping errors/exceptions at each statement and retrying the transaction.
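
(A sketch of that pattern using only standard libpq calls; the helper name and the single-retry policy are just for illustration, and real code would re-run the whole transaction, not a single statement.)

#include <stdio.h>
#include <libpq-fe.h>

/* Run a statement; on a dead connection, reset once and retry. */
static PGresult *
exec_with_retry(PGconn *conn, const char *sql)
{
	PGresult   *res = PQexec(conn, sql);

	if (PQresultStatus(res) != PGRES_COMMAND_OK &&
		PQresultStatus(res) != PGRES_TUPLES_OK)
	{
		PQclear(res);
		if (PQstatus(conn) == CONNECTION_BAD)
		{
			PQreset(conn);				/* try to re-establish the connection */
			res = PQexec(conn, sql);	/* retry once */
		}
		else
		{
			fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
			return NULL;				/* server-side error: report it, don't "ping" */
		}
	}
	return res;
}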

 The basic way we do this is to send a simple SELECT via PQexec and
 then check for a valid return value (and when in doubt, we check 
 PQstatus).

At the protocol level all that's needed, and all that should be done
IMO, is to send a Sync message and wait for the server's ReadyForQuery
response.

It might be nice to expose a libpq function to do this.

It's only safe to do this when there's no in-progress query, but
that's true of *any* validation method you might choose.

 After some experimenting, the best solution I found is to send the
 PQexec, and then check if PQresultErrorField(result, 'C') is
 '25P02'. If it is, then all is well, in that the server is still
 alive. If it is not, then we can assume the backend is bad (for
 example, PQerrorMessage gives a "could not receive data from
 server: Bad file descriptor").  Being that we cannot do a rollback
 before calling the PQexec, is this a decent solution? Can we depend
 on really serious errors always trumping the expected 25P02?

I struggle to understand the utility of this.

Your transaction could be in the failed state because of an internal
error caused by the server's storage failing. So it's alive in the
sense of responding to queries, but completely useless.

This is another aspect of why the whole approach is wrong. If you want
to find out if the server can run a particular query, run the query.
If you get an error, then it couldn't run that query.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-01-02 Thread Claudio Freire
On Fri, Jan 2, 2015 at 7:57 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 On 1/2/15, 4:18 PM, Tom Lane wrote:

 Heikki Linnakangas hlinnakan...@vmware.com writes:

 On 01/02/2015 11:41 PM, Tom Lane wrote:

 What might be worth trying is establishing a hard-and-fast boundary
 between C land and SQL land, with bitwise names in C and bytewise
  names
 in SQL.  This would mean, for example, that int4pl() would be renamed
  to
 int32pl() so far as the C function goes, but the function's SQL name
  would
 remain the same.

 I don't like that. I read int4pl as the function implementing plus
 operator for the SQL-visible int4 datatype, so int4pl makes perfect
  sense.

 I agree with that so far as the SQL name for the function goes, which is
 part of why I don't think we should rename anything at the SQL level.
 But right now at the C level, it's unclear how things should be named,
 and I think we don't really want a situation where the most appropriate
 name is so unclear and potentially confusing.  We're surviving fine with
 int32 in C meaning int4 in SQL so far as the type names go, so why not
 copy that naming approach for function names?


 Realistically, how many non-developers actually use the intXX SQL names? I
 don't think I've ever seen it; the only places I recall seeing it done are
 code snippets on developer blogs. Everyone else uses smallint, etc.

 I know we're all gun-shy about this after standard_conforming_strings, but
 that affected *everyone*. I believe this change would affect very, very few
 users.

 Also, note that I'm not talking about removing anything yet; that would come
 later.

I think it's naive to think the intXX names wouldn't be used just
because they're postgres-specific. Especially when pgadmin3 generates
them.

Many scripts of mine have them because pgadmin3 generated them, many
others have them because I grew accustomed to using them, especially
when I don't care being postgres-specific. float8 vs double precision
and stuff like that.

Lets not generalize anecdote (me using them), lets just pay attention
to the fact that lots of tools expose them (pgadmin3 among them).




Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2015-01-02 Thread Jim Nasby

On 1/2/15, 4:18 PM, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

On 01/02/2015 11:41 PM, Tom Lane wrote:

What might be worth trying is establishing a hard-and-fast boundary
between C land and SQL land, with bitwise names in C and bytewise names
in SQL.  This would mean, for example, that int4pl() would be renamed to
int32pl() so far as the C function goes, but the function's SQL name would
remain the same.

I don't like that. I read int4pl as the function implementing plus
operator for the SQL-visible int4 datatype, so int4pl makes perfect sense.

I agree with that so far as the SQL name for the function goes, which is
part of why I don't think we should rename anything at the SQL level.
But right now at the C level, it's unclear how things should be named,
and I think we don't really want a situation where the most appropriate
name is so unclear and potentially confusing.  We're surviving fine with
int32 in C meaning int4 in SQL so far as the type names go, so why not
copy that naming approach for function names?


Realistically, how many non-developers actually use the intXX SQL names? I 
don't think I've ever seen it; the only places I recall seeing it done are code 
snippets on developer blogs. Everyone else uses smallint, etc.

I know we're all gun-shy about this after standard_conforming_strings, but that 
affected *everyone*. I believe this change would affect very, very few users.

Also, note that I'm not talking about removing anything yet; that would come 
later.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] History of WAL_LEVEL (archive vs hot_standby)

2015-01-02 Thread Noah Misch
On Thu, Mar 27, 2014 at 08:16:13PM -0700, David Johnston wrote:
 Slightly tangential but are the locking operations associated with the
 recent bugfix generated in both (all?) modes or only hot_standby?

All modes.

 I thought
 it strange that transient locking operations were output with WAL though I
 get it if they are there to support read-only queries.

It is unintuitive.  This comment in heap_lock_tuple() attempts to explain:

/*
 * XLOG stuff.  You might think that we don't need an XLOG record 
because
 * there is no state change worth restoring after a crash.  You would be
 * wrong however: we have just written either a TransactionId or a
 * MultiXactId that may never have been seen on disk before, and we need
 * to make sure that there are XLOG entries covering those ID numbers.
 * Else the same IDs might be re-used after a crash, which would be
 * disastrous if this page made it to disk before the crash.  
Essentially
 * we have to enforce the WAL log-before-data rule even in this case.
 * (Also, in a PITR log-shipping or 2PC environment, we have to have 
XLOG
 * entries for everything anyway.)
 */

Another reason not mentioned is torn pages.  Locking a tuple updates t_xmax,
t_infomask2 and t_infomask.  It's possible for t_xmax to fall on one side of a
page tear and the infomasks to fall on the other side.  Writing t_xmax without
writing the corresponding infomasks could cause the tuple to be considered
deleted, not merely locked, after a crash.




Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2015-01-02 Thread Amit Kapila
On Fri, Jan 2, 2015 at 8:34 PM, Kevin Grittner kgri...@ymail.com wrote:

 Amit Kapila amit.kapil...@gmail.com wrote:

  Notes for Committer -
  There is one behavioural difference in the handling of
--analyze-in-stages
  switch, when individual tables (by using -t option) are analyzed by
  using this switch, patch will process (in case of concurrent jobs) all
the
  given tables for stage-1 and then for stage-2 and so on whereas in the
  unpatched code it will process all the three stages table by table
  (table-1 all three stages, table-2 all three stages and so on).  I think
  the new behaviour is okay as the same is done when this utility does
  vacuum for whole database.

 IMV, the change is for the better.  The whole point of
 --analyze-in-stages is to get minimal statistics so that good
 enough plans will be built for most queries to allow a production
 database to be brought back on-line quickly, followed by generating
 increasing granularity (which takes longer but should help ensure
 best plan) while the database is in use with the initial
 statistics.  Doing all three levels for one table before generating
 the rough statistics for the others doesn't help with that, so I
 see this change as fixing a bug.  Is it feasible to break that part
 out as a separate patch?


Currently, as the patch stands, the fix (or new behaviour) is only
implemented for the multiple-jobs option; to fix this in the base code,
a separate patch is required.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] replicating DROP commands across servers

2015-01-02 Thread Jeff Janes
On Mon, Dec 29, 2014 at 2:15 PM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:

 Here's a patch that tweaks the grammar to use TypeName in COMMENT,
 SECURITY LABEL, and DROP for the type and domain cases.  The required
 changes in the code are pretty minimal, thankfully.  Note the slight
 changes to the new object_address test also.

 I think I'm pretty much done with this now, so I intend to push this
 first thing tomorrow and then the rebased getObjectIdentityParts patch
 sometime later.



This commit 3f88672a4e4d8e648d24ccc65 seems to have broken pg_upgrade for
pg_trgm.

On 9.2.9 freshly initdb and started with default config:

$ createdb jjanes

in psql:

create extension pg_trgm ;
create table foo (x text);
create index on foo using gin (upper(x) gin_trgm_ops);

Then run 9.5's pg_upgrade and it fails at the stage of restoring the
database schema.

The relevant log files are attached.

Cheers,

Jeff


pg_upgrade_server.log
Description: Binary data


pg_upgrade_dump_16384.log
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Redesigning checkpoint_segments

2015-01-02 Thread Josh Berkus
On 01/02/2015 01:57 AM, Heikki Linnakangas wrote:
 wal_keep_segments does not affect the calculation of CheckPointSegments.
 If you set wal_keep_segments high enough, checkpoint_wal_size will be
 exceeded. The other alternative would be to force a checkpoint earlier,
 i.e. lower CheckPointSegments, so that checkpoint_wal_size would be
 honored. However, if you set wal_keep_segments high enough, higher than
 checkpoint_wal_size, it's impossible to honor checkpoint_wal_size no
 matter how frequently you checkpoint.

So you're saying that wal_keep_segments is part of the max_wal_size
total, NOT in addition to it?

Just asking for clarification, here.  I think that's a fine idea, I just
want to make sure I understood you.  The importance of wal_keep_segments
will be fading as more people use replication slots.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

