Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-08-19 Thread Fujii Masao
On Tue, Aug 19, 2014 at 2:08 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-08-18 13:06:15 -0400, Robert Haas wrote:
 On Mon, Aug 18, 2014 at 7:19 AM, Rahila Syed rahilasye...@gmail.com wrote:
 According to the measurement result, the amount of WAL generated in
 Multiple Blocks in one run is smaller than that in Single Block in one run.
 So ISTM that compression of multiple blocks at one run can improve
 the compression ratio. Am I missing something?
 
  Sorry for using unclear terminology. WAL generated here means WAL that gets
  generated in each run without compression.
  So, the value WAL generated in the above measurement is uncompressed WAL
  generated, to be specific:
  uncompressed WAL = compressed WAL + Bytes saved.

  Here, the measurements are done for a constant amount of time rather than a
  fixed number of transactions. Hence the amount of WAL generated does not
  correspond to the compression ratio of each algorithm, so I have calculated
  bytes saved in order to get an accurate idea of the amount of compression in
  each scenario and for each algorithm.

  Compression ratio, i.e. uncompressed WAL/compressed WAL, in each of the above
  scenarios is as follows:
 
  Compression algo    Multiple Blocks in one run    Single Block in one run

  LZ4                 1.21                          1.27

  Snappy              1.19                          1.25

  pglz                1.14                          1.16

  This shows that the compression ratios of the two scenarios, multiple
  blocks and single block, are nearly the same for this benchmark.

 I don't agree with that conclusion.  The difference between 1.21 and
 1.27, or between 1.19 and 1.25, is quite significant.  Even the
 difference between 1.14 and 1.16 is not trivial.  We should try to get
 the larger benefit, if it is possible to do so without an unreasonable
 effort.

 Agreed.

 One more question: Do I see it right that multiple blocks compressed
 together compress *worse* than compressing individual blocks? If so, I
 have a rather hard time believing that the patch is sane.

Or the benchmarking method might have some problems.

Rahila,
I'd like to measure the compression ratio in both multiple blocks and
single block cases.
Could you tell me where the patch for single block in one run is?

Regards,

-- 
Fujii Masao




Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-19 Thread Jeff Davis
On Fri, 2014-08-15 at 13:53 -0400, Robert Haas wrote:
 I think that's right, and I rather like your (Jeff's) approach.  It's
 definitely true that we could do better if we have a mechanism for
 serializing and deserializing group states, but (1) I think an awful
 lot of cases would get an awful lot better even just with the approach
 proposed here and (2) I doubt we would make the
 serialization/deserialization interfaces mandatory, so even if we had
 that we'd probably want a fallback strategy anyway.

Thank you for taking a look.

To solve the problem for array_agg, that would open up two potentially
lengthy discussions:

1. Trying to support non-serialized representations (like
ArrayBuildState for array_agg) as a real type rather than using
internal.

2. What changes should we make to the aggregate API? As long as we're
changing/extending it, should we go the whole way and support partial
aggregation[1] (particularly useful for parallelism)?

Both of those discussions are worth having, and perhaps they can happen
in parallel as I wrap up this patch.

I'll see whether I can get consensus that my approach is (potentially)
commit-worthy, and your statement that it (potentially) solves a real
problem is a big help.

Regards,
Jeff Davis

[1]
http://blogs.msdn.com/b/craigfr/archive/2008/01/18/partial-aggregation.aspx






Re: [HACKERS] WAL format and API changes (9.5)

2014-08-19 Thread Michael Paquier
On Mon, Aug 18, 2014 at 10:55 PM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:
 All rightey.

 Here's the next version of this work. It now comes as two patches. The
 first one adds the XLogReplayBuffer() function and refactors all the
 redo functions to use it. It doesn't change the WAL record format in any
 way. The second patch applies over the first one, and changes the WAL
 format, and all the WAL-creating functions to use the new API for
 constructing WAL records. The second patch removes the relfilenode and
 block number arguments from XLogReplayBuffer, because they're no longer needed
 when that information is included in the record format.

 Todo:

 * Performance testing. Do the new WAL construction functions add overhead
 to WAL insertion?
 * Compare WAL record sizes before and after. I've tried to keep it as
 compact as possible, but I know that some records have gotten bigger. Need
 to do a more thorough analysis.
 * Rename XLogReplayBuffer. I don't particularly like any of the suggestions
 so far, XLogReplayBuffer included. Discussion continues..
 * Anything else?

Patch 1 looks good. The main difference with the v1 submitted a couple of
days back is that the global variable approach is replaced with additional
arguments for the LSN position and record pointer in XLogReplayBuffer. I
have as well run a couple of tests with the page comparison tool, done some
tests based on installcheck-world with a slave replaying WAL behind, and
found no issues with it.
Perhaps we could consider pushing it to facilitate the next work? Even if
the second patch is dropped it is still a win IMO to have backup block
replay managed within a single function (be it named XLogReplayBuffer, as in
the latest patch), and having it return a status flag.

Regarding patch 2:
- The main differences with the latest version are the modifications for
XLogReplayBuffer having new arguments (LSN position and record pointer).
XLogRecGetBlockRefIds has been changed to return a palloc'd array of block
IDs. xloginsert.h, containing all the functions for xlog record
construction is introduced as well.
- Tiny thing, be aware of tab padding. Here is heapam.c:
page = BufferGetPage(buffer);
PageSetAllVisible(page);
MarkBufferDirty(buffer);
- XLogRecGetBlockRefIds is not described in
src/backend/access/transam/README. Btw, pg_xlogdump drops a core dump when
using it:
--Output:
Assertion failed: (n == *num_refs), function XLogRecGetBlockRefIds, file
xlogreader.c, line 1157.
rmgr: Heap    len (rec/tot): 14/ 12912, tx: 3, lsn:
0/01000148, prev 0/01000120, Abort trap: 6 (core dumped)
-- Backtrace:
frame #4: 0x000103870363
pg_xlogdump`XLogRecGetBlockRefIds(record=0x7ff38a003200,
num_refs=0x7fff5c394464) + 435 at xlogreader.c:1157
frame #5: 0x00010386d610
pg_xlogdump`XLogDumpDisplayRecord(config=0x7fff5c3945c8,
ReadRecPtr=16777544, record=0x7ff38a003200) + 272 at pg_xlogdump.c:357
frame #6: 0x00010386cad8 pg_xlogdump`main(argc=2,
argv=0x7fff5c394658) + 3160 at pg_xlogdump.c:749
In order to reproduce that, simply run regression tests, followed by
pg_xlogdump on one of the WAL files generated.
- This patch includes some diffs from pg_receivexlog.c taken from 52bffe3.
- I have run installcheck-world and compared the size of the WAL generated
on HEAD and the patch (any hints to improve such analysis are of course
welcome)
      name       |   start   |    stop    |   diff
-----------------+-----------+------------+-----------
 HEAD (8605bc7)  | 0/16C6808 | 0/11A2C670 | 271998568
 Patch 1+2       | 0/16D45D8 | 0/1267A4B0 | 284843736
(2 rows)
So that's a diff of more or less 13MB for this test set.

Looking forward to some performance numbers as well as a more precise
comparison of WAL record lengths.
-- 
Michael


Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg

2014-08-19 Thread Jeff Davis
On Thu, 2014-08-14 at 12:52 -0400, Robert Haas wrote:
 It appears to me that the performance characteristics for this version
 are not significantly different from version 1.  I have not looked at
 the code.

While trying to reproduce your results, I noticed what might be around a
1% regression just from adding the 3 fields to MemoryContextData. If I
cut it down to adding just one field, the regression disappears.

The results are fairly noisy, so I could be chasing the wrong thing. But
one reason to believe it is that I pushed the size of MemoryContextData
above 64, which sounds like it might be an important threshold.

Regards,
Jeff Davis






Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg

2014-08-19 Thread Jeff Davis
On Sat, 2014-08-16 at 23:09 +0200, Tomas Vondra wrote:
 But maybe the inheritance really is not necessary - maybe it would be
 enough to track this per-context, and then just walk through the
 contexts and collect this. Because my observation is that most of the
 time is actually spent in walking through blocks and freelists.

That makes a lot of sense to me.

Another approach is to pass a flag to hash_create that tells it not to
create a subcontext. Then we don't need to traverse anything; we already
know which context we want to look at. Perhaps I was being too clever
with the idea of tracking space for an entire hierarchy.
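
A minimal sketch of that flag idea, assuming a hypothetical HASH_NO_SUBCONTEXT
flag; HASHCTL.hcxt and the HASH_ELEM/HASH_CONTEXT flags are existing dynahash
API, while the entry types here are made up for illustration:

    HASHCTL    ctl;
    HTAB      *htab;

    MemSet(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(GroupKey);     /* hypothetical entry layout */
    ctl.entrysize = sizeof(GroupEntry);
    ctl.hcxt = aggcontext;              /* the context we want to measure */

    /* HASH_NO_SUBCONTEXT (hypothetical): allocate entries directly in
     * ctl.hcxt rather than in a private child context, so accounting for
     * aggcontext alone covers the hash table too */
    htab = hash_create("Agg group states", 256, &ctl,
                       HASH_ELEM | HASH_CONTEXT | HASH_NO_SUBCONTEXT);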

Also, as I pointed out in my reply to Robert, adding too many fields to
MemoryContextData may be the cause of the regression. Your idea requires
only one field, which doesn't show the same regression in my tests.

Regards,
Jeff Davis






Re: [HACKERS] pg_receivexlog and replication slots

2014-08-19 Thread Fujii Masao
On Mon, Aug 18, 2014 at 4:01 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Mon, Aug 18, 2014 at 3:48 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Aug 18, 2014 at 2:38 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 - IDENTIFY_SYSTEM checks were incorrect (even in HEAD). The following
 check was done but in 9.4 this command returns 4 fields:
 (PQntuples(res) != 1 || PQnfields(res) < 3)
 That's not directly related to this patch, but making some corrections
 is not going to hurt..

 Good catch! I found that libpqwalreceiver.c, etc have the same problem.
 It's better to fix this separately. Patch attached.
 Patch looks good to me.

Okay, applied!

 Once you push that I'll rebase the stuff on
 this thread once again, that's going to conflict for sure. And now
 looking at your patch there is additional refactoring possible with
 IDENTIFY_SYSTEM and pg_basebackup as well...

Yep, that's possible. But since the patch needs to be back-patch to 9.4,
I didn't do the refactoring.

Regards,

-- 
Fujii Masao




Re: [HACKERS] WAL format and API changes (9.5)

2014-08-19 Thread Heikki Linnakangas

On 08/19/2014 11:07 AM, Michael Paquier wrote:

Patch 1 looks good. The main difference with the v1 submitted a couple of
days back is that the global variable approach is replaced with additional
arguments for the LSN position and record pointer in XLogReplayBuffer. I
have as well run a couple of tests with the page comparison tool, done some
tests based on installcheck-world with a slave replaying WAL behind, and
found no issues with it.


Thanks!


Perhaps we could consider pushing it to facilitate the next work? Even if
the second patch is dropped it is still a win IMO to have backup block
replay managed within a single function (be it named XLogReplayBuffer, as in
the latest patch), and having it return a status flag.


Yeah, that was my plan.

Regarding the name, the following have been suggested so far:

XLogReplayBuffer
XLogRestoreBuffer
XLogRecoverBuffer
XLogReadBufferForReplay
ReadBufferForXLogReplay

One more idea:

XLogRedoBuffer (would be like the first three options above, but would match
the fact that the functions calling this are called redo functions)


I think XLogReadBufferForReplay is the most descriptive. Andres and 
Alvaro both suggested it - independently I believe - so that seems to 
come out naturally. But maybe make it XLogReadBufferForRedo, since we 
call the redo functions redo functions and not replay functions.


Yet another option is to just call it XLogReadBuffer, and rename the 
existing XLogReadBuffer to something else. With the 2nd patch, there are 
only a few callers of XLogReadBuffer left. But is it too deceitful if 
XLogReadBuffer doesn't merely read the page, but also sometimes replaces 
it with a full-page image? Maybe it's OK..



Barring objections or better ideas, I'm leaning towards 
XLogReadBufferForRedo.


- Heikki





Re: [HACKERS] pg_receivexlog and replication slots

2014-08-19 Thread Andres Freund
On 2014-08-18 14:38:06 +0900, Michael Paquier wrote:
 - IDENTIFY_SYSTEM checks were incorrect (even in HEAD). The following
 check was done but in 9.4 this command returns 4 fields:
 (PQntuples(res) != 1 || PQnfields(res) < 3)

Which is correct. We don't want to error out in the case where 3 columns
are returned because that'd unnecessarily break compatibility with <
9.4. Previously that check was != 3...

This isn't a bug.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_receivexlog and replication slots

2014-08-19 Thread Fujii Masao
On Tue, Aug 19, 2014 at 5:52 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-08-18 14:38:06 +0900, Michael Paquier wrote:
 - IDENTIFY_SYSTEM checks were incorrect (even in HEAD). The following
 check was done but in 9.4 this command returns 4 fields:
  (PQntuples(res) != 1 || PQnfields(res) < 3)

 Which is correct. We don't want to error out in the case where 3 columns
  are returned because that'd unnecessarily break compatibility with <
 9.4. Previously that check was != 3...

 This isn't a bug.

Okay, I understood why you didn't update those codes.

Since we don't allow replication between different major versions,
it's better to apply this change at least into libpqwalreceiver.c. Thought?

Regards,

-- 
Fujii Masao




Re: [HACKERS] pg_receivexlog and replication slots

2014-08-19 Thread Andres Freund
On 2014-08-19 18:02:32 +0900, Fujii Masao wrote:
 On Tue, Aug 19, 2014 at 5:52 PM, Andres Freund and...@2ndquadrant.com wrote:
  On 2014-08-18 14:38:06 +0900, Michael Paquier wrote:
  - IDENTIFY_SYSTEM checks were incorrect (even in HEAD). The following
  check was done but in 9.4 this command returns 4 fields:
   (PQntuples(res) != 1 || PQnfields(res) < 3)
 
  Which is correct. We don't want to error out in the case where 3 columns
   are returned because that'd unnecessarily break compatibility with <
  9.4. Previously that check was != 3...
 
  This isn't a bug.
 
 Okay, I understood why you didn't update those codes.
 
 Since we don't allow replication between different major versions,
 it's better to apply this change at least into libpqwalreceiver.c. Thought?

We'd discussed that we'd rather keep it consistent. It also results in a
more explanatory error message later on.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_receivexlog and replication slots

2014-08-19 Thread Fujii Masao
On Tue, Aug 19, 2014 at 6:03 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-08-19 18:02:32 +0900, Fujii Masao wrote:
 On Tue, Aug 19, 2014 at 5:52 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2014-08-18 14:38:06 +0900, Michael Paquier wrote:
  - IDENTIFY_SYSTEM checks were incorrect (even in HEAD). The following
  check was done but in 9.4 this command returns 4 fields:
   (PQntuples(res) != 1 || PQnfields(res) < 3)
 
  Which is correct. We don't want to error out in the case where 3 columns
   are returned because that'd unnecessarily break compatibility with <
  9.4. Previously that check was != 3...
 
  This isn't a bug.

 Okay, I understood why you didn't update those codes.

 Since we don't allow replication between different major versions,
 it's better to apply this change at least into libpqwalreceiver.c. Thought?

 We'd discussed that we'd rather keep it consistent. It also results in a
  more explanatory error message later on.

Hmm... okay, will revert the commit.

Regards,

-- 
Fujii Masao




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-08-19 Thread Rahila Syed
Hello,
Thank you for comments.

Could you tell me where the patch for single block in one run is?
Please find attached patch for single block compression in one run.

Thank you,



On Tue, Aug 19, 2014 at 1:17 PM, Fujii Masao masao.fu...@gmail.com wrote:

 On Tue, Aug 19, 2014 at 2:08 AM, Andres Freund and...@2ndquadrant.com
 wrote:
  On 2014-08-18 13:06:15 -0400, Robert Haas wrote:
  On Mon, Aug 18, 2014 at 7:19 AM, Rahila Syed rahilasye...@gmail.com
 wrote:
  According to the measurement result, the amount of WAL generated in
   Multiple Blocks in one run is smaller than that in Single Block in one run.
  So ISTM that compression of multiple blocks at one run can improve
  the compression ratio. Am I missing something?
  
   Sorry for using unclear terminology. WAL generated here means WAL
 that gets
   generated in each run without compression.
   So, the value WAL generated in the  above measurement is uncompressed
 WAL
   generated to be specific.
   uncompressed WAL = compressed WAL + Bytes saved.
  
   Here, the measurements are done for a constant amount of time rather
 than
   fixed number of transactions. Hence amount of WAL generated does not
   correspond to compression ratios of each algo. Hence have calculated
 bytes
   saved in order to get accurate idea of the amount of compression in
 each
   scenario and for various algorithms.
  
   Compression ratio i.e Uncompressed WAL/compressed WAL in each of the
 above
   scenarios are as follows:
  
   Compression algo    Multiple Blocks in one run    Single Block in one run

   LZ4                 1.21                          1.27

   Snappy              1.19                          1.25

   pglz                1.14                          1.16

   This shows that the compression ratios of the two scenarios, multiple
   blocks and single block, are nearly the same for this benchmark.
 
  I don't agree with that conclusion.  The difference between 1.21 and
  1.27, or between 1.19 and 1.25, is quite significant.  Even the
   difference between 1.14 and 1.16 is not trivial.  We should try to get
  the larger benefit, if it is possible to do so without an unreasonable
  effort.
 
  Agreed.
 
  One more question: Do I see it right that multiple blocks compressed
  together compress *worse* than compressing individual blocks? If so, I
  have a rather hard time believing that the patch is sane.

  Or the benchmarking method might have some problems.

 Rahila,
 I'd like to measure the compression ratio in both multiple blocks and
 single block cases.
 Could you tell me where the patch for single block in one run is?

 Regards,

 --
 Fujii Masao



CompressSingleBlock.patch
Description: Binary data



Re: [HACKERS] PoC: Partial sort

2014-08-19 Thread David Rowley
On Tue, Feb 11, 2014 at 7:59 AM, Alexander Korotkov aekorot...@gmail.com
wrote:

 Done. Patch is split.


I've started to look at this, and for now I'm still finding my way around
the patch, so I'm not quite there yet with understanding everything.
Nevertheless it seems best to post my comments early, so as to help
maintain concurrency between the review and getting the patch into shape.

I've only been looking at partial-sort-basic-1.patch so far;

The patch no longer applies to master, but this was only due to a tab being
replaced by 2 spaces in a pgident run. I've attached an updated patch which
currently applies without any issues.

Here's a few notes from reading over the code:

* pathkeys.c

  EquivalenceMember *member = (EquivalenceMember *)
lfirst(list_head(key->pk_eclass->ec_members));

You can use linitial() instead of lfirst(list_head()). The same thing
occurs in costsize.c
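
For illustration, the suggested simplification would look like this;
linitial() is simply shorthand for lfirst(list_head()):

    /* before */
    EquivalenceMember *member = (EquivalenceMember *)
        lfirst(list_head(key->pk_eclass->ec_members));

    /* after: same behavior, one call instead of two */
    EquivalenceMember *member = (EquivalenceMember *)
        linitial(key->pk_eclass->ec_members);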

* pathkeys.c

The following fragment:

n = pathkeys_common(root->query_pathkeys, pathkeys);

if (n != 0)
{
/* It's useful ... or at least the first N keys are */
return n;
}

return 0; /* path ordering not useful */
}

Could just read:

/* return the number of path keys in common, or 0 if there are none */
return pathkeys_common(root->query_pathkeys, pathkeys);

* execnodes.h

In struct SortState, some new fields don't have a comment.


I've also thrown a few different workloads at the patch and I'm very
impressed with most of the results. Especially when LIMIT is used, however
I've found a regression case which I thought I should highlight, but for
now I can't quite see what could be done to fix it.

create table a (x int not null, y int not null);
insert into a select x.x,y.y from generate_series(1,1000000) x(x) cross
join generate_series(1,10) y(y);

Patched:
explain analyze select x,y from a where x+0=1 order by x,y limit 10;
 QUERY PLAN

 Limit  (cost=92.42..163.21 rows=10 width=8) (actual
time=6239.426..6239.429 rows=10 loops=1)
   ->  Partial sort  (cost=92.42..354064.37 rows=50000 width=8) (actual
time=6239.406..6239.407 rows=10 loops=1)
         Sort Key: x, y
         Presorted Key: x
         Sort Method: quicksort  Memory: 25kB
         ->  Index Scan using a_x_idx on a  (cost=0.44..353939.13
rows=50000 width=8) (actual time=0.059..6239.319 rows=10 loops=1)
               Filter: ((x + 0) = 1)
               Rows Removed by Filter: 9999990
 Planning time: 0.212 ms
 Execution time: 6239.505 ms
(10 rows)


Time: 6241.220 ms

Unpatched:
explain analyze select x,y from a where x+0=1 order by x,y limit 10;
 QUERY PLAN

 Limit  (cost=195328.26..195328.28 rows=10 width=8) (actual
time=3077.759..3077.761 rows=10 loops=1)
   ->  Sort  (cost=195328.26..195453.26 rows=50000 width=8) (actual
time=3077.757..3077.757 rows=10 loops=1)
         Sort Key: x, y
         Sort Method: quicksort  Memory: 25kB
         ->  Seq Scan on a  (cost=0.00..194247.77 rows=50000 width=8)
(actual time=0.018..3077.705 rows=10 loops=1)
               Filter: ((x + 0) = 1)
               Rows Removed by Filter: 9999990
 Planning time: 0.510 ms
 Execution time: 3077.837 ms
(9 rows)


Time: 3080.201 ms

As you can see, the patched version performs an index scan in order to get
the partially sorted results, but it does end up quite a bit slower than
the seqscan/sort that the unpatched master performs. I'm not quite sure how
realistic the x+0 = 1 WHERE clause is, but perhaps the same would happen if
something like x+y = 1 was performed too. After a bit more analysis on
this, I see that if I change the 50k estimate to 10 in the debugger then
the num_groups is properly estimated at 1 and it then performs the seq scan
instead. So it looks like the costings of the patch are not to blame here.
(The 50k row estimate comes from rel tuples / DEFAULT_NUM_DISTINCT.)

That's all I have at the moment... More to follow soon.

Regards

David Rowley


partial-sort-basic-1_rebased.patch
Description: Binary data



Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-19 Thread Greg Stark
On Tue, Aug 19, 2014 at 1:21 AM, Craig Ringer cr...@2ndquadrant.com wrote:
 There's plenty of agreement on not a GUC - but what about alternatives?

It could be a new protocol message. Currently there are no transaction
oriented protocol messages (other than the transaction status in
ReadyForQuery). But would it not make sense to have TransactionBegin,
TransactionCommit, and TransactionAbort in the protocol? Would that
make it easier for the client-side failover to keep track of what
transactions are pending or committed and need to be verified after a
failover?


-- 
greg




Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-19 Thread Andres Freund
On 2014-08-19 08:21:10 +0800, Craig Ringer wrote:
 On 08/19/2014 01:03 AM, Robert Haas wrote:
  2. I agree that it's not good to have this get controlled by a GUC.
  If the behavior change is big enough that it's going to break clients,
  adding a GUC isn't a sufficient remedy.  If it's not, adding a GUC is
  unnecessary.
 
 There's plenty of agreement on not a GUC - but what about alternatives?

What's the problem with the COMMIT WITH (report_lsn on) I've proposed?
Reporting the LSN in the command tag? Anything doing transparent
failover needs to be aware of transaction boundaries anyway. 

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] After switching primary server while using replication slot.

2014-08-19 Thread Fujii Masao
On Mon, Aug 18, 2014 at 11:16 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 Hi all,

 After switching the primary server while using a replication slot, the
 standby server will not be able to connect to the new primary server.
 Imagine this situation: a primary server has two ASYNC standby
 servers, each using its own replication slot.
 One standby (A) applies WAL without problems, but the other
 standby (B) has stopped after connecting to the primary server
 (or sending WAL to it is heavily delayed).

 In this situation, standby (B) has not received the WAL segment files
 written while it was stopped,
 and the primary server cannot remove WAL segments which have not been
 received by all standbys.
 Therefore the primary server has to keep the WAL segment files which
 have not been received by all standbys.
 But standby (A) can do checkpoints itself, and then it's possible for it
 to recycle WAL segments.
 So the number of WAL segments on each server differs
 (standby (A) has fewer WAL files than the primary server).
 After the primary server crashes and standby (A) is promoted to primary,
 we can try to connect standby (B) to standby (A) as a new standby server.
 But it will fail because standby (A) might not have the WAL
 segment files that standby (B) requires.

This sounds like a valid concern.

 To resolve this situation, I think that we should make the master server
 notify all standby servers about removal of WAL segments,
 and have the standby servers recycle WAL segment files based on that information.

 Thought?

How does the server recycle WAL files after it's promoted from the
standby to master?
Does it do that as it likes? If yes, your approach would not be enough.

The approach prevents unexpected removal of WAL files while the standby
is running. But after the standby is promoted to master, it might recycle
needed WAL files immediately. So another standby may still fail to retrieve
the required WAL file after the promotion.

ISTM that, in order to address this, we might need to log all the replication
slot activities and replicate them to the standby. I'm not sure if this
breaks the design of replication slot at all, though.

Regards,

-- 
Fujii Masao




Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-19 Thread Tomas Vondra
On 19 August 2014, 9:52, Jeff Davis wrote:
 On Fri, 2014-08-15 at 13:53 -0400, Robert Haas wrote:
 I think that's right, and I rather like your (Jeff's) approach.  It's
 definitely true that we could do better if we have a mechanism for
 serializing and deserializing group states, but (1) I think an awful
 lot of cases would get an awful lot better even just with the approach
 proposed here and (2) I doubt we would make the
 serialization/deserialization interfaces mandatory, so even if we had
 that we'd probably want a fallback strategy anyway.

 Thank you for taking a look.

 To solve the problem for array_agg, that would open up two potentially
 lengthy discussions:

 1. Trying to support non-serialized representations (like
 ArrayBuildState for array_agg) as a real type rather than using
 internal.

That's certainly an option, and it's quite straightforward. The downside
of it is that you either prevent the aggregates from using the most
efficient state form (e.g. the array_agg might use a simple array as a
state) or you cause a proliferation of types with no other purpose.


 2. What changes should we make to the aggregate API? As long as we're
 changing/extending it, should we go the whole way and support partial
 aggregation[1] (particularly useful for parallelism)?

Maybe, but not in this patch please. That's far wider scope, and while
considering it when designing API changes is probably a good idea, we
should resist the attempt to do those two things in the same patch.

 Both of those discussions are worth having, and perhaps they can happen
 in parallel as I wrap up this patch.

Exactly.

 I'll see whether I can get consensus that my approach is (potentially)
 commit-worthy, and your statement that it (potentially) solves a real
 problem is a big help.

IMHO it's a step in the right direction. It may not go as far as I'd like,
but that's OK.

regards
Tomas





Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg

2014-08-19 Thread Tomas Vondra
On 19 August 2014, 10:26, Jeff Davis wrote:
 On Sat, 2014-08-16 at 23:09 +0200, Tomas Vondra wrote:
 But maybe the inheritance really is not necessary - maybe it would be
 enough to track this per-context, and then just walk through the
 contexts and collect this. Because my observation is that most of the
 time is actually spent in walking through blocks and freelists.

 That makes a lot of sense to me.

 Another approach is to pass a flag to hash_create that tells it not to
 create a subcontext. Then we don't need to traverse anything; we already
 know which context we want to look at. Perhaps I was being too clever
 with the idea of tracking space for an entire hierarchy.

 Also, as I pointed out in my reply to Robert, adding too many fields to
 MemoryContextData may be the cause of the regression. Your idea requires
 only one field, which doesn't show the same regression in my tests.

Yeah, keeping the structure size below 64B seems like a good idea.

The use-case for this is tracking a chosen subtree of contexts - e.g.
aggcontext and below, so I'd expect the tracked subtrees to be relatively
shallow. Am I right?

My fear is that by removing the inheritance bit, we'll hurt cases with a
lot of child contexts. For example, array_agg currently creates a separate
context for each group - what happens if you have 100k groups and do
MemoryContextGetAllocated? I guess iterating over 100k groups is not free.

Wouldn't the solution with inheritance and propagating the accounting info
to the parent actually be better? Or maybe allowing both, having two flags
when creating a context instead of one?

  AllocSetCreateTracked(..., bool track, bool propagate_immediately)

By squashing both flags into a single mask you wouldn't increase the size.
Also, do we really need to track allocated bytes - couldn't we track
kilobytes or something and use smaller data types to get below the 64B?
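
A sketch of what that packing could look like; every name below is
hypothetical, only the 64-byte target is from this discussion:

    /* one bitmask instead of two bools, and a 32-bit kilobyte counter
     * instead of a Size byte counter, to keep MemoryContextData within
     * a 64-byte cache line */
    #define MCXT_TRACK_MEM      0x01    /* account allocations here */
    #define MCXT_PROPAGATE_UP   0x02    /* also add them to the parent */

    typedef struct MemoryAccounting
    {
        uint32      allocated_kb;   /* tracked space, KB granularity */
        uint8       flags;          /* MCXT_* bits above */
    } MemoryAccounting;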

regards
Tomas





Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-08-19 Thread Amit Kapila
On Wed, Aug 13, 2014 at 4:01 PM, Dilip kumar dilip.ku...@huawei.com wrote:
 On 11 August 2014 10:29, Amit kapila wrote,
 5.

 res = executeQuery(conn,
 "select relname, nspname from pg_class c, pg_namespace ns"
 " where (relkind = \'r\' or relkind = \'m\')"
 " and c.relnamespace = ns.oid order by relpages desc",
 progname, echo);

 

 a. Here you need to use SQL keywords in capital letters; refer to one
 of the other callers of executeQuery() in vacuumdb.c

 b. Why do you need this condition c.relnamespace = ns.oid in the above
 query?



 It is possible that two namespaces have the same table name, so when we
are sending commands from the client we need to qualify tables by namespace,
because we need to vacuum all the
 tables (otherwise two tables with the same name from different namespaces
will be treated as the same).

That's right; however, writing the query in the below way might
make it more understandable:
+ SELECT relname, nspname FROM pg_class c, pg_namespace ns

SELECT c.relname, ns.nspname FROM pg_class c, pg_namespace ns



 7.


 Here we are getting the message string; I think if we need to find the error
code then we need to parse the string, and after that we need to compare
it with error codes.

 Is there any other way to do this?

You can compare against SQLSTATE by using below API.
val = PQresultErrorField(res, PG_DIAG_SQLSTATE);

You need to handle the *42P01* SQLSTATE; also please refer to the below
usage where we are checking SQLSTATE.

fe-connect.c
if (strcmp(PQresultErrorField(conn->result, PG_DIAG_SQLSTATE),
           ERRCODE_INVALID_PASSWORD) == 0)
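
Putting that together, a sketch of the suggested handling;
PQresultErrorField() and PG_DIAG_SQLSTATE are real libpq API, the
surrounding loop is assumed:

    const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);

    /* "42P01" is ERRCODE_UNDEFINED_TABLE: the table was dropped after
     * the list was built, so skip it instead of aborting the run */
    if (sqlstate != NULL && strcmp(sqlstate, "42P01") == 0)
        continue;               /* assumed: inside the per-table loop */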

A few more comments:

1.
* If user has not given the vacuum of complete db, then if

I think here you have said the reverse of what the code is doing.
You don't need *not* in the above sentence.

2.
+ appendPQExpBuffer(sql, "\"%s\".\"%s\"", nspace, relName);
I think here you need to use function fmtQualifiedId() or fmtId()
or something similar to handle quotes appropriately.
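
For example, something along these lines; fmtId() returns a pointer into a
static buffer, so it is safest appended one call at a time:

    appendPQExpBufferStr(sql, fmtId(nspace));
    appendPQExpBufferChar(sql, '.');
    appendPQExpBufferStr(sql, fmtId(relName));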

3.

+  */
+ if (!r && !completedb)
Here the usage of the completedb variable is reversed, which means
that it goes into the error path when actually the whole database is
getting vacuumed, and the reason is that you are setting it
to false in the below code:
+ /* Vaccuming full database*/
+ vacuum_tables = false;

4.
Functions prepare_command() and vacuum_one_database() contain
duplicate code; is there any problem in using the prepare_command()
function in vacuum_one_database()? Another point in this context
is that I think it is better to name function prepare_command()
as append_vacuum_options() or something on that lines, also it will
be better if you can write function header for this function as well.

5.
+ if (error)
+ {
+ for (i = 0; i < max_slot; i++)
+ {
+ DisconnectDatabase(connSlot[i]);
+ }

Here why do we need DisconnectDatabase() type of function?
Why can't we simply call PQfinish() as in base code?

6.
+ /*
+  * if table list is not provided then we need to do vaccum for whole DB
+  * get the list of all tables and prpare the list
+  */
spelling of prepare is wrong. I have noticed spell mistake
in comments at some other place as well, please check all
comments once

7. I think in the new mechanism the cancel handler will not work.
In single-connection vacuum it was always set/reset
in the function executeMaintenanceCommand(). You might need
to set/reset it in the function run_parallel_vacuum().
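
A sketch of the kind of bracketing meant here, using the existing
SetCancelConn()/ResetCancelConn() helpers from src/bin/scripts/common.c;
the slot structure is the patch's and is assumed:

    /* inside run_parallel_vacuum(), around each command sent on a slot */
    SetCancelConn(free_slot->connection);   /* Ctrl-C now cancels this */
    PQsendQuery(free_slot->connection, sql.data);
    /* ... wait for and consume the result ... */
    ResetCancelConn();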


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-19 Thread Fujii Masao
On Tue, Aug 19, 2014 at 7:21 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-08-19 08:21:10 +0800, Craig Ringer wrote:
 On 08/19/2014 01:03 AM, Robert Haas wrote:
  2. I agree that it's not good to have this get controlled by a GUC.
  If the behavior change is big enough that it's going to break clients,
  adding a GUC isn't a sufficient remedy.  If it's not, adding a GUC is
  unnecessary.

 There's plenty of agreement on not a GUC - but what about alternatives?

 What's the problem with the COMMIT WITH (report_lsn on) I've proposed?
 Reporting the LSN in the command tag? Anything doing transparent
 failover needs to be aware of transaction boundaries anyway.

So something like transparent failover doesn't work when a client is
working in autocommit mode? That doesn't sound good.

Just idea. What about using NoticeResponse message to report LSN?
It can be sent basically anytime and this idea doesn't break current
wire protocol.

Regards,

-- 
Fujii Masao




Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-08-19 Thread Amit Kapila
On Fri, Aug 15, 2014 at 12:55 AM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Aug 11, 2014 at 12:59 AM, Amit Kapila amit.kapil...@gmail.com
wrote:
  1.
  +Number of parallel connections to perform the operation. This
  option will enable the vacuum
  +operation to run on parallel connections, at a time one table
will
  be operated on one connection.
 
  a. How about describing w.r.t asynchronous connections
  instead of parallel connections?

 I don't think asynchronous is a good choice of word.

Agreed.

Maybe simultaneous?

Not sure. How about *concurrent* or *multiple*?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


[HACKERS] Trove with PostgreSQL-XC

2014-08-19 Thread Vivek Singh Raghuwanshi
Hi All,

Please let me know whether it is possible to use OpenStack Trove with
Postgres-XC, with instances and bare metal (after the Juno release).
I know it is possible to use other backends like MySQL or PostgreSQL, but I
am not sure about XC.


-- 
ViVek Raghuwanshi
Mobile -+91-09595950504
Skype - vivek_raghuwanshi
IRC vivekraghuwanshi
http://vivekraghuwanshi.wordpress.com/
http://in.linkedin.com/in/vivekraghuwanshi


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-19 Thread Andres Freund
On 2014-08-19 19:59:51 +0900, Fujii Masao wrote:
 On Tue, Aug 19, 2014 at 7:21 PM, Andres Freund and...@2ndquadrant.com wrote:
  On 2014-08-19 08:21:10 +0800, Craig Ringer wrote:
  On 08/19/2014 01:03 AM, Robert Haas wrote:
   2. I agree that it's not good to have this get controlled by a GUC.
   If the behavior change is big enough that it's going to break clients,
   adding a GUC isn't a sufficient remedy.  If it's not, adding a GUC is
   unnecessary.
 
  There's plenty of agreement on not a GUC - but what about alternatives?
 
  What's the problem with the COMMIT WITH (report_lsn on) I've proposed?
  Reporting the LSN in the command tag? Anything doing transparent
  failover needs to be aware of transaction boundaries anyway.
 
 So something like transparent failover doesn't work when a client is
 working in auto commit mode? That sounds not good.

I don't think transparent failover + autocommit is a sensible
combination.

 Just idea. What about using NoticeResponse message to report LSN?
 It can be sent basically anytime and this idea doesn't break current
 wire protocol.

I think that'd be horrible from multiple perspectives: a) how to discern
them from regular notice messages b) It's not sent in the same protocol
level message as the COMMIT message. Thus there's scenarios where you
only have the commit, but not the LSN.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [Postgres-xc-developers] Trove with PostgreSQL-XC

2014-08-19 Thread 鈴木 幸市
XC’s libpq is binary compatible with PG.   So as long as Openstack uses libpq 
to connect to PG, XC should work too.

Appreciate if you have a chance to try.

Thanks.
---
Koichi Suzuki

On 2014/08/19 20:14, Vivek Singh Raghuwanshi
vivekraghuwan...@gmail.com wrote:


Hi All,

Please let me know whether it is possible to use OpenStack Trove with
Postgres-XC, with instances and bare metal (after the Juno release).
I know it is possible to use other backends like MySQL or PostgreSQL, but I
am not sure about XC.


--
ViVek Raghuwanshi
Mobile -+91-09595950504
Skype - vivek_raghuwanshi
IRC vivekraghuwanshi
http://vivekraghuwanshi.wordpress.com/
http://in.linkedin.com/in/vivekraghuwanshi



Re: [HACKERS] Enable WAL archiving even in standby

2014-08-19 Thread Fujii Masao
On Fri, Aug 15, 2014 at 4:30 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Aug 13, 2014 at 6:42 AM, Fujii Masao masao.fu...@gmail.com wrote:
 I'd propose the attached WIP patch which allows us to enable WAL archiving
 even in standby. The patch adds always as the valid value of archive_mode.
 If it's set to always, the archiver is started when the server is in 
 standby
 mode and all the WAL files that walreceiver wrote to the disk are archived by
 using archive_command. Then, even after the server is promoted to master,
 the archiver keeps archiving WAL files. The patch doesn't change the meanings
 of the setting values on and off of archive_mode.

 I like the feature, but I don't much like this as a control mechanism.
 Having archive_command and standby_archive_command, as you propose
 further down, seems saner.

Okay, that's fine. One question is: which WAL files should be archived by
standby_archive_command? There are following kinds of WAL files.

(1) WAL files which were fully written and closed by walreceiver
 Currently they are not archived at all.

(2) WAL file which is being written by walreceiver
 This file will be closed before it's fully written because of,
 for example, standby promotion.
 Currently this is archived by archive_command.

(3) WAL file with new timeline, which is copied from (2)
  At the end of recovery, after new timeline is assigned,
  this latest WAL file with new timeline is created by being copied
  from (2) (i.e., latest WAL file with old timeline). WAL data of
  end-of-recovery checkpoint is written to this latest WAL file.
  Currently this is archived by archive_command.

(4) Timeline history files
 When the standby is promoted to the master, the timeline is incremented
 and the timeline history file is created.
 Currently the timeline history files are archived by archive_command.

(5) WAL files generated in normal processing mode
  Currently they are archived by archive_command.

I'm thinking to use standby_archive_command only for (1) because
the others are currently archived by archive_command. That means
that even if there are type (1) WAL files which have not been archived
yet after the standby promotion (i.e., the situation where WAL archiving
was delayed for some reasons in the standby), they are archived by
standby_archive_command. IOW, the archiver uses both archive commands
as the situation demands.

OTOH, maybe there are people who want to use standby_archive_command
for all the WAL files with old timeline, i.e., (1) and (2). Thought?
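
To make the split concrete, a hedged sketch of the configuration being
discussed; standby_archive_command is only a proposal in this thread, not
an existing GUC:

    archive_mode = on
    # (2)-(5): WAL files and timeline history files, as today
    archive_command = 'cp %p /archive/%f'
    # (1): segments fully written by walreceiver on the standby (proposed)
    standby_archive_command = 'cp %p /archive/%f'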

Regards,

-- 
Fujii Masao




[HACKERS] dynahash vs. memory-bounded HashAggregate (hashjoin-style)

2014-08-19 Thread Tomas Vondra
Hi all,

while working on a prototype of memory-bounded hash aggregate (alternative
to Jeff's patch discussed in [1]), I ran into difficulties when dealing
with dynahash. So I'm asking for help ...

Some of the difficulties stem from my limited knowledge of dynahash, and
complexity in execGrouping.c, but it seems to me some are actually due to
a mismatch with the proposed hashjoin-like approach to batching.

The hashjoin-like batching essentially does this:

1) put data (in this case aggregate states) into a hash table, until a
memory limit is reached
2) double the number of batches and move half the entries from the hash
table (based on batch number, computed from the hash value)
3) repeat
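
As a minimal sketch of that loop (every name here is hypothetical, purely
to make the control flow concrete):

    while (fetch_next_tuple(&tuple))
    {
        uint32  hash = compute_group_hash(tuple);

        if ((hash % nbatch) != curbatch)
        {
            spill_to_batch_file(tuple, hash % nbatch); /* later pass */
            continue;
        }

        lookup_or_create_group_state(hashtable, tuple, hash);

        if (consumed_memory(hashtable) > work_mem_limit)
        {
            nbatch *= 2;                   /* step 2: double the batches */
            evict_entries_not_in_current_batch(hashtable);
        }
    }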

This however requires two things, that I think seem quite difficult to do
with dynahash:

(a) detecting that a memory limit was reached

   This is usually done with a condition like this:

   if (consumed_memory > work_mem) {
  ... do the batching / remove ~50% of data from hash table
  ... it's expected consumed_memory drops by 50%
   }

   Where consumed memory is the size of a memory context (cheap to
compute, thanks to another patch from Jeff [2]).

   This however does not work with dynahash, because dynahash does not
release memory for removed tuples - it just moves it to a freelist, so
consumed_memory only grows.

   For a while I thought I could do this:

   if (consumed_memory > consumed_memory_prev) {
  ...
  consumed_memory_prev = consumed_memory
   }

   But then I found out dynahash does not grow continuously, but in (quite
large) steps. Exceeding the limit a bit is not a big deal, but the
growth is quite fast and quickly leads to allocating much more than the
limit.


(b) removing tuples while batching

Whenever the number of batches is increased (doubled), I need to walk
through the hash table and remove entries not belonging to the current
batch (should be ~50% of them). The only way to do this with dynahash
seems to be iterating over the entries, and then doing another search
with HASH_REMOVE. Is there a better way?
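
For reference, the iterate-then-remove pattern described above, using the
real dynahash calls; the entry layout and batch predicate are assumed, and
dynahash does allow removing the entry just returned by hash_seq_search()
while the scan continues:

    HASH_SEQ_STATUS scan;
    GroupHashEntry *entry;              /* assumed entry struct */

    hash_seq_init(&scan, hashtable);
    while ((entry = (GroupHashEntry *) hash_seq_search(&scan)) != NULL)
    {
        if (!belongs_to_current_batch(entry->hash, nbatch, curbatch))
            hash_search(hashtable, &entry->key, HASH_REMOVE, NULL);
    }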


I've been considering switching this to a custom hash table (similar to
what's used in hashjoin
https://commitfest.postgresql.org/action/patch_view?id=1494), which seems
like a better solution for this use-case, but I'm not sure about this. I'm
not a big fan of replacing large amounts of code for no good reason.

Opinions?


I'd be grateful if someone more knowledgeable of dynahash / the way it's
used in execGrouping could review the prototype I have so far. There's
only a handful of functions related to dynahash, and most of the issues I
have are about passing the values properly (slots, dummy values, tuples).


regards
Tomas


[1]
http://www.postgresql.org/message-id/1407706010.6623.16.camel@jeff-desktop

[2]
http://www.postgresql.org/message-id/1407012053.15301.53.camel@jeff-desktop





Re: [HACKERS] [Postgres-xc-developers] Trove with PostgreSQL-XC

2014-08-19 Thread Vivek Singh Raghuwanshi
Thanks,
One more question: does this library support multitenancy, or do we need to
launch a separate VPC (virtual private cloud) every time for each customer?
It would be good if we have both options.



On Tue, Aug 19, 2014 at 5:04 PM, 鈴木 幸市 koi...@intellilink.co.jp wrote:

  XC’s libpq is binary compatible with PG.   So as long as Openstack uses
 libpq to connect to PG, XC should work too.

  Appreciate if you have a chance to try.

  Thanks.
 ---
 Koichi Suzuki

  On 2014/08/19 20:14, Vivek Singh Raghuwanshi vivekraghuwan...@gmail.com
 wrote:


  Hi All,

  Please let me know whether it is possible to use OpenStack Trove with
 Postgres-XC,
  with instances and bare metal (after the Juno release).
  I know it is possible to use other backends like MySQL or PostgreSQL, but
 I am not sure about XC.


  --
 ViVek Raghuwanshi
 Mobile -+91-09595950504
 Skype - vivek_raghuwanshi
 IRC vivekraghuwanshi
 http://vivekraghuwanshi.wordpress.com/
 http://in.linkedin.com/in/vivekraghuwanshi






-- 
ViVek Raghuwanshi
Mobile -+91-09595950504
Skype - vivek_raghuwanshi
IRC vivekraghuwanshi
http://vivekraghuwanshi.wordpress.com/
http://in.linkedin.com/in/vivekraghuwanshi


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-19 Thread Fujii Masao
On Tue, Aug 19, 2014 at 8:22 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-08-19 19:59:51 +0900, Fujii Masao wrote:
 On Tue, Aug 19, 2014 at 7:21 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2014-08-19 08:21:10 +0800, Craig Ringer wrote:
  On 08/19/2014 01:03 AM, Robert Haas wrote:
   2. I agree that it's not good to have this get controlled by a GUC.
   If the behavior change is big enough that it's going to break clients,
   adding a GUC isn't a sufficient remedy.  If it's not, adding a GUC is
   unnecessary.
 
  There's plenty of agreement on not a GUC - but what about alternatives?
 
  What's the problem with the COMMIT WITH (report_lsn on) I've proposed?
  Reporting the LSN in the command tag? Anything doing transparent
  failover needs to be aware of transaction boundaries anyway.

 So something like transparent failover doesn't work when a client is
 working in auto commit mode? That sounds not good.

 I don't think transparent failover + autocommit is a sensible
 combination.

 Just idea. What about using NoticeResponse message to report LSN?
 It can be sent basically anytime and this idea doesn't break current
 wire protocol.

 I think that'd be horrible from multiple perspectives: a) how to discern
 them from regular notice messages

You can implement your own protocol on top of existing messages, like
replication does.

 b) It's not sent in the same protocol
 level message as the COMMIT message. Thus there's scenarios where you
 only have the commit, but not the LSN.

Hmm.. you can change the code so that the message with LSN is sent
as soon as COMMIT message is sent, if required.

Regards,

-- 
Fujii Masao




Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback

2014-08-19 Thread Fujii Masao
On Tue, Aug 19, 2014 at 9:52 AM,  furu...@pm.nttdata.co.jp wrote:
 Thank you for updating the patch.

 I did not get error with applying, and compiling.
 It works fine. I think this function code has no problem.
 Could you please submit patch to commit fest app?

 Thanks for the review!

 As you pointed out, submitted patch to commit fest app.

When a replication slot is not specified in pg_receivexlog, the flush location
in the feedback message always indicates invalid. So there seems to be
no need to send the feedback as soon as fsync is issued, in that case.
How should this option work when a replication slot is not specified?

Regards,

-- 
Fujii Masao




Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-19 Thread MauMau

From: Fujii Masao masao.fu...@gmail.com
What's the main purpose of this tool? If it's for WAL archiving, the tool
name pg_copy sounds too generic. We already have pg_archivecleanup, so maybe
pg_archivecopy or something is better for the consistency?

pg_copy in the patch copies the file to the destination in a straightforward
way, i.e., directly copies the file to the dest file with its actual name.
This can cause the problem which some people reported. The problem is that,
when the server crashes while a WAL file is being archived by the cp command,
its partially-filled WAL file remains in the archival area. This half-baked
archive file can cause various troubles. To address this, the WAL file needs
to be copied to a temporary file at first, then renamed to the actual name.
I think that pg_copy should copy the WAL file in that way.


I intended to make pg_copy a straightforward replacement of cp/copy, which 
complements the missing sync.  Direct I/O and posix_fadvise() features may be 
convenient but not essential for this utility.  cp/copy doesn't copy to a 
temporary file, and the problem can be solved easily by mv/move.  I wanted 
to keep pg_copy as generic as cp/copy, so that it can be used by some 
advanced features in the future, e.g. comprehensive backup/recovery 
management like RMAN (this example may not be best) when it's integrated 
into the core.


With that said, copying to a temporary file like dest.tmp and renaming it 
to dest sounds worthwhile even as a basic copy utility.  I want to avoid 
copying to a temporary file with a fixed name like _copy.tmp, because some 
advanced utility may want to run multiple instances of pg_copy to copy 
several files into the same directory simultaneously.  However, I'm afraid 
multiple dest.tmp files might continue to occupy disk space after 
canceling copy or power failure in some use cases, where the copy of the 
same file won't be retried.  That's also the reason why I chose to not use a 
temporary file like cp/copy.
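
A minimal sketch of that scheme (error handling elided; the per-process
suffix is one way to address the concurrent-copies concern above):

    char    tmppath[MAXPGPATH];
    int     tmpfd;

    /* copy to a unique temporary name first */
    snprintf(tmppath, sizeof(tmppath), "%s.%d.tmp", dstpath, (int) getpid());
    /* ... open the source and tmppath, copy the bytes into tmpfd ... */
    fsync(tmpfd);               /* make the data durable first */
    close(tmpfd);
    rename(tmppath, dstpath);   /* atomic within one filesystem */
    /* also fsync the containing directory so the rename itself survives
     * a crash; posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) could then
     * drop the copy from OS cache when it will not be re-read soon */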


Currently pg_copy always syncs the archive file, and there is no way to
disable that. But I'm sure that not everyone wants to sync the archive file.
So I think that it's better to add an option into pg_copy specifying whether
to sync the file or not.


pg_copy is for copying a file reliably by syncing.  If sync is not 
necessary, people can use cp/copy.



Some users might want to specify whether to call posix_fadvise or not
because they might need to re-read the archived files just after the
archiving. For example, network copy of the archived files from the archive
area to a remote site for disaster recovery.


This sounds reasonable.  Do you have an idea on the switch name and the 
default behavior?



Do you recommend using pg_copy for restore_command? If yes, it also should
be documented. And in the WAL restore case, the restored WAL files are
re-read soon by recovery, so posix_fadvise is not good in that case.

Direct I/O and posix_fadvise are used only for the destination file. But why
not the source file? That might be useful especially for the restore_command
case.


No, I don't think it's necessary to use pg_copy for restore_command.



At last, the big question is: is there really no OS command which provides
the same functionality as pg_copy does? If there is, I'd like to avoid
duplicate work, basically.


If there exists such a command available in the standard OS installation, I 
want to use it.


Regards
MauMau





Re: [HACKERS] proposal for 9.5: monitoring lock time for slow queries

2014-08-19 Thread MauMau

From: Alvaro Herrera alvhe...@2ndquadrant.com

Is this supposed to be session-local data, or is it visible from remote
sessions too?  How durable is it supposed to be?  Keep in mind that in
case of a crash, all pgstats data is erased.


I want it to be visible from other sessions.  I'm okay about the data 
erasure during recovery.  We can probably extend pg_statsinfo to save the 
new info for long-term trend analysis.  TBH, I want a feature like 
pg_statsinfo in core.


Regards
MauMau





Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-19 Thread Craig Ringer
On 08/19/2014 06:21 PM, Andres Freund wrote:
 What's the problem with the COMMIT WITH (report_lsn on) I've proposed?
 Reporting the LSN in the command tag? Anything doing transparent
 failover needs to be aware of transaction boundaries anyway. 

Tom's objection to a GUC applies there too - a client app can send that
when the underlying driver doesn't expect to get the results.

I'm not completely convinced that's a problem - oh dear, the app breaks.
The answer to so many other things in Pg is "well, don't do that then"
that I don't see this as overly different.

However, granting that it is a problem, the same objection to a GUC
applies to this too.
-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-19 Thread Andres Freund
On 2014-08-19 21:47:26 +0800, Craig Ringer wrote:
 On 08/19/2014 06:21 PM, Andres Freund wrote:
  What's the problem with the COMMIT WITH (report_lsn on) I've proposed?
  Reporting the LSN in the command tag? Anything doing transparent
  failover needs to be aware of transaction boundaries anyway. 
 
 Tom's objection to a GUC applies there too - a client app can send that
 when the underlying driver doesn't expect to get the results.

I don't really think this is true. With a GUC it's set for the whole
session or even for users. With such an option to COMMIT it'd only be set
when issued by something that actually does transparent failover (i.e. the
underlying driver).

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
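
To make the proposal concrete, a failover-aware driver using the
hypothetical syntax from this thread (not released behavior; the table is
made up) might issue something like:

    BEGIN;
    UPDATE accounts SET balance = balance - 100.00 WHERE id = 42;
    COMMIT WITH (report_lsn on);
    -- hypothetical command tag carrying the commit LSN:
    -- COMMIT 0/16B2D80

The driver could then remember that LSN and, after failing over, wait
until the new server has replayed past it before confirming or retrying
the transaction.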




[HACKERS] New PostgreSQL buildfarm client release 4.14 - bug fix for MSVC

2014-08-19 Thread Andrew Dunstan
There is a new release - version 4.14 - of the buildfarm client, now 
available at 
http://www.pgbuildfarm.org/downloads/releases/build-farm-4_14.tgz


The only change of note is a fix for a bug which only affects MSVC clients 
(such that the client will not complete a run) and which is present in 
releases 4.12 and 4.13. Clients on other platforms do not need to upgrade.


cheers

andrew




Re: [HACKERS] Enable WAL archiving even in standby

2014-08-19 Thread MauMau

From: Fujii Masao masao.fu...@gmail.com

I'd propose the attached WIP patch which allows us to enable WAL archiving
even in standby. The patch adds "always" as a valid value of archive_mode.
If it's set to "always", the archiver is started when the server is in
standby mode, and all the WAL files that walreceiver wrote to the disk are
archived by using archive_command. Then, even after the server is promoted
to master, the archiver keeps archiving WAL files. The patch doesn't change
the meanings of the setting values "on" and "off" of archive_mode.

I think that this feature is useful for the case, e.g., where a large
database needs to be replicated between remote servers. Imagine the
situation where the replicated database gets corrupted completely in the
remote standby. How should we address this problematic situation and
restart the standby?

One approach is to take a fresh backup from the master and restore it onto
the standby. But since the database is large and there is a long distance
between the two servers, this approach might take a surprisingly long time.

Another approach is to restore a backup which was taken from the standby
before. But most of the many WAL files which the backup needs might exist
only in the master (because WAL archiving cannot be enabled in the
standby), and they would need to be transferred from the master to the
standby via a long-distance network. So I think that this approach also
would take a fairly long time. To shorten that time, you may think that
archive_command in the master can be set so that it transfers WAL files
from the master to the standby's archival storage. I agree that this
setting can accelerate the database restore process. But it causes every
WAL file to be transferred between the remote servers twice (once by
streaming replication, once by archive_command), which is a waste of
network bandwidth.


Great.  This is exactly what I hoped for in disaster recovery scenarios, 
although I haven't looked at the patch yet.
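
For concreteness, the standby-side setup under the proposal might look
roughly like this (a sketch: "always" is the value the patch adds, and the
archive path and the use of pg_copy are illustrative):

    # postgresql.conf on the standby, under the proposed patch
    archive_mode = always           # also archive WAL written by walreceiver
    archive_command = 'pg_copy %p /archive/%f'    # any reliable copy works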




Back to the patch. If archive_mode is set to "always", archive_command is
always used to archive WAL files, even during recovery. Do we need to
separate the command into two, for master and standby respectively? We
could add something like a standby_archive_command parameter which is used
to archive only the WAL files that walreceiver writes. The other WAL files
would be archived by archive_command. I'm not sure if it's really worth
separating the command that way. Is there any use case?


I don't see any reason to separate the parameters.  I want to keep the spec simple.



I've not included the documentation updates in the patch yet. If we agree
to support this feature, I will do the remaining work.


Could you consider adding a new section for disaster recovery that 
describes concrete parameter settings (e.g. how do we discard old archived 
WAL files after taking a base backup from the standby, given that no 
backup label file is created?).  Good luck!


Regards
MauMau






Re: [HACKERS] WAL format and API changes (9.5)

2014-08-19 Thread Alvaro Herrera
Heikki Linnakangas wrote:

 Barring objections or better ideas, I'm leaning towards
 XLogReadBufferForRedo.

WFM

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WAL format and API changes (9.5)

2014-08-19 Thread Andres Freund
On 2014-08-19 10:33:29 -0400, Alvaro Herrera wrote:
 Heikki Linnakangas wrote:
 
  Barring objections or better ideas, I'm leaning towards
  XLogReadBufferForRedo.
 
 WFM

For me too. Although we could imo strip the 'XLog' at the beginning if
we want to make it shorter. The 'ForRedo' part says that pretty much.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Stephen Frost
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
   server_cert_valid: Did the server present a valid certificate?
 yes or no
 
   server_cert_matches_host: Does the Common Name of the certificate
 match the host connected to? yes or no

Aren't these questions addressed by sslmode?

 Exposing the SSL information as generic key/value pairs allows
 adding more attributes in the future, without breaking the ABI, and
 it also allows exposing implementation-specific information in a
 generic way. The attributes listed above cover the needs of psql.
 What else do we need?

At first blush, I'd say a whole bunch..  Off the top of my head I can
think of:

For all certificates:
(client, server, cert that signed each, any intermediate CAs, root CAs)
  Certificate itself (perhaps in DER, PEM, X509 formats..)
  Fingerprint
  Signed-By info
  Common Name
  Organization (et al)
  Alternate names
  Issue date, expiration date
  CRL info, OCSP info
  Allowed usage (encryption, signing, etc)

CRL checking done?
OCSP used?

 I think it would also be nice to get more information from the
 server's certificate, like the hostname and the organization its
 issued to, and expiration date, so that an interactive client like
 pgAdmin or even psql could display that information like a web
 browser does. Would it be best to add those as extra attributes in
 the above list, perhaps with a server_cert_* prefix, or add a new
 function for extracting server cert's attributes?

This really shouldn't be for *just* the server's certificate but rather
available for all certificates involved- on both sides.

 The other question is: What do we do with PQgetssl()? We should
 document it as deprecated, but we'll have to keep it around for the
 foreseeable future for backwards-compatibility. We obviously cannot
 return a valid OpenSSL struct when using any other implementation,
 so I think it'll have to just return NULL when not using OpenSSL.
 Probably the most common use of PQgetssl() is to just check if it
 returns NULL or not, to determine if SSL is enabled, so a client
 that does that would incorrectly think that SSL is not used, even
 when it is. I think we can live with that.

That's not ideal, but the only other option I can think of offhand is to
break the existing API and force everyone to update and that seems
worse.

Have you looked at how this change will play out with the ODBC driver..?
Especially on Windows with the SSL library you're proposing we use
there..  I recall that at one point the ODBC driver simply used libpq to
handle the authentication and set everything up, and then switched to
talking directly without libpq.  In any case, it'd probably be good to
make sure the attributes you're suggesting are sufficient to meet the
needs of the ODBC driver too.

Thanks,

Stephen
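
For reference, the deprecated pattern at issue - using PQgetssl() merely
as an SSL-or-not test - looks roughly like this sketch:

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn *conn = PQconnectdb("host=localhost dbname=postgres");

        if (PQstatus(conn) == CONNECTION_OK)
        {
            /* Common but deprecated test: NULL is read as "no SSL".
             * Under a non-OpenSSL implementation this would return NULL
             * even on an encrypted connection, as discussed above. */
            if (PQgetssl(conn) != NULL)
                printf("SSL in use (via OpenSSL)\n");
            else
                printf("no SSL (or a non-OpenSSL implementation)\n");
        }
        PQfinish(conn);
        return 0;
    }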




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Andres Freund
On 2014-08-19 10:48:41 -0400, Stephen Frost wrote:
  Exposing the SSL information as generic key/value pairs allows
  adding more attributes in the future, without breaking the ABI, and
  it also allows exposing implementation-specific information in a
  generic way. The attributes listed above cover the needs of psql.
  What else do we need?
 
 At first blush, I'd say a whole bunch..  Off the top of my head I can
 think of:
 
 For all certificates:
 (client, server, cert that signed each, any intermediate CAs, root CAs)
   Certificate itself (perhaps in DER, PEM, X509 formats..)
   Fingerprint
   Signed-By info
   Common Name
   Organization (et al)
   Alternate names
   Issue date, expiration date
   CRL info, OCSP info
   Allowed usage (encryption, signing, etc)
 
 CRL checking done?
 OCSP used?

I'm not really sure we need all that. We're not building a general ssl
library abstraction here. Presenting all those in a common and useful
format isn't trivial.

What I'm wondering is whether we should differentiate 'standard'
attributes that we require from ones that a library can supply
optionally. If we don't, we'll have difficulty enlarging the 'standard'
set over time.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Magnus Hagander
On Tue, Aug 19, 2014 at 4:48 PM, Stephen Frost sfr...@snowman.net wrote:
 * Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
   server_cert_valid: Did the server present a valid certificate?
 yes or no

   server_cert_matches_host: Does the Common Name of the certificate
 match the host connected to? yes or no

 Aren't these questions addressed by sslmode?

Not entirely. You can have sslmode=require and have a matching
certificate. You don't *have* to have sslmode=verify-full for that.

However, whether it makes *sense* without sslmode is another story -
but assuming you use something like kerberos for auth, it might. For
password, you've already lost once you get that far.


 Exposing the SSL information as generic key/value pairs allows
 adding more attributes in the future, without breaking the ABI, and
 it also allows exposing implementation-specific information in a
 generic way. The attributes listed above cover the needs of psql.
 What else do we need?

 At first blush, I'd say a whole bunch..  Off the top of my head I can
 think of:

 For all certificates:
 (client, server, cert that signed each, any intermediate CAs, root CAs)
   Certificate itself (perhaps in DER, PEM, X509 formats..)

Yeah, if we can extract it in PEM for example, that would be useful.

   Fingerprint

Definitely.

   Signed-By info

If we can get the full cert, do that one instead.

   Common Name

Definitely.

   Organization (et al)
   Alternate names
   Issue date, expiration date
   CRL info, OCSP info
   Allowed usage (encryption, signing, etc)

All those would also be covered by the "certificate itself" part I
think - they're not that common.


 CRL checking done?
 OCSP used?

 I think it would also be nice to get more information from the
 server's certificate, like the hostname and the organization its
 issued to, and expiration date, so that an interactive client like
 pgAdmin or even psql could display that information like a web
 browser does. Would it be best to add those as extra attributes in
 the above list, perhaps with a server_cert_* prefix, or add a new
 function for extracting server cert's attributes?

 This really shouldn't be for *just* the server's certificate but rather
 available for all certificates involved- on both sides.

Well, if you are already the client, wouldn't you know your own certificate?


 The other question is: What do we do with PQgetssl()? We should
 document it as deprecated, but we'll have to keep it around for the
 foreseeable future for backwards-compatibility. We obviously cannot
 return a valid OpenSSL struct when using any other implementation,
 so I think it'll have to just return NULL when not using OpenSSL.
 Probably the most common use of PQgetssl() is to just check if it
 returns NULL or not, to determine if SSL is enabled, so a client
 that does that would incorrectly think that SSL is not used, even
 when it is. I think we can live with that.

 That's not ideal, but the only other option I can think of offhand is to
 break the existing API and force everyone to update and that seems
 worse.

Agreed.

If we just return an arbitrary pointer, then any application that
*did* actually try to use it would crash.

It's not ideal, but erring on the side of not saying we're secure
when we are is acceptable - unlike the opposite.

Of course, we need to publish it very clearly in the release notes,
and I would suggest backpatching into the documentation in old
versions etc as well.


 Have you looked at how this change will play out with the ODBC driver..?
 Especially on Windows with the SSL library you're proposing we use
 there..  I recall that at one point the ODBC driver simply used libpq to
 handle the authentication and set everything up, and then switched to
 talking directly without libpq.  In any case, it'd probably be good to
 make sure the attributes you're suggesting are sufficient to meet the
 needs of the ODBC driver too.

+1.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2014-08-19 10:48:41 -0400, Stephen Frost wrote:
  At first blush, I'd say a whole bunch..  Off the top of my head I can
  think of:

[...]

 I'm not really sure we need all that. We're not building a general ssl
 library abstraction here.

Really?  I'm pretty sure that's exactly what we're doing.  What I was
wondering is which one we should be modeling off of.

One thought I had was to look at what Apache's mod_ssl provides, which
can be seen here: http://httpd.apache.org/docs/2.2/mod/mod_ssl.html

I know that I've used quite a few of those.

Telling users they simply can't have this information isn't acceptable.
I'm not a huge fan of just passing back all of the certificates and
making the user extract out the information themselves, but if it comes
down to it then that's at least better than removing any ability to get
at that information.

 What I'm wondering is whether we should differentiate 'standard'
 attributes that we require from ones that a library can supply
 optionally. If we don't we'll have difficulty enlarging the 'standard'
 set over time.

If we end up not being able to provide everything for all of the
libraries we support then perhaps we can document which are available
from all of them, but I'd hope the list of "only in X" is pretty small.

Thanks,

Stephen




Re: [Fwd: Re: [HACKERS] proposal: new long psql parameter --on-error-stop]

2014-08-19 Thread Alvaro Herrera
Abhijit Menon-Sen wrote:
 At 2014-06-29 20:35:04 +0900, maumau...@gmail.com wrote:
 
  Thanks, I marked it as ready for committer.  I hope Fujii san or
  another committer will commit this, refining English expression if
  necessary.
 
 Since it was just a matter of editing, I went through the patch and
 corrected various minor errors (typos, awkwardness, etc.). I agree
 that this is now ready for committer.

FWIW I think "determines" was correct.



-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Magnus Hagander
On Tue, Aug 19, 2014 at 5:05 PM, Stephen Frost sfr...@snowman.net wrote:
 * Andres Freund (and...@2ndquadrant.com) wrote:
 On 2014-08-19 10:48:41 -0400, Stephen Frost wrote:
  At first blush, I'd say a whole bunch..  Off the top of my head I can
  think of:

 [...]

 I'm not really sure we need all that. We're not building a general ssl
 library abstraction here.

 Really?  I'm pretty sure that's exactly what we're doing.  What I was
 wondering is which one we should be modeling off of.

 One thought I had was to look at what Apache's mod_ssl provides, which
 can be seen here: http://httpd.apache.org/docs/2.2/mod/mod_ssl.html

 I know that I've used quite a few of those.

 Telling users they simply can't have this information isn't acceptable.
 I'm not a huge fan of just passing back all of the certificates and
 making the user extract out the information themselves, but if it comes
 down to it then that's at least better than removing any ability to get
 at that information.

Yeah, making most of them easily accessible is a good thing. Otherwise,
we just move the burden of deparsing them to the client, which will then
have to know which SSL library libpq is built against, so every single
client that wants to do something useful with the cert would have to know
about multiple implementations.

I think starting from the apache list is a very good idea.

We should then expose the same set of data at least through the
sslinfo server module.


 What I'm wondering is whether we should differentiate 'standard'
 attributes that we require from ones that a library can supply
 optionally. If we don't we'll have difficulty enlarging the 'standard'
 set over time.

 If we end up not being able to provide everything for all of the
 libraries we support then perhaps we can document which are available
 from all of them, but I'd hope the list of only in X is pretty small.

+1. I bet the most common ones will be in all of them, because
frankly, it's functionality you just need to use SSL properly.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Stephen Frost
* Magnus Hagander (mag...@hagander.net) wrote:
 On Tue, Aug 19, 2014 at 4:48 PM, Stephen Frost sfr...@snowman.net wrote:
  Aren't these questions addressed by sslmode?
 
 Not entirely. You can have sslmode=require and have a matching
 certificate. You don't *have* to have sslmode=verify-full for that.
 
 However, whether it makes *sense* without sslmode is another story -
 but assuming you use something like kerberos for auth, it might. For
 password, you've already lost once you get that far.

Sure- I guess my point was really, if you're not verifying them by
sslmode=verify-full, do you really want to ask the question?  If you
*are* verifying them by verify-full, then you already know the answers.

  What else do we need?
 
  At first blush, I'd say a whole bunch..  Off the top of my head I can
  think of:
 
  For all certificates:
  (client, server, cert that signed each, any intermediate CAs, root CAs)
Certificate itself (perhaps in DER, PEM, X509 formats..)
 
 Yeah, if we can extract it in PEM for example, that would be useful.
 
Fingerprint
 
 Definitely.
 
Signed-By info
 
 If we can get the full cert, do that one instead.
 
Common Name
 
 Definitely.
 
Organization (et al)
Alternate names
Issue date, expiration date
CRL info, OCSP info
Allowed usage (encryption, signing, etc)
 
 All those would also be covered by the certificate itself part I
 think - they're not that common.

Not sure I agree with that, but what I don't really like is the
suggestion that we'll need to tell everyone who wants more detailed
information from the certificate to link in whatever their preferred SSL
library is and use that to decode the PEM cert to pull the info.  We'll
end up having applications linking in both OpenSSL and GNUTLS, for
example, which is pretty grotty, imv.

Serial is absolutely another one we need to include, as I look over
what mod_ssl supports.  Really, I'd look at that list as our minimum to
support.

  I think it would also be nice to get more information from the
  server's certificate, like the hostname and the organization its
  issued to, and expiration date, so that an interactive client like
  pgAdmin or even psql could display that information like a web
  browser does. Would it be best to add those as extra attributes in
  the above list, perhaps with a server_cert_* prefix, or add a new
  function for extracting server cert's attributes?
 
  This really shouldn't be for *just* the server's certificate but rather
  available for all certificates involved- on both sides.
 
 Well, if you are already the client, wouldn't you know your own certificate?

Uh, no?  Not without having a library of your own which can open the
certificate file (after it figures out which one we decided to use - oh
yeah, we should probably include that information too - and then we
have to make sure we can represent things like "on a smart card") and
then parse and extract the information you want from it.

  That's not ideal, but the only other option I can think of offhand is to
  break the existing API and force everyone to update and that seems
  worse.
 
 Agreed.
 
 If we just return an arbitrary pointer, then any application that
 *did* actually try to use it would crash.

That wasn't what I was thinking, but rather something like removing
PQgetssl and replacing it with PQgetopenssl or something - breaking the
API completely, forcing everyone to make changes to compile against the
new library, etc, etc.  Very ugly but also very obvious.

 It's not ideal, but errorring in the way of not saying we're secure
 when we are, is acceptable - unlike the opposite.

Yeah, I tend to agree, though I don't particularly like it.  The options
are just so much worse. :/

 Of course, we need to publish it very clearly in the release notes,
 and I would suggest backpatching into the documentation in old
 versions etc as well.

Sounds like a good idea to me.

Thanks,

Stephen




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Heikki Linnakangas

On 08/19/2014 05:48 PM, Stephen Frost wrote:

* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:

   server_cert_valid: Did the server present a valid certificate?
yes or no

   server_cert_matches_host: Does the Common Name of the certificate
match the host connected to? yes or no


Aren't these questions addressed by sslmode?


Sort of. In sslmode=verify-ca, libpq checks that the server cert was 
valid (the first attribute) and rejects the connection if not. In 
verify-full mode, it also checks that the hostname matches (the second 
attribute). But in sslmode=require, it's possible to connect to a server 
with an invalid server cert.  (To be precise, in sslmode=require mode 
libpq checks the server cert if a root CA cert was given, but if no root 
CA cert is configured it will allow connecting anyway.)
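
In connection-string form, the three levels just described look like this
(host and file names are illustrative):

    # encrypt; the server cert is checked only if a root CA cert is configured
    psql "host=db.example.com sslmode=require"
    # encrypt and require a server cert signed by a trusted root CA
    psql "host=db.example.com sslmode=verify-ca sslrootcert=root.crt"
    # additionally require the certificate to match the host name
    psql "host=db.example.com sslmode=verify-full sslrootcert=root.crt"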


I think it would be nice to be able to query those attributes 
explicitly, rather than just expect libpq to reject the connection if 
something's wrong. For example, I'm thinking that an interactive client 
might present an annoying pop-up window to the user if the server cert 
is not valid, asking if he wants to connect anyway, and perhaps remember 
the certificate and not ask again (TOFU).


We don't actually have such functionality today; you can query the 
OpenSSL structs for those things, but the checks that libpq performs are 
not exactly the same that OpenSSL does. We have our own function to 
check if a wildcard cert matches a hostname, for example, and libpq 
knows that "host" and "hostaddr" can be different. So this would 
actually be a new feature, probably best to be implemented as a separate 
patch. (I grabbed the idea for those attributes from Martijn's ancient 
gnutls patch.)



Exposing the SSL information as generic key/value pairs allows
adding more attributes in the future, without breaking the ABI, and
it also allows exposing implementation-specific information in a
generic way. The attributes listed above cover the needs of psql.
What else do we need?


At first blush, I'd say a whole bunch..  Off the top of my head I can
think of:

For all certificates:
(client, server, cert that signed each, any intermediate CAs, root CAs)
   Certificate itself (perhaps in DER, PEM, X509 formats..)
   Fingerprint
   Signed-By info
   Common Name
   Organization (et al)
   Alternate names
   Issue date, expiration date
   CRL info, OCSP info
   Allowed usage (encryption, signing, etc)


Hmm. That seems a bit too much. Perhaps provide just the certificate 
itself in DER/PEM format, and have the client parse it (using OpenSSL or 
something else) if it wants more details.



CRL checking done?


I guess, although you know implicitly that it was if the sslcrl option 
was given.



OCSP used?


We don't support OCSP.


I think it would also be nice to get more information from the
server's certificate, like the hostname and the organization its
issued to, and expiration date, so that an interactive client like
pgAdmin or even psql could display that information like a web
browser does. Would it be best to add those as extra attributes in
the above list, perhaps with a server_cert_* prefix, or add a new
function for extracting server cert's attributes?


This really shouldn't be for *just* the server's certificate but rather
available for all certificates involved- on both sides.


Ok, but why? All the other stuff is readily available in the 
configuration you use to connect. I guess it doesn't hurt to expose them 
through this interface as well, but I can't immediately think of an 
example that would use them.



Have you looked at how this change will play out with the ODBC driver..?
Especially on Windows with the SSL library you're proposing we use
there..  I recall that at one point the ODBC driver simply used libpq to
handle the authentication and set everything up, and then switched to
talking directly without libpq.  In any case, it'd probably be good to
make sure the attributes you're suggesting are sufficient to meet the
needs of the ODBC driver too.


Indeed, the ODBC driver only uses libpq for authentication, then calls 
PQgetssl(), and takes over the whole show calling SSL_read() and 
SSL_write() itself. Ideally, we'd modify psqlodbc to stop doing that, 
but that's not an easy job. In the short-term, I think we need to export 
pqsecure_read() and pqsecure_write() functions in libpq, so that the 
ODBC driver can use those instead of SSL_read() and SSL_write().


- Heikki





Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Andres Freund
On 2014-08-19 11:05:07 -0400, Stephen Frost wrote:
 * Andres Freund (and...@2ndquadrant.com) wrote:
  On 2014-08-19 10:48:41 -0400, Stephen Frost wrote:
   At first blush, I'd say a whole bunch..  Off the top of my head I can
   think of:
 
 [...]
 
  I'm not really sure we need all that. We're not building a general ssl
  library abstraction here.
 
 Really?  I'm pretty sure that's exactly what we're doing.

No. We should build something that's suitable for postgres, not
something general. We'll fail otherwise. For anything fancy the user has
to look at the certificate themselves. We should make it easy to get at
the whole certificate chain in a consistent manner.

 Telling users they simply can't have this information isn't
 acceptable.

Meh. Why? Most of that isn't something a normal libpq user is going to
need.

  What I'm wondering is whether we should differentiate 'standard'
  attributes that we require from ones that a library can supply
  optionally. If we don't we'll have difficulty enlarging the 'standard'
  set over time.
 
 If we end up not being able to provide everything for all of the
 libraries we support then perhaps we can document which are available
  from all of them, but I'd hope the list of "only in X" is pretty small.

I'm pretty sure that we can't build a reasonable list of the information
exposed by any library. Especially as we're likely going to need to
agree on some mapping to common names.

I'd just go for plain names for standard attributes and X-$library- for library
specific stuff.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [Postgres-xc-developers] Trove with PostgreSQL-XC

2014-08-19 Thread Craig Ringer
I replied to the XC-list (only) to ask them to discontinue cross-posting
this thread.

Replying here just so you know.

On 08/19/2014 07:46 PM, Vivek Singh Raghuwanshi wrote:
 Thanks,
 One more question: does this library support multitenancy, or do we need
 to launch a separate VPC (virtual public cloud) every time for each
 customer?  It's good if we have both options.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Hokey wrong versions of libpq in apt.postgresql.org

2014-08-19 Thread Craig Ringer
On 08/19/2014 05:02 AM, Joshua D. Drake wrote:
 
 I am finally able to get back around to this and I am still calling:
 Hokey. I just loaded up a fresh precise (I assume trusty will act the
 same way) and installed postgresql. I installed it, without the PDGD
 repository and everything worked perfectly. The only error I got when
 using pgxnclient to install pg_repack was an error about not having
 libedit-dev installed. I installed it, and it was perfect. I even tested
 with create extension etc...
 
 So... If we are supposed to ship the latest lib... how come Debian or
 Ubuntu don't do that? They ship the latest lib for the version they are
 shipping and because of that, everything works, as expected.
 
 I iterate, the current apt.postgresql.org is not doing things correctly.
 It breaks things and it shouldn't.

FWIW, this is inconsistent with what yum.postgresql.org does - it takes
the POLA approach of packaging the libpq from the major release configured
in the repo. Each major has its own sub-repo.

I find it pretty hard to justify installing a 9.3 libpq alongside a 9.1
server myself.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_dump refactor patch to remove global variables

2014-08-19 Thread Craig Ringer
On 08/19/2014 01:40 AM, Robert Haas wrote:
  Attached is a patch that doesn't add any new functionality or
  features, all it does is get rid of the global variables that
  pg_dump.c is full of.
 I think this is an excellent idea.

It's also one small step toward library-ifying pg_dump.

Huge +1.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Stephen Frost
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
 I think it would be nice to be able to query those attributes
 explicitly, rather than just expect libpq to reject the connection
 if something's wrong. For example, I'm thinking that an interactive
 client might present an annoying pop-up window to the user if the
 server cert is not valid, asking if he wants to connect anyway, and
 perhaps remember the certificate and not ask again (TOFU).

Alright, I could see that being useful, though as you say, it'd really
be new functionality.

 Hmm. That seems a bit too much. Perhaps provide just the certificate
 itself in DER/PEM format, and have the client parse it (using
 OpenSSL or something else) if it wants more details.

I really don't care for that approach.  Our SSL support has always been
horrible- I was hoping we'd actually improve that situation.  Adding
things in piecemeal over time will just be painful for our users and I
don't see why we should wait.

 OCSP used?
 
 We don't support OCSP.

Another thing that we really should address (actually- can't you enable
it in OpenSSL directly?  I seem to recall something along those lines
anyway, though it's been quite a few years now).

 This really shouldn't be for *just* the server's certificate but rather
 available for all certificates involved- on both sides.
 
 Ok, but why? All the other stuff is readily available in the
 configuration you use to connect. I guess it doesn't hurt to expose
 them through this interface as well, but I can't immediately think
 of an example that would use them.

For starters, certificates can be passed between the client and the
server to complete the chain, so I don't see how it's "readily
available" - not to mention that even if the location of the certs was
in simple local files, the application would need to bring in its own
library to parse and extract this information, which we've
more-or-less already got.

 Indeed, the ODBC driver only uses libpq for authentication, then
 calls PQgetssl(), and takes over the whole show calling SSL_read()
 and SSL_write() itself. Ideally, we'd modify psqlodbc to stop doing
 that, but that's not an easy job. In the short-term, I think we need
 to export pqsecure_read() and pqsecure_write() functions in libpq,
 so that the ODBC driver can use those instead of SSL_read() and
 SSL_write().

Yeah, that's what I remembered.  There was an attempt to make that
change at one point, but it was reverted due to the lack of batching
ability in libpq (without resorting to cursors, as I recall...),
requiring double the memory usage.  Still, if pqsecure_read and
pqsecure_write are sufficient to make the ODBC driver work, that's good
news.  I had been worried it did other things with the OpenSSL struct
beyond just using those.

Thanks,

Stephen




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 No. We should build something that's suitable for postgres, not
 something general. We'll fail otherwise. For anything fancy the user has
 to look at the certificate themselves. We should make it easy to get at
 the whole certificate chain in a consistent manner.

I don't buy this argument at all.

  Telling users they simply can't have this information isn't
  acceptable.
 
 Meh. Why? Most of that isn't something a normal libpq user is going to
 need.

I'm not interested in SSL support for users who don't use or care about
SSL (which would be 'normal libpq users', really).  I've *long* been
frustrated by our poor support of SSL and at how painful it is to get
proper SSL working- and it's been a real problem getting PG to pass the
security compliance requirements because of that poor support.  Let's
stop the rhetoric that PG doesn't need anything but the most basic
SSL/auditing/security capabilities.

  If we end up not being able to provide everything for all of the
  libraries we support then perhaps we can document which are available
  from all of them, but I'd hope the list of "only in X" is pretty small.
 
 I'm pretty sure that we can't build a reasonable list of the information
 exposed by any library. Especially as we're likely going to need some
 mapping to agree to map to the common names.

Per Apache's documentation, mod_ssl and mod_gnutls support the same set
of environment variables (with the same names even), so I don't buy this
argument either.

Thanks,

Stephen




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Andres Freund
On 2014-08-19 11:52:37 -0400, Stephen Frost wrote:
 * Andres Freund (and...@2ndquadrant.com) wrote:
  No. We should build something that's suitable for postgres, not
  something general. We'll fail otherwise. For anything fancy the user has
  to look at the certificate themselves. We should make it easy to get at
  the whole certificate chain in a consistent manner.
 
 I don't buy this argument at all.

Aha.

   Telling users they simply can't have this information isn't
   acceptable.
  
  Meh. Why? Most of that isn't something a normal libpq user is going to
  need.
 
 I'm not interested in SSL support for users who don't use or care about
 SSL (which would be 'normal libpq users', really).

That's the majority of our users. Even those that care about ssl and
care about setting it up in a safe manner won't care about most of the
attributes.

I have no problem with expanding the list of attributes once we have a
couple of differing backends for the support, but having a long list of
things that need to be supported by every one just makes getting there
harder.

 I've *long* been
 frustrated by our poor support of SSL and at how painful it is to get
 proper SSL working- and it's been a real problem getting PG to pass the
 security compliance requirements because of that poor support.  Let's
 stop the rhetoric that PG doesn't need anything but the most basic
 SSL/auditing/security capabilities.

I've no problem with keeping future extensions of the API in mind while
this is being designed. We just shouldn't start too big. This is about
getting a proper abstraction in place, not making pg pass security
compliance stuff. Don't mix those too much.

   If we end up not being able to provide everything for all of the
   libraries we support then perhaps we can document which are available
   from all of them, but I'd hope the list of "only in X" is pretty small.
  
  I'm pretty sure that we can't build a reasonable list of the information
  exposed by any library. Especially as we're likely going to need some
  mapping to agree to map to the common names.
 
 Per Apache's documentation, mod_ssl and mod_gnutls support the same set
 of environment variables (with the same names even), so I don't buy this
 argument either.

Gnutls is quite similar to openssl in what it provides. That's not
saying much. Schannel would be more interesting from that point of view.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Alvaro Herrera
Stephen Frost wrote:
 * Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
 
  Indeed, the ODBC driver only uses libpq for authentication, then
  calls PQgetssl(), and takes over the whole show calling SSL_read()
  and SSL_write() itself. Ideally, we'd modify psqlodbc to stop doing
  that, but that's not an easy job. In the short-term, I think we need
  to export pqsecure_read() and pqsecure_write() functions in libpq,
  so that the ODBC driver can use those instead of SSL_read() and
  SSL_write().
 
 Yeah, that's what I remembered.  There was an attempt to make that
 change at one point, but it was reverted due to the lack of batching
 ability in libpq (without resorting to cursors, as I recall...),
 requiring double the memory usage.  Still, if pqsecure_read and
 pqsecure_write are sufficient to make the ODBC driver work, that's good
 news.  I had been worried it did other things with the OpenSSL struct
 beyond just using those.

Um, libpq has recently gained the ability to return result fragments,
right?  Those didn't exist when libpq-ification of odbc was attempted,
as I recall -- perhaps it's possible now.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
  Per Apache's documentation, mod_ssl and mod_gnutls support the same set
  of environment variables (with the same names even), so I don't buy this
  argument either.
 
 Gnutls is quite similar from what it provides to openssl. That's not
 saying much. Schannel would be more interesting from that point of view.

Fine- but let's at least start with what two of the three support and
figure out if there's actually an issue getting this information from
Schannel.  I'd be surprised if there really is, but I'm a lot happier
starting with a larger set and then considering if we can live without
certain things than trying to build up one-by-one over major releases.

Thanks,

Stephen




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 Stephen Frost wrote:
  Yeah, that's what I remembered.  There was an attempt to make that
  change at one point, but it was reverted due to the lack of batching
  ability in libpq (without resorting to cursors, as I recall...),
  requiring double the memory usage.  Still, if pqsecure_read and
  pqsecure_write are sufficient to make the ODBC driver work, that's good
  news.  I had been worried it did other things with the OpenSSL struct
  beyond just using those.
 
 Um, libpq has recently gained the ability to return result fragments,
 right?  Those didn't exist when libpq-ification of odbc was attempted,
 as I recall -- perhaps it's possible now.

I was trying to remember off-hand if we still had that or not..  I
thought there was discussion about removing it, actually, but perhaps
that was something else.

I agree that having that would definitely help with the ODBC driver.

Thanks,

Stephen




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Heikki Linnakangas

On 08/19/2014 06:44 PM, Stephen Frost wrote:

Hmm. That seems a bit too much. Perhaps provide just the certificate
itself in DER/PEM format, and have the client parse it (using
OpenSSL or something else) if it wants more details.

I really don't care for that approach.  Our SSL support has always been
horrible- I was hoping we'd actually improve that situation.  Adding
things in piecemeal over time will just be painful for our users and I
don't see why we should wait.


What would you like to do with the certificates?

I'm imagining that a GUI tool like pgAdmin might want to extract all 
information from the certificate, display it in a window, and let the 
user look at the whole chain and all the fields. Like a browser does 
when you click the little lock icon in the address bar. That would be a 
nice feature, but it's a huge effort to expose *all* certificate 
information through attributes, especially if you want to support 
multiple SSL libraries. If there were a generic "get attribute X" 
interface in OpenSSL and all the other SSL libraries we wish to support, 
we could provide a pass-through mechanism for that, so that e.g. all 
attributes that OpenSSL exposes were mapped to server_cert_*. But I 
don't think that exists in OpenSSL, let alone in other libraries, and 
the attribute names would be all different anyway.


So that's not really feasible.

But if we provide an interface to grab the whole certificate chain, then 
you can use any library you want to parse and present it to the user. 
You could use OpenSSL, but you could also use a more light-weight parser 
like libtasn1, or if you're writing a python app for example, whatever 
x509 certificate handling library they have. You wouldn't be *verifying* 
the certificates - that's handled by libpq (or rather, the SSL library 
that libpq uses) - so no cryptography required.


Or you could just pass the whole cert to a 3rd party program 
specifically written to display x509 certificates, and let it do the 
parsing. I'll mention that the Windows Crypto API has a built-in 
function called CryptUIDlgViewCertificate that pops up a dialog for 
viewing the certificate. Very handy. I think it's the same dialog that 
Internet Explorer uses.


If you want to write such a GUI from scratch, anyway, I think you would 
be better off to *not* rely on libpq functions, so that you could use 
the same GUI in other contexts too. Like to view an arbitrary 
certificate file on the filesystem.


That said, if there's a need to extract some specific fields for some 
other purpose than displaying the whole certificate to the user, let's 
hear it.


- Heikki




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Heikki Linnakangas

On 08/19/2014 07:10 PM, Alvaro Herrera wrote:

Stephen Frost wrote:

* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:


Indeed, the ODBC driver only uses libpq for authentication, then
calls PQgetssl(), and takes over the whole show calling SSL_read()
and SSL_write() itself. Ideally, we'd modify psqlodbc to stop doing
that, but that's not an easy job. In the short-term, I think we need
to export pqsecure_read() and pqsecure_write() functions in libpq,
so that the ODBC driver can use those instead of SSL_read() and
SSL_write().


Yeah, that's what I remembered.  There was an attempt to make that
change at one point, but it was reverted due to the lack of batching
ability in libpq (without resorting to cursors, as I recall...),
requiring double the memory usage.  Still, if pqsecure_read and
pqsecure_write are sufficient to make the ODBC driver work, that's good
news.  I had been worried it did other things with the OpenSSL struct
beyond just using those.


Um, libpq has recently gained the ability to return result fragments,
right?  Those didn't exist when libpq-ification of odbc was attempted,
as I recall -- perhaps it's possible now.


IIRC the thing that psqlodbc does that libpq doesn't support is sending 
multiple queries to the backend, and then waiting for *all* the replies to 
arrive, in a single round-trip. The closest thing is using 
PQexec("foo; bar;"), but that's quite limited.


- Heikki




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 Um, libpq has recently gained the ability to return result fragments,
 right?  Those didn't exist when libpq-ification of odbc was attempted,
 as I recall -- perhaps it's possible now.

 I was trying to remember off-hand if we still had that or not..  I
 thought there was discussion about removing it, actually, but perhaps
 that was something else.

Sure,
http://www.postgresql.org/docs/devel/static/libpq-single-row-mode.html
That's a done deal, it won't be going away.

Whether it would solve ODBC's problem I don't know (and I'm not
volunteering to do the work ;-))

regards, tom lane
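
For reference, a minimal sketch of using single-row mode (connection
parameters illustrative, error handling omitted):

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn   *conn = PQconnectdb("dbname=postgres");
        PGresult *res;

        /* Queue the query, then ask libpq to hand back one row per
         * PGresult, so the full result set is never buffered at once. */
        PQsendQuery(conn, "SELECT relname FROM pg_class");
        PQsetSingleRowMode(conn);

        while ((res = PQgetResult(conn)) != NULL)
        {
            if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
                printf("%s\n", PQgetvalue(res, 0, 0));
            /* The final PGRES_TUPLES_OK result carries zero rows. */
            PQclear(res);
        }
        PQfinish(conn);
        return 0;
    }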




Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Heikki Linnakangas

On 08/19/2014 06:52 PM, Stephen Frost wrote:

* Andres Freund (and...@2ndquadrant.com) wrote:

No. We should build something that's suitable for postgres, not
something general. We'll fail otherwise. For anything fancy the user has
to look at the certificate themselves. We should make it easy to get at
the whole certificate chain in a consistent manner.


I don't buy this argument at all.


Telling users they simply can't have this information isn't
acceptable.


Meh. Why? Most of that isn't something a normal libpq user is going to
need.


I'm not interested in SSL support for users who don't use or care about
SSL (which would be 'normal libpq users', really).  I've *long* been
frustrated by our poor support of SSL and at how painful it is to get
proper SSL working- and it's been a real problem getting PG to pass the
security compliance requirements because of that poor support.  Let's
stop the rhetoric that PG doesn't need anything but the most basic
SSL/auditing/security capabilities.


I think you just packed up the goalposts for a one-way trip to Mars, but 
I wonder: what would you consider "proper" SSL support? What exactly are 
we missing?


- Heikki





Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Heikki Linnakangas

On 08/19/2014 06:00 PM, Magnus Hagander wrote:

On Tue, Aug 19, 2014 at 4:48 PM, Stephen Frost sfr...@snowman.net wrote:

* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:

   server_cert_valid: Did the server present a valid certificate?
yes or no

   server_cert_matches_host: Does the Common Name of the certificate
match the host connected to? yes or no


Aren't these questions addressed by sslmode?


Not entirely. You can have sslmode=require and have a matching
certificate. You don't *have* to have sslmode=verify-full for that.

However, whether it makes *sense* without sslmode is another story -
but assuming you use something like kerberos for auth, it might. For
password, you've already lost once you get that far.


Hmm, right, because the client application doesn't get control between 
libpq doing the SSL negotiation and sending the password to the server. 
So if, after connecting, you decided that you don't actually trust the 
server, you've already sent the password. Not good.


You might think that you could try connecting without a password first, 
and try again with the password, but that's not safe either, because 
there's no guarantee that the second connection reaches the same server 
as the first one.


I think we need a callback or new asynchronous polling state after SSL 
negotiation but before libpq sends the password to the server. But 
that's a separate feature and patch.


- Heikki





Re: [HACKERS] replication commands and log_statements

2014-08-19 Thread Robert Haas
On Sat, Aug 16, 2014 at 10:27 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 I think ideally it would have been better if we could have logged
 replication commands under a separate log level, but as there is still
 no consensus on extending log_statement and nobody is even willing to
 pursue it, it seems okay to go ahead and log these under the 'all' level.

I think the consensus is clearly for a separate GUC.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins

2014-08-19 Thread Robert Haas
On Sat, Aug 16, 2014 at 9:31 AM, Tomas Vondra t...@fuzzy.cz wrote:
 On 12.8.2014 00:30, Tomas Vondra wrote:
 On 11.8.2014 20:25, Robert Haas wrote:
 It also strikes me that when there's only 1 batch, the set of bits
 that map onto the batch number is zero-width, and one zero-width bit
 range is as good as another.  In other words, if you're only planning
 to do one batch, you can easily grow the number of buckets on the fly.
 Growing the number of buckets only becomes difficult once you have
 more than one batch.

 ...

  I was considering reversing the bits of the hash, because that's
  pretty much the simplest solution. But I think you're right it might
  actually work like this:

    * Are more batches needed?

    (yes) => just use nbuckets = my_log2(work_mem / tuple_size)

    (no) => go ahead, until processing all tuples or hitting work_mem

    (work_mem) => meh, use the same nbuckets as above

    (all tuples) => compute optimal nbuckets / resize


 But I need to think about this a bit. So far it seems to me there's no
 way additional batches might benefit from increasing nbuckets further.

 I think this is a simple and solid solution, solving the batchno
 computation issues quite nicely. Attached is v10 patch (bare and
 combined with the dense allocation), that does this:

 1) when we know we'll need batching, buckets are sized for full work_mem
(using the estimated tuple width, etc.)

 2) without the batching, we estimate the 'right number of buckets' for
the estimated number of tuples, and keep track of the optimal number
as tuples are added to the hash table

- if we discover we need to start batching, we keep the current
  optimal value (which should be the same as the max number of
  buckets) and don't mess with it anymore (making it possible to
  compute batch IDs just like before)

 - also, on the first rebatch (nbatch=1 => nbatch=2) the hash table
  is resized as part of the rebatch

- if the hash build completes without batching, we do the resize

 I believe the patch is pretty much perfect. I plan to do more thorough
 testing on a wide range of queries in the next few days.

 I also removed the 'enable_hash_resize' GUC, because it would be more
 complex to implement this properly after doing the resize as part of
 rebatch etc.. So either it would make the patch more complex, or it
 wouldn't do what the name promises.

A variety of trivial comments on this:

PostgreSQL style is un-cuddled curly braces.  Also, multi-line
comments need to start with a line containing only /* and end with a
line containing only */.  In one place you've added curly braces
around a single-line block that is otherwise unmodified; please don't
do that.  In one place, you have "becase" instead of "because".  In
another place, you write "add if after it" but it should say "add it
after it" or maybe better "add the new one after it".  Avoid using
punctuation like "=>" in comments to illustrate the connection between
sentences; instead, use a connecting word like "then" or "therefore"
or whatever is appropriate; in this instance, a period followed by the
start of a new sentence seems sufficient.  Revert the removal of a
single line of whitespace near the top of nodeHash.c.

There are too many things marked XXX in this patch.  They should
either be fixed, if they are real problems, or they should be
commented in a way that doesn't give rise to the idea that they're
problems if they aren't.

OK, now on to some more substantive stuff:

1. It's not clear to me what the overall effect of this patch on
memory utilization is.  Reducing NTUP_PER_BUCKET from 10 to 1 is going
to use, on the average, 10x as much bucket-header memory per tuple.
Specifically, I think it means we'll use about 8 bytes of
bucket-header memory per tuple instead of 0.8 bytes per tuple.  If the
tuples are narrow, that could be significant; concerns have been
expressed about that here in the past.  Increasing the number of
buckets could also increase memory usage.  On the other hand, the
dense allocation stuff probably saves a ton of memory, so maybe we end
up ahead overall, but I'm not sure.  Your thoughts, and maybe some test
results with narrow and wide tuples, would be appreciated.
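
For concreteness, the arithmetic behind those figures, assuming 8-byte
pointers (illustrative only, not code from the patch):

    /* one pointer-sized bucket header per bucket, and roughly
     * ntuples / NTUP_PER_BUCKET buckets in total */
    double
    bucket_header_bytes_per_tuple(int ntup_per_bucket)
    {
        return (double) sizeof(void *) / ntup_per_bucket;
    }
    /* NTUP_PER_BUCKET = 10  =>  8 / 10 = 0.8 bytes per tuple
     * NTUP_PER_BUCKET = 1   =>  8 / 1  = 8.0 bytes per tuple */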

2. But, on the positive side, modulo the memory utilization questions
mentioned above, I would expect the impact on hash join performance to
be positive.  Going from 10 tuples per bucket to just 1 should help,
and on cases where the actual load factor would have ended up much
higher because of poor estimation, increasing the number of buckets on
the fly should help even more.  I haven't tested this, though.

I haven't had a chance to completely go through this yet, so these are
just some initial thoughts.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-08-19 Thread Rahila Syed
So, it seems like you're basically using malloc to work around the
fact that a palloc failure is an error, and we can't throw an error in
a critical section.  I don't think that's good; we want all of our
allocations, as far as possible, to be tracked via palloc.  It might
be a good idea to add a new variant of palloc or MemoryContextAlloc
that returns NULL on failure instead of throwing an error; I've wanted
that once or twice.  But in this particular case, I'm not quite seeing
why it should be necessary

I am using malloc so that it returns NULL in case of failure, and we
proceed without compression of the FPW if it returns NULL.
Proceeding without compression seems more appropriate than throwing an
error and exiting because of a failure to allocate memory for compression.

the number of backup blocks per record is
limited to some pretty small number, so it ought to be possible to
preallocate enough memory to compress them all, perhaps just by
declaring a global variable like char wal_compression_space[8192]; or
whatever.

In the updated patch a static global variable is added, to which memory is
allocated from the heap using malloc outside the critical section.  The size
of the memory block is 4 * sizeof(BkpBlock) + 4 * BLCKSZ.
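
A minimal sketch of that arrangement (names are illustrative, not the
patch's identifiers; BkpBlock and BLCKSZ as in the backend headers):

    #include <stdlib.h>

    static char *compression_scratch = NULL;

    /* called once, outside any critical section */
    void
    XLogCompressionInit(void)
    {
        if (compression_scratch == NULL)
            compression_scratch = malloc(4 * sizeof(BkpBlock) + 4 * BLCKSZ);
        /* if malloc returns NULL, the pointer stays NULL and the
         * full-page writes are simply written out uncompressed */
    }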


Thank you,



On Mon, Aug 18, 2014 at 10:40 PM, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Jul 3, 2014 at 3:58 PM, Rahila Syed rahilasye...@gmail.com
 wrote:
  Updated version of patches are attached.
  Changes are as follows
  1. Improved readability of the code as per the review comments.
  2. Addition of block_compression field in BkpBlock structure to store
  information about compression of block. This provides for switching
  compression on/off and changing compression algorithm as required.
  3.Handling of OOM in critical section by checking for return value of
 malloc
  and proceeding without compression of FPW if return value is NULL.

 So, it seems like you're basically using malloc to work around the
 fact that a palloc failure is an error, and we can't throw an error in
 a critical section.  I don't think that's good; we want all of our
 allocations, as far as possible, to be tracked via palloc.  It might
 be a good idea to add a new variant of palloc or MemoryContextAlloc
 that returns NULL on failure instead of throwing an error; I've wanted
 that once or twice.  But in this particular case, I'm not quite seeing
 why it should be necessary - the number of backup blocks per record is
 limited to some pretty small number, so it ought to be possible to
 preallocate enough memory to compress them all, perhaps just by
 declaring a global variable like char wal_compression_space[8192]; or
 whatever.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins

2014-08-19 Thread Tomas Vondra
On 19.8.2014 19:05, Robert Haas wrote:
 On Sat, Aug 16, 2014 at 9:31 AM, Tomas Vondra t...@fuzzy.cz wrote:
 On 12.8.2014 00:30, Tomas Vondra wrote:
 On 11.8.2014 20:25, Robert Haas wrote:
 It also strikes me that when there's only 1 batch, the set of bits
 that map onto the batch number is zero-width, and one zero-width bit
 range is as good as another.  In other words, if you're only planning
 to do one batch, you can easily grow the number of buckets on the fly.
 Growing the number of buckets only becomes difficult once you have
 more than one batch.

 ...

 I was considering using reversing the bits of the hash, because that's
 pretty much the simplest solution. But I think you're right it might
 actually work like this:

   * Are more batches needed?

   (yes) => just use nbuckets = my_log2(work_mem / tuple_size)

   (no) => go ahead, until processing all tuples or hitting work_mem

   (work_mem) => meh, use the same nbuckets above

   (all tuples) => compute optimal nbuckets / resize


 But I need to think about this a bit. So far it seems to me there's no
 way additional batches might benefit from increasing nbuckets further.

 I think this is a simple and solid solution, solving the batchno
 computation issues quite nicely. Attached is v10 patch (bare and
 combined with the dense allocation), that does this:

 1) when we know we'll need batching, buckets are sized for full work_mem
(using the estimated tuple width, etc.)

 2) without the batching, we estimate the 'right number of buckets' for
the estimated number of tuples, and keep track of the optimal number
as tuples are added to the hash table

- if we discover we need to start batching, we keep the current
  optimal value (which should be the same as the max number of
  buckets) and don't mess with it anymore (making it possible to
  compute batch IDs just like before)

 - also, on the first rebatch (nbatch=1 => nbatch=2) the hash table
  is resized as part of the rebatch

- if the hash build completes without batching, we do the resize

 I believe the patch is pretty much perfect. I plan to do more thorough
 testing on a wide range of queries in the next few days.

 I also removed the 'enable_hash_resize' GUC, because it would be more
 complex to implement this properly after doing the resize as part of
 rebatch etc. So either it would make the patch more complex, or it
 wouldn't do what the name promises.
 
 A variety of trivial comments on this:
 
 PostgreSQL style is un-cuddled curly braces.  Also, multi-line
 comments need to start with a line containing only /* and end with a
 line containing only */.  In one place you've added curly braces
 around a single-line block that is otherwise unmodified; please don't
 do that.  In one place, you have "becase" instead of "because".  In
 another place, you write "add if after it" but it should say "add it
 after it" or maybe better "add the new one after it".  Avoid using
 punctuation like "=>" in comments to illustrate the connection between
 sentences; instead, use a connecting word like "then" or "therefore"
 or whatever is appropriate; in this instance, a period followed by the
 start of a new sentence seems sufficient.  Revert the removal of a
 single line of whitespace near the top of nodeHash.c.
 
 There are too many things marked XXX in this patch.  They should
 either be fixed, if they are real problems, or they should be
 commented in a way that doesn't give rise to the idea that they're
 problems if they aren't.

OK, thanks for pointing this out. Attached is v11 of the patch (both
separate and combined with the dense allocation, as before).

I fixed as many of those issues as possible. All the XXX items were
obsolete, except for one in the chunk_alloc function.

I have also removed one constant

 
 OK, now on to some more substantive stuff:
 
 1. It's not clear to me what the overall effect of this patch on
 memory utilization is.  Reducing NTUP_PER_BUCKET from 10 to 1 is going
 to use, on the average, 10x as much bucket-header memory per tuple.
 Specifically, I think it means we'll use about 8 bytes of
 bucket-header memory per tuple instead of 0.8 bytes per tuple.  If the
 tuples are narrow, that could be significant; concerns have been
 expressed about that here in the past.  Increasing the number of
 buckets could also increase memory usage.  On the other hand, the
 dense allocation stuff probably saves a ton of memory, so maybe we end
 up ahead overall, but I'm not sure.  Your thoughts, and maybe some test
 results with narrow and wide tuples, would be appreciated.

The effect of the dense allocation was briefly discussed in this thread,
along with some quick measurements:

http://www.postgresql.org/message-id/53beea9e.2080...@fuzzy.cz

The dense allocation removes pretty much all the palloc overhead. For a
40B tuple, I did get this before the dense allocation

   HashBatchContext: 1451221040 total in 182 blocks; 2826592 free (11
chunks); 

Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Stephen Frost sfr...@snowman.net writes:
  * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
  Um, libpq has recently gained the ability to return result fragments,
  right?  Those didn't exist when libpq-ification of odbc was attempted,
  as I recall -- perhaps it's possible now.
 
  I was trying to remember off-hand if we still had that or not..  I
  thought there was discussion about removing it, actually, but perhaps
  that was something else.
 
 Sure,
 http://www.postgresql.org/docs/devel/static/libpq-single-row-mode.html
 That's a done deal, it won't be going away.

Ugh.  Yes, there's single-row mode, but I had been thinking there was a
'batch' mode available a la what OCI8 had, where you'd allocate a chunk
of memory and then have it filled directly by the library as rows came
back in until it was full (there was a similar 'bulk send' operation, as
I recall).  Perhaps it was the 'pipelining' thread that I was thinking
about.  Not really relevant, in any case.

 Whether it would solve ODBC's problem I don't know (and I'm not
 volunteering to do the work ;-))

It could work..  though it's certainly been a while since I looked at
the ODBC internals.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Robert Haas
On Mon, Aug 18, 2014 at 7:54 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 In order to support alternatives to OpenSSL, we need to wean off
 applications from using PQgetssl(). To do that, we have to provide an
 alternative API to get the same information. PQgetSSL() returns a pointer
 directly to the OpenSSL private struct, and you can do anything with that.
 We cannot have a generic interface that exposes everything, so we need to
 identify the information that people actually want, and expose that.

I have a hard time believing that something like this will really
satisfy anyone.  Why not just add PQgetSchannelHandleOrWhatever() and
call it good?  We can try to be incredibly thorough in exposing the
information people want and we will still inevitably miss something
that someone cares about; worse, we'll spend an awful lot of time and
energy along the way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Magnus Hagander
On Tue, Aug 19, 2014 at 8:49 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Aug 18, 2014 at 7:54 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 In order to support alternatives to OpenSSL, we need to wean off
 applications from using PQgetssl(). To do that, we have to provide an
 alternative API to get the same information. PQgetSSL() returns a pointer
 directly to the OpenSSL private struct, and you can do anything with that.
 We cannot have a generic interface that exposes everything, so we need to
 identify the information that people actually want, and expose that.

 I have a hard time believing that something like this will really
 satisfy anyone.  Why not just add PQgetSchannelHandleOrWhatever() and
 call it good?  We can try to be incredibly thorough in exposing the
 information people want and we will still inevitably miss something
 that someone cares about; worse, we'll spend an awful lot of time and
 energy along the way.

Well, for one you push the full burden onto the application. Then
every application has to support every SSL library we do, even for the
simplest check. And it has to be built against the same one. (So for
example if someone wants to use openssl on windows - yes there might
still be reasons for that even if we support schannel - they have to
rebuild every one of their applications. And every one of their higher
level language drivers sitting on top of openssl).

The same problem of course appears on say Linux, if you end up using a
mix of openssl and gnutls or a mix of nss and openssl for example.
It's not likely to happen as long as you only use the officially built
packages, but you're likely in for quite a bit of pain if you are
using any non-standard packaging like the oneclick installers etc.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Tue, Aug 19, 2014 at 8:49 PM, Robert Haas robertmh...@gmail.com wrote:
 I have a hard time believing that something like this will really
 satisfy anyone.  Why not just add PQgetSchannelHandleOrWhatever() and
 call it good?  We can try to be incredibly thorough in exposing the
 information people want and we will still inevitably miss something
 that someone cares about; worse, we'll spend an awful lot of time and
 energy along the way.

 Well, for one you push the full burden onto the application.

Robert's got a point though: there is always going to be somebody who
wants something we fail to expose.  It's better to be able to say well,
you can do PQgetssl and then munge it for yourself than to have to say
sorry, you're screwed.  So if we're going to define PQgetssl as
returning NULL when you're not using OpenSSL, I don't see why we
shouldn't expose a similarly-defined PQgetXXX for each other underlying
implementation we support.  There will not be that many of 'em, and
I suspect the people with very specific needs will not care about more
than one underlying library anyway.

This does not say that we shouldn't also try to have some
library-independent functionality for interrogating certificate state
etc.  Just that having an escape hatch isn't a bad thing.
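
For instance, a sketch of that set of escape hatches (PQgetssl() exists;
the other names are purely illustrative):

    void *PQgetssl(PGconn *conn);            /* OpenSSL: really an SSL * */
    void *PQgetGnutlsSession(PGconn *conn);  /* GnuTLS: a gnutls_session_t */
    void *PQgetSchannelHandle(PGconn *conn); /* Windows Schannel context */

Each returns the underlying handle, or NULL when that particular library
is not the one in use for the connection.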

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Magnus Hagander
On Tue, Aug 19, 2014 at 9:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 On Tue, Aug 19, 2014 at 8:49 PM, Robert Haas robertmh...@gmail.com wrote:
 I have a hard time believing that something like this will really
 satisfy anyone.  Why not just add PQgetSchannelHandleOrWhatever() and
 call it good?  We can try to be incredibly thorough in exposing the
 information people want and we will still inevitably miss something
 that someone cares about; worse, we'll spend an awful lot of time and
 energy along the way.

 Well, for one you push the full burden onto the application.

 Robert's got a point though: there is always going to be somebody who
 wants something we fail to expose.  It's better to be able to say well,
 you can do PQgetssl and then munge it for yourself than to have to say
 sorry, you're screwed.  So if we're going to define PQgetssl as
 returning NULL when you're not using OpenSSL, I don't see why we
 shouldn't expose a similarly-defined PQgetXXX for each other underlying
 implementation we support.  There will not be that many of 'em, and
 I suspect the people with very specific needs will not care about more
 than one underlying library anyway.

 This does not say that we shouldn't also try to have some
 library-independent functionality for interrogating certificate state
 etc.  Just that having an escape hatch isn't a bad thing.

I do agree that having both would be useful. We could have something like
int PQgetSSLstruct(void **sslstruct)

which returns the type of struct. Then it's up to the application to
know if it can handle it. For those apps that need a *lot*. But the
basic attributes - something like the list from apache - should be
retrievable in a library independent way.
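
Something shaped like this could cover those basics (hypothetical, not
an existing libpq function; attribute names invented for illustration):

    const char *PQsslAttribute(PGconn *conn, const char *name);

    /* e.g. PQsslAttribute(conn, "library")  -> "OpenSSL"
     *      PQsslAttribute(conn, "protocol") -> "TLSv1.2"
     *      PQsslAttribute(conn, "cipher")   -> "ECDHE-RSA-AES256-GCM-SHA384"
     * returning NULL when the connection is not encrypted or the
     * attribute is unknown. */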

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Stephen Frost
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
 On 08/19/2014 06:44 PM, Stephen Frost wrote:
 Hmm. That seems a bit too much. Perhaps provide just the certificate
 itself in DER/PEM format, and have the client parse it (using
 OpenSSL or something else) if it wants more details.
 I really don't care for that approach.  Our SSL support has always been
 horrible- I was hoping we'd actually improve that situation.  Adding
 things in piecemeal over time will just be painful for our users and I
 don't see why we should wait.
 
 What would you like to do with the certificates?

In applications which I've developed in the past, I've had to rely on
the CN, serial number, and signing root CA to ensure that there were no
duplicates (this was a government environment which trusted multiple
independent root CAs, and there was no guarantee that even a given CA
wouldn't issue the same serial number to different individuals).  In
other cases, I've had to rely on the fingerprint, but that gets painful
when you have certificate roll-over since you then have to re-enroll
individuals when they get issued a new certificate.  I've also
implemented systems which have certificate expiration warnings.
Checking the extended attributes of the certificate has been a
requirement in the past (to verify it's only being used for its intended
purpose).

One of the things we don't support today is anything beyond matching on
the CN of the certificate in pg_ident, to map from a client certificate
to a PG role.  That wouldn't be acceptable in environments I've worked
in because two different individuals could have identical CNs.  Another
interesting twist are systems (such as Windows..) where the client
certificate to be presented depends on which root CA the server's
certificate is signed with.

I'm not asking this patch to fix that, but you asked what else a
developer might be looking for when it comes to SSL and I'm telling you
things I've actively used.  Generally speaking, these have been on the
server side (eg: with mod_ssl), but I could see a client wanting to use
them, and if we abstract getting this information on the server side to
meet the needs I've described above, wouldn't we be able to (and want
to) share that abstraction with users of libpq?

 I'm imagining that a GUI tool like pgAdmin might want to extract all
 information from the certificate, display it in a window, and let
 the user look at the whole chain and all the fields.

While that'd certainly be nice, it's not what I'm referring to and I
agree that having a third party library to handle that makes sense, as
some operating systems do.  In general, I'm all for more (and better)
integration with the OS-provided certificate systems.  For one thing,
they also can address the issues around ensuring that the client side
certificate is encrypted-at-rest, and can handle prompting the user for
the passphrase to decrypt it.

 But I don't think that exists in OpenSSL, let alone
 in other libraries, and the attribute names would be all different
 anyway.

As I said- let's look at mod_ssl/gnutls as a minimum set to start with..
That's certainly a set I'm familiar with and one which I expect most
other developers who work with SSL are also.  There are bits missing
from that list (mainly around the extended attributes..), but it's
certainly better than the list originally proposed.

 But if we provide an interface to grab the whole certificate chain,
 then you can use any library you want to parse and present it to the
 user.

Yes- we should do this also because there may be cases where the app
developers wants to pass that off to another library or do something
else with it, sure.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Stephen Frost
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
 I think you just packed up the goalposts for a one-way trip to Mars,
 but I wonder: What would you consider proper SSL support? What
 exactly are we missing?

I hit on a few things in my other email, but there is a huge portion of
SSL which is just about making it easy and sensible to install and get
working properly.  Apache is a good example of how to do this and is one
that a lot of people are familiar with.  Specific issues that I recall
running into are lack of the 'directory' options for certificates,
having trouble figuring out the right format and structure to provide
the complete root chain for the server's certificate and then trying to
figure out how to add intermediate and additional root CAs for client
certificates, getting CRLs to work was a pain, and nothing about how to
get OCSP working.

I think there's been some improvement since I last had to go through the
pain of setting this all up, and some of it is undoubtedly OpenSSL's
fault, but there's definitely quite a bit more we could be doing to make
SSL support easier.  I'm hopeful that I'll be able to spend more time on
this in the future but it's not a priority currently.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Robert Haas
On Tue, Aug 19, 2014 at 3:16 PM, Magnus Hagander mag...@hagander.net wrote:
 On Tue, Aug 19, 2014 at 9:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 On Tue, Aug 19, 2014 at 8:49 PM, Robert Haas robertmh...@gmail.com wrote:
 I have a hard time believing that something like this will really
 satisfy anyone.  Why not just add PQgetSchannelHandleOrWhatever() and
 call it good?  We can try to be incredibly thorough in exposing the
 information people want and we will still inevitably miss something
 that someone cares about; worse, we'll spend an awful lot of time and
 energy along the way.

 Well, for one you push the full burden onto the application.

 Robert's got a point though: there is always going to be somebody who
 wants something we fail to expose.  It's better to be able to say well,
 you can do PQgetssl and then munge it for yourself than to have to say
 sorry, you're screwed.  So if we're going to define PQgetssl as
 returning NULL when you're not using OpenSSL, I don't see why we
 shouldn't expose a similarly-defined PQgetXXX for each other underlying
 implementation we support.  There will not be that many of 'em, and
 I suspect the people with very specific needs will not care about more
 than one underlying library anyway.

 This does not say that we shouldn't also try to have some
 library-independent functionality for interrogating certificate state
 etc.  Just that having an escape hatch isn't a bad thing.

 I do agree that having both would be useful. We could have something like
 int PQgetSSLstruct(void **sslstruct)

I think it's likely smarter to have totally separate functions.
First, to make it less likely that users will try to use a pointer to
one type of object as a pointer to some other kind of object.  And
second, because you might, for example, someday have an SSL
implementation that wants to return two pointers.  May as well make
that kind of thing easy.

BTW, if we're beating on libpq, I wonder if we shouldn't consider
bumping the soversion at some point.  I mean, I know that we
technically don't need to do that if we're only *adding* functions and
not changing any of the existing stuff in backward-incompatible ways,
but we might *want* to make some backward-incompatible changes at some
point, and I think there's a decent argument that any patch in this
area is already doing that at least to PQgetSSL().  Maybe this would be
a good time to think if there's anything else we want to do that
would, either by itself or in combination, justify a bump.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-19 Thread Peter Eisentraut
On 8/19/14 9:35 AM, MauMau wrote:
 pg_copy is for copying a file reliably by syncing.  If sync is not
 necessary, people can use cp/copy.

I'm getting mixed messages from this thread.

I think there could be a fair amount of support for a new tool that can
serve as a universal plug-and-play archive_command, with a variety of
options, such as fsync yes/no, fadvise yes/no, direct-io[*] yes/no,
atomic copy yes/no, allow overwrite yes/no, compression yes/no.  That
would reduce the need for users to compose adventurous shell commands,
and it would set out the various options more clearly.

This is not that.  This is cp+fsync with a hardcoded fadvise policy and
optional direct-io.  That is a valid problem to address, but it is one
of many.  On the other hand, I fear that the addition of this
single-and-a-half-purpose tool would make the overall landscape more
complicated than it already is.  Since it's in the examples, people will
probably use it, even if they don't need to or shouldn't.  And not
recommending it for the restore_command is also confusing.

Another example of how confusing all of this is: On Windows, the copy
command by default doesn't overwrite files, which is what we want
(usually).  The patch changes those instances of copy to pg_copy, but it
doesn't have that behavior.  Should the examples be changed to do a test
&& pg_copy on Windows (what's the Windows shell syntax for that?), or
should pg_copy have an option to not overwrite a file?  How do you then
avoid inconsistencies with the Unix behavior?  Or what if you want fsync
but allow overwriting on Windows?

On the technical side, I think if you fsync a new file, you also need to
fsync the directory, to make sure the file is certain to be visible
after a crash.
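
A sketch of that sequence (POSIX, error handling omitted; the function
name is illustrative):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static void
    durable_install(const char *tmppath, const char *destpath,
                    const char *dirpath)
    {
        int fd = open(tmppath, O_WRONLY);

        fsync(fd);                    /* flush the file's contents */
        close(fd);

        rename(tmppath, destpath);    /* atomic within one filesystem */

        fd = open(dirpath, O_RDONLY); /* then fsync the directory so the */
        fsync(fd);                    /* new name itself survives a crash */
        close(fd);
    }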


[*] I keep reading directio as a typo of direction, so please
consider putting a hyphen or underscore in the option and variable
names. ;-)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-19 Thread Peter Eisentraut
On 8/15/14 10:46 AM, Fujii Masao wrote:
 At last, the big question is, is there really no OS command which provides
 the same functionality as pg_copy does? If there is, I'd like to avoid 
 duplicate
 work basically.

If you look hard enough, you can maybe find an OS command that can fsync
a file after it was copied.  Some versions of dd can do that, and some
systems have an fsync program.  But it's not clear whether all systems
have that, and it probably won't be simple and consistent.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 BTW, if we're beating on libpq, I wonder if we shouldn't consider
 bumping the soversion at some point.  I mean, I know that we
 technically don't need to do that if we're only *adding* functions and
 not changing any of the existing stuff in backward-incompatible ways,
 but we might *want* to make some backward-incompatible changes at some
 point, and I think there's a decent argument that any patch in this
  area is already doing that at least to PQgetSSL().  Maybe this would be
 a good time to think if there's anything else we want to do that
 would, either by itself or in combination, justify a bump.

I'm not a big fan of doing it for this specific item, though it's
technically an API breakage (which means we should actually have
libpq2-dev packages, make everything that build-deps on libpq-dev
update to build-dep on libpq2-dev, have libpq6, etc..).  If there are
other backwards-incompatible things we wish to do, then I agree that
it'd be good to do them all at the same time (perhaps in conjunction
with 10.0...).  This is the part where I wish we had been keeping an
updated list of things we want to change (like on the wiki..).

It's certainly not a fun transition to go through.  I also wonder if
we're going to need to worry about what happens when libpq5 and libpq6
end up linked into the same running application.  I don't think we
have any symbol versioning or anything to address that risk in place..

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-19 Thread Josh Berkus
On 08/19/2014 12:37 PM, Peter Eisentraut wrote:
 On 8/19/14 9:35 AM, MauMau wrote:
 pg_copy is for copying a file reliably by syncing.  If sync is not
 necessary, people can use cp/copy.
 
 I'm getting mixed messages from this thread.
 
 I think there could be a fair amount of support for a new tool that can
 serve as a universal plug-and-play archive_command, with a variety of
 options, such as fsync yes/no, fadvise yes/no, direct-io[*] yes/no,
 atomic copy yes/no, allow overwrite yes/no, compression yes/no.  That
 would reduce the need for users to compose adventurous shell commands,
 and it would set out the various options more clearly.

Yes please!

Although I'm not sold on the idea of using DirectIO for this.  Is there
really enough benefit to make it worth the trouble?

 
 This is not that.  This is cp+fsync with a hardcoded fadvise policy and
 optional direct-io.  That is a valid problem to address, but it is one
 of many.  On the other hand, I fear that the addition of this
 single-and-a-half-purpose tool would make the overall landscape more
 complicated than it already is.  Since it's in the examples, people will
 probably use it, even if they don't need to or shouldn't.  And not
 recommending it for the restore_command is also confusing.

I'm afraid that I agree with Peter here.  pg_copy looks like a nice
foundation for the eventual pg_copy utility we need, but it's not there yet.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3: more problems with Could not open file pg_multixact/members/xxxx

2014-08-19 Thread Jeff Janes
On Tue, Jul 15, 2014 at 3:58 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Fri, Jun 27, 2014 at 11:51 AM, Alvaro Herrera alvhe...@2ndquadrant.com
  wrote:

 Jeff Janes wrote:

  This problem was initially fairly easy to reproduce, but since I
  started adding instrumentation specifically to catch it, it has become
  devilishly hard to reproduce.
 
  I think my next step will be to also log each of the values which goes
  into the complex if (...) expression that decides on the deletion.

  Could you please try to reproduce it after updating to latest?  I pushed
 fixes that should close these issues.  Maybe you want to remove the
 instrumentation you added, to make failures more likely.


 There are still some problems in 9.4, but I haven't been able to diagnose
 them and wanted to do more research on it.  The announcement of upcoming
 back-branches for 9.3 spurred me to try it there, and I have problems with
 9.3 (12c5bbdcbaa292b2a4b09d298786) as well.  The move of truncation to the
 checkpoint seems to have made the problem easier to reproduce.  On an 8
 core machine, this test fell over after about 20 minutes, which is much
 faster than it usually reproduces.

  This is the error I get:

 2084 UPDATE 2014-07-15 15:26:20.608 PDT:ERROR:  could not access status of
 transaction 85837221
 2084 UPDATE 2014-07-15 15:26:20.608 PDT:DETAIL:  Could not open file
 pg_multixact/members/14031: No such file or directory.
 2084 UPDATE 2014-07-15 15:26:20.608 PDT:CONTEXT:  SQL statement SELECT 1
 FROM ONLY public.foo_parent x WHERE id OPERATOR(pg_catalog.=) $1 FOR
 KEY SHARE OF x

 The testing harness is attached as 3 patches that must be made to the test
 server, and 2 scripts. The script do.sh sets up the database (using fixed
 paths, so be careful) and then invokes count.pl in a loop to do the
 actual work.


Sorry, after a long time when I couldn't do much testing on this, I've now
been able to get back to it.

It looks like what is happening is that  checkPoint.nextMultiOffset wraps
around from 2^32 to 0, even if 0 is still being used.  At that point it
starts deleting member files that are still needed.

Is there some interlock which is supposed to prevent
checkPoint.nextMultiOffset from lapping itself?  I haven't been able to find
it.  It seems like the interlock applies only to MultiXid, not the Offset.

Thanks,

Jeff


Re: [HACKERS] 9.3: more problems with Could not open file pg_multixact/members/xxxx

2014-08-19 Thread Andres Freund
On August 19, 2014 10:24:20 PM CEST, Jeff Janes jeff.ja...@gmail.com wrote:
On Tue, Jul 15, 2014 at 3:58 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Fri, Jun 27, 2014 at 11:51 AM, Alvaro Herrera alvhe...@2ndquadrant.com
  wrote:

 Jeff Janes wrote:

  This problem was initially fairly easy to reproduce, but since I
  started adding instrumentation specifically to catch it, it has become
  devilishly hard to reproduce.
 
  I think my next step will be to also log each of the values which goes
  into the complex if (...) expression that decides on the deletion.

 Could you please try to reproduce it after updating to latest?  I pushed
 fixes that should close these issues.  Maybe you want to remove the
 instrumentation you added, to make failures more likely.


 There are still some problems in 9.4, but I haven't been able to diagnose
 them and wanted to do more research on it.  The announcement of upcoming
 back-branches for 9.3 spurred me to try it there, and I have problems with
 9.3 (12c5bbdcbaa292b2a4b09d298786) as well.  The move of truncation to the
 checkpoint seems to have made the problem easier to reproduce.  On an 8
 core machine, this test fell over after about 20 minutes, which is much
 faster than it usually reproduces.

 This is the error I get:

 2084 UPDATE 2014-07-15 15:26:20.608 PDT:ERROR:  could not access status of
 transaction 85837221
 2084 UPDATE 2014-07-15 15:26:20.608 PDT:DETAIL:  Could not open file
 pg_multixact/members/14031: No such file or directory.
 2084 UPDATE 2014-07-15 15:26:20.608 PDT:CONTEXT:  SQL statement SELECT 1
 FROM ONLY public.foo_parent x WHERE id OPERATOR(pg_catalog.=) $1 FOR
 KEY SHARE OF x

 The testing harness is attached as 3 patches that must be made to the test
 server, and 2 scripts. The script do.sh sets up the database (using fixed
 paths, so be careful) and then invokes count.pl in a loop to do the
 actual work.


Sorry, after a long time when I couldn't do much testing on this, I've
now been able to get back to it.

It looks like what is happening is that checkPoint.nextMultiOffset wraps
around from 2^32 to 0, even if 0 is still being used.  At that point it
starts deleting member files that are still needed.

Is there some interlock which is supposed to prevent
checkPoint.nextMultiOffset from lapping itself?  I haven't been able to
find it.  It seems like the interlock applies only to MultiXid, not the
Offset.

There is none (and there never has been one either). I've complained about it a 
couple of times but nobody, me included, had time and energy to fix that :(

Andres


--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-19 Thread Alvaro Herrera
MauMau wrote:

 With that said, copying to a temporary file like dest.tmp and
 renaming it to dest sounds worthwhile even as a basic copy
 utility.  I want to avoid copying to a temporary file with a fixed
 name like _copy.tmp, because some advanced utility may want to run
 multiple instances of pg_copy to copy several files into the same
 directory simultaneously.  However, I'm afraid multiple dest.tmp
 files might continue to occupy disk space after canceling copy or
 power failure in some use cases, where the copy of the same file
 won't be retried.  That's also the reason why I chose to not use a
 temporary file like cp/copy.

Is there a way to create a link to a file which only exists as an open
file descriptor?   If there was, you could create a temp file, open an
fd, then delete the file.  That would remove the issue with files being
leaked due to failures of various kinds.
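
On Linux, at least, there is something close: O_TMPFILE (kernel 3.11+)
creates a file with no name at all, and linkat() can give it one later.
A Linux-only sketch, with illustrative paths:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static int
    copy_with_unnamed_temp(const char *dir, const char *destpath)
    {
        char    procpath[64];
        int     fd = open(dir, O_TMPFILE | O_WRONLY, 0600);

        if (fd < 0)
            return -1;
        /* ... write the data and fsync(fd); until linkat() below, the
         * file has no name, so a crash cannot leak a temp file ... */
        snprintf(procpath, sizeof(procpath), "/proc/self/fd/%d", fd);
        if (linkat(AT_FDCWD, procpath, AT_FDCWD, destpath,
                   AT_SYMLINK_FOLLOW) < 0)
        {
            close(fd);
            return -1;
        }
        close(fd);
        return 0;
    }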

Also, it's been mentioned that this utility might be useful for
restore_command.  That sounds good I guess, but need to keep the
RECOVERYXLOG trick in mind.  I remember a case of stalled replay because
the restore command was writing to RECOVERYXLOG.gz and ungzipping, and
the unlink(RECOVERYXLOG) call failed after a partial copy and so did
the copy from the archive.  (Removing the borked RECOVERYXLOG.gz fixed
it.)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Heikki Linnakangas

On 08/19/2014 10:31 PM, Robert Haas wrote:

On Tue, Aug 19, 2014 at 3:16 PM, Magnus Hagander mag...@hagander.net wrote:

On Tue, Aug 19, 2014 at 9:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Magnus Hagander mag...@hagander.net writes:

On Tue, Aug 19, 2014 at 8:49 PM, Robert Haas robertmh...@gmail.com wrote:

I have a hard time believing that something like this will really
satisfy anyone.  Why not just add PQgetSchannelHandleOrWhatever() and
call it good?  We can try to be incredibly thorough in exposing the
information people want and we will still inevitably miss something
that someone cares about; worse, we'll spend an awful lot of time and
energy along the way.



Well, for one you push the full burden onto the application.


Robert's got a point though: there is always going to be somebody who
wants something we fail to expose.  It's better to be able to say well,
you can do PQgetssl and then munge it for yourself than to have to say
sorry, you're screwed.  So if we're going to define PQgetssl as
returning NULL when you're not using OpenSSL, I don't see why we
shouldn't expose a similarly-defined PQgetXXX for each other underlying
implementation we support.  There will not be that many of 'em, and
I suspect the people with very specific needs will not care about more
than one underlying library anyway.

This does not say that we shouldn't also try to have some
library-independent functionality for interrogating certificate state
etc.  Just that having an escape hatch isn't a bad thing.


Yeah, wouldn't hurt I guess.


 I do agree that having both would be useful. We could have something like
int PQgetSSLstruct(void **sslstruct)


I think it's likely smarter to have totally separate functions.
First, to make it less likely that users will try to use a pointer to
one type of object as a pointer to some other kind of object.  And
second, because you might, for example, someday have an SSL
implementation that wants to return two pointers.  May as well make
that kind of thing easy.


The struct it returns is totally SSL-implementation specific anyway, so 
for an implementation that would like to return two structs, you could 
well define it to return a struct like:


struct {
    CoolStructA *a;
    CoolStructB *b;
} CoolSSLStruct;

I don't much like adding a separate function for every SSL 
implementation, but you've got a point that it would be nice to make it 
difficult to call PQgetSSLstruct() and just assume that the returned 
struct is e.g. an OpenSSL struct, while it's actually something else. 
Perhaps:


int PQgetSSLstruct(void **sslstruct, char *structname)

You'd call it like PQgetSSLStruct(&mystruct, "openssl"), and it checks 
that the argument matches the library actually being used, otherwise it 
returns an error. And if you need to return two structs, you'd call it 
twice: PQgetSSLStruct(&a, "cool_a") and PQgetSSLStruct(&b, "cool_b").
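
A sketch of that check, adding the PGconn argument it would presumably
take (the conn->ssl field is an assumption modeled on the OpenSSL build):

    #include <string.h>

    int
    PQgetSSLstruct(PGconn *conn, void **sslstruct, const char *structname)
    {
        *sslstruct = NULL;
        if (conn == NULL || conn->ssl == NULL)   /* assumed field */
            return -1;
        if (strcmp(structname, "openssl") != 0)  /* wrong library requested */
            return -1;
        *sslstruct = conn->ssl;                  /* the OpenSSL SSL pointer */
        return 0;
    }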


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-08-19 Thread Heikki Linnakangas

On 08/20/2014 12:17 AM, John Lumby wrote:

I am attaching a new version of the patch for consideration in the current 
commit fest.


Thanks for working on this!


Relative to the one I submitted on 25 June in 
bay175-w412ff89303686022a9f16aa3...@phx.gbl
the method for handling aio completion using sigevent has been re-written to use
signals exclusively rather than a composite of signals and LWlocks,
and this has fixed the problem I mentioned before with the LWlock method.


ISTM the patch is still allocating stuff in shared memory that really 
doesn't belong there. Namely, the BufferAiocb structs. Or at least parts 
of it; there's also a waiter queue there which probably needs to live in 
shared memory, but the rest of it does not.


At least BufAWaitAioCompletion is still calling aio_error() on an AIO 
request that might've been initiated by another backend. That's not OK.


Please write the patch without atomic CAS operation. Just use a 
spinlock. There's a patch in the commitfest to add support for that, but 
it's not committed yet, and all those USE_AIO_ATOMIC_BUILTIN_COMP_SWAP 
ifdefs make the patch more difficult to read. Same with all the other 
#ifdefs; please just pick a method that works.
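
For instance, a sketch of the spinlock version, using the backend's
storage/spin.h primitives (the BufferAiocb field is illustrative):

    #include "storage/spin.h"

    static slock_t aiocb_mutex;  /* SpinLockInit(&aiocb_mutex) at startup */

    static void
    set_aio_state(volatile BufferAiocb *baiocb, int newstate)
    {
        SpinLockAcquire(&aiocb_mutex);
        baiocb->state = newstate;        /* hypothetical field */
        SpinLockRelease(&aiocb_mutex);
    }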


Also, please split prefetching of regular index scans into a separate 
patch. It's orthogonal to doing async I/O; we could prefetch regular 
index scans with posix_fadvise too, and AIO should be useful for 
prefetching other stuff.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Andres Freund
On 2014-08-20 00:58:22 +0300, Heikki Linnakangas wrote:
 I don't much like adding a separate function for every SSL implementation,
 but you've got a point that it would be nice to make it difficult to call
 PQgetSSLstruct() and just assume that the returned struct is e.g. an OpenSSL
 struct, while it's actually something else. Perhaps:

A good reason to not have functions with the respective return types is
that it requires either including the relevant headers or adding forward
declarations of the libraries' types.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-08-19 Thread Claudio Freire
On Tue, Aug 19, 2014 at 7:27 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Also, please split prefetching of regular index scans into a separate patch.
 It's orthogonal to doing async I/O; we could prefetch regular index scans
 with posix_fadvise too, and AIO should be useful for prefetching other
 stuff.

That patch already happened on the list, and it wasn't a win in many
cases. I'm not sure it should be proposed independently of this one.
Maybe a separate patch, but it should be considered dependent on this.

I don't have an archive link at hand atm, but I could produce one later.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-08-20 00:58:22 +0300, Heikki Linnakangas wrote:
 I don't much like adding a separate function for every SSL implementation,
 but you've got a point that it would be nice to make it difficult to call
  PQgetSSLstruct() and just assume that the returned struct is e.g. an OpenSSL
 struct, while it's actually something else. Perhaps:

  A good reason to not have functions with the respective return types is
  that it requires either including the relevant headers or adding forward
  declarations of the libraries' types.

It requires no such thing.  What we do for PQgetssl() is declare it as
returning void *, and we could easily do the same for other libraries.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PQgetssl() and alternative SSL implementations

2014-08-19 Thread Andres Freund
On 2014-08-19 19:11:46 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-08-20 00:58:22 +0300, Heikki Linnakangas wrote:
  I don't much like adding a separate function for every SSL implementation,
  but you've got a point that it would be nice to make it difficult to call
   PQgetSSLstruct() and just assume that the returned struct is e.g. an OpenSSL
  struct, while it's actually something else. Perhaps:
 
   A good reason to not have functions with the respective return types is
   that it requires either including the relevant headers or adding forward
   declarations of the libraries' types.
 
 It requires no such thing.  What we do for PQgetssl() is declare it as
 returning void *, and we could easily do the same for other libraries.

Well, the reason the library-specific variant has been called superior
upthread is the potential for type safety...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] KNN searches support for SP-GiST [GSOC'14]

2014-08-19 Thread Vladislav Sterzhanov
Hi there, pg-Hackers!
Here I go with the patch which brings up the possibility to perform
nearest-neighbour searches on SP-GiSTs (as of now it includes implementations
for quad and kd trees). Pre-reviewed by my GSoC mentor Alexander Korotkov.
A sample benchmarking script is included in the attachment (it dumps the
current geonames archive and runs several searches on the (latitude,
longitude) points), demonstrating the dramatic improvements over plain
searches and sorting. Regression tests are included; the patch compiles and
runs successfully under both my Ubuntu 12.04 Server and 08/2014 Arch Linux.

Vlad Sterzhanov / Quadrocube
diff --git a/doc/src/sgml/spgist.sgml b/doc/src/sgml/spgist.sgml
index 56827e5..5214770 100644
--- a/doc/src/sgml/spgist.sgml
+++ b/doc/src/sgml/spgist.sgml
@@ -83,6 +83,7 @@
literalgt;gt;/
literalgt;^/
literal~=/
+   literallt;-gt;/
   /entry
  /row
  row
@@ -95,6 +96,7 @@
literalgt;gt;/
literalgt;^/
literal~=/
+   literallt;-gt;/
   /entry
  /row
  row
@@ -137,6 +139,10 @@
   supports the same operators but uses a different index data structure which
   may offer better performance in some applications.
  /para
+ para
+  By supporting the ordering lt;-gt; operator the quad_point_ops and kd_point_ops provide 
+  a user with the ability to perform a K-nearest-neighbour search over the indexed point dataset.
+ /para
 
 /sect1
 
@@ -539,9 +545,12 @@ CREATE FUNCTION my_inner_consistent(internal, internal) RETURNS void ...
 typedef struct spgInnerConsistentIn
 {
 ScanKey scankeys;   /* array of operators and comparison values */
+ScanKey		orderbyKeys;	/* array of ordering operators and comparison values */
 int nkeys;  /* length of array */
+int norderbys;  /* length of array */
 
 Datum   reconstructedValue; /* value reconstructed at parent */
+    Datum		suppValue;	/* supplementary value of parent */
 int level;  /* current level (counting from zero) */
 boolreturnData; /* original data must be returned? */
 
@@ -559,6 +568,8 @@ typedef struct spgInnerConsistentOut
 int*nodeNumbers;/* their indexes in the node array */
 int*levelAdds;  /* increment level by this much for each */
 Datum  *reconstructedValues;/* associated reconstructed values */
+    Datum	   *suppValues;		/* additional data the implementation needs to store in the child nodes */
+double	   **distances;		/* associated distances */
 } spgInnerConsistentOut;
 /programlisting
 
@@ -573,10 +584,15 @@ typedef struct spgInnerConsistentOut
In particular it is not necessary to check structfieldsk_flags/ to
see if the comparison value is NULL, because the SP-GiST core code
will filter out such conditions.
+   structfieldorderbyKeys/, of length structfieldnorderbys/,
+   describes ordering operators (if any) in the same fashion.
structfieldreconstructedValue/ is the value reconstructed for the
parent tuple; it is literal(Datum) 0/ at the root level or if the
functioninner_consistent/ function did not provide a value at the
parent level.
+   structfieldsuppValue/ is any additional value that an implementation
+   decided to store for the parent node (literal(Datum) 0/ in case the 
+   current node is root).
structfieldlevel/ is the current inner tuple's level, starting at
zero for the root level.
structfieldreturnData/ is literaltrue/ if reconstructed data is
@@ -592,7 +608,6 @@ typedef struct spgInnerConsistentOut
structfieldnNodes/ is the number of child nodes contained in the
inner tuple, and
structfieldnodeLabels/ is an array of their label values, or
-   NULL if the nodes do not have labels.
   /para
 
   para
@@ -608,9 +623,17 @@ typedef struct spgInnerConsistentOut
structfieldreconstructedValues/ to an array of the values
reconstructed for each child node to be visited; otherwise, leave
structfieldreconstructedValues/ as NULL.
+   structfieldsuppValues/ serves the similar purpose of holding
+   the implementation-defined data for the inner nodes.
+   structfielddistances/: if the ordered search is carried out,
+   the implementation is supposed to fill them in accordance with the
+   ordering operators provided in structfieldorderbyKeys/
+   (nodes with the lowest distances will be processed first). Leave it
+   NULL otherwise.
Note that the functioninner_consistent/ function is
responsible for palloc'ing the
-   structfieldnodeNumbers/, structfieldlevelAdds/ and
+   structfieldnodeNumbers/, structfieldlevelAdds/,
+   structfielddistances/, structfieldsuppValues/ and
structfieldreconstructedValues/ arrays.
   /para
  /listitem
@@ -636,7 +659,9 @@ CREATE FUNCTION 

Re: [HACKERS] GSoC on WAL-logging hash indexes

2014-08-19 Thread Michael Paquier
On Thu, Jun 19, 2014 at 6:40 PM, Vik Fearing vik.fear...@dalibo.com wrote:

 On 04/30/2014 11:41 PM, Tom Lane wrote:
  We really oughta fix the WAL situation, not just band-aid around it.

 After re-reading this thread, it is not clear that anyone is going to
 work on it so I'll just ask:

 Is anyone working on this?

 If not, I'd like to put it on my plate.

Vik, did you get time to look at that finally?
Regards,
-- 
Michael


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-19 Thread Josh Berkus
On 08/15/2014 04:19 PM, Tom Lane wrote:
 Personally I'd prefer to go to the all-lengths approach, but a large
 part of that comes from a subjective assessment that the hybrid approach
 is too messy.  Others might well disagree.
 
 In case anyone else wants to do measurements on some more data sets,
 attached is a copy of Heikki's patch updated to apply against git tip.

Note that this is not 100% comparable because I'm running it against git
clone, and the earlier tests were against beta2.  However, the Heikki
patch looks like a bust on this dataset -- see below.

postgres=# select pg_size_pretty(pg_total_relation_size('jsonic'));
 pg_size_pretty
----------------
 394 MB
(1 row)

postgres=# select pg_size_pretty(pg_total_relation_size('jsonbish'));
 pg_size_pretty
----------------
 542 MB
(1 row)

Extraction Test:

postgres=# explain analyze select row_to_json -> 'kt1_total_sum' from
jsonbish where row_to_json @> '{ "rpt_per_dt" : "2003-06-30" }';
                               QUERY PLAN
---------------------------------------------------------------------------
 Bitmap Heap Scan on jsonbish  (cost=29.55..582.92 rows=200 width=18)
   (actual time=22.742..5281.823 rows=100423 loops=1)
   Recheck Cond: (row_to_json @> '{"rpt_per_dt": "2003-06-30"}'::jsonb)
   Heap Blocks: exact=1471
   ->  Bitmap Index Scan on jsonbish_row_to_json_idx  (cost=0.00..29.50
       rows=200 width=0) (actual time=22.445..22.445 rows=100423 loops=1)
         Index Cond: (row_to_json @> '{"rpt_per_dt": "2003-06-30"}'::jsonb)
 Planning time: 0.095 ms
 Execution time: 5292.047 ms
(7 rows)

So, that extraction test is about 1% *slower* than the basic Tom Lane
lengths-only patch, and still 80% slower than original JSONB.  And it's
the same size as the lengths-only version.

Huh?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wrapping in extended mode doesn't work well with default pager

2014-08-19 Thread Noah Misch
On Mon, Aug 18, 2014 at 12:30:40PM +0100, Greg Stark wrote:
 On Tue, Aug 5, 2014 at 3:41 AM, Noah Misch n...@leadboat.com wrote:
  This remains open for 9.4.  Your proposal to revert the feature in 9.4 and
  fix it in 9.5 sounds reasonable.
 
 Ok, I've gone ahead and done this. I'm sorry for the delays and confusion.

Thanks.

  I did try psql-wrapped-expanded-fix-v5.patch with the tests Peter and I
  posted upthread, and those tests now behave as they do in released
  versions.  What cases did you find that still change vs. 9.3?
 
 I was trying to build a spreadsheet of every combination of these
 options. It turns out that 4-dimensional spreadsheets are kind of
 awkward.

What's one query that still behaves differently in 9.5 vs. 9.3 (under
formatting options that exist in both versions)?

 I think the fundamental dilemma was the same that was discussed
 upthread. If wrapped expanded mode should have an extra space padding
 column for the wrap indicators then all expanded modes should have
 that column to be consistent since wrapping shouldn't change the
 amount of padding.

I might agree for a greenfield design, but -1 for changing expanded mode now
to improve consistency in this way.  I predict the complaints from users of
expanded mode in automation would overpower any applause for the consistency.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replication commands and log_statements

2014-08-19 Thread Michael Paquier
On Wed, Aug 20, 2014 at 2:06 AM, Robert Haas robertmh...@gmail.com wrote:

 On Sat, Aug 16, 2014 at 10:27 AM, Amit Kapila amit.kapil...@gmail.com
 wrote:
  I think ideally it would have been better if we could have logged
  replication commands under a separate log level, but as there is
  still no consensus on extending log_statement and nobody is even
  willing to pursue it, it seems okay to go ahead and log these under
  the 'all' level.

 I think the consensus is clearly for a separate GUC.

+1.
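
(For concreteness, a minimal sketch of the gating a separate GUC would
provide -- the name log_replication_commands, the default, and printf
standing in for ereport(LOG) are all assumptions here, not a committed
design:)

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical GUC variable; in the server this would be registered in
 * guc.c as a PGC_SUSET boolean (the name is an assumption). */
static bool log_replication_commands = false;

/* Sketch of the check a walsender could apply before logging. */
static void
maybe_log_replication_command(const char *cmd)
{
	if (log_replication_commands)
		printf("LOG:  received replication command: %s\n", cmd);
}

int
main(void)
{
	maybe_log_replication_command("IDENTIFY_SYSTEM");	/* suppressed */
	log_replication_commands = true;
	maybe_log_replication_command("START_REPLICATION 0/3000000");	/* logged */
	return 0;
}

A separate parameter along these lines would also leave log_statement
alone for regular queries, which seems to be what the thread converged
on.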
-- 
Michael


Re: [HACKERS] Hokey wrong versions of libpq in apt.postgresql.org

2014-08-19 Thread Joshua D. Drake


On 08/19/2014 08:34 AM, Craig Ringer wrote:


I reiterate: the current apt.postgresql.org is not doing things correctly.
It breaks things, and it shouldn't.


FWIW, this is inconsistent with what yum.postgresql.org does - it takes
the POLA approach of packaging the libpq from the major release configured
in the repo. Each major version has its own sub-repo.

I find it pretty hard to justify installing a 9.3 libpq alongside a 9.1
server myself.



Exactly.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, @cmdpromptinc
If we send our children to Caesar for their education, we should
 not be surprised when they come back as Romans.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Verbose output of pg_dump not show schema name

2014-08-19 Thread Michael Paquier
On Fri, Jul 25, 2014 at 4:45 AM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:

 Given this is a very small and simple patch, I thought it wasn't necessary...

 Added to the next CommitFest.

I had a look at this patch, and here are a couple of comments:
1) Depending on how ArchiveEntry is called to register an object to
dump, namespace may be NULL, but that is not the case for
namespace->dobj.name, so you could get the namespace name at the top
of the functions that have their verbose output improved with
something like this:
const char *namespace = tbinfo->dobj.namespace ?
    tbinfo->dobj.namespace->dobj.name : NULL;
And then simplify the message output as follows:
if (namespace)
    write_msg("blah \"%s\".\"%s\" blah", namespace, classname);
else
    write_msg("blah \"%s\" blah", classname);
You can as well safely remove the checks on namespace->dobj.name.
(A standalone sketch of this pattern follows these comments.)
2) I don't think that this is correct:
-   ahlog(AH, 1, "processing data for table \"%s\"\n",
-         te->tag);
+   ahlog(AH, 1, "processing data for table \"%s\".\"%s\"\n",
+         AH->currSchema, te->tag);
There are some code paths where AH->currSchema is set to NULL, and I
think that you should use te->namespace instead.
3) Changing only this message is not enough. The following verbose
messages need to be changed too for consistency:
- pg_dump: creating $tag $object
- pg_dump: setting owner and privileges for [blah]
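
(To make the shape suggested in point 1 concrete, a standalone sketch;
printf stands in for pg_dump's write_msg(), and the table names are
made up:)

#include <stdio.h>

/*
 * Resolve the namespace name once, then branch on NULL when emitting
 * the message -- the pattern suggested in point 1 above.
 */
static void
report_table(const char *namespace, const char *classname)
{
	if (namespace)
		printf("dumping contents of table \"%s\".\"%s\"\n",
			   namespace, classname);
	else
		printf("dumping contents of table \"%s\"\n", classname);
}

int
main(void)
{
	report_table("public", "accounts");	/* namespace available */
	report_table(NULL, "accounts");		/* namespace unknown */
	return 0;
}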

I have also been pondering similar modifications to the error message
paths, but it did not seem worth it, as this patch is aimed only at the
verbose output. Btw, I have basically fixed those issues while doing
the review, and finished with the attached patch.
Fabrizio, is this new version fine for you?
Regards,
-- 
Michael
From e0809869655c9e22cce11955c7286cef8a42bf1d Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Wed, 20 Aug 2014 14:40:40 +0900
Subject: [PATCH] Improve verbose messages of pg_dump with namespace

The namespace is added to the verbose output when it is available, and
relation and namespace names are put within quotes for clarity and
consistency with the other tools.
---
 src/bin/pg_dump/pg_backup_archiver.c | 26 +++++++++++++++++++++-----
 src/bin/pg_dump/pg_dump.c            | 85 +++++++++++++++++++++++++------
 2 files changed, 93 insertions(+), 18 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 3aebac8..07cc10e 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -546,8 +546,13 @@ RestoreArchive(Archive *AHX)
 		/* Both schema and data objects might now have ownership/ACLs */
 		if ((te->reqs & (REQ_SCHEMA | REQ_DATA)) != 0)
 		{
-			ahlog(AH, 1, "setting owner and privileges for %s %s\n",
-				  te->desc, te->tag);
+			/* Show namespace if available */
+			if (te->namespace)
+				ahlog(AH, 1, "setting owner and privileges for %s \"%s\".\"%s\"\n",
+					  te->desc, te->namespace, te->tag);
+			else
+				ahlog(AH, 1, "setting owner and privileges for %s \"%s\"\n",
+					  te->desc, te->tag);
 			_printTocEntry(AH, te, ropt, false, true);
 		}
 	}
@@ -621,7 +626,13 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 
 	if ((reqs & REQ_SCHEMA) != 0)		/* We want the schema */
 	{
-		ahlog(AH, 1, "creating %s %s\n", te->desc, te->tag);
+		/* Show namespace if available */
+		if (te->namespace)
+			ahlog(AH, 1, "creating %s \"%s\".\"%s\"\n",
+				  te->desc, te->namespace, te->tag);
+		else
+			ahlog(AH, 1, "creating %s \"%s\"\n", te->desc, te->tag);
+
 
 		_printTocEntry(AH, te, ropt, false, false);
 		defnDumped = true;
@@ -713,8 +724,13 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 	_becomeOwner(AH, te);
 	_selectOutputSchema(AH, te->namespace);
 
-	ahlog(AH, 1, "processing data for table \"%s\"\n",
-		  te->tag);
+	/* Show namespace if available */
+	if (te->namespace)
+		ahlog(AH, 1, "processing data for table \"%s\".\"%s\"\n",
+			  te->namespace, te->tag);
+	else
+		ahlog(AH, 1, "processing data for table \"%s\"\n",
+			  te->tag);
 
 	/*
 	 * In parallel restore, if we created the table earlier in
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 5c0f95f..dd7eef9 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1383,6 +1383,8 @@ dumpTableData_copy(Archive *fout, void *dcontext)
 {
 	TableDataInfo *tdinfo = (TableDataInfo *) dcontext;
 	TableInfo  *tbinfo = tdinfo->tdtable;
+	const char *namespace = tbinfo->dobj.namespace ?
+		tbinfo->dobj.namespace->dobj.name : NULL;
 	const char *classname = tbinfo->dobj.name;
 	const bool	hasoids = tbinfo->hasoids;
 	const bool	oids = tdinfo->oids;
@@ -1400,7 +1402,16 @@ dumpTableData_copy(Archive *fout, void *dcontext)
 	const char *column_list;
 
 	if (g_verbose)
-		write_msg(NULL, "dumping contents of table %s\n", classname);
+	{
+		/* Print namespace information if