Re: [HACKERS] Compression of tables

2013-12-14 Thread Thomas Munro
On 10 December 2013 15:15, Merlin Moncure mmonc...@gmail.com wrote:

 I doubt you'll ever see generally heap compressed data in the way
 you're thinking: postgres has a strong informal policy of not
 implementing features which are dubious and/or excessively complicated
 with limited benefit, particularly if there are ways to handle this
 outside the database; there are various operating system level tricks
 that can cause a compressed file or even an entire tablespace (o/s
 folder) to masquerade as regular structures.  So maybe you are asking
 for a feature we already have: CREATE TABLESPACE.

 For example take a look here:

 https://btrfs.wiki.kernel.org/index.php/Compression#How_do_I_enable_compression.3F

 (out of curiosity, if this strategy fits the bill for you I wouldn't
 mind seeing a follow up on how this handles your static data use
 case).


Thanks for the suggestion.  I see your point; those other databases
generally do more themselves (direct IO, raw disk devices, etc).

So I started experimenting with btrfs.  I copied a 1.1TB pg_data
directory onto a zlib compressed btrfs raid1 filesystem, and it used
~840GB of physical disk.  Not a great compression ratio, but I didn't
explore options other than the default for compression.  Then my 3.2
kernel printed a few nasty messages and all IO locked up...  I should
probably try a more recent kernel!

Googling, I see a number of people have reported success with PG on
ZFS with compression, with 2x to 4x compression and faster scans
due to increased effective IO bandwidth.

This does seem to make a lot of sense for my append-only static data
use case.  I'll have to work on my fear of new filesystems.


Re: [HACKERS] pgsql: Fix a couple of bugs in MultiXactId freezing

2013-12-14 Thread Andres Freund
On 2013-12-13 17:08:46 -0300, Alvaro Herrera wrote:
 Andres Freund wrote:
  Afaics this will cause HEAP_KEYS_UPDATED to be reset, is that
  problematic? I don't really remember what it's needed for TBH...
 
 So we do reset HEAP_KEYS_UPDATED, and in general that bit seems critical
 for several things.  So it should be kept when a Xmax is carried over
 from the pre-frozen version of the tuple.  But while reading through
 that, I realize that we should also be keeping HEAP_HOT_UPDATED in that
 case.  And particularly we should never clear HEAP_ONLY_TUPLE.

That's only for the multi-plain xid case tho, right?

 So I think heap_execute_freeze_tuple() is wrong, because it's resetting
 the whole infomask to zero, and then setting it to only the bits that
 heap_prepare_freeze_tuple decided that it needed set.  That seems bogus
 to me.  heap_execute_freeze_tuple() should only clear a certain number
 of bits, and then possibly set some of the same bits; but the remaining
 flags should remain untouched.

Uh, my version and the latest you've sent initially copy the original
infomask to the freeze plan and then manipulate those. That seems fine
to me. Am I missing something?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] stuck spinlock

2013-12-14 Thread Andres Freund
On 2013-12-13 15:49:45 -0600, Merlin Moncure wrote:
 On Fri, Dec 13, 2013 at 12:32 PM, Robert Haas robertmh...@gmail.com wrote:
  On Fri, Dec 13, 2013 at 11:26 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  And while we're on the subject ... isn't bgworker_die() utterly and
  completely broken?  That unconditional elog(FATAL) means that no process
  using that handler can do anything remotely interesting, like say touch
  shared memory.
 
  Yeah, but for the record (since I see I got cc'd here), that's not my
  fault.  I moved it into bgworker.c, but it's been like that since
  Alvaro's original commit of the bgworker facility
  (da07a1e856511dca59cbb1357616e26baa64428e).
 
 
 Is this an edge case or something that will hit a lot of users?
 Arbitrary server panics seem pretty serious...

Is your question about the bgworker part you're quoting or about the
stuck spinlock stuff? I don't think the bgworker bug is too bad in
practice, but the one in the handle_sig_alarm() stuff certainly is.

I think while it looks possible to hit problems without statement/lock
timeout, it's relatively unlikely that those are hit in practice.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] stuck spinlock

2013-12-14 Thread Andres Freund
Hi,

On 2013-12-13 15:57:14 -0300, Alvaro Herrera wrote:
 If there was a way for raising an #error at compile time whenever a
 worker relies on the existing signal handler, I would vote for doing
 that.  (But then I have no idea how to do such a thing.)

I don't see a way either given how disconnected registration of the
signal handler is from the bgworker infrastructure. I think the best we
can do is to raise an error in BackgroundWorkerUnblockSignals() - and we
should definitely do that.
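
Something *roughly* like this sketch, assuming we can inspect the installed
handler with sigaction() and that bgworker_die is visible from here
(ignoring the detail that only shmem-connected workers need the check):

#include <signal.h>

void
BackgroundWorkerUnblockSignals(void)
{
    struct sigaction act;

    /* refuse to proceed if the worker never replaced the default
     * SIGTERM handler, which does an unconditional elog(FATAL) */
    if (sigaction(SIGTERM, NULL, &act) == 0 &&
        act.sa_handler == bgworker_die)
        ereport(ERROR,
                (errmsg("background workers connected to shared memory "
                        "must install their own SIGTERM handler")));

    PG_SETMASK(&UnBlockSig);
}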

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] stuck spinlock

2013-12-14 Thread Andres Freund
On 2013-12-13 13:39:42 -0500, Robert Haas wrote:
 On Fri, Dec 13, 2013 at 1:15 PM, Andres Freund and...@2ndquadrant.com wrote:
  Agreed on not going forward like now, but I don't really see how they
  could usefully use die(). I think we should just mandate that every
  bgworker connected to shared memory registers a sigterm handler - we
  could put a check into BackgroundWorkerUnblockSignals(). We should leave
  the current handler in for unconnected ones though...
  bgworkers are supposed to be written as a loop around procLatch, so
  adding a !got_sigterm check probably isn't too hard.
 
 I think the !got_sigterm thing is complete bunk.  If a background
 worker is running SQL queries, it really ought to honor a query cancel
 or sigterm at the next CHECK_FOR_INTERRUPTS().

I am not convinced by the necessity of that, not in general. After all,
the code is using a bgworker and not a normal backend for a reason. If
you e.g. have queueing code, it very well might need to serialize its
state to disk before shutting down. Checking whether the bgworker should
shut down on every iteration of the main loop sounds appropriate to me for
such cases.
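
For reference, the kind of loop I mean looks *roughly* like this (a sketch
along the lines of the worker_spi sample; registration and error handling
omitted):

static volatile sig_atomic_t got_sigterm = false;

/* SIGTERM handler: set the flag and the latch; the loop does the rest */
static void
worker_sigterm(SIGNAL_ARGS)
{
    int save_errno = errno;

    got_sigterm = true;
    if (MyProc)
        SetLatch(&MyProc->procLatch);

    errno = save_errno;
}

static void
worker_main_loop(void)
{
    while (!got_sigterm)
    {
        int rc = WaitLatch(&MyProc->procLatch,
                           WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                           10 * 1000L);

        ResetLatch(&MyProc->procLatch);

        if (rc & WL_POSTMASTER_DEATH)
            proc_exit(1);

        /* ... do one unit of work ... */
    }

    /* e.g. serialize queue state to disk here, then exit cleanly */
    proc_exit(0);
}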

But I think we should provide a default handler that does the necessary
things to interrupt queries, so bgworker authors don't have to do it
themselves and, just as importantly, we can more easily add new stuff
there.

 +static void
 +handle_sigterm(SIGNAL_ARGS)
 +{
 +   int save_errno = errno;
 +
 +   if (MyProc)
 +   SetLatch(&MyProc->procLatch);
 +
 +   if (!proc_exit_inprogress)
 +   {
 +   InterruptPending = true;
 +   ProcDiePending = true;
 +   }
 +
 +   errno = save_errno;
 +}
 
 ...but I'm not 100% sure that's right, either.

If you want a bgworker to behave as closely as possible to a normal backend, we should
probably really do the full dance die() does. Specifically call
ProcessInterrupts() immediately if ImmediateInterruptOK allows it,
otherwise we'd just continue waiting for locks and similar.
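
I.e. extend the handler above *roughly* like so (a sketch of the die()-style
immediate servicing, from memory):

if (!proc_exit_inprogress)
{
    InterruptPending = true;
    ProcDiePending = true;

    /* as in die(): service the request right away when that's safe,
     * so a worker blocked on a lock doesn't keep sleeping */
    if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
        CritSectionCount == 0)
        ProcessInterrupts();
}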

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PoC: Partial sort

2013-12-14 Thread Andres Freund
Hi,

Cool stuff.

On 2013-12-14 13:59:02 +0400, Alexander Korotkov wrote:
 Currently when we need to get ordered result from table we have to choose
 one of two approaches: get results from index in exact order we need or do
 sort of tuples. However, it could be useful to mix both methods: get
 results from index in order which partially meets our requirements and do
 rest of work from heap.

 --
  Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual
 time=0.097..0.099 rows=10 loops=1)
   ->  Sort  (cost=69214.06..71714.06 rows=100 width=16) (actual
 time=0.096..0.097 rows=10 loops=1)
  Sort Key: v1, v2
  Sort Method: top-N heapsort  Memory: 25kB
  ->  Index Scan using test_v1_idx on test  (cost=0.42..47604.42
 rows=100 width=16) (actual time=0.017..0.066 rows=56 loops=1)
  Total runtime: 0.125 ms
 (6 rows)

Is that actually all that beneficial when sorting with a bog standard
qsort() since that doesn't generally benefit from data being pre-sorted?
I think we might need to switch to a different algorithm to really
benefit from mostly pre-sorted input.

 *partial-knn-1.patch*
 
 KNN-GiST provides ability to get ordered results from index, but this order
 is based only on index information. For instance, GiST index contains
 bounding rectangles for polygons, and we can't get exact distance to
 polygon from index (similar situation is in PostGIS). In attached patch,
 GiST distance method can set recheck flag (similar to consistent method).
 This flag means that distance method returned lower bound of distance and
 we should recheck it from heap.

 See an example.
 
 create table test as (select id, polygon(3+(random()*10)::int,
 circle(point(random(), random()), 0.0003 + random()*0.001)) as p from
 generate_series(1,100) id);
 create index test_idx on test using gist (p);
 
 We can get results ordered by distance from polygon to point.
 
 postgres=# select id, p <-> point(0.5,0.5) from test order by p <->
 point(0.5,0.5) limit 10;

 --
  Limit  (cost=0.29..1.86 rows=10 width=36) (actual time=0.180..0.230
 rows=10 loops=1)
->  Index Scan using test_idx on test  (cost=0.29..157672.29
 rows=100 width=36) (actual time=0.179..0.228 rows=10 loops=1)
  Order By: (p <-> '(0.5,0.5)'::point)
  Total runtime: 0.305 ms
 (4 rows)

Rechecking from the heap means adding a sort node though, which I don't
see here? Or am I misunderstanding something?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Time-Delayed Standbys

2013-12-14 Thread Andres Freund
On 2013-12-13 13:44:30 +, Simon Riggs wrote:
 On 13 December 2013 13:22, Andres Freund and...@2ndquadrant.com wrote:
  On 2013-12-13 13:09:13 +, Simon Riggs wrote:
  On 13 December 2013 11:58, Andres Freund and...@2ndquadrant.com wrote:
   I removed it because it was after the pause. I'll replace it, but
   before the pause.
  
   Doesn't after the pause make more sense? If somebody promoted while we
   were waiting, we want to recognize that before rolling forward? The wait
   can take a long while after all?
 
  That would change the way pause currently works, which is OOS for that 
  patch.
 
  But this feature isn't pause itself - it's imo something
  independent. Note that we currently
  a) check pause again after recoveryApplyDelay(),
  b) do check for promotion if the sleep in recoveryApplyDelay() is
 interrupted. So not checking after the final sleep seems confusing.
 
 I'm proposing the attached patch.

Looks good, although I'd move it down below the comment ;)

 This patch implements a consistent view of recovery pause, which is
 that when paused, we don't check for promotion, during or immediately
 after. That is user noticeable behaviour and shouldn't be changed
 without thought and discussion on a separate thread with a clear
 descriptive title. (I might argue in favour of it myself, I'm not yet
 decided).

Some more improvements in that area certainly would be good...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PoC: Partial sort

2013-12-14 Thread Alexander Korotkov
Hi!

Thanks for feedback!

On Sat, Dec 14, 2013 at 4:54 PM, Andres Freund and...@2ndquadrant.com wrote:

 Hi,

 Cool stuff.

 On 2013-12-14 13:59:02 +0400, Alexander Korotkov wrote:
  Currently when we need to get ordered result from table we have to choose
  one of two approaches: get results from index in exact order we need or
 do
  sort of tuples. However, it could be useful to mix both methods: get
  results from index in order which partially meets our requirements and do
  rest of work from heap.

 
 --
   Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual
  time=0.097..0.099 rows=10 loops=1)
 ->  Sort  (cost=69214.06..71714.06 rows=100 width=16) (actual
  time=0.096..0.097 rows=10 loops=1)
   Sort Key: v1, v2
   Sort Method: top-N heapsort  Memory: 25kB
   ->  Index Scan using test_v1_idx on test  (cost=0.42..47604.42
  rows=100 width=16) (actual time=0.017..0.066 rows=56 loops=1)
   Total runtime: 0.125 ms
  (6 rows)

 Is that actually all that beneficial when sorting with a bog standard
 qsort() since that doesn't generally benefit from data being pre-sorted?
 I think we might need to switch to a different algorithm to really
 benefit from mostly pre-sorted input.


In this patch I don't do a full sort of the dataset. For instance, the index
returns data ordered by the first column and we need to order them also by
the second column. This node then sorts groups (assumed to be small) in which
the values of the first column are equal, by the value of the second column.
With a limit clause, only the required number of such groups will be
processed. But I don't think we should expect pre-sorted values of the second
column inside a group.
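
As a standalone sketch (not the actual patch code; key1/key2 stand in for
the presorted and the remaining sort columns):

#include <stddef.h>
#include <stdlib.h>

typedef struct Row { int key1; int key2; } Row;     /* key1 is presorted */

static int
cmp_key2(const void *a, const void *b)
{
    const Row *ra = (const Row *) a;
    const Row *rb = (const Row *) b;

    return (ra->key2 > rb->key2) - (ra->key2 < rb->key2);
}

/* rows[] arrives sorted by key1; finish the (key1, key2) ordering */
static void
partial_sort(Row *rows, size_t n)
{
    size_t start = 0;

    while (start < n)
    {
        size_t end = start + 1;

        /* find the end of the run of rows sharing the same key1 */
        while (end < n && rows[end].key1 == rows[start].key1)
            end++;

        /* runs are assumed small, so sorting each one is cheap */
        qsort(rows + start, end - start, sizeof(Row), cmp_key2);
        start = end;
    }
}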


  *partial-knn-1.patch*
 
  KNN-GiST provides ability to get ordered results from index, but this
 order
  is based only on index information. For instance, GiST index contains
  bounding rectangles for polygons, and we can't get exact distance to
  polygon from index (similar situation is in PostGIS). In attached patch,
  GiST distance method can set recheck flag (similar to consistent method).
  This flag means that distance method returned lower bound of distance and
  we should recheck it from heap.

  See an example.
 
  create table test as (select id, polygon(3+(random()*10)::int,
  circle(point(random(), random()), 0.0003 + random()*0.001)) as p from
  generate_series(1,100) id);
  create index test_idx on test using gist (p);
 
  We can get results ordered by distance from polygon to point.
 
  postgres=# select id, p <-> point(0.5,0.5) from test order by p <->
  point(0.5,0.5) limit 10;

 
 --
   Limit  (cost=0.29..1.86 rows=10 width=36) (actual time=0.180..0.230
  rows=10 loops=1)
 ->  Index Scan using test_idx on test  (cost=0.29..157672.29
  rows=100 width=36) (actual time=0.179..0.228 rows=10 loops=1)
   Order By: (p <-> '(0.5,0.5)'::point)
   Total runtime: 0.305 ms
  (4 rows)

 Rechecking from the heap means adding a sort node though, which I don't
 see here? Or am I misunderstanding something?


KNN-GiST contains an RB-tree of scanned items. In this patch an item is
rechecked inside GiST and reinserted into the same RB-tree. That was a much
easier implementation for a PoC and also looks very efficient. I'm not sure
what the right design for it actually is. This is what I'd like to discuss.
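
In pseudocode-ish C, the loop looks roughly like this (a sketch only; the
pq_* functions stand in for the RB-tree, exact_distance for the heap
recheck):

#include <stdbool.h>
#include <stddef.h>

typedef struct Item
{
    double  dist;       /* lower bound, or exact distance once rechecked */
    bool    exact;
    void   *tuple;
} Item;

/* assumed: a min-queue ordered by dist */
extern bool   pq_empty(void);
extern Item  *pq_pop(void);
extern void   pq_insert(Item *it);
extern double exact_distance(void *tuple);  /* fetches the heap tuple */

static Item *
knn_getnext(void)
{
    while (!pq_empty())
    {
        Item *it = pq_pop();

        /* an exact item at the head beats every remaining lower bound,
         * so it can be returned in true distance order */
        if (it->exact)
            return it;

        /* recheck: replace the lower bound and reinsert */
        it->dist = exact_distance(it->tuple);
        it->exact = true;
        pq_insert(it);
    }
    return NULL;
}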

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] PoC: Partial sort

2013-12-14 Thread Jeremy Harris

On 14/12/13 12:54, Andres Freund wrote:

On 2013-12-14 13:59:02 +0400, Alexander Korotkov wrote:

Currently when we need to get ordered result from table we have to choose
one of two approaches: get results from index in exact order we need or do
sort of tuples. However, it could be useful to mix both methods: get
results from index in order which partially meets our requirements and do
rest of work from heap.



--
  Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual
time=0.097..0.099 rows=10 loops=1)
->  Sort  (cost=69214.06..71714.06 rows=100 width=16) (actual
time=0.096..0.097 rows=10 loops=1)
  Sort Key: v1, v2
  Sort Method: top-N heapsort  Memory: 25kB
  ->  Index Scan using test_v1_idx on test  (cost=0.42..47604.42
rows=100 width=16) (actual time=0.017..0.066 rows=56 loops=1)
  Total runtime: 0.125 ms
(6 rows)


Is that actually all that beneficial when sorting with a bog standard
qsort() since that doesn't generally benefit from data being pre-sorted?
I think we might need to switch to a different algorithm to really
benefit from mostly pre-sorted input.


Eg:  http://www.postgresql.org/message-id/5291467e.6070...@wizmail.org

Maybe Alexander and I should bash our heads together.

--
Cheers,
   Jeremy




Re: [HACKERS] PoC: Partial sort

2013-12-14 Thread Martijn van Oosterhout
On Sat, Dec 14, 2013 at 06:21:18PM +0400, Alexander Korotkov wrote:
  Is that actually all that beneficial when sorting with a bog standard
  qsort() since that doesn't generally benefit from data being pre-sorted?
  I think we might need to switch to a different algorithm to really
  benefit from mostly pre-sorted input.
 
 
 In this patch I don't do a full sort of the dataset. For instance, the index
 returns data ordered by the first column and we need to order them also by
 the second column. This node then sorts groups (assumed to be small) in which
 the values of the first column are equal, by the value of the second column.
 With a limit clause, only the required number of such groups will be
 processed. But I don't think we should expect pre-sorted values of the second
 column inside a group.

Nice. I imagine this would be mostly beneficial for fast-start plans,
since you no longer need to sort the whole table prior to returning the
first tuple.

Reduced memory usage might be a factor, especially for large sorts
where you otherwise might need to spool to disk.

You can now use an index on (a) to improve sorting for (a,b).

Cost of sorting n groups of size l goes from O(nl log nl) to just O(nl
log l), useful for large n.
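
To spell that out (back-of-the-envelope; comparison sorts assumed):

    O(nl \log(nl)) = O(nl \log n + nl \log l)    % one full sort
    n \cdot O(l \log l) = O(nl \log l)           % n per-group sorts

so the saving is the O(nl \log n) term, which is what matters for large n.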

Minor comments:

I find cmpTuple a bad name. That's what it's doing but perhaps
cmpSkipColumns would be clearer.

I think it's worthwhile adding a separate path for the skipCols = 0
case, to avoid extra copies.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [HACKERS] PoC: Partial sort

2013-12-14 Thread Andres Freund
Hi,

Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual
   time=0.097..0.099 rows=10 loops=1)
  ->  Sort  (cost=69214.06..71714.06 rows=100 width=16) (actual
   time=0.096..0.097 rows=10 loops=1)
Sort Key: v1, v2
Sort Method: top-N heapsort  Memory: 25kB
->  Index Scan using test_v1_idx on test  (cost=0.42..47604.42
   rows=100 width=16) (actual time=0.017..0.066 rows=56 loops=1)
Total runtime: 0.125 ms
   (6 rows)
 
  Is that actually all that beneficial when sorting with a bog standard
  qsort() since that doesn't generally benefit from data being pre-sorted?
  I think we might need to switch to a different algorithm to really
  benefit from mostly pre-sorted input.
 
 
 In this patch I don't do a full sort of the dataset. For instance, the index
 returns data ordered by the first column and we need to order them also by
 the second column.

Ah, that makes sense.

 But I don't think we should expect pre-sorted values of the second column
 inside a group.

Yes, if you do it that way, there doesn't seem to be any need to assume
that any more than we usually do.

I think you should make the explain output reflect the fact that we're
assuming v1 is presorted and just sorting v2. I'd be happy enough with:
Sort Key: v1, v2
Partial Sort: v2
or even just
Partial Sort Key: [v1,] v2
but I am sure others disagree.

   *partial-knn-1.patch*

  Rechecking from the heap means adding a sort node though, which I don't
  see here? Or am I misunderstanding something?

 KNN-GiST contain RB-tree of scanned items. In this patch item is rechecked
 inside GiST and reinserted into same RB-tree. It appears to be much easier
 implementation for PoC and also looks very efficient. I'm not sure what is
 actually right design for it. This is what I like to discuss.

I don't have enough clue about gist to say whether it's the right design,
but it doesn't look wrong to my eyes. It'd probably be useful to export
the knowledge that we are rechecking and how often that happens to the
outside.
While I didn't really look into the patch, I noticed in passing that you
pass an all_dead variable to heap_hot_search_buffer without using the
result - just pass NULL instead; that performs a bit less work.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] clang's -Wmissing-variable-declarations shows some shoddy programming

2013-12-14 Thread Andres Freund
Hi,

Compiling postgres with said option in CFLAGS really gives an astounding
number of warnings. Except for some bison/flex-generated ones, none of them
looks acceptable to me.
Most are just file-local variables with a missing static, and easy to
fix. Several others are actually shared variables where people simply
haven't bothered to add the variable to a header; some of them even have
comments declaring that fact.

I've attached the output of such a compilation run for those without
clang.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
In file included from /home/andres/src/postgresql/src/port/pg_crc.c:21:
/home/andres/src/postgresql/src/include/utils/pg_crc_tables.h:36:14: warning: 
no previous extern declaration for non-static variable 'pg_crc32_table' 
[-Wmissing-variable-declarations]
const uint32 pg_crc32_table[256] = {
 ^
1 warning generated.
bootparse.c:1267:5: warning: no previous extern declaration for non-static 
variable 'boot_yychar' [-Wmissing-variable-declarations]
int yychar;
^
bootparse.c:67:25: note: expanded from macro 'yychar'
#define yychar  boot_yychar
^
bootparse.c:1282:5: warning: no previous extern declaration for non-static 
variable 'boot_yynerrs' [-Wmissing-variable-declarations]
int yynerrs;
^
bootparse.c:69:25: note: expanded from macro 'yynerrs'
#define yynerrs boot_yynerrs
^
/home/andres/src/postgresql/src/backend/bootstrap/bootstrap.c:52:9: warning: no 
previous extern declaration for non-static variable 
'bootstrap_data_checksum_version' [-Wmissing-variable-declarations]
uint32  bootstrap_data_checksum_version = 0;/* No checksum 
*/
^
pg_shmem.c:41:15: warning: no previous extern declaration for non-static 
variable 'UsedShmemSegID' [-Wmissing-variable-declarations]
unsigned long UsedShmemSegID = 0;
  ^
pg_shmem.c:42:10: warning: no previous extern declaration for non-static 
variable 'UsedShmemSegAddr' [-Wmissing-variable-declarations]
void   *UsedShmemSegAddr = NULL;
^
1 warning generated.
2 warnings generated.
2 warnings generated.
/home/andres/src/postgresql/src/backend/catalog/heap.c:77:7: warning: no 
previous extern declaration for non-static variable 
'binary_upgrade_next_heap_pg_class_oid' [-Wmissing-variable-declarations]
Oid binary_upgrade_next_heap_pg_class_oid = InvalidOid;
^
/home/andres/src/postgresql/src/backend/catalog/heap.c:78:7: warning: no 
previous extern declaration for non-static variable 
'binary_upgrade_next_toast_pg_class_oid' [-Wmissing-variable-declarations]
Oid binary_upgrade_next_toast_pg_class_oid = InvalidOid;
^
/home/andres/src/postgresql/src/backend/nodes/nodes.c:31:10: warning: no 
previous extern declaration for non-static variable 'newNodeMacroHolder' 
[-Wmissing-variable-declarations]
Node   *newNodeMacroHolder;
^
1 warning generated.
/home/andres/src/postgresql/src/backend/libpq/be-secure.c:116:10: warning: no 
previous extern declaration for non-static variable 'SSLCipherSuites' 
[-Wmissing-variable-declarations]
char   *SSLCipherSuites = NULL;
^
/home/andres/src/postgresql/src/backend/libpq/be-secure.c:119:10: warning: no 
previous extern declaration for non-static variable 'SSLECDHCurve' 
[-Wmissing-variable-declarations]
char   *SSLECDHCurve;
^
/home/andres/src/postgresql/src/backend/libpq/be-secure.c:122:9: warning: no 
previous extern declaration for non-static variable 'SSLPreferServerCiphers' 
[-Wmissing-variable-declarations]
bool   SSLPreferServerCiphers;
   ^
2 warnings generated.
3 warnings generated.
/home/andres/src/postgresql/src/backend/catalog/index.c:72:7: warning: no 
previous extern declaration for non-static variable 
'binary_upgrade_next_index_pg_class_oid' [-Wmissing-variable-declarations]
Oid binary_upgrade_next_index_pg_class_oid = InvalidOid;
^
/home/andres/src/postgresql/src/backend/postmaster/bgworker.c:93:24: warning: 
no previous extern declaration for non-static variable 'BackgroundWorkerData' 
[-Wmissing-variable-declarations]
BackgroundWorkerArray *BackgroundWorkerData;
   ^
/home/andres/src/postgresql/src/backend/libpq/pqcomm.c:102:7: warning: no 
previous extern declaration for non-static variable 'Unix_socket_permissions' 

[HACKERS] Useless Replica Identity: NOTHING noise from psql \d

2013-12-14 Thread Tom Lane
In HEAD:

regression=# \d pg_depend
   Table "pg_catalog.pg_depend"
   Column    |  Type   | Modifiers 
-------------+---------+-----------
 classid     | oid     | not null
 objid       | oid     | not null
 objsubid    | integer | not null
 refclassid  | oid     | not null
 refobjid    | oid     | not null
 refobjsubid | integer | not null
 deptype     | "char"  | not null
Indexes:
    "pg_depend_depender_index" btree (classid, objid, objsubid)
    "pg_depend_reference_index" btree (refclassid, refobjid, refobjsubid)
Replica Identity: NOTHING

Where did that last line come from, and who thinks it's so important
that it should appear by default?  It seems absolutely content-free
even if I were using whatever feature it refers to, since it is
(I presume) the default state.

Please either suppress this entirely or at least condition it on \d+.

regards, tom lane




Re: [HACKERS] Useless Replica Identity: NOTHING noise from psql \d

2013-12-14 Thread Andres Freund
On 2013-12-14 11:27:53 -0500, Tom Lane wrote:
 In HEAD:
 
 regression=# \d pg_depend
    Table "pg_catalog.pg_depend"
    Column    |  Type   | Modifiers 
 -------------+---------+-----------
  classid     | oid     | not null
  objid       | oid     | not null
  objsubid    | integer | not null
  refclassid  | oid     | not null
  refobjid    | oid     | not null
  refobjsubid | integer | not null
  deptype     | "char"  | not null
 Indexes:
     "pg_depend_depender_index" btree (classid, objid, objsubid)
     "pg_depend_reference_index" btree (refclassid, refobjid, refobjsubid)
 Replica Identity: NOTHING
 
 Where did that last line come from, and who thinks it's so important
 that it should appear by default?  It seems absolutely content-free
 even if I were using whatever feature it refers to, since it is
 (I presume) the default state.

Hm. Yes, that's slightly inelegant. It's shown because it's not actually
the normal default for normal tables, just for system tables. Maybe we
should just set it to the default (in pg_class) for system tables as well,
and just change it in the relcache.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Changeset Extraction Interfaces

2013-12-14 Thread Robert Haas
On Fri, Dec 13, 2013 at 9:14 AM, Andres Freund and...@2ndquadrant.com wrote:
 If you imagine a scenario where somebody establishes a replication
 slot and then keeps it forever, not often.  But if you're trying to do
 something more ad hoc, where replication slots might be used just for
 short periods of time and then abandoned, I think it could come up
 pretty frequently.

 But can you imagine those users needing an exported snapshot? I can
 think of several short-lived usages, but all of those are unlikely to
 need a consistent view of the overall database. And those are less
 likely to be full blown replication solutions.
 I.e. it's not the DBA making that decision but the developer making the
 decision based on whether he requires the snapshot or not.

Well, it still seems to me that the right way to think about this is
that the change stream begins at a certain point, and then once you
cross a certain threshold (all transactions in progress at that time
have ended) any subsequent snapshot is a possible point from which to
roll forward.  You'll need to avoid applying any transactions that are
already included during the snapshot, but I don't really think that's
any great matter.  You're focusing on the first point at which the
consistent snapshot can be taken, and on throwing away any logical
changes that might have been available before that point so that they
don't have to be ignored in the application code, but I think that's
myopic.

For example, suppose somebody is replicating tables on node A to node
B.  And then they decide to replicate some of the same tables to node
C.  Well, one way to do this is to have node C connect to node A and
acquire its own slot, but that means decoding everything twice.
Alternatively, you could reuse the same change stream, but you'll need
a new snapshot to roll forward from.  That doesn't seem like a problem
unless the API makes it a problem.

 Generally, I think you're being too dismissive of the stuff I'm
 complaining about here.  If we just can't get this, well then I
 suppose we can't.

 I think I am just scared of needing to add more features before getting
 the basics done and in consequence overrunning 9.4...

I am sensitive to that.  On the other hand, this API is going to be a
lot harder to change once it's released, so we really need to avoid
painting ourselves into a corner with v1.  As far as high-level design
concerns go, there are three things that I'm not happy with:

1. Slots.  We know we need physical slots as well as logical slots,
but the patch as currently constituted only offers logical slots.
2. Snapshot Management.  This issue.
3. Incremental Decoding.  So that we can begin applying a really big
transaction speculatively before it's actually committed.

I'm willing to completely punt #3 as far as 9.4 is concerned, because
I see a pretty clear path to fixing that later.  I am not yet
convinced that either of the other two can or should be postponed.

 Right.  I think your idea is good, but maybe there should also be a
 version of the function that never confirms receipt even if the
 transaction commits.  That would be useful for ad-hoc poking at the
 queue.

 Ok, that sounds easy enough, maybe
 pg_decoding_slot_get_[binary_]changes()
 pg_decoding_slot_peek_[binary_]changes()
 ?

s/pg_decoding_slot/pg_logical_stream/?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] clang's -Wmissing-variable-declarations shows some shoddy programming

2013-12-14 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 Compiling postgres with said option in CFLAGS really gives an astounding
 number of warnings. Except some bison/flex generated ones, none of them
 looks acceptable to me.

Given that we're not going to be able to get rid of the bison/flex cases,
is this really something to bother with?  I agree I don't like cases where
there's an extern in some other .c file rather than in a header, but I'm
dubious about making a goal of suppressing this warning as such.

regards, tom lane




Re: [HACKERS] Changeset Extraction Interfaces

2013-12-14 Thread Andres Freund
On 2013-12-14 11:50:00 -0500, Robert Haas wrote:
 On Fri, Dec 13, 2013 at 9:14 AM, Andres Freund and...@2ndquadrant.com wrote:
  But can you imagine those users needing an exported snapshot? I can
  think of several short-lived usages, but all of those are unlikely to
  need a consistent view of the overall database. And those are less
  likely to be full blown replication solutions.
  I.e. it's not the DBA making that decision but the developer making the
  decision based on whether he requires the snapshot or not.
 
 Well, it still seems to me that the right way to think about this is
 that the change stream begins at a certain point, and then once you
 cross a certain threshold (all transactions in progress at that time
 have ended) any subsequent snapshot is a possible point from which to
 roll forward.

Unfortunately it's not possible to build exportable snapshots at any
time - it requires keeping far more state around since we need to care
about all transactions, not just transactions touching the
catalog. Currently you can only export the snapshot at the one point where
we become consistent; after that we stop maintaining that state.

I see pretty much no chance to change that without major effort that's
imo clearly out of scope for 9.4. Maybe we want to provide that at some
point, but it's going to be a bit.
Also note that in order to import a snapshot, the exporting snapshot
still has to be alive, in the same transaction that has exported the
snapshot. So, until the snapshot has been imported, replication cannot
progress. Those are existing limitations of the snapshot import code, but
the reasoning behind them makes it unlikely that we can easily change them.

Under that light, usecases like your example won't be able to benefit
from getting changes before the snapshot, or do you still see usecases
for that?

I've already prototyped the quickstart option that doesn't allow
exporting a snapshot, although I don't know how the UI for it is going
to look.

  I think I am just scared of needing to add more features before getting
  the basics done and in consequence overrunning 9.4...
 
 I am sensitive to that.  On the other hand, this API is going to be a
 lot harder to change once it's released, so we really need to avoid
 painting ourselves into a corner with v1.

I think we need to accept the fact that quite possibly we won't get
everything right in 9.4. Obviously we should strive to avoid that, but
trying to design things perfectly usually ends in not getting there.
Yes, the API changes for FDWs caused pain. But there's no way we would
have the current features if we'd tried to introduce the correct API
allowing them from the get go.

 As far as high-level design
 concerns go, there are three things that I'm not happy with:
 
 1. Slots.  We know we need physical slots as well as logical slots,
 but the patch as currently constituted only offers logical slots.

Well, then tell me the way you want to go forward on that end. I can
make the slot interface more generic if we know exactly what we need,
but it doesn't seem fair to take this patch hostage until I develop a
separate, not so small, feature. Why is that my task?
Because I think it's important, and because I know the related code
pretty well by now, I am willing to provide the parts that prevent
required WAL from being deleted, peg xmin and report the current
state to SQL, but somebody else is going to have to do the rest.

 s/pg_decoding_slot/pg_logical_stream/?

Works for me.

Regards,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] clang's -Wmissing-variable-declarations shows some shoddy programming

2013-12-14 Thread Andres Freund
On 2013-12-14 12:14:25 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  Compiling postgres with said option in CFLAGS really gives an astounding
  number of warnings. Except some bison/flex generated ones, none of them
  looks acceptable to me.
 
 Given that we're not going to be able to get rid of the bison/flex cases,
 is this really something to bother with?

On a second look, it's not that hard to suppress the warnings for
those. Something *roughly* like:

/*
 * Declare variables defined by bison as extern, so clang doesn't complain
 * about undeclared non-static variables.
 */
extern int plpgsql_yychar;
extern int plpgsql_yynerrs;

works.

 I agree I don't like cases where
 there's an extern in some other .c file rather than in a header, but I'm
 dubious about making a goal of suppressing this warning as such.

The cases where a 'static' is missing imo are cases that should clearly
be fixed; there's just no excuse for them. But it's an easy mistake to
make, so having the compiler's support imo is helpful.

WRT the externs in .c files, if it were just old code, I wouldn't
bother. But we're regularly adding them. The last ones were added just
last week, in 316472146 and ef3267523, and several others aren't much
older. So making it a policy not to do so, one that can relatively easily
be checked automatically, seems like a good idea.

Unfortunately gcc doesn't have an equivalent warning...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] stats for network traffic WIP

2013-12-14 Thread Greg Stark
I could see this being interesting for FDW plan nodes if the stats were
visible in explain. Possibly also time spent waiting on network reads and
writes.

I have a harder time seeing why it's useful to have these stats in
aggregate, but I suppose if you had lots of FDW connections or lots of
streaming slaves you might want to be able to identify which ones are not
getting used or are dominating your network usage.

-- 
greg
On 11 Dec 2013 10:52, Tom Lane t...@sss.pgh.pa.us wrote:

 Atri Sharma atri.j...@gmail.com writes:
  On Wed, Dec 11, 2013 at 11:12 PM, Peter Eisentraut pete...@gmx.net
 wrote:
  Is there a reason why you can't get this directly from the OS?

  I would say that its more of a convenience to track the usage directly
  from the database instead of setting up OS infrastructure to store it.

 The thing that I'm wondering is why the database would be the right place
 to be measuring it at all.  If you've got a network usage problem,
 aggregate usage across everything on the server is probably what you
 need to be worried about, and PG can't tell you that.

 regards, tom lane





Re: [HACKERS] Changeset Extraction Interfaces

2013-12-14 Thread Jim Nasby

On 12/12/13 11:13 AM, Robert Haas wrote:

I think it sucks (that's the technical term) to have to wait for all
currently-running transactions to terminate before being able to begin
streaming changes, because that could take a long time.  And you might
well know that the long-running transaction which is rolling up
enormous table A that you don't care about is never going to touch
table B which you actually want to replicate.  Now, ideally, the DBA
would have a way to ignore that long-running transaction and force
replication to start, perhaps with the caveat that if that
long-running transaction actually does touch B after all then we have
to resync.  Your model's fine when we want to replicate the whole
database, but a big part of why I want this feature is to allow
finer-grained replication, down to the table level, or even slices of
tables.


I know you're not going to attempt this for 9.4, but I want to mention a 
related case here. I've often wanted the ability to limit the tables a 
transaction can touch, so that it will not interfere with vacuuming other 
tables.

This would be useful when you have some tables that see very frequent 
updates/deletes in a database that also has to support long-running 
transactions that don't hit those tables. You'd explicitly limit the tables 
your long-running transaction will touch and that way vacuum can ignore the 
long-running XID when calculating minimum tuple age for the heavy-hit tables.

If we had that capability it could also be used to improve the time required to 
get a snapshot for a limited set of tables.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] stats for network traffic WIP

2013-12-14 Thread Jim Nasby

On 12/11/13 12:51 PM, Tom Lane wrote:

Atri Sharma atri.j...@gmail.com writes:

On Wed, Dec 11, 2013 at 11:12 PM, Peter Eisentraut pete...@gmx.net wrote:

Is there a reason why you can't get this directly from the OS?



I would say that its more of a convenience to track the usage directly
from the database instead of setting up OS infrastructure to store it.


The thing that I'm wondering is why the database would be the right place
to be measuring it at all.  If you've got a network usage problem,
aggregate usage across everything on the server is probably what you
need to be worried about, and PG can't tell you that.


Except how many folks that care about performance that much don't have 
dedicated database servers?

BTW, since someone mentioned CPU etc, what I'd be interested in is being able 
to see what OS-level resources were consumed by individual queries. You can 
already get that to a degree via explain (at least for memory and buffer 
reads), but it'd be very useful to see what queries are CPU or IO-bound.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] PoC: Partial sort

2013-12-14 Thread Andreas Karlsson

On 12/14/2013 10:59 AM, Alexander Korotkov wrote:

This patch allows to use index for order-by if order-by clause and index
has non-empty common prefix. So, index gives right ordering for first n
order-by columns. In order to provide right order for rest m columns,
sort node is inserted. This sort node sorts groups of tuples where
values of first n order-by columns are equal.


I recently looked at the same problem. I see that you solved the 
rescanning problem by simply forcing the sort to be redone on 
ExecReScanSort if you have done a partial sort.


My idea for a solution was to modify tuplesort to allow storing the 
already sorted keys in either memtuples or the sort result file, but 
setting a field so it does not sort the already sorted tuples again. 
This would allow the rescan to work as it used to, but I am unsure how 
clean or ugly this code would be. Was this something you considered?


--
Andreas Karlsson




Re: [HACKERS] Extension Templates S03E11

2013-12-14 Thread Jeff Davis
On Fri, 2013-12-13 at 13:42 -0500, Stephen Frost wrote:
 * Jeff Davis (pg...@j-davis.com) wrote:
  For what it's worth, I think the idea of extension templates has good
  conceptual integrity. Extensions are external blobs. To make them work
  more smoothly in several ways, we move them into the catalog. They have
  pretty much the same upsides and downsides of our existing extensions,
  aside from issues directly related to filesystem vs. catalog.
 
 I've never particularly liked the idea that extensions are external
 blobs, to be honest.

It did feel a bit like you were arguing about extensions as they exist
today, rather than extension templates.

To make much more progress, it seems like we either need an ingenious
idea about how to change existing extensions to work for all purposes,
or we need to invent a new concept.

 but most of those cases also look to have external
 dependencies (eg: FDWs), which really makes me doubt this notion that
 they could be distributed independently and outside of the OS packaging
 system (or that it would be a particularly good idea to even try...).

As I pointed out before, many language communities handle libraries
outside of the OS packaging system, e.g. cpan, gems, cabal, pear, etc.
Arguably, those libraries are more likely to have external dependencies,
and yet they find it a better solution anyway.

And you are completely ignoring the common case where people are just
using C because *postgres says they have to*. For instance, defining new
data types, implementing the GiST/GIN/SP-GiST APIs, or using some C hook
in the backend. Those may have no external dependencies at all.

Regards,
Jeff Davis






Re: [HACKERS] typo: XIDs are actually compared using modulo-2^32 arithmetic

2013-12-14 Thread Greg Stark
I don't have a source tree handy but iirc we treat 2^31 values as being in
the past and 2^31 values as being in the future.
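
From memory the comparison is roughly the following sketch (ignoring the
special-casing of the permanent XIDs below FirstNormalTransactionId):

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* circular comparison: id1 is "in the past" of id2 iff the wrapped
 * 32-bit difference is negative; this is what splits the space into
 * 2^31 values behind us and 2^31 values ahead of us */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    int32_t diff = (int32_t) (id1 - id2);

    return diff < 0;
}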

I've been trying to think how to protect better against the recent vacuum
freeze bug. If someone runs vacuum freeze now and has any wrapped values
they'll destroy their possibly recoverable data.

It seems to me we shouldn't really need 2^31 values in the future. If
vacuum or hot pruning comes across an xid far in the future, say a million
xids further into the future than the most recent transaction, then it
should signal an error rather than just treat it as being in the future.

-- 
greg
On 12 Dec 2013 09:41, Tom Lane t...@sss.pgh.pa.us wrote:

 Gianni Ciolli gianni.cio...@2ndquadrant.it writes:
  It seems there is a typo here:
 
 http://www.postgresql.org/docs/devel/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
  where we say that we compare XIDs using arithmetic modulo 2^31, which
  should instead be 2^32 (as it is with uint32, e.g. xid_age).

 [ thinks about that for awhile... ]  Yeah, I think you're right.
 Patch pushed, thanks!

 regards, tom lane





Re: [HACKERS] Extension Templates S03E11

2013-12-14 Thread Stephen Frost
* Jeff Davis (pg...@j-davis.com) wrote:
 To make much more progress, it seems like we either need an ingenious
 idea about how to change existing extensions to work for all purposes,
 or we need to invent a new concept.

I'm still in favor of supporting SQL/trusted-language only
'extensions' and allowing them to be installed via a client by a
non-superuser, with perhaps superusers having the ability to install
extensions which use interpreted but untrusted languages.

 As I pointed out before, many language communities handle libraries
 outside of the OS packaging system, e.g. cpan, gems, cabal, pear, etc.

Sure, and we have PGXN, which I believe fits quite a bit better into the
above list than extension templates.  Even so, we could look to try
and add in binary distribution capabilities to PGXN, though David seems
more interested in having a way to go from PGXN source to RPM or Debian
packages.  Note also that all of the above operate by downloading
something remote and then installing it *locally*- yet we're being asked
to build a system like that which allows installing to a remote system
through a database protocol.

 And you are completely ignoring the common case where people are just
 using C because *postgres says they have to*. For instance, defining new
 data types, implementing the GiST/GIN/SP-GiST APIs, or using some C hook
 in the backend. Those may have no external dependencies at all.

I don't intend to ignore that case- my earlier point was merely that
there seems to be a pretty close split between no C, C with external
dependencies, and C without external dependencies.  Based on what I saw
in PGXN, we've got a good bit of all three.  The only thing which has
even been proposed thus far to address all three cases is PGXN-
extension templates don't, nor do 'packages' or whatever we want to call
extensions without C code.

Thanks,

Stephen




Re: [HACKERS] typo: XIDs are actually compared using modulo-2^32 arithmetic

2013-12-14 Thread Andres Freund
On 2013-12-14 20:19:11 +, Greg Stark wrote:
 I don't have a source tree handy but iirc we treaty 2^31 values as being in
 the past and 2^31 values as being in the future.
 
 I've been trying to think how to protect better against the recent vacuum
 freeze bug. If someone ruins vacuum freeze now and has any wrapped values
 they'll destroy their possibly recoverable data.

Fortunately that's exceedingly unlikely to happen. There's basically two
consequences the bug can have:
a) we don't freeze tuples on pages that are already marked all-visible
   because we're doing a partial scan and thus don't scan them.
b) (9.2+) we don't freeze tuples on a page not marked all visible,
   because a buffer is pinned and we skip those when !scan_all.

a) can lead to the tuples vanishing again because they are reported as
being in progress after 2^31 xids have passed. But by virtue of being on an
all-visible page, they are fully hinted. Which means, that after the
wraparound they will be reported as delete-in-progress or
insert-in-progress. Luckily neither will get vacuumed away. They will
just be invisible.

What can happen with b) is that the clog gets truncated to somewhere
between the real relfrozenxid and the computed relfrozenxid. In that
case we'll get errors when later doing a
HeapTupleSatisfiesVacuum/HTSMVCC. But it's quite likely that the tuple
will get vacuumed at some point before 2^31 xids have passed, since it's
not marked all-visible and thus will be scanned with each future vacuum.

So, for the data to be removed permanently you'd have to hit b) with
partial vacuums (scan_all vacuums do wait!) several times in a row. That
seems unlikely.

 It seems to me we shouldn't really need 2^31 values in the future. If
 vacuum or hot pruning comes across an xid far in the future, say a million
 xids further into the future than the most recent transaction, then it
 should signal an error rather than just treat it as being in the future.

Yea, I have wondered about that as well.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-12-14 Thread Peter Geoghegan
On Fri, Dec 13, 2013 at 4:06 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I spent a little bit of time looking at btreelock_insert_on_dup.  AFAICT
 it executes FormIndexDatum() for later indexes while holding assorted
 buffer locks in earlier indexes.  That really ain't gonna do, because in
 the case of an expression index, FormIndexDatum will execute nearly
 arbitrary user-defined code, which might well result in accesses to those
 indexes or others.  What we'd have to do is refactor so that all the index
 tuple values get computed before we start to insert any of them.  That
 doesn't seem impossible, but it implies a good deal more refactoring than
 has been done here.

We were proceeding on the basis that what I'd done, if deemed
acceptable in principle, could eventually be replaced by an
alternative value locking implementation that more or less similarly
extends the limited way in which value locking already occurs (i.e.
unique index enforcement's buffer locking), but without the downsides.
While I certainly appreciate your input, I still think that there is a
controversy about what implementation gets us the most useful
semantics, and I think we should now focus on resolving it. I am not
sure that Heikki's approach is functionally equivalent to mine. At the
very least, I think the trade-off of doing one or the other should be
well understood.

 Once we do that, I wonder if we couldn't get rid of the LWLockWeaken/
 Strengthen stuff.  That scares the heck out of me; I think it's deadlock
 problems waiting to happen.

There are specific caveats around using those. I think that they could
be useful elsewhere, but are likely to only ever have a few clients.
As previously mentioned, the same semantics appear in other similar
locking primitives in other domains, so fwiw it really doesn't strike
me as all that controversial. I agree that their *usage* is not
acceptable as-is. I've only left the usage in the patch to give us
some basis for reasoning about the performance on mixed workloads for
comparative purposes. Perhaps I shouldn't have even done that, to
better focus reviewer attention on the semantics implied by each
implementation.

 Also, the lack of any doc updates makes it hard to review this.  I can
 see that you don't want to touch the user-facing docs until the syntax
 is agreed on, but at the very least you ought to produce real updates
 for the indexam API spec, since you're changing that significantly.

I'll certainly do that in any future revision.

-- 
Peter Geoghegan




Re: [HACKERS] Extension Templates S03E11

2013-12-14 Thread Josh Berkus
All:

Can someone summarize the issues with this patch for those of us who
haven't been following it closely?  I was just chatting with a couple
other contributors, and at this point none of just know what it
implements, what it doesn't implement, what the plans are for expanding
its feature set (if any), and why Frost doesn't like it.  I tried
reading through the thread on -hackers, and came away even more confused.

Is there maybe a wiki page for it?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2013-12-14 Thread David Rowley
I've been working on speeding up aggregate functions when used in the
context of windows with non-fixed frame heads.

Currently if the top of the window is not in a fixed position then when the
window frame moves down the aggregates must be recalculated by looping over
each record in the new scope of the window frame.

Take the following as an example:

SELECT SUM(n::int) OVER (ORDER BY n ROWS BETWEEN CURRENT ROW AND UNBOUNDED
FOLLOWING)
FROM generate_series(1,20000) g(n);

Here the frame head moves along with the current row and always finishes
with the last row in the partition or the last row in the frame if no
partitioning exists.
Currently PostgreSQL must recalculate the aggregate each time; this means
looping from the current row to the end of the frame for each row
produced... So the first row needs 20000 iterations, the 2nd row 19999, the
3rd row 19998, etc.

The above query on the unpatched head takes 36676 ms on my laptop. With
the attached patch it takes 73 ms, a speed increase of about 500 times
for 20,000 rows. Of course this performance gap will grow with more rows
and shrink with fewer, but there is a measurable improvement with as few
as 50 rows.


The attached patch is not quite finished yet; I've not yet fully covered
SUM and AVG for all types. I just need to look into numeric a bit more
before I figure that one out.
Here is a list of things still todo:

1. Fully implement negative transition functions for SUM and AVG.
2. CREATE AGGREGATE support for user defined aggregates.
3. Documentation.
4. Look to see which other aggregates apart from COUNT, SUM and AVG that
can make use of this.

Please note that if the aggregate function does not have a negative
transition function set up, e.g. MAX and MIN, then the old behaviour
will take place.
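
For illustration, a sketch of this fallback case:

    -- MAX has no usable negative transition function: removing the current
    -- maximum from the frame can force a rescan to find the next-largest
    -- value, so queries like this keep the recompute-per-row behaviour:
    SELECT MAX(n) OVER (ORDER BY n ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
    FROM generate_series(1, 1000) g(n);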

One thing that is currently on my mind is what to do when volatile
functions are passed to the aggregate. Since the number of times we
execute a volatile function will depend heavily on the window frame
options, I think we should include some sort of warning in the
documentation, e.g. "the number of times that the expression is evaluated
within the aggregate function, when used in the context of a WINDOW, is
undefined". The other option would be to disable this optimisation if the
aggregate expression contains a volatile function, but doing that seems a
bit weird to me: is anyone actually going to depend on a volatile
function being executed a particular number of times?
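
For example (a contrived illustration, not taken from the patch):

    -- random() is volatile; with full recomputation it is evaluated once per
    -- (row, frame member) pair, while with a negative transition function it
    -- would be evaluated roughly once per row:
    SELECT SUM((random() < 0.5)::int)
           OVER (ORDER BY n ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
    FROM generate_series(1, 10) g(n);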

If anyone can see any road-blocking issues with this optimisation, please
let me know.

Otherwise I'll continue to work on this so that it is ready for CF4.

Regards

David Rowley


negative_transition_funcs_v0.6.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2013-12-14 Thread Tom Lane
David Rowley dgrowle...@gmail.com writes:
 The attached patch is not quite finished yet, I've not yet fully covered
 SUM and AVG for all types.

I think you *can't* cover them for the float types; roundoff error
would mean you don't get the same answers as before.
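
A one-line demonstration of the problem:

    -- float8 carries only ~15 significant decimal digits, so the 1 added to
    -- 1e20 is lost and cannot be recovered by subtracting 1e20 back out:
    SELECT (1e20::float8 + 1) - 1e20::float8;   -- yields 0, not 1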

regards, tom lane




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2013-12-14 Thread Greg Stark
On 14 Dec 2013 15:40, Tom Lane t...@sss.pgh.pa.us wrote:

 David Rowley dgrowle...@gmail.com writes:
  The attached patch is not quite finished yet, I've not yet fully covered
  SUM and AVG for all types.

 I think you *can't* cover them for the float types; roundoff error
 would mean you don't get the same answers as before.

I was going to say the same thing. But then I started to wonder: what's
so special about the answers we used to give? They are also subject to
roundoff, and the results are already quite questionable in those cases.
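
To illustrate (the accumulation order here relies on the scan order of
VALUES, so treat it as a sketch):

    -- plain SUM over float8 is already order-dependent:
    SELECT sum(x) FROM (VALUES (1e20::float8), (1::float8), (-1e20::float8)) t(x);  -- 0
    SELECT sum(x) FROM (VALUES (1e20::float8), (-1e20::float8), (1::float8)) t(x);  -- 1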


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2013-12-14 Thread David Rowley
On Sun, Dec 15, 2013 at 12:48 PM, Greg Stark st...@mit.edu wrote:


 On 14 Dec 2013 15:40, Tom Lane t...@sss.pgh.pa.us wrote:
 
  David Rowley dgrowle...@gmail.com writes:
   The attached patch is not quite finished yet, I've not yet fully
 covered
   SUM and AVG for all types.
 
  I think you *can't* cover them for the float types; roundoff error
  would mean you don't get the same answers as before.

 I was going to say the same thing. But then I started to wonder What's
 so special about the answers we used to give? They are also subject to
 round off and the results are already quite questionable in those cases.

I guess they probably shouldn't be subject to rounding, or influenced by
errors from tuples that are outside the scope of the aggregate context.
Having said that, it would be a shame to have this optimisation for all
types but float and double. I can imagine the questions in [GENERAL]:
"Why is SUM(int) OVER () fast but SUM(float) OVER () slow?" I wonder what
other RDBMSs do here...

Regards

David Rowley


Re: [HACKERS] ruleutils vs. empty targetlists

2013-12-14 Thread Tom Lane
Dean Rasheed dean.a.rash...@gmail.com writes:
 On 13 December 2013 15:07, Tom Lane t...@sss.pgh.pa.us wrote:
 How about as-is in the back branches, and throw the new errors only
 in HEAD?

 Seems reasonable.

After further experimentation I've concluded that maybe we'd better
not back-patch this change.  The reason for my change of heart is
that in 8.4, the patch made plpgsql stop complaining that "return ;"
was missing an expression.  While we could possibly fix that, or
just decide not to patch 8.4, it occurs to me that there might be
applications out there that are expecting "SELECT ;" to fail, too.
So the risk of undesirable behavior changes seems a little larger
than I'd previously believed, and I'm feeling that fixing this
corner case in the back branches is not worth that risk.
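
For the record, the corner cases in question look roughly like this (a
sketch):

    -- plpgsql: with the patch applied, 8.4 stopped raising "missing
    -- expression" for a valueless RETURN in a function returning a value:
    CREATE FUNCTION f() RETURNS int AS $$
    BEGIN
        RETURN ;
    END $$ LANGUAGE plpgsql;

    -- and some applications may be relying on an empty select failing:
    SELECT ;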

regards, tom lane




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2013-12-14 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 On 14 Dec 2013 15:40, Tom Lane t...@sss.pgh.pa.us wrote:
 I think you *can't* cover them for the float types; roundoff error
 would mean you don't get the same answers as before.

 I was going to say the same thing. But then I started to wonder What's
 so special about the answers we used to give? They are also subject to
 round off and the results are already quite questionable in those cases.

Well, we can't easily do better than the old answers, and the new ones
might be arbitrarily worse.  Example: sum or average across single-row
windows ought to be exact in any case, but it might be arbitrarily wrong
with the negative-transition technique.

More generally, this is supposed to be a performance enhancement only;
it's not supposed to change the results.

This consideration also makes me question whether we should apply the
method for NUMERIC.  Although in principle numeric addition/subtraction
is exact, such a sequence could leave us with a different dscale than
is returned by the existing code.  I'm not sure if changing the number of
trailing zeroes is a big enough behavior change to draw complaints.
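
To make the dscale point concrete (a quick sketch):

    -- numeric addition and subtraction carry the larger dscale of the inputs,
    -- so adding and then removing a value can leave extra trailing zeroes:
    SELECT 1.00::numeric + 1.990;            -- 2.990
    SELECT (1.00::numeric + 1.990) - 1.990;  -- 1.000, not the original 1.00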

regards, tom lane




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2013-12-14 Thread Josh Berkus
On 12/14/2013 05:00 PM, Tom Lane wrote:
 This consideration also makes me question whether we should apply the
 method for NUMERIC.  Although in principle numeric addition/subtraction
 is exact, such a sequence could leave us with a different dscale than
 is returned by the existing code.  I'm not sure if changing the number of
 trailing zeroes is a big enough behavior change to draw complaints.

If we're going to disqualify NUMERIC too, we might as well bounce the
feature.  Without a fast FLOAT or NUMERIC, you've lost most of the
target audience.

I think even the FLOAT case deserves some consideration.  What's the
worst-case drift?  In general, folks who do aggregate operations on
FLOATs aren't expecting an exact answer, or one which is consistent
beyond a certain number of significant digits.

And Dave is right: how many bug reports would we get about "NUMERIC is
fast, but FLOAT is slow"?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2013-12-14 Thread David Rowley
On Sun, Dec 15, 2013 at 2:27 PM, Josh Berkus j...@agliodbs.com wrote:

 If we're going to disqualify NUMERIC too, we might as well bounce the
 feature.  Without a fast FLOAT or NUMERIC, you've lost most of the
 target audience.


I don't agree with this. I'm going with the opinion that the more types and
aggregates we can support (properly) the better. I'd rather delay
implementations of the ones which could change results than see them as a
roadblock for the ones we can implement today without this danger.

I think the feature is worth it alone if we could improve COUNT(*).
It's a bit silly to have to loop through all the tuples in a tuplestore
to see how many there are after removing one, when we already knew the
count before removing it.  Something like: 10 - 1 ... "ummm, I dunno,
let's count again: 1, 2, 3, 4, 5, 6, 7, 8, 9. It's 9!" Whereas with this
patch it's just 10 - 1, *result*. It feels a bit like asking a kid: if
you have 10 beans and you take 1 away, how many are left? You and I know
it's 9, but the kid might have to count them again. PostgreSQL counts
them again.
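
For instance (a sketch):

    -- with an inverse transition this becomes one subtraction per row
    -- instead of a rescan of the remaining frame for every row produced:
    SELECT COUNT(*) OVER (ORDER BY n ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
    FROM generate_series(1, 5) g(n);   -- 5, 4, 3, 2, 1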

Regards

David Rowley

--
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com



Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2013-12-14 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 I think even the FLOAT case deserves some consideration.  What's the
 worst-case drift?

Complete loss of all significant digits.

The case I was considering earlier of single-row windows could be made
safe (I think) if we apply the negative transition function first, before
incorporating the new row(s).  Then for example if you've got float8 1e20
followed by 1, you compute (1e20 - 1e20) + 1 and get the right answer.
It's not so good with two-row windows though:

Table   correct sum of      negative-transition
        this + next value   result
1e20    1e20                1e20 + 1 = 1e20
1       1                   1e20 - 1e20 + 0 = 0
0

 In general, folks who do aggregate operations on
 FLOATs aren't expecting an exact answer, or one which is consistent
 beyond a certain number of significant digits.

Au contraire.  People who know what they're doing expect the results
to be what an IEEE float arithmetic unit would produce for the given
calculation.  They know how the roundoff error ought to behave, and they
will not thank us for doing a calculation that's not the one specified.
I will grant you that there are plenty of clueless people out there
who *don't* know this, but they shouldn't be using float arithmetic
anyway.

 And Dave is right: how many bug reports would we get about NUMERIC is
 fast, but FLOAT is slow?

I've said this before, but: we can make it arbitrarily fast if we don't
have to get the right answer.  I'd rather get "it's slow" complaints
than "this is the wrong answer" complaints.

regards, tom lane




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2013-12-14 Thread Tom Lane
David Rowley dgrowle...@gmail.com writes:
 On Sun, Dec 15, 2013 at 2:27 PM, Josh Berkus j...@agliodbs.com wrote:
 If we're going to disqualify NUMERIC too, we might as well bounce the
 feature.  Without a fast FLOAT or NUMERIC, you've lost most of the
 target audience.

 I think the feature is worth it alone if we could improve COUNT(*).

Agreed, count(*) and operations on integers are alone enough to be worth
the trouble.

regards, tom lane




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2013-12-14 Thread Tom Lane
I wrote:
 It's not so good with two-row windows though:

Actually, carrying that example a bit further makes the point even more
forcefully:

Table   correct sum of      negative-transition
        this + next value   result
1e20    1e20                1e20 + 1 = 1e20
1       1                   1e20 - 1e20 + 0 = 0
0       0                   0 - 1 + 0 = -1
0       1                   -1 - 0 + 1 = 0
1

Those last few answers are completely corrupt.
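
For anyone who wants to poke at it, here is the scenario above as a
query; stock PostgreSQL returns the correct column because it sums each
frame from scratch:

    SELECT i, SUM(x) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING)
    FROM (VALUES (1, 1e20::float8), (2, 1::float8), (3, 0::float8),
                 (4, 0::float8), (5, 1::float8)) t(i, x);
    -- returns 1e20, 1, 0, 1, 1 on unpatched servers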

regards, tom lane




Re: [HACKERS] encoding name SHIFT_JIS is absent in chklocale.c

2013-12-14 Thread Tatsuo Ishii
 I got a complaint from a user.
 
 If current locale is SJIS, psql does not set client encoding to SJIS.
 
 # localedef -f SHIFT_JIS -i ja_JP /usr/lib/locale/ja_JP.SJIS
 $ export LANG=ja_JP.SJIS
 $ psql
 \encoding
 SQL_ASCII -- This should be SJIS
 
 This is because the encoding map (encoding_match_list) in chklocale.c
 is lacking the definition for SHIFT_JIS.  Included is a proposed
 patch. If there's no objection, I will commit it.

Done.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2013-12-14 Thread David Rowley
On Sun, Dec 15, 2013 at 3:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 I wrote:
  It's not so good with two-row windows though:

 Actually, carrying that example a bit further makes the point even more
 forcefully:

 Table   correct sum of      negative-transition
         this + next value   result
 1e20    1e20                1e20 + 1 = 1e20
 1       1                   1e20 - 1e20 + 0 = 0
 0       0                   0 - 1 + 0 = -1
 0       1                   -1 - 0 + 1 = 0
 1

 Those last few answers are completely corrupt.


I guess the answer for the people who complain about slowness could be
that they create their own aggregate function which uses float4pl as the
transition function and float4mi as the negative transition function.
They can call it SUMFASTBUTWRONG() if they like. Perhaps it would be
worth a note in the documentation for this patch?
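
Something along these lines, perhaps (the negfunc option name is made up
here, since CREATE AGGREGATE support is still on the todo list):

    -- hypothetical syntax: "negfunc" is not an existing CREATE AGGREGATE
    -- option; float4pl/float4mi are the ordinary float4 + and - functions
    CREATE AGGREGATE sumfastbutwrong (float4) (
        sfunc    = float4pl,
        stype    = float4,
        negfunc  = float4mi,
        initcond = '0'
    );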

I've removed the negative transition function setups for SUM and AVG on
float4 and float8 in my working copy now.

As for numeric, I did start working on this just after I posted the
original patch and before I saw your comment about it. I ended up making
do_numeric_desperse(), which was to be the reverse of do_numeric_accum(),
but I got stuck on the equivalent of where do_numeric_accum()
does mul_var(X, X, X2, X.dscale * 2);

If it is decided that we don't want to implement a negative transition
function for numeric, then I guess I could leave my transition function
in, to allow users to create their own fast version, maybe?

Regards

David Rowley


 regards, tom lane