Re: [HACKERS] Decrease MAX_BACKENDS to 2^16
On 04/26/2014 09:27 PM, Andres Freund wrote:
> I don't think we need to decide this without benchmarks proving the benefits. I basically want to know whether somebody has an actual use case - even if I really, really, can't think of one - of setting max_connections even remotely that high. If there's something fundamental out there that'd make changing the limit impossible, doing benchmarks wouldn't be worthwhile.

It doesn't seem unreasonable to have a database with tens of thousands of connections. Sure, performance will suffer, but if the connections sit idle most of the time so that the total load is low, who cares? Sure, you could use a connection pooler, but it's even better if you don't have to.

If there are big gains to be had from limiting the number of connections, I'm not against it. For the purpose of shrinking BufferDesc, though, I have a feeling there might be other lower-hanging fruit in there. For example, wait_backend_pid and freeNext are not used very often, so they could be moved elsewhere, to a separate array. And buf_id and the LWLock pointers could be calculated from the memory address of the struct.

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] includedir_internal headers are not self-contained
On 04/27/2014 11:33 PM, Tom Lane wrote:
> I wrote:
>> On closer inspection, the issue here is really that putting relpath.h/.c in common/ was completely misguided from the get-go. It's unnecessary: there's nothing outside the backend that uses it, except for contrib/pg_xlogdump which could very easily do without it. And relpath.h is a serious failure from a modularity standpoint anyway, because there is code all over the backend that has intimate familiarity with the pathname construction rules. We could possibly clean that up to the extent of being able to hide TABLESPACE_VERSION_DIRECTORY inside relpath.c, but what then? We'd still be talking about having CATALOG_VERSION_NO compiled into frontend code for any frontend code that actually made use of relpath.c, which is surely not such a great idea. So it seems to me the right fix for the relpath end of it is to push most of relpath.c back where it came from, which I think was backend/catalog/.
>
> In short, what I'm proposing here is to revert commit a73018392636ce832b09b5c31f6ad1f18a4643ea, lock, stock, and barrel. According to the commit message, the point of that was to allow pg_xlogdump to use relpath(), but I do not see it doing so; and if it tried, I don't know where it would get an accurate value of CATALOG_VERSION_NO from. So I think that was just poorly thought out. What contrib/pg_xlogdump *is* using is the forkNames[] array, which it uses to print the fork-number field of a BkpBlock symbolically. I'm inclined to think that printing it numerically is good enough there, and a lot more robust. Comments? If there's anyone who has a really good use case for using relpath() from outside the backend, better speak up.

I'm using it in the pg_rewind tool. It needs to know how to map relfilenodes to physical files. It has quite intimate knowledge of the on-disk layout anyway, so it wouldn't be the end of the world to just copy-paste that code. But I would rather not, of course.
- Heikki
Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)
On 2014-04-27 18:44:16 -0400, Tom Lane wrote:
> Andres Freund and...@2ndquadrant.com writes:
>> Just for some clarity, that also happens with expressions like: WHERE ROW(ev_class, rulename, ev_action) = ROW('pg_rewrite'::regclass, '_RETURN', NULL) ORDER BY ROW(ev_class, rulename, ev_action); Previously we wouldn't detoast ev_action here, but now we do.
> ... The extra detoast you're seeing in this example is caused by the ROW() in the ORDER BY. Which, as I said, is just bad SQL coding.

Good point.

>> I agree that this needs to get backpatched at some point. But people also get very upset if queries get slower by an order of magnitude or two after a minor version upgrade. And this does have the potential to do that, no?
> They don't get nearly as upset as they do if the database loses their data. Yes, in an ideal world, we could fix this and not suffer any performance loss anywhere. But no such option is on the table, and waiting is not going to make one appear. What waiting *will* do is afford more opportunities for this bug to eat people's data.

We make tradeoffs between performance and safety *all the time*. A new performance regression that has the potential of affecting a fair number of installations very well can be worse than a decade-old correctness bug - a bug that will only hit in some specific usage scenarios and will only affect individual datums. And you *have* come up with an alternative approach. It might very well be inferior, but that certainly doesn't make this approach one without alternatives. Anyway, I've now voiced my doubts about immediate backpatching. Any other opinions?

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Decrease MAX_BACKENDS to 2^16
On 2014-04-28 10:48:30 +0300, Heikki Linnakangas wrote:
> On 04/26/2014 09:27 PM, Andres Freund wrote:
>> I don't think we need to decide this without benchmarks proving the benefits. I basically want to know whether somebody has an actual use case - even if I really, really, can't think of one - of setting max_connections even remotely that high. If there's something fundamental out there that'd make changing the limit impossible, doing benchmarks wouldn't be worthwhile.
> It doesn't seem unreasonable to have a database with tens of thousands of connections. Sure, performance will suffer, but if the connections sit idle most of the time so that the total load is low, who cares? Sure, you could use a connection pooler, but it's even better if you don't have to.

65k connections will be absolutely *disastrous* for performance because of the big PGPROC et al. I *do* think we have to make life easier for users here by supplying builtin pooling at some point, but that's just a separate feature.

> If there are big gains to be had from limiting the number of connections, I'm not against it. For the purpose of shrinking BufferDesc, though, I have a feeling there might be other lower-hanging fruit in there. For example, wait_backend_pid and freeNext are not used very often, so they could be moved elsewhere, to a separate array. And buf_id and the LWLock pointers could be calculated from the memory address of the struct.

The main reason I want to shrink it is that I want to make pin/unpin buffer lockless, and all solutions I can come up with for that require the flags to be in the same uint32 as the refcount. For performance it'd be beneficial if the usagecount also fits in there. I agree that we can move a good part of BufferDesc into a separately indexed array. io_in_progress_lock, freeNext and wait_backend_pid are imo good candidates.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] allowing VACUUM to be cancelled for conflicting locks
Hi. In the past, we've had situations where "everything is hung" turned out to be because of a script that ran a manual VACUUM that was holding some lock. It's admittedly not a huge problem, but it might be useful if a manual VACUUM could be cancelled the way autovacuum can be.

One way to do this would be to add an allow_vacuum_cancel GUC that, if set, would cause VACUUM to set the (suitably renamed) PROC_IS_AUTOVACUUM flag on its own process. Another would be to add an extra option to the VACUUM command that enables this behaviour. The former has the advantage of being backwards-compatible with existing scripts that run manual VACUUM. The latter is arguably nicer. Any thoughts?

-- Abhijit
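To make the two moving parts of the proposal concrete, here is a minimal sketch (not PostgreSQL's actual code): PROC_IS_AUTOVACUUM is the real flag name, but the struct and functions below are simplified, hypothetical stand-ins for PGPROC and the deadlock-detector check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag name from the real code; everything else here is a toy model. */
#define PROC_IS_AUTOVACUUM 0x01

typedef struct
{
    int     pid;
    uint8_t vacuumFlags;
} FakeProc;

/* What the proposed allow_vacuum_cancel GUC (or VACUUM option) would do:
 * let a manual VACUUM advertise itself the same way autovacuum does. */
static void mark_vacuum_cancellable(FakeProc *proc)
{
    proc->vacuumFlags |= PROC_IS_AUTOVACUUM;
}

/* Sketch of the check made on a conflicting lock: a blocked backend may
 * cancel the lock holder only if the holder set the flag. */
static bool can_cancel_blocker(const FakeProc *blocker)
{
    return (blocker->vacuumFlags & PROC_IS_AUTOVACUUM) != 0;
}
```

With the GUC approach, existing scripts would get the new behaviour simply by setting allow_vacuum_cancel before their VACUUM calls.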
Re: [HACKERS] Decrease MAX_BACKENDS to 2^16
On 04/28/2014 12:39 PM, Andres Freund wrote:
> On 2014-04-28 10:48:30 +0300, Heikki Linnakangas wrote:
>> It doesn't seem unreasonable to have a database with tens of thousands of connections. Sure, performance will suffer, but if the connections sit idle most of the time so that the total load is low, who cares? Sure, you could use a connection pooler, but it's even better if you don't have to.
> 65k connections will be absolutely *disastrous* for performance because of the big PGPROC et al.

Well, often that's still good enough.

> The main reason I want to shrink it is that I want to make pin/unpin buffer lockless, and all solutions I can come up with for that require the flags to be in the same uint32 as the refcount. For performance it'd be beneficial if the usagecount also fits in there.

Would it be enough to put only some of the flags in the same uint32?

- Heikki
Re: [HACKERS] So why is EXPLAIN printing only *plan* time?
On Sun, Apr 27, 2014 at 1:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> ... and not, in particular, parse analysis or rewrite time?

I think breaking those out would be a good idea. Especially rewrite time.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Decrease MAX_BACKENDS to 2^16
On 2014-04-28 13:32:45 +0300, Heikki Linnakangas wrote:
> On 04/28/2014 12:39 PM, Andres Freund wrote:
>> 65k connections will be absolutely *disastrous* for performance because of the big PGPROC et al.
> Well, often that's still good enough.

That may be true for 2-4k max_connections, but 65k? That won't even *run*, not to speak of doing something, in most environments, because of the number of processes required. Even making only 20k connections will probably crash your computer.

>> The main reason I want to shrink it is that I want to make pin/unpin buffer lockless, and all solutions I can come up with for that require the flags to be in the same uint32 as the refcount. For performance it'd be beneficial if the usagecount also fits in there.
> Would it be enough to put only some of the flags in the same uint32?

It's probably possible, but it would make things more complicated. For a feature nobody is ever going to use.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
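To make the tradeoff concrete, here is a minimal sketch (not PostgreSQL code; the bit layout is a hypothetical choice for illustration) of the kind of packed, atomically updated buffer state being discussed: a 16-bit refcount - which is where a MAX_BACKENDS cap of 2^16 pays off - plus a 4-bit usagecount and flag bits, all in one uint32 so that pin/unpin can be a compare-and-swap loop instead of a spinlock.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdatomic.h>

/* Hypothetical packing: low 16 bits refcount (hence MAX_BACKENDS <= 2^16),
 * 4 bits usagecount, then flag bits.  Field widths are illustrative only. */
#define BUF_REFCOUNT_MASK    ((1u << 16) - 1)
#define BUF_USAGECOUNT_SHIFT 16
#define BUF_USAGECOUNT_ONE   (1u << BUF_USAGECOUNT_SHIFT)
#define BUF_USAGECOUNT_MASK  (0xFu << BUF_USAGECOUNT_SHIFT)
#define BUF_MAX_USAGE        5
#define BUF_FLAG_VALID       (1u << 20)

typedef struct
{
    _Atomic uint32_t state;     /* refcount | usagecount | flags */
} ToyBufDesc;

/* Lockless pin: bump refcount (and usagecount, up to its ceiling) in one
 * CAS loop; returns false if the buffer isn't valid and the caller must
 * take a slow path. */
static bool pin_buffer(ToyBufDesc *buf)
{
    uint32_t old = atomic_load(&buf->state);

    for (;;)
    {
        uint32_t usage, new;

        if (!(old & BUF_FLAG_VALID))
            return false;
        usage = (old & BUF_USAGECOUNT_MASK) >> BUF_USAGECOUNT_SHIFT;
        new = old + 1;                  /* refcount++ */
        if (usage < BUF_MAX_USAGE)
            new += BUF_USAGECOUNT_ONE;  /* usagecount++ */
        /* on failure, 'old' is reloaded with the current state */
        if (atomic_compare_exchange_weak(&buf->state, &old, new))
            return true;
    }
}

static void unpin_buffer(ToyBufDesc *buf)
{
    atomic_fetch_sub(&buf->state, 1);   /* refcount-- */
}
```

This is exactly why the flags (and ideally the usagecount) have to share the uint32 with the refcount: the CAS has to observe and update all of them as one word.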
Re: [HACKERS] UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table
On Fri, Apr 25, 2014 at 8:58 PM, Josh Berkus j...@agliodbs.com wrote:
> Well, I've already had collisions with UUID-OSSP, in production, with only around 20 billion values. So clearly there aren't 122 bits of true randomness in OSSP. I can't speak for other implementations because I haven't tried them.

Interesting. The statistical chance of this happening should be approximately 4e-17. Are you certain that this was due to uuid-ossp and not an application bug? Can you say what kind of operating system and environment that was? I skimmed the sources of uuid-ossp 1.6.2 and it seems to be doing the right thing, using /dev/urandom or /dev/random on Unixes and CryptGenRandom on Windows. Barring any bugs, of course. However, if these fail for whatever reason (e.g. out of file descriptors), it falls back to libc random(), which is clearly broken.

On Fri, Apr 25, 2014 at 6:18 PM, Greg Stark st...@mit.edu wrote:
> The difficulty lies not really in the PRNG implementation (which is hard but well enough understood that it's not much of an issue these days). The difficulty lies in obtaining enough entropy.

There are ways of obtaining enough entropy, and they are available.

> Obtaining enough entropy requires access to hardware devices which means a kernel system call.

This is a solved problem in most environments, too. The kernel collects entropy from unpredictable events and then seeds a global CSPRNG with that. This collection happens always, regardless of whether you request random numbers or not, so it essentially comes for free. Applications can then request output from this CSPRNG. The reason is that this infrastructure is necessary for more critical tasks than generating UUIDs: pretty much all of cryptography requires random numbers.

> They also deplete the available entropy pool for other sources, which may mean they have security consequences.

This only affects the Linux /dev/random, which is discouraged these days for that reason. Applications should use urandom instead.
To my knowledge, there are no other operating systems that have this depletion behavior.

Regards, Marti
Re: [HACKERS] includedir_internal headers are not self-contained
Re: Heikki Linnakangas 2014-04-28 535e09b7.3090...@vmware.com
>> Comments? If there's anyone who has a really good use case for using relpath() from outside the backend, better speak up.
> I'm using it in the pg_rewind tool. It needs to know how to map relfilenodes to physical files. It has quite intimate knowledge of the on-disk layout anyway, so it wouldn't be the end of the world to just copy-paste that code. But I would rather not, of course.

Isn't pg_rewind so low-level and close to the server that it needs tons of server headers anyway, including one that would still have relpath()? We are talking here about what headers pure client apps need.

Christoph -- c...@df7cb.de | http://www.df7cb.de/
Re: [HACKERS] Decrease MAX_BACKENDS to 2^16
On Mon, Apr 28, 2014 at 7:37 AM, Andres Freund and...@2ndquadrant.com wrote:
>> Well, often that's still good enough.
> That may be true for 2-4k max_connections, but 65k? That won't even *run*, not to speak of doing something, in most environments, because of the number of processes required. Even making only 20k connections will probably crash your computer.

I'm of two minds on this topic. On the one hand, cat /proc/sys/kernel/pid_max on a Linux system I just tested (3.2.6) returns 65536, so we'd run out of PID space before we could reach 64k backends. On the other hand, that value can easily be increased to a few million via, e.g., sysctl -w kernel.pid_max=4194303, and I imagine that as machines continue to get bigger there will be more and more people wanting to do things like that. I think the fact that making 20k connections might crash your computer is an artifact of other problems that we really ought to also fix (like per-backend memory utilization, and lock contention on various global data structures) rather than something to bake into more places. In PostgreSQL 25.3, perhaps we'll be able to run distributed PostgreSQL clusters that can service a million simultaneous connections across dozens of physical machines. Then again, there might not be much left of our current buffer manager by that point, so maybe what we decide right now isn't that relevant.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] Different behaviour of concate() and concate operator ||
Hi all, I need a little help understanding text concatenation. The concat() function and the || operator behave differently if any input is NULL:

test=# select 'abc' || NULL;
 ?column?
----------
 
(1 row)

test=# select concat('abc', NULL);
 concat
--------
 abc
(1 row)

There is a simple reason: the || operator uses the textcat() function, which is STRICT. My questions are:

1. Does textcat() need to be STRICT, and why?
2. textcat() is used for the || operator; couldn't that instead be implemented using the concat() function?

Thanks in advance. Regards, Amul Sul
Re: [HACKERS] includedir_internal headers are not self-contained
On 04/28/2014 03:29 PM, Christoph Berg wrote:
> Isn't pg_rewind so low-level and close to the server that it needs tons of server headers anyway, including one that would still have relpath()? We are talking here about what headers pure client apps need.

It knows how to decode WAL, similar to pg_xlogdump. And it knows about the data directory layout, in particular, how relfilenodes are mapped to physical files. Those are the low-level parts. So it certainly needs some server headers, but I wouldn't call it tons.

- Heikki
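For readers outside the thread, the mapping being discussed is roughly the following (a simplified sketch covering only the default- and global-tablespace cases; the real relpath() also handles non-default tablespaces - where TABLESPACE_VERSION_DIRECTORY, and hence CATALOG_VERSION_NO, enters the picture - and temporary relations):

```c
#include <stdio.h>

/* Relation forks, as in the backend, with their file-name suffixes. */
typedef enum { FORK_MAIN, FORK_FSM, FORK_VM, FORK_INIT } ToyForkNumber;

static const char *fork_suffix[] = { "", "_fsm", "_vm", "_init" };

/* Sketch of the path rules relpath() encodes for the main tablespaces:
 * shared catalogs live under global/, ordinary relations under
 * base/<database oid>/<relfilenode>, with a per-fork suffix. */
static void toy_rel_path(char *buf, size_t buflen,
                         unsigned dboid, unsigned relfilenode,
                         ToyForkNumber fork)
{
    if (dboid == 0)
        snprintf(buf, buflen, "global/%u%s",
                 relfilenode, fork_suffix[fork]);
    else
        snprintf(buf, buflen, "base/%u/%u%s",
                 dboid, relfilenode, fork_suffix[fork]);
}
```

The non-default-tablespace branch is what drags the catalog version into the path, which is the modularity problem Tom raises below.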
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Fri, Apr 18, 2014 at 11:46 AM, Greg Stark st...@mit.edu wrote:
> On Fri, Apr 18, 2014 at 4:14 PM, Robert Haas robertmh...@gmail.com wrote:
>> I am a bit confused by this remark. In *any* circumstance when you evict, you're incurring precisely one page fault I/O when the page is read back in. That doesn't mean that the choice of which page to evict is irrelevant.
> But you might be evicting a page that will be needed soon or one that won't be needed for a while. If it's not needed for a while you might be able to avoid many page evictions by caching a page that will be used several times.

Sure.

> If all the pages currently in RAM are hot -- meaning they're hot enough that they'll be needed again before the page you're reading in -- then they're all equally bad to evict.

Also true. But the problem is that it is very rarely, if ever, the case that all pages are *equally* hot. On a pgbench workload, for example, I'm very confident that while there's not really any cold data, the btree roots and visibility map pages are a whole lot hotter than a randomly-selected heap page. If you evict a heap page, you're going to need it back pretty quick, because it won't be long until the random-number generator again chooses a key that happens to be located on that page. But if you evict the root of the btree index, you're going to need it back *immediately*, because the very next query, no matter what key it's looking for, is going to need that page. I'm pretty sure that's a significant difference.

> I'm trying to push us away from the gut instinct that frequently used pages are important to cache and towards actually counting how many I/Os we're saving. In the extreme it's possible to simulate any cache algorithm on a recorded list of page requests and count how many page misses it generates to compare it with an optimal cache algorithm.

There's another issue, which Simon clued me into a few years back: evicting the wrong page can cause system-wide stalls. In the pgbench case, evicting a heap page will force the next process that chooses a random number that maps to a tuple on that page to wait for the page to be faulted back in. That's sad, but unless the scale factor is small compared to the number of backends, there will probably be only ONE process waiting. On the other hand, if we evict the btree root, within a fraction of a second, EVERY process that isn't already waiting on some other I/O will be waiting for that I/O to complete. The impact on throughput is much bigger in that case.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Mon, Apr 21, 2014 at 6:38 PM, Jim Nasby j...@nasby.net wrote:
>> I feel that if there is no memory pressure, frankly it doesn't matter much what gets out and what doesn't. The case I am specifically targeting is when the clock sweep gets to move about a lot, i.e. high-memory-pressure workloads. Of course, I may be totally wrong here.
> Well, there's either memory pressure or there isn't. If there isn't, then it's all moot *because we're not evicting anything*.

I don't think that's really true. A workload can fit within shared_buffers at some times and spill beyond it at others. Every time it fits within shared_buffers for even a short period of time, the reference count of any buffer that's not ice-cold goes to 5 and we essentially lose all knowledge of which buffers are relatively hotter. Then, when we spill out again, evictions are random.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
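A toy model of the clock sweep (a sketch, not the backend's freelist.c) makes Robert's point observable: once every buffer has been driven to the maximum usage count, the sweep must decay all of them through full laps before any victim emerges, and the first victim carries no hotness information at all.

```c
/* Toy clock sweep: pins bump a per-buffer usage count (capped at
 * MAX_USAGE); the sweep decrements counts and evicts the first unpinned
 * buffer whose count has reached zero. */
#define NBUFFERS  4
#define MAX_USAGE 5

typedef struct
{
    int usage;      /* decayed by the sweep */
    int pinned;     /* pinned buffers are never victims */
} ToyBuf;

static int clock_sweep(ToyBuf bufs[], int *hand)
{
    int tries = NBUFFERS * (MAX_USAGE + 1);

    while (tries-- > 0)
    {
        ToyBuf *b = &bufs[*hand];
        int     victim = *hand;

        *hand = (*hand + 1) % NBUFFERS;
        if (b->pinned)
            continue;
        if (b->usage > 0)
            b->usage--;     /* survived this lap; reconsider next time */
        else
            return victim;
    }
    return -1;              /* everything pinned; the real code errors out */
}
```

If the workload briefly fits in cache so that every buffer sits at usage 5, the next eviction after spilling again is decided purely by where the hand happens to be - which is the "evictions are random" behavior described above.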
Re: [HACKERS] shm_mq inconsistent behavior of SHM_MQ_DETACHED
On Tue, Apr 22, 2014 at 9:55 AM, Petr Jelinek p...@2ndquadrant.com wrote:
> I was playing with shm_mq and found slightly odd behavior with detaching after sending messages. The following sequence behaves as expected (the receiver gets 2 messages):
>
> P1 - set_sender
> P1 - attach
> P2 - set_receiver
> P2 - attach
> P1 - send
> P2 - receive
> P1 - send
> P1 - detach
> P2 - receive
> P2 - detach
>
> But if I do the first receive after the detach, as in this sequence:
>
> P1 - set_sender
> P1 - attach
> P2 - set_receiver
> P2 - attach
> P1 - send
> P1 - send
> P1 - detach
> P2 - receive
>
> I get SHM_MQ_DETACHED on the receiver even though there are messages in the ring buffer.

That's a bug.

> The reason for this behavior is that mqh_counterparty_attached is only set by shm_mq_receive. This does not seem to be consistent - I would either expect to always get SHM_MQ_DETACHED when the other party has detached, or to always get all remaining messages that are in the queue (and I would strongly prefer the latter). Maybe shm_mq_get_bytes_written should be used to determine whether there is something left for us to read in the receiver if we hit the !mqh_counterparty_attached code path with a detached sender?

That's probably not a good idea, because there could be just a partial message left in the buffer, if the sender died midway through writing it. I suspect that attacking the problem that way will lead to a bunch of nasty edge cases. I'm thinking that the problem really revolves around shm_mq_wait_internal(). It returns true if the queue is attached but not detached, and false if either the detach has already happened, or if we establish via the background worker handle that it will never come. But in the case of receiving, we want to treat attached-then-detached as a success case, not a failure case. Can you see if the attached patch fixes it?
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index c70f3bf..4f7dd9c 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -406,7 +406,8 @@ shm_mq_receive(shm_mq_handle *mqh, Size *nbytesp, void **datap, bool nowait)
 			if (shm_mq_get_sender(mq) == NULL)
 				return SHM_MQ_WOULD_BLOCK;
 		}
-		else if (!shm_mq_wait_internal(mq, &mq->mq_sender, mqh->mqh_handle))
+		else if (!shm_mq_wait_internal(mq, &mq->mq_sender, mqh->mqh_handle) &&
+				 shm_mq_get_sender(mq) == NULL)
 		{
 			mq->mq_detached = true;
 			return SHM_MQ_DETACHED;
Re: [HACKERS] Different behaviour of concate() and concate operator ||
amul sul sul_a...@yahoo.co.in writes:
> The concat function and the || operator behave differently if any input is NULL.

The behavior of || is specified by the SQL standard, and it says (SQL99 6.27 <string value expression> general rule 2a):

    a) If either S1 or S2 is the null value, then the result of the concatenation is the null value.

So no, we're not changing it to be more like concat().

regards, tom lane
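The mechanical difference behind Amul's question can be sketched as follows (plain C, a simplified model rather than the fmgr code): a STRICT function like textcat() is never even invoked when an input is NULL - the call machinery returns NULL directly - while concat() is not strict and treats NULL like an empty string.

```c
#include <stddef.h>
#include <stdio.h>

/* The underlying concatenation, which assumes non-NULL inputs. */
static char *textcat_impl(char *dst, const char *a, const char *b)
{
    sprintf(dst, "%s%s", a, b);
    return dst;
}

/* What the function-call machinery does for a STRICT function:
 * short-circuit to NULL without calling textcat_impl() at all. */
static char *call_strict_textcat(char *dst, const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return NULL;
    return textcat_impl(dst, a, b);
}

/* concat()'s non-strict behavior: NULL arguments are simply skipped. */
static char *call_concat(char *dst, const char *a, const char *b)
{
    sprintf(dst, "%s%s", a ? a : "", b ? b : "");
    return dst;
}
```

So making || use concat()'s semantics would require more than swapping the implementation function: it would change the operator's NULL behavior, which, as Tom notes, the standard pins down.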
[HACKERS] Re: [COMMITTERS] pgsql: Can't completely get rid of #ifndef FRONTEND in palloc.h :-(
Tom Lane wrote:
> Can't completely get rid of #ifndef FRONTEND in palloc.h :-( pg_controldata includes postgres.h not postgres_fe.h, so utils/palloc.h must be able to compile in a #define FRONTEND context.

Hmm, I had this patch in an abandoned branch from long ago, which I think helped remove postgres.h from pg_controldata. I remembered it just now because of this commit message. Maybe it's useful to re-remove the #ifndef FRONTEND from palloc.h. It's not rebased to latest master and maybe not even complete; if people think this approach is worthwhile I can try to clean it up and propose it more seriously; LMK. (Also if people think it needs further tweaks. I vaguely recall I didn't propose it back then because the set of stuff in the new header could be tweaked.)

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d960bbc..de06a36 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -30,6 +30,7 @@
 #include "access/twophase.h"
 #include "access/xact.h"
 #include "access/xlog_internal.h"
+#include "access/xlogproc.h"
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 0c178c5..313486e 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -22,6 +22,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
+#include "access/xlogproc.h"
 #include "miscadmin.h"
 #include "postmaster/startup.h"
 #include "replication/walsender.h"
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 82ef726..8f185aa 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -22,6 +22,7 @@
 #endif
 #include "access/htup_details.h"
+#include "access/xlogproc.h"
 #include "bootstrap/bootstrap.h"
 #include "catalog/index.h"
 #include "catalog/pg_collation.h"
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 5fb2d81..54a09f0 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -43,6 +43,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
+#include "access/xlogproc.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 3ddd596..4bd4740 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -34,6 +34,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
+#include "access/xlogproc.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "postmaster/fork_process.h"
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 15c2320..08491a4 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -93,6 +93,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
+#include "access/xlogproc.h"
 #include "bootstrap/bootstrap.h"
 #include "catalog/pg_control.h"
 #include "lib/ilist.h"
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 7ebf500..1ea9d6a 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -22,7 +22,7 @@
 #include <signal.h>
 #include <unistd.h>
-#include "access/xlog.h"
+#include "access/xlogproc.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "postmaster/startup.h"
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index 8359da6..bdfe6ca 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -46,7 +46,7 @@
 #include <time.h>
 #include <unistd.h>
-#include "access/xlog.h"
+#include "access/xlogproc.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "postmaster/walwriter.h"
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 911a66b..60e6d8d 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -49,6 +49,7 @@
 #include "access/timeline.h"
 #include "access/transam.h"
 #include "access/xlog_internal.h"
+#include "access/xlogproc.h"
 #include "libpq/pqformat.h"
 #include "libpq/pqsignal.h"
 #include "miscadmin.h"
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 8427006..22a9248 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -23,6 +23,7 @@
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/xact.h"
+#include "access/xlogproc.h"
 #include "catalog/catalog.h"
 #include "catalog/indexing.h"
 #include "catalog/namespace.h"
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 98149fc..6a90657 100644
--- a/src/backend/utils/misc/guc.c
+++
Re: [HACKERS] includedir_internal headers are not self-contained
Heikki Linnakangas hlinnakan...@vmware.com writes:
> It knows how to decode WAL, similar to pg_xlogdump. And it knows about the data directory layout, in particular, how relfilenodes are mapped to physical files. Those are the low-level parts. So it certainly needs some server headers, but I wouldn't call it tons.

I'm not even worried about which headers this program uses. What I'm worried about is that you've got CATALOG_VERSION_NO compiled into a non-server executable. Is that really such a great idea? Wouldn't it be better if pg_rewind did not depend on that? (Perhaps it should get the database's catalog version out of the pg_control file, for example.) In short, while I don't deny that there may be non-server programs that need to know about physical file paths, I do strongly doubt that relpath.h/.c in their current form are a good solution to that problem.

regards, tom lane
Re: [HACKERS] allowing VACUUM to be cancelled for conflicting locks
Abhijit Menon-Sen a...@2ndquadrant.com writes: In the past, we've had situations where everything is hung turned out to be because of a script that ran manual VACUUM that was holding some lock. It's admittedly not a huge problem, but it might be useful if a manual VACUUM could be cancelled the way autovacuum can be. I think the real answer to that is stop using manual VACUUM. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Implied BETWEEN from join quals
On Tue, Apr 22, 2014 at 11:44 AM, Simon Riggs si...@2ndquadrant.com wrote: I was recently nudged to describe an optimisation of merge joins/sorts. Rather than describe that, I've looked at the general case: There are some additional implications we may make when joining tables... a particularly interesting one is that SELECT * FROM Fact JOIN Dim on Fact.col = Dim.col can be rewritten as SELECT * FROM Fact JOIN Dim on Fact.col = Dim.col WHERE ( Fact.col BETWEEN (SELECT min(col) FROM Dim) AND (SELECT max(col) FROM Dim) ) AND ( Dim.col BETWEEN (SELECT min(col) FROM Fact) AND (SELECT max(col) FROM Fact) ) which also works similarly for anti/semi-joins. If we have indexes on A.col and B.col then these additional quals can be derived cheaply at run-time and could have an important effect on optimisation. Interesting. This can pretty obviously produce a big win if things break our way. But it'll take some effort to determine whether the range of possible values for the join column is narrow enough to make the inferred BETWEEN clause selective enough to be useful, and that effort will be wasted if it turns out that the answer is no. I can certainly think of times when this would have helped me, so I kind of like the idea in theory ... but I won't be surprised if it turns out to be possible to construct realistic test cases where trying to figure this out causes too much increase in planning time. Actually, it seems possible to me that this could end up winning even if it doesn't enable the use of an index scan rather than a sequential scan, because eliminating rows during the scan is typically much cheaper than eliminating them during the join step. But it also seems possible that it could lose heavily in that case, if the extra steps during the scan phase don't eliminate enough rows to reduce the join cost to a sufficient degree. I'm not really sure how it will work out on balance.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
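The equivalence of the original join and Simon's rewritten form is easy to check mechanically. A minimal sketch using Python's sqlite3 module (standing in for any SQL engine, with made-up fact/dim data):

```python
import sqlite3

# Hypothetical fact/dim tables, used only for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE fact (col INTEGER)")
cur.execute("CREATE TABLE dim (col INTEGER)")
cur.executemany("INSERT INTO fact VALUES (?)", [(i,) for i in range(100)])
cur.executemany("INSERT INTO dim VALUES (?)", [(i,) for i in range(40, 50)])

plain = cur.execute(
    "SELECT fact.col FROM fact JOIN dim ON fact.col = dim.col"
).fetchall()

# The same join with the implied BETWEEN quals added: rows outside
# dim's [min, max] range can be discarded during the scan, before
# they ever reach the join.
rewritten = cur.execute(
    """
    SELECT fact.col FROM fact JOIN dim ON fact.col = dim.col
    WHERE fact.col BETWEEN (SELECT min(col) FROM dim)
                       AND (SELECT max(col) FROM dim)
      AND dim.col BETWEEN (SELECT min(col) FROM fact)
                      AND (SELECT max(col) FROM fact)
    """
).fetchall()

assert sorted(plain) == sorted(rewritten)
```

Whether the rewrite pays off is exactly the planning-time question debated above; the rewrite itself is always semantics-preserving for an inner join on the listed columns.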
Re: [HACKERS] Decrease MAX_BACKENDS to 2^16
Robert Haas robertmh...@gmail.com writes: I think the fact that making 20k connections might crash your computer is an artifact of other problems that we really ought to also fix (like per-backend memory utilization, and lock contention on various global data structures) rather than baking it into more places. In PostgreSQL 25.3, perhaps we'll be able to run distributed PostgreSQL clusters that can service a million simultaneous connections across dozens of physical machines. Then again, there might not be much left of our current buffer manager by that point, so maybe what we decide right now isn't that relevant. Yeah. I think that useful use of 64K backends is far enough away that it shouldn't be a showstopper argument, assuming that we get something good in return for baking that into bufmgr. What I find much more worrisome about Andres' proposals is that he seems to be thinking that there are *no* other changes to the buffer headers on the horizon. That's untenable. And I don't want to be told that we can't improve the buffer management algorithms because adding another field would make the headers not fit in a cacheline. (For the same reason, I'm pretty unimpressed by the nearby suggestions that it'd be okay to put very tight limits on the number of bits in the buffer header flags or the usage count.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] includedir_internal headers are not self-contained
On 04/28/2014 04:51 PM, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 04/28/2014 03:29 PM, Christoph Berg wrote: Re: Heikki Linnakangas 2014-04-28 535e09b7.3090...@vmware.com I'm using it in the pg_rewind tool. It needs to know how to map relfilenodes to physical files. Isn't pg_rewind so low-level server-close that it needs tons of server headers anyway, including one that would still have relpath()? We are talking here about what headers pure client apps need. It knows how to decode WAL, similar to pg_xlogdump. And it knows about the data directory layout, in particular, how relfilenodes are mapped to physical files. Those are the low-level parts. So, it certainly needs some server headers, but I wouldn't call it tons. I'm not even worried about which headers this program uses. What I'm worried about is that you've got CATALOG_VERSION_NO compiled into a non-server executable. Is that really such a great idea? Wouldn't it be better if pg_rewind did not depend on that? (Perhaps it should get the database's catalog version out of the pg_control file, for example.) Sure, that would be better. Although I don't have much hope to make it completely version-independent. At the moment, pg_rewind explicitly reads the control file (yeah, it knows about that too), and checks that the catalog version matches what pg_rewind was compiled with. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor improvements in alter_table.sgml
On Wed, Apr 23, 2014 at 3:46 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/04/15 15:27), Etsuro Fujita wrote: (2014/04/14 23:53), Robert Haas wrote: On Fri, Apr 11, 2014 at 5:00 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: Attached is an updated version of the patch. I think the other changes deserve to be considered separately, and in particular I'm still not sure it's a good idea to document both OF type_name and type_name. I've agreed on that point, but I think apart from the others, the trivial typo should be corrected. Patch attached (doc-altertable-typo.patch). I noticed the description of index_name should also be corrected, because it is currently used not only in CLUSTER ON, but in ADD table_constraint_using_index. (It will also be used in REPLICA IDENTITY in 9.4.) Patch attached (doc-altertable-indexname.patch). Agreed. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Decrease MAX_BACKENDS to 2^16
On 2014-04-28 10:03:58 -0400, Tom Lane wrote: What I find much more worrisome about Andres' proposals is that he seems to be thinking that there are *no* other changes to the buffer headers on the horizon. Err. I am not thinking that at all. I am pretty sure I never made that argument. The reason I want to limit the number of connections is it allows *both*, shrinking the size of BufferDescs due to less alignment padding *and* stuffing the refcount and flags into one integer. That's untenable. And I don't want to be told that we can't improve the buffer management algorithms because adding another field would make the headers not fit in a cacheline. I think we need to move some less frequently used fields to a separate array to be future proof. Heikki suggested freeNext and wait_backend_pid; I added io_in_progress_lock. We could theoretically replace buf_id by calculating it based on the BufferDescriptors array, but that's probably not a good idea. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Composite Datums containing toasted fields are a bad idea(?)
Andres Freund and...@2ndquadrant.com writes: On 2014-04-27 18:44:16 -0400, Tom Lane wrote: They don't get nearly as upset as they do if the database loses their data. Yes, in an ideal world, we could fix this and not suffer any performance loss anywhere. But no such option is on the table, and waiting is not going to make one appear. What waiting *will* do is afford more opportunities for this bug to eat people's data. We make tradeoffs between performance and safety *all the time*. Really? I'm not aware of any case where we've left unrecoverable data corruption go unfixed. It's true that we've rejected fixing some cases where plpgsql code might transiently try to use a stale toast pointer --- but that's a lot different from losing stored data permanently. And you *have* come up with an alternative approach. No, I haven't. I experimented with a different approach, at Noah's insistence. But I did not and will not try to extend it to a complete solution, first because I'm unsure that a 100% solution can reasonably be reached that way, and second because that approach *also* entails performance penalties --- unavoidable ones, unlike this approach where people can modify their SQL code to avoid fetching unnecessary columns. If somebody *else* is sufficiently hell-bent on doing it that way that they will take responsibility for completing and back-patching that approach, I will stand aside and let them find out the downsides. Otherwise, I propose to commit and back-patch this version. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
On Thu, Apr 24, 2014 at 1:39 AM, Craig Ringer cr...@2ndquadrant.com wrote: On 04/17/2014 08:35 AM, Craig Ringer wrote: On 04/17/2014 04:47 AM, Petr Jelinek wrote: Well the logging is just too spammy in general when it comes to dynamic bgworkers but that's easy to fix in the future, no need to make decisions for 9.4. Agreed - it's the *API* that we need sorted out for 9.4, and log output isn't something Pg tends to consider part of the API. However I really don't like that I have to exit with exit code 1, which is normally used as failure, if I want to shut down my dynamic bgworker once it has finished the work. And this behavior is something we can set properly only once... As far as I can tell we have a couple of options: - Redefine what the exit codes mean so that exit 0 suppresses auto-restart and exits silently. Probably simplest. I'm now strongly in favour of this alternative. I've just noticed that the bgworker control interfaces do not honour bgw.bgw_restart_time = BGW_NEVER_RESTART if you exit with status zero. This means that it's not simply a problem where you can't say "restart me if I crash, but not if I exit normally". You also can't even say "never restart me" at all, because BGW_NEVER_RESTART seems to really mean BGW_NO_RESTART_ON_CRASH. This _needs_ fixing before 9.4. It seems we have consensus on what to do about this, but what we haven't got is a patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor improvement to fdwhandler.sgml
On Thu, Apr 24, 2014 at 7:59 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: The patch attached improves docs in fdwhandler.sgml a little bit. When you submit a patch, it's helpful to describe what the patch actually does, rather than just saying it makes things better. For example, I think that this patch could be described as in fdwhandler.sgml, mark references to scan_clauses with structfield tags. A problem with that idea is that scan_clauses is not a field in any struct. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Can't completely get rid of #ifndef FRONTEND in palloc.h :-(
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: pg_controldata includes postgres.h not postgres_fe.h, so utils/palloc.h must be able to compile in a #define FRONTEND context. Hmm, I had this patch in an abandoned branch from long ago, which I think helped remove postgres.h from pg_controldata. I remembered it just now because of this commit message. Maybe it's useful to re-remove the #ifndef FRONTEND from palloc.h. Hm. It would certainly be better if pg_controldata could use postgres_fe.h not postgres.h, but I'm confused about how the new header added by this patch helps that? None of the declarations you removed from xlog.h look like they'd be more problematic than the ones you left behind. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [COMMITTERS] pgsql: Can't completely get rid of #ifndef FRONTEND in palloc.h :-(
Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: pg_controldata includes postgres.h not postgres_fe.h, so utils/palloc.h must be able to compile in a #define FRONTEND context. Hmm, I had this patch in an abandoned branch from long ago, which I think helped remove postgres.h from pg_controldata. I remembered it just now because of this commit message. Maybe it's useful to re-remove the #ifndef FRONTEND from palloc.h. Hm. It would certainly be better if pg_controldata could use postgres_fe.h not postgres.h, but I'm confused about how the new header added by this patch helps that? None of the declarations you removed from xlog.h look like they'd be more problematic than the ones you left behind. The point IIRC was to be able to remove stuff that required type Datum to be defined: -extern void ShutdownXLOG(int code, Datum arg); which led to the idea that the new header would be about process control for xlog.c. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Decrease MAX_BACKENDS to 2^16
Andres Freund and...@2ndquadrant.com writes: On 2014-04-28 10:03:58 -0400, Tom Lane wrote: What I find much more worrisome about Andres' proposals is that he seems to be thinking that there are *no* other changes to the buffer headers on the horizon. Err. I am not thinking that at all. I am pretty sure I never made that argument. The reason I want to limit the number of connections is it allows *both*, shrinking the size of BufferDescs due to less alignment padding *and* stuffing the refcount and flags into one integer. Weren't you saying you also wanted to stuff the usage count into that same integer? That's getting a little too tight for my taste, even if it would fit today. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Can't completely get rid of #ifndef FRONTEND in palloc.h :-(
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: Hm. It would certainly be better if pg_controldata could use postgres_fe.h not postgres.h, but I'm confused about how the new header added by this patch helps that? None of the declarations you removed from xlog.h look like they'd be more problematic than the ones you left behind. The point IIRC was to be able to remove stuff that required type Datum to be defined: -extern void ShutdownXLOG(int code, Datum arg); which led to the idea that the new header would be about process control for xlog.c. Oh, Datum was the issue? I see. But then this approach is basically requiring that we can never reference Datum in any header that pg_controldata uses. Which seems like a pretty draconian limitation, particularly if the headers involved are general-purpose ones like xlog.h. What might be less breakable is to identify exactly which declarations pg_controldata needs out of this file and push just those into some new header. However, pg_controldata is just the tip of the iceberg --- to get any mileage out of this, we'd also have to make pg_resetxlog and pg_xlogdump able to compile with only postgres_fe.h, so there might be too much stuff involved. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] includedir_internal headers are not self-contained
Heikki Linnakangas hlinnakan...@vmware.com writes: On 04/28/2014 04:51 PM, Tom Lane wrote: I'm not even worried about which headers this program uses. What I'm worried about is that you've got CATALOG_VERSION_NO compiled into a non-server executable. Is that really such a great idea? Wouldn't it be better if pg_rewind did not depend on that? (Perhaps it should get the database's catalog version out of the pg_control file, for example.) Sure, that would be better. Although I don't have much hope to make it completely version-independent. At the moment, pg_rewind explicitly reads the control file (yeah, it knows about that too), and checks that the catalog version matches what pg_rewind was compiled with. ... which might or might not be the same one that libpgcommon was compiled with, no? I don't think you're really protecting yourself against version skew that way. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] So why is EXPLAIN printing only *plan* time?
Robert Haas robertmh...@gmail.com writes: On Sun, Apr 27, 2014 at 1:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: ... and not, in particular, parse analysis or rewrite time? I think breaking those out would be a good idea. Especially rewrite time. Rewrite time seems generally negligible in comparison to the other two components, at least in the simple testing I did yesterday. It would only be significant if you were expanding some complicated views, in which case planning time would almost surely dominate anyway. Anyway, I'm starting to come to the conclusion that the idea of silently adding parse/rewrite time into the planning time line isn't such a good one. So there may or may not be sufficient interest in the other numbers to justify adding them as separate lines later --- but the key word there is later. I now think we should leave planning time as it's currently defined, which means we don't need to address this issue for 9.4. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Decrease MAX_BACKENDS to 2^16
On 2014-04-28 10:57:12 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-04-28 10:03:58 -0400, Tom Lane wrote: What I find much more worrisome about Andres' proposals is that he seems to be thinking that there are *no* other changes to the buffer headers on the horizon. Err. I am not thinking that at all. I am pretty sure I never made that argument. The reason I want to limit the number of connections is it allows *both*, shrinking the size of BufferDescs due to less alignment padding *and* stuffing the refcount and flags into one integer. Weren't you saying you also wanted to stuff the usage count into that same integer? That's getting a little too tight for my taste, even if it would fit today. That's a possible additional optimization that we could use. But it's certainly not required. Would allow us to use fewer atomic operations... Right now there'd be enough space for a more precise usagecount and more flags. ATM there's 9 bits for flags and 3 bits of usagecount... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Can't completely get rid of #ifndef FRONTEND in palloc.h :-(
Tom Lane wrote: What might be less breakable is to identify exactly which declarations pg_controldata needs out of this file and push just those into some new header. Yeah, I also thought about that but didn't get around to trying. However, pg_controldata is just the tip of the iceberg --- to get any mileage out of this, we'd also have to make pg_resetxlog and pg_xlogdump able to compile with only postgres_fe.h, so there might be too much stuff involved. IIRC we didn't have pg_xlogdump when I wrote this, and I checked pg_resetxlog and it wasn't nearly as simple as pg_controldata. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] So why is EXPLAIN printing only *plan* time?
On Mon, Apr 28, 2014 at 11:36 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sun, Apr 27, 2014 at 1:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: ... and not, in particular, parse analysis or rewrite time? I think breaking those out would be a good idea. Especially rewrite time. Rewrite time seems generally negligible in comparison to the other two components, at least in the simple testing I did yesterday. It would only be significant if you were expanding some complicated views, in which case planning time would almost surely dominate anyway. Anyway, I'm starting to come to the conclusion that the idea of silently adding parse/rewrite time into the planning time line isn't such a good one. So there may or may not be sufficient interest in the other numbers to justify adding them as separate lines later --- but the key word there is later. I now think we should leave planning time as it's currently defined, which means we don't need to address this issue for 9.4. Works for me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] includedir_internal headers are not self-contained
Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 04/28/2014 04:51 PM, Tom Lane wrote: I'm not even worried about which headers this program uses. What I'm worried about is that you've got CATALOG_VERSION_NO compiled into a non-server executable. Is that really such a great idea? Wouldn't it be better if pg_rewind did not depend on that? (Perhaps it should get the database's catalog version out of the pg_control file, for example.) Sure, that would be better. Although I don't have much hope to make it completely version-independent. At the moment, pg_rewind explicitly reads the control file (yeah, it knows about that too), and checks that the catalog version matches what pg_rewind was compiled with. ... which might or might not be the same one that libpgcommon was compiled with, no? I don't think you're really protecting yourself against version skew that way. The CATALOG_VERSION dependency in that code is a mistake which I didn't notice back then. I can't put too much thought into this issue at this time, but printing fork numbers rather than names seems pretty user-unfriendly to me. Rather than a revert of the whole patch I would hope for some different solution, if possible, though I can't offer anything right now. I don't think it's very likely that we would renumber forks; so the only possible problem would be that pg_rewind is linked with an older libpgcommon than the server which doesn't know some newer fork name/number and fails to produce a correct result. But ISTM we can rightly consider that as pilot error, right? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Implied BETWEEN from join quals
On 28 April 2014 15:01, Robert Haas robertmh...@gmail.com wrote: Interesting. This can pretty obviously produce a big win if things break our way. But it'll take some effort to determine whether the range of possible values for the join column is narrow enough to make the inferred BETWEEN clause selective enough to be useful, and that effort will be wasted if it turns out that the answer is no. I can certainly think of times when this would have helped me, so I kind of like the idea in theory ... but I won't be surprised if it turns out to be possible to construct realistic test cases where trying to figure this out causes too much increase in planning time. Actually, it seems possible to me that this could end up winning even if it doesn't enable the use of an index scan rather than a sequential scan, because eliminating rows during the scan is typically much cheaper than eliminating them during the join step. But it also seems possible that it could lose heavily in that case, if the extra steps during the scan phase don't eliminate enough rows to reduce the join cost to a sufficient degree. I'm not really sure how it will work out on balance. I agree it's not always a win and that the planning overhead is a little high. That's why I suggest we only attempt to calculate such a plan if the cost of the normal plan for the scan is already high. That way the extra planning is worth it. I'm sure there are other optimizations that we might only wish to consider for plans that are otherwise high cost. This need hardly touch the main planning path for short queries at all. The extra run-time cost of the test adds 2*cpu_operator_cost to the scan. However, it will save at least (OldSelectivity - NewSelectivity)*(cpu_tuple_cost + cpu_operator_cost) in the join node directly above it (and possibly more above that - though without being able to inspect the join tree we won't know that until later - and the output plan might actually change the later join plan anyway).
So we can calculate the threshold at which we should choose the modified plan, though it will likely be a conservative value. The new selectivity can be calculated just as other selectivities, I see no problem there. So we have a proposed way to limit planning time in cases where the benefit would be low. Plus we have a way to estimate the reduction in cost once we add the new tests. The real value is whether we can use the additional tests to pick up an index we hadn't been able to use before. Given that MinMax index use will be a win even for fairly high selectivities (because of the low cost of index access), it looks like this optimization will play very nicely with MinMax. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
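Simon's break-even arithmetic can be sketched numerically. The per-row accounting below is an assumption made to turn his per-node formula into something concrete, using PostgreSQL's default cost settings:

```python
# Back-of-the-envelope version of the break-even test for adding
# implied BETWEEN quals, with PostgreSQL's default cost parameters.
cpu_tuple_cost = 0.01
cpu_operator_cost = 0.0025

def worth_adding_between(scan_rows, old_sel, new_sel):
    """Return True if the implied BETWEEN quals look like a win."""
    # Two extra comparisons (>= min, <= max) evaluated per scanned row.
    extra_scan_cost = 2 * cpu_operator_cost * scan_rows
    # Rows eliminated during the scan no longer pay the per-tuple
    # overhead of the join node directly above it.
    rows_saved = (old_sel - new_sel) * scan_rows
    join_savings = rows_saved * (cpu_tuple_cost + cpu_operator_cost)
    return join_savings > extra_scan_cost

# Highly selective implied quals: clearly worth it.
assert worth_adding_between(1_000_000, 1.0, 0.01)
# Quals that filter almost nothing: not worth the extra comparisons.
assert not worth_adding_between(1_000_000, 1.0, 0.99)
```

As the thread notes, this is a conservative lower bound on the savings: it ignores higher join levels and any index (e.g. MinMax) that the new quals might enable.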
Re: [HACKERS] allowing VACUUM to be cancelled for conflicting locks
Tom Lane wrote: Abhijit Menon-Sen a...@2ndquadrant.com writes: In the past, we've had situations where everything is hung turned out to be because of a script that ran manual VACUUM that was holding some lock. It's admittedly not a huge problem, but it might be useful if a manual VACUUM could be cancelled the way autovacuum can be. I think the real answer to that is stop using manual VACUUM. As much as I'm a fan of autovacuum, that's not always possible. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] includedir_internal headers are not self-contained
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: ... which might or might not be the same one that libpgcommon was compiled with, no? I don't think you're really protecting yourself against version skew that way. The CATALOG_VERSION dependency in that code is a mistake which I didn't notice back then. I can't put too much thought into this issue at this time, but printing fork numbers rather than names seems pretty user-unfriendly to me. Rather than a revert of the whole patch I would hope for some different solution, if possible, though I can't offer anything right now. I think it would be okay to have a common/ module that encapsulates fork names/numbers. It's relpath() and particularly TABLESPACE_VERSION_DIRECTORY that bother me from a dependency standpoint. As far as pg_xlogdump goes, I agree that symbolic fork names are probably nice, but I think the case for printing db/ts/rel OIDs as a pathname is a whole lot weaker --- to my taste, that's actually an anti-feature. So I wouldn't mind dropping that dependency on relpath. Not sure what to do for Heikki's pg_rewind, though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] includedir_internal headers are not self-contained
On Mon, Apr 28, 2014 at 12:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: ... which might or might not be the same one that libpgcommon was compiled with, no? I don't think you're really protecting yourself against version skew that way. The CATALOG_VERSION dependency in that code is a mistake which I didn't notice back then. I can't put too much thought into this issue at this time, but printing fork numbers rather than names seems pretty user-unfriendly to me. Rather than a revert of the whole patch I would hope for some different solution, if possible, though I can't offer anything right now. I think it would be okay to have a common/ module that encapsulates fork names/numbers. It's relpath() and particularly TABLESPACE_VERSION_DIRECTORY that bother me from a dependency standpoint. As far as pg_xlogdump goes, I agree that symbolic fork names are probably nice, but I think the case for printing db/ts/rel OIDs as a pathname is a whole lot weaker --- to my taste, that's actually an anti-feature. I might be missing something, but, why? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planned downtime @ Rackspace
All, Unfortunately, Rackspace has had to postpone this pending finalization of the switch configuration. We'll send out an update when we have a (new) finalized downtime window. Thanks, Stephen * Stephen Frost (sfr...@snowman.net) wrote: Greetings, As some may be aware, we are currently working with Rackspace to upgrade the PostgreSQL infrastructure systems which they graciously host for us. As part of these upgrades there will be downtime for systems hosted there. Our first planned downtime will be for ~ 2 hours on Monday, April 28th, between 1500-2000 UTC (11am-4pm US/Eastern). We have requested that the outage be towards the end of that window (1800-2000 UTC), but there is no guarantee that they'll be able to support that request. Clearly, we hope the actual downtime will be less than 2 hours. This downtime will impact all systems hosted @ Rackspace as they are upgrading the switch for us. End-user services which will be impacted include: yum.postgresql.org wiki.postgresql.org git master server (Committers only) planet.postgresql.org www.pgadmin.org, ftp.pgadmin.org media.postgresql.org commitfest.postgresql.org developer.postgresql.org (Developer personal homepages) redmine.postgresql.org jdbc.postgresql.org postgresopen.org developer.pgadmin.org (sandbox, Jenkins) babel.postgresql.org (Translation services) Redundant services (minimal impact expected): ns2.postgresql.org (other nameservers will still work) .us inbound mail relay (other MXs will still work) .us website front-end (should be removed from pool) FTP mirror (should be removed from pool) We anticipate this being the only whole-environment outage during this migration. Future outages will happen for individual systems as we migrate them to the new hardware. Thanks! Stephen
Re: [HACKERS] Different behaviour of concate() and concate operator ||
The missing bit of context is that concat() is there because early on in Postgres's life there was an effort to have a full suite of Oracle compatibility functions. If someone suggested it now they would be pushed towards making it an extension or pointed at EDB. But things like concat are the remnants of that.
Re: [HACKERS] includedir_internal headers are not self-contained
Robert Haas robertmh...@gmail.com writes: On Mon, Apr 28, 2014 at 12:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: As far as pg_xlogdump goes, I agree that symbolic fork names are probably nice, but I think the case for printing db/ts/rel OIDs as a pathname is a whole lot weaker --- to my taste, that's actually an anti-feature. I might be missing something, but, why? It's more verbose, it's not actually any more information, and in many cases it's actively misleading, because what's printed is NOT the real file name --- it omits segment numbers for instance. As a particularly egregious example, in xact_desc_commit() we print a pathname including MAIN_FORKNUM, which is a flat out lie to the reader, because what will actually get deleted is all forks. So yeah, it's an anti-feature. BTW, for the same reasons I'm less than impressed with the uses of relpath in error messages in, eg, reorderbuffer.c: relation = RelationIdGetRelation(reloid); if (relation == NULL) elog(ERROR, "could open relation descriptor %s", relpathperm(change->data.tp.relnode, MAIN_FORKNUM)); Printing anything other than the relation OID here is irrelevant, misleading, and inconsistent with our practice everywhere else. Let's not even mention the missing "not" in the message text. regards, tom lane
Re: [HACKERS] Different behaviour of concate() and concate operator ||
Greg Stark st...@mit.edu writes: The missing bit of context is that concat() is there because early on in Postgres's life there was an effort to have a full suite of Oracle compatibility functions. If someone suggested it now they would be pushed towards making it an extension or pointed at EDB. But things like concat are the remnants of that. Well, that's historical revisionism, because concat() was added in 9.1. But if it was defined this way for Oracle compatibility, that makes sense, because Oracle doesn't distinguish NULL from empty strings. So they pretty much would have to make concat() treat NULL as empty. regards, tom lane
Re: [HACKERS] allowing VACUUM to be cancelled for conflicting locks
On Mon, Apr 28, 2014 at 12:52 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Tom Lane wrote: Abhijit Menon-Sen a...@2ndquadrant.com writes: In the past, we've had situations where "everything is hung" turned out to be because of a script that ran manual VACUUM that was holding some lock. It's admittedly not a huge problem, but it might be useful if a manual VACUUM could be cancelled the way autovacuum can be. I think the real answer to that is stop using manual VACUUM. As much as I'm a fan of autovacuum, that's not always possible. Or even recommended, unless the docs changed radically in the last couple of weeks.
Re: [HACKERS] allowing VACUUM to be cancelled for conflicting locks
On 2014-04-28 09:54:49 -0400, Tom Lane wrote: Abhijit Menon-Sen a...@2ndquadrant.com writes: In the past, we've had situations where "everything is hung" turned out to be because of a script that ran manual VACUUM that was holding some lock. It's admittedly not a huge problem, but it might be useful if a manual VACUUM could be cancelled the way autovacuum can be. I think the real answer to that is stop using manual VACUUM. E.g. manually scheduling the full table vacuums to happen during low activity periods is a very good idea on busy servers. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] allowing VACUUM to be cancelled for conflicting locks
Claudio Freire klaussfre...@gmail.com writes: On Mon, Apr 28, 2014 at 12:52 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Tom Lane wrote: Abhijit Menon-Sen a...@2ndquadrant.com writes: In the past, we've had situations where "everything is hung" turned out to be because of a script that ran manual VACUUM that was holding some lock. It's admittedly not a huge problem, but it might be useful if a manual VACUUM could be cancelled the way autovacuum can be. I think the real answer to that is stop using manual VACUUM. As much as I'm a fan of autovacuum, that's not always possible. Or even recommended, unless the docs changed radically in the last couple of weeks. Actually, having just looked at the code in question, I think this whole thread is based on an obsolete assumption. AFAICS, since commit b19e4250b manual vacuum behaves exactly like autovacuum as far as getting kicked off the exclusive lock is concerned. There's certainly not any tests for autovacuum in lazy_truncate_heap() today. regards, tom lane
Re: [HACKERS] allowing VACUUM to be cancelled for conflicting locks
On Mon, Apr 28, 2014 at 6:54 AM, Tom Lane t...@sss.pgh.pa.us wrote: Abhijit Menon-Sen a...@2ndquadrant.com writes: In the past, we've had situations where "everything is hung" turned out to be because of a script that ran manual VACUUM that was holding some lock. It's admittedly not a huge problem, but it might be useful if a manual VACUUM could be cancelled the way autovacuum can be. I think the real answer to that is stop using manual VACUUM. Autovac is also going to promote itself to uninterruptible once every 150e6 transactions (by the default settings). To stop using manual vacuums is not going to be a complete cure anyway. It would be nice to know why the scripts are doing the manual vacuum. Just out of mythology, or is there an identifiable reason? Cheers, Jeff
Re: [HACKERS] allowing VACUUM to be cancelled for conflicting locks
On 2014-04-28 13:58:10 -0400, Tom Lane wrote: Claudio Freire klaussfre...@gmail.com writes: On Mon, Apr 28, 2014 at 12:52 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Tom Lane wrote: Abhijit Menon-Sen a...@2ndquadrant.com writes: In the past, we've had situations where "everything is hung" turned out to be because of a script that ran manual VACUUM that was holding some lock. It's admittedly not a huge problem, but it might be useful if a manual VACUUM could be cancelled the way autovacuum can be. I think the real answer to that is stop using manual VACUUM. As much as I'm a fan of autovacuum, that's not always possible. Or even recommended, unless the docs changed radically in the last couple of weeks. Actually, having just looked at the code in question, I think this whole thread is based on an obsolete assumption. AFAICS, since commit b19e4250b manual vacuum behaves exactly like autovacuum as far as getting kicked off the exclusive lock is concerned. There's certainly not any tests for autovacuum in lazy_truncate_heap() today. I don't think this is about the truncation thing, but about the deadlock.c/proc.c logic around DS_BLOCKED_BY_AUTOVACUUM. I.e. that an autovacuum is cancelled if user code tries to acquire a conflicting lock. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] allowing VACUUM to be cancelled for conflicting locks
On Mon, Apr 28, 2014 at 10:58 AM, Tom Lane t...@sss.pgh.pa.us wrote: Claudio Freire klaussfre...@gmail.com writes: On Mon, Apr 28, 2014 at 12:52 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Tom Lane wrote: Abhijit Menon-Sen a...@2ndquadrant.com writes: In the past, we've had situations where "everything is hung" turned out to be because of a script that ran manual VACUUM that was holding some lock. It's admittedly not a huge problem, but it might be useful if a manual VACUUM could be cancelled the way autovacuum can be. I think the real answer to that is stop using manual VACUUM. As much as I'm a fan of autovacuum, that's not always possible. Or even recommended, unless the docs changed radically in the last couple of weeks. Actually, having just looked at the code in question, I think this whole thread is based on an obsolete assumption. AFAICS, since commit b19e4250b manual vacuum behaves exactly like autovacuum as far as getting kicked off the exclusive lock is concerned. There's certainly not any tests for autovacuum in lazy_truncate_heap() today. I assumed he was talking about the SHARE UPDATE EXCLUSIVE used during the main work, not the ACCESS EXCLUSIVE one used during truncation. Cheers, Jeff
Re: [HACKERS] allowing VACUUM to be cancelled for conflicting locks
Andres Freund and...@2ndquadrant.com writes: I don't think this is about the truncation thing, but about the deadlock.c/proc.c logic around DS_BLOCKED_BY_AUTOVACUUM. I.e. that an autovacuum is cancelled if user code tries to acquire a conflicting lock. It's a bit of a stretch to claim that a manual VACUUM should be cancelled by a manual DDL action elsewhere. Who's to say which of those things should have priority? regards, tom lane
Re: [HACKERS] allowing VACUUM to be cancelled for conflicting locks
On 2014-04-28 14:05:04 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: I don't think this is about the truncation thing, but about the deadlock.c/proc.c logic around DS_BLOCKED_BY_AUTOVACUUM. I.e. that an autovacuum is cancelled if user code tries to acquire a conflicting lock. It's a bit of a stretch to claim that a manual VACUUM should be cancelled by a manual DDL action elsewhere. Who's to say which of those things should have priority? Yea, I am not that sure about the feature either. It sure would need to be optional. Often enough VACUUMs are scripted to run during off hours, for those it possibly makes sense. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] allowing VACUUM to be cancelled for conflicting locks
On Mon, Apr 28, 2014 at 11:05 AM, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: I don't think this is about the truncation thing, but about the deadlock.c/proc.c logic around DS_BLOCKED_BY_AUTOVACUUM. I.e. that an autovacuum is cancelled if user code tries to acquire a conflicting lock. It's a bit of a stretch to claim that a manual VACUUM should be cancelled by a manual DDL action elsewhere. Who's to say which of those things should have priority? The proposal was to add either a GUC, or a syntax to the vacuum command, so it would be either the DBA or the invoker of the vacuum which is the one to say. Either one does seem a reasonable place to have such a say, although perhaps not worth the effort to implement. Cheers, Jeff
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Mon, Apr 28, 2014 at 6:02 AM, Robert Haas robertmh...@gmail.com wrote: Also true. But the problem is that it is very rarely, if ever, the case that all pages are *equally* hot. On a pgbench workload, for example, I'm very confident that while there's not really any cold data, the btree roots and visibility map pages are a whole lot hotter than a randomly-selected heap page. If you evict a heap page, you're going to need it back pretty quick, because it won't be long until the random-number generator again chooses a key that happens to be located on that page. But if you evict the root of the btree index, you're going to need it back *immediately*, because the very next query, no matter what key it's looking for, is going to need that page. I'm pretty sure that's a significant difference. I emphasized leaf pages because even with master the root and inner pages are still going to be so hot as to make them constantly in cache, at least with pgbench's use of a uniform distribution. You'd have to have an absolutely enormous scale factor before this might not be the case. As such, I'm not all that worried about inner pages when performing these simple benchmarks. However, in the case of the pgbench_accounts table, each of the B-Tree leaf pages that comprise about 99.5% of the total is still going to be about six times more frequently accessed than each heap page. That's a small enough difference for it to easily go unappreciated, and yet a big enough difference for it to hurt a lot. -- Peter Geoghegan
Re: [HACKERS] includedir_internal headers are not self-contained
On Mon, Apr 28, 2014 at 1:20 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Apr 28, 2014 at 12:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: As far as pg_xlogdump goes, I agree that symbolic fork names are probably nice, but I think the case for printing db/ts/rel OIDs as a pathname is a whole lot weaker --- to my taste, that's actually an anti-feature. I might be missing something, but, why? It's more verbose, it's not actually any more information, and in many cases it's actively misleading, because what's printed is NOT the real file name --- it omits segment numbers for instance. As a particularly egregious example, in xact_desc_commit() we print a pathname including MAIN_FORKNUM, which is a flat out lie to the reader, because what will actually get deleted is all forks. Yeah, technically it's a lie, but ls <copy-and-paste-here>* is pretty handy. If you format it some other way it's annoying to reformat it. So yeah, it's an anti-feature. BTW, for the same reasons I'm less than impressed with the uses of relpath in error messages in, eg, reorderbuffer.c: relation = RelationIdGetRelation(reloid); if (relation == NULL) elog(ERROR, "could open relation descriptor %s", relpathperm(change->data.tp.relnode, MAIN_FORKNUM)); Printing anything other than the relation OID here is irrelevant, misleading, and inconsistent with our practice everywhere else. Let's not even mention the missing "not" in the message text. Uggh. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Fri, Apr 25, 2014 at 10:45 AM, Peter Geoghegan p...@heroku.com wrote: I've now done a non-limited comparative benchmark of master against the patch (once again, with usage_count starting at 6, and BM_MAX_USAGE_COUNT at 30) with a Gaussian distribution. Once again, the distribution threshold used was consistently 5.0, causing the patched pgbench to report for each test: transaction type: Custom query scaling factor: 5000 standard deviation threshold: 5.0 access probability of top 20%, 10% and 5% records: 0.68269 0.38293 0.19741 Results are available from: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/3-sec-delay-gauss/ I updated this with various changes in bgwriter configuration. Perhaps unsurprisingly, disabling the background writer entirely helps for both master and patched. It is perhaps notable that the largest difference between two comparable patch + master test runs is seen when the background writer is disabled entirely, and 32 clients. -- Peter Geoghegan
Re: [HACKERS] includedir_internal headers are not self-contained
Robert Haas robertmh...@gmail.com writes: On Mon, Apr 28, 2014 at 1:20 PM, Tom Lane t...@sss.pgh.pa.us wrote: It's more verbose, it's not actually any more information, and in many cases it's actively misleading, because what's printed is NOT the real file name --- it omits segment numbers for instance. As a particularly egregious example, in xact_desc_commit() we print a pathname including MAIN_FORKNUM, which is a flat out lie to the reader, because what will actually get deleted is all forks. Yeah, technically it's a lie, but ls <copy-and-paste-here>* is pretty handy. If you format it some other way it's annoying to reformat it. Handy for what? How often do you need to do that? (And if you do do it, how often will you remember that the filename is only approximate?) regards, tom lane
Re: [HACKERS] includedir_internal headers are not self-contained
On 2014-04-28 14:44:16 -0400, Robert Haas wrote: BTW, for the same reasons I'm less than impressed with the uses of relpath in error messages in, eg, reorderbuffer.c: relation = RelationIdGetRelation(reloid); if (relation == NULL) elog(ERROR, "could open relation descriptor %s", relpathperm(change->data.tp.relnode, MAIN_FORKNUM)); Printing anything other than the relation OID here is irrelevant, misleading, and inconsistent with our practice everywhere else. Let's not even mention the missing "not" in the message text. Uggh. I'll send a fix once I am home (~3h). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] includedir_internal headers are not self-contained
On Mon, Apr 28, 2014 at 2:50 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Apr 28, 2014 at 1:20 PM, Tom Lane t...@sss.pgh.pa.us wrote: It's more verbose, it's not actually any more information, and in many cases it's actively misleading, because what's printed is NOT the real file name --- it omits segment numbers for instance. As a particularly egregious example, in xact_desc_commit() we print a pathname including MAIN_FORKNUM, which is a flat out lie to the reader, because what will actually get deleted is all forks. Yeah, technically it's a lie, but ls <copy-and-paste-here>* is pretty handy. If you format it some other way it's annoying to reformat it. Handy for what? How often do you need to do that? (And if you do do it, how often will you remember that the filename is only approximate?) *shrug* I think if you're looking at the output of xact_desc_commit() and you *don't* know that the filenames are approximate (or at least that you should Use The Source, Luke) then you're probably in way over your head. I have to admit it's been a few years since I've had to play with WAL_DEBUG, so I don't really remember what I was trying to do. But I don't see any real argument that three slash-separated numbers will be more useful to somebody who has to dig through this than a pathname, even an approximate pathname, and I think people wanting to figure out approximately where they need to look to find the data affected by the WAL record will be pretty common among people decoding WAL records. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] includedir_internal headers are not self-contained
Robert Haas robertmh...@gmail.com writes: I have to admit it's been a few years since I've had to play with WAL_DEBUG, so I don't really remember what I was trying to do. But I don't see any real argument that three slash-separated numbers will be more useful to somebody who has to dig through this than a pathname, even an approximate pathname, and I think people wanting to figure out approximately where they need to look to find the data affected by the WAL record will be pretty common among people decoding WAL records. Meh. I still think it's a bad idea to have CATALOG_VERSION_NO getting compiled into libpgcommon.a, where there will be no way to cross-check that it matches anything. But I guess I'm losing this argument. However, we've absolutely got to get that reference out of common/relpath.h. The usage of RelFileNode there is problematic as well. How about we change common/relpath.[hc] to export a single version of relpath() that takes its arguments as separate plain OIDs, and then make the other versions wrappers that are only declared in some backend header? The wrappers could possibly be macros, allowing things like pg_xlogdump to keep using them as long as they didn't mind importing backend headers. (Though for the RelFileNode case this would imply multiple evaluation of the macro argument, so maybe it's not such a great idea.) regards, tom lane
Re: [HACKERS] bgworker crashed or not?
On 28/04/14 16:27, Robert Haas wrote: On Thu, Apr 24, 2014 at 1:39 AM, Craig Ringer cr...@2ndquadrant.com wrote: On 04/17/2014 08:35 AM, Craig Ringer wrote: I've just noticed that the bgworker control interfaces do not honour bgw.bgw_restart_time = BGW_NEVER_RESTART if you exit with status zero. This means that it's not simply a problem where you can't say "restart me if I crash, but not if I exit normally". You also can't even say "never restart me at all", because BGW_NEVER_RESTART seems to really mean BGW_NO_RESTART_ON_CRASH. This _needs_ fixing before 9.4. It seems we have consensus on what to do about this, but what we haven't got is a patch. If you mean the consensus that exit status 0 should mean permanent stop, then I think the patch can be as simple as attached. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b573fd8..be211f0 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2848,7 +2848,10 @@ CleanupBackgroundWorker(int pid,
 		if (!EXIT_STATUS_0(exitstatus))
 			rw->rw_crashed_at = GetCurrentTimestamp();
 		else
+		{
 			rw->rw_crashed_at = 0;
+			rw->rw_terminate = true;
+		}
 
 		/*
 		 * Additionally, for shared-memory-connected workers, just like a
Re: [HACKERS] shm_mq inconsistent behavior of SHM_MQ_DETACHED
On 28/04/14 15:36, Robert Haas wrote: On Tue, Apr 22, 2014 at 9:55 AM, Petr Jelinek p...@2ndquadrant.com wrote: But if I do first receive after detach like in this sequence: P1 - set_sender P1 - attach P2 - set_receiver P2 - attach P1 - send P1 - send P1 - detach P2 - receive I get SHM_MQ_DETACHED on the receiver even though there are messages in the ring buffer. That's a bug. I'm thinking that the problem is really revolves around shm_mq_wait_internal(). It returns true if the queue is attached but not detached, and false if either the detach has already happened, or if we establish via the background worker handle that it will never come. But in the case of receiving, we want to treat attached-then-detached as a success case, not a failure case. Can you see if the attached patch fixes it? Yes, the patch fixes it for me. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] includedir_internal headers are not self-contained
On 2014-04-28 13:20:47 -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Apr 28, 2014 at 12:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: As far as pg_xlogdump goes, I agree that symbolic fork names are probably nice, but I think the case for printing db/ts/rel OIDs as a pathname is a whole lot weaker --- to my taste, that's actually an anti-feature. I might be missing something, but, why? It's more verbose, it's not actually any more information, and in many cases it's actively misleading, because what's printed is NOT the real file name --- it omits segment numbers for instance. As a particularly egregious example, in xact_desc_commit() we print a pathname including MAIN_FORKNUM, which is a flat out lie to the reader, because what will actually get deleted is all forks. So yeah, it's an anti-feature. BTW, for the same reasons I'm less than impressed with the uses of relpath in error messages in, eg, reorderbuffer.c: relation = RelationIdGetRelation(reloid); if (relation == NULL) elog(ERROR, "could open relation descriptor %s", relpathperm(change->data.tp.relnode, MAIN_FORKNUM)); Printing anything other than the relation OID here is irrelevant, misleading, and inconsistent with our practice everywhere else. I don't think it's really comparable to the other scenarios. We should print the oid, just as relation_open() does, but the filenode is also rather helpful here. How about the attached? If we had a version of relpath() that didn't require the fsm, I'd use it. If you prefer, I'd be happy enough to replace it with spcNode/dbNode/relNode. That's more than sufficient here. Let's not even mention the missing "not" in the message text. That's clearly wrong.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 4493930..0b9731b 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1362,14 +1362,14 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
 				change->data.tp.oldtuple == NULL)
 				continue;
 			else if (reloid == InvalidOid)
-				elog(ERROR, "could not lookup relation %s",
+				elog(ERROR, "could not map filenode %s to its oid",
 					 relpathperm(change->data.tp.relnode, MAIN_FORKNUM));
 
 			relation = RelationIdGetRelation(reloid);
 
 			if (relation == NULL)
-				elog(ERROR, "could open relation descriptor %s",
-					 relpathperm(change->data.tp.relnode, MAIN_FORKNUM));
+				elog(ERROR, "could not open relation with oid %u, filenode %s",
+					 reloid, relpathperm(change->data.tp.relnode, MAIN_FORKNUM));
 
 			if (RelationIsLogicallyLogged(relation))
 			{
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
Yeah. I'm still not exactly convinced that custom-scan will ever allow independent development of new plan types (which, with all due respect to Robert, is what it was being sold as last year in Ottawa). But I'm not opposed in principle to committing it, if we can find a way to have a cleaner API for things like setrefs.c. It seems like late-stage planner processing in general is an issue for this patch (createplan.c and subselect.c are also looking messy). EXPLAIN isn't too great either. I'm not sure exactly what to do about those cases, but I wonder whether things would get better if we had the equivalent of expression_tree_walker/mutator capability for plan nodes. The state of affairs in setrefs and subselect, at least, is a bit reminiscent of the bad old days when we had lots of different bespoke code for traversing expression trees. Hmm. If we have something like expression_tree_walker/mutator for plan nodes, we can pass a walker/mutator function's pointer instead of exposing static functions that take on the recursive jobs. If a custom-plan provider (one that has sub-plans) got a callback with the walker/mutator pointer, all it has to do for its sub-plans is call this new plan-tree walking support routine with the supplied walker/mutator. That seems to me a simpler design than what I did. I tried to code similar walker/mutator functions on the plan-node tree; however, it was not possible to implement these routines simply enough, because the jobs of the walker/mutator functions are not uniform, so the caller side must also have large switch-case branches. I picked up setrefs.c for my investigation. set_plan_refs() applies fix_scan_list() on the expression tree appearing in the plan node if it is derived from Scan; however, it also applies set_join_references() for subclasses of Join, or set_dummy_tlist_references() for some other plan nodes. It implies that the walker/mutator functions for a Plan node have to apply different operations according to the type of Plan node.
I'm not certain how many different forms are needed. (In addition, set_plan_refs() usually behaves like a walker, but sometimes behaves as a mutator, e.g. for a trivial subquery.) I'm expecting a function like the one below. It allows the plan_walker function to be called for each plan node, and also allows the expr_walker function to be called for each expression node on the plan node. bool plan_tree_walker(Plan *plan, bool (*plan_walker) (), bool (*expr_walker) (), void *context) I'd like to hear whether some other form would be better for implementing this routine. One alternative idea to give a custom-plan provider a chance to handle its subplans is to give it function pointers (1) to handle recursion of the plan tree and (2) to set up the backend's internal state. In case of setrefs.c, set_plan_refs() and fix_expr_common() are the minimum necessity for extensions. It also removes the need to export static functions. What do you think? -- KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On 4/28/14, 8:04 AM, Robert Haas wrote: On Mon, Apr 21, 2014 at 6:38 PM, Jim Nasby j...@nasby.net wrote: I feel that if there is no memory pressure, frankly it doesn't matter much about what gets out and what not. The case I am specifically targeting is when the clocksweep gets to move about a lot i.e. high memory pressure workloads. Of course, I may be totally wrong here. Well, there's either memory pressure or there isn't. If there isn't then it's all moot *because we're not evicting anything*. I don't think that's really true. A workload can fit within shared_buffers at some times and spill beyond it at others. Every time it fits within shared_buffers for even a short period of time, the reference count of any buffer that's not ice-cold goes to 5 and we essentially lose all knowledge of which buffers are relatively hotter. Then, when we spill out again, evictions are random. That's a separate problem, but yes, even though we're not evicting something we can still end up with every buffer marked as equally important. OSes handle this by splitting pages between active and inactive, and maintaining a relative balance between the two (actually a bit more complex because there's a separate inactive/dirty pool). In our case this could maybe be handled by simply not incrementing counts when there's no eviction... but I'm more a fan of separate pools/clocks, because that means you can do things like a LFU for active and an LRU for inactive. -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] Problem with displaying wide tables in psql
Please fix this compiler warning. I think it came from this patch.

print.c: In function ‘print_aligned_vertical’:
print.c:1354:10: error: pointer targets in passing argument 1 of ‘strlen_max_width’ differ in signedness [-Werror=pointer-sign]
          encoding);
          ^
print.c:126:12: note: expected ‘unsigned char *’ but argument is of type ‘char *’
 static int strlen_max_width(unsigned char *str, int *target_width, int encoding);
            ^
Re: [HACKERS] Problem with displaying wide tables in psql
On Tue, Apr 29, 2014 at 12:37 PM, Sergey Muraviov sergey.k.murav...@gmail.com wrote: 2014-04-29 5:52 GMT+04:00 Peter Eisentraut pete...@gmx.net: Please fix this compiler warning. I think it came from this patch.

print.c: In function 'print_aligned_vertical':
print.c:1354:10: error: pointer targets in passing argument 1 of 'strlen_max_width' differ in signedness [-Werror=pointer-sign]
          encoding);
          ^
print.c:126:12: note: expected 'unsigned char *' but argument is of type 'char *'
 static int strlen_max_width(unsigned char *str, int *target_width, int encoding);
            ^

fixed

Your patch was committed by Greg not so long ago; you should send a patch for this warning rebased on the latest master branch commit :) -- Michael
Re: [HACKERS] Problem with displaying wide tables in psql
rebased

2014-04-29 7:43 GMT+04:00 Michael Paquier michael.paqu...@gmail.com: On Tue, Apr 29, 2014 at 12:37 PM, Sergey Muraviov sergey.k.murav...@gmail.com wrote: 2014-04-29 5:52 GMT+04:00 Peter Eisentraut pete...@gmx.net: Please fix this compiler warning. I think it came from this patch.

print.c: In function 'print_aligned_vertical':
print.c:1354:10: error: pointer targets in passing argument 1 of 'strlen_max_width' differ in signedness [-Werror=pointer-sign]
          encoding);
          ^
print.c:126:12: note: expected 'unsigned char *' but argument is of type 'char *'
 static int strlen_max_width(unsigned char *str, int *target_width, int encoding);
            ^

fixed

Your patch was committed by Greg not so long ago; you should send a patch for this warning rebased on the latest master branch commit :) -- Michael

-- Best regards, Sergey Muraviov

diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c
index 0eb..b2f56a3 100644
--- a/src/bin/psql/print.c
+++ b/src/bin/psql/print.c
@@ -1350,8 +1350,7 @@ print_aligned_vertical(const printTableContent *cont, FILE *fout)
 {
 	int swidth, twidth = hwidth + 1;
 	fputs(hline ? format->header_nl_left : "", fout);
-	strlen_max_width((char *) hlineptr[hline].ptr, &twidth,
-					 encoding);
+	strlen_max_width(hlineptr[hline].ptr, &twidth, encoding);
 	fprintf(fout, "%-s", hlineptr[hline].ptr);
 	swidth = hwidth - twidth;
Re: [HACKERS] Different behaviour of concate() and concate operator ||
On Monday, 28 April 2014 10:56 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: So they pretty much would have to make concat() treat NULL as empty.

Could it be problematic if, in my production environment, I update the pg_operator catalog entry for || to call the concat() function instead of textcat()? For example:

 oprname | oprleft | oprright | oprresult | oprcode
---------+---------+----------+-----------+---------
 ||      |      25 |       25 |        25 | concat
(1 row)

The result would be:

test=# select 'abc' || NULL;
 ?column?
----------
 abc
(1 row)

Thanks for your help. Regards, Amul Sul
Re: [HACKERS] Proposal for Merge Join for Non '=' Operators
On 10 April 2014 14:21, I wrote: I shall perform some more tests; for that I need to hack the code some more, and I will post the results soon.

Test scenario:
Create table t1 (a int, b int);
Create table t2 (a int, b int);
Random records inserted into t1 and t2, as per the attached files (10K records inserted in both tables).
Performance is measured for the query: select count(*) from t1, t2 where t1.b < t2.b;

Test result:
Nested Loop Join: Time: 36038.842 ms
Merge Join: Time: 19774.975 ms
Number of records selected: 42291979

I have done some more testing with an index and multiple conditions.

Test scenario:
Create table t1 (a int, b int);
Create table t2 (a int, b int);
Create index t1_idx t1(b);
Create index t2_idx t2(b);
Query: select count(*) from t1, t2 where t1.b < t2.b and t1.b < 12000;

Test result:
Nested Loop Join with index scan: 1653.506 ms
Sort merge join (seq scan): 610.257 ms

In both of the above scenarios, the sort merge join for the < operator is faster than the NLJ (using either a seq scan or an index scan). Any suggestions for other performance scenarios are welcome.

Thanks & Regards, Dilip Kumar