Re: [HACKERS] [COMMITTERS] pgsql: Add restart_after_crash GUC.

2010-07-26 Thread Fujii Masao
On Tue, Jul 20, 2010 at 9:47 AM, Robert Haas  wrote:
> Log Message:
> ---
> Add restart_after_crash GUC.

In postgresql.conf.sample, booleans are written as on/off, but true/false
is used for exit_on_error and restart_after_crash. Sorry, I overlooked that
inconsistency when reviewing the original patch. The attached patch fixes it.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


true_false_to_on_off_0727.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous replication

2010-07-26 Thread Fujii Masao
On Wed, Jul 21, 2010 at 4:36 PM, Fujii Masao  wrote:
>> I was actually hoping to see a patch for these things first, before any of
>> the synchronous replication stuff. Eliminating the polling loops is
>> important, latency will be laughable otherwise, and it will help the
>> synchronous case too.
>
> First, note that the poll loops in the backend and walreceiver don't exist
> without the synchronous replication stuff.
>
> Yeah, I'll start with the change to the poll loop in the walsender. I'm
> thinking that we should make the backend signal the walsender to send the
> outstanding WAL immediately, as the synchronous replication patch I
> submitted last year did. I use a signal here because the walsender needs to
> wait for the request from the backend and the ACK message from the standby
> *concurrently* in synchronous replication. If we used a semaphore instead
> of a signal, the walsender would not be able to respond to the ACK
> immediately, which would also degrade performance.
>
> The problem with this idea is that a signal can be sent for every
> transaction commit. I'm not sure whether this frequent signaling really
> harms replication performance. When I benchmarked the previous synchronous
> replication patch, which was based on this idea, AFAIR the results showed
> no impact from the signaling. Thoughts? Do you have a better idea?

The attached patch changes the backend so that it signals the walsender to
wake up from its sleep and send WAL immediately. It doesn't include any
other synchronous replication stuff.

The signal is sent right after a COMMIT, PREPARE TRANSACTION,
COMMIT PREPARED or ABORT PREPARED record has been fsync'd.

To suppress redundant signaling, I added a flag which indicates whether the
walsender is ready to send WAL up to the currently-fsync'd location. Only
when the flag is false does the backend set it to true and send the signal
to the walsender; when the flag is already true, no signal needs to be sent.
The flag is reset to false right before the walsender sends WAL.

The code is also available in my git repository:
git://git.postgresql.org/git/users/fujii/postgres.git
branch: wakeup-walsnd

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


change_poll_loop_in_walsender_0727.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous replication

2010-07-26 Thread Fujii Masao
On Mon, Jul 26, 2010 at 8:25 PM, Robert Haas  wrote:
> On Mon, Jul 26, 2010 at 6:48 AM, Marko Tiikkaja
>  wrote:
>> On 7/26/10 1:44 PM +0300, Fujii Masao wrote:
>>>
>>> On Mon, Jul 26, 2010 at 6:36 PM, Yeb Havinga  wrote:

 I wasn't entirely clear. My suggestion was to have only

   acknowledge_commit = {no|recv|fsync|replay}

 instead of

   replication_mode = {async|recv|fsync|replay}
>>>
>>> Okay, I'll change the patch accordingly.
>>
>> For what it's worth, I think replication_mode is a lot clearer.
>> Acknowledge_commit sounds like it would do something similar to
>> asynchronous_commit.
>
> I agree.

Given the result of the vote, I'll leave the parameter name
"replication_mode" as it is.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous replication

2010-07-26 Thread Fujii Masao
On Tue, Jul 27, 2010 at 12:36 PM, Joshua Tolley  wrote:
> Perhaps I'm hijacking the wrong thread for this, but I wonder if the quorum
> idea is really the best thing for us. I've been thinking about Oracle's way of
> doing things[1]. In short, there are three different modes: availability,
> performance, and protection. "Protection" appears to mean that at least one
> standby has applied the log; "availability" means at least one standby has
> received the log info (it doesn't specify whether that info has been fsynced
> or applied, but presumably does not mean "applied", since it's distinct from
> "protection" mode); "performance" means replication is asynchronous. I'm not
> sure this method is perfect, but it might be simpler than the quorum behavior
> that has been considered, and adequate for actual use cases.

In my case, I'd like to set up one synchronous standby in a nearby rack for
high availability, and one asynchronous standby at a remote site for disaster
recovery. Can Oracle's approach cover that case?

"Availability" mode with two standbys might create a somewhat similar
situation: since the ACK from the nearby standby usually arrives first, the
nearby standby effectively acts as the synchronous one and the remote one as
asynchronous. But the ACK from the remote standby can arrive first, so it's
not guaranteed that the nearby standby has received the log info before the
transaction commit returns a "success" to the client. In that case, we would
have to fail over to the remote standby even if it's not under the control of
the clusterware. This is a problem for me.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSL cipher and version

2010-07-26 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jul 26, 2010 at 9:57 AM, Dave Page  wrote:
>> On Mon, Jul 26, 2010 at 2:49 PM, Robert Haas  wrote:
>>> Any objections to me committing this?
>> 
>> Might wanna fix this first:
>> 
>> +PG_FUNCTION_INFO_V1(ssl_veresion);
>>                                         

> Wow.  It works remarkably well without fixing that, but I'll admit
> that does seem lucky.

Well, it's got no arguments, which is the main thing that works
differently in call protocol V1.  I think you'd find that the
PG_RETURN_NULL case doesn't really work though ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous replication

2010-07-26 Thread Joshua Tolley
On Thu, Jul 22, 2010 at 10:37:12AM +0200, Yeb Havinga wrote:
> Fujii Masao wrote:
> Initially I also expected the quorum to behave as described by
> Aidan/option 2. Also, IMHO the name "quorum" on its own is a bit bare, like
> having "maximum" without saying a maximum of what.
>
> quorum_min_sync_standbys
> quorum_max_sync_standbys

Perhaps I'm hijacking the wrong thread for this, but I wonder if the quorum
idea is really the best thing for us. I've been thinking about Oracle's way of
doing things[1]. In short, there are three different modes: availability,
performance, and protection. "Protection" appears to mean that at least one
standby has applied the log; "availability" means at least one standby has
received the log info (it doesn't specify whether that info has been fsynced
or applied, but presumably does not mean "applied", since it's distinct from
"protection" mode); "performance" means replication is asynchronous. I'm not
sure this method is perfect, but it might be simpler than the quorum behavior
that has been considered, and adequate for actual use cases.

[1]
http://download.oracle.com/docs/cd/B28359_01/server.111/b28294/protection.htm#SBYDB02000
alternatively, http://is.gd/dLkq4

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com


signature.asc
Description: Digital signature


Re: [HACKERS] dynamically allocating chunks from shared memory

2010-07-26 Thread Robert Haas
On Mon, Jul 26, 2010 at 3:16 PM, Kevin Grittner
 wrote:
> Robert Haas  wrote:
>
>> I actually think that memory management is one of the weakest
>> elements of our current architecture
>
> I'm actually pretty impressed by the memory contexts in PostgreSQL.
> Apparently I'm not alone in that, either; a paper by Hellerstein,
> Stonebraker, and Hamilton[1] has this in section 7.2 (Memory
> Allocator):
>
> "The interested reader may want to browse the open-source PostgreSQL
> code. This utilizes a fairly sophisticated memory allocator."
>
> I think the problem here is that we don't extend that sophistication
> to shared memory.

That's one aspect of it, and the other is that we don't have much
global coordination about how we use it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamically allocating chunks from shared memory

2010-07-26 Thread Kevin Grittner
Robert Haas  wrote:
 
> I actually think that memory management is one of the weakest
> elements of our current architecture
 
I'm actually pretty impressed by the memory contexts in PostgreSQL. 
Apparently I'm not alone in that, either; a paper by Hellerstein,
Stonebraker, and Hamilton[1] has this in section 7.2 (Memory
Allocator):
 
"The interested reader may want to browse the open-source PostgreSQL
code. This utilizes a fairly sophisticated memory allocator."
 
I think the problem here is that we don't extend that sophistication
to shared memory.
 
-Kevin

[1] Joseph M. Hellerstein, Michael Stonebraker and James Hamilton. 2007.
Architecture of a Database System. Foundations and Trends(R) in Databases,
Vol. 1, No. 2 (2007), 141-259.
http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamically allocating chunks from shared memory

2010-07-26 Thread Robert Haas
On Mon, Jul 26, 2010 at 1:50 PM, Markus Wanner  wrote:
> Note however, that a thread based design doesn't have this problem *at all*.
> Memory generally is shared (between threads) and you can dynamically
> allocate more or less (until Linux' OOM killer hits you.. yet another
> story). The OS reuses memory you don't currently need even for other
> applications.
>
> Users as well as developers know the threaded model (arguably, much better
> than the process based one). So that's what we get compared to. And what
> developers (including me) are used to.

I'm sort of used to the process model, myself, but I may be in the minority.

> I think we are getting by with fixed allocations at the moment, because we
> did a lot to get by with it. By working around these limitations.
>
> However, that's just my thinking. Thank you for your inputs.

I completely agree with you that fixed allocations suck.  We're just
disagreeing (hopefully, in a friendly and collegial fashion) about
what to do about it.

I actually think that memory management is one of the weakest elements
of our current architecture, though I think for somewhat different
reasons than what you're thinking about.  Besides the fact that we
have various smaller pools of dynamically shared memory (e.g. a
separate ring of buffers for each SLRU), I'm also unhappy about some
of the things we do with backend-private memory, work_mem being the
biggest culprit by far, because it's very difficult for the DBA to set
the knobs in a way that uses all of the memory he wants to allocate to
the database efficiently, with no overruns and none left over.  The case
where you can count on the database and all of your temporary files,
etc. to fit in RAM is really an exceptional case: in general, you need
to assume that there will be more demand for memory than there will be
memory available, and as much as possible you want the system (rather
than the user) to decide how it should optimally be allocated.  The
query planner and executor actually do have most of what is needed to
execute queries using more or less memory, but they lack the global
intelligence needed for intelligent decision-making.  Letting the OS
buffer cache rather than the PG buffer cache handle most of the
system's memory helps, but it's not a complete solution.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamically allocating chunks from shared memory

2010-07-26 Thread Markus Wanner

Hi,

On 07/26/2010 07:16 PM, Robert Haas wrote:

Of course, there are other parts of the system (a whole bunch of them)
that use shared memory also, and perhaps some of those could be
modified to use the dynamic allocator as well.  But they're getting by
without it now, so maybe they don't really need it.  The SLRU stuff, I
think, works more or less like shared buffers (so you have the same
set of issues) and I think most of the other users are allocating
small, fixed-size chunks.


Yeah, I see your point(s).

Note however, that a thread based design doesn't have this problem *at 
all*. Memory generally is shared (between threads) and you can 
dynamically allocate more or less (until Linux' OOM killer hits you.. 
yet another story). The OS reuses memory you don't currently need even 
for other applications.


Users as well as developers know the threaded model (arguably, much 
better than the process based one). So that's what we get compared to. 
And what developers (including me) are used to.


I think we are getting by with fixed allocations at the moment, because 
we did a lot to get by with it. By working around these limitations.


However, that's just my thinking. Thank you for your inputs.

Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamically allocating chunks from shared memory

2010-07-26 Thread Robert Haas
On Mon, Jul 26, 2010 at 12:51 PM, Markus Wanner  wrote:
>> Dynamically allocating out of a 2MB
>> segment gives up most of that flexibility.
>
> Absolutely, that's why I'd like to see other modules that use the dynamic
> allocator. The more the better.

Right, I agree.  The problem is that I don't think they can.  The
elephant in the room is shared_buffers, which I believe to be
typically BY FAR the largest consumer of shared memory.  It would be
absolutely fantastic if we had a shared_buffers implementation that
could free up unused buffers when they're not needed, or add more when
required.  But there are several reasons why I don't believe that will
ever happen.  One, much of the code that uses shared_buffers relies on
shared_buffers being located at a fixed memory address on a contiguous
chunk, and it's hard to see how we could change that assumption
without sacrificing performance.  Two, the overall size of the shared
memory arena is largely dependent on the size of shared_buffers, so
unless you also have the ability to resize the arena on the fly (which
is well-nigh impossible with our current architecture, and maybe
with any architecture), resizing shared_buffers doesn't actually add
that much flexibility.  Three, the need for shared buffers is elastic
rather than absolute: stealing a few shared buffers for a defined
purpose (like sending imessages) is perfectly reasonable, but it's
rarely going to be a good idea for the buffer manager to proactively
free up memory just in case some other part of the system might need
some.  If you have a system that normally has 4GB of shared buffers
and some other module borrows 100MB and then returns it, the system
will just cache less data while that memory is in use and then start
right back up caching more again once it's returned.  That's very
nice, and it's hard to see how else to achieve that result.

Of course, there are other parts of the system (a whole bunch of them)
that use shared memory also, and perhaps some of those could be
modified to use the dynamic allocator as well.  But they're getting by
without it now, so maybe they don't really need it.  The SLRU stuff, I
think, works more or less like shared buffers (so you have the same
set of issues) and I think most of the other users are allocating
small, fixed-size chunks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamically allocating chunks from shared memory

2010-07-26 Thread Markus Wanner

Hi,

On 07/26/2010 06:33 PM, Robert Haas wrote:

It would be nice to think, for example, that this could be used as
infrastructure for parallel query to stream results back from worker
processes to the backend connected to the user.  If you're using 16
processors to concurrently scan 16 partitions of an appendrel and
stream those results back to the master


Now, *that* sounds like music to my ears ;-)

Or put another way: yes, I think imessages and the bgworker 
infrastructure stuff could enable or at least help that goal.



Dynamically allocating out of a 2MB
segment gives up most of that flexibility.


Absolutely, that's why I'd like to see other modules that use the 
dynamic allocator. The more the better.



What I think will end up happening here is that you'll always have to
size the segment used by the dynamic allocator considerably larger
than the amount of memory you expect to actually be used, so that
performance doesn't go into the toilet when it fills up.  As Markus
pointed out upthread, you'll always need some hard limit on the amount
of space that imessages can use, but you can make that limit much
larger if it's not reserved for a single purpose.  If you use the
"temporarily allocated shared buffers" method, then you could set the
default limit to something like "64MB, but not more than 1/8th of
shared buffers".


I've been thinking about such rules as well. They quickly get more 
complex if you begin to take OOM situations and their counter-measures 
into account.


In a way, fixing every separate pool to its specific size is just the 
very simplest rule-set I can think of. The dynamic allocator buys you 
more flexibility, but choosing good limits and rules between the 
sub-systems is another issue.


Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Functional dependencies and GROUP BY

2010-07-26 Thread Alex Hunsaker
On Sat, Jul 24, 2010 at 06:23, Peter Eisentraut  wrote:

> Another open question I thought of was whether we should put the
> dependency record on the pg_index row, or the pg_constraint row, or
> perhaps the pg_class row.  Right now, it is using pg_index, because that
> was easiest to code up, but I suspect that once we have not-null
> constraints in pg_constraint, it will be more consistent to make all
> dependencies go against pg_constraint rather than a mix of several
> catalogs.

I think for primary keys pg_index is OK.  However for the not-null
case we have to use pg_constraint... So given that we end up having to
code that anyways, it seems like it will end up being
cleaner/consistent to always use the pg_constraint row(s).  So +1 for
using pg_constraint instead of pg_index from me.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamically allocating chunks from shared memory

2010-07-26 Thread Markus Wanner

Hi,

On 07/26/2010 04:31 PM, Alvaro Herrera wrote:

Excerpts from Robert Haas's message of lun jul 26 08:52:46 -0400 2010:

Here's another idea.  Instead of making imessages use an SLRU, how
about having it steal pages from shared_buffers?  This would require
segmenting messages into small enough chunks that they'd fit, but the
nice part is that it would avoid the need to have a completely
separate shared memory arena.  Ideally, we'd make the infrastructure
general enough that things like SLRU could use it also; and get rid of
or reduce in size some of the special-purpose chunks we're now
allocating.


To me that sounds like solving the same kind of problem for every module 
separately and somewhat differently. I tend to like general solutions 
(often too much, but that's another story), and to me it still seems a 
completely dynamic memory allocator solves that generically (and way 
more elegantly than 'stealing pages' sounds).



Right
now we allocate a single large arena, and the lot of shared_buffers,
SLRU pools, locking objects, etc are all allocated from there.


Uh.. they all allocate from different, statically sized pools, don't they?


If we
want another 2 MB for "dynamic shmem", we'd just allocate 2 MB more in
that large arena and give those to this new code.


That's how it could work if we used a dynamic allocator. But currently, 
if I understand correctly, once the shared_buffers pool is full, it 
cannot steal memory from the SLRU pools. Or am I mistaken?


Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamically allocating chunks from shared memory

2010-07-26 Thread Robert Haas
On Mon, Jul 26, 2010 at 10:31 AM, Alvaro Herrera
 wrote:
> Excerpts from Robert Haas's message of lun jul 26 08:52:46 -0400 2010:
>> Here's another idea.  Instead of making imessages use an SLRU, how
>> about having it steal pages from shared_buffers?  This would require
>> segmenting messages into small enough chunks that they'd fit, but the
>> nice part is that it would avoid the need to have a completely
>> separate shared memory arena.  Ideally, we'd make the infrastructure
>> general enough that things like SLRU could use it also; and get rid of
>> or reduce in size some of the special-purpose chunks we're now
>> allocating.
>
> What's the problem you see with "another shared memory arena"?  Right
> now we allocate a single large arena, and the lot of shared_buffers,
> SLRU pools, locking objects, etc are all allocated from there.  If we
> want another 2 MB for "dynamic shmem", we'd just allocate 2 MB more in
> that large arena and give those to this new code.

But that's not a very flexible design.  If you discover that you need
3MB instead of 2MB, you get to restart the entire cluster.  If you
discover that you need 1MB instead of 2MB, you get to either restart
the entire cluster, or waste 1MB of shared memory.  And since actual
usage will almost certainly fluctuate, you'll almost certainly be
wasting some shared memory that could otherwise be used for other
purposes some of the time.  Now, granted, we have this problem already
today, and granted also, 2MB is not an enormous amount of memory on
today's machines.  If we really think that 2MB will always be adequate
for every purpose for which we wish to use unicast messaging, then
perhaps it's OK, but I'm not convinced that's true.

It would be nice to think, for example, that this could be used as
infrastructure for parallel query to stream results back from worker
processes to the backend connected to the user.  If you're using 16
processors to concurrently scan 16 partitions of an appendrel and
stream those results back to the master, will 128kB/backend be enough
memory to avoid pipeline stalls?  What if there's replication going on
at the same time?  What if there's other concurrent activity that also
uses imessages?  Or even better, what if there's other concurrent
activity that uses the dynamic allocator but NOT imessages?  If the
point of having a dynamic allocator is that it's eventually going to
be used by lots of different subsystems, then we had better have a
fairly high degree of confidence that it actually will, but in fact
we've made very little effort to characterize who the other users
might be and whether the stated implementation limitations will be
adequate for them.  Frankly, I doubt it.  One of the major reasons why
malloc() is so powerful is that you don't have to decide in advance
how much memory you're going to need, as you would if you put the
structure in the data segment.  Dynamically allocating out of a 2MB
segment gives up most of that flexibility.

What I think will end up happening here is that you'll always have to
size the segment used by the dynamic allocator considerably larger
than the amount of memory you expect to actually be used, so that
performance doesn't go into the toilet when it fills up.  As Markus
pointed out upthread, you'll always need some hard limit on the amount
of space that imessages can use, but you can make that limit much
larger if it's not reserved for a single purpose.  If you use the
"temporarily allocated shared buffers" method, then you could set the
default limit to something like "64MB, but not more than 1/8th of
shared buffers".  Since the memory won't get used unless it's needed,
you don't really have to care whether a particular installation is
likely to need some, none, or all of that; whereas if you're
allocating nailed-down memory, you're going to want a much smaller
default - a couple of MB, at most.  Furthermore, if you do happen to
be running on a 64GB machine with 8GB of shared_buffers and 64MB isn't
adequate, you can easily make it possible to bump that value up by
changing a GUC and hitting reload.  With the "nailed-down shared
memory" approach, you're locked into whatever you decide at postmaster
start.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch (for 9.1) string functions

2010-07-26 Thread Merlin Moncure
On Mon, Jul 26, 2010 at 11:07 AM, Robert Haas  wrote:
> On Mon, Jul 26, 2010 at 10:39 AM, Merlin Moncure  wrote:
>> It was absolutely a good decision because it prevented type inference
>> in ways that were ambiguous or surprising (for a canonical case see:
>> http://www.mail-archive.com/pgsql-gene...@postgresql.org/msg93224.html).
>>
>> || operator is still pretty tolerant in the 8.3+ world.
>> select interval_col || bool_col; -- error
>> select interval_col::text || bool_col; -- text concatenation
>> select text_col || interval_col || bool_col; -- text concatenation
>>
>> variadic text would require text casts on EVERY non 'unknown' argument
>> which drops it below the threshold of usefulness IMO -- it would be
>> far stricter than vanilla || concatenation.  Ok, pure bikeshed here
>> (shutting my trap now!), but concat() is one of those wonder functions
>> that you want to make as usable and terse as possible.  I don't see
>> the value in making it overly strict.
>
> I'm just very skeptical that we should give our functions argument
> types that are essentially fantasy.  CONCAT() doesn't concatenate
> integers or intervals or boxes: it concatenates strings, and only
> strings.  Surely the right fix, if our casting rules are too
> restrictive, is to fix the casting rules; not to start adding a bunch
> of kludgery in every function we define.
>
> The problem with your canonical example (and the other examples of
> this type I've seen) is that they are underspecified.  Given two
> identically-named operators, one of which accepts types T1 and T2, and
> the other of which accepts types T3 and T4, what is one to do with T1
> OP T4?  You can cast T1 to T3 and call the first operator or you can
> cast T4 to T2 and call the second one, and it's really not clear which
is right, so you had better throw an error.  The same applies to
> functions: given foo(text) and foo(date) and the call
> foo('2010-04-15'), you had better complain, or you may end up with a
> very sad user.  But sometimes our current casting rules require casts
> in situations where they don't seem necessary, such as
> LPAD(integer_column, 5, '0').  That's not ambiguous because there's
> only one definition of LPAD, and the chances that the user will be
> unpleasantly surprised if you call it seem quite low.
>
> In other words, this problem is not unique to CONCAT().

shoot, can't resist :-).

Are the casting rules broken? If you want to do lpad w/o casts for
integers, you can define it explicitly -- feature, not bug.  You can
basically do this for any function with fixed arguments -- either in
userland or core.  lpad(int) actually introduces a problem case with
the minus sign, possibly requiring application-specific intervention,
so things are probably correct exactly as they are.  Casting rules
need to be tight because if they are not they can introduce
independent behaviors when you underspecify as you noted above.
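
For instance, a userland definition along these lines (just an illustrative
sketch, not from any patch) avoids the cast at the call site:

create or replace function lpad(int, int, text) returns text as
$$ select pg_catalog.lpad($1::text, $2, $3) $$ language sql immutable;

select lpad(42, 5, '0');   -- '00042', no explicit ::text needed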

ISTM you are unhappy with the "any" variadic mechanism in general, not
the casting rules.  "any" essentially means: "the types you pass to me
are irrelevant, because i'm going to look up behaviors on the server
and apply them as I see fit".  You've defined a regular methodology
across ALL types and allow the user to pass anything -- I see nothing
at all wrong with this as long as it's only implemented in very
special cases.  If you happen to not like the predefined 'box'
behavior, nothing is stopping you from massaging it before sending it
in.  Also, there's no guarantee that the behavior hidden under the
"any" can be reproduced via manual cast...concat() is some thing of a
special case where you can.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch (for 9.1) string functions

2010-07-26 Thread Robert Haas
On Mon, Jul 26, 2010 at 10:39 AM, Merlin Moncure  wrote:
> On Mon, Jul 26, 2010 at 9:26 AM, Robert Haas  wrote:
>> On Mon, Jul 26, 2010 at 9:10 AM, Merlin Moncure  wrote:
 CONCAT('foo', NULL) => 'foo' really the behavior that everyone else
 implements here?  And why does CONCAT() take a variadic "ANY"
 argument?  Shouldn't that be variadic TEXT?
>>>
>>> What does that accomplish, besides forcing you to sprinkle every
>>> concat call with text casts (maybe that's not a bad thing?)?
>>
>> You could ask the same thing about the existing || operator.  And in
>> fact, we used to have that behavior.  We changed it in 8.3.  Perhaps
>> that was a good decision and perhaps it wasn't, but I don't think
>> using CONCAT() to make an end-run around that decision is the way to
>> go.
>
> It was absolutely a good decision because it prevented type inference
> in ways that were ambiguous or surprising (for a canonical case see:
> http://www.mail-archive.com/pgsql-gene...@postgresql.org/msg93224.html).
>
> || operator is still pretty tolerant in the 8.3+ world.
> select interval_col || bool_col; -- error
> select interval_col::text || bool_col; -- text concatenation
> select text_col || interval_col || bool_col; -- text concatenation
>
> variadic text would require text casts on EVERY non 'unknown' argument
> which drops it below the threshold of usefulness IMO -- it would be
> far stricter than vanilla || concatenation.  Ok, pure bikeshed here
> (shutting my trap now!), but concat() is one of those wonder functions
> that you want to make as usable and terse as possible.  I don't see
> the value in making it overly strict.

I'm just very skeptical that we should give our functions argument
types that are essentially fantasy.  CONCAT() doesn't concatenate
integers or intervals or boxes: it concatenates strings, and only
strings.  Surely the right fix, if our casting rules are too
restrictive, is to fix the casting rules; not to start adding a bunch
of kludgery in every function we define.

The problem with your canonical example (and the other examples of
this type I've seen) is that they are underspecified.  Given two
identically-named operators, one of which accepts types T1 and T2, and
the other of which accepts types T3 and T4, what is one to do with T1
OP T4?  You can cast T1 to T3 and call the first operator or you can
cast T4 to T2 and call the second one, and it's really not clear which
is right, so you had better throw an error.  The same applies to
functions: given foo(text) and foo(date) and the call
foo('2010-04-15'), you had better complain, or you may end up with a
very sad user.  But sometimes our current casting rules require casts
in situations where they don't seem necessary, such as
LPAD(integer_column, 5, '0').  That's not ambiguous because there's
only one definition of LPAD, and the chances that the user will be
unpleasantly surprised if you call it seem quite low.

In other words, this problem is not unique to CONCAT().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSL cipher and version

2010-07-26 Thread Robert Haas
On Mon, Jul 26, 2010 at 9:57 AM, Dave Page  wrote:
> On Mon, Jul 26, 2010 at 2:49 PM, Robert Haas  wrote:
>> Last week, I ran across a situation where I needed to know the SSL
>> version and cipher in use for a particular database connection.
>> Magnus pointed me to contrib/sslinfo, but that didn't have quite what
>> I needed.  The attached patch adds two additional functions to
>> contrib/sslinfo to report this information.
>>
>> Any objections to me committing this?
>
> Might wanna fix this first:
>
> +PG_FUNCTION_INFO_V1(ssl_veresion);
>                                         

Wow.  It works remarkably well without fixing that, but I'll admit
that does seem lucky.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch (for 9.1) string functions

2010-07-26 Thread Merlin Moncure
On Mon, Jul 26, 2010 at 9:26 AM, Robert Haas  wrote:
> On Mon, Jul 26, 2010 at 9:10 AM, Merlin Moncure  wrote:
>>> CONCAT('foo', NULL) => 'foo' really the behavior that everyone else
>>> implements here?  And why does CONCAT() take a variadic "ANY"
>>> argument?  Shouldn't that be variadic TEXT?
>>
>> What does that accomplish, besides forcing you to sprinkle every
>> concat call with text casts (maybe that's not a bad thing?)?
>
> You could ask the same thing about the existing || operator.  And in
> fact, we used to have that behavior.  We changed it in 8.3.  Perhaps
> that was a good decision and perhaps it wasn't, but I don't think
> using CONCAT() to make an end-run around that decision is the way to
> go.

It was absolutely a good decision because it prevented type inference
in ways that were ambiguous or surprising (for a canonical case see:
http://www.mail-archive.com/pgsql-gene...@postgresql.org/msg93224.html).

|| operator is still pretty tolerant in the 8.3+ world.
select interval_col || bool_col; -- error
select interval_col::text || bool_col; -- text concatenation
select text_col || interval_col || bool_col; -- text concatenation

variadic text would require text casts on EVERY non 'unknown' argument
which drops it below the threshold of usefulness IMO -- it would be
far stricter than vanilla || concatenation.  Ok, pure bikeshed here
(shutting my trap now!), but concat() is one of those wonder functions
that you want to make as usable and terse as possible.  I don't see
the value in making it overly strict.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamically allocating chunks from shared memory

2010-07-26 Thread Alvaro Herrera
Excerpts from Robert Haas's message of lun jul 26 08:52:46 -0400 2010:

> Here's another idea.  Instead of making imessages use an SLRU, how
> about having it steal pages from shared_buffers?  This would require
> segmenting messages into small enough chunks that they'd fit, but the
> nice part is that it would avoid the need to have a completely
> separate shared memory arena.  Ideally, we'd make the infrastructure
> general enough that things like SLRU could use it also; and get rid of
> or reduce in size some of the special-purpose chunks we're now
> allocating.

What's the problem you see with "another shared memory arena"?  Right
now we allocate a single large arena, and the lot of shared_buffers,
SLRU pools, locking objects, etc are all allocated from there.  If we
want another 2 MB for "dynamic shmem", we'd just allocate 2 MB more in
that large arena and give those to this new code.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSL cipher and version

2010-07-26 Thread Dave Page
On Mon, Jul 26, 2010 at 2:49 PM, Robert Haas  wrote:
> Last week, I ran across a situation where I needed to know the SSL
> version and cipher in use for a particular database connection.
> Magnus pointed me to contrib/sslinfo, but that didn't have quite what
> I needed.  The attached patch adds two additional functions to
> contrib/sslinfo to report this information.
>
> Any objections to me committing this?

Might wanna fix this first:

+PG_FUNCTION_INFO_V1(ssl_veresion);
 

-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] SSL cipher and version

2010-07-26 Thread Robert Haas
Last week, I ran across a situation where I needed to know the SSL
version and cipher in use for a particular database connection.
Magnus pointed me to contrib/sslinfo, but that didn't have quite what
I needed.  The attached patch adds two additional functions to
contrib/sslinfo to report this information.
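
For reference, usage would look something like this (the function names here
are assumed from the patch context, alongside the existing ssl_is_used()):

select ssl_is_used(), ssl_version(), ssl_cipher();
-- on an SSL connection this should report the protocol version and the
-- cipher in use; on a non-SSL connection the new functions would return NULL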

Any objections to me committing this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


ssl_cipher_and_version.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql \timing output supressed in quiet mode

2010-07-26 Thread Kevin Grittner
Peter Eisentraut  wrote:
> In psql's quiet mode, the output of the \timing option is suppressed,
> except in the \copy command.  That inconsistency should be fixed in any
> case.  It seems to me that if I explicitly turn on timing, then that
> should be unaffected by the quiet mode.  Comments?
 
+1
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch (for 9.1) string functions

2010-07-26 Thread Robert Haas
On Mon, Jul 26, 2010 at 9:10 AM, Merlin Moncure  wrote:
>> CONCAT('foo', NULL) => 'foo' really the behavior that everyone else
>> implements here?  And why does CONCAT() take a variadic "ANY"
>> argument?  Shouldn't that be variadic TEXT?
>
> What does that accomplish, besides forcing you to sprinkle every
> concat call with text casts (maybe that's not a bad thing?)?

You could ask the same thing about the existing || operator.  And in
fact, we used to have that behavior.  We changed it in 8.3.  Perhaps
that was a good decision and perhaps it wasn't, but I don't think
using CONCAT() to make an end-run around that decision is the way to
go.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch (for 9.1) string functions

2010-07-26 Thread Pavel Stehule
>
>> CONCAT('foo', NULL) => 'foo' really the behavior that everyone else
>> implements here?  And why does CONCAT() take a variadic "ANY"
>> argument?  Shouldn't that be variadic TEXT?
>

CONCAT with a variadic text parameter would be limited by the existing
implicit casts to text - for example, there is no implicit cast from date
or int to text.

postgres=# create or replace function concat(variadic text[]) returns
text as $$select string_agg(x,'') from unnest($1) x$$ language sql;
CREATE FUNCTION

postgres=# select concat('a','b');
 concat
--------
 ab
(1 row)

Time: 20,812 ms
postgres=# select concat('a',10);
ERROR:  function concat(unknown, integer) does not exist
LINE 1: select concat('a',10);
   ^
HINT:  No function matches the given name and argument types. You
might need to add explicit type casts.

so with variadic "any"[] concat doesn't need explicit cats.

Regards

Pavel Stehule

P.S. Inside the function, every value is transformed to text.

> merlin
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch (for 9.1) string functions

2010-07-26 Thread Merlin Moncure
On Mon, Jul 26, 2010 at 8:02 AM, Robert Haas  wrote:
> Regardless of where this function ends up, the concat_ws documentation
> should contain some mention of the fact that "ws" is intended to mean
> "with separator",

big +1 on that -- I've been loosely following the thread and I had
assumed 'ws' meant 'wide string' all this time :-).

> Come to think of it, have we checked that the behavior of LEFT, RIGHT,
> REVERSE, etc. is the same on other DBs, especially as far as nulls,
> empty strings, too-large or negative subscripts, etc is concerned?

Probably 'standard' behavior wrt null would be to be strict; return
null if any argument is null.  The proposed behavior seems ok though.

> CONCAT('foo', NULL) => 'foo' really the behavior that everyone else
> implements here?  And why does CONCAT() take a variadic "ANY"
> argument?  Shouldn't that be variadic TEXT?

What does that accomplish, besides forcing you to sprinkle every
concat call with text casts (maybe that's not a bad thing?)?

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamically allocating chunks from shared memory

2010-07-26 Thread Robert Haas
On Thu, Jul 22, 2010 at 3:09 PM, Markus Wanner  wrote:
>> FWIW I don't think you should be thinking in "replacing imessages with
>> SLRU".  I rather think you should be thinking in how can you implement
>> the imessages API on top of SLRU.
>
> Well, I'm rather comparing SLRU with the dynamic allocator. So far I'm
> unconvinced that SLRU would be a better base for imessages than a dynamic
> allocator. (And I'm arguing that SLRU should use a dynamic allocator
> underneath).

Here's another idea.  Instead of making imessages use an SLRU, how
about having it steal pages from shared_buffers?  This would require
segmenting messages into small enough chunks that they'd fit, but the
nice part is that it would avoid the need to have a completely
separate shared memory arena.  Ideally, we'd make the infrastructure
general enough that things like SLRU could use it also; and get rid of
or reduce in size some of the special-purpose chunks we're now
allocating.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multibyte-character aware support for function "downcase_truncate_identifier()"

2010-07-26 Thread Robert Haas
On Mon, Jul 26, 2010 at 12:40 AM, Rajanikant Chirmade
 wrote:
> Since the discussion stopped in the thread
>
> http://archives.postgresql.org/pgsql-bugs/2006-09/msg00128.php
>
> are there any implications of this change in handling identifiers?
>
> Thanks & Regards,
> Rajanikant Chirmade

An even more relevant message appears to be this one:

http://archives.postgresql.org/pgsql-bugs/2006-09/msg00133.php

Both this and the comment in downcase_truncate_identifier() suggest
that the current method is attributable to a lack of support for
Unicode-aware case normalization and is known not to work correctly in
all locales.  Locale and encoding stuff isn't really my area of
expertise, but if we now have support for Unicode-aware case
normalization, shouldn't we be using it here, too?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch (for 9.1) string functions

2010-07-26 Thread Robert Haas
On Sun, Jul 25, 2010 at 11:29 PM, Itagaki Takahiro
 wrote:
> I think RAISE is badly designed. Using % as a placeholder has a limitation
> to format strings. For example, format() cannot work as concat():
>  SELECT format('%%', 123, 456) => ERROR

It's hard to argue with this, as far as it goes.

> So, my proposal is renaming stringfunc//sprintf() to format(),
> and moving it into the core. I think sprintf() is superior to format()
> in every aspect; '%s%s' works as concat(), and '%s%%' can append
> % without blanks.

So forget about format() and put sprintf() in contrib/stringfunc.
That's not an argument for putting anything in core.  Perhaps such an
argument can be made, but this isn't it.

> Then, concat_ws() will be moved into core because contrib/stringfunc
> only has the function now. In addition, I'd like to include the function
> for compatibility with MySQL. Also, concat() and concat_ws() can share
> the implementation.

Regardless of where this function ends up, the concat_ws documentation
should contain some mention of the fact that "ws" is intended to mean
"with separator", and that the naming is chosen for compatibility with
MySQL.  As for where to put it, I see no terribly compelling reason
why it needs to be in core.  You can write array_to_string(array[txt1,
txt2, txt3], sep) and get the same effect as concat_ws(sep, txt1,
txt2, txt3).  I don't really want to start maintaining duplicate
functionality for things like this, especially since MySQL users will
no doubt expect that our implementation will be bug-compatible.  If
the output isn't identical to what MySQL does for every set of
arguments, it'll be reported as a bug.
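
For example, something along these lines already gives the NULL-skipping
behavior (an illustrative query using the existing array_to_string()):

select array_to_string(array['a', null, 'c'], ',');
-- returns 'a,c': NULL elements are simply skipped, which is what
-- concat_ws() is expected to do as well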

Come to think of it, have we checked that the behavior of LEFT, RIGHT,
REVERSE, etc. is the same on other DBs, especially as far as nulls,
empty strings, too-large or negative subscripts, etc is concerned?  Is
CONCAT('foo', NULL) => 'foo' really the behavior that everyone else
implements here?  And why does CONCAT() take a variadic "ANY"
argument?  Shouldn't that be variadic TEXT?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bg worker: overview

2010-07-26 Thread Dimitri Fontaine
Markus Wanner  writes:
> To simplify, you might want to start a bgworker on database 'postgres',
> which then acts as a sub-coordinator (and doesn't really need to use its
> database connection).

Yeah, that sounds like the simplest way forward, so that it's easy for
this "user daemon" to communicate with the coordinator. Typically this
would start a backend and LOAD a module, which would enter the user
daemon main loop, I guess.

Then all the usual backend code facilities are there, even SQL and
calling user-defined functions.

> Well, there are different types of imessages defined in imsg.h. If you are
> coding something within Postgres, you'd just add all the required messages
> types there. There's no such thing as an external registration for new
> message types.

Given that imessages can have a payload, maybe the simplest way there
would be to add an "IMSGT_EXEC_QUERY_PARAMS" message type, the payload of
which would be composed of the SQL text and its parameters. I take it
that requiring a bgworker backend connected to a given database is
already part of the API, right?

> So, the bgworker infrastructure could probably satisfy the internal
> communication needs. But how does this ticker daemon talk to the outside?
> Does it need to open a socket and listen there? Or do the requests to that
> queue come in via SQL?


The ticker's only job is to manage a "ticks" table per database. All it
needs for that is a libpq connection, really, but given your model it'd
be a single backend (worker) that would send imessages to the
coordinator so that a background worker would tick, by executing this
SQL: select pgq.ticker().

So that would be a lot of changes to follow your facilities, and it's
unclear to me how much we're twisting it so that it fits. Well, maybe
that's not a good example after all. Dave, want to see about pgagent?

Regards,
-- 
Dimitri Fontaine
PostgreSQL DBA, Architecte

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous replication

2010-07-26 Thread Robert Haas
On Mon, Jul 26, 2010 at 6:48 AM, Marko Tiikkaja
 wrote:
> On 7/26/10 1:44 PM +0300, Fujii Masao wrote:
>>
>> On Mon, Jul 26, 2010 at 6:36 PM, Yeb Havinga  wrote:
>>>
>>> I wasn't entirely clear. My suggestion was to have only
>>>
>>>   acknowledge_commit = {no|recv|fsync|replay}
>>>
>>> instead of
>>>
>>>   replication_mode = {async|recv|fsync|replay}
>>
>> Okay, I'll change the patch accordingly.
>
> For what it's worth, I think replication_mode is a lot clearer.
> Acknowledge_commit sounds like it would do something similar to
> asynchronous_commit.

I agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gincostestimate

2010-07-26 Thread Jan Urbański

On 26/07/10 12:58, Oleg Bartunov wrote:

Jan,

On Sun, 25 Jul 2010, Jan Urbaski wrote:


On 02/07/10 14:33, Teodor Sigaev wrote:

The patch implements much more accurate cost estimation for GIN index
scans than the generic cost estimation function.



I was able to reproduce his issue, that is: select id from ftstest where
body_fts @@ to_tsquery('commonterm80'); was choosing a sequential scan,
which was resulting in much longer execution than the bitmap index plan
that I got after disabling seqscans.

I then applied the patch, recompiled PG and tried again... and nothing
changed. I first tried running ANALYSE and then dropping and recreating
the GIN index, but the planner still chooses the seq scan.


read thread
http://archives.postgresql.org/pgsql-hackers/2010-04/msg01407.php
There is always a fuzz factor, as Tom said, about 1% in path cost
comparisons.
You may compare plans for 'commonterm60', 'commonterm40'.


OK, I thought this might be the case, as with the patch the sequential
scan is winning only by a small margin.

Thanks,
Jan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gincostestimate

2010-07-26 Thread Oleg Bartunov

Jan,

On Sun, 25 Jul 2010, Jan Urbaski wrote:


On 02/07/10 14:33, Teodor Sigaev wrote:

The patch implements much more accurate cost estimation for GIN index
scans than the generic cost estimation function.


Hi,

I'm reviewing this patch, and to begin with it I tried to reproduce the
problem that originally came up on -performance in
http://archives.postgresql.org/pgsql-performance/2009-10/msg00393.php


I attached scripts



The links from that mail are now dead, so I set up my own test environment:
* one table testfts(id serial, body text, body_fts tsvector)
* 5 rows, each with 1000 random words taken from
/usr/share/dict/british-english-insane (the wbritish-insane Debian
package) separated by a single space
* each row also had the word "commonterm" at the end, 80% had
commonterm80, 60% had commonterm60 etc (using the same methodology as
Jesper, that commonterm60 can appear only if commonterm80 is in the row)
* a GIN index on the tsvectors

I was able to reproduce his issue, that is: select id from ftstest where
body_fts @@ to_tsquery('commonterm80'); was choosing a sequential scan,
which was resulting in much longer execution than the bitmap index plan
that I got after disabling seqscans.

I then applied the patch, recompiled PG and tried again... and nothing
changed. I first tried running ANALYSE and then dropping and recreating
the GIN index, but the planner still chooses the seq scan.


read thread
http://archives.postgresql.org/pgsql-hackers/2010-04/msg01407.php
There is always a fuzz factor, as Tom said, about 1% in path cost comparisons.
You may compare plans for 'commonterm60', 'commonterm40'.



Full explains below (the NOTICE is a debugging aid from the patch, which
I temporarily enabled to see if it's picking up the code).


From this debug output you can see that the cost estimates are now much
more accurate than before.



I'll continue reading the code and trying to understand what it does,
but in the meantime: am I doing something wrong that I don't see the
planner switching to the bitmap index plan? I see that the difference in
costs is small, so maybe I just need to tweak the planner knobs a bit?
Is the output below expected?


I think Tom explained this
http://archives.postgresql.org/pgsql-hackers/2010-04/msg01426.php




Cheers,
Jan


wulczer=# explain analyse select id from ftstest where body_fts @@
to_tsquery('commonterm80');
NOTICE:  GIN stats: nEntryPages: 49297.00 nDataPages: 16951.00
nPendingPages :0.00 nEntries: 277521.00
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Seq Scan on ftstest  (cost=0.00..1567.00 rows=39890 width=4) (actual
time=221.893..33179.794 rows=39923 loops=1)
  Filter: (body_fts @@ to_tsquery('commonterm80'::text))
Total runtime: 33256.661 ms
(3 rows)

wulczer=# set enable_seqscan to false;
SET
Time: 0.257 ms
wulczer=# explain analyse select id from ftstest where body_fts @@
to_tsquery('commonterm80');
NOTICE:  GIN stats: nEntryPages: 49297.00 nDataPages: 16951.00
nPendingPages :0.00 nEntries: 277521.00
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on ftstest  (cost=449.15..1864.50 rows=39890 width=4)
(actual time=107.421..181.284 rows=39923 loops=1)
  Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
  ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..439.18
rows=39890 width=0) (actual time=97.057..97.057 rows=39923 loops=1)
Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
Total runtime: 237.218 ms
(5 rows)

Time: 237.999 ms




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

[attached word list (truncated)]

Re: [HACKERS] Synchronous replication

2010-07-26 Thread Marko Tiikkaja

On 7/26/10 1:44 PM +0300, Fujii Masao wrote:

On Mon, Jul 26, 2010 at 6:36 PM, Yeb Havinga  wrote:

I wasn't entirely clear. My suggestion was to have only

   acknowledge_commit = {no|recv|fsync|replay}

instead of

   replication_mode = {async|recv|fsync|replay}


Okay, I'll change the patch accordingly.


For what it's worth, I think replication_mode is a lot clearer. 
Acknowledge_commit sounds like it would do something similar to 
asynchronous_commit.



Regards,
Marko Tiikkaja

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous replication

2010-07-26 Thread Fujii Masao
On Mon, Jul 26, 2010 at 6:36 PM, Yeb Havinga  wrote:
> Fujii Masao wrote:
>>
>> I still like
>>
>>    replication_mode = {async|recv|fsync|replay}
>>
>> rather than
>>
>>    synchronous_replication = {on|off}
>>    acknowledge_commit = {no|recv|fsync|replay}
>>
>
> Hello Fujii,
>
> I wasn't entirely clear. My suggestion was to have only
>
>   acknowledge_commit = {no|recv|fsync|replay}
>
> instead of
>
>   replication_mode = {async|recv|fsync|replay}

Okay, I'll change the patch accordingly.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous replication

2010-07-26 Thread Yeb Havinga

Fujii Masao wrote:

I still like

replication_mode = {async|recv|fsync|replay}

rather than

synchronous_replication = {on|off}
acknowledge_commit = {no|recv|fsync|replay}
  

Hello Fujii,

I wasn't entirely clear. My suggestion was to have only

   acknowledge_commit = {no|recv|fsync|replay}

instead of

   replication_mode = {async|recv|fsync|replay}


regards,
Yeb Havinga

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: quorum commit Re: [HACKERS] Synchronous replication

2010-07-26 Thread Yeb Havinga

Fujii Masao wrote:

In the following case, how should quorum commit behave?

1. quorum_standbys = 2; there are three connected synchronous standbys
2. One standby sends the ACK back and fails
3. The ACK arrives from another standby
4. How should quorum commit behave?

(a) Transaction commit returns a "success" since the master has already
received two ACKs
(b) Transaction commit waits for the "last" ACK since only one of
currently connected standbys has sent the ACK
  
I'd opt for option (b), if that doesn't make the code very complex or
expensive (having to check the connected state when the quorum is reached).


regards,
Yeb Havinga


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


quorum commit Re: [HACKERS] Synchronous replication

2010-07-26 Thread Fujii Masao
On Thu, Jul 22, 2010 at 5:37 PM, Yeb Havinga  wrote:
> Fujii Masao wrote:
>>
>> How should the synchronous replication behave when the number of connected
>> standby servers is less than quorum?
>>
>> 1. Ignore quorum. The current patch adopts this. If the ACKs from all
>>   connected standbys have arrived, transaction commit is successful
>>   even if the number of standbys is less than quorum. If there is no
>>   connected standby, transaction commit is always successful without
>>   regard to quorum.
>>
>> 2. Observe quorum. Aidan wants this. Until the number of connected
>>   standbys has become more than or equal to quorum, transaction commit
>>   waits.
>>
>> Which is the right behavior of quorum commit? Or we should add new
>> parameter specifying the behavior of quorum commit?
>>
>
> Initially I also expected the quorum to behave as described by
> Aidan/option 2.

I have another question about the detailed design of quorum commit.

In the following case, how should quorum commit behave?

1. quorum_standbys = 2; there are three connected synchronous standbys
2. One standby sends the ACK back and fails
3. The ACK arrives from another standby
4. How should quorum commit behave?

(a) Transaction commit returns a "success" since the master has already
received two ACKs
(b) Transaction commit waits for the "last" ACK since only one of
currently connected standbys has sent the ACK
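
Stated as wait conditions (hypothetical variable names, only to make the two
options precise):

    /* (a): every ACK received counts, even from a standby that has since failed */
    commit_done = (total_acks_received >= quorum_standbys);

    /* (b): only ACKs from standbys that are still connected count */
    commit_done = (acks_from_connected_standbys >= quorum_standbys);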

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous replication

2010-07-26 Thread Fujii Masao
On Mon, Jul 26, 2010 at 5:27 PM, Yeb Havinga  wrote:
> Fujii Masao wrote:
>>>
>>> Intuitively by looking at the enumeration of replication_mode I'd think
>>> that
>>> the sync standbys are all standbys that operate in a non-async mode.
>>> That
>>> would be clearer with a boolean sync (or not) and for sync standbys the
>>> replication_mode specified.
>>>
>>
>> You mean that something like synchronous_replication as the recovery.conf
>> parameter should be added in addition to replication_mode? Since
>> increasing
>> the number of similar parameters would confuse users, I don't like to do
>> that.
>>
>
> I think what would be confusing is a mismatch between the implemented
> concepts and parameters.
>
> 1 does the master wait for standby servers on commit?
> 2 how many acknowledgements must the master receive before it can continue?
> 3 is a standby server a synchronous one, i.e. does it acknowledge a commit?
> 4 when do standby servers acknowledge a commit?
> 5 does it only wait when the standbys are connected, or also when they are
> not connected?
> 6..?
>
> When trying to match parameter names for the concepts above:
> 1 - does not exist, but can be answered with quorum_standbys = 0
> 2 - quorum_standbys
> 3 - yes, if replication_mode != async (here is where I thought I had to think
> too much)
> 4 - replication modes recv, fsync and replay but not async
> 5 - Zoltan's strict_sync_replication parameter
>
> Just an idea, what about
> for 4: acknowledge_commit = {no|recv|fsync|replay}
> then 3 = yes, if acknowledge_commit != no

Thanks for the clarification.

I still like

replication_mode = {async|recv|fsync|replay}

rather than

synchronous_replication = {on|off}
acknowledge_commit = {no|recv|fsync|replay}

because the former is more intuitive for me and I don't want
to increase the number of parameters.

We need to hear from some users in this respect. If most want
the latter, of course, I'd love to adopt it.
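
For concreteness, on a standby the two styles would look roughly like this in
recovery.conf (just an illustration, neither parameter exists yet):

    # single parameter
    replication_mode = 'recv'            # async | recv | fsync | replay

    # versus two parameters
    synchronous_replication = 'on'
    acknowledge_commit = 'recv'          # no | recv | fsync | replay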

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous replication

2010-07-26 Thread Yeb Havinga

Fujii Masao wrote:

Intuitively by looking at the enumeration of replication_mode I'd think that
the sync standbys are all standbys that operate in a non-async mode. That
would be clearer with a boolean sync (or not) and for sync standbys the
replication_mode specified.



You mean that something like synchronous_replication as the recovery.conf
parameter should be added in addition to replication_mode? Since increasing
the number of similar parameters would confuse users, I don't like to do that.
  
I think what would be confusing is a mismatch between the
implemented concepts and parameters.


1 does the master wait for standby servers on commit?
2 how many acknowledgements must the master receive before it can continue?
3 is a standby server a synchronous one, i.e. does it acknowledge a commit?
4 when do standby servers acknowledge a commit?
5 does it only wait when the standbys are connected, or also when they 
are not connected?

6..?

When trying to match parameter names for the concepts above:
1 - does not exist, but can be answered with quorum_standbys = 0
2 - quorum_standbys
3 - yes, if replication_mode != async (here is where I thought I had to 
think too much)

4 - replication modes recv, fsync and replay but not async
5 - Zoltan's strict_sync_replication parameter

Just an idea, what about
for 4: acknowledge_commit = {no|recv|fsync|replay}
then 3 = yes, if acknowledge_commit != no

regards,
Yeb Havinga


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of Synchronous Replication patches

2010-07-26 Thread Fujii Masao
On Sat, Jul 24, 2010 at 4:40 PM,   wrote:
> Instead, I will post a patch that unifies my configuration choices with
> Fujii's patch. Do you have suggestions for better worded GUCs?
> "slave" seems to be phased out by "standby" for political correctness, so
> "synchronous_standby" instead of "synchronous_slave". You mentioned
> "min_sync_replication_clients" -> "quorum_min_sync_standbys". What else?

I think that the meaning of my "quorum" parameter is the same as
that of your "min_sync_replication_clients". Right?

I'm planning to add a new parameter specifying the behavior of quorum
commit when the number of connected synchronous standbys becomes
less than "quorum".

1. Ignore quorum. If the ACKs from all connected standbys have
   arrived, transaction commit is successful even if the number
   of standbys is less than quorum. If there is no connected
   standby, transaction commit is always successful without
   regard to quorum.

2. Observe quorum. Until the number of connected standbys has
   become more than or equal to quorum, transaction commit waits.

http://archives.postgresql.org/pgsql-hackers/2010-07/msg01327.php
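
Roughly, the two behaviors differ only in the commit wait condition
(hypothetical variable names, just to state the difference precisely):

    /* 1. ignore quorum: never wait for more ACKs than there are connected standbys */
    commit_done = (acks_received >= Min(quorum, n_connected_sync_standbys));

    /* 2. observe quorum: always wait for "quorum" ACKs, even while fewer standbys are connected */
    commit_done = (acks_received >= quorum);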

Is the meaning of this parameter the same as that of your
"strict_sync_replication"?

> You also noticed that my patch addressed 2PC, maybe I will have to add
> this part to Fujii's patch, too. Note: I haven't yet read his patch,
> maybe working with LSNs instead of XIDs makes this work automatically,
> I don't know. We should definitely test.

Yeah, transaction commit seems to have to wait for replication in
not only RecordTransactionCommit() but also EndPrepare(),
RecordTransactionCommitPrepared() and RecordTransactionAbortPrepared().
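
Roughly, each of those call sites would gain the same kind of wait after the
record has been flushed, something like the following (WaitForReplicationAck is
a made-up name for whatever the wait primitive ends up being):

    XLogFlush(recptr);
    /* hypothetical: block here until the required standby ACKs for recptr arrive */
    WaitForReplicationAck(recptr);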

I'll fix that.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] security label support, part.2

2010-07-26 Thread KaiGai Kohei
The attached patches are revised ones, as follows.

* A new SECURITY LABEL statement replaced the previous ALTER TABLE statement
  with the SECURITY LABEL TO option. It has the following syntax.

  SECURITY LABEL [ FOR <provider> ] ON <object> IS '<label>';

  E.g) SECURITY LABEL ON TABLE t1 IS 'system_u:object_r:sepgsql_table_t:s0';

* It supports multiple security providers, each assigning its own security
  label to a database object. The pg_seclabel catalog was modified as follows:

CATALOG(pg_seclabel,3037) BKI_WITHOUT_OIDS
{
    Oid     reloid;     /* OID of table containing the object */
    Oid     objoid;     /* OID of the object itself */
    int4    subid;      /* column number, or 0 if not used */
+   text    tag;        /* identifier of external security provider */
    text    label;      /* security label of the object */
} FormData_pg_seclabel;

  The new 'tag' field identifies which security provider manages this
  security label. For example, SE-PostgreSQL uses "selinux" for its
  identifier (see the example after this list).

* The security hook to check relabeling is now registered using
  register_object_relabel_hook(), which takes the tag of the ESP module and
  a function pointer to the security hook.
  ExecSecLabelStmt() picks up the appropriate security hook, so the right
  one is invoked even if multiple modules are loaded.

* Add _copySecLabelStmt() in nodes/copyfuncs.c and _equalSecLabelStmt()
  in nodes/equalfuncs.c, because I had forgotten to add them when the new
  parse node (SecLabelStmt) was added.

* Add descriptions of the pg_seclabel catalog and the SECURITY LABEL statement
  to the documentation.
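
To give a concrete example of the multi-provider case: when more than one
provider is loaded, the FOR clause names the provider whose label is being set.
Using the "selinux" tag described above, that would look like (illustrative only):

  SECURITY LABEL FOR selinux ON TABLE t1 IS 'system_u:object_r:sepgsql_table_t:s0';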

Thanks,

(2010/07/23 22:36), Robert Haas wrote:
> On Fri, Jul 23, 2010 at 8:59 AM, KaiGai Kohei  wrote:
>> (2010/07/23 20:44), Robert Haas wrote:
>>>
>>> 2010/7/23 KaiGai Kohei:
>>>>> Hmm.  How about if there's just one provider loaded, you can omit it,
>>>>> but if you fail to specify it and there's >1 loaded, we just throw an
>>>>> error saying you didn't specify whose label it is.
>>>>
>>>> Perhaps we need to return to the caller a state indicating whether at
>>>> least one provider checked the given label, or not.
>>>
>>> Return to the caller?  This is an SQL command.  You either get an
>>> error, or you don't.
>>>
>> Ahh, I was talking about the relationship between the core PG code and the
>> ESP module.
>> It means the security hook returns a state which informs the core PG code
>> whether one provider checked the given label, then the core PG code can
>> decide whether to raise an actual error to users, or not.
>>
>> In other words, I'd like to suggest the security hook which returns a tag
>> of ESP module, as follows:
>>
>>   const char *
>>   check_object_relabel_hook(const ObjectAddress *object,
>>                             const char *provider,
>>                             const char *seclabel);
> 
> I don't think that's a very good design.  What I had in mind was a
> simple API for security providers to register themselves (including
> their names), and then the core code will only call the relevant
> security provider.  I did try to explain this in point #3 of my
> original review.
> 


-- 
KaiGai Kohei 


pgsql-v9.1-security-label-2.v2.patch
Description: application/octect-stream


pgsql-v9.1-security-label-1.v2.patch
Description: application/octect-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers