Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Simon Riggs
On Thu, 2010-05-27 at 11:28 +0900, Fujii Masao wrote:
 On Wed, May 26, 2010 at 10:20 PM, Simon Riggs si...@2ndquadrant.com wrote:
  On Wed, 2010-05-26 at 18:52 +0900, Fujii Masao wrote:
 
  I guess that dropping the support of #3 doesn't reduce complexity
  since the code of #3 is almost the same as that of #2. Like
  walreceiver sends the ACK after receiving the WAL in #2 case, it has
  only to do the same thing after the WAL flush.
 
  Hmm, well the code for #3 is similar also to the code for #4. So if you
  do #2, its easy to do #2, #3 and #4 together.
 
 No. #4 requires the way of prompt communication between walreceiver and
 startup process, but #2 and #3 not. That is, in #4, walreceiver has to
 wake the startup process up as soon as it has flushed WAL. OTOH, the
 startup process has to wake walreceiver up as soon as it has replayed
 WAL, to request it to send the ACK to the master. In #2 and #3, the
 prompt communication from walreceiver to startup process, i.e., changing
 the poll loop in the startup process would also be useful for the data
 to be visible immediately on the standby. But it's not required.

You need to pass WAL promptly on primary from backend to WALSender.
Whatever mechanism you use can also be reused symmetrically on standby
to provide #4. So not a problem.

  The comment is about whether having #3 makes sense from a user interface
  perspective. It's easy to add options, but they must have useful
  meaning.
 
 #3 would be useful for people wanting further robustness. In #2,
 when simultaneous power failure on the master and the standby,
 and concurrent disk crash on the master happen, transaction whose
 success indicator has been returned to a client might be lost.
 #3 can avoid such a critical situation. This is one of reasons that
 DRBD supports Protocol C, I think.

Which few people use it, or if they do its because DRBD didn't
originally support multiple standbys. Not worth emulating IMHO.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [spf:guess] Re: [HACKERS] ROLLBACK TO SAVEPOINT

2010-05-27 Thread Heikki Linnakangas

On 27/05/10 03:25, Florian Pflug wrote:

On May 27, 2010, at 0:58 , Heikki Linnakangas wrote:

On 26/05/10 02:00, Sam Vilain wrote:

Florian Pflug wrote:

On May 25, 2010, at 12:18 , Heikki Linnakangas wrote:

Releasing the newer savepoint will cause the older one to again become 
accessible, as the doc says, but rolling back to a savepoint does not 
implicitly release it. You'll have to use RELEASE SAVEPOINT for that.


Ah, now I get it. Thanks.

Would changing Releasing the newer savepoint will cause ...  to Explicitly releasing the 
newer savepoint or maybe even Explicitly releasing the newer savepoint with RELEASE SAVEPOINT 
will cause ... make things clearer?


Yes, probably - your misreading matches my misreading of it :-)


+1.


Patch that changes the wording to Explicitly releasing the newer savepoint with 
RELEASE SAVEPOINT will cause ... is attached.


Thanks, committed. I left out the Explicitly, though, because as Sam 
pointed out the newer savepoint can also be implicitly released by 
rolling back to an earlier savepoint.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-27 Thread Pavel Stehule


 I think we should fix it now.  Quick thought: maybe we could use FOR instead
 of AS: select myfunc(7 for a, 6 for b); IIRC the standard's mechanism for
 this is 'paramname = value', but I think that has problems because of our
 possibly use of = as an operator - otherwise that would be by far the best
 way to go.


What is advice of FOR instead AS?

it is exactly same.

Regards
Pavel

 cheers

 andrew

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-27 Thread Pavel Stehule
2010/5/27 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 On 27/05/10 02:09, alvherre wrote:

 Excerpts from Andrew Dunstan's message of mié may 26 18:52:33 -0400 2010:

 I think we should fix it now.  Quick thought: maybe we could use FOR
 instead of AS: select myfunc(7 for a, 6 for b); IIRC the standard's
 mechanism for this is 'paramname =  value', but I think that has
 problems because of our possibly use of =  as an operator - otherwise
 that would be by far the best way to go.

 I think we were refraining from =  because the standard didn't specify
 this back then -- AFAIU this was introduced very recently.  But now that
 it does, and that the syntax we're implementing conflicts with a
 different feature, it seems wise to use the standard-mandated syntax.

 The problem with the =  operator seems best resolved as not accepting
 such an operator in a function parameter, which sucks but we don't seem
 to have a choice.  Perhaps we could allow = to resolve as the
 operator for the case the user really needs to use it; or a
 schema-qualified operator.

 AFAIU, the standard doesn't say anything about named parameters. Oracle uses
 =, but as you said, that's ambiguous with the = operator.

 +1 for FOR.


I don't see any advantage of FOR. We can change ir to support new
standard or don't change it.

Pavel

 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Simon Riggs
On Thu, 2010-05-27 at 02:18 +0300, Heikki Linnakangas wrote:
 On 27/05/10 01:23, Simon Riggs wrote:
  On Thu, 2010-05-27 at 00:21 +0300, Heikki Linnakangas wrote:
  On 26/05/10 23:31, Dimitri Fontaine wrote:
 d. choice of commit or rollback at timeout
 
  Rollback is not an option. There is no going back after the commit
  record has been flushed to disk or sent to a standby.
 
  There's definitely no going back after the xid has been removed from
  procarray because other transactions will then depend upon the final
  state. Currently we PANIC if we abort after we've marked clog, though
  that happens after XLogFlush(), which is where we're planning to wait
  for synch rep. If we abort after having written a commit record to disk
  we can still successfully generate an abort record as well. (Luckily, I
  note HS does actually cope with that. Phew!)
 
  So actually, an abort is a reasonable possibility, though I know it
  doesn't sound like it could be at first thought.

 Hmm, that's an interesting thought. Interesting, as in crazy ;-).

:-) It's a surprising thought for me also.

 I don't understand how HS could handle that. As soon as it sees the 
 commit record, the transaction becomes visible to readers.

I meant not-barf completely.

  The choice is to either commit anyway after the timeout, or wait forever.
 
  Hmm, wait forever. What happens if we try to shutdown fast while there
  is a transaction that is waiting forever? Is that then a commit, even
  though it never made it to the standby? How would we know it was safe to
  switchover or not? Hmm.
 
 Refuse to shut down until the standby acknowledges the commit. That's 
 the only way to be sure..
 
 In practice, hard synchronous don't return ever until the commit hits 
 the standby behavior is rarely what admins actually want, because it's 
 disastrous from an availability point of view. More likely, admins want 
 wait for ack from standby, unless it's not responding, in which case to 
 hell with redundancy and just act like a single server. It makes sense 
 if you just want to make sure that the standby doesn't return stale 
 results when it's working properly, and you're not worried about 
 durability but I'm not sure it's very sound otherwise.

Which is also crazy. If you're using synch rep its because you care
deeply about durability. Some people wish to treat the COMMIT as a
guarantee, not just a shrug.

I agree that don't-return-ever isn't something anyone will want.

What we need is a COMMIT with ERROR message!

Note that Oracle gives the options of COMMIT | SHUTDOWN at this point.
Shutdown is an implicit abort for the writing transaction...

At this point the primary thinks standby is no longer available. If we
have a split brain situation then we should be assuming we will STONITH
and shutdown the primary anyway. If we have more than one standby we can
stay up and probably shouldn't be sending an abort after a commit.

The trouble is *every* option is crazy from some perspective, so we must
consider them all, to see whether they are practical or impractical.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-27 Thread Pavel Stehule
2010/5/27 Tom Lane t...@sss.pgh.pa.us:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, May 26, 2010 at 8:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 If we go with the spec's syntax I think we'd have no realistic choice
 except to forbid = altogether as an operator name.  (And no, I'm not
 for that.)

 I suppose the most painful thing about doing that is that it would
 break hstore.  Are there other commonly-used modules that rely on =
 as an operator name?

 There don't seem to be any other contrib modules that define = as an
 operator name, but I'm not sure what's out there on pgfoundry or
 elsewhere.  The bigger issue to me is not so much hstore itself as that
 this is an awfully attractive operator name for anything container-ish.
 Wasn't the JSON-datatype proposal using = for an operator at one stage?
 (The current wiki page for it doesn't seem to reflect any such idea,
 though.)  And I think I remember Oleg  Teodor proposing such an
 operator in conjunction with some GIN-related idea or other.

 In spite of the difficulties, I'm reluctant to give up on it.  I
 always thought that the AS syntax was a crock and I'm not eager to
 invent another crock to replace it.  Being compatible with the SQL
 standard and with Oracle is not to be taken lightly.

 Yeah, I know.  Though this could end up being one of the bits of the
 spec that we politely decline to follow, like upper-casing identifiers.
 Still, it's a good idea to think again before we've set the release
 in stone ...

we have a last minutes for decision. any other change will need years
- like 'standard strings'. I agree so it's not good time for change.
But this change is a few lines in parser.

Regards
Pavel



                        regards, tom lane

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-27 Thread Pavel Stehule
2010/5/27 Robert Haas robertmh...@gmail.com:
 On Wed, May 26, 2010 at 9:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, May 26, 2010 at 8:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 If we go with the spec's syntax I think we'd have no realistic choice
 except to forbid = altogether as an operator name.  (And no, I'm not
 for that.)

 I suppose the most painful thing about doing that is that it would
 break hstore.  Are there other commonly-used modules that rely on =
 as an operator name?

 There don't seem to be any other contrib modules that define = as an
 operator name, but I'm not sure what's out there on pgfoundry or
 elsewhere.  The bigger issue to me is not so much hstore itself as that
 this is an awfully attractive operator name for anything container-ish.
 Wasn't the JSON-datatype proposal using = for an operator at one stage?
 (The current wiki page for it doesn't seem to reflect any such idea,
 though.)  And I think I remember Oleg  Teodor proposing such an
 operator in conjunction with some GIN-related idea or other.

 In spite of the difficulties, I'm reluctant to give up on it.  I
 always thought that the AS syntax was a crock and I'm not eager to
 invent another crock to replace it.  Being compatible with the SQL
 standard and with Oracle is not to be taken lightly.

 Yeah, I know.  Though this could end up being one of the bits of the
 spec that we politely decline to follow, like upper-casing identifiers.
 Still, it's a good idea to think again before we've set the release
 in stone ...

 Perhaps one idea would be to:

 1. Invent a new crock for now.
 2. Add a duplicate version of the hstore = operator with a different name.
 3. Emit a warning whenever an operator called = is created.
 4. Document that beginning in PG 9.1, we will no longer support = as
 an operator name.

+1

Pavel


 That's still going to cause a fair amount of pain, but certainly less
 if we decide it now rather than later.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise Postgres Company

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [RFC] Security label support

2010-05-27 Thread KaiGai Kohei
As we talked at the developer meeting on Ottawa, it needs to provide
a capability to assign a short text identifier on database objects
to support label based ESP (such as SELinux).
So, I'd like to propose a few approaches to support security label
as a draft of discussion.

An example of label: system_u:object_r:sepgsql_ro_table_t:s0.

The format/contains/meanings of the security label shall be parsed
and validated by ESP module, so all we need to do is associate such a
short text on a certain database. It is quite similar to COMMENT ON.

I don't want to support multiple labels of an object in this stage,
because it makes ESP interfaces more complex and it is unclear whether
it is actually wanted. For example, OS does not support multiple MAC
features concurrently.

Here are a few idea to support security labels.
In this stage, I think the idea of [2] is most reasonable for us.
(Perhaps, I guess Stephen has same opinion, because the idea was
originally came from him.)


[1] Inject a text label field to every system catalog
-

This idea tries to add a new field to the schema of existing system
catalog.

Its implementation will be simple at first, however, it will be
entirely painful to modify every system catalog definitions and
(typically) Create() functions under the src/backend/commands/.

I doubt it is a correct way, even if short-term development.
It will be reasonabel just only conceptual development.


[2] Using OID as a key of text representation in separated catalog
--

This idea is similar to pg_description/pg_shdescription.
A new system catalog pg_seclabel and pg_shseclabel stores text form
of labels for pair of the relation-Id, object-Oid and object-Subid.
It does not damage to the schema of existing system catalog,

It adds two new system catalogs; pg_seclabel (local) and pg_shseclabel (shared).
The catalogs shall be declared as follows:

  CATALOG(pg_seclabel, 3037) BKI_WITHOUT_OIDS
  {
  Oid relid;  /* OID of the catalog containing the object */
  Oid objid;  /* OID of the object itself */
  int4subid;  /* column number, or 0 if unused */
  textlabel;  /* text form of security label */
  } FormData_pg_seclabel;

We also add a dependency between the labeled object and the security
label itself. It also enables to clean up orphan labels automatically,
without any new invention.

The related code will be stored in src/backend/catalog/pg_seclabel.c.
It provides an internal interface to assign a security label on
a certain database object when creation or relabeling.

However, it also has a limitation from the viewpoint of long-term.
From the definition, OID of database objects are unique. So, we cannot
share text form of labels even if massive number of database objects
have an identical security label; it can lead waste of storage consumption
because of the duplicated security labels. So, this idea shall be switched
to the [3] when we support row-level security with ESP.
But I think the idea [2] is reasonable in short-term development.


[3] Using security-Id as a key of text representation in separated catalog
--

This idea is a derivation from the idea of [2].
It also stores text form of labels into pg_seclabel/pg_shseclabel, but it
shall be identified with a pair of relation-Id and security-Id which is
newly supported.

The security-Id shall be stored within padding area of HeapTupleHeader like
object-Id. But, unlike object-Id, it does not need to be unique for each tuples.
It allows multiple tuples has same security-Id that is related to a certain
text form of security label. It means we can reduce waste of storage due to
the duplicated labels in text (Note, massive number of objects tend to share
a limited number of labels in general).

So, this approach has advantage toward the idea of [2], however, it needs more
code to be implemented/reviewed than [2], such as management of security-Id,
reclaim of orphan labels and so on.
Therefore, it is not feasible at the statring-up stage, as long as row-level
security with ESP is not available.


* SQL Statement
---

It also need to provide SQL statement to manage security label of the database
object. I plan the following statement to change the security label.

  ALTER xxx name SECURITY LABEL TO 'label';

  (For columns)
  ALTER TABLE name ALTER column SECURITY LABEL TO 'label';

The 'xxx' part is replaced by an object class, such as TABLE, SCHEMA and so on.

When the ALTER command is executed, ESP module validate the given label,
in addition to permission checks to relabel it.
If no ESP module is available, the ALTER always raises a feature-not-supported
error.

  Example)
  ALTER TABLE t1 SECURITY LABEL TO 'system_u:object_r:sepgsql_ro_table_t:s0';

  ALTER SCHEMA kaigai SECURITY 

Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Heikki Linnakangas

On 27/05/10 09:51, Simon Riggs wrote:

On Thu, 2010-05-27 at 02:18 +0300, Heikki Linnakangas wrote:

In practice, hard synchronous don't return ever until the commit hits
the standby behavior is rarely what admins actually want, because it's
disastrous from an availability point of view. More likely, admins want
wait for ack from standby, unless it's not responding, in which case to
hell with redundancy and just act like a single server. It makes sense
if you just want to make sure that the standby doesn't return stale
results when it's working properly, and you're not worried about
durability but I'm not sure it's very sound otherwise.


Which is also crazy. If you're using synch rep its because you care
deeply about durability.


No, not necessarily. As I said above, you might just want a guarantee 
that *if* you query the standby, you get up-to-date results. But if the 
standby is down for any reason, you don't care about it. That's a very 
sensible mode of operation, for example if you're offloading reads to 
the standby with something like pgpool.


In fact I have the feeling that that's the most common use case for 
synchronous replication, not a deep concern of durability.



I agree that don't-return-ever isn't something anyone will want.

What we need is a COMMIT with ERROR message!


Hmm, perhaps we could emit a warning with the commit. I'm not sure what 
an application could do with it, though.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-27 Thread Heikki Linnakangas

On 27/05/10 09:50, Pavel Stehule wrote:

2010/5/27 Heikki Linnakangasheikki.linnakan...@enterprisedb.com:

AFAIU, the standard doesn't say anything about named parameters. Oracle uses
=, but as you said, that's ambiguous with the =  operator.

+1 for FOR.


I don't see any advantage of FOR.


Any advantage over AS? It doesn't clash with the foo AS bar syntax 
that the standard is using for something completely different, as Peter 
pointed out in the original post.



We can change ir to support new  standard or don't change it.


What new standard?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Fujii Masao
On Wed, May 26, 2010 at 10:37 PM, Simon Riggs si...@2ndquadrant.com wrote:
 If the remote server responded first, then that proves it is a better
 candidate for failover than the one you think of as near. If the two
 standbys vary over time then you have network problems that will
 directly affect the performance on the master; synch_rep = N would
 respond better to any such problems.

No. The remote standby might respond first temporarily though it's almost
behind the near one. The read-only queries or incrementally updated
backup operation might cause a bursty disk write, and delay the ACK from
the standby. The lock contention between read-only queries and recovery
would delay the ACK. So the standby which responds first is not always
the best candidate for failover. Also the administrator generally doesn't
put the remote standby under the control of a clusterware like heartbeat.
In this case, the remote standby will never be the candidate for failover.
But quorum commit cannot cover this simple case.

 OTOH, synchronous_replication=2 degrades the
 performance on the master very much.

 Yes, but only because you have only one near standby. It would clearly
 to be foolish to make this setting without 2+ near standbys. We would
 then have 4 or more servers; how do we specify everything for that
 config??

If you always want to use the near standby as the candidate for failover
by using quorum commit in the above simple case, you would need to choose
such a foolish setting. Otherwise, unfortunately you might have to failover
to the remote standby not under the control of a clusterware.

 synchronous_replication approach
 doesn't seem to cover the typical use case.

 You described the failure modes for the quorum proposal, but avoided
 describing the failure modes for the per-standby proposal.

 Please explain what will happen when the near server is unavailable,
 with per-standby settings. Please also explain what will happen if we
 choose to have 4 or 5 servers to maintain performance in case of the
 near server going down. How will we specify the failure modes?

I'll try to explain that.

(1) most standard case: 1 master + 1 sync standby (near)
When the master goes down, something like a clusterware detects that
failure, and brings the standby online. Since we can ensure that the
standby has all the committed transactions, failover doesn't cause
any data loss.

When the standby goes down or network outage happens, walsender
detects that failure via the replication timeout, keepalive or error
return from the system calls. Then walsender does something according
to the specified reaction (GUC) to the failure of the standby, e.g.,
walsender wakes the transaction commit up from the wait-for-ACK, and
exits. Then the master runs standalone.

(2) 1 master + 1 sync standby (near) + 1 async standby (remote)
When the master goes down, something like a clusterware brings the
sync standby in the near location online. The administrator would
need to take a fresh base backup of the new master, load it on the
remote standby, change the primary_conninfo, and restart the remote
standby.

When one of standbys goes down, walsender does the same thing described
in (1). Until the failed standby has restarted, the master runs together
with another standby.

In (1) and (2), after some failure happens, there would be only one server
which is guaranteed to have all the committed transactions. When it also
goes down, the database service stops. If you want to avoid this fragile
situation, you would need to add one more sync standby in the near site.

(3) 1 master + 2 sync standbys (near) + 1 async standby (remote)
When the master goes down, something like a clusterware brings the
one of sync standbys online by using some selection algorithm.
The administrator would need to take a fresh base backup of the new
master, load it on both remaining standbys, change the primary_conninfo,
and restart them.

When one of standbys goes down, walsender does the same thing described
in (1). Until the failed standby has restarted, the master runs together
with two standbys. At least one standby is guaranteed to be sync with
the master.

Is this explanation enough?

 Also, when synchronous_replication=1 and one of synchronous standbys
 goes down, how should the surviving standby catch up with the master?
 Such standby might be too far behind the master. The transaction commit
 should wait for the ACK from the lagging standby immediately even if
 there might be large gap? If yes, synch_rep_timeout would screw up
 the replication easily.

 That depends upon whether we send the ACK at point #2, #3 or #4. It
 would only cause a problem if you waited until #4.

Yeah, the problem happens. If we implement quorum commit, we need to
design how the surviving standby catches up with the master.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND 

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-27 Thread Abhijit Menon-Sen
At 2010-05-27 08:50:18 +0200, pavel.steh...@gmail.com wrote:

 I don't see any advantage of FOR. We can change ir to support new
 standard or don't change it.

Adopting FOR would mean we don't use AS in a way that conflicts with the
standard. That's its only advantage. But I agree with you, I don't think
it's worth inventing a new non-standard wart for this case.

I don't really like the idea of getting rid of = as an operator either;
I'm torn between staying true to the standard and politely looking the
other way as Tom suggested we might end up doing.

-- ams

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-27 Thread Pavel Stehule
2010/5/27 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 On 27/05/10 09:50, Pavel Stehule wrote:

 2010/5/27 Heikki Linnakangasheikki.linnakan...@enterprisedb.com:

 AFAIU, the standard doesn't say anything about named parameters. Oracle
 uses
 =, but as you said, that's ambiguous with the =  operator.

 +1 for FOR.

 I don't see any advantage of FOR.

 Any advantage over AS? It doesn't clash with the foo AS bar syntax that
 the standard is using for something completely different, as Peter pointed
 out in the original post.

No, standard knows AS in different context. In param list standard
doesn't use keyword AS.


 We can change ir to support new  standard or don't change it.

 What new standard?


ANSI SQL 2011

Pavel
 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-27 Thread Pavel Stehule
2010/5/27 Abhijit Menon-Sen a...@toroid.org:
 At 2010-05-27 08:50:18 +0200, pavel.steh...@gmail.com wrote:

 I don't see any advantage of FOR. We can change ir to support new
 standard or don't change it.

 Adopting FOR would mean we don't use AS in a way that conflicts with the
 standard. That's its only advantage. But I agree with you, I don't think
 it's worth inventing a new non-standard wart for this case.

current using AS isn't in conflict with standard .. look to standard, please.

Pavel


 I don't really like the idea of getting rid of = as an operator either;
 I'm torn between staying true to the standard and politely looking the
 other way as Tom suggested we might end up doing.

 -- ams


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-27 Thread Heikki Linnakangas

On 27/05/10 08:56, Jesper Krogh wrote:

Just a thought. Wouldn't a All-visible bit also enable index only scans
to some degree?


Yes. In fact, that's one reason I implemented the visibility map in the 
first place. I started working on index-only scans based on that last 
year, if you search the archives for index-only scans you'll find those 
discussions.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Fujii Masao
On Thu, May 27, 2010 at 3:21 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, 2010-05-27 at 11:28 +0900, Fujii Masao wrote:
 On Wed, May 26, 2010 at 10:20 PM, Simon Riggs si...@2ndquadrant.com wrote:
  On Wed, 2010-05-26 at 18:52 +0900, Fujii Masao wrote:
 
  I guess that dropping the support of #3 doesn't reduce complexity
  since the code of #3 is almost the same as that of #2. Like
  walreceiver sends the ACK after receiving the WAL in #2 case, it has
  only to do the same thing after the WAL flush.
 
  Hmm, well the code for #3 is similar also to the code for #4. So if you
  do #2, its easy to do #2, #3 and #4 together.

 No. #4 requires the way of prompt communication between walreceiver and
 startup process, but #2 and #3 not. That is, in #4, walreceiver has to
 wake the startup process up as soon as it has flushed WAL. OTOH, the
 startup process has to wake walreceiver up as soon as it has replayed
 WAL, to request it to send the ACK to the master. In #2 and #3, the
 prompt communication from walreceiver to startup process, i.e., changing
 the poll loop in the startup process would also be useful for the data
 to be visible immediately on the standby. But it's not required.

 You need to pass WAL promptly on primary from backend to WALSender.
 Whatever mechanism you use can also be reused symmetrically on standby
 to provide #4. So not a problem.

I cannot be so optimistic since the situation differs from one process
to another.

  The comment is about whether having #3 makes sense from a user interface
  perspective. It's easy to add options, but they must have useful
  meaning.

 #3 would be useful for people wanting further robustness. In #2,
 when simultaneous power failure on the master and the standby,
 and concurrent disk crash on the master happen, transaction whose
 success indicator has been returned to a client might be lost.
 #3 can avoid such a critical situation. This is one of reasons that
 DRBD supports Protocol C, I think.

 Which few people use it, or if they do its because DRBD didn't
 originally support multiple standbys. Not worth emulating IMHO.

If so, #3 would be useful for people who don't afford to buy more
than one standby servers, too :)

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-27 Thread Heikki Linnakangas

On 27/05/10 10:16, Pavel Stehule wrote:

2010/5/27 Heikki Linnakangasheikki.linnakan...@enterprisedb.com:

On 27/05/10 09:50, Pavel Stehule wrote:


2010/5/27 Heikki Linnakangasheikki.linnakan...@enterprisedb.com:


AFAIU, the standard doesn't say anything about named parameters. Oracle
uses
=, but as you said, that's ambiguous with the =operator.

+1 for FOR.


I don't see any advantage of FOR.


Any advantage over AS? It doesn't clash with the foo AS bar syntax that
the standard is using for something completely different, as Peter pointed
out in the original post.


No, standard knows AS in different context. In param list standard
doesn't use keyword AS.


As Peter pointed out in the original post, according to the standard 
function(foo AS bar) means something else than what we have now. 
Please re-read the original post.



We can change ir to support new  standard or don't change it.


What new standard?


ANSI SQL 2011


Oh, does that have something to say about named parameters? Is the draft 
publicly available somewhere?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-27 Thread Peter Eisentraut
On tor, 2010-05-27 at 04:06 +0300, Heikki Linnakangas wrote:
 On 27/05/10 03:57, Robert Haas wrote:
  Being compatible with the SQL
  standard and with Oracle is not to be taken lightly.
 
 I seem to be alone believing that the SQL standard doesn't say anything 
 about named function parameters. Can someone point me to the relevant 
 section of the standard?

It will be in SQL:2011.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-27 Thread Heikki Linnakangas

On 27/05/10 10:49, Peter Eisentraut wrote:

On tor, 2010-05-27 at 04:06 +0300, Heikki Linnakangas wrote:

On 27/05/10 03:57, Robert Haas wrote:

Being compatible with the SQL
standard and with Oracle is not to be taken lightly.


I seem to be alone believing that the SQL standard doesn't say anything
about named function parameters. Can someone point me to the relevant
section of the standard?


It will be in SQL:2011.


Does it mandate = ?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-27 Thread Peter Eisentraut
On tor, 2010-05-27 at 10:51 +0300, Heikki Linnakangas wrote:
 On 27/05/10 10:49, Peter Eisentraut wrote:
  On tor, 2010-05-27 at 04:06 +0300, Heikki Linnakangas wrote:
  On 27/05/10 03:57, Robert Haas wrote:
  Being compatible with the SQL
  standard and with Oracle is not to be taken lightly.
 
  I seem to be alone believing that the SQL standard doesn't say anything
  about named function parameters. Can someone point me to the relevant
  section of the standard?
 
  It will be in SQL:2011.
 
 Does it mandate = ?

routine invocation ::= routine name SQL argument list

routine name ::= [ schema name period ] qualified identifier

SQL argument list ::= left paren [ SQL argument [ { comma SQL
argument }... ] ] right paren

SQL argument ::= value expression
| generalized expression
| target specification
| contextually typed value specification
| named argument specification

generalized expression ::= value expression AS path-resolved
user-defined type name

named argument specification ::= SQL parameter name named argument
assignment token named argument SQL argument

named argument SQL argument ::= value expression
| target specification
| contextually typed value specification

named argument assignment token ::=
=



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-27 Thread Pavel Stehule
2010/5/26 Peter Eisentraut pete...@gmx.net:
 It turns out that the SQL standard uses the function call notation

 foo(this AS that)

 for something else:

 routine invocation ::= routine name SQL argument list

 routine name ::= [ schema name period ] qualified identifier

 SQL argument list ::= left paren [ SQL argument [ { comma SQL
 argument }... ] ] right paren

 SQL argument ::= value expression
 | generalized expression
 | target specification

 generalized expression ::= value expression AS path-resolved
 user-defined type name

 In systems that have inheritance of composite types, this is used to
 specify which type the value is supposed to be interpreted as (for
 example, to treat the value as a supertype).


can it be used (in ANSI SQL semantic) as cast?

like SELECT foo(10.33 AS int)

 Seems kind of bad to overload this with something completely different.
 What should we do?



Is ANSI SQL consistent in this syntax? SQL/XML use AS in different meaning.

Regards
Pavel


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Dimitri Fontaine
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 26/05/10 23:31, Dimitri Fontaine wrote:
 So if you want simplicity to admin, effective data availability and
 precise control over the global setup, I say go for:
   a. transaction level control of the replication level
   b. cascading support
   c. quorum with timeout
   d. choice of commit or rollback at timeout

 Then give me a setup example that you can't express fully.

 One master, one synchronous standby on another continent for HA purposes,
 and one asynchronous reporting server in the same rack as the master. You
 don't want to set up the reporting server as a cascaded slave of the standby
 on the other continent, because that would double the bandwidth required,
 but you also don't want the master to wait for the reporting server.

 The possibilities are endless... Your proposal above covers a pretty good
 set of scenarios, but it's by no means complete. If we try to solve
 everything the configuration will need to be written in a Turing-complete
 Replication Description Language. We'll have to pick a useful,
 easy-to-understand subset that covers the common scenarios. To handle the
 more exotic scenarios, you can write a proxy that sits in front of the
 master, and implements whatever rules you wish, with the rules written
 in C.

Agreed on the Turing-completeness side of those things. My current
thinking is that the proxy I want might simply be a PostgreSQL instance
with cascading support. In your example that would give us:

  Remote Standby, HA 
  Master -- Proxy - 
  Local Standby, Reporting

So what I think we have here is a pretty good trade-off in terms of what
you can do with some simple setup knobs. What's left there is that with
the quorum idea, you're not sure if the one server that's synced is the
remote or local standby, in this example. Several ideas are floating
around (votes, mixed per-standby and per-transaction settings).

Maybe we could have the standby be able to say it's not interesting into
participating into the quorum, that is, it's an async replica, full
stop.

In your example we'd set the local reporting standby as a non-voting
member of the replication setting, the proxy and the master would have a
quorum of 1, and the remote HA standby would vote.

I don't think the idea of having any number of voting coupons other than
0 or 1 on any server will help us the least.

I do think that your proxy idea is a great one and should be in core. By
the way, the cascading/proxy instance could be set without Hot Standby,
if you don't like to be able to monitor it via a libpq connection and
some queries.

 BTW, I think we're going to need a separate config file for listing the
 standbys anyway. There you can write per-server rules and options, but
 explicitly knowing about all the standbys also allows the master to recycle
 WAL as soon as it has been streamed to all the registered
 standbys. Currently we just keep wal_keep_segments files around, just in
 case there's a standby out there that needs them.

I much prefer that each server in the set publish what it wants. It only
connects to 1 given provider. Then we've been talking about this exact
same retention problem for queueing solutions, with Jan, Marko and Jim.

The idea we came up with is a watermarking solution (which already
exists in Skytools 3, in its coarse-grain version). The first approach
is to have all slave give back to its local master/provider/origin the
last replayed WAL/LSN, once in a while. You derive from that a global
watermark and drop WAL files depending on it.

You now have two problems: no more space and why keeping that many files
on the master anyway, maybe some slave could be set up for retention
instead?

To solve that it's possible for each server to be setup with a
restricted set of servers they're deriving their watermark from. That's
when you need per-server options and an explicit list of all the
standbys whatever their level in the cascading tree. That means explicit
maintenance of the entire replication topology.

I don't think we need to solve that already. I think we need to provide
an option on each member of the replication tree to either PANIC or lose
WALs in case they're running out of space when trying to follow the
watermark. It's crude but already allows to have a standby set to
maintain the common archive and have the master drop the WAL files as
soon as possible (respecting wal_keep_segments).

In our case, if a WAL file is no more available from any active server
we still have the option to fetch it from the archives...

Regards,
-- 
Dimitri Fontaine
PostgreSQL DBA, Architecte

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Simon Riggs
On Thu, 2010-05-27 at 10:09 +0300, Heikki Linnakangas wrote:
 No, not necessarily. As I said above, you might just want a guarantee 
 that *if* you query the standby, you get up-to-date results.

Of course. COMMIT was already one of the options, so this comment was
already understood.

What we are discussing is whether additional options exist and/or are
desirable. We should not be forcing everybody to COMMIT whether or not
it is robust.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Simon Riggs
On Thu, 2010-05-27 at 16:35 +0900, Fujii Masao wrote:
 On Thu, May 27, 2010 at 3:21 PM, Simon Riggs si...@2ndquadrant.com wrote:
  On Thu, 2010-05-27 at 11:28 +0900, Fujii Masao wrote:
  On Wed, May 26, 2010 at 10:20 PM, Simon Riggs si...@2ndquadrant.com 
  wrote:
   On Wed, 2010-05-26 at 18:52 +0900, Fujii Masao wrote:
  
   I guess that dropping the support of #3 doesn't reduce complexity
   since the code of #3 is almost the same as that of #2. Like
   walreceiver sends the ACK after receiving the WAL in #2 case, it has
   only to do the same thing after the WAL flush.
  
   Hmm, well the code for #3 is similar also to the code for #4. So if you
   do #2, its easy to do #2, #3 and #4 together.
 
  No. #4 requires the way of prompt communication between walreceiver and
  startup process, but #2 and #3 not. That is, in #4, walreceiver has to
  wake the startup process up as soon as it has flushed WAL. OTOH, the
  startup process has to wake walreceiver up as soon as it has replayed
  WAL, to request it to send the ACK to the master. In #2 and #3, the
  prompt communication from walreceiver to startup process, i.e., changing
  the poll loop in the startup process would also be useful for the data
  to be visible immediately on the standby. But it's not required.
 
  You need to pass WAL promptly on primary from backend to WALSender.
  Whatever mechanism you use can also be reused symmetrically on standby
  to provide #4. So not a problem.
 
 I cannot be so optimistic since the situation differs from one process
 to another.

This spurs some architectural thinking:

I think we need to disconnect the idea of waiting in any of the
components. Anytime we ask WALSender or WALReceiver to wait for
acknowledgement we will be reducing throughput. So we should assume that
they will continue to work as quickly as possible.

The acknowledgement from standby can contain the latest xlog location of
WAL received, WAL written to disk and WAL applied, all by reading values
from shared memory. It's all the same, whether we send back 2 or 3 xlog
locations in the ack message.

Who sends the ack message? Who receives it? Would it be easier to have
this happen in a second pair of processes WALSynchroniser (on primary)
and WAL Acknowledger (on standby). WALAcknowledger would send back a
stream of ack messages with latest xlog positions. WALSynchroniser would
receive these messages and wake up sleeping backends. If we did that
then there'd be almost no change at all to existing code, just
additional code and processes for the sync case. Code would be separate
and there would be no performance concerns either.

Backends can then choose to wait until the xlog location they wish has
been achieved which might be in the next acknowledgement message or in a
subsequent one. That also ensures that the logic for this is completely
on the master and the standby doesn't act differently, apart from
needing to start a WALAcknowledger process if sync rep is requested.

If you do choose to make #3 important, then I'd say you need to work out
how to make WALWriter active as well, so it can perform regular fsyncs,
rather than having WALReceiver wait across that I/O.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] primary/secondary/master/slave/standby

2010-05-27 Thread Dimitri Fontaine
Greg Stark gsst...@mit.edu writes:
 Fwiw I like the word replica but I don't see an obvious choice of
 word to pair it with

I guess it's replica / origin, per choice of Jan Wieck to be found in
our catalogs:

 http://www.postgresql.org/docs/current/static/catalog-pg-trigger.html

  tgenabled char 

  Controls in which session_replication_role modes the trigger fires. 
  O = trigger fires in origin and local modes, D = trigger is
  disabled, R = trigger fires in replica mode, A = trigger fires
  always.

So that's origin/replica, master/slave, primary/standby, master/standby.

-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Simon Riggs
On Thu, 2010-05-27 at 16:13 +0900, Fujii Masao wrote:
 On Wed, May 26, 2010 at 10:37 PM, Simon Riggs si...@2ndquadrant.com wrote:

  Please explain what will happen when the near server is unavailable,
  with per-standby settings. Please also explain what will happen if we
  choose to have 4 or 5 servers to maintain performance in case of the
  near server going down. How will we specify the failure modes?
 
 I'll try to explain that.

We've been discussing parameters and how we would define what we want to
happen in various scenarios.

You've not explained what parameters you would use, how and where they
would be set, so we aren't yet any closer to understanding what it is
your proposing.

Please explain how your proposal will work.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Simon Riggs
On Thu, 2010-05-27 at 16:13 +0900, Fujii Masao wrote:
 On Wed, May 26, 2010 at 10:37 PM, Simon Riggs si...@2ndquadrant.com wrote:
  If the remote server responded first, then that proves it is a better
  candidate for failover than the one you think of as near. If the two
  standbys vary over time then you have network problems that will
  directly affect the performance on the master; synch_rep = N would
  respond better to any such problems.
 
 No. The remote standby might respond first temporarily though it's almost
 behind the near one. The read-only queries or incrementally updated
 backup operation might cause a bursty disk write, and delay the ACK from
 the standby. The lock contention between read-only queries and recovery
 would delay the ACK. So the standby which responds first is not always
 the best candidate for failover. 

Seems strange. If you have 2 standbys and you say you would like node1
to be the preferred candidate, then you load it so heavily that a remote
server with by-definition much larger network delay responds first, then
I say your preference was wrong. The above situation is caused by the
DBA and the DBA can solve it also - if the preference is to keep a
preferred server then that server would need to be lightly loaded so
it could respond sensibly. 

This is the same thing as having an optimizer pick the best path and
then the user saying no dumb-ass, use the index I tell you even though
it is slower. If you really don't want to know the fastest way, then I
personally will agree you can have that, as is my view (now) on the
optimizer issue also - sometimes the admin does know best.

 Also the administrator generally doesn't
 put the remote standby under the control of a clusterware like heartbeat.
 In this case, the remote standby will never be the candidate for failover.
 But quorum commit cannot cover this simple case.

If you, Jan and Yeb wish to completely exclude standbys from being part
of any quorum, then I guess we need to have per-standby settings to
allow that to be defined. I'm in favour of giving people options. That
needn't be a mandatory per-standby setting, just a non-default option,
so that we can reduce the complexity of configuration for common cases.
If we're looking for simplest-implementation-first that isn't it.

Currently, Oracle provides these settings, which correspond to 
Maximum Performance = quorum = 0
Maximum Availability = quorum = 1, timeout_action = commit
Maximum Protection = quorum = 1, timeout_action = shutdown

So Oracle already supports the quorum case...

Oracle doesn't provide
i) any capability to have quorum  1
ii) any capability to include an async node as a sync node, if the
quorum cannot be reached with servers marked sync, or in the situation
where because of mis-use/mis-configuration the sync servers are actually
slower.
iii) ability to wait for apply
iv) ability to specify wait mode at transaction level

all of those are desirable in some cases and easily possible by
specifying things in the way I've suggested.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Fujii Masao
On Thu, May 27, 2010 at 6:30 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Who sends the ack message?

walreceiver

 Who receives it?

walsender

 Would it be easier to have
 this happen in a second pair of processes WALSynchroniser (on primary)
 and WAL Acknowledger (on standby). WALAcknowledger would send back a
 stream of ack messages with latest xlog positions. WALSynchroniser would
 receive these messages and wake up sleeping backends. If we did that
 then there'd be almost no change at all to existing code, just
 additional code and processes for the sync case. Code would be separate
 and there would be no performance concerns either.

No, this seems to be bad idea. We should not establish extra connection
between servers. That would be a source of trouble.

 If you do choose to make #3 important, then I'd say you need to work out
 how to make WALWriter active as well, so it can perform regular fsyncs,
 rather than having WALReceiver wait across that I/O.

Yeah, this might be an option for optimization though I'm not sure how
it has good effect.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard

2010-05-27 Thread Mike Fowler

Peter Eisentraut wrote:

On ons, 2010-05-26 at 11:47 +0100, Mike Fowler wrote:
  

The XMLEXISTS function works with XQuery expressions and doesn't have
the call signature that your patch implements
  
Looking at the manuals of Oracle, Derby and DB2 I see how the call 
signature differs. I also note that Oracle's implementation is XPath 
only, Derby's is partial XQuery and DB2 appears to be full XQuery.


What do people prefer me to do? I see the options as:

1) Change the call signature to match the standard
2) Change the function name back to xpath_exists



It would be nice to make XMLEXISTS work as in the standard, seeing how
many others are providing the same interface.

  

Should option one be the more popular there's further choices:

1) Integrate XQuery support to completely match the standard, however 
this will require the addition of a new library libxquery

2) Leave the XPath as is, inline with Oracle's implementation
3) Hybrid approach. Since XML is a comple time option, add XQuery as 
another. Conditional completion gives the full XQuery support when 
available or just the XPath when not



I think providing XPath is enough, at least for now


Agreed. I'll get another patch together in the next day or two.

Regards,

--
Mike Fowler
Registered Linux user: 379787


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Simon Riggs
On Thu, 2010-05-27 at 19:21 +0900, Fujii Masao wrote:
 On Thu, May 27, 2010 at 6:30 PM, Simon Riggs si...@2ndquadrant.com wrote:

  Would it be easier to have
  this happen in a second pair of processes WALSynchroniser (on primary)
  and WAL Acknowledger (on standby). WALAcknowledger would send back a
  stream of ack messages with latest xlog positions. WALSynchroniser would
  receive these messages and wake up sleeping backends. If we did that
  then there'd be almost no change at all to existing code, just
  additional code and processes for the sync case. Code would be separate
  and there would be no performance concerns either.
 
 No, this seems to be bad idea. We should not establish extra connection
 between servers. That would be a source of trouble.

What kind of trouble? You think using an extra connection would cause
problems; why?

I've explained it would greatly simplify the code to do it that way and
improve performance. Those sound like good things, not problems.

  If you do choose to make #3 important, then I'd say you need to work out
  how to make WALWriter active as well, so it can perform regular fsyncs,
  rather than having WALReceiver wait across that I/O.
 
 Yeah, this might be an option for optimization though I'm not sure how
 it has good effect.

As I said, WALreceiver would not need to wait across fsync...

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Fujii Masao
On Thu, May 27, 2010 at 7:21 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Seems strange. If you have 2 standbys and you say you would like node1
 to be the preferred candidate, then you load it so heavily that a remote
 server with by-definition much larger network delay responds first, then
 I say your preference was wrong. The above situation is caused by the
 DBA and the DBA can solve it also - if the preference is to keep a
 preferred server then that server would need to be lightly loaded so
 it could respond sensibly.

No. Even if the load is very low in the preferred server, there is
*no* guarantee that it responds first. Per-standby setting can give
such a guarantee, i.e., we can specify #2, #3 or #4 in the preferred
server and #1 in the other.

 This is the same thing as having an optimizer pick the best path and
 then the user saying no dumb-ass, use the index I tell you even though
 it is slower. If you really don't want to know the fastest way, then I
 personally will agree you can have that, as is my view (now) on the
 optimizer issue also - sometimes the admin does know best.

I think that choice of wrong master causes more serious situation than
that of wrong plan.

 Also the administrator generally doesn't
 put the remote standby under the control of a clusterware like heartbeat.
 In this case, the remote standby will never be the candidate for failover.
 But quorum commit cannot cover this simple case.

 If you, Jan and Yeb wish to completely exclude standbys from being part
 of any quorum, then I guess we need to have per-standby settings to
 allow that to be defined. I'm in favour of giving people options. That
 needn't be a mandatory per-standby setting, just a non-default option,
 so that we can reduce the complexity of configuration for common cases.
 If we're looking for simplest-implementation-first that isn't it.

For now, I agree that we support a quorum commit feature for 9.1 or later.
But I don't think that it's simpler, more intuitive and easier-to-understand
than per-standby setting. So I think that we should include the per-standby
setting in the first patch.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm

2010-05-27 Thread Teodor Sigaev

Anyone working on make contrib/pg_trgm mutibyte encoding aware? If
not, I'm interested in the work.


It's already multibyte safe since 8.4
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Fujii Masao
On Thu, May 27, 2010 at 7:33 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, 2010-05-27 at 19:21 +0900, Fujii Masao wrote:
 On Thu, May 27, 2010 at 6:30 PM, Simon Riggs si...@2ndquadrant.com wrote:

  Would it be easier to have
  this happen in a second pair of processes WALSynchroniser (on primary)
  and WAL Acknowledger (on standby). WALAcknowledger would send back a
  stream of ack messages with latest xlog positions. WALSynchroniser would
  receive these messages and wake up sleeping backends. If we did that
  then there'd be almost no change at all to existing code, just
  additional code and processes for the sync case. Code would be separate
  and there would be no performance concerns either.

 No, this seems to be bad idea. We should not establish extra connection
 between servers. That would be a source of trouble.

 What kind of trouble? You think using an extra connection would cause
 problems; why?

Because the number of connection failure cases doubles. Likewise, the number
of process failure cases would double.

  If you do choose to make #3 important, then I'd say you need to work out
  how to make WALWriter active as well, so it can perform regular fsyncs,
  rather than having WALReceiver wait across that I/O.

 Yeah, this might be an option for optimization though I'm not sure how
 it has good effect.

 As I said, WALreceiver would not need to wait across fsync...

Right, but walreceiver still needs to wait for WAL flush by walwriter.
If currently WAL flush is the dominant workload for walreceiver,
only leaving it to walwriter might not have so good effect. I'm not sure
whether.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Dimitri Fontaine
Simon Riggs si...@2ndquadrant.com writes:
 Seems strange. If you have 2 standbys and you say you would like node1
 to be the preferred candidate, then you load it so heavily that a remote
 server with by-definition much larger network delay responds first, then
 I say your preference was wrong.

There's a communication mismatch here I think. The problem with the
dynamic aspect of the system is that the admin wants to plan ahead and
choose in advance the failover server.

Other than that I much prefer the automatic and dynamic quorum idea.

 If you, Jan and Yeb wish to completely exclude standbys from being part
 of any quorum, then I guess we need to have per-standby settings to
 allow that to be defined. I'm in favour of giving people options. That
 needn't be a mandatory per-standby setting, just a non-default option,
 so that we can reduce the complexity of configuration for common
 cases.

+1

 Maximum Performance = quorum = 0
 Maximum Availability = quorum = 1, timeout_action = commit
 Maximum Protection = quorum = 1, timeout_action = shutdown

+1

Being able to say that a given server has not been granted to
participate into the vote allowing to reach the global durability quorum
will allow for choosing the failover candidates.

Now you're able to have this reporting server and know for sure that
your sync replicated transactions are not waiting for it.

To summarize, the current per-transaction approach would be :

 - transaction level replication synchronous behaviour
 - proxy/cascading in core
 - quorum setup for deciding any commit is safe
 - any server can be excluded from the sync quorum
 - timeout can still raises exception or ignore (commit)?

This last point seems to need some more discussion, or I didn't
understand well the current positions and proposals.

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Robert Haas
On Thu, May 27, 2010 at 3:13 AM, Fujii Masao masao.fu...@gmail.com wrote:
 (1) most standard case: 1 master + 1 sync standby (near)
    When the master goes down, something like a clusterware detects that
    failure, and brings the standby online. Since we can ensure that the
    standby has all the committed transactions, failover doesn't cause
    any data loss.

How do you propose to guarantee that?  ISTM that you have to either
commit locally first, or send the commit to the remote first.  Either
way, the two events won't occur exactly simultaneously.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Simon Riggs
On Thu, 2010-05-27 at 20:13 +0900, Fujii Masao wrote:
 On Thu, May 27, 2010 at 7:33 PM, Simon Riggs si...@2ndquadrant.com wrote:
  On Thu, 2010-05-27 at 19:21 +0900, Fujii Masao wrote:
  On Thu, May 27, 2010 at 6:30 PM, Simon Riggs si...@2ndquadrant.com wrote:
 
   Would it be easier to have
   this happen in a second pair of processes WALSynchroniser (on primary)
   and WAL Acknowledger (on standby). WALAcknowledger would send back a
   stream of ack messages with latest xlog positions. WALSynchroniser would
   receive these messages and wake up sleeping backends. If we did that
   then there'd be almost no change at all to existing code, just
   additional code and processes for the sync case. Code would be separate
   and there would be no performance concerns either.
 
  No, this seems to be bad idea. We should not establish extra connection
  between servers. That would be a source of trouble.
 
  What kind of trouble? You think using an extra connection would cause
  problems; why?
 
 Because the number of connection failure cases doubles. Likewise, the number
 of process failure cases would double.

Not really. The users wait for just the synchroniser to return not for
two things.  Looks to me that other processes are independent of each
other. Very simple.

   If you do choose to make #3 important, then I'd say you need to work out
   how to make WALWriter active as well, so it can perform regular fsyncs,
   rather than having WALReceiver wait across that I/O.
 
  Yeah, this might be an option for optimization though I'm not sure how
  it has good effect.
 
  As I said, WALreceiver would not need to wait across fsync...
 
 Right, but walreceiver still needs to wait for WAL flush by walwriter.

Why does it? I just explained a design where that wasn't required.

 If currently WAL flush is the dominant workload for walreceiver,
 only leaving it to walwriter might not have so good effect. I'm not sure
 whether.

If we're not sure, we could check before agreeing a design.

WAL flush will be costly unless you have huge disk cache.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Simon Riggs
On Thu, 2010-05-27 at 19:50 +0900, Fujii Masao wrote:

 For now, I agree that we support a quorum commit feature for 9.1 or later.
 But I don't think that it's simpler, more intuitive and easier-to-understand
 than per-standby setting. So I think that we should include the per-standby
 setting in the first patch.

There already is a first patch to the community that implements quorum
commit, just not by you.

If you have a better way, describe it in detail and in full now, with
reference to each of the use cases you mentioned, so that people get a
chance to give their opinions on your design. Then we can let the
community decide whether or not that second way is actually better. We
may not need a second patch.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm

2010-05-27 Thread Tatsuo Ishii
 It's already multibyte safe since 8.4

No, it doesn't.

$ psql test
Pager usage is off.
psql (8.4.4)
Type help for help.

test=# select similarity('abc', 'abd'); -- OK
 similarity 

   0.33
(1 row)

test=# select similarity('日本語', '日本後'); -- NG
 similarity 

NaN
(1 row)

test=# select show_trgm('abc'); -- OK
show_trgm
-
 {  a, ab,abc,bc }
(1 row)

test=# select show_trgm('日本語'); -- NG
 show_trgm 
---
 {}
(1 row)

Encoding is EUC_JP, locale is C. Included is the script to reproduce
the problem.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


pg_trgm.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Fujii Masao
On Thu, May 27, 2010 at 8:28 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, May 27, 2010 at 3:13 AM, Fujii Masao masao.fu...@gmail.com wrote:
 (1) most standard case: 1 master + 1 sync standby (near)
    When the master goes down, something like a clusterware detects that
    failure, and brings the standby online. Since we can ensure that the
    standby has all the committed transactions, failover doesn't cause
    any data loss.

 How do you propose to guarantee that?  ISTM that you have to either
 commit locally first, or send the commit to the remote first.  Either
 way, the two events won't occur exactly simultaneously.

Letting the transaction wait until the standby has received / flushed /
replayed the WAL before it returns a success indicator to a client
would guarantee that. This ensures that all transactions which a client
knows as committed exist in the memory or disk of the standby. So we
would be able to see those transactions from new master after failover.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Fujii Masao
On Thu, May 27, 2010 at 8:30 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Why does it? I just explained a design where that wasn't required.

Hmm.. my expression might have been ambiguous. Walreceiver still needs
to wait for WAL flush by walwriter *before* sending the ACK to the master,
in #3 case. Because, in #3, the master has to wait until the standby has
flushed the WAL.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Fujii Masao
On Thu, May 27, 2010 at 8:30 PM, Simon Riggs si...@2ndquadrant.com wrote:
 There already is a first patch to the community that implements quorum
 commit, just not by you.

Yeah, AFAIK, that patch includes also per-standby setting.

 If you have a better way, describe it in detail and in full now, with
 reference to each of the use cases you mentioned, so that people get a
 chance to give their opinions on your design. Then we can let the
 community decide whether or not that second way is actually better. We
 may not need a second patch.

See http://archives.postgresql.org/pgsql-hackers/2010-05/msg01407.php

But I think that we should focus on per-standby setting at first.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm

2010-05-27 Thread Andres Freund
Hi,

On Thursday 27 May 2010 13:53:37 Tatsuo Ishii wrote:
  It's already multibyte safe since 8.4
 
 No, it doesn't.
 Encoding is EUC_JP, locale is C. Included is the script to reproduce
 the problem.
test=# select show_trgm('日本語');
show_trgm   
 ---
  {0x8194c0,0x836e53,0x1dc363,0x1e22e9}
 (1 row)
 
Time: 0.443 ms
test=# select similarity('日本語', '日本後');
  similarity 
 
0.33
 (1 row)
 
Time: 0.426 ms


Encoding is UTF-8...

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm

2010-05-27 Thread Tatsuo Ishii
  No, it doesn't.
  Encoding is EUC_JP, locale is C. Included is the script to reproduce
  the problem.
 test=# select show_trgm('日本語');
 show_trgm   
  ---
   {0x8194c0,0x836e53,0x1dc363,0x1e22e9}
  (1 row)
  
 Time: 0.443 ms
 test=# select similarity('日本語', '日本後');
   similarity 
  
 0.33
  (1 row)
  
 Time: 0.426 ms
 
 
 Encoding is UTF-8...

What is your locale?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Robert Haas
On Thu, May 27, 2010 at 8:02 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, May 27, 2010 at 8:28 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, May 27, 2010 at 3:13 AM, Fujii Masao masao.fu...@gmail.com wrote:
 (1) most standard case: 1 master + 1 sync standby (near)
    When the master goes down, something like a clusterware detects that
    failure, and brings the standby online. Since we can ensure that the
    standby has all the committed transactions, failover doesn't cause
    any data loss.

 How do you propose to guarantee that?  ISTM that you have to either
 commit locally first, or send the commit to the remote first.  Either
 way, the two events won't occur exactly simultaneously.

 Letting the transaction wait until the standby has received / flushed /
 replayed the WAL before it returns a success indicator to a client
 would guarantee that. This ensures that all transactions which a client
 knows as committed exist in the memory or disk of the standby. So we
 would be able to see those transactions from new master after failover.

There could still be additional transactions that the original master
has committed locally but were not acked to the client.  I guess you'd
just work around that by taking a new base backup from the new master.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm

2010-05-27 Thread Andres Freund
On Thursday 27 May 2010 14:40:41 Tatsuo Ishii wrote:
   No, it doesn't.
   Encoding is EUC_JP, locale is C. Included is the script to reproduce
   the problem.
  
  test=# select show_trgm('日本語');
  
  show_trgm
   
   ---
   
{0x8194c0,0x836e53,0x1dc363,0x1e22e9}
   
   (1 row)
  
  Time: 0.443 ms
  test=# select similarity('日本語', '日本後');
  
similarity
   
   
   
  0.33
   
   (1 row)
  
  Time: 0.426 ms
  
  
  Encoding is UTF-8...
 
 What is your locale?
It was en_EN.UTF-8. Interesting. With C it fails...

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Fujii Masao
On Thu, May 27, 2010 at 9:48 PM, Robert Haas robertmh...@gmail.com wrote:
 There could still be additional transactions that the original master
 has committed locally but were not acked to the client.  I guess you'd
 just work around that by taking a new base backup from the new master.

Right.

Unfortunately the transaction aborted for a client might have already
been committed in the standby. In this case, we might need to eliminate
the mismatch of transaction status between a client and new master
after failover.

BTW, the similar situation might happen even when only one server is
running. If the server goes down before returning a success to a
client after flushing the commit record, the mismatch would happen
after restart of the server.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] primary/secondary/master/slave/standby

2010-05-27 Thread Heikki Linnakangas

On 27/05/10 12:39, Dimitri Fontaine wrote:

Greg Starkgsst...@mit.edu  writes:

Fwiw I like the word replica but I don't see an obvious choice of
word to pair it with


I guess it's replica / origin, per choice of Jan Wieck to be found in
our catalogs:

  http://www.postgresql.org/docs/current/static/catalog-pg-trigger.html

   tgenabledchar

   Controls in which session_replication_role modes the trigger fires.
   O = trigger fires in origin and local modes, D = trigger is
   disabled, R = trigger fires in replica mode, A = trigger fires
   always.

So that's origin/replica, master/slave, primary/standby, master/standby.


master/standby is my favourite, and I believe we have a rough consensus 
on that.


I started to search/replace primary - master, but started to have 
second thoughts when I got to the section in the docs about standby servers:


http://developer.postgresql.org/pgdocs/postgres/warm-standby.html

Somehow that just doesn't sound as good after s/primary/master, the 
first sentence in particular. I think the reason is that master brings 
to mind an active connection between the master and standby, while 
primary sounds more loosely-coupled.


Perhaps we should use master/standby when discussing streaming 
replication, and primary/standby when talking about a standby setup in 
general, possibly using file-based log shipping. The distinction is 
quite vague, so we'll have to document both terms as synonyms of each other.


Thoughts?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] primary/secondary/master/slave/standby

2010-05-27 Thread Heikki Linnakangas

On 27/05/10 12:39, Dimitri Fontaine wrote:

Greg Starkgsst...@mit.edu  writes:

Fwiw I like the word replica but I don't see an obvious choice of
word to pair it with


I guess it's replica / origin, per choice of Jan Wieck to be found in
our catalogs:

  http://www.postgresql.org/docs/current/static/catalog-pg-trigger.html

   tgenabledchar

   Controls in which session_replication_role modes the trigger fires.
   O = trigger fires in origin and local modes, D = trigger is
   disabled, R = trigger fires in replica mode, A = trigger fires
   always.

So that's origin/replica, master/slave, primary/standby, master/standby.


master/standby is my favorite, and I believe we have a rough consensus 
on that.


I started to search/replace primary - master, but started to have 
second thoughts when I got to the section in the docs about standby servers:


http://developer.postgresql.org/pgdocs/postgres/warm-standby.html

Somehow that just doesn't sound as good after s/primary/master, the 
first sentence in particular. I think the reason is that master brings 
to mind an active connection between the master and standby, while 
primary sounds more loosely-coupled.


Perhaps we should use master/standby when discussing streaming 
replication, and primary/standby when talking about a standby setup in 
general, possibly using file-based log shipping. The distinction is 
quite vague, so we'll have to document both terms as synonyms of each other.


Thoughts?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm

2010-05-27 Thread Tatsuo Ishii
  What is your locale?
 It was en_EN.UTF-8. Interesting. With C it fails...

Yes, pg_trgm seems to have problems with multibyte + C locale.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Straightforward Synchronous Replication

2010-05-27 Thread Simon Riggs
Following design offers simplicity of design, performance and user
control over sync rep waits, including wait-for-apply for HS.

This implements Oracle's Maximum Availability option AND Maximum
Performance options both together, rather than just one or the other:
async and sync replication together, under user control.

* BACKEND:
In xact.c: Immediately after fsync during commit logic
if (sync_rep != NONE)
{
max_wakeup_time = commit_timestamp + sync_rep_timeout;

SetAlarm(max_wakeup_time);  // similar to statement timeout
WaitOnQueue(commitLSN);
DisableAlarm();
}

In proc.c: in signal handler code
if (wakeup  waiting_on_commit)
RemoveFromQueue()

* New process: WALSync (on primary)
Receives messages from WALAck on standby and wakes up queued backends
that have reached the requested commitLSN. If there are multiple WALSync
processes they all try to remove backends from the head of the queue.
Process started in same way as WALSender, when request arrives from
standby.
(WaitOnQueue() returns immediately if no WALSync are started, since that
means no sync rep yet available)

* New process: WALAck (on standby)
Reads shared memory to get last received and last applied xlog location
and sends message to WALSync on primary. Loop/Sleep forever.
Values in shared mem already put there by WALReceiver and Startup
processes. Reuse message protocol as for WALSender-WALReceiver.
Process started after WALReceiver connects, if additional option in
recovery.conf. Initiates second connection to primary, issues slightly
different startup command to create WALSync.

That's it.

The above needs just two parameters at user level
synch_rep = none | recv | apply
synch_rep_timeout = Ns
and an additional parameter in recovery.conf to say whether a standby is
providing the facility for sync replication (as requested by Yeb etc)
(default = yes).

So this is the same as having quorum = 0 or 1 (boring but simple) and
having sync_rep_timeout_action = commit in all cases (clear behaviour in
failure modes, without need for per-standby parameters).

The user specifies how long they wish to wait, but that wait never
changes the flow of WAL data through the cluster, so we don't need to
retune and redesign the existing system for reduced latency. It allows
mixed synchronous and asynchronous replication with *ease*. If we design
things differently that wouldn't be the case.

The design is:
* simple - Doesn't require any WAL or libpq changes
* modular - almost completely isolated from existing components in 9.0.
(e.g. WALSender doesn't know or care about WALSync, WALReceiver never
needs to speak to WALAck directly).
* performant - async and sync can co-exist; WALReceiver never waits; no
need to retune WALSender operation for synchronous mode
* low latency - the backchannel from standby to primary uses a separate
connection so can operate without slowing down data from primary
* user centric - allows user control over this feature, an important
tool for real world performance
* hot standby - implements xid back channel with ease (later phase)

We can hang other options on this later - nothing else is essential.

Development time ~ 1 man month because similar code exists for all
aspects described above, so no research or internals discussion
required.

Yes, this is a 3rd design for sync rep, though I think it improves upon
the things I've heard so far from other authors and also includes
feedback from Dimitri, Heikki, Yeb, Alastair. I'm happy to code this as
well, when 9.1 dev starts and a benchmark should be interesting also.

-- 
 Simon Riggs   www.2ndQuadrant.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-05-27 Thread Sander, Ingo (NSN - DE/Munich)

With the parameter checkpoint_segment and wal_keep_segments the max. number of 
wal segments are set. If now the max number is reached, 
(1) the segments are deleted/recycled 
or (2) if the time set by the checkpoint_timeout is over, a checkpoint is set 
and if possible a deletion/recycling is done. 
This is the mechanism on the active side of a db server. On the standby side 
however only unused tranferred segments will be deleted if the 
checkpoint_timeout mechanism (2) is executed.
Is this a correct behaviour or it is an error? 
 
I have observed (checkpoint_segment set to 3; wal_keep_segments set to 10 and 
checkpoint_timeout set to 30min) that in my stress test the disk usage on 
standby side is increased up to 2GB with xlog segments whereby on the active 
side only ~60MB xlog files are available (we have patched the xlog file size to 
4MB). To prevent this one possibility is to decreace the checkpoint_timeout to 
a low value (30sec), however this had the disadvantage that a checkpoint is 
often executed on active side which can influence the performance. Another 
possibility is to have different postgresql.conf on active and on standby side, 
but this is not our preferred solution. 



Best Regards/mfG
Ingo Sander
=
Nokia Siemens Networks GmbH Co. KG
NWS EP CP SVSS Platform Tech Support DE
St.-Martin-Str. 76
D-81541 München
*Tel.:  +49-89-515938390
*ingo.san...@nsn.com




Re: [HACKERS] Synchronization levels in SR

2010-05-27 Thread Robert Haas
On Thu, May 27, 2010 at 9:09 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, May 27, 2010 at 9:48 PM, Robert Haas robertmh...@gmail.com wrote:
 There could still be additional transactions that the original master
 has committed locally but were not acked to the client.  I guess you'd
 just work around that by taking a new base backup from the new master.

 Right.

 Unfortunately the transaction aborted for a client might have already
 been committed in the standby. In this case, we might need to eliminate
 the mismatch of transaction status between a client and new master
 after failover.

 BTW, the similar situation might happen even when only one server is
 running. If the server goes down before returning a success to a
 client after flushing the commit record, the mismatch would happen
 after restart of the server.

True.  But that's a slightly different case.  Clients could fail to
receive commit ACKs for a variety of reasons, like losing network
connectivity momentarily.  They had better be prepared for that no
matter whether replication is in use or not.  The new issue that
replication adds is that you've got to make sure that the two (or n)
nodes don't disagree with each other.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-27 Thread Kevin Grittner
Jan Wieck janwi...@yahoo.com wrote:
 On 5/26/2010 4:34 PM, Kevin Grittner wrote:
 
 My latest idea for handling this in WAL-based replication
 involves WAL-logging information about the transaction through
 which a the committing transaction makes it safe to view.  There
 are a few options here at the detail level that I'm still
 thinking through.  The idea would be that the xmin from read-only
 queries on the slaves might be somewhere behind where you would
 expect based on transactions committed.  (The details involve
 such things as where non-serializable transactions fall into the
 plan on both sides, and whether it's worth the effort to
 special-case read-only transactions on the master.)
  
 I can't say that I'm 100% sure that some lurking detail won't
 shoot this technique down for HS, but it seems good to me at a
 conceptual level.
 
 Without simulating multiple simultaneous transactions during
 playback, how are you going to manage that the tuples, already
 inserted on behalf of the ongoing master transactions, disappear
 when they abort on the master?
 
When do writes ever become visible to a snapshot without having been
committed?  I'm not talking about changing that in any way.  I'm
talking about deferring visibility of committed transactions until
they can be viewed without risking serialization anomalies.  This
requires, at a minimum, that any concurrent serializable
transactions which are not read-only have completed.
(Perhaps I'm not understanding your question)
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm

2010-05-27 Thread Tom Lane
Tatsuo Ishii is...@postgresql.org writes:
 What is your locale?
 It was en_EN.UTF-8. Interesting. With C it fails...

 Yes, pg_trgm seems to have problems with multibyte + C locale.

It's not a problem, it's just pilot error, or possibly inadequate
documentation.  pg_trgm uses the locale's definition of alpha,
digit, etc.  In C locale only basic ASCII letters and digits will be
recognized as word constituents.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-27 Thread Greg Stark
On Wed, May 26, 2010 at 5:38 PM, Greg Stark gsst...@mit.edu wrote:
 How about just API generalities? Like, where do you need this data, on
 the master or on the slave? Would PGXC like it on the transaction
 coordinator?

 What question do you need to answer, do you need to pull out sets of
 commits in certain ranges or look up specific transaction ids and find
 out when they committed? Or do you only need to answer which of two
 transaction ids committed first?


This thread has been hard to follow for me. Were any of these
questions answered?

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm

2010-05-27 Thread Tatsuo Ishii
  Yes, pg_trgm seems to have problems with multibyte + C locale.
 
 It's not a problem, it's just pilot error, or possibly inadequate
 documentation.  pg_trgm uses the locale's definition of alpha,
 digit, etc.  In C locale only basic ASCII letters and digits will be
 recognized as word constituents.

That means there is no chance to make pg_trgm work with multibyte + C
locale?  If so, I will leave pg_trgm as it is and provide private
patches for those who need the functionality.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-05-27 Thread Fujii Masao
On Thu, May 27, 2010 at 10:13 PM, Sander, Ingo (NSN - DE/Munich)
ingo.san...@nsn.com wrote:

 With the parameter checkpoint_segment and wal_keep_segments the max. number
 of wal segments are set. If now the max number is reached,

 (1) the segments are deleted/recycled
 or (2) if the time set by the checkpoint_timeout is over, a checkpoint is
 set and if possible a deletion/recycling is done.

 This is the mechanism on the active side of a db server. On the standby side
 however only unused tranferred segments will be deleted if the
 checkpoint_timeout mechanism (2) is executed.

 Is this a correct behaviour or it is an error?

 I have observed (checkpoint_segment set to 3; wal_keep_segments set to 10
 and checkpoint_timeout set to 30min) that in my stress test the disk usage
 on standby side is increased up to 2GB with xlog segments whereby on the
 active side only ~60MB xlog files are available (we have patched the xlog
 file size to 4MB). To prevent this one possibility is to decreace the
 checkpoint_timeout to a low value (30sec), however this had the disadvantage
 that a checkpoint is often executed on active side which can influence the
 performance. Another possibility is to have different postgresql.conf on
 active and on standby side, but this is not our preferred solution.

I guess this happens because the frequency of checkpoint on the standby is
too lower than that on the master. In the master, checkpoint occurs for every
consumption of three segments because of checkpoint_segments = 3. On the
other hand, in the standby, only checkpoint_timeout has effect, so checkpoint
occurs for every 30 minutes because of checkpoint_timeout = 30min.

The walreceiver should signal the bgwriter to start checkpoint if it has
received more than checkpoint_segments WAL files, like normal processing?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Straightforward Synchronous Replication

2010-05-27 Thread Robert Haas
On Thu, May 27, 2010 at 9:08 AM, Simon Riggs si...@2ndquadrant.com wrote:
 * New process: WALAck (on standby)
 Reads shared memory to get last received and last applied xlog location
 and sends message to WALSync on primary. Loop/Sleep forever.

So would WALAck be polling shared memory?  That would increase latency
significantly, I think, though perhaps you have a plan for avoiding
that?

 The above needs just two parameters at user level
 synch_rep = none | recv | apply
 synch_rep_timeout = Ns
 and an additional parameter in recovery.conf to say whether a standby is
 providing the facility for sync replication (as requested by Yeb etc)
 (default = yes).

 So this is the same as having quorum = 0 or 1 (boring but simple) and
 having sync_rep_timeout_action = commit in all cases (clear behaviour in
 failure modes, without need for per-standby parameters).

This seems good, but I think we need a little more definition about
what happens with sync_rep_timeout expires.

 Yes, this is a 3rd design for sync rep, though I think it improves upon
 the things I've heard so far from other authors and also includes
 feedback from Dimitri, Heikki, Yeb, Alastair. I'm happy to code this as
 well, when 9.1 dev starts and a benchmark should be interesting also.

It's great that we have so many people who want to implement this
feature, or in one case already have.  I'm not sure whose design is
best, but I do hope that we can avoid dueling patches.  There are
plenty of other good features to work on also.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming Replication: Checkpoint_segment and wal_keep_segments on standby

2010-05-27 Thread Robert Haas
On Thu, May 27, 2010 at 10:09 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, May 27, 2010 at 10:13 PM, Sander, Ingo (NSN - DE/Munich)
 ingo.san...@nsn.com wrote:

 With the parameter checkpoint_segment and wal_keep_segments the max. number
 of wal segments are set. If now the max number is reached,

 (1) the segments are deleted/recycled
 or (2) if the time set by the checkpoint_timeout is over, a checkpoint is
 set and if possible a deletion/recycling is done.

 This is the mechanism on the active side of a db server. On the standby side
 however only unused tranferred segments will be deleted if the
 checkpoint_timeout mechanism (2) is executed.

 Is this a correct behaviour or it is an error?

 I have observed (checkpoint_segment set to 3; wal_keep_segments set to 10
 and checkpoint_timeout set to 30min) that in my stress test the disk usage
 on standby side is increased up to 2GB with xlog segments whereby on the
 active side only ~60MB xlog files are available (we have patched the xlog
 file size to 4MB). To prevent this one possibility is to decreace the
 checkpoint_timeout to a low value (30sec), however this had the disadvantage
 that a checkpoint is often executed on active side which can influence the
 performance. Another possibility is to have different postgresql.conf on
 active and on standby side, but this is not our preferred solution.

 I guess this happens because the frequency of checkpoint on the standby is
 too lower than that on the master. In the master, checkpoint occurs for every
 consumption of three segments because of checkpoint_segments = 3. On the
 other hand, in the standby, only checkpoint_timeout has effect, so checkpoint
 occurs for every 30 minutes because of checkpoint_timeout = 30min.

 The walreceiver should signal the bgwriter to start checkpoint if it has
 received more than checkpoint_segments WAL files, like normal processing?

Is this also an issue when using log shipping, or just with SR?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm

2010-05-27 Thread Tom Lane
Tatsuo Ishii is...@postgresql.org writes:
 It's not a problem, it's just pilot error, or possibly inadequate
 documentation.  pg_trgm uses the locale's definition of alpha,
 digit, etc.  In C locale only basic ASCII letters and digits will be
 recognized as word constituents.

 That means there is no chance to make pg_trgm work with multibyte + C
 locale?  If so, I will leave pg_trgm as it is and provide private
 patches for those who need the functionality.

Exactly what do you consider to be the missing functionality?
You need a notion of word vs non-word character from somewhere,
and the locale setting is the standard place to get that.  The
core text search functionality behaves the same way.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm

2010-05-27 Thread Tatsuo Ishii
 Exactly what do you consider to be the missing functionality?
 You need a notion of word vs non-word character from somewhere,
 and the locale setting is the standard place to get that.  The
 core text search functionality behaves the same way.

No. Text search works fine with multibyte + C locale.

Anyway locale is completely usesless for finding word vs non-character
an agglutinative language such as Japanese.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm

2010-05-27 Thread Tom Lane
Tatsuo Ishii is...@sraoss.co.jp writes:
 Anyway locale is completely usesless for finding word vs non-character
 an agglutinative language such as Japanese.

Well, that doesn't mean that the answer is to use C locale ;-)

However, you could possibly think about making this bit of code
more flexible:

#ifdef KEEPONLYALNUM
#define iswordchr(c)(t_isalpha(c) || t_isdigit(c))
#else
#define iswordchr(c)(!t_isspace(c))
#endif

Currently it seems to be hard-wired to the first case in standard
builds.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm

2010-05-27 Thread Tatsuo Ishii
 Well, that doesn't mean that the answer is to use C locale ;-)

Of course it's up to user whether to use C locale or not. I just want
pg_trgm work with C locale as well.

 However, you could possibly think about making this bit of code
 more flexible:
 
 #ifdef KEEPONLYALNUM
 #define iswordchr(c)  (t_isalpha(c) || t_isdigit(c))
 #else
 #define iswordchr(c)  (!t_isspace(c))
 #endif
 
 Currently it seems to be hard-wired to the first case in standard
 builds.

Yup. Here is the patch in my mind:

*** trgm_op.c~  2009-06-11 23:48:51.0 +0900
--- trgm_op.c   2010-05-27 23:38:20.0 +0900
***
*** 59,65 
  }
  
  #ifdef KEEPONLYALNUM
! #define iswordchr(c)  (t_isalpha(c) || t_isdigit(c))
  #else
  #define iswordchr(c)  (!t_isspace(c))
  #endif
--- 59,65 
  }
  
  #ifdef KEEPONLYALNUM
! #define iswordchr(c)  (t_isalpha(c) || t_isdigit(c) || (lc_ctype_is_c()  
!t_isspace(c)))
  #else
  #define iswordchr(c)  (!t_isspace(c))
  #endif

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm

2010-05-27 Thread Tom Lane
Tatsuo Ishii is...@postgresql.org writes:
 ! #define iswordchr(c)(t_isalpha(c) || t_isdigit(c) || 
 (lc_ctype_is_c()  !t_isspace(c)))

This seems entirely arbitrary.  It might fix things in your view
but it will break the longstanding behavior for other people.

I think a more appropriate type of fix would be to expose the
KEEPONLYALNUM option as a GUC, or some other way of letting the
user decide what he wants.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm

2010-05-27 Thread Greg Stark
On Thu, May 27, 2010 at 3:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I think a more appropriate type of fix would be to expose the
 KEEPONLYALNUM option as a GUC, or some other way of letting the
 user decide what he wants.


So I think a GUC is broken because pg_tgrm has a index opclasses and
any indexes built using one setting will be broken if the GUC is
changed.

Perhaps we need two sets of functions (which presumably call the same
implementation with a flag to indicate which definition to use). Then
you can define an index using one or the other and the meaning would
be stable.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] 9.0beta2 release plans

2010-05-27 Thread Tom Lane
Current thought among core is to wrap beta2 Thursday June 3 (a week
from today) for public release on Monday the 7th.  Get those fixes in.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Specification for Trusted PLs?

2010-05-27 Thread Bruce Momjian
Tom Lane wrote:
 Joshua Tolley eggyk...@gmail.com writes:
  Agreed. As long as a trusted language can do things outside the
  database only by going through a database and calling some function to
  which the user has rights, in an untrusted language, that seems decent
  to me. A user with permissions to launch_missiles() would have a
  function in an untrusted language to do it, but there's no reason an
  untrusted language shouldn't be able to say SELECT
 
 s/untrusted/trusted/ here, right?

One thing that has always bugged me is that the use of
trusted/untrusted for languages is confusing, because it is trusted
users who can run untrusted languages.  I think trust is more
associated with users than with software features.  I have no idea how
this confusion could  be clarified.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm

2010-05-27 Thread Peter Eisentraut
On tor, 2010-05-27 at 23:20 +0900, Tatsuo Ishii wrote:
 Anyway locale is completely usesless for finding word vs non-character
 an agglutinative language such as Japanese.

I don't know about Japanese, but the locale approach works just fine for
other agglutinative languages.  I would rather suspect that it is the
trigram approach that might be rather useless for such languages,
because you are going to get a lot of similarity hits for the affixes.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] quoting and recovery.conf

2010-05-27 Thread Greg Stark
On Fri, May 14, 2010 at 6:41 AM, Simon Riggs si...@2ndquadrant.com wrote:
 The main reason for having a separate recovery.conf file is that its
 existence is what drives the setting of InArchiveRecovery.  If we were
 to devise some other trigger for that condition, it'd be possible to
 fold all those settings in as GUC variables.

 And the removal of recovery.conf at end of recovery prevents the
 re-entry into archive recovery if we crash.

These things made sense when we were only dealing with PITR recovery
from a backup intending to bring up the database when the PITR reached
the target. That's now only one specific use case. For hot standbys we
don't want them to come up ever, even if we crash. And for other
standby databases we probably want to control this manually, not
automatically.

In these cases it's really confusing for users that some parameters
have to be in postgresql.conf and some in recovery.conf. They don't
see any distinction between these two files since they don't intend to
ever see the case where the target is reached and the file moved out
of the way automatically.

My suggestion is we should fold all the parameters into
postgresql.conf and treat recovery.conf as an additional
postgresql.conf to read. It would allow any GUC. The only difference
is that it would be moved out of the way automatically when the target
is reached.

Ideally I would have all the configuration be in postgresql.conf and
be identical between all servers, masters and slaves. The only
difference would be a single line which controlled whether to push or
pull wal data. Switching roles would be a regular GUC change in
postgresql.conf and pg_ctl reload.


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Distclean does not remove gram.c

2010-05-27 Thread Gurjeet Singh
On Wed, May 26, 2010 at 9:00 PM, Robert Haas robertmh...@gmail.com wrote:

 On Wed, May 26, 2010 at 8:20 PM, Andrew Dunstan and...@dunslane.net
 wrote:
  Gurjeet Singh wrote:
 
  I did a `git clean -f -d` and even that did not remove gram.c,
 apparently
  because this file _was_ alive at some point in the past hence git won't
  remove it even though the current branch does not have gram.c.
 
  At first glance that looks like a git bug.

 My guess is that either .git/info/exclude or a .gitignore file
 someplace says to ignore gram.c.  git clean -df will not remove such
 files; you need git clean -dfx if you want that.


Yes indeed, -x is what I needed.

Thanks.
-- 
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.enterprisedb.com

singh.gurj...@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device


Re: [HACKERS] List traffic

2010-05-27 Thread Greg Stark
On Sat, May 15, 2010 at 7:38 AM, Marc G. Fournier scra...@hub.org wrote:
 most people are not prepared to understand the concept of more than
 one list for project...

 Apparently you don't use very many large projects ... FreeBSD has 20+ lists,
 dedicated to various aspects of both end user and developer ... I imagine
 Linux has *as many if not more* ... MySQL, if memory servers, has a half
 dozen or more ... etc ...

Sure, if we have distinctions which make sense then having separate
lists makes sense. Linux has separate lists for different drivers,
different parts of the kernel, projects to improve the kernel in
various specific ways (latency, etc). I'm all for having a list
dedicated to infrastructure (oddly named -www here) and a list
dedicated to printing flyers and arranging conferences (-advocacy)
since those topics are usually well defined. Lists like -ecpg or -odbc
would work fine if the traffic warranted them.

But some of the lists we have now are 99% overlap with each other -- I
claim because the definitions are meaningless. What part of postgres
discussion (aside from this thread) *don't* relate in some way to SQL?
Or administration? Or performance? Most performance problems end up
being solved by adjusting SQL or changing GUCs. Mot administration
questions are originally posed as general help questions. If you're
subscribed to these lists you get a random, fairly small, subset of
discussion related these topics.

Perhaps what I'm looking for is a more sensible division that allows
most of the traffic related to the subtopics to actually go there. It
would have to be a division so clearcut that anyone who doesn't follow
could reasonably be blamed for not following etiquette. That's simply
not true with the current divisions.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] quoting and recovery.conf

2010-05-27 Thread Simon Riggs
On Thu, 2010-05-27 at 16:20 +0100, Greg Stark wrote:

 My suggestion is we should fold all the parameters into
 postgresql.conf and treat recovery.conf as an additional
 postgresql.conf to read. It would allow any GUC. The only difference
 is that it would be moved out of the way automatically when the target
 is reached.

Good idea, needs a little fleshing out.

What do we do if recovery.conf and postgresql.conf have different
settings in them? Trigger reload at end of recovery? Presumably we would
also ignore server startup parameters in recovery.conf?

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm

2010-05-27 Thread Tatsuo Ishii
 I don't know about Japanese, but the locale approach works just fine for
 other agglutinative languages.  I would rather suspect that it is the
 trigram approach that might be rather useless for such languages,
 because you are going to get a lot of similarity hits for the affixes.

I'm not sure what you mean by affixes.  But I will explain...

A Japanese sentence consists of words. Problem is, each word is not
separated by space (agglutinative). So most text tools such as text
search need preprocess which finds word boundaries by looking up
dictionaries (and smart grammer analysis routine). In the process
affixes can be determined and perhaps removed from the target word
group to be used for text search (note that removing affixes is no
relevant to locale). Once we get space separated sentence, it can be
processed by text search or by pg_trgm just same as Engligh. (Note
that these preprocessing are done outside PostgreSQL world). The
difference is just the word can be consists of non ASCII letters.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Straightforward Synchronous Replication

2010-05-27 Thread Simon Riggs
On Thu, 2010-05-27 at 10:11 -0400, Robert Haas wrote:
 On Thu, May 27, 2010 at 9:08 AM, Simon Riggs si...@2ndquadrant.com wrote:
  * New process: WALAck (on standby)
  Reads shared memory to get last received and last applied xlog location
  and sends message to WALSync on primary. Loop/Sleep forever.
 
 So would WALAck be polling shared memory?  That would increase latency
 significantly, I think, though perhaps you have a plan for avoiding
 that?

The backends are going to be released in batches anyway, so I can't see
how polling makes a difference.

Polling means no waiting, so asynchronous action and higher throughput,
and with sufficiently high polling rate no significant loss of latency.

The other plan requires WALReceiver to wait for fsync and apply, which
seems very likely to suck badly from a latency perspective. While its
waiting it is also reducing throughout of incoming WAL. It's hard to see
how that would work well.

You could also do this by avoiding the wait in WALReceiver, but then
that becomes more like polling anyway.

  The above needs just two parameters at user level
  synch_rep = none | recv | apply
  synch_rep_timeout = Ns
  and an additional parameter in recovery.conf to say whether a standby is
  providing the facility for sync replication (as requested by Yeb etc)
  (default = yes).
 
  So this is the same as having quorum = 0 or 1 (boring but simple) and
  having sync_rep_timeout_action = commit in all cases (clear behaviour in
  failure modes, without need for per-standby parameters).
 
 This seems good, but I think we need a little more definition about
 what happens with sync_rep_timeout expires.

It commits... that is very clear: sync_rep_timeout_action = commit in
all cases. Commit is the only viable option, since abort and
wait-forever both have disadvantages pointed out for them.

  Yes, this is a 3rd design for sync rep, though I think it improves upon
  the things I've heard so far from other authors and also includes
  feedback from Dimitri, Heikki, Yeb, Alastair. I'm happy to code this as
  well, when 9.1 dev starts and a benchmark should be interesting also.
 
 It's great that we have so many people who want to implement this
 feature, or in one case already have.  I'm not sure whose design is
 best, but I do hope that we can avoid dueling patches.  There are
 plenty of other good features to work on also.

There is already a patch on SR, yet Masao is discussing another that
contains what looks to me like very close to nothing of Zoltan's work,
not even similar ideas. The dueling patches situation looks like it
already exists to me, though not of my making or encouragement. Even if
I agreed with everything one of those authors say, there would still be
two patches.

Considering a variety of design approaches seems like a good idea for an
important feature, especially when the information is thin and opinions
run high. It's unlikely that anyone is right about everything, which is
why I've amalgamated this simple proposal from everything said so far.

It's easy to add some things if we add them at the start, much harder to
retrofit them. I've shown that some things are easier than has been
said, with fewer parameters and a good case for better performance also.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm

2010-05-27 Thread Tatsuo Ishii
 So I think a GUC is broken because pg_tgrm has a index opclasses and
 any indexes built using one setting will be broken if the GUC is
 changed.
 
 Perhaps we need two sets of functions (which presumably call the same
 implementation with a flag to indicate which definition to use). Then
 you can define an index using one or the other and the meaning would
 be stable.

It's worse. pg_trgm has another compile option IGNORECASE which
might affect index opclasses.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] List traffic

2010-05-27 Thread Marc G. Fournier

On Thu, 27 May 2010, Greg Stark wrote:


Sure, if we have distinctions which make sense then having separate
lists makes sense. Linux has separate lists for different drivers,
different parts of the kernel, projects to improve the kernel in
various specific ways (latency, etc). I'm all for having a list
dedicated to infrastructure (oddly named -www here)


Actually, infrastructure is appropriately discussed on -sysadmins ... web 
is on -www ... tends to be a bit of overlap since -sysadmins was added 
later, and prior to that we did discuss on -www ...



since those topics are usually well defined. Lists like -ecpg or -odbc
would work fine if the traffic warranted them.


I don't agree with the comment about 'if traffic warranted them' though 
... the fact that there is very little traffic should be what makes them 
attractive / useful ... you don't have to weed through alot of posts to 
find the odbc/ecpg related ones ...


Perhaps what I'm looking for is a more sensible division that allows 
most of the traffic related to the subtopics to actually go there. It 
would have to be a division so clearcut that anyone who doesn't follow 
could reasonably be blamed for not following etiquette. That's simply 
not true with the current divisions.


how about something -sql vs -tuning ... ?  -tuning replacing -performance, 
which I do agree could be sql *or* server ... where -tuning would be more 
obviously server related ...



Marc G. FournierHub.Org Hosting Solutions S.A.
scra...@hub.org http://www.hub.org

Yahoo:yscrappySkype: hub.orgICQ:7615664MSN:scra...@hub.org

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Straightforward Synchronous Replication

2010-05-27 Thread Robert Haas
On Thu, May 27, 2010 at 11:50 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, 2010-05-27 at 10:11 -0400, Robert Haas wrote:
 On Thu, May 27, 2010 at 9:08 AM, Simon Riggs si...@2ndquadrant.com wrote:
  * New process: WALAck (on standby)
  Reads shared memory to get last received and last applied xlog location
  and sends message to WALSync on primary. Loop/Sleep forever.

 So would WALAck be polling shared memory?  That would increase latency
 significantly, I think, though perhaps you have a plan for avoiding
 that?

 The backends are going to be released in batches anyway, so I can't see
 how polling makes a difference.

 Polling means no waiting, so asynchronous action and higher throughput,
 and with sufficiently high polling rate no significant loss of latency.

I guess what I'm trying to figure out is the part that says
loop/sleep forever.  That sounds like you wait 50 ms (or some other
interval), then check shared memory to see if anything has changed, if
not you do it again.  That means that up to 49.9 ms (or whatever
interval you picked) could be spent waiting before you realize that
new WAL has been applied, which I suspect will not work out very well.
 On the other hand checking it in a TIGHT loop would mean using up a
whole CPU on an idle system, so that's not practical either.  ISTM
you'd need some kind of signalling system between the startup process
and the WALAck process, so that the startup process can wake WALAck
after applying each bit of WAL (or maybe the startup process knows
about the lowest LSN that WALAck cares about, and wakes it only upon
reaching that point).

 The other plan requires WALReceiver to wait for fsync and apply, which
 seems very likely to suck badly from a latency perspective. While its
 waiting it is also reducing throughout of incoming WAL. It's hard to see
 how that would work well.

 You could also do this by avoiding the wait in WALReceiver, but then
 that becomes more like polling anyway.

I'm not sure if I understand this part, so let me try to say it
another way and you can tell me if I've got it right.  I think your
concern is that, during the time that WALReceiver is waiting for one
chunk of WAL to get fsynced, the startup process might finish applying
an earlier chunk of WAL that is of interest to the master.  The ACK
will therefore be delayed until the fsync completes and WALReceiver
can again do other things, like check whether there are any ACKs that
must be sent.  Is that it, or have I missed the boat completely?

  The above needs just two parameters at user level
  synch_rep = none | recv | apply
  synch_rep_timeout = Ns
  and an additional parameter in recovery.conf to say whether a standby is
  providing the facility for sync replication (as requested by Yeb etc)
  (default = yes).
 
  So this is the same as having quorum = 0 or 1 (boring but simple) and
  having sync_rep_timeout_action = commit in all cases (clear behaviour in
  failure modes, without need for per-standby parameters).

 This seems good, but I think we need a little more definition about
 what happens with sync_rep_timeout expires.

 It commits... that is very clear: sync_rep_timeout_action = commit in
 all cases. Commit is the only viable option, since abort and
 wait-forever both have disadvantages pointed out for them.

So, do we declare the sync server offline at that point and stop
waiting for it, or do we continue waiting for it on every transaction?
 If we declare it dead, what are the criteria for subsequently making
it alive again?

  Yes, this is a 3rd design for sync rep, though I think it improves upon
  the things I've heard so far from other authors and also includes
  feedback from Dimitri, Heikki, Yeb, Alastair. I'm happy to code this as
  well, when 9.1 dev starts and a benchmark should be interesting also.

 It's great that we have so many people who want to implement this
 feature, or in one case already have.  I'm not sure whose design is
 best, but I do hope that we can avoid dueling patches.  There are
 plenty of other good features to work on also.

 There is already a patch on SR, yet Masao is discussing another that
 contains what looks to me like very close to nothing of Zoltan's work,
 not even similar ideas. The dueling patches situation looks like it
 already exists to me, though not of my making or encouragement. Even if
 I agreed with everything one of those authors say, there would still be
 two patches.

Oh, I wasn't aware that Fujii Masao's work had progressed as far as an
actual patch yet.

 Considering a variety of design approaches seems like a good idea for an
 important feature, especially when the information is thin and opinions
 run high. It's unlikely that anyone is right about everything, which is
 why I've amalgamated this simple proposal from everything said so far.

Agreed.

 It's easy to add some things if we add them at the start, much harder to
 retrofit them. I've shown that some things are easier than has been
 said, with 

Re: [HACKERS] [ADMIN] command tag logging

2010-05-27 Thread Tom Lane
alvherre alvhe...@commandprompt.com writes:
 Excerpts from Ray Stell's message of mié may 26 17:08:33 -0400 2010:
 I just installed a compiled from src 8.3.11.  I usually include %i, command 
 tag,
 in the log_line_prefix setting.  This causes some spewage I'd not seen before
 on connection received lines as if it is dumping the environment:

 Hmm, I bet it's the recent %.*s patch.

That is in the right place, isn't it.  That would suggest that
get_ps_display() is returning a wrong length on Ray's machine.
It works okay here, but since that's platform-specific code that
hardly proves much.  Ray, what platform is this exactly?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [ADMIN] command tag logging

2010-05-27 Thread alvherre
Excerpts from Tom Lane's message of jue may 27 12:49:49 -0400 2010:
 alvherre alvhe...@commandprompt.com writes:
  Excerpts from Ray Stell's message of mié may 26 17:08:33 -0400 2010:
  I just installed a compiled from src 8.3.11.  I usually include %i, 
  command tag,
  in the log_line_prefix setting.  This causes some spewage I'd not seen 
  before
  on connection received lines as if it is dumping the environment:
 
  Hmm, I bet it's the recent %.*s patch.
 
 That is in the right place, isn't it.  That would suggest that
 get_ps_display() is returning a wrong length on Ray's machine.
 It works okay here, but since that's platform-specific code that
 hardly proves much.  Ray, what platform is this exactly?

FWIW it fails for me too (Debian running Linux 2.6.32).  Adding some
logging to stderr results in this:

psdisp (len 2130) is: “““/pgsql/install/83_rel/bin/postmaster”””
/pgsql/install/83_rel/bin/postmasterPGDATA=/pgsql/install/83_rel/dataORBIT_SOCKETDIR=/home/alvherre/tmp/orbit-alvherreSSH_AGENT_PID=2739GPG_AGENT_INFO=/tmp/gpg-aXAHSs/S.gpg-agent:2704:1SHELL=/bin/bashTERM=xtermXDG_SESSION_COOKIE=e50959452240490c59b0366b96665400-1274967349.87074-853952583HISTSIZE=1TMPDIR=/home/alvherre/tmpGTK_RC_FILES=/etc/gtk/gtkrc:/home/alvherre/.gtkrc-1.2-gnome2WINDOWID=29360152GNOME_KEYRING_CONTROL=/home/alvherre/tmp/keyring-EUoSfgGTK_MODULES=canberra-gtk-moduleUSER=alvherrehttp_proxy=http://localhost:8118XTERM_SHELL=/bin/bashHISTFILESIZE=1LD_LIBRARY_PATH=/pgsql/install/83_rel/libLS_COLORS=no=00:fi=00:di=01;35:ln=01;36:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=01;32:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.tar=01;31:*.tgz=01;31:*.tbz2=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lha=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:*.jpg=01
 
;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.xbm=01;35:*.xpm=01;35:*.png=01;35:*.tif=01;35:*.tiff=01;35:SSH_AUTH_SOCK=/home/alvherre/tmp/keyring-EUoSfg/sshTMOUT=0USERNAME=alvherreSESSION_MANAGER=local/perhan:@/tmp/.ICE-unix/2689,unix/perhan:/tmp/.ICE-unix/2689PAGER=lessDESKTOP_SESSION=gnomePATH=/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games:/home/alvherre/bin:/sbin:/usr/sbinGDM_XSERVER_LOCATION=localPX_CONFIG_ORDER=envvarPWD=/home/alvherreEDITOR=vimLANG=es_CL.UTF-8GDM_LANG=es_CL.UTF-8TZ=America/SantiagoGDMSESSION=gnomeHISTIGNORE=ls:bg:fg:cd:exit:XTERM_VERSION=XTerm(256)XTERM_LOCALE=es_CL.UTF-8HISTCONTROL=ignorespace:erasedupsHOME=/home/alvherreSHLVL=1GNOME_DESKTOP_SESSION_ID=this-is-deprecatedno_proxy=localhost,127.0.0.0/8BASH_ENV=/home/alvherre/.bashrcLOGNAME=alvherreLESS=-XRM
 
-x4VISUAL=vimXDG_DATA_DIRS=/usr/share/gnome:/usr/local/share/:/usr/share/:/usr/share/gdm/DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-TnbbC5PUiR,guid=833f76565b26a89543f6aa42004f_PX_CONFIG
 
_ORDER=WINDOWPATH=7DISPLAY=:0.0HOSTFILE=/home/alvherre/.hostsXAUTHORITY=/home/alvherre/.Xauthority
 LOG:  connection received: host=[local]


Maybe the problem is the PS_PADDING setting?

I patched as below -- obviously the \0 didn't make any difference (it
was the first thing I tried), because the length, as you say, is wrong.

*** log_line_prefix(StringInfo buf)
*** 1615,1621 
--- 1615,1623 
int displen;
  
psdisp = get_ps_display(displen);
+   fprintf(stderr, psdisp (len %d) is: “““%s”””\n, displen, 
psdisp);
appendBinaryStringInfo(buf, psdisp, displen);
+   appendStringInfoChar(buf, '\0');
}
break;
case 'r':


-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-27 Thread Jan Wieck

On 5/27/2010 9:59 AM, Greg Stark wrote:

On Wed, May 26, 2010 at 5:38 PM, Greg Stark gsst...@mit.edu wrote:

How about just API generalities? Like, where do you need this data, on
the master or on the slave? Would PGXC like it on the transaction
coordinator?

What question do you need to answer, do you need to pull out sets of
commits in certain ranges or look up specific transaction ids and find
out when they committed? Or do you only need to answer which of two
transaction ids committed first?



This thread has been hard to follow for me. Were any of these
questions answered?



Yes.

On 5/26/2010 4:49 PM, Jan Wieck wrote:

On 5/26/2010 12:38 PM, Greg Stark wrote:

 On Wed, May 26, 2010 at 5:10 PM, Jan Wieck janwi...@yahoo.com wrote:

 ... but to answer that request, actually I don't even think we should be
 discussing API specifics.

 
 How about just API generalities? Like, where do you need this data, on

 the master or on the slave? Would PGXC like it on the transaction
 coordinator?
 
 What question do you need to answer, do you need to pull out sets of

 commits in certain ranges or look up specific transaction ids and find
 out when they committed? Or do you only need to answer which of two
 transaction ids committed first?


The question I want answered is

   what was the order and xid of the next 0..n transactions, that
   committed after transaction X?

Preferably I would avoid scanning the entire available WAL just to get 
the next n xid's to process.


The proposal assigned a unique serial number (file segment and position 
driven) to each xid and used that for the ordering as well as 
identification of the last known transaction. That is certainly a 
premature implementation detail.


In this implementation it wouldn't even matter if a transaction that was 
recorded actually never made it because it crashed before the WAL flush. 
It would be reported by this commit order feature, but there would be 
no traces of whatever it did to be found inside the DB, so that anomaly 
is harmless.



Jan

-- Anyone who trades liberty for security deserves neither liberty nor 
security. -- Benjamin Franklin
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers 




--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-27 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Peter Eisentraut wrote:
 In systems that have inheritance of composite types, this is used to
 specify which type the value is supposed to be interpreted as (for
 example, to treat the value as a supertype).

Why don't they just use CAST() syntax for that, instead of adding this
unnecessary syntax wart?

If their complaint is that CAST() is too much typing, perhaps they
could adopt :: cast notation ;-)

 I think we should fix it now.  Quick thought: maybe we could use FOR 
 instead of AS: select myfunc(7 for a, 6 for b);

I'm afraid FOR doesn't work either; it'll create a conflict with the
spec-defined SUBSTRING(x FOR y) syntax.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [ADMIN] command tag logging

2010-05-27 Thread Tom Lane
alvherre alvhe...@commandprompt.com writes:
 Excerpts from Tom Lane's message of jue may 27 12:49:49 -0400 2010:
 That is in the right place, isn't it.  That would suggest that
 get_ps_display() is returning a wrong length on Ray's machine.
 It works okay here, but since that's platform-specific code that
 hardly proves much.  Ray, what platform is this exactly?

 FWIW it fails for me too (Debian running Linux 2.6.32).

Hmm.  It seems like the %.*s change could only have affected things if
the PS display area contains \0 characters before the theoretical end
of the string.  Which it shouldn't, once we've set the display, but
Ray is only reporting this for log_connection output which might come
out before that.

In any case it strikes me that get_ps_display() is designed on the
assumption that it needn't be particularly fast, but using its result
in log_line_prefix is a place in which performance could indeed matter.
Maybe we should go to some effort to track the intended display string
length explicitly so we could avoid the mucking about in
get_ps_display().

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [ADMIN] command tag logging

2010-05-27 Thread Ray Stell
On Thu, May 27, 2010 at 12:49:49PM -0400, Tom Lane wrote:
 alvherre alvhe...@commandprompt.com writes:
  Excerpts from Ray Stell's message of mi?? may 26 17:08:33 -0400 2010:
  I just installed a compiled from src 8.3.11.  I usually include %i, 
  command tag,
  in the log_line_prefix setting.  This causes some spewage I'd not seen 
  before
  on connection received lines as if it is dumping the environment:
 
  Hmm, I bet it's the recent %.*s patch.
 
 That is in the right place, isn't it.  That would suggest that
 get_ps_display() is returning a wrong length on Ray's machine.
 It works okay here, but since that's platform-specific code that
 hardly proves much.  Ray, what platform is this exactly?

I should have included this:

   version  
 
-
 PostgreSQL 8.3.11 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 
20080704 (Red Hat 4.1.2-46)
(1 row)

[postgres ~]$ uname -a
Linux horntail.cns.vt.edu 2.6.18-194.3.1.el5PAE #1 SMP Sun May 2 04:42:25 EDT 
2010 i686 i686 i386 GNU/Linux

[postgres ~]$ cat /etc/issue
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
Kernel \r on an \m

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Keepalive for max_standby_delay

2010-05-27 Thread Simon Riggs
On Wed, 2010-05-26 at 16:22 -0700, Josh Berkus wrote:
  Just this second posted about that, as it turns out.
  
  I have a v3 *almost* ready of the keepalive patch. It still makes sense
  to me after a few days reflection, so is worth discussion and review. In
  or out, I want this settled within a week. Definitely need some RR
  here.
 
 Does the keepalive fix all the issues with max_standby_delay?  Tom?

OK, here's v4.

Summary

* WALSender adds a timestamp onto the header of every WAL chunk sent.

* Each WAL record now has a conceptual send timestamp that remains
constant while that record is replayed. This is used as the basis from
which max_standby_delay is calculated when required during replay.

* Send timestamp is calculated as the later of the timestamp of chunk in
which WAL record was sent and the latest XLog time.

* WALSender sends an empty message as a keepalive when nothing else to
send. (No longer a special message type for the keepalive).

I think its close, but if there's a gaping hole here somewhere then I'll
punt for this release.

-- 
 Simon Riggs   www.2ndQuadrant.com
*** a/doc/src/sgml/protocol.sgml
--- b/doc/src/sgml/protocol.sgml
***
*** 4222,4247  The commands accepted in walsender mode are:
/varlistentry
varlistentry
term
!   Bytereplaceablen/replaceable
/term
listitem
para
!   Data that forms part of WAL data stream.
/para
/listitem
/varlistentry
!   /variablelist
/para
/listitem
/varlistentry
/variablelist
!  /para
!  para
 A single WAL record is never split across two CopyData messages. When
 a WAL record crosses a WAL page boundary, however, and is therefore
 already split using continuation records, it can be split at the page
 boundary. In other words, the first main WAL record and its
 continuation records can be split across different CopyData messages.
   /para
  /listitem
/varlistentry
--- 4222,4257 
/varlistentry
varlistentry
term
!   Bytereplaceable8/replaceable
/term
listitem
para
!   Message timestamp.
/para
/listitem
/varlistentry
!   varlistentry
!   term
!   Bytereplaceablen/replaceable
!   /term
!   listitem
!   para
!   Data that forms part of WAL data stream. (May be zero length).
/para
/listitem
/varlistentry
/variablelist
!   /para
!   para
 A single WAL record is never split across two CopyData messages. When
 a WAL record crosses a WAL page boundary, however, and is therefore
 already split using continuation records, it can be split at the page
 boundary. In other words, the first main WAL record and its
 continuation records can be split across different CopyData messages.
+   /para
+   /listitem
+   /varlistentry
+   /variablelist
   /para
  /listitem
/varlistentry
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 1938,1944  UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)
  			UpdateControlFile();
  			minRecoveryPoint = newMinRecoveryPoint;
  
! 			ereport(DEBUG2,
  	(errmsg(updated min recovery point to %X/%X,
  		minRecoveryPoint.xlogid, minRecoveryPoint.xrecoff)));
  		}
--- 1938,1944 
  			UpdateControlFile();
  			minRecoveryPoint = newMinRecoveryPoint;
  
! 			ereport(DEBUG3,
  	(errmsg(updated min recovery point to %X/%X,
  		minRecoveryPoint.xlogid, minRecoveryPoint.xrecoff)));
  		}
***
*** 9210,9218  retry:
  {
  	/*
  	 * While walreceiver is active, wait for new WAL to arrive
! 	 * from primary.
  	 */
! 	receivedUpto = GetWalRcvWriteRecPtr();
  	if (XLByteLT(*RecPtr, receivedUpto))
  	{
  		/*
--- 9210,9218 
  {
  	/*
  	 * While walreceiver is active, wait for new WAL to arrive
! 	 * from primary. Get next applychunk and do other bookkeeping.
  	 */
! 	receivedUpto = GetWalRcvNextApplyChunk();
  	if (XLByteLT(*RecPtr, receivedUpto))
  	{
  		/*
*** a/src/backend/replication/walreceiver.c
--- b/src/backend/replication/walreceiver.c
***
*** 394,410  XLogWalRcvProcessMsg(unsigned char type, char *buf, Size len)
  		case 'w':/* WAL records */
  			{
  XLogRecPtr	recptr;
  
! if (len  sizeof(XLogRecPtr))
  	ereport(ERROR,
  			(errcode(ERRCODE_PROTOCOL_VIOLATION),
  			 errmsg_internal(invalid WAL message received from primary)));
  
  memcpy(recptr, buf, sizeof(XLogRecPtr));
  buf += sizeof(XLogRecPtr);
  len -= sizeof(XLogRecPtr);
  
! XLogWalRcvWrite(buf, len, recptr);
  break;
  			}
  		default:
--- 394,427 
  		case 'w':/* WAL records */
  			{
  XLogRecPtr	recptr;

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-27 Thread David E. Wheeler
On May 27, 2010, at 9:59 AM, Tom Lane wrote:

 I think we should fix it now.  Quick thought: maybe we could use FOR 
 instead of AS: select myfunc(7 for a, 6 for b);
 
 I'm afraid FOR doesn't work either; it'll create a conflict with the
 spec-defined SUBSTRING(x FOR y) syntax.

How about ISPARAMVALUEFOR? That shouldn't conflict with anything.

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Why my manualy constructed raw parser tree produce failed to execute?

2010-05-27 Thread Mohammad Heykal Abdillah
Hi all,

I have try to understand how parser work. So far the raw parser in
PostgreSQL will produce that called raw parser tree and the raw
parser tree will be passed to analyzer, right?

I have modified PostgreSQL, so the program wont call function
raw_parser -a function container that make by yacc and lex-. The part
that i try to modified was at src/backend/tcop/postgres.c function
pg_parse_query. I have using GDB to see list and node structure (a
parse tree structure that made using yacc) of my working query.

I have constructed same list and node structure as my working query,
using manual code (lit_make1, lappend, makeNode, etc). GDB pprint show
that my manualy constructed list are identically with my working query,
at least when i try to compare it by my eye 1 on 1. But when my manualy
constructed query tree is execute, it produce error: unrecognized node
type.

Now to the question, why my manualy constructed list was failed to
execute? I was pretty sure that my list node was identical with yacc.

Is there something that i miss when i consctructed my list (perhaps some
list structure part that not printed by GDB)? By the way in GBD i using
call pprint(node/list name) to display my list.

Thank You.
-- 
Mohammad Heykal Abdillah heykal.abdil...@gmail.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-27 Thread Josh Berkus

 Well, maybe I'm confused here, but arranging things so that we NEVER
 have to visit the page after initially writing it seems like it's
 setting the bar almost impossibly high. 

That is the use case, though.  What I've encountered so far at 3 client
sites is tables which are largely append-only, with a few selects and
very few updates ( 2%) on recent data.   In general, once data gets
flushed out of memory, it goes to disk and never gets recalled, and
certainly not written.  Thinks are hunky-dory until we reach
max_freeze_age, at which point the server has to chew through hundreds
of gigabytes of old data just to freeze them, sometimes bringing the
application to a halt in the process.

The user's perspective on this is quite reasonable: if I haven't
selected these pages, and I haven't written to them, why does autovacuum
need to visit them and screw up my server performance?

  Consider a table that is
 regularly written but append-only.  Every time autovacuum kicks in,
 we'll go and remove any dead tuples and then mark the pages
 PD_ALL_VISIBLE and set the visibility map bits, which will cause
 subsequent vacuums to ignore the all-visible portions of the table...
 until anti-wraparound kicks in, at which point we'll vacuum the entire
 table and freeze everything.

 If, however, we decree that you can't write a new tuple into a
 PD_ALL_VISIBLE page without freezing the existing tuples, then you'll
 still have the small, incremental vacuums but those are pretty cheap,

That only works if those pages were going to be autovacuumed anyway.  In
the case outlined above (which I've seen at 3 different production sites
this year), they wouldn't be; a table with less than 2% updates and
deletes does not get vacuumed until max_freeze_age for any reason.  For
that matter, pages which are getting autovacuumed are not a problem,
period; they're being read and written and freezing them is not an issue.

I'm not seeing a way of fixing this common issue short of overhauling
CLOG, or of creating a freeze_map.  Darn.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm

2010-05-27 Thread Peter Eisentraut
On fre, 2010-05-28 at 00:46 +0900, Tatsuo Ishii wrote:
  I don't know about Japanese, but the locale approach works just fine for
  other agglutinative languages.  I would rather suspect that it is the
  trigram approach that might be rather useless for such languages,
  because you are going to get a lot of similarity hits for the affixes.
 
 I'm not sure what you mean by affixes.  But I will explain...
 
 A Japanese sentence consists of words. Problem is, each word is not
 separated by space (agglutinative). So most text tools such as text
 search need preprocess which finds word boundaries by looking up
 dictionaries (and smart grammer analysis routine). In the process
 affixes can be determined and perhaps removed from the target word
 group to be used for text search (note that removing affixes is no
 relevant to locale). Once we get space separated sentence, it can be
 processed by text search or by pg_trgm just same as Engligh. (Note
 that these preprocessing are done outside PostgreSQL world). The
 difference is just the word can be consists of non ASCII letters.

I think the problem at hand has nothing at all to do with agglutination
or CJK-specific issues.  You will get the same problem with other
languages *if* you set a locale that does not adequately support the
characters in use.  E.g., Russian with locale C and encoding UTF8:

select similarity(E'\u0441\u043B\u043E\u043D', E'\u0441\u043B\u043E
\u043D\u044B');
 similarity

NaN
(1 row)



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] List traffic

2010-05-27 Thread Josh Berkus
On 5/27/10 8:38 AM, Greg Stark wrote:
 Lists like -ecpg or -odbc
 would work fine if the traffic warranted them.

A low-traffic list is a feature, not a bug.  Most people don't *like*
subscribing to lists which have 80posts/day.

 But some of the lists we have now are 99% overlap with each other -- I
 claim because the definitions are meaningless. What part of postgres
 discussion (aside from this thread) *don't* relate in some way to SQL?
 Or administration? Or performance? Most performance problems end up
 being solved by adjusting SQL or changing GUCs. 

This is a set theory fallacy.  While most performance issues are
administration issues as well, it is NOT therefore true that most
administration issues are also performance issues.  In fact, I'd say
that the -performance list does an excellent job of sticking to
troubleshooting performance issues only.  And for someone who has a
performance issue, and does not want to field 100 emails about can't
install Postgre, that's a feature.

 Mot administration
 questions are originally posed as general help questions. If you're
 subscribed to these lists you get a random, fairly small, subset of
 discussion related these topics.

Only someone who is a postgresql developer would consider 15-30
posts/day small.  For most of our user base, the level of traffic on
-performance, -sql, and -general is already too high and many people
don't subscribe to these lists because it is too high.  I get complaints
-- and people personal-sending me questions because they don't want to
subscribe -- all the time.

Having fewer posts on any particular list is *desireable*.  It's a good
thing.  It's *only* a problem when a bug report or user question goes
unanswered because the list is unattended.  And so far, I've only seen
one report of that.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.0beta2 release plans

2010-05-27 Thread Simon Riggs
On Thu, 2010-05-27 at 11:24 -0400, Tom Lane wrote:

 Current thought among core is to wrap beta2 Thursday June 3 (a week
 from today) for public release on Monday the 7th.  Get those fixes in.

I believe my issues are all sorted, apart from one outstanding patch to
attempt to address the max_standby_delay discussions.

If anybody knows different please ping me directly. I'm going to be in
powersave mode for a few weeks.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-27 Thread Josh Berkus
On 5/26/10 6:32 PM, Robert Haas wrote:
 Hmm, yeah.  Maybe we should freeze when we set PD_ALL_VISIBLE; that
 might be just as good, and simpler.  Assuming the visibility map is
 sufficiently crash-safe/non-buggy, we could then teach VACUUM that
 it's OK to advance relfrozenxid even when doing just a partial vacuum
 - because any pages that were skipped must contain only frozen tuples.
  Previously you've objected to proposals in this direction because
 they might destroy forensic information, but maybe we should do it
 anyway.

It would be an improvement, and easier than the various ways of never
having to visit the pages, which are all fairly intensive.  Given the
destruction of rollback information, though, we'd probably want a way to
switch this behaviour on and off as an autovacuum setting.

Does this send us down the wrong path, though?  I thought we wanted to
think about removing hint bits so that we could implement things like
CRCs.  No?

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [ADMIN] command tag logging

2010-05-27 Thread Tom Lane
Ray Stell ste...@cns.vt.edu writes:
 On Thu, May 27, 2010 at 12:49:49PM -0400, Tom Lane wrote:
 That is in the right place, isn't it.  That would suggest that
 get_ps_display() is returning a wrong length on Ray's machine.
 It works okay here, but since that's platform-specific code that
 hardly proves much.  Ray, what platform is this exactly?

 [postgres ~]$ cat /etc/issue
 Red Hat Enterprise Linux Server release 5.5 (Tikanga)

OK, I can reproduce it when I try on my Fedora box.  The problem is that
log_connections emits a log message before init_ps_display() has been
called, and the ps_status.c logic isn't careful to ensure that it
returns something sane in that case.  It accidentally failed to fail
too badly before I changed the elog.c logic, because there'd be an
embedded null after the program name in most cases.  I suppose people
might even have thought that printing postmaster or postgres for
%i was intended behavior there.

I think the most useful fix is to create a static variable to hold the
notional strlen(ps_buffer), which will initialize to zero, and then
we can use that instead of groveling over the string in get_ps_display.
Should improve performance a tad as well as fixing this problem.
Will work on that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON manipulation functions

2010-05-27 Thread Joseph Adams
I've started implementing the JSON datatype; the repo is at
http://git.postgresql.org/gitweb?p=json-datatype.git .

On Fri, May 14, 2010 at 1:15 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, May 13, 2010 at 9:47 PM, Joseph Adams
 joeyadams3.14...@gmail.com wrote:
 Would it be a bad idea to give an enum and a function the same name
 (which appears to be allowed by PostgreSQL) ?  If so, json_type(json)
 could be json_typeof(json) or something instead.

 No, I think that's a fine idea.

I tried making a function named json_type that has the same name as
the type json_type.  However, this doesn't work as expected:

SELECT json_type('[1,2,3]');

Instead of calling json_type with '[1,2,3]' casted to JSON, it's
trying to cast '[1,2,3]' to json_type.  Is there a way to override
this behavior, or would I be better off renaming the function?

Note that if the function were renamed, the literal would implicitly be json:

SELECT json_typeof('[1,2,3]'); -- works

I tried this:

CREATE OR REPLACE FUNCTION json_type(json)
RETURNS json_type
AS 'MODULE_PATHNAME','json_get_type'
LANGUAGE C STRICT IMMUTABLE;

CREATE CAST (json AS json_type) WITH FUNCTION json_type(json);

However, json_type('[1,2,3]') still doesn't work (it doesn't infer
that '[1,2,3]' should be casted to json first).  I also tried each of
AS ASSIGNMENT and AS IMPLICIT as well.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON manipulation functions

2010-05-27 Thread Robert Haas
On Thu, May 27, 2010 at 2:39 PM, Joseph Adams
joeyadams3.14...@gmail.com wrote:
 I've started implementing the JSON datatype; the repo is at
 http://git.postgresql.org/gitweb?p=json-datatype.git .

 On Fri, May 14, 2010 at 1:15 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, May 13, 2010 at 9:47 PM, Joseph Adams
 joeyadams3.14...@gmail.com wrote:
 Would it be a bad idea to give an enum and a function the same name
 (which appears to be allowed by PostgreSQL) ?  If so, json_type(json)
 could be json_typeof(json) or something instead.

 No, I think that's a fine idea.

 I tried making a function named json_type that has the same name as
 the type json_type.  However, this doesn't work as expected:

 SELECT json_type('[1,2,3]');

 Instead of calling json_type with '[1,2,3]' casted to JSON, it's
 trying to cast '[1,2,3]' to json_type.  Is there a way to override
 this behavior, or would I be better off renaming the function?

Well, you should rename either the function or the type, I guess.  Not
sure which.  Calling it json_typeof would be reasonable...

 Note that if the function were renamed, the literal would implicitly be json:

 SELECT json_typeof('[1,2,3]'); -- works

 I tried this:

 CREATE OR REPLACE FUNCTION json_type(json)
 RETURNS json_type
 AS 'MODULE_PATHNAME','json_get_type'
 LANGUAGE C STRICT IMMUTABLE;

 CREATE CAST (json AS json_type) WITH FUNCTION json_type(json);

 However, json_type('[1,2,3]') still doesn't work (it doesn't infer
 that '[1,2,3]' should be casted to json first).  I also tried each of
 AS ASSIGNMENT and AS IMPLICIT as well.

Yeah, I don't think you want to go that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-27 Thread Robert Haas
On Thu, May 27, 2010 at 2:17 PM, Josh Berkus j...@agliodbs.com wrote:
 On 5/26/10 6:32 PM, Robert Haas wrote:
 Hmm, yeah.  Maybe we should freeze when we set PD_ALL_VISIBLE; that
 might be just as good, and simpler.  Assuming the visibility map is
 sufficiently crash-safe/non-buggy, we could then teach VACUUM that
 it's OK to advance relfrozenxid even when doing just a partial vacuum
 - because any pages that were skipped must contain only frozen tuples.
  Previously you've objected to proposals in this direction because
 they might destroy forensic information, but maybe we should do it
 anyway.

 It would be an improvement, and easier than the various ways of never
 having to visit the pages, which are all fairly intensive.  Given the
 destruction of rollback information, though, we'd probably want a way to
 switch this behaviour on and off as an autovacuum setting.

It's not going to destroy anything that is needed for rollback unless
there's a bug - PD_ALL_VISIBLE only gets set when all tuples on the
page are visible to all backends.  That can't happen until all
transactions that wrote the page, and all others that have a lower
xmin, have committed.  That having been said, if making it a GUC makes
people less nervous about doing it, then +1 from me.

 Does this send us down the wrong path, though?  I thought we wanted to
 think about removing hint bits so that we could implement things like
 CRCs.  No?

PD_ALL_VISIBLE is a page-level bit, not a tuple-level bit, and I
strongly suspect it's not going anywhere.  It's critical
infrastructure for index-only scans, among other things.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-27 Thread Robert Haas
On Thu, May 27, 2010 at 2:00 PM, Josh Berkus j...@agliodbs.com wrote:
 Well, maybe I'm confused here, but arranging things so that we NEVER
 have to visit the page after initially writing it seems like it's
 setting the bar almost impossibly high.

 That is the use case, though.  What I've encountered so far at 3 client
 sites is tables which are largely append-only, with a few selects and
 very few updates ( 2%) on recent data.   In general, once data gets
 flushed out of memory, it goes to disk and never gets recalled, and
 certainly not written.

We might be able to optimize this case if the transactions are small,
such that they commit before dirtying too large a fraction of
shared_buffers.  We could - at least in theory - teach the bgwriter or
some other process to freeze them before writing them to disk the
first time.  But if the blocks have to be written to disk before
transaction commit it seems to me we're DOA, unless we're willing to
retain arbitrarily large amounts of CLOG.

What might be more practical is to try to find ways to spread out the
I/O so that it doesn't happen all at once in a huge ornery spike.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-27 Thread Robert Haas
On Thu, May 27, 2010 at 1:27 PM, David E. Wheeler da...@kineticode.com wrote:
 On May 27, 2010, at 9:59 AM, Tom Lane wrote:

 I think we should fix it now.  Quick thought: maybe we could use FOR
 instead of AS: select myfunc(7 for a, 6 for b);

 I'm afraid FOR doesn't work either; it'll create a conflict with the
 spec-defined SUBSTRING(x FOR y) syntax.

 How about ISPARAMVALUEFOR? That shouldn't conflict with anything.

Or we could use the Finnish word
epäjärjestelmällistyttämättömyydellänsäkäänköhän, which I'm pretty
sure is not currently used in our grammar.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-27 Thread David E. Wheeler
On May 27, 2010, at 11:55 AM, Robert Haas wrote:

 Or we could use the Finnish word
 epäjärjestelmällistyttämättömyydellänsäkäänköhän, which I'm pretty
 sure is not currently used in our grammar.

I thought that was an Icelandic volcano.

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_trgm

2010-05-27 Thread Robert Haas
On Thu, May 27, 2010 at 2:01 PM, Peter Eisentraut pete...@gmx.net wrote:
 On fre, 2010-05-28 at 00:46 +0900, Tatsuo Ishii wrote:
  I don't know about Japanese, but the locale approach works just fine for
  other agglutinative languages.  I would rather suspect that it is the
  trigram approach that might be rather useless for such languages,
  because you are going to get a lot of similarity hits for the affixes.

 I'm not sure what you mean by affixes.  But I will explain...

 A Japanese sentence consists of words. Problem is, each word is not
 separated by space (agglutinative). So most text tools such as text
 search need preprocess which finds word boundaries by looking up
 dictionaries (and smart grammer analysis routine). In the process
 affixes can be determined and perhaps removed from the target word
 group to be used for text search (note that removing affixes is no
 relevant to locale). Once we get space separated sentence, it can be
 processed by text search or by pg_trgm just same as Engligh. (Note
 that these preprocessing are done outside PostgreSQL world). The
 difference is just the word can be consists of non ASCII letters.

 I think the problem at hand has nothing at all to do with agglutination
 or CJK-specific issues.  You will get the same problem with other
 languages *if* you set a locale that does not adequately support the
 characters in use.  E.g., Russian with locale C and encoding UTF8:

 select similarity(E'\u0441\u043B\u043E\u043D', E'\u0441\u043B\u043E
 \u043D\u044B');
  similarity
 
        NaN
 (1 row)

What I can't help wondering as I'm reading this discussion is -
Tatsuo-san said upthread that he has a problem with pg_trgm that he
does not have with full text search.  So what is full text search
doing differently than pg_trgm?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-27 Thread Robert Haas
On Thu, May 27, 2010 at 2:59 PM, David E. Wheeler da...@kineticode.com wrote:
 On May 27, 2010, at 11:55 AM, Robert Haas wrote:
 Or we could use the Finnish word
 epäjärjestelmällistyttämättömyydellänsäkäänköhän, which I'm pretty
 sure is not currently used in our grammar.

 I thought that was an Icelandic volcano.

No, that's Eyjafjallajökull.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   >