date:20070622

[HACKERS] In California for a few days

2007-06-22 Thread Bruce Momjian

FYI, I am visiting California until Wednesday, June 27, to attend a
funeral.  I will be reading email, but not as frequently.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] Bugtraq: Having Fun With PostgreSQL

2007-06-22 Thread Tom Lane

Jim Nasby <[EMAIL PROTECTED]> writes:
> On Jun 19, 2007, at 1:27 PM, Josh Berkus wrote:
>> Not all OSes support ident ... Solaris and OpenBSD for two, don't,  
>> because they see ident as insecure.

> What about the unix domain socket, though? AFAIK that doesn't rely on  
> ident but some other method...

On OpenBSD we use getpeereid() for unix sockets, and there are
equivalent things on some other Unixen.  We could never go over to
ident as the standard default, though, because not all platforms
have these sorts of features (if indeed they have unix sockets at
all ...); and in any case it's not very secure for TCP.
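
A minimal sketch of the Unix-socket case, not PostgreSQL's own code: on Linux the kernel reports the peer's credentials through the SO_PEERCRED socket option, which fills roughly the role getpeereid() fills on the BSDs. The helper name peer_uid is hypothetical, and it returns None on platforms that do not expose the option.

```python
import os
import socket
import struct

def peer_uid(sock):
    """Return the uid of the process at the other end of a Unix socket.

    Assumption: Linux-style SO_PEERCRED. Returns None where the option
    is not available (BSD-like systems would use getpeereid() instead).
    """
    opt = getattr(socket, "SO_PEERCRED", None)
    if opt is None:
        return None
    # struct ucred on Linux: pid_t pid; uid_t uid; gid_t gid
    fmt = "iII"
    raw = sock.getsockopt(socket.SOL_SOCKET, opt, struct.calcsize(fmt))
    _pid, uid, _gid = struct.unpack(fmt, raw)
    return uid

# Both ends of a socketpair belong to this process, so the reported
# peer uid should be our own uid.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
uid = peer_uid(a)
a.close()
b.close()
```

No such kernel-level check exists for a TCP connection, which is the gap the ident protocol tries to fill over the network.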

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Bugtraq: Having Fun With PostgreSQL

2007-06-22 Thread Jim Nasby


On Jun 19, 2007, at 1:27 PM, Josh Berkus wrote:
>> I know there's issues with using ident sameuser via TCP, but what
>> about for filesystem socket connections?
>
> Not all OSes support ident ... Solaris and OpenBSD for two, don't,
> because they see ident as insecure.

What about the unix domain socket, though? AFAIK that doesn't rely on
ident but some other method...

--
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

[HACKERS] Refactoring parser/analyze.c

2007-06-22 Thread Tom Lane

In connection with bug #3403
http://archives.postgresql.org/pgsql-bugs/2007-06/msg00114.php
I've come to the conclusion that we really shouldn't do *any* processing
of utility commands at parse analysis time; they should be left as
raw-grammar output trees until execution.

The key reason for this is that any processing we do that is dependent
on database state might be obsolete by the time of execution, and we
don't have any infrastructure for taking locks or otherwise checking the
up-to-dateness of a utility command tree.  The time delay involved could
be significant in the case of a command that is put into the plan cache
(eg, a statement in a plpgsql function), so this isn't an academic
concern.  I had already foreseen this and delayed the processing of
several utility commands (eg, CREATE INDEX, CREATE RULE) until runtime
as part of the plan-cache patch; but I left CREATE TABLE and ALTER TABLE
alone, mistakenly thinking that their parse analysis work was purely
syntactic transformations and so could be done without reference to the
database state.  As noted in the discussion of bug #3403, this is wrong
with respect to the processing of SERIAL-column sequences.  And there's
also the matter of CREATE TABLE ... LIKE, for which the CVS-HEAD code
says

 * Change the LIKE <subtable> portion of a CREATE TABLE statement into
 * column definitions which recreate the user defined column portions of
 * <subtable>.
 *
 * Note: because we do this at parse analysis time, any change in the
 * referenced table between parse analysis and execution won't be reflected
 * into the new table.  Is this OK?

So I'm thinking we should complete the break-up and delay the processing
done by transformCreateStmt and transformAlterTableStmt until execution
of the utility command begins.  In the case of ALTER TABLE we should
take out an exclusive lock on the target table before we even start to
do any of transformAlterTableStmt's work.

I had originally thought that parser/analyze.c was too intertwined to
try to break up, but upon looking more closely I find that there is
actually almost complete separation between the handling of plannable
commands and utility commands.  I would like to refactor analyze.c
into two files to reflect this new understanding of when things happen:

analyze.c: keeps parse_analyze, transformStmt, and the handling of
SELECT/INSERT/UPDATE/DELETE commands, as well as EXPLAIN and DECLARE
CURSOR, which are special cases but more nearly related to plannable
commands than not.

a new file named something like parse_utilcmd.c: transformCreateStmt,
transformAlterTableStmt, transformCreateSchemaStmt, transformIndexStmt,
transformRuleStmt, and subsidiary routines.  These functions would now
be called at the beginning of execution of the respective utility
commands, and not from parse_analyze() at all.

It looks like only release_pstate_resources() and makeFromExpr() are
used in common by these two files; both of them arguably belong
somewhere else anyway (parse_node.c and makefuncs.c respectively).
Also we might need to export transformStmt() from analyze.c; the
utility-command routines currently call that directly, and I'm undecided
whether they can or should go through parse_analyze() instead.

With this refactoring, there will not be any use of the
extras_before/extras_after mechanism within analyze.c, and I'm sorely
tempted to just rip it out, redeclaring parse_analyze() and friends
to return a single Query node instead of a List.  Can anyone foresee
a reason we might still need to return multiple Query nodes from a
single plannable statement?  (Note: "rule expansion" isn't a reason,
that happens later.)
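
A toy sketch of the staleness problem described above, under loud assumptions: the "catalog" is a plain dict and the function names are made up for illustration, not taken from analyze.c. It contrasts expanding CREATE TABLE ... LIKE at analysis time with keeping the raw tree and expanding it when execution begins.

```python
# Toy catalog: table name -> list of column names (an assumption for
# illustration; the real system catalogs are far richer).
catalog = {"src": ["id", "name"]}

def transform_create_like(raw, catalog):
    """Expand LIKE into concrete columns using the catalog as it is now."""
    return {"table": raw["table"], "columns": list(catalog[raw["like"]])}

def execute_create(stmt, catalog):
    """Create the table from an already-transformed statement."""
    catalog[stmt["table"]] = stmt["columns"]

# Strategy 1: transform at parse-analysis time and cache the result.
cached_early = transform_create_like({"table": "copy_early", "like": "src"}, catalog)

# Strategy 2: cache only the raw tree; transform when execution starts.
cached_raw = {"table": "copy_late", "like": "src"}

# The referenced table changes between analysis and execution.
catalog["src"].append("email")

execute_create(cached_early, catalog)
execute_create(transform_create_like(cached_raw, catalog), catalog)

print(catalog["copy_early"])  # built from the stale column list
print(catalog["copy_late"])   # built from the current column list
```

Only the second strategy picks up the new column. A real implementation would also need the locking mentioned above so that the table cannot change again during the transformation.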

Comments?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

[HACKERS] fast stop before database system is ready

2007-06-22 Thread Kevin Grittner

I apologize for not grabbing more information before the evidence was gone,
but I think there may be a vulnerability to database corruption on PITR
recovery if a stop is done with the "fast" option right after a database logs
"archive recovery complete".  We normally have about 17 seconds between that
and the "database system is ready" message for a particular database.
Someone was watching the log and issued a fast stop about 1.5 seconds after
the "archive recovery complete" message.  When the database came back up,
it was corrupted.  (The first problem message was about a bad sibling
pointer, but the wheels pretty much fell off after that.)  He deleted the
database instance, got a fresh dump, and tried again without stopping the
server at that point, and all is well.
 
The dump used in the problem recovery attempt is now gone.  I hesitate to
report this since my information is so sketchy, but thought you might want
the report anyway.
 
The source and target of this PITR-style copy were both PostgreSQL 8.2.4 on
SuSE Linux.  For more details on the target see my recent posts about the
corrupted database which turned out to be caused by bad hardware and outdated
drivers.
( http://archives.postgresql.org/pgsql-admin/2007-06/msg00151.php )
The failed recovery was on that box, after fixing all known hardware and
driver issues.
 
No assistance needed.
 
-Kevin
 




Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Alvaro Herrera

[EMAIL PROTECTED] wrote:
> > Why not do it the other way around?
> > es_ES   spanish
> > Spanish_Spain   spanish
> > ru_RU   russian
> > pt_BR   portuguese_brazil
> >
> > That way you don't need any funny index.  Or do you need the list of
> > locales for each language? (but even if you do, you can easily obtain it
> > by indexing both columns separately using btrees anyway)
> 
> Yes, that's possible, but that increases the number of identical configurations:
> russian_win   Russian_Russia
> russian_unix  ru_RU
> 
> They don't differ except for the locale name.

But why do you need them to be different at all?  Just make it

russian Russian_Russia
russian ru_RU

Does that not work for some reason?

What I was really suggesting was having a table mapping locale names
into "tsearch languages".  Then the configuration could be made based on
the language, not on the locale name.  So the stopword list is for
"russian", regardless of whether the locale is Russian_Russia or ru_RU.

Is this only for the stopword list, or does it also affect selecting a
stemmer?

Note: it's possible that the stopword list is different for brazilian
portuguese than portuguese portuguese, which is why I was suggesting
using a language "portuguese_brazil" and not just "portuguese".  Whereas
you need a single stopword list for all the countries speaking spanish,
which is why you need only one language called spanish.
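
A minimal sketch of the mapping-table idea above. Configuration is keyed by language, and locale names only select the language. The stopword file names are hypothetical, and the tables are plain dicts rather than catalog tables.

```python
# Locale name -> "tsearch language" (entries taken from this thread).
LOCALE_TO_LANGUAGE = {
    "es_ES": "spanish",
    "Spanish_Spain": "spanish",
    "ru_RU": "russian",
    "Russian_Russia": "russian",
    "pt_BR": "portuguese_brazil",
}

# Language -> stopword file. File names are assumptions for illustration.
STOPWORDS_BY_LANGUAGE = {
    "spanish": "spanish.stop",
    "russian": "russian.stop",
    "portuguese_brazil": "portuguese_brazil.stop",
}

def stopword_file(locale_name):
    """Resolve a locale name to a stopword file via its language."""
    language = LOCALE_TO_LANGUAGE[locale_name]
    return STOPWORDS_BY_LANGUAGE[language]

print(stopword_file("ru_RU"))
print(stopword_file("Russian_Russia"))
```

Both spellings of the Russian locale resolve to the same file, so there is only one "russian" configuration to maintain.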

-- 
Alvaro Herrera                http://www.advogato.org/person/alvherre
"Llegará una época en la que una investigación diligente y prolongada sacará
a la luz cosas que hoy están ocultas" (Séneca, siglo I)


[Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-22 Thread teodor


>> How would this work for initdb with locale C?
>
> I'm worrying about that too.

english '{en_GB, en_US, C}'

I suppose that a locale name always has a dot separator, except for the C locale,
which is a well-known exception.




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread Florian G. Pflug


Richard Huxton wrote:
> Bruce Momjian wrote:
>> Tom Lane wrote:
>>> What's wrong with synchronous_commit?  It's accurate and simple.
>>
>> That is fine too.
>
> My concern would be that it can be read two ways:
> 1. When you commit, sync (something or other - unspecified)
> 2. Synchronise commits (to each other? to something else?)*
>
> It's obvious to people on the -hackers list what we're talking about,
> but is it so clear to a newbie, perhaps non-English speaker?
>
> * I can see people thinking this means something like "commit_delay".


OTOH, the concept of synchronous vs. asynchronous (function) calls
should be pretty well-known among database programmers and administrators.
And (at least to me), this is really what this is about - the commit
happens asynchronously, at the convenience of the database, and not
the instant that I requested it.

greetings, Florian Pflug



Re: [HACKERS] GUC time unit spelling a bit inconsistent

2007-06-22 Thread Peter Eisentraut

Am Freitag, 22. Juni 2007 15:34 schrieb Bruce Momjian:
> Consider even if we are clear that "min" is "minutes", it could be
> chronological minutes or radial degree minutes, so yea, the context has
> to be considered.

The correct symbol for an arc minute is ′, so there is no context dependency.
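
For context, a rough sketch of how a settings parser might read time units where "min" can only mean minutes. The unit table and the function name parse_time_setting are assumptions for illustration, not the actual GUC code.

```python
import re

# Multipliers to milliseconds. The exact set of accepted spellings is
# an assumption here; unit names are treated as case-sensitive.
UNIT_MS = {"ms": 1, "s": 1000, "min": 60_000, "h": 3_600_000, "d": 86_400_000}

def parse_time_setting(text):
    """Convert a value such as '5min' or '30 s' to milliseconds."""
    m = re.fullmatch(r"\s*(\d+)\s*([a-z]+)\s*", text)
    if not m or m.group(2) not in UNIT_MS:
        raise ValueError(f"invalid time value: {text!r}")
    return int(m.group(1)) * UNIT_MS[m.group(2)]

print(parse_time_setting("5min"))
print(parse_time_setting("30 s"))
```

A bare "m" is rejected in this sketch rather than guessed at, since it could be read as either minutes or months.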

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread Tom Lane

Richard Huxton <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> What's wrong with synchronous_commit?  It's accurate and simple.

> My concern would be that it can be read two ways:
> 1. When you commit, sync (something or other - unspecified)
> 2. Synchronise commits (to each other? to something else?)*

Well, that's a fair point.  deferred_commit would avoid that objection.

I'm not sure it's real important though --- with practically all of the
postgresql.conf variables, you really need to read the manual to know
exactly what they do.

regards, tom lane


Re: [HACKERS] tsearch in core patch

2007-06-22 Thread teodor

> Why not do it the other way around?
> es_ES spanish
> Spanish_Spain spanish
> ru_RU russian
> pt_BR portuguese_brazil
>
> That way you don't need any funny index.  Or do you need the list of
> locales for each language? (but even if you do, you can easily obtain it
> by indexing both columns separately using btrees anyway)

Yes, that's possible, but that increases the number of identical configurations:
russian_win   Russian_Russia
russian_unix  ru_RU

They don't differ except for the locale name.



Re: [Fwd: Re: [HACKERS] tsearch in core patch]

2007-06-22 Thread Tatsuo Ishii

> >> How would this work for initdb with locale C?
> >
> > I'm worrying about that too.
> 
> english '{en_GB, en_US, C}'
> 
> I suppose that a locale name always has a dot separator, except for the C locale,
> which is a well-known exception.

So we would have to?:

japanese '{ja_JP, C}'

How would we know C -> japanese?

Also I'm wondering how we could handle texts including Japanese and
English. It's very common in Japan.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Tom Lane

Tatsuo Ishii <[EMAIL PROTECTED]> writes:
>> On Jun 22, 2007, at 9:28 , Tom Lane wrote:
>>> Is the point here for initdb to be able to establish a sane default
>>> initially?  Seems to me it can guess the language from the first
>>> component of the locale (ru_RU -> russian).
>> 
>> How would this work for initdb with locale C?

> I'm worrying about that too.

I would be surprised if C locale defaulted to anything except English.
I suppose it would be sensible to add a switch to allow people to select
a different language.  In any case, the only thing initdb would be doing
would be setting up an initial value of a table entry or GUC variable,
so you could always change it yourself later; it may not be worth
sweating too much about this.
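
A small sketch of the default-guessing idea, assuming a hand-written table from language codes to configuration names. The table contents, the override parameter, and the function name default_language are all illustrative assumptions.

```python
# Assumed mapping from the first locale component to a language name.
CODE_TO_LANGUAGE = {"ru": "russian", "es": "spanish", "en": "english"}

def default_language(locale_name, override=None):
    """Guess an initial text search language for initdb.

    override stands in for the command-line switch suggested above.
    Returns None when no guess can be made.
    """
    if override is not None:
        return override
    if locale_name == "C":
        return "english"
    # Use only the part before "_" (and before any ".encoding" suffix).
    code = locale_name.split("_", 1)[0].split(".", 1)[0].lower()
    return CODE_TO_LANGUAGE.get(code)

print(default_language("ru_RU"))
print(default_language("C"))
print(default_language("C", override="japanese"))
```

Windows-style names such as Swedish_Sweden.1252 fall through to None in this sketch, so they would need extra entries in the table or a separate mapping.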

regards, tom lane


Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Tatsuo Ishii

> On Jun 22, 2007, at 9:28 , Tom Lane wrote:
> 
> > Is the point here for initdb to be able to establish a sane default
> > initially?  Seems to me it can guess the language from the first
> > component of the locale (ru_RU -> russian).
> 
> How would this work for initdb with locale C?

I'm worrying about that too.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Alvaro Herrera

[EMAIL PROTECTED] wrote:

> So, the final proposal:
> rename cfglocale to cfglanguages and store in it an array of language names
> which is produced from the first part of locale names:
> russian   '{ru_RU, Russian_Russia}'
> spanish   '{es_ES, es_CL, Spanish_Spain, Spanish_Chile}'
> 
> Comments?

Why not do it the other way around?
es_ES   spanish
Spanish_Spain   spanish
ru_RU   russian
pt_BR   portuguese_brazil

That way you don't need any funny index.  Or do you need the list of
locales for each language? (but even if you do, you can easily obtain it
by indexing both columns separately using btrees anyway)
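
To illustrate why the two layouts carry the same information, here is a sketch that turns the array-per-language form into one row per locale and recovers the per-language list again. Plain dicts stand in for the catalog table; the helper name locales_for is hypothetical.

```python
# Layout from the proposal: language -> array of locale names.
by_language = {
    "russian": ["ru_RU", "Russian_Russia"],
    "spanish": ["es_ES", "es_CL", "Spanish_Spain", "Spanish_Chile"],
}

# "The other way around": one row per locale name.
by_locale = {loc: lang for lang, locs in by_language.items() for loc in locs}

def locales_for(language):
    """Recover the per-language list from the per-locale rows."""
    return sorted(loc for loc, lang in by_locale.items() if lang == language)

print(by_locale["es_CL"])
print(locales_for("russian"))
```

Looking up by locale is a simple key lookup in the second layout, which is the point about not needing an index over array contents.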

-- 
Alvaro Herrera   http://www.PlanetPostgreSQL.org/
"I can see support will not be a problem.  10 out of 10."(Simon Wittber)
  (http://archives.postgresql.org/pgsql-general/2004-12/msg00159.php)


Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Bruce Momjian

Michael Glaesemann wrote:
> 
> On Jun 22, 2007, at 9:28 , Tom Lane wrote:
> 
> > Is the point here for initdb to be able to establish a sane default
> > initially?  Seems to me it can guess the language from the first
> > component of the locale (ru_RU -> russian).
> 
> How would this work for initdb with locale C?

Yea, that's a problem.  I am thinking we should just avoid the entire
issue and require it to be set by the user, and throw an error if the
configuration is not set.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: [HACKERS] tsearch in core patch

2007-06-22 Thread teodor


>> That may have been true until we started supporting Windows...
>> Swedish_Sweden.1252 is what I get on my machine, for example. Principle
>> is the same, but values certainly aren't.
>
> Well, at least the name is not itself translated, so a mapping table is
> not right out of the question.  If they had put a name like
> "Español_Chile" instead of "Spanish_Chile" we would be in serious
> trouble.
I don't think so; in the opposite case you couldn't type or show it to change
the locale :).

So, the final proposal:
rename cfglocale to cfglanguages and store in it an array of language names
which is produced from the first part of locale names:
russian   '{ru_RU, Russian_Russia}'
spanish   '{es_ES, es_CL, Spanish_Spain, Spanish_Chile}'

Comments?

Are there any obstacles to using GIN indexes in pg_catalog?



Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread Richard Huxton


Bruce Momjian wrote:
> Tom Lane wrote:
>> What's wrong with synchronous_commit?  It's accurate and simple.
>
> That is fine too.


My concern would be that it can be read two ways:
1. When you commit, sync (something or other - unspecified)
2. Synchronise commits (to each other? to something else?)*

It's obvious to people on the -hackers list what we're talking about, 
but is it so clear to a newbie, perhaps non-English speaker?


* I can see people thinking this means something like "commit_delay".

--
  Richard Huxton
  Archonet Ltd


Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Michael Glaesemann



On Jun 22, 2007, at 9:28 , Tom Lane wrote:
> Is the point here for initdb to be able to establish a sane default
> initially?  Seems to me it can guess the language from the first
> component of the locale (ru_RU -> russian).

How would this work for initdb with locale C?

Michael Glaesemann
grzm seespotcode net




Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Alvaro Herrera

Magnus Hagander wrote:
> Tom Lane wrote:
> > Alvaro Herrera <[EMAIL PROTECTED]> writes:
> >> I very much doubt that the different spanishes are any different in the
> >> stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc;
> >> but in the case of portuguese I'm not so sure.  Maybe there are other
> >> examples (like chinese, but I'm not sure how useful is tsearch for
> >> chinese).
> > 
> >> And the .ISO8859-1 part you don't need at all if you accept that the
> >> files are UTF8 by design, as Tom proposed.
> > 
> > Also, the problem we're dealing with here is mainly lack of
> > standardization of the encoding part of locale names.  AFAIK, just about
> > everybody agrees on "es_ES", "ru_RU", etc; it's the part that comes
> > after that (if any) that is not too consistent across platforms.
> 
> That may have been true until we started supporting Windows...
> Swedish_Sweden.1252 is what I get on my machine, for example. Principle
> is the same, but values certainly aren't.

Well, at least the name is not itself translated, so a mapping table is
not right out of the question.  If they had put a name like
"Español_Chile" instead of "Spanish_Chile" we would be in serious
trouble.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread Michael Glaesemann



On Jun 22, 2007, at 9:23 , Richard Huxton wrote:
> Or perhaps "sync_on_commit = off"?

Or switch it around...

sink_on_commit = on

(sorry for the noise)

Michael Glaesemann
grzm seespotcode net




Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread Florian G. Pflug

PFC wrote:
> On Fri, 22 Jun 2007 16:43:00 +0200, Bruce Momjian <[EMAIL PROTECTED]> wrote:
>> Simon Riggs wrote:
>>> On Fri, 2007-06-22 at 14:29 +0100, Gregory Stark wrote:
>>>> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>>>>> Tom Lane wrote:
>>>>>> untrustworthy disk hardware, for instance.  I'd much rather use names
>>>>>> derived from "deferred commit" or "delayed commit" or some such.
>>>>>
>>>>> Honestly, I prefer these names as well as it seems directly related versus
>>>>> transaction guarantee which sounds to be more like us saying, if we turn it off
>>>>> our transactions are bogus.
>>>
>>> That was the intention..., but name change accepted.
>>>
>>>> Hm, another possibility: "synchronous_commit = off"
>>>
>>> Ooo, I like that. Any other takers?
>>
>> Yea, I like that too but I am now realizing that we are not really
>> deferring or delaying the "COMMIT" command but rather the recovery of
>> the commit.  GUC as full_commit_recovery?
>
> commit_waits_for_fsync =
>
> force_yes: makes all commits "hard"
> yes: commits are "hard" unless specified otherwise [default]
> no: commits are "soft" unless specified otherwise [should replace fsync=off use case]
> force_no: makes all commits "soft" (controller with write cache "emulator")

I think you got the last line backwards - without the fsync() after
a commit, you can't be sure that the data made it into the controller
cache. To be safe you *always* need the fsync() - but it will probably
be much cheaper if your controller doesn't have to actually write to
the disks, but can cache in battery-backed ram instead. Therefore,
if you own such a controller, you probably don't need deferred commits.

BTW, I like synchronous_commit too - but maybe asynchronous_commit
would be even better, with inverted semantics of course.
Then you'd have "asynchronous_commit = off" as default.
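
A toy sketch of the distinction being named here, assuming a simple append-only log file rather than the real WAL. A synchronous commit calls fsync() before returning, while an asynchronous one returns right after the write and leaves the fsync() to a later flush. The class name ToyLog and its methods are made up for illustration.

```python
import os
import tempfile

class ToyLog:
    """Append-only log with a choice of when fsync() happens."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.unsynced = 0      # commits written but not yet fsync'ed
        self.fsync_calls = 0

    def commit(self, record, synchronous=True):
        os.write(self.fd, record + b"\n")
        self.unsynced += 1
        if synchronous:
            self.flush()       # caller waits until the record is on disk

    def flush(self):
        if self.unsynced:
            os.fsync(self.fd)
            self.fsync_calls += 1
            self.unsynced = 0

    def close(self):
        self.flush()
        os.close(self.fd)

with tempfile.TemporaryDirectory() as d:
    log = ToyLog(os.path.join(d, "toy.log"))
    log.commit(b"txn 1")                      # durable on return
    log.commit(b"txn 2", synchronous=False)   # at risk until the next flush
    log.commit(b"txn 3", synchronous=False)
    at_risk = log.unsynced
    log.flush()                               # e.g. done by a background process
    fsyncs = log.fsync_calls
    log.close()
```

In this sketch the two asynchronous commits share a single fsync(), which is where the speedup comes from, and they are the ones that could be lost in a crash before that flush.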


Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread Bruce Momjian

Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Joshua D. Drake wrote:
> >>>>> Hm, another possibility: "synchronous_commit = off"
> >>>>
> >>>> Ooo, I like that. Any other takers?
> 
> >>> Yea, I like that too but I am now realizing that we are not really
> >>> deferring or delaying the "COMMIT" command but rather the recovery of
> >>> the commit.  GUC as full_commit_recovery?
> >> 
> >> recovery is a bad word I think. It is related too closely to failure.
> 
> > commit_stability?  reliable_commit?
> 
> What's wrong with synchronous_commit?  It's accurate and simple.

That is fine too.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Magnus Hagander

Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
>> I very much doubt that the different spanishes are any different in the
>> stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc;
>> but in the case of portuguese I'm not so sure.  Maybe there are other
>> examples (like chinese, but I'm not sure how useful is tsearch for
>> chinese).
> 
>> And the .ISO8859-1 part you don't need at all if you accept that the
>> files are UTF8 by design, as Tom proposed.
> 
> Also, the problem we're dealing with here is mainly lack of
> standardization of the encoding part of locale names.  AFAIK, just about
> everybody agrees on "es_ES", "ru_RU", etc; it's the part that comes
> after that (if any) that is not too consistent across platforms.

That may have been true until we started supporting Windows...
Swedish_Sweden.1252 is what I get on my machine, for example. Principle
is the same, but values certainly aren't.

//Magnus



Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Oleg Bartunov


On Fri, 22 Jun 2007, Bruce Momjian wrote:
> Tom Lane wrote:
>> Alvaro Herrera <[EMAIL PROTECTED]> writes:
>>> I very much doubt that the different spanishes are any different in the
>>> stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc;
>>> but in the case of portuguese I'm not so sure.  Maybe there are other
>>> examples (like chinese, but I'm not sure how useful is tsearch for
>>> chinese).
>>>
>>> And the .ISO8859-1 part you don't need at all if you accept that the
>>> files are UTF8 by design, as Tom proposed.
>>
>> Also, the problem we're dealing with here is mainly lack of
>> standardization of the encoding part of locale names.  AFAIK, just about
>> everybody agrees on "es_ES", "ru_RU", etc; it's the part that comes
>> after that (if any) that is not too consistent across platforms.
>> So I see no problem in distinguishing between pt_PT and pt_BR if it
>> turns out we have to.  The trick is to not look at any more of the
>> locale name than that; and if we standardize on "stopword files are
>> UTF8" then I don't think we need to.
>
> OK, and the open question is when do we do this default setting.  If we
> do it in initdb then we can isolate all the detection there.


We can do that at initdb time, but we still have to decide how to map
between the human-readable language name and the language part of the
locale name. Are we going to hardcode it?

It's not friendly for hosting solutions, where people often have no access
to postgresql.conf, so they need to remember to set tsearch_conf_name.
That could be solved using the 'alter user ... set tsearch_conf_name' command, though.


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread Simon Riggs

On Fri, 2007-06-22 at 10:52 -0400, Bruce Momjian wrote:

> commit_stability?  reliable_commit?

commit_durability?

That then relates it directly to the D in ACID.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com




Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread Tom Lane

Bruce Momjian <[EMAIL PROTECTED]> writes:
> Joshua D. Drake wrote:
>>>>> Hm, another possibility: "synchronous_commit = off"

>>>> Ooo, I like that. Any other takers?

>>> Yea, I like that too but I am now realizing that we are not really
>>> deferring or delaying the "COMMIT" command but rather the recovery of
>>> the commit.  GUC as full_commit_recovery?
>> 
>> recovery is a bad word I think. It is related too closely to failure.

> commit_stability?  reliable_commit?

What's wrong with synchronous_commit?  It's accurate and simple.

regards, tom lane


Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread PFC


On Fri, 22 Jun 2007 16:43:00 +0200, Bruce Momjian <[EMAIL PROTECTED]> wrote:
> Simon Riggs wrote:
>> On Fri, 2007-06-22 at 14:29 +0100, Gregory Stark wrote:
>>> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>>>> Tom Lane wrote:
>>>>> untrustworthy disk hardware, for instance.  I'd much rather use names
>>>>> derived from "deferred commit" or "delayed commit" or some such.
>>>>
>>>> Honestly, I prefer these names as well as it seems directly related versus
>>>> transaction guarantee which sounds to be more like us saying, if we turn it off
>>>> our transactions are bogus.
>>
>> That was the intention..., but name change accepted.
>>
>>> Hm, another possibility: "synchronous_commit = off"
>>
>> Ooo, I like that. Any other takers?
>
> Yea, I like that too but I am now realizing that we are not really
> deferring or delaying the "COMMIT" command but rather the recovery of
> the commit.  GUC as full_commit_recovery?

commit_waits_for_fsync =

force_yes   : makes all commits "hard"
yes         : commits are "hard" unless specified otherwise [default]
no          : commits are "soft" unless specified otherwise [should replace fsync=off use case]
force_no    : makes all commits "soft" (controller with write cache "emulator")

The force_yes and force_no settings are mostly for benchmarking purposes, i.e. once
your app is tuned to specify which commits have to be guaranteed ("hard")
and which don't ("soft"), you can then bench it with force_yes and force_no
to see how much you gained, and how much you'd gain by buying a write
cache controller...
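
A sketch of how the four proposed values could combine with a per-transaction request. Note that commit_waits_for_fsync is only a name floated in this message, not an existing setting, and the function commit_is_hard and its requested_soft parameter are invented for illustration.

```python
VALID = ("force_yes", "yes", "no", "force_no")

def commit_is_hard(setting, requested_soft=None):
    """Decide whether a commit waits for fsync.

    setting: the proposed commit_waits_for_fsync value.
    requested_soft: True or False if the transaction asked for a soft or
    hard commit explicitly, None if it did not specify.
    """
    if setting not in VALID:
        raise ValueError(f"unknown setting: {setting!r}")
    if setting == "force_yes":
        return True                  # per-transaction request is ignored
    if setting == "force_no":
        return False                 # per-transaction request is ignored
    if requested_soft is not None:
        return not requested_soft    # "unless specified otherwise"
    return setting == "yes"          # the default for unspecified commits

for setting in VALID:
    print(setting, commit_is_hard(setting), commit_is_hard(setting, requested_soft=True))
```

Running the same workload under force_yes and force_no then brackets the cost of the hard commits, which is the benchmarking use described above.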




Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread Bruce Momjian

Joshua D. Drake wrote:
> Bruce Momjian wrote:
> > Simon Riggs wrote:
> >> On Fri, 2007-06-22 at 14:29 +0100, Gregory Stark wrote:
> >>> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> >>>
> >>>> Tom Lane wrote:
> >>>>
> >>>>> untrustworthy disk hardware, for instance.  I'd much rather use names
> >>>>> derived from "deferred commit" or "delayed commit" or some such.
> >>>> Honestly, I prefer these names as well as it seems directly related versus
> >>>> transaction guarantee which sounds to be more like us saying, if we turn it off
> >>>> our transactions are bogus.
> >> That was the intention..., but name change accepted.
> >>
> >>> Hm, another possibility: "synchronous_commit = off"
> >> Ooo, I like that. Any other takers?
> > 
> > Yea, I like that too but I am now realizing that we are not really
> > deferring or delaying the "COMMIT" command but rather the recovery of
> > the commit.  GUC as full_commit_recovery?
> 
> recovery is a bad word I think. It is related too closely to failure.

commit_stability?  reliable_commit?

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: [HACKERS] month abreviation

2007-06-22 Thread Bruce Momjian

Jaime Casanova wrote:
> On 6/22/07, Euler Taveira de Oliveira <[EMAIL PROTECTED]> wrote:
> > Jaime Casanova wrote:
> >
> > > note the month abbreviation (mons?) is this intentional?
> > >
> > This notation has been used since the code was written (~7 years ago) [1].
> >
> > [1]
> > http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/datetime.c?rev=1.42;content-type=text%2Fx-cvsweb-markup
> >
> 
> mmm... so, it has been bad for 7 years now... ;)
> ok, accepting that as an abbreviation for months, what controls that?
> why do you get "years", "days" and "mons", i mean, why is this one
> abbreviated when the other two are not?

I thought there was some standard that required that, but I don't
remember which one.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread Joshua D. Drake


Bruce Momjian wrote:
> Simon Riggs wrote:
> > On Fri, 2007-06-22 at 14:29 +0100, Gregory Stark wrote:
> > > "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> > >
> > > > Tom Lane wrote:
> > > >
> > > > > untrustworthy disk hardware, for instance.  I'd much rather use names
> > > > > derived from "deferred commit" or "delayed commit" or some such.
> > > >
> > > > Honestly, I prefer these names as well as it seems directly related versus
> > > > transaction guarantee which sounds to be more like us saying, if we turn it off
> > > > our transactions are bogus.
> >
> > That was the intention..., but name change accepted.
> >
> > > Hm, another possibility: "synchronous_commit = off"
> >
> > Ooo, I like that. Any other takers?
>
> Yea, I like that too but I am now realizing that we are not really
> deferring or delaying the "COMMIT" command but rather the recovery of
> the commit.  GUC as full_commit_recovery?

recovery is a bad word I think. It is related too closely to failure.

Sincerely,

Joshua D. Drake







--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Bruce Momjian

Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > I very much doubt that the different spanishes are any different in the
> > stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc;
> > but in the case of portuguese I'm not so sure.  Maybe there are other
> > examples (like chinese, but I'm not sure how useful is tsearch for
> > chinese).
> 
> > And the .ISO8859-1 part you don't need at all if you accept that the
> > files are UTF8 by design, as Tom proposed.
> 
> Also, the problem we're dealing with here is mainly lack of
> standardization of the encoding part of locale names.  AFAIK, just about
> everybody agrees on "es_ES", "ru_RU", etc; it's the part that comes
> after that (if any) that is not too consistent across platforms.
> So I see no problem in distinguishing between pt_PT and pt_BR if it
> turns out we have to.  The trick is to not look at any more of the
> locale name than that; and if we standardize on "stopword files are
> UTF8" then I don't think we need to.

OK, and the open question is when do we do this default setting.  If we
do it in initdb then we can isolate all the detection there.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread Bruce Momjian

Simon Riggs wrote:
> On Fri, 2007-06-22 at 14:29 +0100, Gregory Stark wrote:
> > "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> > 
> > > Tom Lane wrote:
> > >
> > >> untrustworthy disk hardware, for instance.  I'd much rather use names
> > >> derived from "deferred commit" or "delayed commit" or some such.
> > >
> > > Honestly, I prefer these names as well as it seems directly related versus
> > > transaction guarantee which sounds to be more like us saying, if we turn it off
> > > our transactions are bogus.
> 
> That was the intention..., but name change accepted.
> 
> > Hm, another possibility: "synchronous_commit = off"
> 
> Ooo, I like that. Any other takers?

Yea, I like that too but I am now realizing that we are not really
deferring or delaying the "COMMIT" command but rather the recovery of
the commit.  GUC as full_commit_recovery?

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] GUC time unit spelling a bit inconsistent

2007-06-22 Thread Joshua D. Drake


Andrew Sullivan wrote:
> On Thu, Jun 21, 2007 at 03:24:51PM +0200, Michael Paesold wrote:
> > There are valid reasons against 5m as mega-bytes, because here m does
> > not refer to a unit, it refers to a quantifier (if that is a reasonable
> > English word) of a unit. So it should really be 5mb.
> >
> > log_rotation_age = 5m
> > log_rotation_size = 5mb
>
> Except, of course, that "5mb" would be understood by those of us who
> work in metric and use both bits and bytes as 5 millibits.

At one point I submitted a patch to make units case insensitive; since
submitting that patch I have decided that was a horrible idea. Why can't
we use standard units? Mb, Kb, KB, MB... (I don't know the standard unit
for minutes).

The more I see this going back and forth, the more it seems we should just
do it right the first time and tell everyone else to read:

The fine manual
The spec(s) that define the units.

Joshua D. Drake

> Which
> would be an absurd value, but since Postgres had support for time
> travel once, who knows what other wonders the developers have come up
> with ;-)  (I will note, though, that this B vs b problem really gets
> up my nose, especially when I hear people who are ostensibly
> designing networks talking about "gigabyte ethernet" cards.  I would
> _like_ such a card, I confess, but to my knowledge the standard
> hasn't gotten that far yet.)
>
> Nevertheless, I think that Tom's original suggestion was at least a
> HINT, which seems perfectly reasonable to me.
>
> A

--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Tom Lane

Alvaro Herrera <[EMAIL PROTECTED]> writes:
> I very much doubt that the different spanishes are any different in the
> stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc;
> but in the case of portuguese I'm not so sure.  Maybe there are other
> examples (like chinese, but I'm not sure how useful is tsearch for
> chinese).

> And the .ISO8859-1 part you don't need at all if you accept that the
> files are UTF8 by design, as Tom proposed.

Also, the problem we're dealing with here is mainly lack of
standardization of the encoding part of locale names.  AFAIK, just about
everybody agrees on "es_ES", "ru_RU", etc; it's the part that comes
after that (if any) that is not too consistent across platforms.
So I see no problem in distinguishing between pt_PT and pt_BR if it
turns out we have to.  The trick is to not look at any more of the
locale name than that; and if we standardize on "stopword files are
UTF8" then I don't think we need to.
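
A minimal sketch of the rule described above: look only at the "language_TERRITORY" part of a locale name and ignore whatever encoding or modifier follows. The helper name `locale_key` and the regular expression are assumptions made for illustration, not code from the tsearch patch, and platform-specific spellings (for example Windows-style locale names) are deliberately not handled.

```python
import re

# Hypothetical helper for illustration only: keep "language_TERRITORY"
# and drop any ".encoding" or "@modifier" suffix from a locale name.
_LOCALE_RE = re.compile(
    r"^([A-Za-z]{2,3})(?:_([A-Za-z]{2}))?(?:\.[^@]*)?(?:@.*)?$"
)


def locale_key(locale_name):
    """Return e.g. 'pt_BR' for 'pt_BR.ISO8859-1', or None if no language is found."""
    m = _LOCALE_RE.match(locale_name)
    if m is None:
        return None  # e.g. "C" or "POSIX": no language information
    lang, territory = m.group(1).lower(), m.group(2)
    return f"{lang}_{territory.upper()}" if territory else lang


# The encoding part no longer matters, but pt_PT and pt_BR stay distinct.
same_key = locale_key("pt_BR.ISO8859-1") == locale_key("pt_BR.UTF-8")
distinct = locale_key("pt_PT.UTF-8") != locale_key("pt_BR.UTF-8")
```

With stopword files standardized on UTF8, a lookup keyed this way would never need to inspect the inconsistent tail of the locale name.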

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread PFC



> So now we're poking a hole in that but we certainly have to ensure that any
> transactions that do see the results of our deferred commit themselves don't
> record any visible effects until both their commit and ours hit WAL. The
> essential point in Simon's approach that guarantees that is that when you
> fsync you fsync all work that came before you. So committing a transaction
> also commits all deferred commits that you might depend on.
>
> > BTW: I really dislike the name "transaction guarantee" for the feature;
> > it sounds like marketing-speak, not to mention overpromising what we
> > can deliver.  Postgres can't "guarantee" anything in the face of
> > untrustworthy disk hardware, for instance.  I'd much rather use names
> > derived from "deferred commit" or "delayed commit" or some such.
>
> Well from an implementation point of view we're delaying or deferring the
> commit. But from a user's point of view the important thing for them to
> realize is that a committed record could be lost.
>
> Perhaps we should just not come up with a new name and reuse the fsync
> variable. That way users of old installs which have fsync=off silently get
> this new behaviour. I'm not sure I like that idea since I use fsync=off to run
> cpu overhead tests here. But from a user's point of view it's probably the
> "right" thing. This is really what fsync=off should always have been doing.


Say you call them SOFT COMMIT and HARD COMMIT...
HARD COMMIT fsyncs, obviously.
Does SOFT COMMIT fflush() the WAL (so it's postgres-crash-safe) or not ?
(just in case some user C function misbehaves and crashes)

Do we get a config param to set default_commit_mode=hard or soft ?

	By the way, InnoDB has a similar option, innodb_flush_log_at_trx_commit.
However, you cannot set it on a per-transaction basis. So, on an e-commerce
site, for instance, most transactions will be "unimportant" (i.e. no need to
fsync, ACI only, like incrementing product view counts, adding to cart, etc.)
but some transactions will have to be guaranteed (full ACID), like recording
that an order has been submitted / paid / shipped. With InnoDB it's all or
nothing.




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread Andrew Dunstan




Joshua D. Drake wrote:
> I like "synchronous_commit = off", it even has a little girlfriend
> getting spin while being accurate :)

In my experience, *_commit = off rarely gets you a girlfriend ...

cheers

andrew

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Tom Lane

Teodor Sigaev <[EMAIL PROTECTED]> writes:
>> I don't think we are going to do language selection automatically ---
>> the user is going to have to set tsearch_conf_name.

> Are you suggesting removing a long-lived feature of tsearch? In that case we
> don't need cfglocale (or cfglanguage as Tom suggested) and cfgdefault columns
> in pg_ts_cfg at all. Just set up tsearch_conf_name.

Is the point here for initdb to be able to establish a sane default
initially?  Seems to me it can guess the language from the first
component of the locale (ru_RU -> russian).
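
A rough sketch of that guess, assuming a small lookup table from language codes to configuration names. The table contents, the function name `default_language`, and the "simple" fallback are all hypothetical; nothing here is taken from initdb.

```python
# Hypothetical table: ISO 639-1 language code -> text search configuration name.
LANGUAGE_NAMES = {
    "en": "english",
    "es": "spanish",
    "fi": "finnish",
    "it": "italian",
    "pt": "portuguese",
    "ru": "russian",
}


def default_language(locale_name, fallback="simple"):
    """Guess a default configuration from the first component of a locale name."""
    # Cut at the first of "_", "." or "@" so only the language code remains.
    lang = locale_name
    for sep in ("_", ".", "@"):
        lang = lang.split(sep, 1)[0]
    return LANGUAGE_NAMES.get(lang.lower(), fallback)


guess = default_language("ru_RU")  # first component "ru" selects "russian"
```

Locales with no usable language component, such as "C", fall through to the fallback, which is where an explicit tsearch_conf_name setting would still be needed.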

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Alvaro Herrera

Teodor Sigaev wrote:

> >> --- how do many languages use ISO8859-1 locale?. 
> > ISO8859-1 is encoding, not locale.
> 
> I meant, if we'll use encoding name (for example PG_LATIN1) we couldn't 
> distinguish languages which use that encoding (for example italian and 
> finnish and some more), but using locale names it's possible: 
> it_IT.ISO8859-1, fi_FI.ISO8859-1

I don't understand.  Why use "it_IT.ISO8859-1"?  You just need to know
the language, so "it" is enough.  The _IT part specifies that it's the
italian spoken in Italy.  This may be irrelevant in most cases, but
consider that pt_PT and pt_BR are AFAIK somewhat different languages.

I very much doubt that the different spanishes are any different in the
stemming rules, so there's no need for es_ES, es_PE, es_AR, es_CL etc;
but in the case of portuguese I'm not so sure.  Maybe there are other
examples (like chinese, but I'm not sure how useful is tsearch for
chinese).

And the .ISO8859-1 part you don't need at all if you accept that the
files are UTF8 by design, as Tom proposed.

-- 
Alvaro Herrera  Developer, http://www.PostgreSQL.org/
"Nadie esta tan esclavizado como el que se cree libre no siendolo" (Goethe)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread Tom Lane

"Simon Riggs" <[EMAIL PROTECTED]> writes:
> On Fri, 2007-06-22 at 14:29 +0100, Gregory Stark wrote:
>> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>> Hm, another possibility: "synchronous_commit = off"

> Ooo, I like that. Any other takers?

OK with me

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread Richard Huxton


Joshua D. Drake wrote:
> Simon Riggs wrote:
> > On Fri, 2007-06-22 at 14:29 +0100, Gregory Stark wrote:
> > > "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> > >
> > > > Tom Lane wrote:
> > > >
> > > > > untrustworthy disk hardware, for instance.  I'd much rather use names
> > > > > derived from "deferred commit" or "delayed commit" or some such.
> > > >
> > > > Honestly, I prefer these names as well as it seems directly related versus
> > > > transaction guarantee which sounds to be more like us saying, if we turn it off
> > > > our transactions are bogus.
> >
> > That was the intention..., but name change accepted.
> >
> > > Hm, another possibility: "synchronous_commit = off"
> >
> > Ooo, I like that. Any other takers?
>
> I like "synchronous_commit = off", it even has a little girlfriend
> getting spin while being accurate :)

Or perhaps "sync_on_commit = off"?
Less girlfriend-speak perhaps: "no_sync_on_commit = on"

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Teodor Sigaev


> I don't think we are going to do language selection automatically ---
> the user is going to have to set tsearch_conf_name.

Are you suggesting removing a long-lived feature of tsearch? In that case we
don't need cfglocale (or cfglanguage as Tom suggested) and cfgdefault columns
in pg_ts_cfg at all. Just set up tsearch_conf_name.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread Joshua D. Drake


Simon Riggs wrote:
> On Fri, 2007-06-22 at 14:29 +0100, Gregory Stark wrote:
> > "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> >
> > > Tom Lane wrote:
> > >
> > > > untrustworthy disk hardware, for instance.  I'd much rather use names
> > > > derived from "deferred commit" or "delayed commit" or some such.
> > >
> > > Honestly, I prefer these names as well as it seems directly related versus
> > > transaction guarantee which sounds to be more like us saying, if we turn it off
> > > our transactions are bogus.
>
> That was the intention..., but name change accepted.
>
> > Hm, another possibility: "synchronous_commit = off"
>
> Ooo, I like that. Any other takers?

I like "synchronous_commit = off", it even has a little girlfriend
getting spin while being accurate :)


Joshua D. Drake




--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread Simon Riggs

On Fri, 2007-06-22 at 14:29 +0100, Gregory Stark wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> 
> > Tom Lane wrote:
> >
> >> untrustworthy disk hardware, for instance.  I'd much rather use names
> >> derived from "deferred commit" or "delayed commit" or some such.
> >
> > Honestly, I prefer these names as well as it seems directly related versus
> > transaction guarantee which sounds to be more like us saying, if we turn it off
> > our transactions are bogus.

That was the intention..., but name change accepted.

> Hm, another possibility: "synchronous_commit = off"

Ooo, I like that. Any other takers?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Bruce Momjian

Teodor Sigaev wrote:
> > The recommendation I was making was to use the language name, not the
> > encoding name, in the user-visible configuration.

> How does it determine language of db automatically?

I don't think we are going to do language selection automatically ---
the user is going to have to set tsearch_conf_name.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] GUC time unit spelling a bit inconsistent

2007-06-22 Thread Bruce Momjian

Peter Eisentraut wrote:
> Am Donnerstag, 21. Juni 2007 15:12 schrieb Andrew Dunstan:
> > You don't seem to have any understanding that the units should be
> > interpreted in context.
> 
> You are right.  I definitely have an understanding that units must be 
> interpretable without context.  And that clearly works for the most part.

Consider even if we are clear that "min" is "minutes", it could be
chronological minutes or radial degree minutes, so yea, the context has
to be considered.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] GUC time unit spelling a bit inconsistent

2007-06-22 Thread Bruce Momjian

Michael Paesold wrote:
> Marko Kreen wrote:
> > Considering Postgres will never use either "meter" or "mile"
> > in settings, I don't consider your argument valid.
> > 
> > I don't see the value of having units globally unique (literally).
> > It's enough if they are unique in the context of postgresql.conf.
> > 
> > Thus +1 for having the additional shortcuts Tom suggested.
> > Also +1 for having them case-insensitive.
> 
> Agreed. Although I suggest perhaps not pressing for "m" as minutes,
> because it really is ambiguous between "months" and "minutes", esp. in a
> context like "log_rotation_age".
> 
> Please let's have the unambiguous abbreviations. Please let's make it all
> case-insensitive. After all this discussion, what about a straightforward
> vote? Bruce, we had those before, no?

Right.  No one dictates what goes into PostgreSQL and I think there are
clearly enough people who want improvement in this area, including
perhaps having 'm' meaning minutes and going with case insensitivity.
Please post a patch that we can discuss/review.  If it is small we can
try to get it into 8.3.
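
To make the options under discussion concrete, here is a small sketch of context-dependent, case-insensitive unit parsing. It assumes a unit table of kB/MB/GB for memory and ms/s/min/h/d for time; the function name `parse_setting` and the error wording are hypothetical and this is not the GUC parser itself. In this sketch a bare "m" is rejected with a list of valid units rather than silently read as minutes.

```python
import re

# Assumed unit tables for illustration; multipliers are bytes and milliseconds.
MEMORY_UNITS = {"kb": 1024, "mb": 1024**2, "gb": 1024**3}
TIME_UNITS = {"ms": 1, "s": 1000, "min": 60_000, "h": 3_600_000, "d": 86_400_000}


def parse_setting(value, kind):
    """Parse values such as '5min' or '5MB'; kind is 'time' or 'memory'."""
    units = TIME_UNITS if kind == "time" else MEMORY_UNITS
    m = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", value)
    if m is None:
        raise ValueError(f"invalid value: {value!r}")
    number, unit = int(m.group(1)), m.group(2).lower()  # case-insensitive match
    if unit not in units:
        # Reject unknown or ambiguous units, listing the accepted spellings.
        hint = ", ".join(sorted(units))
        raise ValueError(f"unknown unit {m.group(2)!r}; valid units: {hint}")
    return number * units[unit]


rotation_age_ms = parse_setting("5min", "time")
rotation_size_bytes = parse_setting("5mb", "memory")
```

Because the setting's kind supplies the context, "5mb" can only mean megabytes here, which is the "units only need to be unique within postgresql.conf" position. Accepting "m" as minutes would be a one-line addition to the time table.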

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread Gregory Stark

"Joshua D. Drake" <[EMAIL PROTECTED]> writes:

> Tom Lane wrote:
>
>> untrustworthy disk hardware, for instance.  I'd much rather use names
>> derived from "deferred commit" or "delayed commit" or some such.
>
> Honestly, I prefer these names as well as it seems directly related versus
> transaction guarantee which sounds to be more like us saying, if we turn it off
> our transactions are bogus.

Hm, another possibility: "synchronous_commit = off"

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [HACKERS] GUC time unit spelling a bit inconsistent

2007-06-22 Thread Bruce Momjian

Michael Paesold wrote:
> > Btw.: I'm currently at DebConf in Edinburgh.  On Scottish motorway 
> > signage, "5m" means "five miles".  Even the Americans do that better.  So, 
> > no, you can't have "m" for "minutes". ;)
> 
> Even with the ;) here and the context, the last sentence sounds to me 
> quite arrogant. Most people here have tried to bring arguments and 
> reasoning... you put it off with irrelevant anecdotes in the wrong context.

It is hard to argue with your analysis here.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread Joshua D. Drake


Tom Lane wrote:
> I've been reflecting a bit about whether the notion of deferred fsync
> for transaction commits is really safe.  The proposed patch tries to
> ensure that no consequences of a committed transaction can reach disk
> before the commit WAL record is fsync'd, but ISTM there are potential
> holes in what it's doing.  In particular the path that concerns me is
>
>
> BTW: I really dislike the name "transaction guarantee" for the feature;
> it sounds like marketing-speak, not to mention overpromising what we
> can deliver.  Postgres can't "guarantee" anything in the face of

Ahh but it can. :). PostgreSQL can guarantee that "if" the hardware is
not faulty and the OS does what it is supposed to do... etc..

And yes, it is marketing but life is marketing, getting girlfriends is
marketing. What matters is that once the marketing is over, you can
stand up to the hype.

> untrustworthy disk hardware, for instance.  I'd much rather use names
> derived from "deferred commit" or "delayed commit" or some such.

Honestly, I prefer these names as well as it seems directly related
versus transaction guarantee which sounds to be more like us saying, if
we turn it off our transactions are bogus.

Joshua D. Drake

> regards, tom lane
>
> ---(end of broadcast)---
> TIP 3: Have you checked our extensive FAQ?
>
>    http://www.postgresql.org/docs/faq




--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] EOL characters and multibyte encodings

2007-06-22 Thread Andrew Dunstan




William ZHANG wrote:
> > It's safe, because you'll be dealing with prosrc inside the backend,
> > therefore using a backend-legal encoding, and those don't have any ASCII
> > aliasing problems (all bytes of an MB character must have high bit set).
>
> The lower byte of some characters in BIG5, GBK, GB18030 may be less than
> 0x7F and so not have the high bit set. Fortunately, they don't use 0x0D and
> 0x0A (CR and LF).

Those are client-only encodings, precisely for this sort of reason, and
thus not relevant to the present discussion. As Tom points out above,
when the language handler gets the code it will be encoded in the
relevant backend encoding which can't be any of these.
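
The byte-level claims in this exchange can be checked with a short script. This is only a sketch using Python's built-in codecs, which are assumed here to match the encodings under discussion closely enough for the purpose; the function name and the scanned code point range (the basic CJK ideograph block) are choices made for the example.

```python
def ascii_range_trail_bytes(encoding, first=0x4E00, last=0x9FFF):
    """Collect non-lead bytes below 0x80 that occur inside multibyte characters."""
    seen = set()
    for cp in range(first, last + 1):
        try:
            raw = chr(cp).encode(encoding)
        except UnicodeEncodeError:
            continue  # character not representable in this encoding
        if len(raw) > 1:
            seen.update(b for b in raw[1:] if b < 0x80)
    return seen


big5 = ascii_range_trail_bytes("big5")
gbk = ascii_range_trail_bytes("gbk")
utf8 = ascii_range_trail_bytes("utf-8")

# BIG5 and GBK reuse ASCII byte values as trail bytes, but never CR or LF;
# UTF-8 never does, so scanning for 0x0A/0x0D bytewise is safe in all three.
cr_lf_safe = all(0x0A not in s and 0x0D not in s for s in (big5, gbk, utf8))
```

The non-empty sets for BIG5 and GBK are the ASCII aliasing problem that keeps those encodings client-only, while the empty set for UTF-8 is the "all bytes have the high bit set" property.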


(Side note: the restriction by the R parser to unix-only line endings is 
a dreadful piece of design. As Jon Postel rightly said, the best rule is 
"Be liberal in what you accept and conservative in what you send." Just 
about every parser for every language has been able to handle this, so 
why must R be different?)


cheers

andrew

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread Gregory Stark

"Tom Lane" <[EMAIL PROTECTED]> writes:

> I've been reflecting a bit about whether the notion of deferred fsync
> for transaction commits is really safe.  The proposed patch tries to
> ensure that no consequences of a committed transaction can reach disk
> before the commit WAL record is fsync'd, but ISTM there are potential
> holes in what it's doing.  In particular the path that concerns me is
>
> (1) transaction A commits with deferred fsync;
>
> (2) transaction B observes some effect of A (eg, a committed-good tuple);
>
> (3) transaction B makes a change that is contingent on the observation.
>
> If B's changes were to reach disk in advance of A's commit record, we'd
> have a risk of logical inconsistency.  The patch is doing what it can
> to prevent *direct* effects of A from reaching disk before the commit
> record does, but it doesn't (and I think cannot) extend this to indirect
> effects perpetrated by other transactions.  An example of the sort of
> risk I'm worried about is a REINDEX omitting an index entry for a tuple
> that it sees as committed dead by A.
>
> Now this may be safe anyway, but it requires analysis that I don't
> recall anyone having put forward.  The cases that I can see are:

I think Simon did try to put all this in writing when he first proposed it.
It's worth going through again with the actual implementation to be sure all
the same guarantees hold.

> So I think it's probably all OK, but this is a sufficiently long chain
> of reasoning that it had better be checked over by multiple people and
> recorded as part of the design implications of the patch.  Does anyone
> think any of this is wrong, or too fragile to survive future code
> changes?  Are there cases I've missed?

I think the logic you describe is not quite as subtle as you make it out to
be. Certainly it's a bit surprising at first but it all boils down to the
basic idea of how transactions and WAL records work: We never allow any other
transactions to see the effects of our transaction until the commit record is
fsynced to WAL. 

So now we're poking a hole in that but we certainly have to ensure that any
transactions that do see the results of our deferred commit themselves don't
record any visible effects until both their commit and ours hit WAL. The
essential point in Simon's approach that guarantees that is that when you
fsync you fsync all work that came before you. So committing a transaction
also commits all deferred commits that you might depend on.
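
The "when you fsync you fsync all work that came before you" point can be sketched with a toy append-only log. This is an illustration under simplifying assumptions (one record per commit, a crash keeps exactly the flushed prefix); the class and method names are hypothetical and it is not the patch's implementation.

```python
class WalSketch:
    """Toy append-only log: flushing to a position makes everything before it durable."""

    def __init__(self):
        self.records = []      # commit records, in WAL order
        self.flushed_upto = 0  # number of leading records known to be on disk

    def append_commit(self, xid, synchronous):
        """Append a commit record; flush immediately only for synchronous commits."""
        self.records.append(xid)
        lsn = len(self.records)
        if synchronous:
            self.flush(lsn)
        return lsn

    def flush(self, lsn):
        # A flush covers the whole prefix of the log up to lsn, not one record.
        self.flushed_upto = max(self.flushed_upto, lsn)

    def durable(self):
        """Commits that would survive a crash right now."""
        return self.records[: self.flushed_upto]


wal = WalSketch()
wal.append_commit("A", synchronous=False)  # deferred: not yet durable
wal.append_commit("B", synchronous=True)   # B's flush also makes A durable
wal.append_commit("C", synchronous=False)  # deferred again: lost if we crash now
survivors = wal.durable()
```

In this model a crash can only lose a suffix of deferred commits, never a commit that some already-durable later commit might have depended on.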

> BTW: I really dislike the name "transaction guarantee" for the feature;
> it sounds like marketing-speak, not to mention overpromising what we
> can deliver.  Postgres can't "guarantee" anything in the face of
> untrustworthy disk hardware, for instance.  I'd much rather use names
> derived from "deferred commit" or "delayed commit" or some such.

Well from an implementation point of view we're delaying or deferring the
commit. But from a user's point of view the important thing for them to
realize is that a committed record could be lost.

Perhaps we should just not come up with a new name and reuse the fsync
variable. That way users of old installs which have fsync=off silently get
this new behaviour. I'm not sure I like that idea since I use fsync=off to run
cpu overhead tests here. But from a user's point of view it's probably the
"right" thing. This is really what fsync=off should always have been doing.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Teodor Sigaev


> The recommendation I was making was to use the language name, not the
> encoding name, in the user-visible configuration.

How does it determine language of db automatically?

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] tsearch in core patch

2007-06-22 Thread Teodor Sigaev


>> 3) ALTER FULLTEXT CONFIGURATION cfgname ADD/ALTER/DROP MAPPING
>> done
>
> Why not rename ALTER FULLTEXT CONFIGURATION --> ALTER TEXT SEARCH
> CONFIGURATION here too ?

It's renamed too.

> most languages can be written using UNICODE charset and UTF-8 encoding,
> so neither charset nor encoding can be used to determine language.

yes

>> --- how do many languages use ISO8859-1 locale?.
>
> ISO8859-1 is encoding, not locale.

I meant, if we'll use encoding name (for example PG_LATIN1) we couldn't
distinguish languages which use that encoding (for example italian and finnish
and some more), but using locale names it's possible: it_IT.ISO8859-1,
fi_FI.ISO8859-1


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread Simon Riggs

On Thu, 2007-06-21 at 18:15 -0400, Tom Lane wrote:
> I've been reflecting a bit about whether the notion of deferred fsync
> for transaction commits is really safe.  The proposed patch tries to
> ensure that no consequences of a committed transaction can reach disk
> before the commit WAL record is fsync'd, but ISTM there are potential
> holes in what it's doing.  In particular the path that concerns me is
> 
> (1) transaction A commits with deferred fsync;
> 
> (2) transaction B observes some effect of A (eg, a committed-good tuple);
> 
> (3) transaction B makes a change that is contingent on the observation.
> 
> If B's changes were to reach disk in advance of A's commit record, we'd
> have a risk of logical inconsistency.  

B's changes cannot reach disk before B's commit record. That is the
existing WAL-before-data rule implemented by the buffer manager.

If B can see A's changes, then A has written a commit record to the log
that is definitely before B's commit record. So B's commit will also
commit A's changes to WAL when it flushes at EOX. So whether A is a
guaranteed transaction or not, B can always rely on those changes.
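
The interaction with the buffer manager can be sketched the same way. The model below assumes the usual WAL-before-data rule in its simplest form: a data page may be written only after WAL has been flushed up to the LSN of the last record that touched that page. All names are hypothetical and the model ignores everything except LSN ordering.

```python
class BufferSketch:
    """Toy model of WAL-before-data: a page write first flushes WAL up to the page's LSN."""

    def __init__(self):
        self.wal_end = 0      # LSN of the last WAL record written
        self.wal_flushed = 0  # WAL is on disk up to and including this LSN
        self.page_lsn = {}    # page id -> LSN of the last record that modified it
        self.pages_on_disk = set()

    def log_change(self, page):
        """WAL-log a change to a page and remember the record's LSN on the page."""
        self.wal_end += 1
        self.page_lsn[page] = self.wal_end
        return self.wal_end

    def log_commit(self, synchronous):
        """Write a commit record; only a synchronous commit flushes it at once."""
        self.wal_end += 1
        if synchronous:
            self.flush_wal(self.wal_end)
        return self.wal_end

    def flush_wal(self, lsn):
        self.wal_flushed = max(self.wal_flushed, min(lsn, self.wal_end))

    def write_page(self, page):
        # WAL-before-data: flush the log up to the page's LSN before writing it.
        self.flush_wal(self.page_lsn[page])
        self.pages_on_disk.add(page)


buf = BufferSketch()
buf.log_change("p1")                          # transaction A modifies p1
a_commit = buf.log_commit(synchronous=False)  # A commits with deferred fsync
buf.log_change("p2")                          # B, having seen A, modifies p2
buf.write_page("p2")                          # p2 is evicted before B commits
a_is_durable = buf.wal_flushed >= a_commit
```

Any WAL record B writes after observing A has a higher LSN than A's commit record, so whatever forces B's records to disk, whether a page write or B's own commit, carries A's commit record with it.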

I agree this feels unsafe when you first think about it, and was the
reason for me taking months before publishing the idea.

> The patch is doing what it can
> to prevent *direct* effects of A from reaching disk before the commit
> record does, but it doesn't (and I think cannot) extend this to indirect
> effects perpetrated by other transactions.  An example of the sort of
> risk I'm worried about is a REINDEX omitting an index entry for a tuple
> that it sees as committed dead by A.
> 
> Now this may be safe anyway, but it requires analysis that I don't
> recall anyone having put forward.  The cases that I can see are:
> 
> 1. Ordinary WAL-logged change in a shared buffer page.  The change will
> not be allowed to reach disk before the associated WAL record does, and
> that WAL record must follow A's commit, so we're safe.
> 
> 2. Non-WAL-logged change in a temp table.  Could reach disk in advance
> of A's commit, but we don't care since temp table contents don't survive
> crashes anyway.
> 
> 3. Non-WAL-logged change made via one of the paths we have introduced
> to avoid WAL overhead for bulk updates.  In these cases it's entirely
> possible for the data to reach disk before A's commit, because B will
> fsync it down to disk without any sort of interlock, as soon as it
> finishes the bulk update.  However, I believe it's the case that all
> these paths are designed to write data that no other transaction can see
> until after B commits.  That commit must follow A's in the WAL log,
> so until it has reached disk, the contents of the bulk-updated file
> are unimportant after a crash.
> 
> So I think it's probably all OK, but this is a sufficiently long chain
> of reasoning that it had better be checked over by multiple people and
> recorded as part of the design implications of the patch.  Does anyone
> think any of this is wrong, or too fragile to survive future code
> changes?  Are there cases I've missed?

I've done the analysis, but perhaps I should finish the docs now to aid
with review of the patch on the points you make.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com




Re: [HACKERS] Worries about delayed-commit semantics

2007-06-22 Thread Simon Riggs

On Thu, 2007-06-21 at 18:15 -0400, Tom Lane wrote:

> BTW: I really dislike the name "transaction guarantee" for the
> feature; it sounds like marketing-speak, not to mention overpromising
> what we can deliver. 

There is no marketing speak there, nor any overpromising of what is
delivered. I really don't know where you get that idea.

The patch says exactly what it does: it reduces the level of guarantee
provided by a transaction commit and thereby causes almost-certain data
loss in the event of a crash, for transactions that have used the
feature. How can 'you get less' be an overpromise? 

So the purpose of the name was to be explicit about the loss of
robustness that is being traded for performance. If I'd wanted to give
it a marketing name, it would be called 'fast commit'.

>  Postgres can't "guarantee" anything in the face of
> untrustworthy disk hardware, for instance.  

The "guarantee" PostgreSQL currently offers is the ACID durability
guarantee. Postgres has for many years differentiated itself from MySQL
on the basis of the certainty of the commit action.

True, disk hardware can nullify the commit guarantee.

> I'd much rather use names
> derived from "deferred commit" or "delayed commit" or some such.

I'm happy with various names and even had a specific post on the
subject.

Deferred Commit is the best description of the feature to me, but
doesn't highlight the dangers of using it. Am I being too cautious?
Would users understand that "deferred commit" would cause data loss? If
yes, then I'm perfectly happy with that name.

No feedback means name change to deferred commit.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] EOL characters and multibyte encodings

2007-06-22 Thread William ZHANG


"Joe Conway" <[EMAIL PROTECTED]>
> Tom Lane wrote:
>> Joe Conway <[EMAIL PROTECTED]> writes:
>>> My first thought on fixing this issue was to simply replace all 
>>> instances of '\r' in pg_proc.prosrc with '\n' prior to sending it to the 
>>> R parser. As far as I know, any instances of '\r' embedded in a 
>>> syntactically valid R statement must be escaped (i.e. literally the 
>>> characters "\" and "r"), so that should not be a problem. But I am 
>>> concerned about how this potentially plays against multibyte characters. 
>>> Is it safe to do this, or do I need to use a mb-aware replace algorithm?
>>
>> It's safe, because you'll be dealing with prosrc inside the backend,
>> therefore using a backend-legal encoding, and those don't have any ASCII
>> aliasing problems (all bytes of an MB character must have high bit set).

The trailing byte of some characters in BIG5, GBK, and GB18030 may be less
than 0x7F, so it doesn't have the high bit set. Fortunately, these encodings
never use 0x0D or 0x0A (CR and LF) as part of a multibyte character.
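
A rough way to check this is sketched below, using Python's built-in
codecs. The helper names replace_cr and scan are made up for illustration,
and the scan only covers the CJK ideographs U+4E00..U+9FA5, so it is an
empirical check over that range rather than a proof for the whole encoding.

```python
ENCODINGS = ("big5", "gbk", "gb18030")


def replace_cr(data: bytes) -> bytes:
    """Byte-level CR -> LF replacement with no knowledge of the encoding."""
    return data.replace(b"\r", b"\n")


def scan(encoding: str):
    """Count ideographs whose encoding has a byte below 0x80, and those
    whose encoding contains a CR (0x0D) or LF (0x0A) byte."""
    n_ascii_byte = 0
    n_cr_or_lf = 0
    for cp in range(0x4E00, 0x9FA6):
        try:
            encoded = chr(cp).encode(encoding)
        except UnicodeEncodeError:
            continue  # character not representable in this encoding
        if any(b < 0x80 for b in encoded):
            n_ascii_byte += 1
        if 0x0D in encoded or 0x0A in encoded:
            n_cr_or_lf += 1
    return n_ascii_byte, n_cr_or_lf


results = {enc: scan(enc) for enc in ENCODINGS}

# "\u4e2d\u6587" is a two-character sample present in all three encodings.
text = "\u4e2d\u6587\r\u4e2d\u6587\r\n"
round_trips = all(
    replace_cr(text.encode(enc)).decode(enc) == text.replace("\r", "\n")
    for enc in ENCODINGS
)
```

If the claim holds, the first count is non-zero and the second is zero for
each encoding, and the byte-level replacement gives the same result as a
character-level one.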

Regards,
William ZHANG

> Great -- I wasn't sure about that.
>
 




Re: What does Page Layout version mean? (Was: Re: [HACKERS] Reducing NUMERIC size for 8.3)

2007-06-22 Thread Zdenek Kotala


Heikki Linnakangas wrote:
> Since we're discussing upgrades, let me summarize the discussions we had
> over dinner in Ottawa for the benefit of all:




Thanks for the summary.


> As before, someone just needs to step up and do it.


I'm now working on a proposal. I hope that it will be ready soon.

Zdenek

