date:20100528

Re: [HACKERS] Failback with log shipping

2010-05-28 Thread Fujii Masao

On Fri, May 28, 2010 at 7:58 PM, Heikki Linnakangas
 wrote:
> At PGCon, several people asked me about restarting an old master as a
> standby after failover has happened. And it wasn't the first time people ask
> me about it, even before 9.0. We have no mention of that in the docs, which
> is a pretty serious oversight. What can we say about it?
>
> I believe the current official policy is that you have to take a new base
> backup and restore from that. Rsync can be used to speed that up.
>
> However, someone once asked me for a comment on a script he wrote to do that
> in a smarter way. I forget who and when and how exactly it worked, but it
> seems possible to do safely.
>
> First of all, you have to shut down the master cleanly for this to work,
> otherwise there can be changes in the old master that never made it to the
> standby.
>
> Assuming controlled shutdown and that the standby received all WAL from the
> old master before it was promoted, I think you can simply create a
> recovery.conf in the old master's data directory to turn it into a standby
> server, and restart. Am I missing something?

Failover always increments the timeline ID of the old standby (i.e.,
new master).
Can that procedure work around the gap of the timeline ID between servers?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] small exclusion constraints patch

2010-05-28 Thread Bruce Momjian

Tom Lane wrote:
> Jeff Davis  writes:
> > Currently, the check for exclusion constraints performs a sanity check
> > that's slightly too strict -- it assumes that a tuple will conflict with
> > itself. That is not always the case: the operator might be "<>", in
> > which case it's perfectly valid for the search for conflicts to not find
> > itself.
> 
> > This patch simply removes that sanity check, and leaves a comment in
> > place.
> 
> I'm a bit uncomfortable with removing the sanity check; it seems like a
> good thing to have, especially since this code hasn't even made it out
> of beta yet.  AFAIK the "<>" case is purely hypothetical, because we
> have no index opclasses supporting such an operator, no?  How about just
> documenting that we'd need to remove the sanity check if we ever did add
> support for such a case?

Done, with attached, applied patch.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com
Index: src/backend/executor/execUtils.c
===
RCS file: /cvsroot/pgsql/src/backend/executor/execUtils.c,v
retrieving revision 1.171
diff -c -c -r1.171 execUtils.c
*** src/backend/executor/execUtils.c	26 Feb 2010 02:00:41 -	1.171
--- src/backend/executor/execUtils.c	29 May 2010 02:30:23 -
***
*** 1310,1316 
  
  	/*
  	 * We should have found our tuple in the index, unless we exited the loop
! 	 * early because of conflict.  Complain if not.
  	 */
  	if (!found_self && !conflict)
  		ereport(ERROR,
--- 1310,1317 
  
  	/*
  	 * We should have found our tuple in the index, unless we exited the loop
! 	 * early because of conflict.  Complain if not.  If we ever implement
!  * '<>' index opclasses, this check will fail and will have to be removed.
  	 */
  	if (!found_self && !conflict)
  		ereport(ERROR,

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-28 Thread Robert Haas


On May 28, 2010, at 7:19 PM, Bruce Momjian  wrote:

Jan Wieck wrote:
Reading the entire WAL just to find all COMMIT records, then go  
back to
the origin database to get the actual replication log you're  
looking for

is simpler and more efficient? I don't think so.


Agreed, but I think I've not explained myself well enough.

I proposed two completely separate ideas; the first one was this:

If you must get commit order, get it from WAL on *origin*, using  
exact

same code that current WALSender provides, plus some logic to read
through the WAL records and extract commit/aborts. That seems much
simpler than the proposal you outlined and as SR shows, its low  
latency

as well since commits write to WAL. No need to generate event ticks
either, just use XLogRecPtrs as WALSender already does.

I see no problem with integrating that into core, technically or
philosophically.



Which means that if I want to allow a consumer of that commit order  
data

to go offline for three days or so to replicate the 5 requested, low
volume tables, the origin needs to hang on to the entire WAL log from
all 100 other high volume tables?


I suggest writing an external tool that strips out what you need that
can be run at any time, rather than creating a new data format and
overhead for this usecase.


That would be FAR more complex, less robust, and less performant -  
whereas doing what Jan has proposed is pretty straightforward and  
should have minimal impact on performance - or none when not enabled.


...Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-28 Thread Bruce Momjian

Jan Wieck wrote:
> >> Reading the entire WAL just to find all COMMIT records, then go back to 
> >> the origin database to get the actual replication log you're looking for 
> >> is simpler and more efficient? I don't think so.
> > 
> > Agreed, but I think I've not explained myself well enough.
> > 
> > I proposed two completely separate ideas; the first one was this:
> > 
> > If you must get commit order, get it from WAL on *origin*, using exact
> > same code that current WALSender provides, plus some logic to read
> > through the WAL records and extract commit/aborts. That seems much
> > simpler than the proposal you outlined and as SR shows, its low latency
> > as well since commits write to WAL. No need to generate event ticks
> > either, just use XLogRecPtrs as WALSender already does.
> > 
> > I see no problem with integrating that into core, technically or
> > philosophically.
> > 
> 
> Which means that if I want to allow a consumer of that commit order data 
> to go offline for three days or so to replicate the 5 requested, low 
> volume tables, the origin needs to hang on to the entire WAL log from 
> all 100 other high volume tables?

I suggest writing an external tool that strips out what you need that
can be run at any time, rather than creating a new data format and
overhead for this usecase.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clearing psql's input buffer after auto-reconnect

2010-05-28 Thread Bruce Momjian

Tom Lane wrote:
> We determined that $SUBJECT would be a good idea in this thread:
> http://archives.postgresql.org/pgsql-bugs/2010-05/msg00159.php
> 
> I looked a bit at what it would take to make this happen.  The
> difficulty is that the input buffer is a local variable in MainLoop(),
> and so are a bunch of other subsidiary variables that would need to be
> reset along with it.  The place where auto-reconnect presently happens
> is CheckConnection(), which is in a different file and is also several
> levels of subroutine call away from MainLoop.  AFAICS there are three
> ways that we might attack this:
> 
> 1. Massive restructuring of the code in common.c so that the fact of
> a connection reset having happened can be returned back to MainLoop.
> 
> 2. Export much of MainLoop's internal state as globals, so that
> CheckConnection can hack on it directly.
> 
> 3. Have CheckConnection do longjmp(sigint_interrupt_jmp) after resetting
> the connection, to force control to go back to MainLoop directly.
> MainLoop is already coded to clear its local state after catching a
> longjmp.
> 
> Now #1 might be the best long-term solution but I have no particular
> appetite to tackle it, and #2 is just too ugly to contemplate.  That
> leaves #3, which is a bit ugly in its own right but seems like the best
> fix we're likely to get.
> 
> Comments, better ideas?

Added to TODO:

Prevent psql from sending remaining single-line multi-statement queries
after reconnection

* http://archives.postgresql.org/pgsql-bugs/2010-05/msg00159.php
* http://archives.postgresql.org/pgsql-hackers/2010-05/msg01283.php 

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] psql's is_select_command is naive

2010-05-28 Thread Bruce Momjian

Robert Haas wrote:
> On Wed, May 26, 2010 at 10:35 PM, Tom Lane  wrote:
> > Robert Haas  writes:
> >> It knows that queries beginning with "select" or "values" are select
> >> commands, but it seems not to be clued in about "table" and "with".
> >
> > What we really ought to do IMO is throw out the entire current
> > implementation of fetch_count. ?If libpq exposed access to the
> > protocol-level fetch count, we could implement it without this
> > cursor kluge.
> 
> I suspect that would make a lot of people very happy.

I have added the following TODO:

Fix FETCH_COUNT to handle SELECT ... INTO and WITH queries
* http://archives.postgresql.org/pgsql-hackers/2010-05/msg01565.php
* http://archives.postgresql.org/pgsql-bugs/2010-05/msg00192.php

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] tsvector pg_stats seems quite a bit off.

2010-05-28 Thread Jan Urbański

On 28/05/10 22:22, Tom Lane wrote:
> The idea that I was toying with is to assume a Zipfian distribution of
> the input (with some reasonable parameter), and use that to estimate
> what the frequency of the K'th element will be, where K is the target
> number of MCV entries or perhaps a bit more.  Then use that estimate as
> the "s" value, and set e = s/10 or so, and then w = 1/e and continue as
> per the paper.  If the eventual filtering results in a lot less than the
> target number of MCV entries (because the input wasn't so Zipfian), we
> lose, but at least we have accurate numbers for the entries we kept.

I see what you mean, so the idea would be:

 * assume some value of W as the number of all words in the language
 * estimate s as 1/(st + 10)*H(W), where H(W) is the W'th harmonic
number and st is the statistics target, using Zipf's law
 * set e = s/10 and w = 1/e, that is 10/s
 * perform LC using that value of w
 * remove all elements for which f < (s-e)N, that is f < 0.9*sN, where N
is the total number of lexemes processed
 * create the MCELEM entries as (item, f/N)

Now I tried to substitute some numbers there, and so assuming the
English language has ~1e6 words H(W) is around 6.5. Let's assume the
statistics target to be 100.

I chose s as 1/(st + 10)*H(W) because the top 10 English words will most
probably be stopwords, so we will never see them in the input.

Using the above estimate s ends up being 6.5/(100 + 10) = 0.06

We then do LC, pruning the D structure every w = 1/0.006 = 167 lexemes

After that, we remove lexemes with f < 0.9 * 0.06 * N = 0.054*N

So assuming that on average a tsvector has 154 elements and that we went
through 35017 rows (as it would be in Jesper's case, before he raised
the stats target from 100 to 1000), we will remove lexemes with f <
0.054 * 35017 * 154 that is f < 291201.37

I wonder what would happen if Jasper's case if we did that... And I
wonder how sound that maths is.

>> I we should definitely prune the table one last time in the very
>> probable case that the loop ended in the middle of two regularly
>> happening prunes.
> 
> The paper doesn't say that you need to do that.  I suspect if you work
> through the math, you'll find out that the minimum-f filter skips
> anything that would have been pruned anyway.

Possible.

Cheers,
Jan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] How to pass around collation information

2010-05-28 Thread Martijn van Oosterhout

On Fri, May 28, 2010 at 10:32:34PM +0300, Peter Eisentraut wrote:
> On fre, 2010-05-28 at 14:48 -0400, Tom Lane wrote:
> > > SELECT * FROM test WHERE a COLLATE en > 'baz' ORDER BY b COLLATE sv;
> > 
> > That seems fairly bizarre.  What does this mean:
> > 
> > WHERE a COLLATE en > b COLLATE de
> > 
> > ?  If it's an error, why is this not an error
> > 
> > WHERE a COLLATE en > b
> > 
> > if b is marked as COLLATE de in its table?
> 
> The way I understand it, a collation "derivation" can be explicit or
> implicit.  Explicit derivations override implicit derivations.  If in
> the argument set of an operation, explicit collation derivations exist,
> they must all be the same.

The SQL standard has an explicit set of rules for determining the
collations of any particular operation (they apply to
operators/functions not to the datums).

The basic idea is that tables/columns/data types define an implicit
collation, which can be overidden by explicit collations. If there is
ambiguity you throw an error. I implemented this all several years ago,
it's not all that complicated really. IIRC I added a field to the Node type
and each level determined it's collection from the sublevels.

I solved the problem for the OP by providing an extra function to user
defined functions which would return the collation for that particular
call.

The more interesting question I found was that the standard only
defined collation for strings, whereas it can be applied much more
broadly. I described a possible solution several years back, it should
in the archives somewhere. It worked pretty well as I recall.

IIRC The idea was to let each type has its own set of collations and
when using an operator/function you let the collection be determined
by the argument that had the same type as the return type.

It would be nice if COLLATE could finally be implemented, it'd be quite
useful.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first. 
>   - Charles de Gaulle

signature.asc
Description: Digital signature

Re: [HACKERS] List traffic

2010-05-28 Thread Jaime Casanova

On Fri, May 28, 2010 at 3:44 PM, Josh Berkus  wrote:
> On 5/27/10 5:42 PM, Tom Lane wrote:
>> Josh Berkus  writes:
>>> We do not have a problem.   The lists are fine the way they are.
>>
>> +1 ... wasn't the point I thought you were trying to make, but I'm
>> good with not changing things.
>
> Yeah, that's because I was responding to the suggestion that 5 of our
> lists should be collapsed into 'general' as the One Uber-List.
>

i think not all should be collapsed but at least -novice, IMHO

-- 
Jaime Casanova www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] How to pass around collation information

2010-05-28 Thread Heikki Linnakangas


On 28/05/10 23:15, Robert Haas wrote:

On Fri, May 28, 2010 at 3:20 PM, Peter Eisentraut  wrote:

On fre, 2010-05-28 at 15:03 -0400, Robert Haas wrote:

I think we need to think of the comparison operators as ternary, and
the COLLATE syntax applied to columns or present in queries as various
ways of setting defaults or explicit overrides for what the third
argument will end up being.


How could this extend to things like isalpha() or upper() that would
need access to ctype information?


Good question.  :-(


Strictly speaking, collation and ctype are two different things. Which 
is a convenient way to evade the question :-).


But you could ask, how would we handle more fine-grained ctype in 
upper() then? Perhaps by adding a second argument for ctype. Similarly 
to to_tsvector([config, ] string), you could explicitly pass the ctype 
as an argument, or leave it out in which case a default is used. It 
wouldn't give you per-column ctype, though.


What does the spec have to say about the ctype used for upper() et al BTW?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Failback with log shipping

2010-05-28 Thread Heikki Linnakangas


On 28/05/10 22:20, Dimitri Fontaine wrote:

Heikki Linnakangas  writes:

Not shipped before the first failover you mean? No, if any WAL records were
created in the old master that were not shipped to the standby before
failover, the corresponding changes to the data files might've been flushed
to disk already, and you can't undo those by not replaying the WAL record on
restart.


Ah yes you need to fail between when (WAL is written and not sent) and
CHECKPOINT for this to be possible.


Checkpoint only guarantees that everything before that is flushed to 
disk. It doesn't guarantee that nothing is flushed to disk until that. 
If there's a checkpoint that hasn't been shipped to the standby, you're 
certainly hosed, but if there is no checkpoint you don't know if the 
data files have changed or not.



But automatic testing of the
situation (is the data already safe in PGDATA) might still be possible?


Hmm, so the situation is this:

D - E - crash!
  /
A - B - C
  \
d - f - g - h

The letters represent WAL records. C is the last WAL record that was 
shipped to the standby, D & E are WAL records that were generated in the 
old master before the crash but never sent to the standby, and d-h are 
WAL records created in the standby after failover.


I guess you could read the WAL in the old master and compare it with the 
WAL from the standby to figure out where the failover happened (C), and 
then scan all the data pages involved in records D - E, checking that 
the LSNs on the data pages touched by those records are earlier than C. 
That's a bit laborious, and requires knowledge of all different kinds of 
WAL records to figure out which data pages they touch, but seems 
possible in theory.



How easy is it to script that? It seems all we need is the current XID
of the slave at the end of recovery. It should be in the log, maybe it's
easy enough to expose it at the SQL level…


XID doesn't help at all, LSN more likely, but I feel that I don't fully
understand what you're saying.


Sorry I was unclear, I was thinking in terms of recovery.conf file and
either recovery_target_xid or recovery_target_time. The idea being that
if the old-master didn't CHECKPOINT the changes that the slave missed,
then we can do crash recovery and choose to stop before that point, then
apply WALs from the new master.


Ah, I see. No, you don't want to use a recovery target, that would end 
the recovery and start the server. You just need to make sure to use 
WALs from the new master instead of the old one when both exist.



So you're saying controlled failover could possibly skip base backup to
reuse old master as new slave, and I'm asking if by some luck (crash
happened before CHECKPOINT) and some recovery.conf setup we could get to
the same situation in case of hard failure. That would allow completely
automatic switchover / failover with no need to resync.


Yeah, that would be nice. In practice, I think you would get lucky more 
often than not, because whenever you modify and dirty a page, writing a 
WAL record, the usage count on the buffer is incremented and it won't be 
evicted from the buffer cache for a while.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] List traffic

2010-05-28 Thread Josh Berkus

On 5/27/10 5:42 PM, Tom Lane wrote:
> Josh Berkus  writes:
>> We do not have a problem.   The lists are fine the way they are.
> 
> +1 ... wasn't the point I thought you were trying to make, but I'm
> good with not changing things.

Yeah, that's because I was responding to the suggestion that 5 of our
lists should be collapsed into 'general' as the One Uber-List.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] How to pass around collation information

2010-05-28 Thread Pavel Stehule

2010/5/28 alvherre :
> Excerpts from Peter Eisentraut's message of vie may 28 12:27:52 -0400 2010:
>
>> Option 2, invent some new mechanism that accompanies a datum or a type
>> whereever it goes.  Kind of like typmod, but not really.  Then the
>> collation information would presumably be made available to functions
>> through the fmgr interface.  The binary representation of data values
>> stays the same.
>
> Is the collation a property of the datum, or one of the comparison?
> If the latter, should it be really be made a sidecar of a datum, or
> would it make more sense to attach it to the operation being performed?

>
> I wonder if instead of trying to pass it down multiple layers till
> bttextcmp and further down, it would make more sense to set a global
> variable somewhere in the high levels, and only have it checked in
> varstr_cmp.
>

Maybe collation is property of some operation: func call, sort, ... I
prefer to put collation info to FunctionCallInfo structure. Usually
you cannot change collation per row - collation is attached to
expression.

Regards

Pavel
> --
> Álvaro Herrera 
> The PostgreSQL Company - Command Prompt, Inc.
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] tsvector pg_stats seems quite a bit off.

2010-05-28 Thread Tom Lane

=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?=  writes:
> We follow the algorithm as written, the trouble starts when we want to
> output the result. The paper says which items from the D structure
> should be returned when the user asks for items that have frequencies
> higher than a threshold s. What we want to put in the statistics table
> are items accompanied by their frequencies, so we need to do some
> reasoning before we can construct the result.

Well, the estimated frequency is still just f/N.  The point is that we
must filter out items with small f values because they're probably
inaccurate --- in particular, anything with f < eN is completely
untrustworthy.

I agree that we currently aren't bothering to determine a specific s
value, but we probably need to do that in order to have a clear
understanding of what we are getting.

The idea that I was toying with is to assume a Zipfian distribution of
the input (with some reasonable parameter), and use that to estimate
what the frequency of the K'th element will be, where K is the target
number of MCV entries or perhaps a bit more.  Then use that estimate as
the "s" value, and set e = s/10 or so, and then w = 1/e and continue as
per the paper.  If the eventual filtering results in a lot less than the
target number of MCV entries (because the input wasn't so Zipfian), we
lose, but at least we have accurate numbers for the entries we kept.

> I we should definitely prune the table one last time in the very
> probable case that the loop ended in the middle of two regularly
> happening prunes.

The paper doesn't say that you need to do that.  I suspect if you work
through the math, you'll find out that the minimum-f filter skips
anything that would have been pruned anyway.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] How to pass around collation information

2010-05-28 Thread Robert Haas

On Fri, May 28, 2010 at 3:20 PM, Peter Eisentraut  wrote:
> On fre, 2010-05-28 at 15:03 -0400, Robert Haas wrote:
>> I think we need to think of the comparison operators as ternary, and
>> the COLLATE syntax applied to columns or present in queries as various
>> ways of setting defaults or explicit overrides for what the third
>> argument will end up being.
>
> How could this extend to things like isalpha() or upper() that would
> need access to ctype information?

Good question.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] Add SIGCHLD catch to psql

2010-05-28 Thread Tom Lane

Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> I thought it might be about that simple once you went at it the right
>> way ;-).  However, I'd suggest checking ferror(pset.queryFout) as well
>> as the fflush result.

> Sure, I can add the ferror() check.  Patch attached.

This seemed pretty small and uncontroversial, so I went ahead and
committed it for 9.0.  I rearranged the order of operations a bit to
make it seem more coherent, and also added an initial clearerr() just
to forestall problems if stdout had the error flag set for some reason.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] tsvector pg_stats seems quite a bit off.

2010-05-28 Thread Jan Urbański

On 28/05/10 04:47, Tom Lane wrote:
> I re-scanned that paper and realized that there is indeed something
> wrong with the way we are doing it.  The paper says (last sentence in
> the definition of the algorithm, section 4.2):
> 
>   When a user requests a list of items with threshold s, we output
>   those entries in D where f >= (s-e)N.
> 
> What we are actually doing is emitting every entry with f >= 2.  Since
> e is fixed at 1/w, this effectively means that we are setting s to be
> only fractionally greater than e, which means very large relative errors
> in the estimates.

I gave it a though and reread the paper, but since I already blundered
once, please verify me on this.

We follow the algorithm as written, the trouble starts when we want to
output the result. The paper says which items from the D structure
should be returned when the user asks for items that have frequencies
higher than a threshold s. What we want to put in the statistics table
are items accompanied by their frequencies, so we need to do some
reasoning before we can construct the result.

Say we have an item I with count f (taken from our D structure). The
total number of entries is N. The question would be: what would be the
minimum frequency that the user could specify, that would still make us
output this element. From

f >= (s - e) * N

we can say it's

s <= (f / N) + e

So if the user wants items that occur with frequency (f / N) + e or
less. This would mean that the corresponding value in the statistics
entry should be < I, (f / N) + e) >

The thing is, this doesn't change much, because currently we are putting
(f / N) there, and e is set to 1 / stats_target * 10, so the difference
would not be dramatic.

> Or, if you want it explained another way: we are emitting words whose f
> is very small and whose delta is very large, representing items that got
> added to the scan very late.  These really shouldn't be there because
> their true frequency is probably a lot less than the intended threshold.

Well, the idea of the algorithm is that if their frequency would have
been bigger, they would appear earlier and would survive the pruning, as
I understand it.

> The net effect of this is first that there are a lot of rather useless
> entries in the MCV list whose claimed frequency is quite small, like as
> little as two occurrences.  Their true frequency could be quite a bit
> more.  What's even worse is that we believe that the minimum of these
> claimed frequencies is a reliable upper bound for the frequencies of
> items not listed in the MCV list.

Per the algorithm it *is* the upper bound, if I got my maths correctly.
The last item in the list would not be returned if the requested
frequency was higher than the one that is associated to that item.

> So I think we have to fix this.  The right thing is to select s and e
> values that are actually meaningful in the terms of the paper, then
> compute w from that, and be sure to enforce the minimum f value
> specified in the algorithm ... ie, don't be afraid to throw away values
> in the final D table.

I we should definitely prune the table one last time in the very
probable case that the loop ended in the middle of two regularly
happening prunes.

As for selecting the algorithm parameters: we don't get to select s. We
do get to select e, but that's it. I have a feeling that our problems
are caused by thte fact, that the algorithm tries to answer the question
"which elements occur more frequently than X" and we actually want the
answer to the "what's the frequency of each element". I've almost
convinced myself that the transformation between the answers to these
questions exists, but this trouble report keeps giving me doubts.

Cheers,
Jan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] How to pass around collation information

2010-05-28 Thread Dimitri Fontaine

Hi,

Peter Eisentraut  writes:
> On fre, 2010-05-28 at 20:22 +0300, Heikki Linnakangas wrote:
>> USING  syntax). The behavior is exactly what we want, it's 
>> just completely inpractical, so we need something to do the same in a 
>> less cumbersome way.

For an example, here is something I did to better understand the system
a while ago. Of course I never got to use it for real:

  http://pgsql.tapoueh.org/btree_fr_ops/

> Well, maybe we should step back a little and work out what sort of
> feature we actually want, if any.  The feature I'm thinking of is what
> people might call "per-column locale", and the SQL standard defines
> that.  It would look like this:
>
> CREATE TABLE test (
> a varchar COLLATE de,
> b varchar COLLATE fr
> );
>
> SELECT * FROM test WHERE a > 'baz' ORDER BY b;
>
> So while it's true that the collation is used by the operations (> and
> ORDER BY), the information which collation to use comes with the data
> values.  It's basically saying, a is in language "de", so sort it like
> that unless told otherwise.  There is also an override syntax available,
> like this:
>
> SELECT * FROM test WHERE a COLLATE en > 'baz' ORDER BY b COLLATE sv;
>
> But here again the collation is attached to a data value, and only from
> there it is passed to the operator.  What is actually happening is
>
> SELECT * FROM test WHERE (a COLLATE en) > 'baz' ORDER BY (b COLLATE sv);
>
>
> What you appear to be describing is a "per-operation locale", which also
> sounds valid, but it would be a different thing.  It might be thought of
> as this:
>
> SELECT * FROM test WHERE a (> COLLATE en) 'baz' ORDER BY COLLATE sv b;
>
> with some suitable global default.
>
>
> So which one of these should it be?

My understanding is that what we do is per-operation locale. The locale
information bears no semantic when not attached to some operation on
strings, like sorting or comparing.

So what you're showing here I think is how to attach a collation label
to every string in the system, at the catalog level or dynamically at
the query level. 

Now this collation label will only be used whenever you want to use a
collation aware function or operator. Those functions need to get the
labels for their implementation to have the expected meaning.

So we need both to attach collations to all known strings (defaulting to
the current database collation I guess), as you showed at the SQL level,
and to pass this information down to the functions operating on those
strings.

A confusing example on this grounds would be the following, which I hope
the standard disallow:

  SELECT * FROM test WHERE a COLLATE en > b COLLATE sv;

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Dimitri Fontaine

Heikki Linnakangas  writes:
> On 28/05/10 19:19, Josh Berkus wrote:
>> EXEC dbo.GetItemPrice @ItemCode = 'GXKP', @PriceLevel = 5
>
> Once you solve the problem of finding the '='s in the source, replacing them
> is exactly the same effort regardless of what you replace them with.

I guess it would be a choice of target between
  'GXKP' AS ItemCode, 5 AS PriceLevel
and
  ItemCode := 'GXKP', PriceLevel := 5

By the way, as it seems we're voting, I much prefer := than either the
AS and => variant, and I'm not keen on seeing us deprecate the operator.

Further, as said Andrew, keeping AS conflicting with the standard with
no hysterical raisin to do so would be a bad move IMHO.

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] How to pass around collation information

2010-05-28 Thread Peter Eisentraut

On fre, 2010-05-28 at 14:48 -0400, Tom Lane wrote:
> > SELECT * FROM test WHERE a COLLATE en > 'baz' ORDER BY b COLLATE sv;
> 
> That seems fairly bizarre.  What does this mean:
> 
>   WHERE a COLLATE en > b COLLATE de
> 
> ?  If it's an error, why is this not an error
> 
>   WHERE a COLLATE en > b
> 
> if b is marked as COLLATE de in its table?

The way I understand it, a collation "derivation" can be explicit or
implicit.  Explicit derivations override implicit derivations.  If in
the argument set of an operation, explicit collation derivations exist,
they must all be the same.

> I guess the more general question is whether the spec expects that
> collation settings can be derived statically (like type information)
> or whether they might sometimes only be known at runtime.

It looks like it is treated like type information.  The derivation and
validation rules are part of the Syntax Rules.

> We also need to think about whether we're okay with only applying
> collation to built-in types (text, varchar, char) or whether we need
> the feature to work for add-on types as well.  In particular, is citext
> still a meaningful feature if we have this, or is it superseded by
> COLLATE?  In the abstract I'd prefer to let it work for user-defined
> types, but if we can have a much simpler implementation by not doing
> so, it might be better to give that up.

I think if we get this done using the strcoll_l() API to do the work,
which looks like the path of least resistance at the moment, citext
would still be useful because all the standard locales would still be
case sensitive.

> Is COLLATE a property that can be attached to a domain over text?

According to the spec, yes.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: PGDLLEXPORT is __declspec (dllexport) only on MSVC, but is

2010-05-28 Thread Tom Lane

itag...@postgresql.org (Takahiro Itagaki) writes:
> Log Message:
> ---
> PGDLLEXPORT is __declspec (dllexport) only on MSVC,
> but is __declspec (dllimport) on other compilers
> because cygwin and mingw don't like dllexport.

That probably explains why the code was the way it was before ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] How to pass around collation information

2010-05-28 Thread Peter Eisentraut

On fre, 2010-05-28 at 15:03 -0400, Robert Haas wrote:
> I think we need to think of the comparison operators as ternary, and
> the COLLATE syntax applied to columns or present in queries as various
> ways of setting defaults or explicit overrides for what the third
> argument will end up being.

How could this extend to things like isalpha() or upper() that would
need access to ctype information?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Failback with log shipping

2010-05-28 Thread Dimitri Fontaine

Heikki Linnakangas  writes:
> Not shipped before the first failover you mean? No, if any WAL records were
> created in the old master that were not shipped to the standby before
> failover, the corresponding changes to the data files might've been flushed
> to disk already, and you can't undo those by not replaying the WAL record on
> restart.

Ah yes you need to fail between when (WAL is written and not sent) and
CHECKPOINT for this to be possible. But automatic testing of the
situation (is the data already safe in PGDATA) might still be possible?

>> How easy is it to script that? It seems all we need is the current XID
>> of the slave at the end of recovery. It should be in the log, maybe it's
>> easy enough to expose it at the SQL level…
>
> XID doesn't help at all, LSN more likely, but I feel that I don't fully
> understand what you're saying.

Sorry I was unclear, I was thinking in terms of recovery.conf file and
either recovery_target_xid or recovery_target_time. The idea being that
if the old-master didn't CHECKPOINT the changes that the slave missed,
then we can do crash recovery and choose to stop before that point, then
apply WALs from the new master.

That might sounds like a strange thing to do, but if switching from
master to slave allows skipping the base backup to get a slave again, I
guess we'll see people choosing the all automated failover scripting
(with heartbeat and so on). The goal would be to reduce downtime the
more you can.

When possible I'd still choose manual failover to the slave after a
master's restart and crash recovery, but the downtime constraint might
not allow that everywhere.

So you're saying controlled failover could possibly skip base backup to
reuse old master as new slave, and I'm asking if by some luck (crash
happened before CHECKPOINT) and some recovery.conf setup we could get to
the same situation in case of hard failure. That would allow completely
automatic switchover / failover with no need to resync.

I'm not sure how much clearer I managed to be :)

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Robert Haas

On Fri, May 28, 2010 at 10:08 AM, Pavel Stehule  wrote:
> 2010/5/28 Tom Lane :
>> Heikki Linnakangas  writes:
 Peter Eisentraut  writes:
> How about
> select myfunc(a := 7, b := 6);
>>
>>> If we go with that, should we make some preparations to allow => in the
>>> future? Like provide an alternative operator name for hstore's =>, and
>>> add a note somewhere in the docs to discourage other modules from using =>.
>>
>> I'd vote no.  We're intentionally choosing to deviate from a very poor
>> choice of notation.  Maybe Peter can interest the committee in allowing
>> := as an alternate notation, instead.
>
> -1
>
> I prefer a standard. And again - it isn't poor syntax - ADA, Perl use
> it, It can be a funny if ANSI SQL committee change some design from
> Oracle's proposal to PostgreSQL's proposal.

I agree.  It's good syntax.  I think we should try hard to adopt it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] How to pass around collation information

2010-05-28 Thread Robert Haas

On Fri, May 28, 2010 at 2:48 PM, Tom Lane  wrote:
> Peter Eisentraut  writes:
>> So while it's true that the collation is used by the operations (> and
>> ORDER BY), the information which collation to use comes with the data
>> values.  It's basically saying, a is in language "de", so sort it like
>> that unless told otherwise.  There is also an override syntax available,
>> like this:
>
>> SELECT * FROM test WHERE a COLLATE en > 'baz' ORDER BY b COLLATE sv;
>
> That seems fairly bizarre.  What does this mean:
>
>        WHERE a COLLATE en > b COLLATE de
>
> ?  If it's an error, why is this not an error
>
>        WHERE a COLLATE en > b
>
> if b is marked as COLLATE de in its table?

I think we need to think of the comparison operators as ternary, and
the COLLATE syntax applied to columns or present in queries as various
ways of setting defaults or explicit overrides for what the third
argument will end up being.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] How to pass around collation information

2010-05-28 Thread Tom Lane

Peter Eisentraut  writes:
> So while it's true that the collation is used by the operations (> and
> ORDER BY), the information which collation to use comes with the data
> values.  It's basically saying, a is in language "de", so sort it like
> that unless told otherwise.  There is also an override syntax available,
> like this:

> SELECT * FROM test WHERE a COLLATE en > 'baz' ORDER BY b COLLATE sv;

That seems fairly bizarre.  What does this mean:

WHERE a COLLATE en > b COLLATE de

?  If it's an error, why is this not an error

WHERE a COLLATE en > b

if b is marked as COLLATE de in its table?

I guess the more general question is whether the spec expects that
collation settings can be derived statically (like type information)
or whether they might sometimes only be known at runtime.

We also need to think about whether we're okay with only applying
collation to built-in types (text, varchar, char) or whether we need
the feature to work for add-on types as well.  In particular, is citext
still a meaningful feature if we have this, or is it superseded by
COLLATE?  In the abstract I'd prefer to let it work for user-defined
types, but if we can have a much simpler implementation by not doing
so, it might be better to give that up.

Is COLLATE a property that can be attached to a domain over text?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Tom Lane

Andrew Dunstan  writes:
> Yeah. Whether or not we ever implement it really doesn't matter, IMO. We 
> should not be in the business of taking an SQL standard piece of syntax 
> and using it for some other purpose.

Evidently the 201x SQL standard has blindsided us twice: first by
defining a syntax for named parameters that wasn't like ours, and second
by defining a syntax for something else that conflicted with ours.
I agree that the AS approach is pretty untenable given that double
whammy, and we'd better get rid of it.  (Hopefully Peter will be able
to keep us better apprised of things in the future.)

It seems that we're agreed on trying to use := instead, and the only
debate is about whether to deprecate use of => as an operator.  But
anything that we might do about the latter would reach no farther than
the documentation in 9.0 anyway.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Tom Lane

Josh Berkus  writes:
> Since former SQL Server / Sybase apps are the most likely to use named 
> parameter notation in PostgreSQL, having a syntax which could be ported 
> using only "sed" would be nice.

I fear you're vastly overestimating the ability of sed to distinguish
between = used in this way and = used in any other way.  Still, putting
the parameter name on the left is clearly both more natural and more
like every other product.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VPATH docs

2010-05-28 Thread Bruce Momjian


Patch applied.  Thanks.

---


David Fetter wrote:
> Folks,
> 
> Andrew Dunstan posted some instructions on his blog, and I'm thinking
> they clarify things a great deal for people who want to learn how to
> do VPATH builds.
> 
> Attached patch adds the description along with an index term.  What
> say?
> 
> Cheers,
> David.
> -- 
> David Fetter  http://fetter.org/
> Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
> Skype: davidfetter  XMPP: david.fet...@gmail.com
> iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
> 
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate

[ Attachment, skipping... ]

> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] How to pass around collation information

2010-05-28 Thread Peter Eisentraut

On fre, 2010-05-28 at 20:22 +0300, Heikki Linnakangas wrote:
> It's also fundamentally wrong, collation is not a property of a datum 
> but of the operation.

> One way to approach this is to realize that it's already possible to
> use 
> multiple collations in a database. You just have to define separate <
> = 
>  > operators and operator classes for every collation, and change all 
> your queries to use the right operator depending on the desired 
> collation everywhere where you use < = > (including ORDER BYs, with
> the 
> USING  syntax). The behavior is exactly what we want, it's 
> just completely inpractical, so we need something to do the same in a 
> less cumbersome way.

Well, maybe we should step back a little and work out what sort of
feature we actually want, if any.  The feature I'm thinking of is what
people might call "per-column locale", and the SQL standard defines
that.  It would look like this:

CREATE TABLE test (
a varchar COLLATE de,
b varchar COLLATE fr
);

SELECT * FROM test WHERE a > 'baz' ORDER BY b;

So while it's true that the collation is used by the operations (> and
ORDER BY), the information which collation to use comes with the data
values.  It's basically saying, a is in language "de", so sort it like
that unless told otherwise.  There is also an override syntax available,
like this:

SELECT * FROM test WHERE a COLLATE en > 'baz' ORDER BY b COLLATE sv;

But here again the collation is attached to a data value, and only from
there it is passed to the operator.  What is actually happening is

SELECT * FROM test WHERE (a COLLATE en) > 'baz' ORDER BY (b COLLATE sv);

What you appear to be describing is a "per-operation locale", which also
sounds valid, but it would be a different thing.  It might be thought of
as this:

SELECT * FROM test WHERE a (> COLLATE en) 'baz' ORDER BY COLLATE sv b;

with some suitable global default.

So which one of these should it be?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [BUGS] dividing money by money

2010-05-28 Thread Andy Balholm

I'm not quite sure how to go about changing it from an add-on function to a 
built-in one. So if you want to do that, go ahead. If you'd rather I did, just 
tell me how it's done.

Andy Balholm
(509) 276-2065
a...@balholm.com

On May 26, 2010, at 11:18 AM, Kevin Grittner wrote:

> Hi Andy,
> 
> Do you want to package this up as a patch for 9.1?  If not, is it OK
> if I do?
> 
> -Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Heikki Linnakangas


On 28/05/10 19:19, Josh Berkus wrote:

( parameter := value ) notation is not only consistent with what is used
inside pl/pgsql, it's also more consistent than "AS" with SQL Server's
named parameter notation, which is:

EXEC dbo.GetItemPrice @ItemCode = 'GXKP', @PriceLevel = 5

Since former SQL Server / Sybase apps are the most likely to use named
parameter notation in PostgreSQL, having a syntax which could be ported
using only "sed" would be nice.


Once you solve the problem of finding the '='s in the source, replacing 
them is exactly the same effort regardless of what you replace them with.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] How to pass around collation information

2010-05-28 Thread Heikki Linnakangas


On 28/05/10 19:27, Peter Eisentraut wrote:

I have been thinking about this collation support business a bit.
Ignoring for the moment where we would get the actual collation routines
from, I wonder how we are going to pass this information around in the
system.  Someone declares a collation on a column in a table, and
somehow this information needs to arrive in bttextcmp() and friends.


Yes. Comparison operators need it, as do functions like isalpha().


Also, functions that take in a string and return one (e.g., substring),
need to take in this information and return it back out.  How should
this work?


Hmm, I don't see what substring would need collation for. And it 
certainly shouldn't be returning it. Collation is a property of the 
comparison operators (and isalpha etc.), and the planner needs to deduce 
the right collation for each such operation in the query. That involves 
looking at the tables and columns involved, as well as per-user 
information and any explicit COLLATE clauses in the query, but all that 
happens at plan-time.



Option 1, make it part of the datum.  That way it will pass through the
system just fine, but it would waste a lot of storage and break just
about everything that operates on string types now, as well as
pg_upgrade.  So that's probably out.


It's also fundamentally wrong, collation is not a property of a datum 
but of the operation.



Option 2, invent some new mechanism that accompanies a datum or a type
whereever it goes.  Kind of like typmod, but not really.  Then the
collation information would presumably be made available to functions
through the fmgr interface.  The binary representation of data values
stays the same.


Something like that. I'm thinking that bttextcmp() and friends will 
simply take an extra argument indicating the collation, and we'll teach 
the operator / operator class infrastructure about that too.


One way to approach this is to realize that it's already possible to use 
multiple collations in a database. You just have to define separate < = 
> operators and operator classes for every collation, and change all 
your queries to use the right operator depending on the desired 
collation everywhere where you use < = > (including ORDER BYs, with the 
USING  syntax). The behavior is exactly what we want, it's 
just completely inpractical, so we need something to do the same in a 
less cumbersome way.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] How to pass around collation information

2010-05-28 Thread alvherre

Excerpts from Peter Eisentraut's message of vie may 28 12:27:52 -0400 2010:

> Option 2, invent some new mechanism that accompanies a datum or a type
> whereever it goes.  Kind of like typmod, but not really.  Then the
> collation information would presumably be made available to functions
> through the fmgr interface.  The binary representation of data values
> stays the same.

Is the collation a property of the datum, or one of the comparison?
If the latter, should it be really be made a sidecar of a datum, or
would it make more sense to attach it to the operation being performed?

I wonder if instead of trying to pass it down multiple layers till
bttextcmp and further down, it would make more sense to set a global
variable somewhere in the high levels, and only have it checked in
varstr_cmp.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Andrew Dunstan




Bruce Momjian wrote:

Josh Berkus wrote:
  
Since former SQL Server / Sybase apps are the most likely to use named 
parameter notation in PostgreSQL, having a syntax which could be ported 
using only "sed" would be nice.


Relevant to the whole discussion, though ... is the conflicting SQL 
standard syntax something we're every likely to implement?



Not sure, but I assume people could be using the AS syntax in other
databases (for the inheritance usage) and then trying to use it in our
database.

  


Yeah. Whether or not we ever implement it really doesn't matter, IMO. We 
should not be in the business of taking an SQL standard piece of syntax 
and using it for some other purpose.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Bruce Momjian

Josh Berkus wrote:
> Since former SQL Server / Sybase apps are the most likely to use named 
> parameter notation in PostgreSQL, having a syntax which could be ported 
> using only "sed" would be nice.
> 
> Relevant to the whole discussion, though ... is the conflicting SQL 
> standard syntax something we're every likely to implement?

Not sure, but I assume people could be using the AS syntax in other
databases (for the inheritance usage) and then trying to use it in our
database.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] How to pass around collation information

2010-05-28 Thread Peter Eisentraut

I have been thinking about this collation support business a bit.
Ignoring for the moment where we would get the actual collation routines
from, I wonder how we are going to pass this information around in the
system.  Someone declares a collation on a column in a table, and
somehow this information needs to arrive in bttextcmp() and friends.
Also, functions that take in a string and return one (e.g., substring),
need to take in this information and return it back out.  How should
this work?

Option 1, make it part of the datum.  That way it will pass through the
system just fine, but it would waste a lot of storage and break just
about everything that operates on string types now, as well as
pg_upgrade.  So that's probably out.

Option 2, invent some new mechanism that accompanies a datum or a type
whereever it goes.  Kind of like typmod, but not really.  Then the
collation information would presumably be made available to functions
through the fmgr interface.  The binary representation of data values
stays the same.

Option 2a, while we are at it, are there any other things of this nature
that would be worth supporting at the same time?  I could imagine that
having the option to pass around the ctype locale or the text search
dictionary in a similar way could be useful.  Is this something that
could be combined with typmod or other dormant data type metadata
requirements (PostGIS?, XML?)?

Ideas?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Josh Berkus




What's poor about it? It probably comes from PLSQL which in turn got it
from Ada, so they aren't just making this up. I agree it's inconvenient
for us, but that's a different issue.


Further, the
( parameter := value ) notation is not only consistent with what is used 
inside pl/pgsql, it's also more consistent than "AS" with SQL Server's 
named parameter notation, which is:


EXEC dbo.GetItemPrice @ItemCode = 'GXKP', @PriceLevel = 5

Since former SQL Server / Sybase apps are the most likely to use named 
parameter notation in PostgreSQL, having a syntax which could be ported 
using only "sed" would be nice.


Relevant to the whole discussion, though ... is the conflicting SQL 
standard syntax something we're every likely to implement?


--
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [9.1] pg_stat_get_backend_server_addr

2010-05-28 Thread Bruce Momjian

Peter Eisentraut wrote:
> On fre, 2010-05-28 at 10:21 -0400, Bruce Momjian wrote:
> > Tom Lane wrote:
> > > Bruce Momjian  writes:
> > > > Tom Lane wrote:
> > > >> ... indeed.  Is it worth burdening the pg_stats mechanism with this?
> > > >> The use case seems vanishingly thin.
> > > 
> > > > I am confused how this is different from inet_server_addr() and
> > > > inet_server_port().  
> > > 
> > > I think the point is to let someone find out *from another session*
> > > which server port number a particular session is using.  I fail to see
> > > a significant use case for that, though.
> > 
> > Uh, aren't they all using the same server port number, e.g. 5432?  Is
> > the issue different IP addresses for the same server?
> 
> Yes, I would like to know who is connecting to what IP address.  It's
> useful if you have HA setups and you need to check which way your
> connections are going.

OK, at least now I understand the goal.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [9.1] pg_stat_get_backend_server_addr

2010-05-28 Thread Peter Eisentraut

On fre, 2010-05-28 at 10:21 -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian  writes:
> > > Tom Lane wrote:
> > >> ... indeed.  Is it worth burdening the pg_stats mechanism with this?
> > >> The use case seems vanishingly thin.
> > 
> > > I am confused how this is different from inet_server_addr() and
> > > inet_server_port().  
> > 
> > I think the point is to let someone find out *from another session*
> > which server port number a particular session is using.  I fail to see
> > a significant use case for that, though.
> 
> Uh, aren't they all using the same server port number, e.g. 5432?  Is
> the issue different IP addresses for the same server?

Yes, I would like to know who is connecting to what IP address.  It's
useful if you have HA setups and you need to check which way your
connections are going.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Andrew Dunstan




Tom Lane wrote:

Heikki Linnakangas  writes:
  

Peter Eisentraut  writes:
  

How about
select myfunc(a := 7, b := 6);



  
If we go with that, should we make some preparations to allow => in the 
future? Like provide an alternative operator name for hstore's =>, and 
add a note somewhere in the docs to discourage other modules from using =>.



I'd vote no.  We're intentionally choosing to deviate from a very poor
choice of notation.  Maybe Peter can interest the committee in allowing
:= as an alternate notation, instead.


  


What's poor about it? It probably comes from PLSQL which in turn got it 
from Ada, so they aren't just making this up.  I agree it's inconvenient 
for us, but that's a different issue.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [9.1] pg_stat_get_backend_server_addr

2010-05-28 Thread Bruce Momjian

Tom Lane wrote:
> Bruce Momjian  writes:
> > Tom Lane wrote:
> >> ... indeed.  Is it worth burdening the pg_stats mechanism with this?
> >> The use case seems vanishingly thin.
> 
> > I am confused how this is different from inet_server_addr() and
> > inet_server_port().  
> 
> I think the point is to let someone find out *from another session*
> which server port number a particular session is using.  I fail to see
> a significant use case for that, though.

Uh, aren't they all using the same server port number, e.g. 5432?  Is
the issue different IP addresses for the same server?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [9.1] pg_stat_get_backend_server_addr

2010-05-28 Thread Tom Lane

Bruce Momjian  writes:
> Tom Lane wrote:
>> ... indeed.  Is it worth burdening the pg_stats mechanism with this?
>> The use case seems vanishingly thin.

> I am confused how this is different from inet_server_addr() and
> inet_server_port().  

I think the point is to let someone find out *from another session*
which server port number a particular session is using.  I fail to see
a significant use case for that, though.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Pavel Stehule

2010/5/28 Tom Lane :
> Heikki Linnakangas  writes:
>>> Peter Eisentraut  writes:
 How about
 select myfunc(a := 7, b := 6);
>
>> If we go with that, should we make some preparations to allow => in the
>> future? Like provide an alternative operator name for hstore's =>, and
>> add a note somewhere in the docs to discourage other modules from using =>.
>
> I'd vote no.  We're intentionally choosing to deviate from a very poor
> choice of notation.  Maybe Peter can interest the committee in allowing
> := as an alternate notation, instead.

-1

I prefer a standard. And again - it isn't poor syntax - ADA, Perl use
it, It can be a funny if ANSI SQL committee change some design from
Oracle's proposal to PostgreSQL's proposal.

Regards

Pavel


>
>                        regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Working with PostgreSQL enums in C code

2010-05-28 Thread Tom Lane

Robert Haas  writes:
> On Fri, May 28, 2010 at 12:07 AM, Joseph Adams
>  wrote:
>> I learned that to return an enum value from C, one needs to return the
>> OID of the right row of the pg_enum table.  I eventually managed to
>> write the code below, which is mostly based on the enum_in function in
>> src/backend/utils/adt/enum.c .

> PG_RETURN macros shouldn't do any nontrivial processing (see the
> existing ones for references).

Yeah, that was my first reaction too.  If we don't already have one,
it would be appropriate to provide a "lookup enum value" function
(functionally about the same as enum_in, but designed to be called
conveniently from C).  Then, if you needed to work from a textual
enum label, you'd call that function and then PG_RETURN_OID.

However, for a built-in enum type, I agree with Robert's solution of
just #define-ing fixed OIDs for the values of the type.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [9.1] pg_stat_get_backend_server_addr

2010-05-28 Thread Bruce Momjian

Tom Lane wrote:
> Peter Eisentraut  writes:
> > There are functions pg_stat_get_backend_client_addr and
> > pg_stat_get_backend_client_port, which are exposed through the
> > pg_stat_activity view, but there is no straightforward way to get the
> > server-side address and port of a connection.  This is obviously much
> > less commonly needed than the client information,
> 
> ... indeed.  Is it worth burdening the pg_stats mechanism with this?
> The use case seems vanishingly thin.

I am confused how this is different from inet_server_addr() and
inet_server_port().  

Also, these functions return nothing for unix domain connections. 
Should they, particularly for the port number which we do use to map to
a socket name?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] functional call named notation clashes with SQL feature

2010-05-28 Thread Tom Lane

Heikki Linnakangas  writes:
>> Peter Eisentraut  writes:
>>> How about
>>> select myfunc(a := 7, b := 6);

> If we go with that, should we make some preparations to allow => in the 
> future? Like provide an alternative operator name for hstore's =>, and 
> add a note somewhere in the docs to discourage other modules from using =>.

I'd vote no.  We're intentionally choosing to deviate from a very poor
choice of notation.  Maybe Peter can interest the committee in allowing
:= as an alternate notation, instead.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Failback with log shipping

2010-05-28 Thread Heikki Linnakangas


On 28/05/10 16:11, Dimitri Fontaine wrote:

Heikki Linnakangas  writes:

Assuming controlled shutdown and that the standby received all WAL from the
old master before it was promoted, I think you can simply create a
recovery.conf in the old master's data directory to turn it into a standby
server, and restart. Am I missing something?


Would that mean that a controlled restart of the old master so that the
recovery stops before applying the logs that were not shipped to the
slave would put it in the same situation?


Not shipped before the first failover you mean? No, if any WAL records 
were created in the old master that were not shipped to the standby 
before failover, the corresponding changes to the data files might've 
been flushed to disk already, and you can't undo those by not replaying 
the WAL record on restart.



How easy is it to script that? It seems all we need is the current XID
of the slave at the end of recovery. It should be in the log, maybe it's
easy enough to expose it at the SQL level…


XID doesn't help at all, LSN more likely, but I feel that I don't fully 
understand what you're saying.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Failback with log shipping

2010-05-28 Thread Dimitri Fontaine

Heikki Linnakangas  writes:
> Assuming controlled shutdown and that the standby received all WAL from the
> old master before it was promoted, I think you can simply create a
> recovery.conf in the old master's data directory to turn it into a standby
> server, and restart. Am I missing something?

Would that mean that a controlled restart of the old master so that the
recovery stops before applying the logs that were not shipped to the
slave would put it in the same situation?

How easy is it to script that? It seems all we need is the current XID
of the slave at the end of recovery. It should be in the log, maybe it's
easy enough to expose it at the SQL level…

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Specification for Trusted PLs?

2010-05-28 Thread Andrew Dunstan




Sam Mason wrote:

On Thu, May 27, 2010 at 11:09:26PM -0400, Tom Lane wrote:
  

David Fetter  writes:


I don't know about a *good* idea, but here's the one I've got.
  
1.  Make a whitelist.  This is what needs to work in order for a

language to be a fully functional trusted PL.
  

Well, I pretty much lose interest right here, because this is already
assuming that every potentially trusted PL is isomorphic in its
capabilities.



That's not normally a problem.  The conventional way would be to place
the interpreter in its own sandbox, similar to how Chrome has each tab
running in its own process.  These processes are protected in a way
so that the code running inside them can't do any harm--e.g. a ptrace
jail[1].  This is quite a change from existing pl implementations, and
present a different set of performance/compatibility issues.

  


I have my own translation of this last sentence.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Specification for Trusted PLs?

2010-05-28 Thread Peter Eisentraut

On fre, 2010-05-28 at 13:03 +0100, Sam Mason wrote:
> That's not normally a problem.  The conventional way would be to place
> the interpreter in its own sandbox, similar to how Chrome has each tab
> running in its own process.  These processes are protected in a way
> so that the code running inside them can't do any harm--e.g. a ptrace
> jail[1].  This is quite a change from existing pl implementations, and
> present a different set of performance/compatibility issues.

Surely a definition of a trusted language that invalidates the existing
trusted languages is not going help resolve the issue.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Specification for Trusted PLs?

2010-05-28 Thread Sam Mason

On Thu, May 27, 2010 at 11:09:26PM -0400, Tom Lane wrote:
> David Fetter  writes:
> > I don't know about a *good* idea, but here's the one I've got.
> 
> > 1.  Make a whitelist.  This is what needs to work in order for a
> > language to be a fully functional trusted PL.
> 
> Well, I pretty much lose interest right here, because this is already
> assuming that every potentially trusted PL is isomorphic in its
> capabilities.

That's not normally a problem.  The conventional way would be to place
the interpreter in its own sandbox, similar to how Chrome has each tab
running in its own process.  These processes are protected in a way
so that the code running inside them can't do any harm--e.g. a ptrace
jail[1].  This is quite a change from existing pl implementations, and
present a different set of performance/compatibility issues.

> If that were so, there'd not be very much point in
> supporting multiple PLs.  A good example here is R.  I have no idea
> whether PL/R is trusted or trustworthy, but in any case the main point
> of supporting that PL is to allow access to the R statistical library.
> How does that fit into a whitelist designed for some other language?
> It doesn't.

AFAIU, a trusted language should only be able to perform computation,
e.g. not touch the local filesystem, beyond readonly access to library
code, and not see the network.  Policies such as these are easy to
enforce in a ptrace jail, and would still allow a trusted pl/r to do
whatever it wants to get any pure calculation done.  As soon as it needs
to touch the file system the language becomes non-trusted.

> > 3.  (the un-fun part) Write tests which attempt to do things not in
> > the whitelist.  We can start from the vulnerabilities so far
> > discovered.
> 
> And here is the *other* fatal problem: a whitelist does not in fact give
> any leverage at all for testing whether there is access to functionality
> outside the whitelist.  (It might be useful if you could enforce the
> whitelist at some sufficiently low level of the language implementation,
> but as a matter of testing, it does nothing for you.)  What you're
> suggesting isn't so much un-fun as un-possible.  Given a maze of twisty
> little subroutines all different, how will you find out if any of them
> contain calls of unwanted functionality?

A jail helps with a lot of this; the remainder is in the normal fact
that bug testing can only demonstrate the presence of bugs and you need
to do formal code proof to check for the absence of bugs.

-- 
  Sam  http://samason.me.uk/

 [1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.122.5494

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: [COMMITTERS] pgsql: Mark PG_MODULE_MAGIC and PG_FUNCTION_INFO_V1 with PGDLLEXPORT

2010-05-28 Thread Heikki Linnakangas


On 27/05/10 10:59, Takahiro Itagaki wrote:

Log Message:
---
Mark PG_MODULE_MAGIC and PG_FUNCTION_INFO_V1 with PGDLLEXPORT
independently from BUILDING_DLL. It is always __declspec(dllexport).


It looks like the Windows buildfarm members are not happy about this 
change...


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] mingw initdb failure on HEAD

2010-05-28 Thread Andrew Dunstan




Takahiro Itagaki wrote:

Andrew Dunstan  wrote:

  
Several buildfarm mingw members are getting failures like this, when 
running initdb:



Could it have been caused by the PGDLLIMPORT/PGDLLEXPORT changes?



Probably, but it's curious because MSVC members are OK.
Do we have special treatments for exported functions in mingw?
It might export 'dllimport' funtions/variables, but not 'dllexport' ones.


  


It has broken Cygwin as well, so that's two out of three Windows 
platforms that don't like this.


I am not sure what the best fix for the original problem is, but this 
isn't it.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Working with PostgreSQL enums in C code

2010-05-28 Thread Robert Haas

On Fri, May 28, 2010 at 12:07 AM, Joseph Adams
 wrote:
> I learned that to return an enum value from C, one needs to return the
> OID of the right row of the pg_enum table.  I eventually managed to
> write the code below, which is mostly based on the enum_in function in
> src/backend/utils/adt/enum.c .

PG_RETURN macros shouldn't do any nontrivial processing (see the
existing ones for references).  I assume you have the enum labels
declared in pg_enum.h, so I think you can just return the correct OID
values directly.  Declare constants for them in pg_enum.h and then
just do PG_RETURN_OID(whatever).

#define JSONTypeNullOid   ...
#define JSONTypeStringOid ...
#define JSONTypeNumberOid ...

It really shouldn't be necessary to do a catalog lookup to retrieve a constant.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Failback with log shipping

2010-05-28 Thread Heikki Linnakangas

At PGCon, several people asked me about restarting an old master as a 
standby after failover has happened. And it wasn't the first time people 
ask me about it, even before 9.0. We have no mention of that in the 
docs, which is a pretty serious oversight. What can we say about it?


I believe the current official policy is that you have to take a new 
base backup and restore from that. Rsync can be used to speed that up.


However, someone once asked me for a comment on a script he wrote to do 
that in a smarter way. I forget who and when and how exactly it worked, 
but it seems possible to do safely.


First of all, you have to shut down the master cleanly for this to work, 
otherwise there can be changes in the old master that never made it to 
the standby.


Assuming controlled shutdown and that the standby received all WAL from 
the old master before it was promoted, I think you can simply create a 
recovery.conf in the old master's data directory to turn it into a 
standby server, and restart. Am I missing something?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch submission deadline for CommitFest 2010-07

2010-05-28 Thread Fujii Masao

On Fri, May 28, 2010 at 4:08 PM, Heikki Linnakangas
 wrote:
> On 28/05/10 09:26, Fujii Masao wrote:
>>
>> When is the patch submission deadline for CommitFest 2010-07?
>> July 14? or June 14 (before review fest)? Sorry, I'm not sure
>> what is actually different between CF and RF.
>
> July 14. But if you finish the patch before June 14, it will get reviewed
> earlier, between June 14 and July 14. So aim for June 14 :-).

Thanks! I'll do my best :)

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] tsvector pg_stats seems quite a bit off.

2010-05-28 Thread Jan Urbański

On 28/05/10 04:47, Tom Lane wrote:
> =?UTF-8?B?SmFuIFVyYmHFhHNraQ==?=  writes:
>> On 19/05/10 21:01, Jesper Krogh wrote:
>>> In practice, just cranking the statistics estimate up high enough seems
>>> to solve the problem, but doesn't
>>> there seem to be something wrong in how the statistics are collected?
> 
>> The algorithm to determine most common vals does not do it accurately.
>> That would require keeping all lexemes from the analysed tsvectors in
>> memory, which would be impractical. If you want to learn more about the
>> algorithm being used, try reading
>> http://www.vldb.org/conf/2002/S10P03.pdf and corresponding comments in
>> ts_typanalyze.c
> 
> I re-scanned that paper and realized that there is indeed something
> wrong with the way we are doing it.

> So I think we have to fix this. 

Hm, I'll try to take another look this evening (CEST).

Cheers,
Jan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages

2010-05-28 Thread Heikki Linnakangas


On 27/05/10 22:56, Robert Haas wrote:

On Thu, May 27, 2010 at 3:52 PM, Kevin Grittner
  wrote:

Robert Haas  wrote:

On Thu, May 27, 2010 at 3:15 PM, Kevin Grittner



(a)  The tuples were written within the same transaction which
created or truncated the table.



In case (a), you mess up visibility with respect to other
command-IDs within the transaction.


Surely that problem is surmountable?


I proposed an idea at PGCon, but I believe Tom and Heikki thought it
was far too grotty to consider.


No, I think it's surmountable too. We discussed hacks to teach the MVCC 
checks that all frozen tuples on a table that was created in the same 
transaction (i.e. the same cases where we skip WAL logging) were 
actually created by the running transaction, and check commandid 
accordingly.


Or detect simple DML commands where we know that the command doesn't 
read the table. COPY would usually fall into that category, though 
non-immutable input functions make that a bit iffy.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch submission deadline for CommitFest 2010-07

2010-05-28 Thread Heikki Linnakangas


On 28/05/10 09:26, Fujii Masao wrote:

When is the patch submission deadline for CommitFest 2010-07?
July 14? or June 14 (before review fest)? Sorry, I'm not sure
what is actually different between CF and RF.


July 14. But if you finish the patch before June 14, it will get 
reviewed earlier, between June 14 and July 14. So aim for June 14 :-).


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

60 matches

Mail list logo