Re: [HACKERS] Sync Rep: Second thoughts

2008-12-22 Thread Emmanuel Cecchet

Hi Markus,


I'm not quite sure what you mean by "certification protocol", there's no
such thing in Postgres-R (as proposed by Kemme). Although, I remember
having heard that term in the context of F. Pedone's work. Can you point
me to some paper explaining this certification protocol?
  
What Bettina calls the Lock Phase in 
http://www.cs.mcgill.ca/~kemme/papers/vldb00.pdf is actually a 
certification.
You can find more references to certification protocols in 
http://gorda.di.uminho.pt/download/reports/gapi.pdf
I would also recommend the work of Sameh on Tashkent and Tashkent+ that 
was based on Postgres: 
http://labos.epfl.ch/webdav/site/labos/users/157494/public/papers/tashkent.eurosys2006.pdf 
and 
http://infoscience.epfl.ch/record/97654/files/tashkentPlus.eurosys2007.final.pdf

Certification-based
approaches already have multiple reliability issues. They do improve
write performance compared to statement-based replication, but this is
very dependent on the capacity of the system to limit the conflicting
window for concurrent transactions.



What do you mean by "reliability issues"?
  
These approaches usually require an atomic broadcast primitive that tends 
to be fragile (limited scalability, hard to tune failure timeouts, ...). 
Most prototype implementations have the load balancer and/or the 
certifier as a SPOF (single point of failure). Building reliability for 
these components will come with a significant performance penalty.

The writeset extraction mechanisms have had
too many limitations so far to allow the use of certification-based
replication in production (AFAIK).


What limitations are you speaking of here?
  
Oftentimes DDL support is very limited. Non-transactional objects like 
sequences are not captured.
Session or environment variables are not necessarily propagated. Support 
of temp tables varies between databases, which makes it hard to support 
them properly in a generic way.
Well, I guess everyone has a story about limitations they have found with 
some database replication technology, especially when a user expects a 
cluster to behave like a single database instance.
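
To make the sequence point concrete, here is a minimal illustration
(hypothetical sequence name): nextval() is non-transactional in
PostgreSQL, so a rolled-back transaction still advances the sequence,
and row-based writeset extraction sees no row change to capture:

create sequence receipt_seq;
begin;
select nextval('receipt_seq');  -- returns 1
rollback;
select nextval('receipt_seq');  -- returns 2; the increment survived the abort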


Happy holidays,
Emmanuel

--
Emmanuel Cecchet
FTO @ Frog Thinker 
Open Source Development & Consulting

--
Web: http://www.frogthinker.org
email: m...@frogthinker.org
Skype: emmanuel_cecchet


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map and freezing

2008-12-22 Thread Fujii Masao
On Tue, Dec 23, 2008 at 4:24 AM, Heikki Linnakangas
 wrote:
> Heikki Linnakangas wrote:
>>
>> Peter Eisentraut wrote:
>>>
>>> Heikki Linnakangas wrote:

 I think we need a threshold similar to autovacuum_freeze_max_age for
 manual vacuums as well: vacuum_freeze_max_age. If you run VACUUM, and
 relfrozenxid is older than vacuum_freeze_max_age, the visibility map is
 ignored and all pages are scanned.
>>>
>>> Would one parameter to control both suffice?  (i.e., rename
>>> autovacuum_freeze_max_age to vacuum_freeze_max_age)
>>
>> Imagine that you run a nightly VACUUM from cron, and have autovacuum
>> disabled. If autovacuum_freeze_max_age is the same as vacuum_freeze_max_age,
>> as soon as that age is reached, an anti-wraparound autovacuum is launched. What
>> we'd want to happen is for the next nightly VACUUM to do the work instead.
>> So they need to be separate settings, with some space between them by
>> default.
>
> Attached is a proposed patch to handle freezing. In a nutshell:
>
> Introduce new vacuum_freeze_max_age setting. Manual VACUUM will scan the
> whole table and advance relfrozenxid, if relfrozenxid is older than
> vacuum_freeze_max_age.
>
> If vacuum_freeze_max_age is greater than autovacuum_freeze_max_age - 1
> million transactions, it's effectively capped at that value. It doesn't make
> sense to have vacuum_freeze_max_age > autovacuum_freeze_max_age, because the
> manual VACUUM wouldn't have a chance to do the full sweep before the
> anti-wraparound autovacuum is launched. The "minus one million transactions"
> is to give some headroom.
>
> I'm lowering vacuum_freeze_min_age from 100 million transactions to 50
> million, so that the whole-table vacuum doesn't need to run as often. Note
> that since VACUUM normally only scans pages that need vacuuming according to
> the visibility map, tuples on skipped pages are not frozen any earlier even
> though vacuum_freeze_min_age is lower.
>
> To recap, here's the new defaults:
> autovacuum_freeze_max_age   200000000
> vacuum_freeze_max_age   150000000
> vacuum_freeze_min_age   50000000
>
> This means that with defaults, autovacuum will launch a whole-table vacuum
> every 150 million transactions (autovacuum_freeze_max_age -
> vacuum_freeze_min_age). If you run a nightly manual VACUUM, you'll get a
> whole-table vacuum every 100 million transactions.
>
> vacuum_freeze_max_age also affects autovacuums. If an autovacuum is launched
> on a table to remove dead tuples, and vacuum_freeze_max_age has been reached
> (but not yet autovacuum_freeze_max_age), the autovacuum will scan the whole
> table. I'm not sure if this is desirable, to avoid having to launch separate
> anti-wraparound autovacuums even when there's not many dead tuples, or just
> confusing.
>
> If you set vacuum_freeze_max_age to 0, the visibility map is not used to
> skip pages, so you'll get the pre-8.4 old behavior.

It seems strange that "max" can be less than "min". Is it worth dividing
the parameter into two (min/max)?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-22 Thread Emmanuel Cecchet

Kevin,

If you want to know how to build SERIALIZABLE with a database that 
provides SI (Snapshot Isolation), read 
http://portal.acm.org/citation.cfm?doid=1376616.137669
Note that in practice, READ COMMITTED is the most widely used isolation 
level and its limitations are relatively well understood by the average 
programmer, who can program his application accordingly. I still don't 
get why people would use SERIALIZABLE since there is no efficient 
implementation of it.


My 2 cents.
Emmanuel


Kevin Grittner wrote:

As I've understood limitations of the PostgreSQL implementation of
SERIALIZABLE transactions, at least the only example given in the
documentation, revolve around a rather unlikely situation:
 
Given concurrent transactions T1 and T2 and non-overlapping sets of
data A and B, T1 reads data including A and uses the data to modify B
while T2 reads data including B and uses that data to modify A, where
the modifications performed by either would affect the modifications
made by the other, if they were visible.
 
For reasons I'll omit here, that scenario didn't worry me for my
current uses of PostgreSQL.
 
I've found another form of deviation from the standard SERIALIZABLE
behavior, though, which does worry me.  Although the above appears to be
the only situation where the end result after everything commits is
inconsistent with standard SERIALIZABLE behavior, the PostgreSQL
implementation allows transactions to view the data in states which
would never be possible during the application of the transactions in
series in the order they will appear to have been applied after the
commit.
 
Imagine, as an example, a system which involves recording receipts,
each of which must go into a daily deposit.  There is a control table
with one row containing the current deposit date for receipts. 
Somewhere mid-afternoon that date is updated, all subsequent receipts
fall into the new day, and a report is run listing the receipts for the
day and giving the deposit total.
 
Under a standard-compliant implementation of SERIALIZABLE, this is
straightforward: a transaction which is inserting a receipt selects the
deposit date to use in its transaction, and any SELECT of receipts for a
date prior to the current deposit date will see the accurate, final
data.  Under the PostgreSQL implementation, although data eventually
gets to a coherent state, there can be a window of time where a SELECT
can return an incomplete list of receipts for a date which appears to be
closed, even if all transactions for modifying and viewing data are
SERIALIZABLE.
 
-- setup
create table ctl (k text not null primary key, deposit_date date not
null);
insert into ctl values ('receipt', date '2008-12-22');
create table receipt (receipt_no int not null primary key, deposit_date
date not null, amount numeric(13,2));
insert into receipt values (1, (select deposit_date from ctl where k =
'receipt'), 1.00);
insert into receipt values (2, (select deposit_date from ctl where k =
'receipt'), 2.00);

-- connection 1
start transaction isolation level serializable ;
insert into receipt values (3, (select deposit_date from ctl where k =
'receipt'), 4.00);

-- connection 2
start transaction isolation level serializable ;
update ctl set deposit_date = date '2008-12-23' where k = 'receipt';
commit transaction;
start transaction isolation level serializable ;
select * from ctl;
-- (deposit_date shows as 2008-12-23)
select * from receipt;
-- (Only receipts 1 and 2 show for 2008-12-22.)
commit;

-- connection 1
commit transaction;

-- connection 2
start transaction isolation level serializable ;
select * from receipt;
-- (All receipts for the 2008-12-22 deposit date now show.)
commit transaction;
 
At this point, SERIALIZABLE transactions appear to have worked, with
receipt 3 happening before the update of deposit_date; however, there
was a window of time when the update to deposit_date was visible and
receipt 3 was not.
 
This absolutely can't happen in a standard-compliant implementation.
At a minimum, this window where visible data lacks coherency should be
noted in the documentation.  I don't know if there's any way to fix
this without killing performance.
 
-Kevin


  



--
Emmanuel Cecchet
FTO @ Frog Thinker 
Open Source Development & Consulting

--
Web: http://www.frogthinker.org
email: m...@frogthinker.org
Skype: emmanuel_cecchet


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Some semantic details of the window-function spec

2008-12-22 Thread Tom Lane
"Hitoshi Harada"  writes:
> 2008/12/23 Tom Lane :
>> * Unlike aggregates, there doesn't seem to be any concept of a window
>> function being attached to an outer-level query --- in fact 6.10 rule
>> 4 says that a window function's argument can't contain outer references
>> at all.  That seems excessively strong, but it does seem that there is
>> no point in a "winlevelsup" field in WindowFunc, nor in implementing
>> any infrastructure for outer-level window functions.

> I am so ignorant that I don't know what exactly agglevelsup
> represents. I just copied it from agg to window functions... Could
> someone show me an easy example?

In something like

select ..., (select avg(a.x) from b), ... from a;

the avg() invocation is actually an aggregate of the outer query over a.
It's a constant so far as any one invocation of the sub-select on b
is concerned.  The SQL:2008 spec is pretty opaque about this (as it
is on a whole lot of matters :-() but if you read older versions like
SQL92 it's spelled out a bit more clearly.  I don't see any wording
suggesting that window functions are supposed to work this way, however.

>> * The last part of section 4.14 states that two different window
>> functions must be evaluated against the same sorted row ordering if
>> they have syntactically identical partition and ordering clauses,
>> even if the windows are otherwise distinct (in particular there
>> could be different framing clauses).  Since we don't currently implement
>> framing clauses the latter is not too interesting, but it's still true
>> that the patch as I currently have it doesn't fully meet that
>> requirement: if you intentionally specify separate but equivalent named
>> window definitions, it won't be smart enough to fold them together,
>> and you could end up with extra sorts happening and possibly a different
>> ordering of sort-peer rows.  How worried are we about that?

> Is it? I intended all equivalent windows to be folded into one as long as
> equal(w1->partitionClause, w2->partitionClause) &&
> equal(w1->orderClause, w2->orderClause) is true.

Well, I rewrote that code pretty heavily because it didn't work per spec
as far as references to existing windows went.  The problem that remains
is that in something like

WINDOW  w1 as (partition by x),
w2 as (partition by x),
w3 as (w1 order by y),
w4 as (w2 order by y)

w3 and w4 are equivalent but it's pretty hard to recognize that.  And
even if we did recognize it, we couldn't simply fold them together into
a single windowClause entry without changing the way that the query
looks on reverse-listing.  (The patch as submitted doesn't even *have*
reverse-listing capability for WINDOW clauses, but I plan to fix that
tomorrow ...)

This is doubtless fixable with another level of indirection, I'm just
wondering how much trouble it's worth.  It seems like the only case
that will really arise in practice is duplicate anonymous windows
(identical OVER clauses), and the code does fold those together.
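
For contrast, a sketch of the case that is handled (hypothetical
table t): two syntactically identical anonymous OVER clauses, which
the code folds into a single sort:

select rank() over (partition by x order by y),
       sum(z) over (partition by x order by y)
from t;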

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] Infrastructure changes for recovery (v8)

2008-12-22 Thread Fujii Masao
On Tue, Dec 23, 2008 at 5:18 AM, Heikki Linnakangas
 wrote:
> Simon Riggs wrote:
>>
>> On Wed, 2008-12-17 at 23:32 -0300, Alvaro Herrera wrote:
>>>
>>> Simon Riggs escribió:
>>>
 Please let me know how I can make the reviewer's job easier. Diagrams,
 writeups, whatever. Thanks,
>>>
>>> A link perhaps?
>>
>> There is much confusion on this point for which I'm very sorry.
>>
>> I originally wrote "infra" patch to allow it to be committed separately
>> in the Sept commitfest, to reduce size of the forthcoming hotstandby
>> patch. That didn't happen (no moans there) so the eventual "hotstandby"
>> patch includes all of what was the infra patch, plus the new code.
>>
>> So currently there is no separate "infra" patch. The two line items on
>> the CommitFest page are really just one large project. I would be in
>> favour of removing the "infra" lines from the CommitFest page.
>
> I think it's useful to review the "infra" part of the patch separately, so I
> split it out of the big patch again. I haven't looked at this in detail yet,
> but it compiles and passes regression tests.

Super! I will fix the synch rep code based on this patch.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] encoding cleanups in cvs repo

2008-12-22 Thread Tom Lane
Magnus Hagander  writes:
> I have cleaned up a couple of badly broken encodings in cvs commit
> messages in:

> src/backend/utils/error/Attic/exc.c,v

Out of curiosity ... what problems exactly?  I just looked through my
last complete dump of CVS log history and didn't see anything funny
in the messages for exc.c ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-22 Thread Tom Lane
"Kevin Grittner"  writes:
> At this point, SERIALIZABLE transactions appear to have worked, with
> receipt 3 happening before the update of deposit_date; however, there
> was a window of time when the update to deposit_date was visible and
> receipt 3 was not.
 
> This absolutely can't happen in a standard-compliant implementation.

I think you mean "you'd like to believe that can't happen in a
standard-compliant implementation".  It doesn't include any of the
specific behaviors that are forbidden by the spec, though, so I'm less
than convinced.

An appropriate way to prevent the problem is probably for the
transaction that changes the deposit_date to take out a write-excluding
lock on the receipts table before it does so.
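
A sketch of that approach, reusing the tables from Kevin's example
(EXCLUSIVE mode conflicts with concurrent row changes while still
allowing plain reads):

start transaction isolation level serializable;
lock table receipt in exclusive mode;
update ctl set deposit_date = date '2008-12-23' where k = 'receipt';
commit;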

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Lock conflict behavior?

2008-12-22 Thread Jeff Davis
On Mon, 2008-12-22 at 17:14 +0900, Tatsuo Ishii wrote:
> Also, it seems that an attacker could do a denial-of-service attack if he
> could open sessions A and B, since other users on session C or
> following sessions will be blocked.

LOCK TABLE checks the permissions before attempting to acquire the lock;
is there a reason that ALTER TABLE doesn't?

Even if they don't have any rights to the table at all (not even
SELECT), there are still other problems. For instance, the user could
just wait for a long running query (or VACUUM) and issue the ALTER TABLE
at that time.

I know we don't make any guarantees about preventing denial-of-service
attacks from users that can connect, but if possible we should be
consistent about checking the permissions.
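
A sketch of the scenario (hypothetical table name), assuming the
permission check indeed happens only after the lock is requested:

-- session A: any long-running reader, holds ACCESS SHARE
begin;
select count(*) from some_table;

-- session B: unprivileged attacker, queues for ACCESS EXCLUSIVE
alter table some_table add column junk int;

-- session C: innocent user, blocks behind B's queued lock request
select count(*) from some_table;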

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] contrib/pg_stat_statements 1212

2008-12-22 Thread Alex Hunsaker
On Mon, Dec 22, 2008 at 00:44, ITAGAKI Takahiro
 wrote:
>
> "Alex Hunsaker"  wrote:
>
>> A few comments:
>>
>> Is there a reason you add sourceText to QueryDesc? AFAICT you can do
>> ActivePortal->sourceText and it will always be populated correctly.
>
> That's for nested statements (SQLs called in stored functions).
> ActivePortal->sourceText shows text of only top most query.
>
>> I think the explain_analyze_format guc is a clever way of getting
>> around the explain analyze verbose you proposed earlier.  But I dont
>> see any doc updates for it.
>
> Sure, no docs for now. Is the guc approach acceptable?
> (I'm not sure whether it is the best way...)
> If ok, I'll write docs for it.

I dunno, I'm hoping that by splitting up the patches and making the
change more visible, some more people might chime in :)

>> Im still not overly fond of the "statistics." custom guc name, but
>> what can you do...
>
> I have no objections to the name. The "pg_stat_statements.*" prefix might
> be better to avoid prefix conflicts. If so, we'd better rename the
> variables to remove the duplicated "statements" from the names.
>
> Ex.
>statistics.max_statements   -> pg_stat_statements.limit
>statistics.track_statements -> pg_stat_statements.target

How about just pg_stat_statements.track ?

>statistics.saved_file   -> pg_stat_statements.saved_file

I do like the consistency of having the custom gucs be the same as the
module name, easy to grep or google for.
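
Purely as an illustration (none of these names were settled here),
the renamed settings might look like this in postgresql.conf:

custom_variable_classes = 'pg_stat_statements'
pg_stat_statements.limit = 1000
pg_stat_statements.track = all
pg_stat_statements.saved_file = 'global/pg_stat_statements.stat'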

>> Other than that it looks good, though I admit I have not had the time
>> to sit down and thoroughly test it yet...
>
> I found another bug in my patch.
>
> [pg_stat_statements-1212.patch # pg_stat_statements()]
>SpinLockAcquire(&entry->mutex);
>values[i++] = Int64GetDatumFast(entry->counters.calls);
>values[i++] = Float8GetDatumFast(entry->counters.total_time);
>values[i++] = Float8GetDatumFast(entry->counters.cpu_user);
>values[i++] = Float8GetDatumFast(entry->counters.cpu_sys);
>values[i++] = Int64GetDatumFast(entry->counters.gets);
>values[i++] = Int64GetDatumFast(entry->counters.reads);
>values[i++] = Int64GetDatumFast(entry->counters.lreads);
>values[i++] = Int64GetDatumFast(entry->counters.rows);
>SpinLockRelease(&entry->mutex);
>
> The variables are actually not protected by the spinlock when float64 and
> int64 are passed by reference (especially on 32bit platforms).
> It'd be better to copy values:
>
>Counterstmp;
>/* copy the actual values in spinlock */
>SpinLockAcquire(&entry->mutex);
>tmp = entry->counters;
>SpinLockRelease(&entry->mutex);
>/* create a tuple after lock is released. */
>values[i++] = Int64GetDatumFast(tmp.calls);
>values[i++] = Float8GetDatumFast(tmp.total_time);
>...

I've only been testing on 64-bit... maybe that's why I never ran into this.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] HAVE_FSEEKO for WIN32

2008-12-22 Thread Bruce Momjian
Andrew Dunstan wrote:
> 
> Cleaning up the parallel restore patch I came across a question I might 
> have asked before, but one which in any case I worked around:
> 
> Why do we carefully define fseeko() for WIN32 but then not define 
> HAVE_FSEEKO, which makes doing the former pretty much pointless?

Well, we are doing something odd here but it might not be what you
think.

We currently use fseeko() only in pg_dump.  We define C code in /port
for some Unix platforms that don't support fseeko.

For platforms that don't support fseeko and don't have /port support for
it we just use fseek() in port.h:

#ifndef HAVE_FSEEKO
#define fseeko(a, b, c) fseek(a, b, c)
#define ftello(a)   ftell(a)
#endif

but then for Win32 we #undef fseeko and redefine it:

#ifdef WIN32
#define pgoff_t __int64
#undef fseeko
#undef ftello
#ifdef WIN32_ONLY_COMPILER
#define fseeko(stream, offset, origin) _fseeki64(stream, offset, origin)
#define ftello(stream) _ftelli64(stream)
#else
#define fseeko(stream, offset, origin) fseeko64(stream, offset, origin)
#define ftello(stream) ftello64(stream)
#endif
#else
#define pgoff_t off_t
#endif

Clearly this code should be moved into port.h, I think.
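
For instance, a consolidated block in port.h might look like this
(a sketch only, not committed code):

#ifdef WIN32
#define pgoff_t __int64
#ifdef WIN32_ONLY_COMPILER
#define fseeko(stream, offset, origin) _fseeki64(stream, offset, origin)
#define ftello(stream) _ftelli64(stream)
#else
#define fseeko(stream, offset, origin) fseeko64(stream, offset, origin)
#define ftello(stream) ftello64(stream)
#endif
#else
#define pgoff_t off_t
#ifndef HAVE_FSEEKO
#define fseeko(a, b, c) fseek(a, b, c)
#define ftello(a) ftell(a)
#endif
#endif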

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] affected rows count

2008-12-22 Thread Grzegorz Jaskiewicz


On 2008-12-22, at 22:35, Dawid Kuroczko wrote:



atlantis=> CREATE OR REPLACE FUNCTION foo_trigger() RETURNS trigger AS
$$ BEGIN UPDATE bar SET t=NEW.t WHERE i=NEW.i; RETURN NULL; END; $$
LANGUAGE plpgsql;
atlantis=> CREATE TRIGGER foo_update BEFORE UPDATE ON foo FOR EACH ROW
EXECUTE PROCEDURE foo_trigger();
CREATE TRIGGER
CREATE FUNCTION
atlantis=> UPDATE foo SET t='##'||t;
UPDATE 0
^^

Grzegorz means such a situation.  Personally I understand the current
behavior to be correct -- since no row in that table is updated.


that's not quite what I meant. It is correct behavior in this  
situation - since we didn't update anything.
Like I said, it is my bad - I should have tested it before emailing
-hackers... So far my ratio of useful emails here is very low. :/



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] affected rows count

2008-12-22 Thread Dawid Kuroczko
On Mon, Dec 22, 2008 at 9:07 PM, Bruce Momjian  wrote:
> Grzegorz Jaskiewicz wrote:
>> Hey folks,
>>
>> It keeps bugging me that postgres will return 0 for the number of
>> affected rows if the table has a trigger.
>> Now, the question is - is this fixable, but no one cares, or is it some
>> sort of design/implementation flaw and we just have to live with it.
>
> Would you show us an example of your problem?

If I understand the problem correctly:

atlantis=> CREATE TABLE foo (i int PRIMARY KEY, t text);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"foo_pkey" for table "foo"
CREATE TABLE
atlantis=> CREATE TABLE bar (i int PRIMARY KEY, t text);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"bar_pkey" for table "bar"
CREATE TABLE
atlantis=> INSERT INTO foo (i,t) SELECT n, '#'||n FROM
generate_series(0,99) AS g(n);
INSERT 0 100
atlantis=> INSERT INTO bar (i) SELECT i FROM foo;
INSERT 0 100
atlantis=> UPDATE foo SET t='##'||t;
UPDATE 100

atlantis=> CREATE OR REPLACE FUNCTION foo_trigger() RETURNS trigger AS
$$ BEGIN UPDATE bar SET t=NEW.t WHERE i=NEW.i; RETURN NULL; END; $$
LANGUAGE plpgsql;
atlantis=> CREATE TRIGGER foo_update BEFORE UPDATE ON foo FOR EACH ROW
EXECUTE PROCEDURE foo_trigger();
CREATE TRIGGER
CREATE FUNCTION
atlantis=> UPDATE foo SET t='##'||t;
UPDATE 0
^^

Grzegorz means such a situation.  Personally I understand the current
behavior to be correct -- since no row in that table is updated.

OTOH when you use triggers for emulating table partitioning it leads
to confusion (parent table was not updated, but the child table is
(or isn't because there were really 0 rows updated -- you can't really tell)).
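
For comparison, a sketch: if the BEFORE trigger returns NEW instead of
NULL, the update on foo itself proceeds and the reported count is
preserved (same tables as above):

atlantis=> CREATE OR REPLACE FUNCTION foo_trigger() RETURNS trigger AS
$$ BEGIN UPDATE bar SET t=NEW.t WHERE i=NEW.i; RETURN NEW; END; $$
LANGUAGE plpgsql;
atlantis=> UPDATE foo SET t='##'||t;
UPDATE 100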

   Best regards,
 Dawid
-- 
  ..``The essence of real creativity is a certain
 : *Dawid Kuroczko* : playfulness, a flitting from idea to idea
 : qne...@gmail.com : without getting bogged down by fixated demands.''
 `..'  Sherkaner Underhill, A Deepness in the Sky, V. Vinge

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] affected rows count

2008-12-22 Thread Grzegorz Jaskiewicz


On 2008-12-22, at 21:07, Bruce Momjian wrote:


Grzegorz Jaskiewicz wrote:

Hey folks,

It keeps bugging me that postgres will return 0 for the number of
affected rows if the table has a trigger.
Now, the question is - is this fixable, but no one cares, or is it some
sort of design/implementation flaw and we just have to live with it.


Would you show us an example of your problem?
Dunno what's wrong with me lately. I was under the impression that about  
half a year ago, on 8.1, I wasn't able to get the row count anymore if  
there was a trigger on a table. Well, the affected row count would always  
be 0 then.
But trying now on cvs head, it all works great. heh, I am terribly  
sorry...



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] HAVE_FSEEKO for WIN32

2008-12-22 Thread Andrew Dunstan


Cleaning up the parallel restore patch I came across a question I might 
have asked before, but one which in any case I worked around:


Why do we carefully define fseeko() for WIN32 but then not define 
HAVE_FSEEKO, which makes doing the former pretty much pointless?


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] affected rows count

2008-12-22 Thread Jeff Davis
On Mon, 2008-12-22 at 15:07 -0500, Bruce Momjian wrote:
> Grzegorz Jaskiewicz wrote:
> > Hey folks,
> > 
> > It keeps bugging me that postgres will return 0 for the number of
> > affected rows if the table has a trigger.
> > Now, the question is - is this fixable, but no one cares, or is it some
> > sort of design/implementation flaw and we just have to live with it.
> 
> Would you show us an example of your problem?
> 

This may not be the problem he's talking about, but it's bothered me for
a while that there is no way to control the value returned for the
affected rows.

For instance, if you have an updatable view that uses a function that
updates a table in a remote database, it would be nice to be able to
pass that value back to the client.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] affected rows count

2008-12-22 Thread Bruce Momjian
Grzegorz Jaskiewicz wrote:
> Hey folks,
> 
> It keeps bugging me that postgres will return 0 for the number of
> affected rows if the table has a trigger.
> Now, the question is - is this fixable, but no one cares, or is it some
> sort of design/implementation flaw and we just have to live with it.

Would you show us an example of your problem?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2008-12-22 Thread Alvaro Herrera
Kenneth Marshall wrote:
> Dear PostgreSQL developers,
> 
> I am re-sending this to keep this last change to the
> internal hash function on the radar.

Please add it to the commitfest wiki page,
http://wiki.postgresql.org/wiki/CommitFest_2008-11

Thanks

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1

2008-12-22 Thread Kenneth Marshall
Dear PostgreSQL developers,

I am re-sending this to keep this last change to the
internal hash function on the radar.

Ken

Sorry about the delay for this update to the new hash
index implementation. I was trying to get the WAL logging
in place and forgot to post the actual patch. The WAL
for hash indexes will need to wait for 8.5, but I did
want to add back in the piece of the Bob Jenkins 2006
hash function that was stripped out of the initial
patch on application due to concerns about the randomness
of the resulting hash values. Here is a re-post of my
initial findings comparing the old/new Jenkins hash
from lookup2 and lookup3. I have added a third column
containing the results for the hash_any() resulting
from the attached patch as well as simple timing test
for a DB reindex both before and after patching.

Also attached is a simple documentation patch updating
the note attached to the hash index description.

Regards,
Ken

Hi,

I have finally had a chance to do some investigation on
the performance of the old hash mix() function versus
the updated mix()/final() in the new hash function. Here
is a table of my current results for both the old and the
new hash function. In this case cracklib refers to the
cracklib-dict containing 1648379 unique words massaged
in various ways to generate input strings for the hash
functions. The result is the number of collisions in the
hash values generated.

hash input                              old    new    newv2
----------                              ---    ---    -----
cracklib                                338    316    338
cracklib x 2 (i.e. clibclib)            305    319    300
cracklib x 3 (clibclibclib)             323    329    315
cracklib x 10                           302    310    329
cracklib x 100                          350    335    298
cracklib x 1000                         314    309    315
cracklib x 100 truncated to char(100)   311    327    320

uint32 from 1-1648379                   309    319    347
(uint32 1-1948379)*256                  309    314    304
(uint32 1-1948379)*16                   310    314    324
"a"uint32 (i.e. a1,a0002...)            320    321    312

uint32uint32 (i.e. uint64)              321    287    309

The different result columns are old = Jenkins 1996 hash
function (lookup2.c), new = Jenkins 2006 hash function
(lookup3.c), and newv2 = adaptation of current hash_any()
to incorporate the separate mix()/final() functions. As
you can see from the results, splitting the mix() and final()
apart does not result in any perceptible loss of randomness
in the hash assignment. I also ran a crude timing for a
reindex of the following database:

CREATE TABLE dict (word text);
CREATE INDEX wordhash ON dict USING hash (word);
INSERT INTO dict (word) VALUES('s;lhkjdpyoijxfg;lktjgh;sdlhkjo');
INSERT INTO dict (SELECT MAX(word)||MAX(word) FROM dict);
... (21 times)

REINDEX TABLE
...

The average time to reindex the table using our current
hash_any() without the separate mix()/final() was 1696ms,
versus 1482ms with the separate mix()/final() stages, giving
almost 13% better performance for this stupid metric.
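
For readers without the patch at hand, the mix()/final() pair in
question is essentially Bob Jenkins' 2006 lookup3.c macros, reproduced
here as a reference sketch (consult lookup3.c for the authoritative
constants):

#define rot(x,k) (((x)<<(k)) | ((x)>>(32-(k))))

/* thoroughly mix three 32-bit values, reversibly */
#define mix(a,b,c) \
{ \
  a -= c;  a ^= rot(c, 4);  c += b; \
  b -= a;  b ^= rot(a, 6);  a += c; \
  c -= b;  c ^= rot(b, 8);  b += a; \
  a -= c;  a ^= rot(c,16);  c += b; \
  b -= a;  b ^= rot(a,19);  a += c; \
  c -= b;  c ^= rot(b, 4);  b += a; \
}

/* final mixing of three 32-bit values (a,b,c) into c */
#define final(a,b,c) \
{ \
  c ^= b; c -= rot(b,14); \
  a ^= c; a -= rot(c,11); \
  b ^= a; b -= rot(a,25); \
  c ^= b; c -= rot(b,16); \
  a ^= c; a -= rot(c, 4); \
  b ^= a; b -= rot(a,14); \
  c ^= b; c -= rot(b,24); \
}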
--- indices.sgml2008-10-13 14:40:06.0 -0500
+++ indices.sgml.NEW2008-11-04 12:42:35.0 -0600
@@ -190,13 +190,11 @@
 
   

-Testing has shown PostgreSQL's hash
-indexes to perform no better than B-tree indexes, and the
-index size and build time for hash indexes is much worse.
-Furthermore, hash index operations are not presently WAL-logged,
+PostgreSQL's hash indexes provide
+the fast O(1) lookups, even for very large objects.
+Hash index operations are not presently WAL-logged,
 so hash indexes might need to be rebuilt with REINDEX
-after a database crash.
-For these reasons, hash index use is presently discouraged.
+after a database crash. 

   
 
--- hashfunc.c.ORIG 2008-09-03 13:07:14.0 -0500
+++ hashfunc.c.NEW  2008-11-04 08:36:16.0 -0600
@@ -200,39 +200,94 @@
  * hash function, see http://burtleburtle.net/bob/hash/doobs.html,
  * or Bob's article in Dr. Dobb's Journal, Sept. 1997.
  *
- * In the current code, we have adopted an idea from Bob's 2006 update
- * of his hash function, which is to fetch the data a word at a time when
- * it is suitably aligned.  This makes for a useful speedup, at the cost
- * of having to maintain four code paths (aligned vs unaligned, and
- * little-endian vs big-endian).  Note that we have NOT adopted his newer
- * mix() function, which is faster but may sacrifice some randomness.
+ * In the current code, we have adopted Bob's 2006 update of his hash
+ * which fetches the data a word at a time when it is suitably aligned.
+ * This makes for a useful speedup, at the cost of having to maintain
+ * four code paths (aligned vs unaligned, and little-endian vs big-endian).
+ * It also uses two separate mixing functions mix() and final() instead
+ * of a single multi-purpose function, that is s

Re: [HACKERS] Some semantic details of the window-function spec

2008-12-22 Thread Hitoshi Harada
2008/12/23 Tom Lane :
> * Unlike aggregates, there doesn't seem to be any concept of a window
> function being attached to an outer-level query --- in fact 6.10 rule
> 4 says that a window function's argument can't contain outer references
> at all.  That seems excessively strong, but it does seem that there is
> no point in a "winlevelsup" field in WindowFunc, nor in implementing
> any infrastructure for outer-level window functions.

I am so ignorant that I don't know what exactly agglevelsup
represents. I just copied it from agg to window functions... Could
someone show me an easy example?

>
> * The last part of section 4.14 states that two different window
> functions must be evaluated against the same sorted row ordering if
> they have syntactically identical partition and ordering clauses,
> even if the windows are otherwise distinct (in particular there
> could be different framing clauses).  Since we don't currently implement
> framing clauses the latter is not too interesting, but it's still true
> that the patch as I currently have it doesn't fully meet that
> requirement: if you intentionally specify separate but equivalent named
> window definitions, it won't be smart enough to fold them together,
> and you could end up with extra sorts happening and possibly a different
> ordering of sort-peer rows.  How worried are we about that?

Is it? I intended all equivalent windows to be folded into one as long as
equal(w1->partitionClause, w2->partitionClause) &&
equal(w1->orderClause, w2->orderClause) is true. And I believe
every time it's true, we can process them in the same window ordering.
One thing: in the former discussion someone pointed out to me that we
might have to care about volatile functions contained in the window
definition. I currently don't have any idea about that.


Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map and freezing

2008-12-22 Thread Heikki Linnakangas

Heikki Linnakangas wrote:

Peter Eisentraut wrote:

Heikki Linnakangas wrote:
I think we need a threshold similar to autovacuum_freeze_max_age for 
manual vacuums as well: vacuum_freeze_max_age. If you run VACUUM, and 
relfrozenxid is older than vacuum_freeze_max_age, the visibility map 
is ignored and all pages are scanned.


Would one parameter to control both suffice?  (i.e., rename 
autovacuum_freeze_max_age to vacuum_freeze_max_age)


Imagine that you run a nightly VACUUM from cron, and have autovacuum 
disabled. If autovacuum_freeze_max_age is the same as 
vacuum_freeze_max_age, as soon as that age is reached, an 
anti-wraparound autovacuum is launched. What we'd want to happen is for the 
next nightly VACUUM to do the work instead. So they need to be separate 
settings, with some space between them by default.


Attached is a proposed patch to handle freezing. In a nutshell:

Introduce new vacuum_freeze_max_age setting. Manual VACUUM will scan the 
whole table and advance relfrozenxid, if relfrozenxid is older than 
vacuum_freeze_max_age.


If vacuum_freeze_max_age is greater than autovacuum_freeze_max_age - 1 
million transactions, it's effectively capped at that value. It doesn't 
make sense to have vacuum_freeze_max_age > autovacuum_freeze_max_age, 
because the manual VACUUM wouldn't have a chance to do the full sweep 
before the anti-wraparound autovacuum is launched. The "minus one 
million transactions" is to give some headroom.


I'm lowering vacuum_freeze_min_age from 100 million transactions to 50 
million, so that the whole-table vacuum doesn't need to run as often. 
Note that since VACUUM normally only scans pages that need vacuuming 
according to the visibility map, tuples on skipped pages are not frozen 
any earlier even though vacuum_freeze_min_age is lower.


To recap, here's the new defaults:
autovacuum_freeze_max_age   200000000
vacuum_freeze_max_age   150000000
vacuum_freeze_min_age   50000000

This means that with defaults, autovacuum will launch a whole-table 
vacuum every 150 million transactions (autovacuum_freeze_max_age - 
vacuum_freeze_min_age). If you run a nightly manual VACUUM, you'll get a 
whole-table vacuum every 100 million transactions.


vacuum_freeze_max_age also affects autovacuums. If an autovacuum is 
launched on a table to remove dead tuples, and vacuum_freeze_max_age has 
been reached (but not yet autovacuum_freeze_max_age), the autovacuum 
will scan the whole table. I'm not sure if this is desirable, to avoid 
having to launch separate anti-wraparound autovacuums even when there's 
not many dead tuples, or just confusing.


If you set vacuum_freeze_max_age to 0, the visibility map is not used to 
skip pages, so you'll get the pre-8.4 old behavior.
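
In postgresql.conf terms, the proposed defaults and the resulting
cadence (a recap of the numbers above):

autovacuum_freeze_max_age = 200000000   # 200 million transactions
vacuum_freeze_max_age     = 150000000   # 150 million transactions
vacuum_freeze_min_age     = 50000000    # 50 million transactions

# whole-table vacuum cadence:
#   autovacuum:     200M - 50M = 150M transactions
#   nightly VACUUM: 150M - 50M = 100M transactions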


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7493ca9..9848ce0 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3925,6 +3925,26 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  vacuum_freeze_max_age (integer)
+  
+   vacuum_freeze_max_age configuration parameter
+  
+  
+   
+VACUUM performs a whole-table scan if the table's
+pg_class.relfrozenxid field reaches the
+age specified by this setting.  The default is 150 million
+transactions.  Although users can set this value anywhere from zero to
+one billion, VACUUM will silently limit the effective value
+to the value of  minus
+1 million transactions, so that regular manual VACUUM has a
+chance to run before autovacuum is launched to prevent XID wraparound.
+For more information see .
+   
+  
+ 
+
  
   vacuum_freeze_min_age (integer)
   
@@ -3935,7 +3955,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
 Specifies the cutoff age (in transactions) that VACUUM
 should use to decide whether to replace transaction IDs with
 FrozenXID while scanning a table.
-The default is 100 million transactions.  Although
+The default is 50 million transactions.  Although
 users can set this value anywhere from zero to one billion,
 VACUUM will silently limit the effective value to half
 the value of , so
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 03ce2e9..c41d464 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -367,10 +367,14 @@

 

-VACUUM's behavior is controlled by the configuration parameter
-: any XID older than
-vacuum_freeze_min_age transactions is replaced by
-FrozenXID.  Larger values of vacuum_freeze_min_age
+VACUUM's behavior is controlled by the two configuration
+parameters:  and
+.
+vacuum_freeze_max_age controls when VACUUM
+performs a full sweep of the table to replace old XID v

Re: [HACKERS] incoherent view of serializable transactions

2008-12-22 Thread Kevin Grittner
>>> Martijn van Oosterhout  wrote:
> On Mon, Dec 22, 2008 at 11:00:53AM -0600, Kevin Grittner wrote:
>> As I've understood limitations of the PostgreSQL implementation of
>> SERIALIZABLE transactions, at least the only example given in the
>> documentation, revolve around a rather unlikely situation:
>>  
>> Given concurrent transactions T1 and T2 and non-overlapping sets of
>> data A and B, T1 reads data including A and uses the data to modify B
>> while T2 reads data including B and uses that data to modify A, where
>> the modifications performed by either would affect the modifications
>> made by the other, if they were visible.
>
> In so far as the "modifications" are just INSERTs (no UPDATEs or
> DELETEs), yes. This case is covered in the documentation.
 
Let's not understate the scope of the issue.
UPDATEs and DELETEs count, too.  For example:
 
-- connection 1
cir=> create table mytab (class int not null, value int not null);
CREATE TABLE
cir=> copy mytab from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1	10
>> 1	20
>> 2	100
>> 2	200
>> \.
cir=> start transaction isolation level serializable;
START TRANSACTION
cir=> update mytab set value = (select sum(value) from mytab where
class = 2) where class = 1 and value = 10;
UPDATE 1
 
-- connection 2
cir=> start transaction isolation level serializable;
START TRANSACTION
cir=> update mytab set value = (select sum(value) from mytab where
class = 1) where class = 2 and value = 100;
UPDATE 1
cir=> commit transaction;
COMMIT
 
-- connection 1
cir=> commit transaction;
COMMIT
cir=> select * from mytab;
 class | value
-------+-------
     1 |    20
     2 |   200
     1 |   300
     2 |    30
(4 rows)
 
>> Imagine, as an example, a system which involves recording receipts,
>> each of which must go into a daily deposit.  There is a control table
>> with one row containing the current deposit date for receipts. 
>> Somewhere mid-afternoon that date is updated, all subsequent receipts
>> fall into the new day, and a report is run listing the receipts for the
>> day and giving the deposit total.
>
> This is a variation of the above and has the same "proper" solution:
> predicate locking. However, in this case the records in question are
> already present so you can work around it easily. First do a SELECT FOR
> UPDATE on all the records you want to update. This will serialize all
> parallel transactions to either before or after you. Then do your
> update.
 
My point isn't that there aren't workarounds, it is that people might
reasonably assume that SERIALIZABLE transactions provide sufficient
concurrency control for this, since the only example we give of a
problem is a rather contrived update anomaly.  The fact that even in
cases where the data settles into good form at commit leave windows
where race conditions could cause occasional bad results without extra
explicit locking is not obvious.
 
>> This absolutely can't happen in a standard-compliant implementation.
>> At a minimum, this window where visible data lacks coherency should be
>> noted in the documentation.  I don't know if there's any way to fix
>> this without killing performance.
>
> Predicate locking is nasty and we don't try. I'm not sure if anybody
> else does.
 
I know for a fact that Sybase ASE does.  I've heard from reasonably
reliable sources that DB2 does.  I know that Microsoft SQL Server did
for some time after the split from the Sybase code base, but I'm not
sure they've continued that; in fact there was a reference to
concurrency issues in wikipedia which implied that they no longer do.
 
The implementation is not pretty -- they do it by locking accessed
pages and insertion points, and in some cases entire tables.  It is so
ugly that at one point in discussing similar issues Tom said that it
couldn't really qualify as predicate locking, but in the face of the
fact that they have covered all the bases to provide true serializable
transactions, and that theory says that only predicate locking can do
that, he conceded that it was predicate locking -- but really ugly and
in a form he would never support.
 
Anyway, I didn't argue that we should provide truly serializable
transactions, just that we should provide a less contrived example of
where the PostgreSQL implementation can show anomalies, so that people
don't get burned through a false sense of security.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] incoherent view of serializable transactions

2008-12-22 Thread Martijn van Oosterhout
On Mon, Dec 22, 2008 at 11:00:53AM -0600, Kevin Grittner wrote:
> As I've understood limitations of the PostgreSQL implementation of
> SERIALIZABLE transactions, at least the only example given in the
> documentation, revolve around a rather unlikely situation:
>  
> Given concurrent transactions T1 and T2 and non-overlapping sets of
> data A and B, T1 reads data including A and uses the data to modify B
> while T2 reads data including B and uses that data to modify A, where
> the modifications performed by either would affect the modifications
> made by the other, if they were visible.

In so far as the "modifications" are just INSERTs (no UPDATEs or
DELETEs), yes. This case is covered in the documentation.

> Imagine, as an example, a system which involves recording receipts,
> each of which must go into a daily deposit.  There is a control table
> with one row containing the current deposit date for receipts. 
> Somewhere mid-afternoon that date is updated, all subsequent receipts
> fall into the new day, and a report is run listing the receipts for the
> day and giving the deposit total.

This is a variation of the above and has the same "proper" solution:
predicate locking. However, in this case the records in question are
already present so you can work around it easily. First do a SELECT FOR
UPDATE on all the records you want to update. This will serialize all
parallel transactions to either before or after you. Then do your
update.
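
A generic sketch of that pattern (hypothetical table; whether it fully
closes the window in Kevin's scenario is debated in the replies):

-- transaction A
begin;
select * from accounts where id in (1, 2) for update;  -- row locks taken here
update accounts set balance = balance - 10 where id = 1;
commit;
-- a concurrent transaction doing SELECT ... FOR UPDATE on the same rows
-- blocks until A commits, so the two serialize one way or the other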

> This absolutely can't happen in a standard-compliant implementation.
> At a minimum, this window where visible data lacks coherency should be
> noted in the documentation.  I don't know if there's any way to fix
> this without killing performance.

Predicate locking is nasty and we don't try. I'm not sure if anybody
else does.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while 
> boarding. Thank you for flying nlogn airlines.




[HACKERS] incoherent view of serializable transactions

2008-12-22 Thread Kevin Grittner
As I've understood limitations of the PostgreSQL implementation of
SERIALIZABLE transactions, at least the only example given in the
documentation, revolve around a rather unlikely situation:
 
Given concurrent transactions T1 and T2 and non-overlapping sets of
data A and B, T1 reads data including A and uses the data to modify B
while T2 reads data including B and uses that data to modify A, where
the modifications performed by either would affect the modifications
made by the other, if they were visible.
 
For reasons I'll omit here, that scenario didn't worry me for my
current uses of PostgreSQL.
 
I've found another form of deviation from the standard SERIALIZABLE
behavior, though, which does worry me.  Although the above appears to be
the only situation where the end result after everything commits is
inconsistent with standard SERIALIZABLE behavior, the PostgreSQL
implementation allows transactions to view the data in states which
would never be possible during the application of the transactions in
series in the order they will appear to have been applied after the
commit.
 
Imagine, as an example, a system which involves recording receipts,
each of which must go into a daily deposit.  There is a control table
with one row containing the current deposit date for receipts. 
Somewhere mid-afternoon that date is updated, all subsequent receipts
fall into the new day, and a report is run listing the receipts for the
day and giving the deposit total.
 
Under a standard-compliant implementation of SERIALIZABLE, this is
straightforward: a transaction which is inserting a receipt selects the
deposit date to use in its transaction, and any SELECT of receipts for a
date prior to the current deposit date will see the accurate, final
data.  Under the PostgreSQL implementation, although data eventually
gets to a coherent state, there can be a window of time where a SELECT
can return an incomplete list of receipts for a date which appears to be
closed, even if all transactions for modifying and viewing data are
SERIALIZABLE.
 
-- setup
create table ctl (k text not null primary key, deposit_date date not
null);
insert into ctl values ('receipt', date '2008-12-22');
create table receipt (receipt_no int not null primary key, deposit_date
date not null, amount numeric(13,2));
insert into receipt values (1, (select deposit_date from ctl where k =
'receipt'), 1.00);
insert into receipt values (2, (select deposit_date from ctl where k =
'receipt'), 2.00);

-- connection 1
start transaction isolation level serializable ;
insert into receipt values (3, (select deposit_date from ctl where k =
'receipt'), 4.00);

-- connection 2
start transaction isolation level serializable ;
update ctl set deposit_date = date '2008-12-23' where k = 'receipt';
commit transaction;
start transaction isolation level serializable ;
select * from ctl;
-- (deposit_date shows as 2008-12-23)
select * from receipt;
-- (Only receipts 1 and 2 show for 2008-12-22.)
commit;

-- connection 1
commit transaction;

-- connection 2
start transaction isolation level serializable ;
select * from receipt;
-- (All receipts for the 2008-12-22 deposit date now show.)
commit transaction;
 
At this point, SERIALIZABLE transactions appear to have worked, with
receipt 3 happening before the update of deposit_date; however, there
was a window of time when the update to deposit_date was visible and
receipt 3 was not.
 
This absolutely can't happen in a standard-compliant implementation.
At a minimum, this window where visible data lacks coherency should be
noted in the documentation.  I don't know if there's any way to fix
this without killing performance.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Some semantic details of the window-function spec

2008-12-22 Thread Tom Lane
After a couple of hours reading the SQL:2008 spec, I've come to some
conclusions about the semantics that are demanded for window functions.
Anyone want to to check my work?

* If window functions are used together with aggregates or grouping,
the grouping and regular aggregation happens first, and then windowing
is done on the output rows (group rows).  See section 7.12 rules 11,12.

* Therefore, the argument of a window function can contain aggregate
functions, and cannot reference ungrouped columns (both unlike regular
aggregates).  Conversely, an ordinary aggregate's argument can't
contain a window function.
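
For instance (hypothetical table emp), this should be legal: grouping
and aggregation happen first, and the window function then ranks the
group rows, with sum(salary) appearing inside the window's ORDER BY:

select dept,
       sum(salary) as dept_total,
       rank() over (order by sum(salary) desc) as dept_rank
from emp
group by dept;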

* Also, window specifications can contain aggregate functions, and
cannot reference ungrouped columns; this is because they represent
another layer of grouping/ordering on top of the GROUP BY if any.
(An explicit ORDER BY, if any, happens last of all.)

* It is not very clear whether a window function's argument can contain
another window function.  7.11 forbids window specifications from
containing window functions, which is sensible, but I can't find any
such statement about the arguments.  The present patch forbids nesting
window functions, and I'm fine with imposing that as an implementation
restriction even if it's not in the spec; but can anyone find it in the
spec?

* Unlike aggregates, there doesn't seem to be any concept of a window
function being attached to an outer-level query --- in fact 6.10 rule
4 says that a window function's argument can't contain outer references
at all.  That seems excessively strong, but it does seem that there is
no point in a "winlevelsup" field in WindowFunc, nor in implementing
any infrastructure for outer-level window functions.

* The last part of section 4.14 states that two different window
functions must be evaluated against the same sorted row ordering if
they have syntactically identical partition and ordering clauses,
even if the windows are otherwise distinct (in particular there
could be different framing clauses).  Since we don't currently implement
framing clauses the latter is not too interesting, but it's still true
that the patch as I currently have it doesn't fully meet that
requirement: if you intentionally specify separate but equivalent named
window definitions, it won't be smart enough to fold them together,
and you could end up with extra sorts happening and possibly a different
ordering of sort-peer rows.  How worried are we about that?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DTrace probes patch

2008-12-22 Thread Robert Lor

Tom Lane wrote:

Robert Lor  writes:
  

Tom Lane wrote:


I agree.  If the probe is meant to track only *some* WAL writes
then it needs to be named something less generic than
TRACE_POSTGRESQL_WAL_BUFFER_WRITE.

  

How about change it to TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY similar to 
TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY for shared buffers?



Works for me...



Attached is the patch for the above name change.


-Robert
Index: src/backend/access/transam/xlog.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.324
diff -u -3 -p -r1.324 xlog.c
--- src/backend/access/transam/xlog.c   17 Dec 2008 01:39:03 -  1.324
+++ src/backend/access/transam/xlog.c   22 Dec 2008 16:28:00 -
@@ -1318,14 +1318,14 @@ AdvanceXLInsertBuffer(bool new_segment)
 * Have to write buffers while holding insert lock. This is
 * not good, so only write as much as we absolutely must.
 */
-   TRACE_POSTGRESQL_WAL_BUFFER_WRITE_START();
+   TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_START();
WriteRqst.Write = OldPageRqstPtr;
WriteRqst.Flush.xlogid = 0;
WriteRqst.Flush.xrecoff = 0;
XLogWrite(WriteRqst, false, false);
LWLockRelease(WALWriteLock);
Insert->LogwrtResult = LogwrtResult;
-   TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DONE();
+   TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
}
}
}
Index: src/backend/utils/probes.d
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/probes.d,v
retrieving revision 1.4
diff -u -3 -p -r1.4 probes.d
--- src/backend/utils/probes.d  17 Dec 2008 01:39:04 -  1.4
+++ src/backend/utils/probes.d  22 Dec 2008 16:28:01 -
@@ -89,6 +89,6 @@ provider postgresql {
 
probe xlog__insert(unsigned char, unsigned char);
probe xlog__switch();
-   probe wal__buffer__write__start();
-   probe wal__buffer__write__done();
+   probe wal__buffer__write__dirty__start();
+   probe wal__buffer__write__dirty__done();
 };

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] generic reloptions improvement

2008-12-22 Thread Alvaro Herrera
ITAGAKI Takahiro wrote:

> Alvaro Herrera  wrote:
> 
> > Here's a patch for improving the general reloptions mechanism.  What
> > this patch does is add a table-based option parser.  This allows adding
> > new options very easily, and stops the business of having to pass the
> > minimum and default fillfactor each time you want the reloptions
> > processed.
> 
> You use struct relopt_gen (and its subclasses) for the purpose of
> both "definition of options" and "parsed result". But I think
> it is cleaner to separate parsed results into another struct,
> something like:

Thanks for the suggestion -- yes, it is better as you suggest.  I think
putting the default on the same struct was just out of laziness at
first, and inertia later.

Here's the next version, which also fixes some particularly embarrassing
bugs.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Index: src/backend/access/common/reloptions.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/access/common/reloptions.c,v
retrieving revision 1.11
diff -c -p -r1.11 reloptions.c
*** src/backend/access/common/reloptions.c	23 Jul 2008 17:29:53 -0000	1.11
--- src/backend/access/common/reloptions.c	22 Dec 2008 16:14:40 -0000
***
*** 15,20 
--- 15,23 
  
  #include "postgres.h"
  
+ #include "access/gist_private.h"
+ #include "access/hash.h"
+ #include "access/nbtree.h"
  #include "access/reloptions.h"
  #include "catalog/pg_type.h"
  #include "commands/defrem.h"
***
*** 22,29 
--- 25,157 
  #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/guc.h"
+ #include "utils/memutils.h"
  #include "utils/rel.h"
  
+ /*
+  * Contents of pg_class.reloptions
+  *
+  * To add an option:
+  *
+  * (i) decide on a class (integer, real, bool), name, default value, upper
+  * and lower bounds (if applicable).
+  * (ii) add a record below.
+  * (iii) add it to StdRdOptions if appropriate
+  * (iv) add a block to the appropriate handling routine (probably
+  * default_reloptions)
+  * (v) don't forget to document the option
+  *
+  * Note that we don't handle "oids" in relOpts because it is handled by
+  * interpretOidsOption().
+  */
+ 
+ static relopt_bool boolRelOpts[] =
+ {
+ 	/* list terminator */
+ 	{ { NULL } }
+ };
+ 
+ static relopt_int intRelOpts[] =
+ {
+ 	{
+ 		{
+ 			"fillfactor",
+ 			"Packs table pages only to this percentage",
+ 			RELOPT_KIND_HEAP
+ 		},
+ 		HEAP_DEFAULT_FILLFACTOR, HEAP_MIN_FILLFACTOR, 100
+ 	},
+ 	{
+ 		{
+ 			"fillfactor",
+ 			"Packs btree index pages only to this percentage",
+ 			RELOPT_KIND_BTREE
+ 		},
+ 		BTREE_DEFAULT_FILLFACTOR, BTREE_MIN_FILLFACTOR, 100
+ 	},
+ 	{
+ 		{
+ 			"fillfactor",
+ 			"Packs hash index pages only to this percentage",
+ 			RELOPT_KIND_HASH
+ 		},
+ 		HASH_DEFAULT_FILLFACTOR, HASH_MIN_FILLFACTOR, 100
+ 	},
+ 	{
+ 		{
+ 			"fillfactor",
+ 			"Packs gist index pages only to this percentage",
+ 			RELOPT_KIND_GIST
+ 		},
+ 		GIST_DEFAULT_FILLFACTOR, GIST_MIN_FILLFACTOR, 100
+ 	},
+ 	/* list terminator */
+ 	{ { NULL } }
+ };
+ 
+ static relopt_real realRelOpts[] =
+ {
+ 	/* list terminator */
+ 	{ { NULL } }
+ };
+ 
+ static relopt_gen **relOpts = NULL;
+ 
+ static void parse_one_reloption(relopt_value *option, char *text_str,
+ 	int text_len, bool validate);
+ 
+ /*
+  * initialize_reloptions
+  * 		initialization routine, must be called at backend start
+  *
+  * Initialize the relOpts array and fill each variable's type and name length.
+  */
+ void
+ initialize_reloptions(void)
+ {
+ 	int		i;
+ 	int		j = 0;
+ 
+ 	for (i = 0; boolRelOpts[i].gen.name; i++)
+ 		j++;
+ 	for (i = 0; intRelOpts[i].gen.name; i++)
+ 		j++;
+ 	for (i = 0; realRelOpts[i].gen.name; i++)
+ 		j++;
+ 
+ 	if (relOpts)
+ 		pfree(relOpts);
+ 	relOpts = MemoryContextAlloc(TopMemoryContext,
+  (j + 1) * sizeof(relopt_gen *));
+ 
+ 	j = 0;
+ 	for (i = 0; boolRelOpts[i].gen.name; i++)
+ 	{
+ 		relOpts[j] = &boolRelOpts[i].gen;
+ 		relOpts[j]->type = RELOPT_TYPE_BOOL;
+ 		relOpts[j]->namelen = strlen(relOpts[j]->name);
+ 		j++;
+ 	}
+ 
+ 	for (i = 0; intRelOpts[i].gen.name; i++)
+ 	{
+ 		relOpts[j] = &intRelOpts[i].gen;
+ 		relOpts[j]->type = RELOPT_TYPE_INT;
+ 		relOpts[j]->namelen = strlen(relOpts[j]->name);
+ 		j++;
+ 	}
+ 
+ 	for (i = 0; realRelOpts[i].gen.name; i++)
+ 	{
+ 		relOpts[j] = &realRelOpts[i].gen;
+ 		relOpts[j]->type = RELOPT_TYPE_REAL;
+ 		relOpts[j]->namelen = strlen(relOpts[j]->name);
+ 		j++;
+ 	}
+ 
+ 	/* add a list terminator */
+ 	relOpts[j] = NULL;
+ }
  
  /*
   * Transform a relation options list (list of DefElem) into the text array
*** transformRelOptions(Datum oldOptions, Li
*** 73,81 
  
  		for (i = 0; i < noldoptions; i++)
  		{
! 			text	   *oldoption = DatumGetTextP(oldoptions[i]);
! 			char	   *text_str = VARDATA(oldoption);
! 			int			text_len = VARSIZE(oldoption) - VARHDR
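
As a concrete instance of step (ii) in the comment block above, a new
integer option would be registered by appending an entry to
intRelOpts[].  The option name, description, and bounds below are
invented purely for illustration and are not part of the patch:

	{
		{
			"example_threshold",	/* hypothetical option name */
			"Example per-table integer option (illustration only)",
			RELOPT_KIND_HEAP
		},
		10, 0, 1000				/* default, minimum, maximum */
	},

Steps (iii) through (v) -- wiring the value into StdRdOptions, handling
it in the appropriate routine (probably default_reloptions), and
documenting it -- would still apply.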

[HACKERS] encoding cleanups in cvs repo

2008-12-22 Thread Magnus Hagander
I have cleaned up a couple of badly broken encodings in cvs commit
messages in:

src/backend/utils/error/Attic/exc.c,v


per discussion with Alvaro, by simply editing the RCS file directly in
the cvs repository. Only the commit messages were changed (the broken
characters were removed).

Should have no effect. In case something broke, I have extra backups :-)

//Magnus

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] reloptions and toast tables

2008-12-22 Thread Jaime Casanova
On 12/20/08, Alvaro Herrera  wrote:
>
> ALTER TABLE foo SET (TOAST autovacuum_enabled = false);
...
> ALTER TABLE foo SET TOAST (autovacuum_enabled = false);
>

I will be happy with either of these options (actually I prefer the
second one, but I don't have a strong argument against the first).

-- 
Sincerely,
Jaime Casanova
PostgreSQL support and training
Systems consulting and development
Guayaquil - Ecuador
Cel. +59387171157

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

2008-12-22 Thread Joshua Tolley
On Sun, Dec 21, 2008 at 10:25:59PM -0500, Robert Haas wrote:
> [Some performance testing.]

I (finally!) have a chance to post my performance testing results... my
apologies for the really long delay. 

Unfortunately I'm not seeing wonderful speedups with the particular
queries I did in this case. I generated three 1GB datasets, with skews
set at 1, 2, and 3. The test script I wrote turns on enable_usestatmcvs
and runs EXPLAIN ANALYZE on the same query five times. Then it turns
enable_usestatmcvs off, and runs the same query five more times. It does
this with each of the three datasets in turn, and then starts over at
the beginning until I tell it to quit. My results showed a statistically
significant improvement in speed only on the skew == 3 dataset. 

I did the same tests twice, once with default_statistics_target set to
10, and once with it set to 100. I've attached boxplots of the total
query times as reported by EXPLAIN ANALYZE ("dst10" in the filename
indicates default_statistics_target was 10, and so on), my results
parsed out of the EXPLAIN ANALYZE output (test.filtered.10 and
test.filtered.100), the results of one-tailed Student's T tests of the
result set (ttests), and the R code to run the tests if anyone's really
interested (t.test.R).
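
For reference, the statistic behind those comparisons -- assuming the
unpooled (Welch) form, which R's t.test() uses by default -- is

	t = (mean_on - mean_off) / sqrt(s_on^2/n_on + s_off^2/n_off)

where the "on"/"off" groups are the runs with enable_usestatmcvs on and
off, and the one-tailed p-value is read from a single tail of the
corresponding t distribution.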

The results data includes six columns: the skew value, whether
enable_usestatmcvs was on or not (represented by a 1 or 0), total times
for each of the three joins that made up the query, and total time for
the query itself. The t-tests above consider only the total
query time.

Finally, the query involved:

SELECT * FROM lineitem l LEFT JOIN part p ON (p.p_partkey = l.l_partkey)
LEFT JOIN orders o ON (o.o_orderkey = l.l_orderkey) LEFT JOIN customer c
ON (c.c_custkey = o.o_custkey);

- Josh / eggyknap
[binary attachments (boxplots) omitted]

SKEW  USESTAT  J1            J2             J3             TOT
1     1        50461.443000  397244.673000  453217.081000  459501.492
1     1        47884.085000  392737.144000  453039.924000  460809.210
1     1        52175.049000  473484.66      518528.66      523864.739
1     1        47127.359000  463970.123000  510257.929000  515556.171
1     1        49382.039000  492098.877000  542123.146000  547503.329
1     0        43094.98      464022.565000  509026.652000  514349.238
1     0        45901.734000  439642.013000  490180.994000  495489.335
1     0        43127.40      430072.203000  475914.797000  481192.279
1     0        42070.676000  375572.825000  423910.457000  429677.988
1     0        56491.288000  498455.906000  551204.091000  557467.631
2     1        58372.411000  461959.358000  508724.227000  514004.653
2     1        55187.182000  451564.246000  497331.791000  502957.730
2     1        61093.577000  443683.358000  493160.552000  498868.413
2     1        55299.883000  482283.701000  541617.568000  548030.650
2     1        54002.928000  499089.964000  544504.041000  549828.715
2     0        56133.232000  452656.945000  501956.569000  507287.362
2     0        56900.88      478264.522000  537943.058000  544455.088
2     0        61512.999000  480176.724000  541688.121000  548684.876
2     0        55106.671000  474847.36      522074.604000  527428.018
2     0        57440.536000  512357.019000  558515.194000  563922.575
3     1        48912.233000  519270.741000  562948.024000  568318.976
3     1        51509.014000  455114.005000  502253.369000  507639.017
3     1        48977.903000  399254.515000  442796.459000  448157.712
3     1        52664.751000  398226.595000  02.503000      449745.454
3     1        57036.981000  498623.476000  541792.07      547105.638
3     0        53972.755000  490592.656000  544792.70      550086.185
3     0        59046.762000  490597.511000  534615.83      539919.402
3     0        49112.387000  517318.422000  574361.142000  581877.479
3     0        50138.407000  499705.817000  545116.168000  550505.373
3     0        48691.832000  510223.136000  564247.448000  570378.601
1     1        68256.834000  496599.31      557998.082000  565697.676
1     1        56864.637000  456848.446000  502898.716000  508340.867
1     1        53933.953000  479646.739000  528711.936000  534046.589
1     1        56468.009000  456499.306000  503936.705000  509286.867
1     1        56117.481000  464881.592000  511655.733000  517015.575
1     0        60140.954000  466226.599000  519332.729000  524760.071
1     0        56106.889000  487886.698000  544010.57      550316.703
1     0        62452.804000  509665.97      556011.068000  561309.527
1     0        58373.154000  468318.808000  515009.584000  520342.427
1     0        52479.479000  499852.717000  546099.564000  551457.608
2     1        58950.898000  487229.024000  535760.246000  541083.469
2     1        77649.141000  542007.659000  596702.949000  602057.034
2     1        66

Re: [HACKERS] Lock conflict behavior?

2008-12-22 Thread Jan Urbański
Tom Lane wrote:
> Tatsuo Ishii  writes:
>> I'm wondering if the following behavior of PostgreSQL regarding lock
>> conflict is expected. Here's a scenario:
> 
>> Session A:
>>  BEGIN;
>>  SELECT * FROM  pg_class limit 1; -- acquires access share lock
> 
>> Session B:
>>  BEGIN;
>>  ALTER TABLE pg_class ;  -- waits to acquire an access
>> exclusive lock (will fail anyway
>> though)
>> Session C:
>>  SELECT * FROM pg_class...;  -- any query that needs
>> to access pg_class will be
>> blocked, too bad...
> 
>> I understand that B should wait to acquire the lock, but should C
>> have to wait too?
> 
> If we didn't do this, then a would-be acquirer of exclusive lock would
> have a very serious problem with lock starvation: it might wait forever
> in the face of a continuous stream of access-share lock requests.

See http://en.wikipedia.org/wiki/Readers-writers_problem

Jan

-- 
Jan Urbanski
GPG key ID: E583D7D2

ouden estin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Automatic view update rules

2008-12-22 Thread Bernd Helmle
--On Wednesday, November 26, 2008 10:54:01 +0100 Bernd Helmle 
 wrote:



--On Tuesday, November 25, 2008 23:43:02 -0500 Robert Haas
 wrote:


Do you intend to submit an updated version of this patch for this
commitfest?


I'll do so ASAP; I've updated the status to 'waiting on author'.


Okay, I've finally managed to create an updated version with (hopefully)
all the issues mentioned by Robert addressed.


--
 Thanks

   Bernd

view_update.patch.bz2
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Lock conflict behavior?

2008-12-22 Thread Tom Lane
Tatsuo Ishii  writes:
> I'm wondering if the following behavior of PostgreSQL regarding lock
> conflict is expected. Here's a scenario:

> Session A:
>   BEGIN;
>   SELECT * FROM  pg_class limit 1; -- acquires access share lock

> Session B:
>   BEGIN;
>   ALTER TABLE pg_class ;  -- waits to acquire an access
>  exclusive lock (will fail anyway
> though)
> Session C:
>   SELECT * FROM pg_class...;  -- any query that needs
>  to access pg_class will be
>  blocked, too bad...

> I understand that B should wait to acquire the lock, but should C
> have to wait too?

If we didn't do this, then a would-be acquirer of exclusive lock would
have a very serious problem with lock starvation: it might wait forever
in the face of a continuous stream of access-share lock requests.
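
To make the fairness rule concrete, here is a minimal sketch of a
starvation-free grant policy (all names invented for illustration --
this is not PostgreSQL's actual lock manager, which generalizes the
idea across many lock modes):

	#include <stdbool.h>

	typedef struct
	{
		int		shared_holders;		/* granted ACCESS SHARE locks */
		bool	exclusive_held;		/* granted ACCESS EXCLUSIVE lock */
		int		exclusive_waiters;	/* exclusive requests queued */
	} LockState;

	/*
	 * A new shared request is granted only if no exclusive lock is
	 * held AND none is already waiting; the second condition is what
	 * makes session C queue up behind session B above.
	 */
	static bool
	can_grant_shared(const LockState *ls)
	{
		return !ls->exclusive_held && ls->exclusive_waiters == 0;
	}

	static bool
	can_grant_exclusive(const LockState *ls)
	{
		return !ls->exclusive_held && ls->shared_holders == 0;
	}

Without the exclusive_waiters check, a continuous stream of shared
requests would keep shared_holders nonzero forever and the exclusive
waiter would never be granted.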

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] a small proposal for avoiding foot-shooting

2008-12-22 Thread Andrew Gierth
> "Albe" == Albe Laurenz  writes:

 Albe> I *guess* it is the problem addressed by

 Albe> http://archives.postgresql.org/pgsql-committers/2008-04/msg00275.php
 Albe> and
 Albe> http://archives.postgresql.org/pgsql-committers/2008-04/msg00358.php

No; the problem is with stop -mimmediate (not -mfast), or any other
PANIC shutdown of the postmaster during backup.

-- 
Andrew.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] a small proposal for avoiding foot-shooting

2008-12-22 Thread Albe Laurenz
Tom Lane wrote:
>> I propose that this behaviour be changed such that 'terse' is ignored
>> for all log messages of FATAL or PANIC severity.
>> [ on the strength of a single example ]
> 
[...]
> 
> It seems like it might be better to rephrase error messages to ensure
> that anything really critical is mentioned in the primary message.
> In this case, perhaps instead of
>   errmsg("could not locate required checkpoint record")
> we could have it print
>   errmsg("could not locate checkpoint record specified in file 
> \"%s/backup_label\".", DataDir)
> assuming we did actually get the location from there.
> 
> Anyway, you've omitted a lot of details that would be necessary
> to judge exactly what was misleading about what the DBA saw.

I *guess* it is the problem addressed by

http://archives.postgresql.org/pgsql-committers/2008-04/msg00275.php
and
http://archives.postgresql.org/pgsql-committers/2008-04/msg00358.php

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Lock conflict behavior?

2008-12-22 Thread Tatsuo Ishii
Hi,

I'm wondering if the following behavior of PostgreSQL regarding lock
conflict is expected. Here's a scenario:

Session A:
BEGIN;
SELECT * FROM  pg_class limit 1; -- acquires access share lock

Session B:
BEGIN;
ALTER TABLE pg_class ;  -- waits to acquire an access
   exclusive lock (will fail anyway
though)
Session C:
SELECT * FROM pg_class...;  -- any query that needs
   to access pg_class will be
   blocked, too bad...

I understand that B should wait to acquire the lock, but should C
have to wait too?

Also, it seems that an attacker could mount a denial-of-service attack
if he could open sessions A and B, since other users in session C and
subsequent sessions would be blocked.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers