Re: [HACKERS] Sync Rep: Second thoughts
Hi Markus, I'm not quite sure what you mean by "certification protocol", there's no such thing in Postgres-R (as proposed by Kemme). Although I remember having heard that term in the context of F. Pedone's work. Can you point me to some paper explaining this certification protocol? What Bettina calls the Lock Phase in http://www.cs.mcgill.ca/~kemme/papers/vldb00.pdf is actually a certification. You can find more references to certification protocols in http://gorda.di.uminho.pt/download/reports/gapi.pdf I would also recommend the work of Sameh on Tashkent and Tashkent+ that was based on Postgres: http://labos.epfl.ch/webdav/site/labos/users/157494/public/papers/tashkent.eurosys2006.pdf and http://infoscience.epfl.ch/record/97654/files/tashkentPlus.eurosys2007.final.pdf Certification-based approaches already have multiple reliability issues, even though they can improve write performance compared to statement-based replication; that gain is very dependent on the capacity of the system to limit the conflict window for concurrent transactions. What do you mean by "reliability issues"? These approaches usually require an atomic broadcast primitive that tends to be fragile (limited scalability, hard-to-tune failure timeouts, and so on). Most prototype implementations have the load balancer and/or the certifier as a SPOF (single point of failure). Building reliability for these components will come with a significant performance penalty. The writeset extraction mechanisms have had too many limitations so far to allow the use of certification-based replication in production (AFAIK). What limitations are you speaking of here? Oftentimes DDL support is very limited. Non-transactional objects like sequences are not captured. Session or environment variables are not necessarily propagated. Support for temp tables varies between databases, which makes it hard to support them properly in a generic way. Well, I guess everyone has a story about limitations they have found with some database replication technology, especially when a user expects a cluster to behave like a single database instance. Happy holidays, Emmanuel -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Visibility map and freezing
On Tue, Dec 23, 2008 at 4:24 AM, Heikki Linnakangas wrote: > Heikki Linnakangas wrote: >> >> Peter Eisentraut wrote: >>> >>> Heikki Linnakangas wrote: I think we need a threshold similar to autovacuum_freeze_max_age for manual vacuums as well: vacuum_freeze_max_age. If you run VACUUM, and relfrozenxid is older than vacuum_freeze_max_age, the visibility map is ignored and all pages are scanned. >>> >>> Would one parameter to control both suffice? (i.e., rename >>> autovacuum_freeze_max_age to vacuum_freeze_max_age) >> >> Imagine that you run a nightly VACUUM from cron, and have autovacuum >> disabled. If autovacuum_freeze_max_age is the same as vacuum_freeze_max_age, >> as soon as that age is reached, an anti-wraparound autovacuum is launched. What >> we'd want to happen is for the next nightly VACUUM to do the work instead. >> So they need to be separate settings, with some space between them by >> default. > > Attached is a proposed patch to handle freezing. In a nutshell: > > Introduce new vacuum_freeze_max_age setting. Manual VACUUM will scan the > whole table and advance relfrozenxid, if relfrozenxid is older than > vacuum_freeze_max_age. > > If vacuum_freeze_max_age is greater than autovacuum_freeze_max_age - 1 > million transactions, it's effectively capped at that value. It doesn't make > sense to have vacuum_freeze_max_age > autovacuum_freeze_max_age, because the > manual VACUUM wouldn't have a chance to do the full sweep before the > anti-wraparound autovacuum is launched. The "minus one million transactions" > is to give some headroom. > > I'm lowering vacuum_freeze_min_age from 100 million transactions to 50 > million, so that the whole-table vacuum doesn't need to run as often. Note > that since VACUUM normally only scans pages that need vacuuming according to > the visibility map, tuples on skipped pages are not frozen any earlier even > though vacuum_freeze_min_age is lower. > > To recap, here's the new defaults: > autovacuum_freeze_max_age 200000000 > vacuum_freeze_max_age 150000000 > vacuum_freeze_min_age 50000000 > > This means that with defaults, autovacuum will launch a whole-table vacuum > every 150 million transactions (autovacuum_freeze_max_age - > vacuum_freeze_min_age). If you run a nightly manual VACUUM, you'll get a > whole-table vacuum every 100 million transactions. > > vacuum_freeze_max_age also affects autovacuums. If an autovacuum is launched > on a table to remove dead tuples, and vacuum_freeze_max_age has been reached > (but not yet autovacuum_freeze_max_age), the autovacuum will scan the whole > table. I'm not sure if this is desirable, to avoid having to launch separate > anti-wraparound autovacuums even when there are not many dead tuples, or just > confusing. > > If you set vacuum_freeze_max_age to 0, the visibility map is not used to > skip pages, so you'll get the pre-8.4 old behavior. It seems strange that "max" can be less than "min". Is it worth dividing the parameter into two (min/max)? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
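For readers wanting to see where an installation stands relative to these thresholds, the age that both settings are compared against can be inspected directly. The query below is only an illustration (it is not part of the patch); age() and pg_class.relfrozenxid already exist in current releases:

    SELECT c.relname,
           age(c.relfrozenxid) AS xid_age
    FROM pg_class c
    WHERE c.relkind = 'r'                  -- ordinary tables only
    ORDER BY age(c.relfrozenxid) DESC
    LIMIT 10;

A table whose xid_age approaches autovacuum_freeze_max_age (or, with this patch, the proposed vacuum_freeze_max_age) is due for a whole-table freezing scan.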
Re: [HACKERS] incoherent view of serializable transactions
Kevin, If you want to know how to build SERIALIZABLE with a database that provides SI (Snapshot Isolation), read http://portal.acm.org/citation.cfm?doid=1376616.137669 Note that in practice, READ COMMITTED is the most largely used isolation level and its limitations are relatively well understood by the average programmer that can program his application accordingly. I still don't get why people would use SERIALIZABLE since there is no efficient implementation of it. My 2 cents. Emmanuel Kevin Grittner wrote: As I've understood limitations of the PostgreSQL implementation of SERIALIZABLE transactions, at least the only example given in the documentation, revolve around a rather unlikely situation: Given concurrent transactions T1 and T2 and non-overlapping sets of data A and B, T1 reads data including A and uses the data to modify B while T2 reads data including B and uses that data to modify A, where the modifications performed by either would affect the modifications made by the other, if they were visible. For reasons I'll omit here, that scenario didn't worry me for my current uses of PostgreSQL. I've found another form of deviation from the standard SERIALIZABLE behavior, though, which does worry me. Although the above appears to be the only situation where the end result after everything commits is inconsistent with standard SERIALIZABLE behavior, the PostgreSQL implementation allows transactions to view the data in states which would never be possible during the application of the transactions in series in the order they will appear to have been applied after the commit. Imagine, as an example, a system which involves recording receipts, each of which must go into a daily deposit. There is a control table with one row containing the current deposit date for receipts. Somewhere mid-afternoon that date is updated, all subsequent receipts fall into the new day, and a report is run listing the receipts for the day and giving the deposit total. Under a standard-compliant implementation of SERIALIZABLE, this is straightforward: a transaction which is inserting a receipt selects the deposit date to use in its transaction, and any SELECT of receipts for a date prior to the current deposit date will see the accurate, final data. Under the PostgreSQL implementation, although data eventually gets to a coherent state, there can be a window of time where a SELECT can return an incomplete list of receipts for a date which appears to be closed, even if all transactions for modifying and viewing data are SERIALIZABLE. -- setup create table ctl (k text not null primary key, deposit_date date not null); insert into ctl values ('receipt', date '2008-12-22'); create table receipt (receipt_no int not null primary key, deposit_date date not null, amount numeric(13,2)); insert into receipt values (1, (select deposit_date from ctl where k = 'receipt'), 1.00); insert into receipt values (2, (select deposit_date from ctl where k = 'receipt'), 2.00); -- connection 1 start transaction isolation level serializable ; insert into receipt values (3, (select deposit_date from ctl where k = 'receipt'), 4.00); -- connection 2 start transaction isolation level serializable ; update ctl set deposit_date = date '2008-12-23' where k = 'receipt'; commit transaction; start transaction isolation level serializable ; select * from ctl; -- (deposit_date shows as 2008-12-23) select * from receipt; -- (Only receipts 1 and 2 show for 2008-12-22.) 
commit; -- connection 1 commit transaction; -- connection 2 start transaction isolation level serializable ; select * from receipt; -- (All receipts for the 2008-12-22 deposit date now show.) commit transaction; At this point, SERIALIZABLE transactions appear to have worked, with receipt 3 happening before the update of deposit_date; however, there was a window of time when the update to deposit_date was visible and receipt 3 was not. This absolutely can't happen in a standard-compliant implementation. At a minimum, this window where visible data lacks coherency should be noted in the documentation. I don't know if there's any way to fix this without killing performance. -Kevin -- Emmanuel Cecchet FTO @ Frog Thinker Open Source Development & Consulting -- Web: http://www.frogthinker.org email: m...@frogthinker.org Skype: emmanuel_cecchet -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Some semantic details of the window-function spec
"Hitoshi Harada" writes: > 2008/12/23 Tom Lane : >> * Unlike aggregates, there doesn't seem to be any concept of a window >> function being attached to an outer-level query --- in fact 6.10 rule >> 4 says that a window function's argument can't contain outer references >> at all. That seems excessively strong, but it does seem that there is >> no point in a "winlevelsup" field in WindowFunc, nor in implementing >> any infrastructure for outer-level window functions. > I am so ignorant that I don't know what exactly agglevelsup > represents. Just copied it from agg to window functions... Could > someone show me easy example? In something like select ..., (select avg(a.x) from b), ... from a; the avg() invocation is actually an aggregate of the outer query over a. It's a constant so far as any one invocation of the sub-select on b is concerned. The SQL:2008 spec is pretty opaque about this (as it is on a whole lot of matters :-() but if you read older versions like SQL92 it's spelled out a bit more clearly. I don't see any wording suggesting that window functions are supposed to work this way, however. >> * The last part of section 4.14 states that two different window >> functions must be evaluated against the same sorted row ordering if >> they have syntactically identical partition and ordering clauses, >> even if the windows are otherwise distinct (in particular there >> could be different framing clauses). Since we don't currently implement >> framing clauses the latter is not too interesting, but it's still true >> that the patch as I currently have it doesn't fully meet that >> requirement: if you intentionally specify separate but equivalent named >> window definitions, it won't be smart enough to fold them together, >> and you could end up with extra sorts happening and possibly a different >> ordering of sort-peer rows. How worried are we about that? > Is it? I intended all equivalent windows are folded into one as far as > equal(w1->partitionClause, w2->partitionClause) && > equal(w1->orderClause, w2->orderClause) is true. Well, I rewrote that code pretty heavily because it didn't work per spec as far as references to existing windows went. The problem that remains is that in something like WINDOW w1 as (partition by x), w2 as (partition by x), w3 as (w1 order by y), w4 as (w2 order by y) w3 and w4 are equivalent but it's pretty hard to recognize that. And even if we did recognize it, we couldn't simply fold them together into a single windowClause entry without changing the way that the query looks on reverse-listing. (The patch as submitted doesn't even *have* reverse-listing capability for WINDOW clauses, but I plan to fix that tomorrow ...) This is doubtless fixable with another level of indirection, I'm just wondering how much trouble it's worth. It seems like the only case that will really arise in practice is duplicate anonymous windows (identical OVER clauses), and the code does fold those together. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCHES] Infrastructure changes for recovery (v8)
On Tue, Dec 23, 2008 at 5:18 AM, Heikki Linnakangas wrote: > Simon Riggs wrote: >> >> On Wed, 2008-12-17 at 23:32 -0300, Alvaro Herrera wrote: >>> >>> Simon Riggs escribió: >>> Please let me know how I can make the reviewer's job easier. Diagrams, writeups, whatever. Thanks, >>> >>> A link perhaps? >> >> There is much confusion on this point for which I'm very sorry. >> >> I originally wrote "infra" patch to allow it to be committed separately >> in the Sept commitfest, to reduce size of the forthcoming hotstandby >> patch. That didn't happen (no moans there) so the eventual "hotstandby" >> patch includes all of what was the infra patch, plus the new code. >> >> So currently there is no separate "infra" patch. The two line items on >> the CommitFest page are really just one large project. I would be in >> favour of removing the "infra" lines from the CommitFest page. > > I think it's useful to review the "infra" part of the patch separately, so I > split it out of the big patch again. I haven't looked at this in detail yet, > but it compiles and passes regression tests. Super! I would fix synch rep code based on this patch. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] encoding cleanups in cvs repo
Magnus Hagander writes: > I have cleaned up a couple of badly broken encodings in cvs commit > messages in: > src/backend/utils/error/Attic/exc.c,v Out of curiosity ... what problems exactly? I just looked through my last complete dump of CVS log history and didn't see anything funny in the messages for exc.c ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] incoherent view of serializable transactions
"Kevin Grittner" writes: > At this point, SERIALIZABLE transactions appear to have worked, with > receipt 3 happening before the update of deposit_date; however, there > was a window of time when the update to deposit_date was visible and > receipt 3 was not. > This absolutely can't happen in a standard-compliant implementation. I think you mean "you'd like to believe that can't happen in a standard-compliant implementation". It doesn't include any of the specific behaviors that are forbidden by the spec, though, so I'm less than convinced. An appropriate way to prevent the problem is probably for the transaction that changes the deposit_date to take out a write-excluding lock on the receipts table before it does so. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Lock conflict behavior?
On Mon, 2008-12-22 at 17:14 +0900, Tatsuo Ishii wrote: > Also, it seems that an attacker could do a denial-of-service attack if he > could open session A and B, since other users on session C or > following sessions will be blocked. LOCK TABLE checks the permissions before attempting to acquire the lock; is there a reason that ALTER TABLE doesn't? Even if they don't have any rights to the table at all (not even SELECT), there are still other problems. For instance, the user could just wait for a long running query (or VACUUM) and issue the ALTER TABLE at that time. I know we don't make any guarantees about preventing denial-of-service attacks from users that can connect, but if possible we should be consistent about checking the permissions. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
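A sketch of the scenario being described (table and session names are made up); it assumes the behavior reported above, namely that ALTER TABLE queues for its lock before its permission check has a chance to fail:

    -- Session A: any long-running statement holding a lock on t
    BEGIN;
    SELECT count(*) FROM t;              -- still running, or transaction left open

    -- Session B: user with no privileges on t at all
    ALTER TABLE t ADD COLUMN junk int;   -- queues for ACCESS EXCLUSIVE before
                                         -- the permission error is raised

    -- Session C: any legitimate user; blocks behind B's queued lock request
    SELECT * FROM t;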
Re: [HACKERS] contrib/pg_stat_statements 1212
On Mon, Dec 22, 2008 at 00:44, ITAGAKI Takahiro wrote: > > "Alex Hunsaker" wrote: > >> A few comments: >> >> Is there a reason you add sourceText to QueryDesc? AFAICT you can do >> ActivePortal->sourceText and it will always be populated correctly. > > That's for nested statements (SQLs called in stored functions). > ActivePortal->sourceText shows text of only top most query. > >> I think the explain_analyze_format guc is a clever way of getting >> around the explain analyze verbose you proposed earlier. But I dont >> see any doc updates for it. > > Sure, no docs for now. The guc approach is acceptable? > (I'm not sure whether it is the best way...) > If ok, I'll write docs for it. I dunno, Im hopping that splitting up the patches and making the change more visible some more people might chime in :) >> Im still not overly fond of the "statistics." custom guc name, but >> what can you do... > > I have no obsessions with the name. The "pg_stat_statements.*" might > be better to avoid confliction of prefix. If so, we'd better to rename > variables to kill duplication of "statements" from the names. > > Ex. >statistics.max_statements -> pg_stat_statements.limit >statistics.track_statements -> pg_stat_statements.target How about just pg_stat_statements.track ? >statistics.saved_file -> pg_stat_statements.saved_file I do like the consistency of having the custom gucs be the same as the module name, easy to grep or google for. >> Other than that it looks good, though I admit I have not had the time >> to sit down and thoroughly test it yet... > > I found another bug in my patch. > > [pg_stat_statements-1212.patch # pg_stat_statements()] >SpinLockAcquire(&entry->mutex); >values[i++] = Int64GetDatumFast(entry->counters.calls); >values[i++] = Float8GetDatumFast(entry->counters.total_time); >values[i++] = Float8GetDatumFast(entry->counters.cpu_user); >values[i++] = Float8GetDatumFast(entry->counters.cpu_sys); >values[i++] = Int64GetDatumFast(entry->counters.gets); >values[i++] = Int64GetDatumFast(entry->counters.reads); >values[i++] = Int64GetDatumFast(entry->counters.lreads); >values[i++] = Int64GetDatumFast(entry->counters.rows); >SpinLockRelease(&entry->mutex); > > The variables are not protected by spinlock actually when float64 and > int64 are passed by reference (especially on 32bit platform). > It'd be better to copy values: > >Counterstmp; >/* copy the actual values in spinlock */ >SpinLockAcquire(&entry->mutex); >tmp = entry->counters; >SpinLockRelease(&entry->mutex); >/* create a tuple after lock is released. */ >values[i++] = Int64GetDatumFast(tmp.calls); >values[i++] = Float8GetDatumFast(tmp.total_time); >... Ive only been testing on 64bit... maybe thats why I never ran into this. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HAVE_FSEEKO for WIN32
Andrew Dunstan wrote: > > Cleaning up the parallel restore patch I came across a question I might > have asked before, but one which in any case I worked around: > > Why do we carefully define fseeko() for WIN32 but then not define > HAVE_FSEEKO, which makes doing the former pretty much pointless? Well, we are doing something odd here but it might not be what you think. We currently use fseeko() only in pg_dump. We define C code in /port for some Unix platforms that don't support fseeko. For platforms that don't support fseeko and don't have /port support for it we just use fseek() in port.h: #ifndef HAVE_FSEEKO #define fseeko(a, b, c) fseek(a, b, c) #define ftello(a) ftell(a) #endif but then for Win32 we #undef fseeko and redefine it: #ifdef WIN32 #define pgoff_t __int64 #undef fseeko #undef ftello #ifdef WIN32_ONLY_COMPILER #define fseeko(stream, offset, origin) _fseeki64(stream, offset, origin) #define ftello(stream) _ftelli64(stream) #else #define fseeko(stream, offset, origin) fseeko64(stream, offset, origin) #define ftello(stream) ftello64(stream) #endif #else #define pgoff_t off_t #endif Clearly this code should be moved into port.h, I think. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] affected rows count
On 2008-12-22, at 22:35, Dawid Kuroczko wrote: atlantis=> CREATE OR REPLACE FUNCTION foo_trigger() RETURNS trigger AS $$ BEGIN UPDATE bar SET t=NEW.t WHERE i=NEW.i; RETURN NULL; END; $$ LANGUAGE plpgsql; atlantis=> CREATE TRIGGER foo_update BEFORE UPDATE ON foo FOR EACH ROW EXECUTE PROCEDURE foo_trigger(); CREATE TRIGGER CREATE FUNCTION atlantis=> UPDATE foo SET t='##'||t; UPDATE 0 ^^ Grzegorz means such a situation. Personally I understand the current behavior to be correct -- since no row in that table is updated. that's not quite what I meant. It is correct behavior in this situation - since we didn't update anything. Like I said, it is my bad - I should have tested it before emailing - hackers... So far my ratio of useful emails here is very low. :/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] affected rows count
On Mon, Dec 22, 2008 at 9:07 PM, Bruce Momjian wrote: > Grzegorz Jaskiewicz wrote: >> Hey folks, >> >> It doesn't stop to bug me, that postgres will return 0 number of >> affected rows, if table is triggered. >> Now, question is - is this fixable, but no one cares, or is it some >> sort of a design/implementation flaw and we just have to live with it. > > Would you show us an example of your problem? If I understand the problem correctly: atlantis=> CREATE TABLE foo (i int PRIMARY KEY, t text); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "foo_pkey" for table "foo" CREATE TABLE atlantis=> CREATE TABLE bar (i int PRIMARY KEY, t text); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "bar_pkey" for table "bar" CREATE TABLE atlantis=> INSERT INTO foo (i,t) SELECT n, '#'||n FROM generate_series(0,99) AS g(n); INSERT 0 100 atlantis=> INSERT INTO bar (i) SELECT i FROM foo; INSERT 0 100 atlantis=> UPDATE foo SET t='##'||t; UPDATE 100 atlantis=> CREATE OR REPLACE FUNCTION foo_trigger() RETURNS trigger AS $$ BEGIN UPDATE bar SET t=NEW.t WHERE i=NEW.i; RETURN NULL; END; $$ LANGUAGE plpgsql; atlantis=> CREATE TRIGGER foo_update BEFORE UPDATE ON foo FOR EACH ROW EXECUTE PROCEDURE foo_trigger(); CREATE TRIGGER CREATE FUNCTION atlantis=> UPDATE foo SET t='##'||t; UPDATE 0 ^^ Grzegorz means such a situation. Personally I understand the current behavior to be correct -- since no row in that table is updated. OTOH when you use triggers for emulating table partitioning it leads to confusion (parent table was not updated, but the child table is (or isn't because there were really 0 rows updated -- you can't really tell)). Best regards, Dawid -- ..``The essence of real creativity is a certain : *Dawid Kuroczko* : playfulness, a flitting from idea to idea : qne...@gmail.com : without getting bogged down by fixated demands.'' `..' Sherkaner Underhill, A Deepness in the Sky, V. Vinge -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
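For what it's worth, the 0 comes from the BEFORE trigger returning NULL, which suppresses the operation on the row in foo itself; the reported count only includes rows actually updated in foo. As a contrast (same tables as above, purely illustrative), letting the row through restores the count:

    CREATE OR REPLACE FUNCTION foo_trigger() RETURNS trigger AS $$
    BEGIN
      UPDATE bar SET t = NEW.t WHERE i = NEW.i;
      RETURN NEW;   -- let the UPDATE on foo itself proceed
    END;
    $$ LANGUAGE plpgsql;

    UPDATE foo SET t = '##'||t;   -- reports UPDATE 100 again

For the partitioning-style usage Dawid mentions, returning NULL is of course intentional, so the count stays at 0 (or reflects only the rows the trigger let through); hence the confusion.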
Re: [HACKERS] affected rows count
On 2008-12-22, at 21:07, Bruce Momjian wrote: Grzegorz Jaskiewicz wrote: Hey folks, It doesn't stop to bug me, that postgres will return 0 number of affected rows, if table is triggered. Now, question is - is this fixable, but no one cares, or is it some sort of a design/implementation flaw and we just have to live with it. Would you show us an example of your problem? Dunno what's wrong with me lately. I was under the impression that about half a year ago, on 8.1, I wasn't able to get the row count anymore if there was a trigger on a table. Well, the affected row count would always be 0 then. But trying now on cvs head, it all works great. heh, I am terribly sorry... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] HAVE_FSEEKO for WIN32
Cleaning up the parallel restore patch I came across a question I might have asked before, but one which in any case I worked around: Why do we carefully define fseeko() for WIN32 but then not define HAVE_FSEEKO, which makes doing the former pretty much pointless? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] affected rows count
On Mon, 2008-12-22 at 15:07 -0500, Bruce Momjian wrote: > Grzegorz Jaskiewicz wrote: > > Hey folks, > > > > It doesn't stop to bug me, that postgres will return 0 number of > > affected rows, if table is triggered. > > Now, question is - is this fixable, but no one cares, or is it some > > sort of a design/implementation flaw and we just have to live with it. > > Would you show us an example of your problem? > This may not be the problem he's talking about, but it's bothered me for a while that there is no way to control the value returned for the affected rows. For instance, if you have an updatable view that uses a function that updates a table in a remote database, it would be nice to be able to pass that value back to the client. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] affected rows count
Grzegorz Jaskiewicz wrote: > Hey folks, > > It doesn't stop to bug me, that postgres will return 0 number of > affected rows, if table is triggered. > Now, question is - is this fixable, but no one cares, or is it some > sort of a design/implementation flaw and we just have to live with it. Would you show us an example of your problem? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1
Kenneth Marshall wrote: > Dear PostgreSQL developers, > > I am re-sending this to keep this last change to the > internal hash function on the radar. Please add it to the commitfest wiki page, http://wiki.postgresql.org/wiki/CommitFest_2008-11 Thanks -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCHES] updated hash functions for postgresql v1
Dear PostgreSQL developers, I am re-sending this to keep this last change to the internal hash function on the radar. Ken Sorry about the delay for this update to the new hash index implementation. I was trying to get the WAL logging in place and forgot to post the actual patch. The WAL for hash indexes will need to wait for 8.5, but I did want to add back in the piece of the Bob Jenkins 2006 hash function that was stripped out of the initial patch on application due to concerns about the randomness of the resulting hash values. Here is a re-post of my initial findings comparing the old/new Jenkins hash from lookup2 and lookup3. I have added a third column containing the results for the hash_any() resulting from the attached patch as well as simple timing test for a DB reindex both before and after patching. Also attached is a simple documentation patch updating the note attached to the hash index description. Regards, Ken Hi, I have finally had a chance to do some investigation on the performance of the old hash mix() function versus the updated mix()/final() in the new hash function. Here is a table of my current results for both the old and the new hash function. In this case cracklib refers to the cracklib-dict containing 1648379 unique words massaged in various ways to generate input strings for the hash functions. The result is the number of collisions in the hash values generated. hash inputoldnew newv2 -------- - cracklib 338316 338 cracklib x 2 (i.e. clibclib) 305319 300 cracklib x 3 (clibclibclib) 323329 315 cracklib x 10 302310 329 cracklib x 100350335 298 cracklib x 1000 314309 315 cracklib x 100 truncated to char(100) 311327 320 uint32 from 1-1648379 309319 347 (uint32 1-1948379)*256309314 304 (uint32 1-1948379)*16 310314 324 "a"uint32 (i.e. a1,a0002...) 320321 312 uint32uint32 (i.e. uint64)321287 309 The different result columns are old = Jenkins 1996 hash function(lookup2.c), new = Jenkins 2006 hash function (lookup3.c), and newv2 = adaptation of current hash_any() to incorporate the separate mix()/final() functions. As you can see from the results, spliting the mix() and final() apart does not result in any perceptible loss of randomness in the hash assignment. I also ran a crude timing for a reindex of the following database: CREATE TABLE dict (word text); CREATE INDEX wordhash ON dict USING hash (word); INSERT INTO dict (word) VALUES('s;lhkjdpyoijxfg;lktjgh;sdlhkjo'); INSERT INTO dict (SELECT MAX(word)||MAX(word) FROM dict); ... (21 times) REINDEX TABLE ... The average time to reindex the table using our current hash_any() without the separate mix()/final() was 1696ms and 1482ms with the separate mix()/final() stages giving almost 13% better performance for this stupid metric. --- indices.sgml2008-10-13 14:40:06.0 -0500 +++ indices.sgml.NEW2008-11-04 12:42:35.0 -0600 @@ -190,13 +190,11 @@ -Testing has shown PostgreSQL's hash -indexes to perform no better than B-tree indexes, and the -index size and build time for hash indexes is much worse. -Furthermore, hash index operations are not presently WAL-logged, +PostgreSQL's hash indexes provide +the fast O(1) lookups, even for very large objects. +Hash index operations are not presently WAL-logged, so hash indexes might need to be rebuilt with REINDEX -after a database crash. -For these reasons, hash index use is presently discouraged. +after a database crash. 
--- hashfunc.c.ORIG 2008-09-03 13:07:14.0 -0500 +++ hashfunc.c.NEW 2008-11-04 08:36:16.0 -0600 @@ -200,39 +200,94 @@ * hash function, see http://burtleburtle.net/bob/hash/doobs.html, * or Bob's article in Dr. Dobb's Journal, Sept. 1997. * - * In the current code, we have adopted an idea from Bob's 2006 update - * of his hash function, which is to fetch the data a word at a time when - * it is suitably aligned. This makes for a useful speedup, at the cost - * of having to maintain four code paths (aligned vs unaligned, and - * little-endian vs big-endian). Note that we have NOT adopted his newer - * mix() function, which is faster but may sacrifice some randomness. + * In the current code, we have adopted Bob's 2006 update of his hash + * which fetches the data a word at a time when it is suitably aligned. + * This makes for a useful speedup, at the cost of having to maintain + * four code paths (aligned vs unaligned, and little-endian vs big-endian). + * It also two separate mixing functions mix() and final() instead + * of a single multi-purpose function, that is s
Re: [HACKERS] Some semantic details of the window-function spec
2008/12/23 Tom Lane : > * Unlike aggregates, there doesn't seem to be any concept of a window > function being attached to an outer-level query --- in fact 6.10 rule > 4 says that a window function's argument can't contain outer references > at all. That seems excessively strong, but it does seem that there is > no point in a "winlevelsup" field in WindowFunc, nor in implementing > any infrastructure for outer-level window functions. I am so ignorant that I don't know what exactly agglevelsup represents. Just copied it from agg to window functions... Could someone show me easy example? > > * The last part of section 4.14 states that two different window > functions must be evaluated against the same sorted row ordering if > they have syntactically identical partition and ordering clauses, > even if the windows are otherwise distinct (in particular there > could be different framing clauses). Since we don't currently implement > framing clauses the latter is not too interesting, but it's still true > that the patch as I currently have it doesn't fully meet that > requirement: if you intentionally specify separate but equivalent named > window definitions, it won't be smart enough to fold them together, > and you could end up with extra sorts happening and possibly a different > ordering of sort-peer rows. How worried are we about that? Is it? I intended all equivalent windows are folded into one as far as equal(w1->partitionClause, w2->partitionClause) && equal(w1->orderClause, w2->orderClause) is true. And I believe everytime it's true, we can process them in the same window ordering. One thing, in the former discussion someone pointed me that we might have to care about that volatile functions are contained in the window. I currently don't have any idea about that. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Visibility map and freezing
Heikki Linnakangas wrote: Peter Eisentraut wrote: Heikki Linnakangas wrote: I think we need a threshold similar to autovacuum_freeze_max_age for manual vacuums as well: vacuum_freeze_max_age. If you run VACUUM, and relfrozenxid is older than vacuum_freeze_max_age, the visibility map is ignored and all pages are scanned. Would one parameter to control both suffice? (i.e., rename autovacuum_freeze_max_age to vacuum_freeze_max_age) Imagine that you run a nightly VACUUM from cron, and have autovacuum disabled. If autovacuum_freeze_max_age is the same as vacuum_freeze_max_age, as soon as that age is reached, an anti-wraparound autovacuum launched. What we'd want to happen is for the next nightly VACUUM to do the work instead. So they need to be separate settings, with some space between them by default. Attached is a proposed patch to handle freezing. In a nutshell: Introduce new vacuum_freeze_max_age setting. Manual VACUUM will scan the whole table and advance relfrozenxid, if relfrozenxid is older than vacuum_freeze_max_age. If vacuum_freeze_max_age is greater than autovacuum_freeze_max_age - 1 million transactions, it's effectively capped at that value. It doesn't make sense to have vacuum_freeze_max_age > autovacuum_freeze_max_age, because the manual VACUUM wouldn't have a chance to do the full sweep before the anti-wraparound autovacuum is launched. The "minus one million transactions" is to give some headroom. I'm lowering vacuum_freeze_min_age from 100 million transactions to 50 million, so that the whole-table vacuum doesn't need to run as often. Note that since VACUUM normally only scans pages that need vacuuming according to the visibility map, tuples on skipped pages are not frozen any earlier even though vacuum_freeze_min_age is lower. To recap, here's the new defaults: autovacuum_freeze_max_age 2 vacuum_freeze_max_age 15000 vacuum_freeze_min_age5000 This means that with defaults, autovacuum will launch a whole-table vacuum every 150 million transactions (autovacuum_freeze_max_age - vacuum_freeze_min_age). If you run a nightly manual VACUUM, you'll get a whole-table vacuum every 100 million transactions. vacuum_freeze_max_age also affects autovacuums. If an autovacuum is launched on table to remove dead tuples, and vacuum_freeze_max_age has been reached (but not yet autovacuum_freeze_max_age), the autovacuum will scan the whole table. I'm not sure if this is desirable, to avoid having to launch separate anti-wraparound autovacuums even when there's not many dead tuples, or just confusing. If you set vacuum_freeze_max_age to 0, the visibility map is not used to skip pages, so you'll get the pre-8.4 old behavior. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 7493ca9..9848ce0 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -3925,6 +3925,26 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; + + vacuum_freeze_max_age (integer) + + vacuum_freeze_max_age configuration parameter + + + +VACUUM performs a whole-table scan if the table's +pg_class.relfrozenxid field reaches the +age specified by this setting. The default is 150 million +transactions. Although users can set this value anywhere from zero to +one billion, VACUUM will silently limit the effective value +to the value of minus +1 million transactions, so that regular manual VACUUM has a +chance to run before autovacuum is launched to prevent XID wraparound. +For more information see . 
+ + + + vacuum_freeze_min_age (integer) @@ -3935,7 +3955,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; Specifies the cutoff age (in transactions) that VACUUM should use to decide whether to replace transaction IDs with FrozenXID while scanning a table. -The default is 100 million transactions. Although +The default is 50 million transactions. Although users can set this value anywhere from zero to one billion, VACUUM will silently limit the effective value to half the value of , so diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml index 03ce2e9..c41d464 100644 --- a/doc/src/sgml/maintenance.sgml +++ b/doc/src/sgml/maintenance.sgml @@ -367,10 +367,14 @@ -VACUUM's behavior is controlled by the configuration parameter -: any XID older than -vacuum_freeze_min_age transactions is replaced by -FrozenXID. Larger values of vacuum_freeze_min_age +VACUUM's behavior is controlled by the two configuration +parameters: and +. +vacuum_freeze_max_age controls when VACUUM +performs a full sweep of the table to replace old XID v
Re: [HACKERS] incoherent view of serializable transactions
>>> Martijn van Oosterhout wrote: > On Mon, Dec 22, 2008 at 11:00:53AM -0600, Kevin Grittner wrote: >> As I've understood limitations of the PostgreSQL implementation of >> SERIALIZABLE transactions, at least the only example given in the >> documentation, revolve around a rather unlikely situation: >> >> Given concurrent transactions T1 and T2 and non-overlapping sets of >> data A and B, T1 reads data including A and uses the data to modify B >> while T2 reads data including B and uses that data to modify A, where >> the modifications performed by either would affect the modifications >> made by the other, if they were visible. > > In so far as the "modifications" are just INSERTs (no UPDATEs or > DELETEs), yes. This case is covered in the documentation. Let's not understate the scope of the issue. UPDATEs and DELETEs count, too. For example: -- connection 1 cir=> create table mytab (class int not null, value int not null); CREATE TABLE cir=> copy mytab from stdin; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. >> 1 10 >> 1 20 >> 2 100 >> 2 200 >> \. cir=> start transaction isolation level serializable; START TRANSACTION cir=> update mytab set value = (select sum(value) from mytab where class = 2) where class = 1 and value = 10; UPDATE 1 -- connection 2 cir=> start transaction isolation level serializable; START TRANSACTION cir=> update mytab set value = (select sum(value) from mytab where class = 1) where class = 2 and value = 100; UPDATE 1 cir=> commit transaction; COMMIT -- connection 1 cir=> commit transaction; COMMIT cir=> select * from mytab; class | value ---+--- 1 | 20 2 | 200 1 | 300 2 | 30 (4 rows) >> Imagine, as an example, a system which involves recording receipts, >> each of which must go into a daily deposit. There is a control table >> with one row containing the current deposit date for receipts. >> Somewhere mid-afternoon that date is updated, all subsequent receipts >> fall into the new day, and a report is run listing the receipts for the >> day and giving the deposit total. > > This is a variation of the above and has the same "proper" solution: > predicate locking. However, in this case the records in question are > already present so you can workaround it easily. First do a SELECT FOR > UPDATE on all the records you want to update. This will serialize all > parallel transactions to either before or after you. Then do your > update. My point isn't that there aren't workarounds, it is that people might reasonably assume that SERIALIZABLE transactions provide sufficient concurrency control for this, since the only example we give of a problem is a rather contrived update anomaly. The fact that even cases where the data settles into good form at commit can leave windows where race conditions cause occasional bad results, without extra explicit locking, is not obvious. >> This absolutely can't happen in a standard-compliant implementation. >> At a minimum, this window where visible data lacks coherency should be >> noted in the documentation. I don't know if there's any way to fix >> this without killing performance. > > Predicate locking is nasty and we don't try. I'm not sure if anybody > else does. I know for a fact that Sybase ASE does. I've heard from reasonably reliable sources that DB2 does. 
I know that Microsoft SQL Server did for some time after the split from the Sybase code base, but I'm not sure they've continued that; in fact there was a reference to concurrency issues in wikipedia which implied that they no longer do. The implementation is not pretty -- they do it by locking accessed pages and insertion points, and in some cases entire tables. It is so ugly that at one point in discussing similar issues Tom said that it couldn't really qualify as predicate locking, but in the face of the fact that they have covered all the bases to provide true serializable transactions, and that theory says that only predicate locking can do that, he conceded that it was predicate locking -- but really ugly and in a form he would never support. Anyway, I didn't argue that we should provide truly serializable transactions, just that we should provide a less contrived example of where the PostgreSQL implementation can show anomalies, so that people don't get burned through a false sense of security. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] incoherent view of serializable transactions
On Mon, Dec 22, 2008 at 11:00:53AM -0600, Kevin Grittner wrote: > As I've understood limitations of the PostgreSQL implementation of > SERIALIZABLE transactions, at least the only example given in the > documentation, revolve around a rather unlikely situation: > > Given concurrent transactions T1 and T2 and non-overlapping sets of > data A and B, T1 reads data including A and uses the data to modify B > while T2 reads data including B and uses that data to modify A, where > the modifications performed by either would affect the modifications > made by the other, if they were visible. In so far as the "modifications" are just INSERTs (no UPDATEs or DELETEs), yes. This case is covered in the documentation. > Imagine, as an example, a system which involves recording receipts, > each of which must go into a daily deposit. There is a control table > with one row containing the current deposit date for receipts. > Somewhere mid-afternoon that date is updated, all subsequent receipts > fall into the new day, and a report is run listing the receipts for the > day and giving the deposit total. This is a variation of the above and has the same "proper" solution: predicate locking. However, in this case the records in question are already present so you can workaround it easily. First do a SELECT FOR UPDATE on all the records you want to update. This will serialize all parallel transactions to either before or after you. Then do your update. > This absolutely can't happen in a standard-compliant implementation. > At a minimum, this window where visible data lacks coherency should be > noted in the documentation. I don't know if there's any way to fix > this without killing performance. Predicate locking is nasty and we don't try. I'm not sure if anybody else does. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines. signature.asc Description: Digital signature
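Applied to Kevin's schema, the SELECT FOR UPDATE workaround Martijn describes might look like this sketch of the receipt-entry transaction (illustrative only; the row lock on the control row serializes it against the transaction that advances deposit_date):

    start transaction isolation level serializable;
    select deposit_date from ctl where k = 'receipt' for update;
    insert into receipt values (3, (select deposit_date from ctl where k = 'receipt'), 4.00);
    commit;

If the control row was already updated by a transaction that committed after this snapshot was taken, the FOR UPDATE fails with a serialization error and the receipt entry can simply be retried, picking up the new date.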
[HACKERS] incoherent view of serializable transactions
As I've understood limitations of the PostgreSQL implementation of SERIALIZABLE transactions, at least the only example given in the documentation, revolve around a rather unlikely situation: Given concurrent transactions T1 and T2 and non-overlapping sets of data A and B, T1 reads data including A and uses the data to modify B while T2 reads data including B and uses that data to modify A, where the modifications performed by either would affect the modifications made by the other, if they were visible. For reasons I'll omit here, that scenario didn't worry me for my current uses of PostgreSQL. I've found another form of deviation from the standard SERIALIZABLE behavior, though, which does worry me. Although the above appears to be the only situation where the end result after everything commits is inconsistent with standard SERIALIZABLE behavior, the PostgreSQL implementation allows transactions to view the data in states which would never be possible during the application of the transactions in series in the order they will appear to have been applied after the commit. Imagine, as an example, a system which involves recording receipts, each of which must go into a daily deposit. There is a control table with one row containing the current deposit date for receipts. Somewhere mid-afternoon that date is updated, all subsequent receipts fall into the new day, and a report is run listing the receipts for the day and giving the deposit total. Under a standard-compliant implementation of SERIALIZABLE, this is straightforward: a transaction which is inserting a receipt selects the deposit date to use in its transaction, and any SELECT of receipts for a date prior to the current deposit date will see the accurate, final data. Under the PostgreSQL implementation, although data eventually gets to a coherent state, there can be a window of time where a SELECT can return an incomplete list of receipts for a date which appears to be closed, even if all transactions for modifying and viewing data are SERIALIZABLE. -- setup create table ctl (k text not null primary key, deposit_date date not null); insert into ctl values ('receipt', date '2008-12-22'); create table receipt (receipt_no int not null primary key, deposit_date date not null, amount numeric(13,2)); insert into receipt values (1, (select deposit_date from ctl where k = 'receipt'), 1.00); insert into receipt values (2, (select deposit_date from ctl where k = 'receipt'), 2.00); -- connection 1 start transaction isolation level serializable ; insert into receipt values (3, (select deposit_date from ctl where k = 'receipt'), 4.00); -- connection 2 start transaction isolation level serializable ; update ctl set deposit_date = date '2008-12-23' where k = 'receipt'; commit transaction; start transaction isolation level serializable ; select * from ctl; -- (deposit_date shows as 2008-12-23) select * from receipt; -- (Only receipts 1 and 2 show for 2008-12-22.) commit; -- connection 1 commit transaction; -- connection 2 start transaction isolation level serializable ; select * from receipt; -- (All receipts for the 2008-12-22 deposit date now show.) commit transaction; At this point, SERIALIZABLE transactions appear to have worked, with receipt 3 happening before the update of deposit_date; however, there was a window of time when the update to deposit_date was visible and receipt 3 was not. This absolutely can't happen in a standard-compliant implementation. 
At a minimum, this window where visible data lacks coherency should be noted in the documentation. I don't know if there's any way to fix this without killing performance. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Some semantic details of the window-function spec
After a couple of hours reading the SQL:2008 spec, I've come to some conclusions about the semantics that are demanded for window functions. Anyone want to to check my work? * If window functions are used together with aggregates or grouping, the grouping and regular aggregation happens first, and then windowing is done on the output rows (group rows). See section 7.12 rules 11,12. * Therefore, the argument of a window function can contain aggregate functions, and cannot reference ungrouped columns (both unlike regular aggregates). Conversely, an ordinary aggregate's argument can't contain a window function. * Also, window specifications can contain aggregate functions, and cannot reference ungrouped columns; this is because they represent another layer of grouping/ordering on top of the GROUP BY if any. (An explicit ORDER BY, if any, happens last of all.) * It is not very clear whether a window function's argument can contain another window function. 7.11 forbids window specifications from containing window functions, which is sensible, but I can't find any such statement about the arguments. The present patch forbids nesting window functions, and I'm fine with imposing that as an implementation restriction even if it's not in the spec; but can anyone find it in the spec? * Unlike aggregates, there doesn't seem to be any concept of a window function being attached to an outer-level query --- in fact 6.10 rule 4 says that a window function's argument can't contain outer references at all. That seems excessively strong, but it does seem that there is no point in a "winlevelsup" field in WindowFunc, nor in implementing any infrastructure for outer-level window functions. * The last part of section 4.14 states that two different window functions must be evaluated against the same sorted row ordering if they have syntactically identical partition and ordering clauses, even if the windows are otherwise distinct (in particular there could be different framing clauses). Since we don't currently implement framing clauses the latter is not too interesting, but it's still true that the patch as I currently have it doesn't fully meet that requirement: if you intentionally specify separate but equivalent named window definitions, it won't be smart enough to fold them together, and you could end up with extra sorts happening and possibly a different ordering of sort-peer rows. How worried are we about that? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
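A small example of the first two points, against a hypothetical table emp(dept, sal): the GROUP BY and the sum() aggregate are computed first, and the window function then runs over the resulting group rows, which is why an aggregate may appear inside its specification:

    select dept,
           sum(sal) as dept_total,
           rank() over (order by sum(sal) desc) as dept_rank
    from emp
    group by dept;

A plain reference to sal could not appear in the OVER clause here, since it is not grouped; only dept or aggregates over ungrouped columns are allowed at that point.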
Re: [HACKERS] DTrace probes patch
Tom Lane wrote: Robert Lor writes: Tom Lane wrote: I agree. If the probe is meant to track only *some* WAL writes then it needs to be named something less generic than TRACE_POSTGRESQL_WAL_BUFFER_WRITE. How about change it to TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY similar to TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY for shared buffers? Works for me... Attached is the patch for the above name change. -Robert Index: src/backend/access/transam/xlog.c === RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xlog.c,v retrieving revision 1.324 diff -u -3 -p -r1.324 xlog.c --- src/backend/access/transam/xlog.c 17 Dec 2008 01:39:03 - 1.324 +++ src/backend/access/transam/xlog.c 22 Dec 2008 16:28:00 - @@ -1318,14 +1318,14 @@ AdvanceXLInsertBuffer(bool new_segment) * Have to write buffers while holding insert lock. This is * not good, so only write as much as we absolutely must. */ - TRACE_POSTGRESQL_WAL_BUFFER_WRITE_START(); + TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_START(); WriteRqst.Write = OldPageRqstPtr; WriteRqst.Flush.xlogid = 0; WriteRqst.Flush.xrecoff = 0; XLogWrite(WriteRqst, false, false); LWLockRelease(WALWriteLock); Insert->LogwrtResult = LogwrtResult; - TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DONE(); + TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE(); } } } Index: src/backend/utils/probes.d === RCS file: /projects/cvsroot/pgsql/src/backend/utils/probes.d,v retrieving revision 1.4 diff -u -3 -p -r1.4 probes.d --- src/backend/utils/probes.d 17 Dec 2008 01:39:04 - 1.4 +++ src/backend/utils/probes.d 22 Dec 2008 16:28:01 - @@ -89,6 +89,6 @@ provider postgresql { probe xlog__insert(unsigned char, unsigned char); probe xlog__switch(); - probe wal__buffer__write__start(); - probe wal__buffer__write__done(); + probe wal__buffer__write__dirty__start(); + probe wal__buffer__write__dirty__done(); }; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] generic reloptions improvement
ITAGAKI Takahiro wrote:
> Alvaro Herrera wrote:
>
> > Here's a patch for improving the general reloptions mechanism. What
> > this patch does is add a table-based option parser. This allows adding
> > new options very easily, and stops the business of having to pass the
> > minimum and default fillfactor each time you want the reloptions
> > processed.
>
> You use struct relopt_gen (and its subclasses) for the purpose of
> both "definition of options" and "parsed result". But I think
> it is cleaner to separate parsed results into another struct,
> something like:

Thanks for the suggestion -- yes, it is better as you suggest. I think putting the default on the same struct was just out of laziness at first, and inertia later.

Here's the next version, which also fixes some particularly embarrassing bugs.

-- 
Alvaro Herrera                            http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Index: src/backend/access/common/reloptions.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/access/common/reloptions.c,v
retrieving revision 1.11
diff -c -p -r1.11 reloptions.c
*** src/backend/access/common/reloptions.c	23 Jul 2008 17:29:53 -0000	1.11
--- src/backend/access/common/reloptions.c	22 Dec 2008 16:14:40 -0000
***************
*** 15,20 ****
--- 15,23 ----
  #include "postgres.h"
  
+ #include "access/gist_private.h"
+ #include "access/hash.h"
+ #include "access/nbtree.h"
  #include "access/reloptions.h"
  #include "catalog/pg_type.h"
  #include "commands/defrem.h"
***************
*** 22,29 ****
--- 25,157 ----
  #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/guc.h"
+ #include "utils/memutils.h"
  #include "utils/rel.h"
  
+ /*
+  * Contents of pg_class.reloptions
+  *
+  * To add an option:
+  *
+  * (i) decide on a class (integer, real, bool), name, default value, upper
+  * and lower bounds (if applicable).
+  * (ii) add a record below.
+  * (iii) add it to StdRdOptions if appropriate
+  * (iv) add a block to the appropriate handling routine (probably
+  * default_reloptions)
+  * (v) don't forget to document the option
+  *
+  * Note that we don't handle "oids" in relOpts because it is handled by
+  * interpretOidsOption().
+  */
+ 
+ static relopt_bool boolRelOpts[] =
+ {
+ 	/* list terminator */
+ 	{ { NULL } }
+ };
+ 
+ static relopt_int intRelOpts[] =
+ {
+ 	{
+ 		{
+ 			"fillfactor",
+ 			"Packs table pages only to this percentage",
+ 			RELOPT_KIND_HEAP
+ 		},
+ 		HEAP_DEFAULT_FILLFACTOR, HEAP_MIN_FILLFACTOR, 100
+ 	},
+ 	{
+ 		{
+ 			"fillfactor",
+ 			"Packs btree index pages only to this percentage",
+ 			RELOPT_KIND_BTREE
+ 		},
+ 		BTREE_DEFAULT_FILLFACTOR, BTREE_MIN_FILLFACTOR, 100
+ 	},
+ 	{
+ 		{
+ 			"fillfactor",
+ 			"Packs hash index pages only to this percentage",
+ 			RELOPT_KIND_HASH
+ 		},
+ 		HASH_DEFAULT_FILLFACTOR, HASH_MIN_FILLFACTOR, 100
+ 	},
+ 	{
+ 		{
+ 			"fillfactor",
+ 			"Packs gist index pages only to this percentage",
+ 			RELOPT_KIND_GIST
+ 		},
+ 		GIST_DEFAULT_FILLFACTOR, GIST_MIN_FILLFACTOR, 100
+ 	},
+ 	/* list terminator */
+ 	{ { NULL } }
+ };
+ 
+ static relopt_real realRelOpts[] =
+ {
+ 	/* list terminator */
+ 	{ { NULL } }
+ };
+ 
+ static relopt_gen **relOpts = NULL;
+ 
+ static void parse_one_reloption(relopt_value *option, char *text_str,
+ 					int text_len, bool validate);
+ 
+ /*
+  * initialize_reloptions
+  *		initialization routine, must be called at backend start
+  *
+  * Initialize the relOpts array and fill each variable's type and name length.
+  */
+ void
+ initialize_reloptions(void)
+ {
+ 	int		i;
+ 	int		j = 0;
+ 
+ 	for (i = 0; boolRelOpts[i].gen.name; i++)
+ 		j++;
+ 	for (i = 0; intRelOpts[i].gen.name; i++)
+ 		j++;
+ 	for (i = 0; realRelOpts[i].gen.name; i++)
+ 		j++;
+ 
+ 	if (relOpts)
+ 		pfree(relOpts);
+ 	relOpts = MemoryContextAlloc(TopMemoryContext,
+ 								 (j + 1) * sizeof(relopt_gen *));
+ 
+ 	j = 0;
+ 	for (i = 0; boolRelOpts[i].gen.name; i++)
+ 	{
+ 		relOpts[j] = &boolRelOpts[i].gen;
+ 		relOpts[j]->type = RELOPT_TYPE_BOOL;
+ 		relOpts[j]->namelen = strlen(relOpts[j]->name);
+ 		j++;
+ 	}
+ 
+ 	for (i = 0; intRelOpts[i].gen.name; i++)
+ 	{
+ 		relOpts[j] = &intRelOpts[i].gen;
+ 		relOpts[j]->type = RELOPT_TYPE_INT;
+ 		relOpts[j]->namelen = strlen(relOpts[j]->name);
+ 		j++;
+ 	}
+ 
+ 	for (i = 0; realRelOpts[i].gen.name; i++)
+ 	{
+ 		relOpts[j] = &realRelOpts[i].gen;
+ 		relOpts[j]->type = RELOPT_TYPE_REAL;
+ 		relOpts[j]->namelen = strlen(relOpts[j]->name);
+ 		j++;
+ 	}
+ 
+ 	/* add a list terminator */
+ 	relOpts[j] = NULL;
+ }
  
  /*
   * Transform a relation options list (list of DefElem) into the text array
*************** transformRelOptions(Datum oldOptions, Li
*** 73,81 ****
  	for (i = 0; i < noldoptions; i++)
  	{
! 		text	   *oldoption = DatumGetTextP(oldoptions[i]);
! 		char	   *text_str = VARDATA(oldoption);
! 		int			text_len = VARSIZE(oldoption) - VARHDR
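For illustration (this example is not from Alvaro's message or patch): the four fillfactor entries registered in intRelOpts above correspond to the storage parameter users already set from SQL; the table and index names below are hypothetical:

    -- heap fillfactor (matched against the RELOPT_KIND_HEAP entry)
    ALTER TABLE foo SET (fillfactor = 70);

    -- btree index fillfactor (matched against the RELOPT_KIND_BTREE entry)
    CREATE INDEX foo_idx ON foo (bar) WITH (fillfactor = 80);

With the table-driven parser, a new option of one of these classes is registered by adding an entry to boolRelOpts, intRelOpts, or realRelOpts, plus the follow-up steps listed in the comment block of the patch.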
[HACKERS] encoding cleanups in cvs repo
I have cleaned up a couple of badly broken encodings in cvs commit messages in:

src/backend/utils/error/Attic/exc.c,v

per discussion with Alvaro, by simply editing the RCS file directly in the cvs repository. Only the commit message was changed (broken character removed); this should have no effect. In case something broke, I have extra backups :-)

//Magnus

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reloptions and toast tables
On 12/20/08, Alvaro Herrera wrote:
>
> ALTER TABLE foo SET (TOAST autovacuum_enabled = false);
...
> ALTER TABLE foo SET TOAST (autovacuum_enabled = false);
>

I will be happy with either of these options (actually I prefer the second one, but I don't have a strong argument against the first).

-- 
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
On Sun, Dec 21, 2008 at 10:25:59PM -0500, Robert Haas wrote:
> [Some performance testing.]

I (finally!) have a chance to post my performance testing results... my apologies for the really long delay. Unfortunately I'm not seeing wonderful speedups with the particular queries I used in this case.

I generated three 1GB datasets, with skews set at 1, 2, and 3. The test script I wrote turns on enable_usestatmcvs and runs EXPLAIN ANALYZE on the same query five times. Then it turns enable_usestatmcvs off, and runs the same query five more times. It does this with each of the three datasets in turn, and then starts over at the beginning until I tell it to quit. My results showed a statistically significant improvement in speed only on the skew == 3 dataset.

I did the same tests twice, once with default_statistics_target set to 10, and once with it set to 100. I've attached boxplots of the total query times as reported by EXPLAIN ANALYZE ("dst10" in the filename indicates default_statistics_target was 10, and so on), my results parsed out of the EXPLAIN ANALYZE output (test.filtered.10 and test.filtered.100), the results of one-tailed Student's t-tests of the result set (ttests), and the R code to run the tests if anyone's really interested (t.test.R).

The results data includes six columns: the skew value, whether enable_usestatmcvs was on or not (represented by a 1 or 0), total times for each of the three joins that made up the query, and total time for the query itself. The results above pay attention only to the total query time.

Finally, the query involved:

SELECT *
FROM lineitem l
  LEFT JOIN part p ON (p.p_partkey = l.l_partkey)
  LEFT JOIN orders o ON (o.o_orderkey = l.l_orderkey)
  LEFT JOIN customer c ON (c.c_custkey = o.o_custkey);

- Josh / eggyknap

[Attachment data (test.filtered.10 / test.filtered.100): one row per timed run, with columns SKEW, USESTAT, J1, J2, J3, and TOT.]
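For reference (a sketch, not taken from the attached test script): each timed run amounts to toggling the GUC added by the patch and re-running the query above under EXPLAIN ANALYZE, assuming the TPC-H-style tables named in the message:

    SET enable_usestatmcvs = on;      -- patched path: consult the statistics MCVs
    EXPLAIN ANALYZE
    SELECT *
    FROM lineitem l
      LEFT JOIN part p ON (p.p_partkey = l.l_partkey)
      LEFT JOIN orders o ON (o.o_orderkey = l.l_orderkey)
      LEFT JOIN customer c ON (c.c_custkey = o.o_custkey);

    SET enable_usestatmcvs = off;     -- control runs: stock multi-batch hash join
    -- ... then repeat the same EXPLAIN ANALYZE for the comparison timings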
Re: [HACKERS] Lock conflict behavior?
Tom Lane wrote:
> Tatsuo Ishii writes:
>> I'm wondering if the following behavior of PostgreSQL regarding lock
>> conflict is an expected one. Here's a scenario:
>
>> Session A:
>> BEGIN;
>> SELECT * FROM pg_class limit 1;  -- acquires access share lock
>
>> Session B:
>> BEGIN;
>> ALTER TABLE pg_class ...;  -- waits to acquire access exclusive lock (will fail anyway though)
>
>> Session C:
>> SELECT * FROM pg_class...;  -- whatever query needs to access pg_class will be blocked, too bad...
>
>> I understand that B should wait to acquire the lock, but should C have to wait too?
>
> If we didn't do this, then a would-be acquirer of exclusive lock would
> have a very serious problem with lock starvation: it might wait forever
> in the face of a continuous stream of access-share lock requests.

See http://en.wikipedia.org/wiki/Readers-writers_problem

Jan

-- 
Jan Urbanski
GPG key ID: E583D7D2
ouden estin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Automatic view update rules
--On Wednesday, November 26, 2008 10:54:01 +0100 Bernd Helmle wrote:

> --On Tuesday, November 25, 2008 23:43:02 -0500 Robert Haas wrote:
>
>> Do you intend to submit an updated version of this patch for this
>> commitfest?
>
> I'll do so ASAP; I've updated the status to 'waiting on author'.

Okay, I've finally managed to create an updated version with (hopefully) all issues mentioned by Robert addressed.

-- 
Thanks

Bernd

view_update.patch.bz2
Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Lock conflict behavior?
Tatsuo Ishii writes:
> I'm wondering if the following behavior of PostgreSQL regarding lock
> conflict is an expected one. Here's a scenario:

> Session A:
> BEGIN;
> SELECT * FROM pg_class limit 1;  -- acquires access share lock

> Session B:
> BEGIN;
> ALTER TABLE pg_class ...;  -- waits to acquire access exclusive lock (will fail anyway though)

> Session C:
> SELECT * FROM pg_class...;  -- whatever query needs to access pg_class will be blocked, too bad...

> I understand that B should wait to acquire the lock, but should C have to wait too?

If we didn't do this, then a would-be acquirer of exclusive lock would have a very serious problem with lock starvation: it might wait forever in the face of a continuous stream of access-share lock requests.

			regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
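For illustration (not part of the thread): the queuing behaviour can be observed from a fourth session via pg_locks. This sketch uses LOCK TABLE in place of the scenario's ALTER TABLE, since it queues for the same ACCESS EXCLUSIVE lock, and assumes a superuser connection:

    -- Session A: hold an ACCESS SHARE lock on pg_class
    BEGIN;
    SELECT * FROM pg_class LIMIT 1;

    -- Session B: queue an ACCESS EXCLUSIVE request behind A
    BEGIN;
    LOCK TABLE pg_class IN ACCESS EXCLUSIVE MODE;

    -- Session C: blocks even though it only needs ACCESS SHARE,
    -- because it queues behind B's ungranted exclusive request
    SELECT relname FROM pg_class LIMIT 1;

    -- Session D: inspect the queue; B's and C's requests show granted = f
    SELECT pid, mode, granted
    FROM pg_locks
    WHERE relation = 'pg_class'::regclass;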
Re: [HACKERS] a small proposal for avoiding foot-shooting
> "Albe" == Albe Laurenz writes:

 Albe> I *guess* it is the problem addressed by
 Albe> http://archives.postgresql.org/pgsql-committers/2008-04/msg00275.php
 Albe> and
 Albe> http://archives.postgresql.org/pgsql-committers/2008-04/msg00358.php

No; the problem is with stop -mimmediate (not -mfast), or any other PANIC shutdown of the postmaster during backup.

-- 
Andrew.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] a small proposal for avoiding foot-shooting
Tom Lane wrote:
>> I propose that this behaviour be changed such that 'terse' is ignored
>> for all log messages of FATAL or PANIC severity.
>> [ on the strength of a single example ]
> [...]
>
> It seems like it might be better to rephrase error messages to ensure
> that anything really critical is mentioned in the primary message.
> In this case, perhaps instead of
>     errmsg("could not locate required checkpoint record")
> we could have it print
>     errmsg("could not locate checkpoint record specified in file \"%s/backup_label\".", DataDir)
> assuming we did actually get the location from there.
>
> Anyway, you've omitted a lot of details that would be necessary
> to judge exactly what was misleading about what the DBA saw.

I *guess* it is the problem addressed by
http://archives.postgresql.org/pgsql-committers/2008-04/msg00275.php
and
http://archives.postgresql.org/pgsql-committers/2008-04/msg00358.php

Yours,
Laurenz Albe

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Lock conflict behavior?
Hi,

I'm wondering if the following behavior of PostgreSQL regarding lock conflict is an expected one. Here's a scenario:

Session A:
BEGIN;
SELECT * FROM pg_class limit 1;  -- acquires access share lock

Session B:
BEGIN;
ALTER TABLE pg_class ...;  -- waits to acquire access exclusive lock (will fail anyway though)

Session C:
SELECT * FROM pg_class...;  -- whatever query needs to access pg_class will be blocked, too bad...

I understand that B should wait to acquire the lock, but should C have to wait too?

Also, it seems that an attacker could mount a denial-of-service attack by opening sessions A and B, since other users in session C or in any following sessions will be blocked.

--
Tatsuo Ishii
SRA OSS, Inc. Japan

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers