[HACKERS] Call for translations
In anticipation of the release of PostgreSQL 9.1, it is once again time to update the message translations. We are now in a string freeze, which has traditionally been associated with the first release candidate, so it's a good time to do this work now. If you want to help, see http://babel.postgresql.org/ for instructions and other information. If there are already active translation teams, please communicate with them first. The mailing list pgtranslation-translat...@pgfoundry.org is available for general discussion and coordination of translation activities. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] cheaper snapshots redux
No, I don't think it will all be in memory - but that's part of the performance calculation. If you need to check on the status of an XID and find that you need to read a page of data in from disk, that's going to be many orders of magnitude slower than anything we do with a snapshot now. Now, if you gain enough elsewhere, it could still be a win, but I'm not going to just assume that.

I was just suggesting this because memory costs have come down a lot (as you may know) and people can afford to buy more memory in enterprise scenarios. Once we get scalability, we may not need to worry about a few MB of memory, especially with cloud computing being widely adopted. Gokul.
Re: [HACKERS] Displaying accumulated autovacuum cost
Attached patch includes the "math is hard" reworking, so it displays the average write rate in the log output automatically:

LOG:  automatic vacuum of table "pgbench.public.pgbench_accounts": index scans: 1
	pages: 0 removed, 163935 remain
	tuples: 200 removed, 4625165 remain
	buffer usage: 111901 hits, 123348 misses, 102351 dirtied, 23.365 MiB/s write rate
	system usage: CPU 1.84s/4.22u sec elapsed 34.22 sec

All of the updates to the process title are gone, in favor of some progress report mechanism TBD. The summary is much more important than the progress tracking part as far as I'm concerned; I don't mind splitting things apart to try and get this part in earlier.

-- Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 889737e..fa15b2e 100644
*** a/src/backend/commands/vacuum.c
--- b/src/backend/commands/vacuum.c
***************
*** 43,48 ****
--- 43,49 ----
  #include "utils/fmgroids.h"
  #include "utils/guc.h"
  #include "utils/memutils.h"
+ #include "utils/ps_status.h"
  #include "utils/snapmgr.h"
  #include "utils/syscache.h"
  #include "utils/tqual.h"
*************** vacuum(VacuumStmt *vacstmt, Oid relid, b
*** 214,219 ****
--- 215,223 ----
  	VacuumCostActive = (VacuumCostDelay > 0);
  	VacuumCostBalance = 0;
+ 	VacuumPageHit = 0;
+ 	VacuumPageMiss = 0;
+ 	VacuumPageDirty = 0;

  	/*
  	 * Loop to process each selected relation.
*************** vacuum_delay_point(void)
*** 1160,1167 ****
  		if (msec > VacuumCostDelay * 4)
  			msec = VacuumCostDelay * 4;

- 		pg_usleep(msec * 1000L);
-
  		VacuumCostBalance = 0;

  		/* update balance values for workers */
--- 1164,1169 ----
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index b5547c5..a41f1cd 100644
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
*************** lazy_vacuum_rel(Relation onerel, VacuumS
*** 151,165 ****
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0;
  	bool		scan_all;
  	TransactionId freezeTableLimit;

  	/* measure elapsed time iff autovacuum logging requires it */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		if (Log_autovacuum_min_duration > 0)
  			starttime = GetCurrentTimestamp();
  	}
--- 151,168 ----
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0, endtime;
  	bool		scan_all;
  	TransactionId freezeTableLimit;
+ 	long		secs;
+ 	int			usecs;
+ 	double		write_rate;

  	/* measure elapsed time iff autovacuum logging requires it */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		if (Log_autovacuum_min_duration > 0 || VacuumCostActive)
  			starttime = GetCurrentTimestamp();
  	}
*************** lazy_vacuum_rel(Relation onerel, VacuumS
*** 225,247 ****
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, GetCurrentTimestamp(),
  									   Log_autovacuum_min_duration))
! 			ereport(LOG,
! 					(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
! 							"pages: %d removed, %d remain\n"
! 							"tuples: %.0f removed, %.0f remain\n"
! 							"system usage: %s",
! 							get_database_name(MyDatabaseId),
! 							get_namespace_name(RelationGetNamespace(onerel)),
! 							RelationGetRelationName(onerel),
! 							vacrelstats->num_index_scans,
! 							vacrelstats->pages_removed,
! 							vacrelstats->rel_pages,
! 							vacrelstats->tuples_deleted,
! 							vacrelstats->new_rel_tuples,
! 							pg_rusage_show(&ru0))));
  	}
--- 228,282 ----
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
+ 		endtime = GetCurrentTimestamp();
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, endtime,
  									   Log_autovacuum_min_duration))
! 		{
! 			if (VacuumCostActive)
! 			{
! 				TimestampDifference(starttime, endtime, &secs, &usecs);
! 				write_rate = 0;
! 				if ((secs > 0) || (usecs > 0))
! 					write_rate = (double) BLCKSZ * VacuumPageDirty / (1024 * 1024) /
! 						(secs + usecs / 1000000.0);
!
! 				ereport(LOG,
! 						(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
! 								"pages: %d removed, %d remain\n"
! 								"tuples: %.0f removed, %.0f remain\n"
! 								"buffer usage: %d hits, %d misses, %d dirtied, %.3f MiB/s write rate\n"
! 								"system usage: %s",
! 								get_database_name(MyDatabaseId),
! 								get_namespace_name(RelationGetNamespace(onerel)),
! 								RelationGetRelationName(onerel),
! 								vacrelstats->num_index_scans,
! 								vacrelstats->pages_removed,
Re: [HACKERS] Inputting relative datetimes
On 28 August 2011 00:39, Robert Haas robertmh...@gmail.com wrote:
On Sat, Aug 27, 2011 at 7:43 AM, Dean Rasheed dean.a.rash...@gmail.com wrote:
On 27 August 2011 12:29, Dean Rasheed dean.a.rash...@gmail.com wrote:
... if nothing else it has been a fun exercise figuring out how the datetime string parsing code works. While looking through the current code, I spotted the following oddity:

select timestamp 'yesterday 10:30';
      timestamp
---------------------
 2011-08-26 10:30:00

which is what you'd expect; however:

select timestamp '10:30 yesterday';
      timestamp
---------------------
 2011-08-26 00:00:00

Similarly, 'today' and 'tomorrow' reset any time fields decoded so far, but ISTM that they should really be preserving the hour, min and sec fields.

Sounds right to me. Want to send a patch?

The attached patch makes 'today', 'tomorrow' and 'yesterday' only set the year, month and day fields. All the other fields are already initialised to 0 at the start, and may be set non-zero before or after encountering these special date values. The result should now be independent of the order of the fields.

Regards, Dean

datetime.patch
Description: Binary data
Re: [HACKERS] spinlocks on HP-UX
Pity that this patch works only on HP-UX :(. But I have an idea: maybe when the executor stops at a locked row, it could process the next row instead of waiting - of course, only if the query contains no ORDER BY or windowing functions. -- pasman
Re: [HACKERS] Displaying accumulated autovacuum cost
Updated patch cleans up two diff mistakes made when backing out the progress report feature. The tip-off that I screwed up should have been the absurdly high write rate shown: the usleep was accidentally deleted, so it was running without cost limits even applying. Here's a good one instead:

LOG:  automatic vacuum of table "pgbench.public.pgbench_accounts": index scans: 1
	pages: 0 removed, 163935 remain
	tuples: 200 removed, 2928356 remain
	buffer usage: 117393 hits, 123351 misses, 102684 dirtied, 2.168 MiB/s write rate
	system usage: CPU 2.54s/6.27u sec elapsed 369.99 sec

-- Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 889737e..c9890b4 100644
*** a/src/backend/commands/vacuum.c
--- b/src/backend/commands/vacuum.c
*************** vacuum(VacuumStmt *vacstmt, Oid relid, b
*** 214,219 ****
--- 214,222 ----
  	VacuumCostActive = (VacuumCostDelay > 0);
  	VacuumCostBalance = 0;
+ 	VacuumPageHit = 0;
+ 	VacuumPageMiss = 0;
+ 	VacuumPageDirty = 0;

  	/*
  	 * Loop to process each selected relation.
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index b5547c5..a41f1cd 100644
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
*************** lazy_vacuum_rel(Relation onerel, VacuumS
*** 151,165 ****
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0;
  	bool		scan_all;
  	TransactionId freezeTableLimit;

  	/* measure elapsed time iff autovacuum logging requires it */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		if (Log_autovacuum_min_duration > 0)
  			starttime = GetCurrentTimestamp();
  	}
--- 151,168 ----
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0, endtime;
  	bool		scan_all;
  	TransactionId freezeTableLimit;
+ 	long		secs;
+ 	int			usecs;
+ 	double		write_rate;

  	/* measure elapsed time iff autovacuum logging requires it */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		if (Log_autovacuum_min_duration > 0 || VacuumCostActive)
  			starttime = GetCurrentTimestamp();
  	}
*************** lazy_vacuum_rel(Relation onerel, VacuumS
*** 225,247 ****
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, GetCurrentTimestamp(),
  									   Log_autovacuum_min_duration))
! 			ereport(LOG,
! 					(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
! 							"pages: %d removed, %d remain\n"
! 							"tuples: %.0f removed, %.0f remain\n"
! 							"system usage: %s",
! 							get_database_name(MyDatabaseId),
! 							get_namespace_name(RelationGetNamespace(onerel)),
! 							RelationGetRelationName(onerel),
! 							vacrelstats->num_index_scans,
! 							vacrelstats->pages_removed,
! 							vacrelstats->rel_pages,
! 							vacrelstats->tuples_deleted,
! 							vacrelstats->new_rel_tuples,
! 							pg_rusage_show(&ru0))));
  	}
--- 228,282 ----
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
+ 		endtime = GetCurrentTimestamp();
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, endtime,
  									   Log_autovacuum_min_duration))
! 		{
! 			if (VacuumCostActive)
! 			{
! 				TimestampDifference(starttime, endtime, &secs, &usecs);
! 				write_rate = 0;
! 				if ((secs > 0) || (usecs > 0))
! 					write_rate = (double) BLCKSZ * VacuumPageDirty / (1024 * 1024) /
! 						(secs + usecs / 1000000.0);
!
! 				ereport(LOG,
! 						(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
! 								"pages: %d removed, %d remain\n"
! 								"tuples: %.0f removed, %.0f remain\n"
! 								"buffer usage: %d hits, %d misses, %d dirtied, %.3f MiB/s write rate\n"
! 								"system usage: %s",
! 								get_database_name(MyDatabaseId),
! 								get_namespace_name(RelationGetNamespace(onerel)),
! 								RelationGetRelationName(onerel),
! 								vacrelstats->num_index_scans,
! 								vacrelstats->pages_removed,
! 								vacrelstats->rel_pages,
! 								vacrelstats->tuples_deleted,
! 								vacrelstats->new_rel_tuples,
! 								VacuumPageHit,
! 								VacuumPageMiss,
! 								VacuumPageDirty,
! 								write_rate,
! 								pg_rusage_show(&ru0))));
! 			}
! 			else
! 				ereport(LOG,
! 						(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
! 								"pages: %d removed, %d remain\n"
! 								"tuples: %.0f removed, %.0f remain\n"
! 								"system usage: %s",
! 								get_database_name(MyDatabaseId),
! 								get_namespace_name(RelationGetNamespace(onerel)),
Re: [HACKERS] Inputting relative datetimes
On 28 August 2011 00:00, Jeff MacDonald j...@zoidtechnologies.com wrote:
Greetings,
On Saturday, August 27, 2011 11:36:13 AM Dean Rasheed wrote:
I'm not sure how best to handle timezones though, since its hard-coded list probably won't match the timezones PostgreSQL knows about. Maybe that doesn't matter, I'm not sure.

It'll matter when the expression has a result that crosses the DST date. Does Postgres have a library that could be used by the parser?

On further examination of this and other datetime parsing code, I am coming to the conclusion that for absolute timestamps the PostgreSQL code is (IMO) the best and most flexible in terms of accepting a variety of common formats. While getdate.y and others offer better support for relative timestamps, that seems to come at the cost of poorer support for absolute timestamps, which for me is a deal-breaker. There are also other incompatibilities, such as different handling of 'today', which are just likely to lead to confusion.

It may not be worth the effort of trying to get a patch into core PostgreSQL for this, but given the already excellent absolute timestamp and interval support, I'm thinking that maybe the best answer is to just write a userland function that breaks an input string up into timestamp and interval parts and returns the resulting timestamp.

Regards, Dean
[HACKERS] confusing invalid UTF8 byte sequence error
Has anyone else ever found this error message confusing?

ERROR:  22021: invalid byte sequence for encoding "UTF8": 0xdb24

I think what is really meant is better expressed like this:

ERROR:  22021: invalid byte sequence for encoding "UTF8": 0xdb 0x24

Otherwise it looks like a codepoint or a 16-bit word (endianness?) or who knows what.
Re: [HACKERS] spinlocks on HP-UX
Robert Haas robertmh...@gmail.com writes:
First, I did this:

-#define TAS(lock) _Asm_xchg(_SZ_W, lock, 1, _LDHINT_NONE)
+#define TAS(lock) (*(lock) ? 1 : _Asm_xchg(_SZ_W, lock, 1, _LDHINT_NONE))

Seems reasonable, and similar to x86 logic.

Then, I did this:

- while (TAS(lock))
+ while (*lock ? 1 : TAS(lock))

Er, what? That sure looks like a manual application of what you'd already done in the TAS macro.

Of course, we can't apply the second patch as it stands, because I tested it on x86 and it loses. But it seems pretty clear we need to do it at least for this architecture...

Please clarify: when you say "this architecture", are you talking about IA64 or PA-RISC? Is there any reason to think that this is specific to HP-UX rather than any other system on the same architecture? (I'm sure I can get access to some IA64 clusters at Red Hat, though maybe not 64-core ones.)

I don't have an objection to the TAS macro change, but I do object to fooling with the hardware-independent code in s_lock.c ... especially when the additional improvement seems barely above the noise threshold. You ought to be able to do whatever you need inside the TAS macro.

regards, tom lane
[HACKERS] [v9.2] Object access hooks with arguments support (v1)
The attached patch is a draft to support arguments in addition to the OAT_* enum and object identifiers. The existing object_access_hook enables loadable modules to acquire control when objects are referenced. The first user of this hook is contrib/sepgsql, for assignment of a default security label on newly created objects. Right now, OAT_POST_CREATE is the only supported object access type. However, we plan to extend this mechanism to other, broader purposes, such as comprehensive DDL permissions supported by loadable modules. This patch is groundwork to utilize this hook for object creation permission checks, not only default labeling.

During the v9.1 development cycle, I proposed an idea that defines both an OAT_CREATE hook prior to system catalog modification and an OAT_POST_CREATE hook as we currently have. This design enables a permission check next to the existing pg_xxx_aclcheck() or pg_xxx_ownercheck(), raising an error before the system catalog is updated. However, it was painful to deliver a private datum set at OAT_CREATE to the OAT_POST_CREATE hook, due to the code complexity. The other idea did all the necessary work in the OAT_POST_CREATE hook, and that is what was merged, because loadable modules can pull properties of the new object from the system catalogs using the supplied object identifiers. Thus, contrib/sepgsql assigns a default security label on a new object using the OAT_POST_CREATE hook.

However, I have two concerns about using the existing hook to implement permission checks for object creation. The first is that the new system catalog entry is not visible using SnapshotNow, so we have to open and scan the system catalog again, instead of using the syscache mechanism. The second is more significant: part of the information needed to make an access control decision is not available in the contents of the system catalog entries at all. For example, we want to skip the permission check when heap_create_with_catalog() is launched by make_new_heap(), because the new relation is just used for a later swap.

Thus, I'd like to propose supporting arguments to object_access_hook, to give loadable modules additional contextual information at its invocations and solve these concerns. Regarding the first concern, fortunately, most existing OAT_POST_CREATE hook invocations are deployed just after the insert or update of system catalogs, but before CCI. So it is helpful for loadable modules if we deliver the Form_pg_ data, so they can reference properties of the new object instead of opening and scanning the catalog again. In the draft patch, I enhanced the OAT_POST_CREATE hook to commonly take an argument that points to the Form_pg_ data of the new object. Regarding the second concern, I added a few pieces of contextual information as second or later arguments for some object classes. Right now, I propose the following contextual information be provided to the OAT_POST_CREATE hook to implement permission checks for object creation:

* pg_class - the TupleDesc structure of the new relation. I want to reference pg_attribute, not only pg_class.
* pg_class - a flag to show whether the relation is defined for rebuilding, or not. I don't want to apply a permission check when it is invoked from make_new_heap(), because that just creates a dummy table as part of an internal process; all the necessary permission checks should be done at the ALTER TABLE or CLUSTER level.
* pg_class - a flag to show whether the relation is created with SELECT INTO, or not. I want to check insert permission on a new table created by SELECT INTO, because the DML hook is not available to check this case.
* pg_type - a flag to show whether the type is defined as an implicit array, or not. I don't want to apply a permission check on creation of an implicit array type.
* pg_database - the OID of the source (template) database. I want to fetch the security label of the source database to compute a default label for the new database.
* pg_trigger - a flag to show whether the trigger is used for a FK constraint, or not. I don't want to apply a permission check on creation of a FK constraint; that should be done at the ALTER TABLE level.

Sorry for the long explanation. Right now, I consider this the least invasive way to implement permission checks on object creation. Thanks, Any comments welcome. -- KaiGai Kohei kai...@kaigai.gr.jp

pgsql-v9.2-object-access-hooks-argument.v1.patch
Description: Binary data
[HACKERS] Why buildfarm member anchovy is failing on 8.2 and 8.3 branches
I spent a bit of time looking into $SUBJECT. The cause of the failure is that configure mistakenly decides that setproctitle and some other functions are available, when they aren't; this eventually leads to link failures, of course. Now 8.2 and 8.3 use autoconf 2.59. 8.4 and up, which do not exhibit this failure, use autoconf 2.61 or later. Sure enough, there is a difference in the test program generated by the more recent autoconfs: they actually try to call the function, where the previous ones do something weird involving a function pointer comparison. I dug in the autoconf change log and found this:

2005-10-19 Paul Eggert egg...@cs.ucla.edu
(AC_LANG_FUNC_LINK_TRY(C)): Call the function rather than simply comparing its address. Intel's interprocedural optimization was outsmarting the old heuristic. Problem reported by Mikulas Patocka.

Since anchovy is using the gold linker at -O3, it's not exactly surprising that it might be carrying out aggressive interprocedural optimizations that we're not seeing used on other platforms. The bottom line seems to be that autoconf 2.59 is seriously broken on recent toolchains. Should we try to do something about that, like migrate the 8.2 and 8.3 releases to a newer autoconf? 8.2 is close enough to EOL that I don't mind answering "no" for it, but maybe we should do that in 8.3. Comments?

regards, tom lane
Re: [HACKERS] Why buildfarm member anchovy is failing on 8.2 and 8.3 branches
On 08/28/2011 04:15 PM, Tom Lane wrote:
I spent a bit of time looking into $SUBJECT. The cause of the failure is that configure mistakenly decides that setproctitle and some other functions are available, when they aren't; this eventually leads to link failures of course. Now 8.2 and 8.3 use autoconf 2.59. 8.4 and up, which do not exhibit this failure, use autoconf 2.61 or later. Sure enough, there is a difference in the test program generated by the more recent autoconfs: they actually try to call the function, where the previous ones do something weird involving a function pointer comparison. I dug in the autoconf change log and found this:

2005-10-19 Paul Eggert egg...@cs.ucla.edu
(AC_LANG_FUNC_LINK_TRY(C)): Call the function rather than simply comparing its address. Intel's interprocedural optimization was outsmarting the old heuristic. Problem reported by Mikulas Patocka.

Since anchovy is using the gold linker at -O3, it's not exactly surprising that it might be carrying out aggressive interprocedural optimizations that we're not seeing used on other platforms. The bottom line seems to be that autoconf 2.59 is seriously broken on recent toolchains. Should we try to do something about that, like migrate the 8.2 and 8.3 releases to a newer autoconf? 8.2 is close enough to EOL that I don't mind answering no for it, but maybe we should do that in 8.3.

If we're going to do it for 8.3 we might as well for 8.2 at the same time, ISTM, even if it is close to EOL. Is -O3 a recommended setting for icc?

cheers andrew
Re: [HACKERS] Why buildfarm member anchovy is failing on 8.2 and 8.3 branches
Andrew Dunstan and...@dunslane.net writes:
On 08/28/2011 04:15 PM, Tom Lane wrote:
The bottom line seems to be that autoconf 2.59 is seriously broken on recent toolchains. Should we try to do something about that, like migrate the 8.2 and 8.3 releases to a newer autoconf? 8.2 is close enough to EOL that I don't mind answering no for it, but maybe we should do that in 8.3.

If we're going to do it for 8.3 we might as well for 8.2 at the same time, ISTM, even if it is close to EOL.

Yeah, possibly, if it's not too invasive. I've not yet done any research about what would need to change.

Is -O3 a recommended setting for icc?

No idea. But after a bit of man-page-reading I think it's probably not the -O level that counts, so much as the fact that anchovy is using -flto (link-time optimization) in CFLAGS. I don't see any indication that that's being selected by the buildfarm script itself, so it must be coming from an environment setting of CFLAGS.

regards, tom lane
Re: [HACKERS] spinlocks on HP-UX
On Sun, Aug 28, 2011 at 11:35 AM, Tom Lane t...@sss.pgh.pa.us wrote:
Robert Haas robertmh...@gmail.com writes:
First, I did this:

-#define TAS(lock) _Asm_xchg(_SZ_W, lock, 1, _LDHINT_NONE)
+#define TAS(lock) (*(lock) ? 1 : _Asm_xchg(_SZ_W, lock, 1, _LDHINT_NONE))

Seems reasonable, and similar to x86 logic.

Then, I did this:

- while (TAS(lock))
+ while (*lock ? 1 : TAS(lock))

Er, what? That sure looks like a manual application of what you'd already done in the TAS macro.

Sorry, I blew through that a little too blithely. If you change TAS() itself, then even the very first attempt to acquire the lock will try the unlocked instruction first, whereas changing s_lock() allows you to do something different in the contended case than you do in the uncontended case. We COULD just change the TAS() macro since, in this case, it seems to make only a minor difference, but what I was thinking is that we could change s_lock.h to define two macros, TAS() and TAS_SPIN(). If a particular architecture defines TAS() but not TAS_SPIN(), then we define TAS_SPIN(x) to be TAS(x). Then, S_LOCK() can stay as-is - calling TAS() - but s_lock() can call TAS_SPIN(), which will normally be the same as TAS() but can be made different on any architecture where the retry loop should do something different than the initial attempt.

Please clarify: when you say this architecture, are you talking about IA64 or PA-RISC? Is there any reason to think that this is specific to HP-UX rather than any other system on the same architecture? (I'm sure I can get access to some IA64 clusters at Red Hat, though maybe not 64-core ones.)

I tested on IA64; I don't currently have access to a PA-RISC box.
The documentation I'm looking at implies that the same approach would be desirable there, but that's just an unsubstantiated rumor at this point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] spinlocks on HP-UX
Robert Haas robertmh...@gmail.com writes:
On Sun, Aug 28, 2011 at 11:35 AM, Tom Lane t...@sss.pgh.pa.us wrote:
Robert Haas robertmh...@gmail.com writes:
Then, I did this:

- while (TAS(lock))
+ while (*lock ? 1 : TAS(lock))

Er, what? That sure looks like a manual application of what you'd already done in the TAS macro.

Sorry, I blew through that a little too blithely. If you change TAS() itself, then even the very first attempt to acquire the lock will try the unlocked instruction first, whereas changing s_lock() allows you to do something different in the contended case than you do in the uncontended case.

Yeah, I figured out that was probably what you meant a little while later. I found a 64-CPU IA64 machine in Red Hat's test labs and am currently trying to replicate your results; report to follow.

We COULD just change the TAS() macro since, in this case, it seems to make only a minor difference, but what I was thinking is that we could change s_lock.h to define two macros, TAS() and TAS_SPIN().

Yeah, I was thinking along the same lines, though perhaps the name of the new macro could use a little bikeshedding. The comments in s_lock.h note that the unlocked test in x86 TAS is of uncertain usefulness. It seems entirely possible to me that we ought to use a similar design on x86, ie, use the unlocked test only once we've entered the delay loop.

Please clarify: when you say this architecture, are you talking about IA64 or PA-RISC? Is there any reason to think that this is specific to HP-UX rather than any other system on the same architecture? (I'm sure I can get access to some IA64 clusters at Red Hat, though maybe not 64-core ones.)

I tested on IA64; I don't currently have access to a PA-RISC box. The documentation I'm looking at implies that the same approach would be desirable there, but that's just an unsubstantiated rumor at this point.

Well, I've got a PA-RISC box, but it's only a single processor so it's not gonna prove much. Anybody?
regards, tom lane
Re: [HACKERS] Why buildfarm member anchovy is failing on 8.2 and 8.3 branches
On 08/28/2011 05:51 PM, Tom Lane wrote:
Is -O3 a recommended setting for icc?

No idea. But after a bit of man-page-reading I think it's probably not the -O level that counts, so much as the fact that anchovy is using -flto (link-time optimization) in CFLAGS. I don't see any indication that that's being selected by the buildfarm script itself, so it must be coming from an environment setting of CFLAGS.

The buildfarm member is using:

'CFLAGS' => '-O3 -xN -parallel -ip'
'CC' => 'icc'

cheers andrew
Re: [HACKERS] Why buildfarm member anchovy is failing on 8.2 and 8.3 branches
Andrew Dunstan and...@dunslane.net writes:
On 08/28/2011 05:51 PM, Tom Lane wrote:
Is -O3 a recommended setting for icc?

No idea. But after a bit of man-page-reading I think it's probably not the -O level that counts, so much as the fact that anchovy is using -flto (link-time optimization) in CFLAGS. I don't see any indication that that's being selected by the buildfarm script itself, so it must be coming from an environment setting of CFLAGS.

The buildfarm member is using:

'CFLAGS' => '-O3 -xN -parallel -ip'
'CC' => 'icc'

Er, anchovy? Where do you see that? The only thing I see it forcing is

'config_env' => { 'CC' => 'ccache cc' },

regards, tom lane
Re: [HACKERS] Why buildfarm member anchovy is failing on 8.2 and 8.3 branches
On 08/28/2011 06:21 PM, Tom Lane wrote:
Andrew Dunstan and...@dunslane.net writes:
On 08/28/2011 05:51 PM, Tom Lane wrote:
Is -O3 a recommended setting for icc?

No idea. But after a bit of man-page-reading I think it's probably not the -O level that counts, so much as the fact that anchovy is using -flto (link-time optimization) in CFLAGS. I don't see any indication that that's being selected by the buildfarm script itself, so it must be coming from an environment setting of CFLAGS.

The buildfarm member is using:

'CFLAGS' => '-O3 -xN -parallel -ip'
'CC' => 'icc'

Er, anchovy? Where do you see that? The only thing I see it forcing is

'config_env' => { 'CC' => 'ccache cc' },

Sorry, yes, you're right. I was looking at mongoose.

cheers andrew
Re: [HACKERS] spinlocks on HP-UX
I wrote: Yeah, I figured out that was probably what you meant a little while later. I found a 64-CPU IA64 machine in Red Hat's test labs and am currently trying to replicate your results; report to follow.

OK, these results are on a 64-processor SGI IA64 machine (AFAICT, 64 independent sockets, no hyperthreading or any funny business); 124GB in 32 NUMA nodes; running RHEL5.7, gcc 4.1.2. I built today's git head with --enable-debug (but not --enable-cassert) and ran with all default configuration settings except shared_buffers = 8GB and max_connections = 200. The test database is initialized at -s 100. I did not change the database between runs, but restarted the postmaster and then did this to warm the caches a tad:

    pgbench -c 1 -j 1 -S -T 30 bench

Per-run pgbench parameters are as shown below --- note in particular that I assigned one pgbench thread per 8 backends. The numbers are fairly variable even with 5-minute runs; I did each series twice so you could get a feeling for how much.

Today's git head:

pgbench -c 1 -j 1 -S -T 300 bench      tps = 5835.213934 (including ...
pgbench -c 2 -j 1 -S -T 300 bench      tps = 8499.223161 (including ...
pgbench -c 8 -j 1 -S -T 300 bench      tps = 15197.126952 (including ...
pgbench -c 16 -j 2 -S -T 300 bench     tps = 30803.255561 (including ...
pgbench -c 32 -j 4 -S -T 300 bench     tps = 65795.356797 (including ...
pgbench -c 64 -j 8 -S -T 300 bench     tps = 81644.914241 (including ...
pgbench -c 96 -j 12 -S -T 300 bench    tps = 40059.202836 (including ...
pgbench -c 128 -j 16 -S -T 300 bench   tps = 21309.615001 (including ...

run 2:

pgbench -c 1 -j 1 -S -T 300 bench      tps = 5787.310115 (including ...
pgbench -c 2 -j 1 -S -T 300 bench      tps = 8747.104236 (including ...
pgbench -c 8 -j 1 -S -T 300 bench      tps = 14655.369995 (including ...
pgbench -c 16 -j 2 -S -T 300 bench     tps = 28287.254924 (including ...
pgbench -c 32 -j 4 -S -T 300 bench     tps = 61614.715187 (including ...
pgbench -c 64 -j 8 -S -T 300 bench     tps = 79754.640518 (including ...
pgbench -c 96 -j 12 -S -T 300 bench    tps = 40334.994324 (including ...
pgbench -c 128 -j 16 -S -T 300 bench   tps = 23285.271257 (including ...

With modified TAS macro (see patch 1 below):

pgbench -c 1 -j 1 -S -T 300 bench      tps = 6171.454468 (including ...
pgbench -c 2 -j 1 -S -T 300 bench      tps = 8709.003728 (including ...
pgbench -c 8 -j 1 -S -T 300 bench      tps = 14902.731035 (including ...
pgbench -c 16 -j 2 -S -T 300 bench     tps = 29789.744482 (including ...
pgbench -c 32 -j 4 -S -T 300 bench     tps = 59991.549128 (including ...
pgbench -c 64 -j 8 -S -T 300 bench     tps = 117369.287466 (including ...
pgbench -c 96 -j 12 -S -T 300 bench    tps = 112583.144495 (including ...
pgbench -c 128 -j 16 -S -T 300 bench   tps = 110231.305282 (including ...

run 2:

pgbench -c 1 -j 1 -S -T 300 bench      tps = 5670.097936 (including ...
pgbench -c 2 -j 1 -S -T 300 bench      tps = 8230.786940 (including ...
pgbench -c 8 -j 1 -S -T 300 bench      tps = 14785.952481 (including ...
pgbench -c 16 -j 2 -S -T 300 bench     tps = 29335.875139 (including ...
pgbench -c 32 -j 4 -S -T 300 bench     tps = 59605.433837 (including ...
pgbench -c 64 -j 8 -S -T 300 bench     tps = 108884.294519 (including ...
pgbench -c 96 -j 12 -S -T 300 bench    tps = 110387.439978 (including ...
pgbench -c 128 -j 16 -S -T 300 bench   tps = 109046.121191 (including ...

With unlocked test in s_lock.c delay loop only (patch 2 below):

pgbench -c 1 -j 1 -S -T 300 bench      tps = 5426.491088 (including ...
pgbench -c 2 -j 1 -S -T 300 bench      tps = 8787.939425 (including ...
pgbench -c 8 -j 1 -S -T 300 bench      tps = 15720.801359 (including ...
pgbench -c 16 -j 2 -S -T 300 bench     tps = 33711.102718 (including ...
pgbench -c 32 -j 4 -S -T 300 bench     tps = 61829.180234 (including ...
pgbench -c 64 -j 8 -S -T 300 bench     tps = 109781.655020 (including ...
pgbench -c 96 -j 12 -S -T 300 bench    tps = 107132.848280 (including ...
pgbench -c 128 -j 16 -S -T 300 bench   tps = 106533.630986 (including ...

run 2:

pgbench -c 1 -j 1 -S -T 300 bench      tps = 5705.283316 (including ...
pgbench -c 2 -j 1 -S -T 300 bench      tps = 8442.798662 (including ...
pgbench -c 8 -j 1 -S -T 300 bench      tps = 14423.723837 (including ...
pgbench -c 16 -j 2 -S -T 300 bench     tps = 29112.751995 (including ...
pgbench -c 32 -j 4 -S -T 300 bench     tps = 62258.984033 (including ...
pgbench -c 64 -j 8 -S -T 300 bench     tps = 107741.988800 (including ...
pgbench -c 96 -j 12 -S -T 300 bench    tps = 107138.968981 (including ...
pgbench -c 128 -j 16 -S -T 300 bench   tps = 106110.215138 (including ...

So this pretty well confirms Robert's results, in particular that all of the win from an unlocked test comes from using it in the delay loop. Given the lack of evidence that a general change in TAS() is beneficial, I'm inclined to vote against it, on the grounds that the extra test is surely a loss at some level when there is not contention. (IOW, +1 for inventing a second macro to use in the delay loop only.)
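For a quick read on the shape of those numbers, here is some arithmetic of mine (not from the thread) over the run-1 series: the modified TAS macro costs a few percent at mid-range client counts but wins big once the delay loop is hot.

```python
# Ratio of patched (patch 1, modified TAS macro) to unpatched tps,
# using the run-1 numbers quoted above, keyed by client count.
head = {1: 5835.213934, 2: 8499.223161, 8: 15197.126952, 16: 30803.255561,
        32: 65795.356797, 64: 81644.914241, 96: 40059.202836, 128: 21309.615001}
patched = {1: 6171.454468, 2: 8709.003728, 8: 14902.731035, 16: 29789.744482,
           32: 59991.549128, 64: 117369.287466, 96: 112583.144495, 128: 110231.305282}

for clients in sorted(head):
    ratio = patched[clients] / head[clients]
    print(f"{clients:4d} clients: {ratio:.2f}x")
```

The crossover is visible right where contention sets in: a slight regression at 8-32 clients, then roughly 1.4x at 64 and over 5x at 128.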
Re: [HACKERS] spinlocks on HP-UX
On Sun, Aug 28, 2011 at 7:19 PM, Tom Lane t...@sss.pgh.pa.us wrote: So this pretty well confirms Robert's results, in particular that all of the win from an unlocked test comes from using it in the delay loop. Given the lack of evidence that a general change in TAS() is beneficial, I'm inclined to vote against it, on the grounds that the extra test is surely a loss at some level when there is not contention. (IOW, +1 for inventing a second macro to use in the delay loop only.)

Beautiful. Got a naming preference for that second macro? I suggested TAS_SPIN() because it's what you use when you spin, as opposed to what you use in the uncontended case, but I'm not attached to that.

We ought to do similar tests on other architectures. I found some lots-o-processors x86_64 machines at Red Hat, but they don't seem to own any PPC systems with more than 8 processors. Anybody have big iron with other non-Intel chips? Aside from PPC, it would probably be worth testing SPARC and ARM if we can find machines. Anything else is probably too old or too marginal to get excited about. AFAIK these effects don't manifest with <32 cores, so I suspect that a lot of what's in s_lock.h is irrelevant just because many of those architectures are too old to exist in 32-core versions.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] cheaper snapshots redux
On Sun, Aug 28, 2011 at 4:33 AM, Gokulakannan Somasundaram gokul...@gmail.com wrote:

No, I don't think it will all be in memory - but that's part of the performance calculation. If you need to check on the status of an XID and find that you need to read a page of data in from disk, that's going to be many orders of magnitude slower than anything we do with a snapshot now. Now, if you gain enough elsewhere, it could still be a win, but I'm not going to just assume that.

I was just suggesting this because memory costs have come down a lot (as you may know) and people can afford to buy more memory in enterprise scenarios. We may not need to worry about MBs of memory, especially with cloud computing being widely adopted, once we get scalability.

The proof of the pudding is in the eating, so let me finish coding up this approach and see how it works. Then we can decide where to go next...

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] spinlocks on HP-UX
Robert Haas robertmh...@gmail.com writes: On Sun, Aug 28, 2011 at 7:19 PM, Tom Lane t...@sss.pgh.pa.us wrote: (IOW, +1 for inventing a second macro to use in the delay loop only.) Beautiful. Got a naming preference for that second macro? I suggested TAS_SPIN() because it's what you use when you spin, as opposed to what you use in the uncontended case, but I'm not attached to that. I had been thinking TAS_CONTENDED, but on reflection there's not a strong argument for that over TAS_SPIN. Do what you will. regards, tom lane
Re: [HACKERS] spinlocks on HP-UX
2011/8/28 pasman pasmański pasma...@gmail.com: Pity that this patch works only on hpux :(.

Well, not really. x86 is already well-behaved. On a 32-core x86 box running Linux, performance seems to plateau and level off, and then falls off gradually. But on ia64, performance just collapses after about 24 cores. The fact that we don't have that problem everywhere is a good thing, not a bad thing...

But I have an idea: maybe when the executor stops at a locked row, it could process the next row instead of waiting. Of course, only if the query does not contain ORDER BY or windowing functions.

That wouldn't really help, first of all because you'd then have to remember to go back to that row (and chances are it would still be contended then), and second because these aren't row-level locks anyway. They're locks on various global data structures, such as the ProcArray.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] strange row number estimates in pg9.1rc1
Hello hackers,

I'm seeing something weird which looks like a bug in 9.1rc1 after the upgrade 8.4 -> 9.0 -> 9.1 done using pg_upgrade. I have a set of *static* tables for which explain select * gives row number estimates which are an order of magnitude lower than the actual number of rows in the table (despite the vacuum analyze executed on the table immediately before). See:

wsdb=# vacuum verbose analyze ukidssdr7.lassource;
INFO:  vacuuming "ukidssdr7.lassource"
INFO:  index "ukidssdr7lassource_q3c_idx" now contains 58060655 row versions in 143515 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.55s/0.19u sec elapsed 3.23 sec.
...
INFO:  "lassource": found 0 removable, 0 nonremovable row versions in 0 out of 6451184 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 2.66s/0.94u sec elapsed 17.92 sec.
INFO:  analyzing "ukidssdr7.lassource"
INFO:  "lassource": scanned 3 of 6451184 pages, containing 27 live rows and 0 dead rows; 3 rows in sample, 806239 estimated total rows

wsdb=# explain select * from ukidssdr7.lassource;
                              QUERY PLAN
----------------------------------------------------------------------
 Seq Scan on lassource  (cost=0.00..6459246.39 rows=806239 width=766)

wsdb=# select count(*) from ukidssdr7.lassource;
  count
----------
 58060655
(1 row)

All the columns in that table have fixed-width types (e.g. real, int, bigint, etc.; no varchars or texts), so I don't see why the row number estimate should be so far off. I also checked that the size of the relation is almost exactly equal to width * count(*) = 766 * 58060655, so there is no empty space anywhere in the relation (as it should be, because there was no write activity on the table at all).
And I noticed that, at least for several tables with hundreds of millions of rows, explain select * shows a ridiculously small number of expected rows:

wsdb=# explain select * from sdssdr7.phototag;
                             QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on phototag  (cost=0.00..24408626.00 rows=72 width=288)

I guess it may be important that I upgraded the cluster from 8.4 to 9.0 and then to 9.1 using pg_upgrade; vacuum analyze has been run on the cluster after the upgrades.

Am I missing something, or is it a bug? It looks to me like some arithmetic error in the computation of the number of rows in the tables. At least before, when I was using PG 8.4 for the same data, I used to do explain select * to get the number of rows in the tables instead of count(*) (my tables are very large); now there seems to be a huge discrepancy between the numbers.

Thanks,
Sergey

***
Sergey E. Koposov, PhD
Institute for Astronomy, Cambridge/Sternberg Astronomical Institute
Web: http://lnfm1.sai.msu.ru/~math
E-mail: m...@sai.msu.ru
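A back-of-the-envelope check on the numbers in the report above (my own arithmetic, not PostgreSQL source code): if ANALYZE's sample of 27 live rows in 3 pages were extrapolated naively over all 6451184 pages, the estimate would land essentially on count(*). That the server instead reports 806239 suggests the figure is not a simple extrapolation of the sample.

```python
# Figures quoted from the ANALYZE output above.
pages_scanned = 3
live_rows_in_sample = 27
total_pages = 6451184

rows_per_page = live_rows_in_sample / pages_scanned   # 9 rows per page
naive_estimate = rows_per_page * total_pages          # scale to the whole table

# Actual count(*) was 58060655; the naive extrapolation is within one row.
print(int(naive_estimate))
```

So whatever produced 806239 must be blending in some other input besides the 3-page sample, which is consistent with Sergey's suspicion of an arithmetic problem in the row-count computation after pg_upgrade.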
Re: [HACKERS] PATCH: regular logging of checkpoint progress
On Sat, Aug 27, 2011 at 03:57:16PM +0200, Tomas Vondra wrote: On 27 Srpen 2011, 6:01, Noah Misch wrote: On Fri, Aug 26, 2011 at 10:46:33AM +0200, Tomas Vondra wrote:

1. collect pg_stat_bgwriter stats
2. run pgbench for 10 minutes
3. collect pg_stat_bgwriter stats (to compute difference with (1))
4. kill the postmaster

The problem is that when checkpoint stats are collected, there might be a checkpoint in progress, and in that case the stats are incomplete. In some cases (especially with very small db blocks) this has significant impact because the checkpoints are less frequent.

Could you remove this hazard by adding a step 2a, psql -c "CHECKPOINT"?

I already do that, but it really does not solve the issue. It just aligns the first expected 'timed' checkpoint; it does not solve the problem with in-progress checkpoints unless the runs behave exactly the same (and that's the boring case).

To clarify, run that command _after_ the 10-minute pgbench run. It blocks until completion of both the in-progress checkpoint, if any, and the requested checkpoint.

-- Noah Misch    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services