Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Friday 19 November 2010 05:38:14 Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: I'm all in favor of having some memory ordering primitives so that we can try to implement better algorithms, but if we use it here it amounts to a fairly significant escalation in the minimum requirements to compile PG (which is bad) rather than just a performance optimization (which is good). I don't believe there would be any escalation in compilation requirements: we already have the ability to invoke stronger primitives than these. What is needed is research to find out what the primitives are called, on platforms where we aren't relying on direct asm access. My feeling is it's time to bite the bullet and do that work. We shouldn't cripple the latch operations because of laziness at the outset. I don't think developing the actual code is that hard - s_lock.c contains nearly everything necessary. A 'lock xchg' or similar is only marginally slower than the barrier-only implementation. So doing a TAS() on a slock_t in private memory should be an easy enough fallback implementation. So the complicated case seems to be !defined(HAS_TEST_AND_SET) which uses spinlocks for that purpose - no idea where that is true these days. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] contrib: auth_delay module
(2010/11/18 2:17), Robert Haas wrote: On Wed, Nov 17, 2010 at 10:32 AM, Ross J. Reedstrom reeds...@rice.edu wrote: On Tue, Nov 16, 2010 at 09:41:37PM -0500, Robert Haas wrote: On Tue, Nov 16, 2010 at 8:15 PM, KaiGai Kohei kai...@ak.jp.nec.com wrote: If we don't need a PoC module for each new hook, I'm not strongly motivated to push it into the contrib tree. What is your opinion? I'd say let it go, unless someone else feels strongly about it. I would use this module (rate limit new connection attempts) as soon as I could. Putting a cap on potential CPU usage on a production DB, whether it is caused by a blackhat or by a developer's configuration mistake (leaving the port accessible), is definitely useful, even in the face of max_connections. My production apps already have their connections and seldom need new ones. They all use CPU though. If KaiGai updates the code per previous discussion, would you be willing to take a crack at adding documentation? P.S. Your email client seems to be setting the Reply-To address to a ridiculous value. OK, I'll revise my patch according to the previous discussion. Please wait for about one week. I have a big event this weekend. Thanks, -- KaiGai Kohei kai...@ak.jp.nec.com
Re: [HACKERS] Label switcher function
(2010/11/18 11:30), Robert Haas wrote:
> 2010/11/17 KaiGai Kohei kai...@ak.jp.nec.com:
>> I revised my patch as I attached. The hook function is modified and consolidated as follows:
>>
>>   typedef enum FunctionCallEventType
>>   {
>>       FCET_BE_HOOKED,
>>       FCET_PREPARE,
>>       FCET_START,
>>       FCET_END,
>>       FCET_ABORT,
>>   } FunctionCallEventType;
>>
>>   typedef Datum (*function_call_event_type)(Oid functionId, FunctionCallEventType event, Datum event_arg);
>>   extern PGDLLIMPORT function_call_event_type function_call_event_hook;
>>
>> Unlike the subject of this e-mail, it now does not focus only on switching security labels during execution of certain functions. For example, we may use this hook to track certain functions for security auditing, performance tuning, and others. In the case of SE-PgSQL, it shall return BoolGetDatum(true) if the target function is configured as a trusted procedure; this invocation will then be hooked by fmgr_security_definer. On the first call, it shall compute the security context to be assigned during execution on the FCET_PREPARE event. Then it switches to the computed label on the FCET_START event, and restores it on the FCET_END or FCET_ABORT event.
>
> This seems like it's a lot simpler than before, which is good. It looks to me as though there should really be two separate hooks, though, one for what is now FCET_BE_HOOKED and one for everything else. For FCET_BE_HOOKED, you want a function that takes an Oid and returns a bool. For the other event types, the functionId and event arguments are OK, but I think you should forget about the save_datum stuff and just always pass fcache->flinfo and fcache->private. The plugin can get the effect of save_datum by passing around whatever state it needs to hold on to using fcache->private. So:
>
>   bool (*needs_function_call_hook)(Oid fn_oid);
>   void (*function_call_hook)(Oid fn_oid, FunctionCallEventType event, FmgrInfo flinfo, Datum *private);

It seems to me a good idea. The characteristic of the FCET_BE_HOOKED event type was a bit different from the other event types. Please wait for about a week for me to revise my patch.

> Another general comment is that you've not done a very complete job updating the comments; there are several of them in fmgr.c that are no longer accurate. Also, please zap the unnecessary whitespace changes.

Indeed, the comment in the middle of fmgr_info_cxt_security() and the one just above the definition of fmgr_security_definer() are not correct. Did you notice anything else?

Thanks,
-- KaiGai Kohei kai...@ak.jp.nec.com
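The two-hook split suggested above can be sketched roughly as follows. This is only an illustration, not the patch: the Oid, Datum and FmgrInfo definitions are stand-ins so the fragment compiles on its own, FmgrInfo is passed by pointer here (an assumption), and every name starting with sketch_ or SKETCH_ is hypothetical. The per-call state travels through the private Datum, as the proposal describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for PostgreSQL types (assumptions, not the real definitions). */
typedef unsigned int Oid;
typedef uintptr_t Datum;
typedef struct FmgrInfo { Oid fn_oid; } FmgrInfo;

typedef enum FunctionCallEventType
{
    FCET_PREPARE, FCET_START, FCET_END, FCET_ABORT
} FunctionCallEventType;

/* Hook signatures modeled on the proposal in the message above. */
typedef bool (*needs_function_call_hook_type)(Oid fn_oid);
typedef void (*function_call_hook_type)(Oid fn_oid, FunctionCallEventType event,
                                        FmgrInfo *flinfo, Datum *private);

#define SKETCH_TRUSTED_PROC_OID 16384   /* hypothetical function OID */

/* Toy "security label": 0 = caller's label, 1 = trusted procedure's label. */
static int sketch_active_label = 0;

typedef struct SketchLabelState { int computed; int saved; } SketchLabelState;
static SketchLabelState sketch_state;   /* one hooked function only in this toy */

static bool sketch_needs_hook(Oid fn_oid)
{
    return fn_oid == SKETCH_TRUSTED_PROC_OID;
}

static void sketch_hook(Oid fn_oid, FunctionCallEventType event,
                        FmgrInfo *flinfo, Datum *private)
{
    SketchLabelState *state;

    (void) fn_oid;
    (void) flinfo;
    assert(private != NULL);

    switch (event)
    {
        case FCET_PREPARE:      /* compute the label once, keep it in *private */
            sketch_state.computed = 1;
            *private = (Datum) &sketch_state;
            break;
        case FCET_START:        /* remember the old label, switch to the new one */
            state = (SketchLabelState *) *private;
            state->saved = sketch_active_label;
            sketch_active_label = state->computed;
            break;
        case FCET_END:
        case FCET_ABORT:        /* restore the label on normal exit and on error */
            state = (SketchLabelState *) *private;
            sketch_active_label = state->saved;
            break;
    }
}

static needs_function_call_hook_type needs_function_call_hook = sketch_needs_hook;
static function_call_hook_type function_call_hook = sketch_hook;

/* Simulates the fmgr side; returns the label that was active during the call. */
static int sketch_call_function(Oid fn_oid)
{
    FmgrInfo flinfo = { fn_oid };
    Datum private = 0;
    bool hooked = needs_function_call_hook(fn_oid);
    int label_during_call;

    if (hooked)
    {
        function_call_hook(fn_oid, FCET_PREPARE, &flinfo, &private);
        function_call_hook(fn_oid, FCET_START, &flinfo, &private);
    }
    label_during_call = sketch_active_label;
    if (hooked)
        function_call_hook(fn_oid, FCET_END, &flinfo, &private);
    return label_during_call;
}
```

Keeping the "should this function be hooked at all" question in its own boolean hook means unhooked functions never pay for the event dispatch.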
Re: [HACKERS] SQL/MED estimated time of arrival?
Some random comments on the patch:

ReleaseConnection is a very generic name for a global function; it would be good to prefix it with pgsqlfdw or something. Same with any other globally visible functions.

Please use the built-in contain_mutable_functions(Node *) instead of the custom is_immutable_func(). Or at least func_volatile(Oid).

Is it really a good idea to allow LOCK TABLE on foreign tables in its current form? It only locks the local foreign table object, not the table in the remote server.

Sorry if this was fiercely discussed already, but I don't think the file FDW belongs in core. I'd rather see it as a contrib module.

I would've expected the contrib install script to create the foreign data wrapper for me. While you can specify options to a foreign data wrapper, CREATE FOREIGN DATA WRAPPER seems similar to CREATE LANGUAGE, i.e. something that happens when the foreign data wrapper library is installed.

How do you specify a foreign table that has a different name in the remote server? For example, if I wanted to create a foreign table called foo that fetched rows from a remote table called bar?

I would really like to see the SQL query that's shipped to the remote host in EXPLAIN. That's essential information for analyzing a query that involves a foreign table.

What about transactions? Does the SQL/MED standard have something to say about that?

In general, I'm surprised that there's no hook at all into the planning phase. You have this TODO comment in postgresql_fdw:

    /*
     * TODO: omit (deparse to NULL) columns which are not used in the
     * original SQL.
     *
     * We must parse nodes parents of this ForeignScan node to determine unused
     * columns because some columns may be used only in parent Sort/Agg/Limit
     * nodes.
     */

Parsing the parents of the ForeignScan node seems like a backwards way of solving the problem. The planner should tell the FDW what columns it needs. And there should be some way for the FDW to tell the planner which quals it can handle, so that the executor doesn't need to recheck them. You could make the planner interface infinitely complicated, but that's no excuse for doing nothing at all. The interface needs some thought...

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Fate of the old cvs repository
On Sun, Oct 17, 2010 at 19:15, David Fetter da...@fetter.org wrote: On Sun, Oct 17, 2010 at 05:54:04PM +0200, Magnus Hagander wrote: So, it seems we're pretty firmly on git now, and I doubt we're ever going to shift back now :) That means I'd like to get the two CVS VMs shut down (that's cvs.postgresql.org and anoncvs.postgresql.org), so we don't have to attempt to maintain them... What should we do with the official old cvs repository when we do this? Just create a .tar.gz and drop it on the ftp site? (I assume most committers already have such a copy of the repository, but there should probably be an official one for the project?) Anything else? +1 for dropping a tarball on the FTP mirrors. That way it's distributed and hard to lose. :) It's now done. It will show up in /pub/dev/archive/ on the ftp mirrors as soon as they've replicated. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Isn't HANDLE 64 bits on Win64?
Dave Page dp...@pgadmin.org writes: On Tue, Nov 16, 2010 at 10:01 AM, Magnus Hagander mag...@hagander.net wrote: On Tue, Nov 16, 2010 at 01:35, Tom Lane t...@sss.pgh.pa.us wrote: BTW, it seems like it'd be a good thing if we had a Win64 machine in the buildfarm. Yes. I actually thought we had one. Dave, weren't you going to set one up? I was, but I saw one there so didn't bother (hamerkop). Windows buildfarm critters can take a surprising amount of herding... hamerkop seems to have gone AWOL around the time of the git conversion. regards, tom lane
Re: [HACKERS] Improving prep_buildtree used in VPATH builds
Excerpts from Greg Smith's message of Fri Nov 19 01:52:34 -0300 2010: I'd think that if configure takes longer than it has to because the system is heavily loaded, the amount by which compilation time suffers from that would always dwarf this component of total build time. But if this was slow enough at some point to motivate you to write a patch for it, maybe that assumption is wrong. What if instead of -depth you do something like find the_args | sort -r ? If you find a way to filter out the parents that you know have already been created, you could also cut down on the number of mkdir -p calls, which could result in a larger speedup. And maybe we should remove the test -d. Also, the `expr` call could be substituted by ${item##$sourcedir}, which is supposed to be a POSIX shell feature according to http://www.unix.org/whitepapers/shdiffs.html and http://www.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html In short, there are plenty of optimization opportunities for this script without having to involve nonstandard constructs. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Changes to Linux OOM killer in 2.6.36
Greg Smith wrote: oom_adj is deprecated, scheduled for removal in August 2010: That surprised me so I checked the URL. I believe you have a typo there and it's August, 2012. -Kevin
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Thu, Nov 18, 2010 at 11:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I'm all in favor of having some memory ordering primitives so that we can try to implement better algorithms, but if we use it here it amounts to a fairly significant escalation in the minimum requirements to compile PG (which is bad) rather than just a performance optimization (which is good). I don't believe there would be any escalation in compilation requirements: we already have the ability to invoke stronger primitives than these. What is needed is research to find out what the primitives are called, on platforms where we aren't relying on direct asm access. I don't believe that's correct, although it's possible that I may be missing something. On any platform where we have TAS(), that should be sufficient to set the flag, but how will we read the flag? A simple fetch isn't guaranteed to be sufficient; for some architectures, you might need to insert a read fence, and I don't think we have anything like that defined right now. We've got special cases in s_lock.h for all kinds of crazy architectures, and it's not obvious what would be needed. For example, some operating system I've never heard of called SINIX has this:

    #include "abi_mutex.h"
    typedef abilock_t slock_t;
    #define TAS(lock) (!acquire_lock(lock))
    #define S_UNLOCK(lock) release_lock(lock)
    #define S_INIT_LOCK(lock) init_lock(lock)
    #define S_LOCK_FREE(lock) (stat_lock(lock) == UNLOCKED)

It's far from obvious to me how to make this do what we need - I have a sneaking suspicion it can't be done with those primitives at all - and I bet neither of us has a machine on which it can be tested. Now maybe we no longer care about supporting SINIX anyway, but the point is that if we make this change, every platform for which we don't have working TAS and read-fence operations becomes an unsupported platform. Forget about --disable-spinlocks; there is no such thing.
That strikes me as an utterly unacceptable amount of collateral damage to avoid a basically harmless API change, not to mention a ton of work. You might be able to convince me that it's no longer important to support platforms without a working spinlock implementation (although I think it's rather nice that we can - might encourage someone to try out PG and then contribute an implementation for their favorite platform) but this is also going to break platforms that nominally have TAS now (some of the TAS implementations aren't really TAS, as in the above case, and we may not be able to easily determine what's required for a read-fence even where TAS is a real TAS). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
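To make the "set with TAS, but how do we read the flag?" question concrete, here is a minimal sketch using C11 <stdatomic.h>. That header postdates this thread and is not what PostgreSQL used at the time, so treat it purely as an illustration. SketchLatch, sketch_shared_work and the sketch_ functions are hypothetical names; the setter uses an atomic exchange in the role of TAS, and the reader pairs a plain load with an acquire ("read") fence.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical latch: one flag, with the "work" it announces kept elsewhere. */
typedef struct SketchLatch { atomic_bool is_set; } SketchLatch;

static atomic_int sketch_shared_work;   /* data published before the flag is set */

static void sketch_init_latch(SketchLatch *latch)
{
    assert(latch != NULL);
    atomic_init(&latch->is_set, false);
}

/* Setter side: store the work, then set the flag with a TAS-like exchange.
 * The exchange is sequentially consistent, so it also orders the earlier store. */
static void sketch_set_latch(SketchLatch *latch, int work)
{
    atomic_store_explicit(&sketch_shared_work, work, memory_order_relaxed);
    (void) atomic_exchange(&latch->is_set, true);
}

/* Reader side: a plain fetch of the flag followed by a read fence, so that
 * reads issued after this function returns true cannot be satisfied with
 * values from before the flag was set. */
static bool sketch_latch_is_set(SketchLatch *latch)
{
    bool set = atomic_load_explicit(&latch->is_set, memory_order_relaxed);

    atomic_thread_fence(memory_order_acquire);
    return set;
}
```

The sketch deliberately leaves out resetting the latch, which needs its own ordering argument and is not addressed here.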
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Fri, Nov 19, 2010 at 3:07 AM, Andres Freund and...@anarazel.de wrote: So the complicated case seems to be !defined(HAS_TEST_AND_SET) which uses spinlocks for that purpose - no idea where that is true these days. Me neither, which is exactly the problem. Under Tom's proposal, any architecture we don't explicitly provide for, breaks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] libpq changes for synchronous replication
On Thu, Nov 18, 2010 at 7:43 AM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Nov 16, 2010 at 10:49 AM, Robert Haas robertmh...@gmail.com wrote: Just in a quick scan, I don't have any objection to v2 except that the protocol documentation is lacking. OK, I'll mark it Waiting on Author pending that issue. The patch is touching protocol.sgml as follows. Isn't this enough? How about some updates to the Message Flow section, especially the section on COPY Operations? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Friday 19 November 2010 15:16:24 Robert Haas wrote: On Fri, Nov 19, 2010 at 3:07 AM, Andres Freund and...@anarazel.de wrote: So the complicated case seems to be !defined(HAS_TEST_AND_SET) which uses spinlocks for that purpose - no idea where that is true these days. Me neither, which is exactly the problem. Under Tom's proposal, any architecture we don't explicitly provide for, breaks. I doubt it's that much of a problem as !defined(HAS_TEST_AND_SET) will be so slow that there would be noise from that side more often... Besides, we can just jump into the kernel and back in that case (which the TAS implementation already does), that does more than just a fence... Andres
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Fri, Nov 19, 2010 at 9:27 AM, Aidan Van Dyk ai...@highrise.ca wrote: On Fri, Nov 19, 2010 at 9:16 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Nov 19, 2010 at 3:07 AM, Andres Freund and...@anarazel.de wrote: So the complicated case seems to be !defined(HAS_TEST_AND_SET) which uses spinlocks for that purpose - no idea where that is true these days. Me neither, which is exactly the problem. Under Tom's proposal, any architecture we don't explicitly provide for, breaks. Just a small point of clarification - you need to have both that unknown architecture, and that architecture has to have postgres processes running simultaneously on different CPUs with different caches that are incoherent to have those problems. Sure you do. But so what? Are you going to compile PostgreSQL and implement TAS as a simple store and read-fence as a simple load? How likely is that to work out well? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Fri, Nov 19, 2010 at 9:16 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Nov 19, 2010 at 3:07 AM, Andres Freund and...@anarazel.de wrote: So the complicated case seems to be !defined(HAS_TEST_AND_SET) which uses spinlocks for that purpose - no idea where that is true these days. Me neither, which is exactly the problem. Under Tom's proposal, any architecture we don't explicitly provide for, breaks. Just a small point of clarification - you need to have both that unknown architecture, and that architecture has to have postgres processes running simultaneously on different CPUs with different caches that are incoherent to have those problems. a. -- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Friday 19 November 2010 15:29:10 Andres Freund wrote: Besides, we can just jump into the kernel and back in that case (which the TAS implementation already does), that does more than just a fence... Or if you don't believe that is enough, initialize a lock on the stack, lock it and forget it... Andres
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Fri, Nov 19, 2010 at 9:29 AM, Andres Freund and...@anarazel.de wrote: On Friday 19 November 2010 15:16:24 Robert Haas wrote: On Fri, Nov 19, 2010 at 3:07 AM, Andres Freund and...@anarazel.de wrote: So the complicated case seems to be !defined(HAS_TEST_AND_SET) which uses spinlocks for that purpose - no idea where that is true these days. Me neither, which is exactly the problem. Under Tom's proposal, any architecture we don't explicitly provide for, breaks. I doubt it's that much of a problem as !defined(HAS_TEST_AND_SET) will be so slow that there would be noise from that side more often... Besides, we can just jump into the kernel and back in that case (which the TAS implementation already does), that does more than just a fence... Eh, really? If there's a workaround for platforms for which we don't know what the appropriate read-fencing incantation is, then I'd feel more comfortable about doing this. But I don't see how to make that work. The whole problem here is that the API is designed in such a way that the signal handler might be invoked when the lock that it needs to grab is already held by the same process. The reason memory barriers solve the problem is because they'll be atomically released when we jump into the signal handler, but that is not true of a spin-lock or a semaphore. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
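The hazard described above, a signal handler needing a lock that the interrupted code in the same process already holds, can be demonstrated with a small standalone sketch. This is not PostgreSQL code: sketch_lock, sketch_handler and sketch_signal_while_locked are hypothetical names, SIGINT is chosen only because it is available in ISO C, and the handler makes a single test-and-set attempt instead of spinning so the demonstration terminates.

```c
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>

static atomic_flag sketch_lock = ATOMIC_FLAG_INIT;      /* toy spinlock */
static volatile sig_atomic_t sketch_handler_blocked;    /* set if the handler lost */

/* One TAS attempt from inside the handler. A real spin loop here would never
 * finish, because the code that holds the lock is the code we interrupted. */
static void sketch_handler(int signo)
{
    (void) signo;
    if (atomic_flag_test_and_set(&sketch_lock))
        sketch_handler_blocked = 1;         /* lock already held by this process */
    else
        atomic_flag_clear(&sketch_lock);    /* got it; release again */
}

/* Takes the lock, delivers a signal to itself, and reports what the handler saw. */
static int sketch_signal_while_locked(void)
{
    void (*previous)(int) = signal(SIGINT, sketch_handler);

    assert(previous != SIG_ERR);
    sketch_handler_blocked = 0;

    while (atomic_flag_test_and_set(&sketch_lock))
        ;                                   /* acquire the toy spinlock */
    raise(SIGINT);                          /* handler runs before raise() returns */
    atomic_flag_clear(&sketch_lock);        /* release */

    signal(SIGINT, previous);
    return sketch_handler_blocked;
}
```

A memory barrier has no owner and nothing to wait for, which is why the barrier-based approach does not run into this self-deadlock.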
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Fri, Nov 19, 2010 at 9:35 AM, Aidan Van Dyk ai...@highrise.ca wrote: On Fri, Nov 19, 2010 at 9:31 AM, Robert Haas robertmh...@gmail.com wrote: Just a small point of clarification - you need to have both that unknown architecture, and that architecture has to have postgres processes running simultaneously on different CPUs with different caches that are incoherent to have those problems. Sure you do. But so what? Are you going to compile PostgreSQL and implement TAS as a simple store and read-fence as a simple load? How likely is that to work out well? If I was trying to port PostgreSQL to some strange architecture, and my strange architecture didn't have all the normal TAS and memory barrier stuff because it was only a UP system with no cache, then yes, and it would work out well ;-) I get your point, but obviously this case isn't very interesting or likely in 2010. If it was some strange SMP architecture, I wouldn't expect *anything* to work out well if the architecture doesn't have some sort of TAS/memory barrier/cache-coherency stuff in it ;-) Well, you'd be pleasantly surprised to find that you could at least get it to compile using --disable-spinlocks. Yeah, the performance would probably be lousy and you might run out of semaphores, but at least for basic stuff it would run. Ripping that out just to avoid an API change in code we committed two months ago seems a bit extreme, especially since it's also going to require implementing a read-fence operation on every platform we want to continue supporting. Maybe you could default the read-fence to just a simple read for platforms that are not known to have an issue, but all the platforms where TAS is calling some OS-provided routine that does mysterious magic under the covers are going to need attention; and I just don't think that cleaning up everything that's going to break is a very worthwhile investment of our limited development resources, even if it doesn't result in needlessly dropping platform support.
If we're going to work on memory primitives, I would much rather see us put that effort into, say, implementing more efficient LWLock algorithms to solve the bottlenecks that the MOSBENCH guys found, rather than spending it on trying to avoid a minor API complication for the latch facility. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Friday 19 November 2010 15:38:37 Robert Haas wrote: Eh, really? If there's a workaround for platforms for which we don't know what the appropriate read-fencing incantation is, then I'd feel more comfortable about doing this. But I don't see how to make that work. The whole problem here is that the API is designed in such a way that the signal handler might be invoked when the lock that it needs to grab is already held by the same process. The reason memory barriers solve the problem is because they'll be atomically released when we jump into the signal handler, but that is not true of a spin-lock or a semaphore. Well, it's not generally true - you are right there. But there is a wide range of syscalls available where it's inherently true (which is what I sloppily referred to). And you are allowed to call a set of system calls, although quite a restricted one, even in signal handlers. I don't have the list for older POSIX versions in mind, but for 2003 you can choose from several, like write, lseek, setpgid, which inherently have to serialize. And I am quite sure there were sensible calls for earlier versions. Andres
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Friday 19 November 2010 15:14:58 Robert Haas wrote: On Thu, Nov 18, 2010 at 11:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I'm all in favor of having some memory ordering primitives so that we can try to implement better algorithms, but if we use it here it amounts to a fairly significant escalation in the minimum requirements to compile PG (which is bad) rather than just a performance optimization (which is good). I don't believe there would be any escalation in compilation requirements: we already have the ability to invoke stronger primitives than these. What is needed is research to find out what the primitives are called, on platforms where we aren't relying on direct asm access. I don't believe that's correct, although it's possible that I may be missing something. On any platform where we have TAS(), that should be sufficient to set the flag, but how will we read the flag? A simple fetch isn't guaranteed to be sufficient; for some architectures, you might need to insert a read fence, and I don't think we have anything like that defined right now. A TAS is both a read and write fence. After that you don't *need* to fetch it. And even if it were only a write fence on some platforms - if we consistently issue write fences at the relevant places that ought to be enough. Andres
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Friday 19 November 2010 15:49:45 Robert Haas wrote: If we're going to work on memory primitives, I would much rather see us put that effort into, say, implementing more efficient LWLock algorithms to solve the bottlenecks that the MOSBENCH guys found, rather than spending it on trying to avoid a minor API complication for the latch facility. But for that you will need more infrastructure in that area anyway. Andres
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Fri, Nov 19, 2010 at 9:51 AM, Andres Freund and...@anarazel.de wrote: On Friday 19 November 2010 15:49:45 Robert Haas wrote: If we're going to work on memory primitives, I would much rather see us put that effort into, say, implementing more efficient LWLock algorithms to solve the bottlenecks that the MOSBENCH guys found, rather than spending it on trying to avoid a minor API complication for the latch facility. But for that you will need more infrastructure in that area anyway. True, but you don't have to do it all at once. You can continue to do the same old stuff on the platforms you currently support, and use the newer stuff on platforms where the right thing to do is readily apparent, like x86 and x86_64. And people can add support for their favorite platforms gradually over time, rather than having a flag day where we stop supporting everything we don't know what to do with. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Label switcher function
2010/11/19 KaiGai Kohei kai...@ak.jp.nec.com: Indeed, the comment in the middle of fmgr_info_cxt_security() and the one just above the definition of fmgr_security_definer() are not correct. Did you notice anything else? I think I noticed a couple of places, but I didn't write down exactly which ones. Sorry. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Fri, Nov 19, 2010 at 9:49 AM, Andres Freund and...@anarazel.de wrote: Well, it's not generally true - you are right there. But there is a wide range of syscalls available where it's inherently true (which is what I sloppily referred to). And you are allowed to call a set of system calls, although quite a restricted one, even in signal handlers. I don't have the list for older POSIX versions in mind, but for 2003 you can choose from several, like write, lseek, setpgid, which inherently have to serialize. And I am quite sure there were sensible calls for earlier versions. Well, it's not quite enough just to call into the kernel to serialize on some point of memory, because your point is to make sure that *this particular piece of memory* is coherent. It doesn't matter if the kernel has proper fencing in its stuff if the memory it's guarding is in another cacheline, because that won't *necessarily* force cache coherency in your local lock/variable memory. -- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
Robert Haas robertmh...@gmail.com writes: If we're going to work on memory primitives, I would much rather see us put that effort into, say, implementing more efficient LWLock algorithms to solve the bottlenecks that the MOSBENCH guys found, rather than spending it on trying to avoid a minor API complication for the latch facility. I haven't read all of this very long thread yet, but I will point out that you seem to be arguing from the position that memory ordering primitives will only be useful for the latch code. This is nonsense of the first order. We already know that the sinval signalling mechanism could use it to avoid needing a spinlock. I submit that it's very likely that fixing communication bottlenecks elsewhere will similarly require memory ordering primitives if we are to avoid the stupid "use a lock" approach. I think it's time to build that infrastructure. BTW, I agree with Andres' point that we can probably default memory barriers to be no-ops on unknown platforms. Weak memory ordering isn't a common architectural choice. A look through s_lock.h suggests that PPC and MIPS are the only supported arches that need to worry about this. regards, tom lane
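The "default to no-ops on unknown platforms" idea could look roughly like the following sketch. It is an assumption-laden illustration rather than anything from s_lock.h: the sketch_ macro and function names are hypothetical, the weakly ordered branch relies on the GCC-style __sync_synchronize() builtin, and the fallback on GCC-compatible compilers still emits a compiler-only barrier so the compiler does not reorder the accesses itself.

```c
#include <assert.h>
#include <stddef.h>

/* Weakly ordered architectures named in the thread get a real hardware fence;
 * everything else gets at most a compiler barrier (no instruction emitted). */
#if defined(__GNUC__) && (defined(__powerpc__) || defined(__powerpc64__) || defined(__mips__))
#define sketch_read_barrier()   __sync_synchronize()
#define sketch_write_barrier()  __sync_synchronize()
#elif defined(__GNUC__)
#define sketch_read_barrier()   __asm__ __volatile__("" : : : "memory")
#define sketch_write_barrier()  __asm__ __volatile__("" : : : "memory")
#else
#define sketch_read_barrier()   ((void) 0)  /* unknown compiler: plain no-op */
#define sketch_write_barrier()  ((void) 0)
#endif

static volatile int sketch_payload;     /* data being handed over */
static volatile int sketch_ready;       /* flag announcing the data */

/* Writer: store the payload, fence, then raise the flag. */
static void sketch_publish(int value)
{
    sketch_payload = value;
    sketch_write_barrier();
    sketch_ready = 1;
}

/* Reader: check the flag, fence, then read the payload. Returns 1 on success. */
static int sketch_consume(int *value_out)
{
    assert(value_out != NULL);
    if (!sketch_ready)
        return 0;
    sketch_read_barrier();
    *value_out = sketch_payload;
    return 1;
}
```

The no-op default only covers the read-after-read and write-after-write ordering this pattern needs; a full barrier that also orders a store before a later load would need a real instruction even on strongly ordered hardware such as x86.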
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Fri, Nov 19, 2010 at 10:01 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: If we're going to work on memory primitives, I would much rather see us put that effort into, say, implementing more efficient LWLock algorithms to solve the bottlenecks that the MOSBENCH guys found, rather than spending it on trying to avoid a minor API complication for the latch facility. I haven't read all of this very long thread yet, but I will point out that you seem to be arguing from the position that memory ordering primitives will only be useful for the latch code. This is nonsense of the first order. We already know that the sinval signalling mechanism could use it to avoid needing a spinlock. I submit that it's very likely that fixing communication bottlenecks elsewhere will similarly require memory ordering primitives if we are to avoid the stupid use a lock approach. I think it's time to build that infrastructure. I completely agree, but I'm not too sure I want to drop support for any platform for which we haven't yet implemented such primitives. What's different about this case is that fall back to taking the spin lock is not a workable option. BTW, I agree with Andres' point that we can probably default memory barriers to be no-ops on unknown platforms. Weak memory ordering isn't a common architectural choice. A look through s_lock.h suggests that PPC and MIPS are the only supported arches that need to worry about this. That's good to hear. I'm more worried, however, about architectures where we supposedly have TAS but it isn't really TAS but some OS-provided acquire a lock primitive. That won't generalize nicely to what we need for this case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Fri, Nov 19, 2010 at 9:31 AM, Robert Haas robertmh...@gmail.com wrote:
>> Just a small point of clarification - you need to have both that unknown architecture, and that architecture has to have postgres processes running simultaneously on different CPUs with different caches that are incoherent to have those problems.
> Sure you do. But so what? Are you going to compile PostgreSQL and implement TAS as a simple store and read-fence as a simple load? How likely is that to work out well?

If I was trying to port PostgreSQL to some strange architecture, and my strange architecture didn't have all the normal TAS and memory barrier stuff because it was only a UP system with no cache, then yes, and it would work out well ;-) If it was some strange SMP architecture, I wouldn't expect *anything* to work out well if the architecture doesn't have some sort of TAS/memory barrier/cache-coherency stuff in it ;-)

a.

-- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Friday 19 November 2010 15:58:39 Aidan Van Dyk wrote: On Fri, Nov 19, 2010 at 9:49 AM, Andres Freund and...@anarazel.de wrote: Well, its not generally true - you are right there. But there is a wide range for syscalls available where its inherently true (which is what I sloppily referred to). And you are allowed to call a, although quite restricted, set of system calls even in signal handlers. I don't have the list for older posix versions in mind, but for 2003 you can choose something from several like write, lseek,setpgid which inherently have to serialize. And I am quite sure there were sensible calls for earlier versions. Well, it's not quite enough just to call into the kernel to serialize on some point of memory, because your point is to make sure that *this particular piece of memory* is coherent. It doesn't matter if the kernel has proper fencing in it's stuff if the memory it's guarding is in another cacheline, because that won't *necessarily* force cache coherency in your local lock/variable memory. Yes and no. It provides the same guarantees as our current approach of using spinlocks for exactly that - that it theoretically is not enough is an independent issue (but *definitely* an issue). Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On 11/19/2010 03:58 PM, Aidan Van Dyk wrote: Well, it's not quite enough just to call into the kernel to serialize on some point of memory, because your point is to make sure that *this particular piece of memory* is coherent. Well, that certainly doesn't apply to full fences, that are not specific to a particular piece of memory. I'm thinking of 'mfence' on x86_64 or 'mf' on ia64. Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] duplicate connection failure messages
Excerpts from Bruce Momjian's message of Fri Nov 19 00:17:59 -0300 2010:
> Alvaro Herrera wrote:
>> Excerpts from Bruce Momjian's message of Wed Nov 17 13:04:46 -0300 2010:
>>> OK, I doubt we want to add complexity to improve this, so I see our options as:
>>>   o ignore the problem
>>>   o display IPv4/IPv6 labels
>>>   o display only an IPv6 label
>>>   o something else
>> I think we should use inet_ntop where available to print the address.
> Good idea because inet_ntop() is thread-safe. Does that work on IPv6? You indicated that inet_ntoa() does not.

According to opengroup.org, IPv6 should work if the underlying libraries support it, whereas inet_ntoa explicitly does not.
http://www.opengroup.org/onlinepubs/009695399/functions/inet_ntop.html
http://www.opengroup.org/onlinepubs/009695399/functions/inet_addr.html

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
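[Editorial sketch] The point about inet_ntop() handling both address families can be illustrated with a small C helper. This is only a sketch under stated assumptions: format_sockaddr is a hypothetical name, not a libpq function, and it uses only the POSIX inet_ntop() call discussed in the message.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper (not libpq code): format the address stored in a
 * generic sockaddr into buf.  Returns buf on success, or NULL on error
 * or for an address family this sketch does not handle.
 */
const char *
format_sockaddr(const struct sockaddr *sa, char *buf, size_t buflen)
{
    assert(sa != NULL && buf != NULL);

    switch (sa->sa_family)
    {
        case AF_INET:
            /* IPv4: the caller's buffer should hold INET_ADDRSTRLEN bytes */
            return inet_ntop(AF_INET,
                             &((const struct sockaddr_in *) sa)->sin_addr,
                             buf, (socklen_t) buflen);
        case AF_INET6:
            /* IPv6: the caller's buffer should hold INET6_ADDRSTRLEN bytes */
            return inet_ntop(AF_INET6,
                             &((const struct sockaddr_in6 *) sa)->sin6_addr,
                             buf, (socklen_t) buflen);
        default:
            return NULL;
    }
}
```

Unlike inet_ntoa(), which returns a pointer to a static buffer and accepts only IPv4 addresses, this writes into a caller-supplied buffer, which is why the thread-safety remark above applies.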
Re: [HACKERS] libpq changes for synchronous replication
Robert Haas robertmh...@gmail.com writes: On Thu, Nov 18, 2010 at 7:43 AM, Fujii Masao masao.fu...@gmail.com wrote: The patch is touching protocol.sgml as follows. Isn't this enough? How about some updates to the Message Flow section, especially the section on COPY Operations? Yeah. You're adding a new fundamental state to the protocol; it's not enough to bury that in the description of a message format. I don't think a whole lot of new verbiage is needed, but the COPY section needs to point out that this is a different state that allows both send and receive, and explain what the conditions are for getting into and out of that state. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
Robert Haas robertmh...@gmail.com writes: ... The reason memory barriers solve the problem is because they'll be atomically released when we jump into the signal handler, but that is not true of a spin-lock or a semaphore. Hm, I wonder whether your concern is stemming from a wrong mental model. There is nothing to release. In my view, a memory barrier primitive is a sequence point, having the properties that all writes issued before the barrier shall become visible to other processors before any writes after it, and also that no reads issued after the barrier shall be executed until those writes have become visible. (PPC can separate those two aspects, but I think we probably don't need to get that detailed for our purposes.) On most processors, the barrier primitive will just be ((void) 0) because they don't deal in out-of-order writes anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] What do these terms mean in the SOURCE CODE?
I am going through the Executor code and come across the following terms quite often. Can someone tell me what they mean, in a few (maybe a couple of) sentences?
1. Scan State
2. Plan State
3. Tuple Projection
4. EState
5. Qual
6. Expression
They sound quite ambiguous in the source code, especially when some of them are terms that already have multiple meanings. Thanks for your time.
-Vaibhav (*_*)
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
Robert Haas robertmh...@gmail.com writes: I completely agree, but I'm not too sure I want to drop support for any platform for which we haven't yet implemented such primitives. What's different about this case is that fall back to taking the spin lock is not a workable option. The point I was trying to make is that the fallback position can reasonably be a no-op. That's good to hear. I'm more worried, however, about architectures where we supposedly have TAS but it isn't really TAS but some OS-provided acquire a lock primitive. That won't generalize nicely to what we need for this case. I did say we need some research ;-). We need to look into what's the appropriate primitive for any such OSes that are available for PPC or MIPS. I don't feel a need to be paranoid about it for other architectures. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
Markus Wanner mar...@bluegap.ch writes: Well, that certainly doesn't apply to full fences, that are not specific to a particular piece of memory. I'm thinking of 'mfence' on x86_64 or 'mf' on ia64. Hm, what do those do exactly? We've never had any such thing in the Intel-ish spinlock asm, but if out-of-order writes are possible I should think we'd need 'em. Or does lock xchgb imply an mfence? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] final patch - plpgsql: for-in-array
I checked my tests, and the most important factor is removing a repeated detoast.

postgres=# CREATE OR REPLACE FUNCTION public.filter01(text[], text, integer)
RETURNS text[]
LANGUAGE plpgsql
AS $function$
DECLARE
  s text[] := '{}';
  l int := 0;
  i int;
  v text;
  loc text[] = $1;
BEGIN
  FOR i IN array_lower(loc,1)..array_upper(loc,1)
  LOOP
    EXIT WHEN l = $3;
    IF loc[i] LIKE $2 THEN
      s := s || loc[i];
      l := l + 1;
    END IF;
  END LOOP;
  RETURN s;
END;$function$;

This code is very slow when the array is large - tested on n=1000. With one small modification it can be 20x faster:

DECLARE
  s text[] := '{}';
  l int := 0;
  i int;
  v text;
  loc text[] = $1 || '{}'::text[]; -- does just detoast and decompression
BEGIN

the final version of test can be:

so result: Don't access a large unmodified array inside a loop when the data comes from a table (for iteration over A[1000] of text(10)). The speedup is from 451 sec to 15 sec. This rule may be interesting for PostGIS people, because it can be valid for other long varlena values. But this is still 2x slower than the special statement.

Regards
Pavel Stehule

samples  %        symbol name
332      22.1333  exec_eval_expr
311      20.7333  plpgsql_param_fetch
267      17.8000  exec_eval_datum
220      14.6667  exec_stmts
91       6.0667   setup_param_list
82       5.4667   exec_eval_cleanup.clone.10
71       4.7333   __i686.get_pc_thunk.bx
48       3.2000   exec_simple_cast_value
43       2.8667   exec_eval_boolean

samples  %        symbol name
4636     37.5994  array_seek.clone.0
961      7.7940   pglz_decompress
901      7.3074   list_member_ptr
443      3.5929   MemoryContextAllocZero
384      3.1144   AllocSetAlloc
381      3.0900   ExecEvalParamExtern
334      2.7088   GetSnapshotData
255      2.0681   AllocSetFree
254      2.0600   LWLockRelease
249      2.0195   ExecMakeFunctionResultNoSets
249      2.0195   UTF8_MatchText
234      1.8978   LWLockAcquire
195      1.5815   AllocSetReset
167      1.3544   AllocSetCheck
163      1.3220   pfree
151      1.2247   ExecEvalArrayRef
149      1.2084   RevalidateCachedPlan
138      1.1192   bms_is_member
126      1.0219   CopySnapshot
Re: [HACKERS] duplicate connection failure messages
Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Bruce Momjian's message of vie nov 19 00:17:59 -0300 2010: Alvaro Herrera wrote: I think we should use inet_ntop where available to print the address. Good idea because inet_ntop() is thread-safe. Does that work on IPv6? You indicated that inet_ntoa() does not. According to opengroup.org, IPv6 should work if the underlying libraries support it, whereas inet_ntoa explicitely does not. http://www.opengroup.org/onlinepubs/009695399/functions/inet_ntop.html http://www.opengroup.org/onlinepubs/009695399/functions/inet_addr.html I get the impression that you guys have forgotten the existence of src/backend/utils/adt/inet_net_ntop.c regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On 11/19/2010 04:51 PM, Tom Lane wrote:
> Hm, what do those do exactly?

"Performs a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior the MFENCE instruction." [1] Given the memory ordering guarantees of x86, this instruction might only be relevant for SMP systems, though.

> Or does lock xchgb imply an mfence?

Probably on older architectures (given the name "bus locked exchange"), but OTOH I wouldn't bet on that still being true. Locking the entire bus sounds like a prohibitively expensive operation with today's amounts of cores per system.

Regards
Markus Wanner

[1]: random google hit on 'mfence': http://siyobik.info/index.php?module=x86id=170
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Friday 19 November 2010 16:51:00 Tom Lane wrote:
> Markus Wanner mar...@bluegap.ch writes:
>> Well, that certainly doesn't apply to full fences, that are not specific to a particular piece of memory. I'm thinking of 'mfence' on x86_64 or 'mf' on ia64.
> Hm, what do those do exactly? We've never had any such thing in the Intel-ish spinlock asm, but if out-of-order writes are possible I should think we'd need 'em. Or does lock xchgb imply an mfence?

Out of order writes are definitely possible if you consider multiple processors. Locked statements like 'lock xaddl;' guarantee that the specific operands (or their cachelines) are visible on all processors and are done atomically - but it's not influencing the whole cache like mfence would.

Andres
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
Andres Freund and...@anarazel.de writes:
> Locked statements like 'lock xaddl;' guarantee that the specific operands (or their cachelines) are visible on all processors and are done atomically - but it's not influencing the whole cache like mfence would.

Where is this "locking the whole cache" meme coming from? What we're looking for has nothing to do with locking anything. It's primarily a directive to the processor to flush any dirty cache lines out to main memory. It's not going to block any other processors.

regards, tom lane
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Friday 19 November 2010 17:25:57 Tom Lane wrote:
> Andres Freund and...@anarazel.de writes:
>> Locked statements like 'lock xaddl;' guarantee that the specific operands (or their cachelines) are visible on all processors and are done atomically - but it's not influencing the whole cache like mfence would.
> Where is this "locking the whole cache" meme coming from? What we're looking for has nothing to do with locking anything. It's primarily a directive to the processor to flush any dirty cache lines out to main memory. It's not going to block any other processors.

I was never talking about 'locking the whole cache' - I was talking about flushing/fencing it like a global read/write barrier would. And lock xchgb/xaddl does not imply anything for other cachelines but its own. I only used 'locked' in the context of 'lock xaddl'. Am I misunderstanding you?

Andres
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
Andres Freund and...@anarazel.de writes: I was never talking about 'locking the whole cache' - I was talking about flushing/fencing it like a global read/write barrier would. And lock xchgb/xaddl does not imply anything for other cachelines but its own. If that's the case, why aren't the parallel regression tests falling over constantly? My recollection is that when I broke the sinval code by assuming strong memory ordering without spinlocks, it didn't take long at all for the PPC buildfarm members to expose the problem. If it's possible for Intel-ish processors to exhibit weak memory ordering behavior, I'm quite sure that our current code would be showing bugs everywhere. The impression I had of current Intel designs is that they ensure global cache coherency, ie if one processor has a dirty cache line the others know that, and will go get the updated data before attempting to access that piece of memory. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq changes for synchronous replication
Excerpts from Tom Lane's message of vie nov 19 12:25:13 -0300 2010: Robert Haas robertmh...@gmail.com writes: On Thu, Nov 18, 2010 at 7:43 AM, Fujii Masao masao.fu...@gmail.com wrote: The patch is touching protocol.sgml as follows. Isn't this enough? How about some updates to the Message Flow section, especially the section on COPY Operations? Yeah. You're adding a new fundamental state to the protocol; it's not enough to bury that in the description of a message format. I don't think a whole lot of new verbiage is needed, but the COPY section needs to point out that this is a different state that allows both send and receive, and explain what the conditions are for getting into and out of that state. Is it sane that the new message has so specific a name? -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq changes for synchronous replication
Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Tom Lane's message of vie nov 19 12:25:13 -0300 2010: Yeah. You're adding a new fundamental state to the protocol; it's not enough to bury that in the description of a message format. I don't think a whole lot of new verbiage is needed, but the COPY section needs to point out that this is a different state that allows both send and receive, and explain what the conditions are for getting into and out of that state. Is it sane that the new message has so specific a name? Yeah, it might be better to call it something generic like CopyBoth. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Fri, Nov 19, 2010 at 10:44 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> I completely agree, but I'm not too sure I want to drop support for any platform for which we haven't yet implemented such primitives. What's different about this case is that fall back to taking the spin lock is not a workable option.
> The point I was trying to make is that the fallback position can reasonably be a no-op.

Hmm, maybe you're right. I was assuming weak memory ordering was a reasonably common phenomenon, but if it only applies to a very small number of architectures and we're pretty confident we know which ones they are, your approach would be far less frightening than I originally thought. But is that really true?

I think it would be useful to try to build up a library of primitives in this area. For this particular task, we really only need a write-with-fence primitive and a read-with-fence primitive. On strong memory ordering machines, these can just do a store and a read, respectively; on weak memory ordering machines, they can insert whatever fencing operations are needed on either the store side or the load side. I think it would also be useful to provide macros for compare-and-swap and fetch-and-add on platforms where they are available. Then we could potentially write code like this:

#ifdef HAVE_COMPARE_AND_SWAP
...do it the lock-free way...
#else
...oh, well, do it with spinlocks...
#endif

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
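[Editorial sketch] The #ifdef pattern proposed above might look roughly like the following. This is a hedged illustration, not PostgreSQL code: it relies on C11 <stdatomic.h>, which postdates this thread, and claim_slot, MAX_SLOTS, and the way HAVE_COMPARE_AND_SWAP gets defined are all assumptions made for the example. In the fallback branch, atomic_flag stands in for a TAS-style spinlock.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_SLOTS 4             /* hypothetical capacity for the example */

/* Assumption: treat an always-lock-free atomic int as "CAS is available". */
#if ATOMIC_INT_LOCK_FREE == 2
#define HAVE_COMPARE_AND_SWAP 1
#endif

#ifdef HAVE_COMPARE_AND_SWAP

static atomic_int next_slot;    /* zero-initialized: no slots claimed yet */

/* Lock-free way: claim the next free slot index, or return -1 if full. */
int
claim_slot(void)
{
    int old = atomic_load(&next_slot);

    do
    {
        if (old >= MAX_SLOTS)
            return -1;
        /* On failure, old is refreshed with the current value; retry. */
    } while (!atomic_compare_exchange_weak(&next_slot, &old, old + 1));

    assert(old >= 0 && old < MAX_SLOTS);
    return old;
}

#else

static atomic_flag slot_lock = ATOMIC_FLAG_INIT;    /* TAS-style spinlock */
static int  next_slot;

/* Spinlock way: same contract, but serialized by a test-and-set lock. */
int
claim_slot(void)
{
    int result = -1;

    while (atomic_flag_test_and_set(&slot_lock))
        ;                       /* spin until the lock is free */
    if (next_slot < MAX_SLOTS)
        result = next_slot++;
    atomic_flag_clear(&slot_lock);

    assert(result >= -1 && result < MAX_SLOTS);
    return result;
}

#endif
```

Both branches expose the same function, so callers do not need to know which one was compiled in. A bounded counter is used because it needs the compare step; an unbounded one could use fetch-and-add instead.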
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
Robert Haas robertmh...@gmail.com writes: I think it would be useful to try to build up a library of primitives in this area. For this particular task, we really only need a write-with-fence primitive and a read-with-fence primitive. That's really entirely the wrong way to think about it. You need a fence primitive, full stop. It's a sequence point, not an operation in itself. It guarantees that reads/writes occurring before or after it aren't resequenced around it. I don't even understand what write with fence means --- is the write supposed to be fenced against other writes before it, or other writes after it? I think it would also be useful to provide macros for compare-and-swap and fetch-and-add on platforms where they are available. That would be a great deal more work, because it's not a no-op anywhere; and our need for it is still rather hypothetical. I'm surprised to see you advocating that when you didn't want to touch fencing a moment ago. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
* Andres Freund:
> I was never talking about 'locking the whole cache' - I was talking about flushing/fencing it like a global read/write barrier would. And lock xchgb/xaddl does not imply anything for other cachelines but its own.

My understanding is that once you've seen the result of an atomic operation on i386 and amd64, you are guaranteed to observe all prior writes performed by the thread which did the atomic operation, too. Explicit fencing is only necessary if you need synchronization without atomic operations.

--
Florian Weimer fwei...@bfk.de
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
Re: [HACKERS] contrib: auth_delay module
On Fri, Nov 19, 2010 at 04:57:03PM +0900, KaiGai Kohei wrote:
> (2010/11/18 2:17), Robert Haas wrote:
>> If KaiGai updates the code per previous discussion, would you be willing to take a crack at adding documentation? P.S. Your email client seems to be setting the Reply-To address to a ridiculous value.
> OK, I'll revise my patch according to the previous discussion. Please wait for about one week. I have a big event in this weekend.

I'll take a crack at the docs, though I might need hand-holding for the new git stuff (I'll hit the wiki)

Ross
--
Ross Reedstrom, Ph.D. reeds...@rice.edu
Systems Engineer Admin, Research Scientist phone: 713-348-6166
Connexions http://cnx.org fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
I wrote: Markus Wanner mar...@bluegap.ch writes: Well, that certainly doesn't apply to full fences, that are not specific to a particular piece of memory. I'm thinking of 'mfence' on x86_64 or 'mf' on ia64. Hm, what do those do exactly? I poked around in the Intel manuals a bit. They do have mfence (also lfence and sfence) but so far as I can tell, those are only used to manage loads and stores that are issued by special instructions that explicitly mark the operation as weakly ordered. So the reason we're not seeing bugs is presumably that C compilers don't generate such instructions. Also, Intel architectures do guarantee cache consistency across multiple processors (and it costs them a lot...) I found a fairly interesting and detailed paper about memory fencing in the Linux kernel: http://www.rdrop.com/users/paulmck/scalability/paper/ordering.2007.09.19a.pdf regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I think it would be useful to try to build up a library of primitives in this area. For this particular task, we really only need a write-with-fence primitive and a read-with-fence primitive. That's really entirely the wrong way to think about it. You need a fence primitive, full stop. It's a sequence point, not an operation in itself. It guarantees that reads/writes occurring before or after it aren't resequenced around it. I don't even understand what write with fence means --- is the write supposed to be fenced against other writes before it, or other writes after it? I was taking it to mean something similar to the memory guarantees around synchronized blocks in Java. At the start of a synchronized block you discard any cached data which you've previously read from or written to main memory, and must read everything fresh from that point. At the end of a synchronized block you must write any locally written values to main memory, although you retain them in your thread-local cache for possible re-use. Reads or writes from outside the synchronized block can be pulled into the block and reordered in among the reads and writes within the block (which may also be reordered) unless there's another block to contain them. It works fine once you have your head around it, and allows for significant optimization in a heavily multi-threaded application. I have no idea whether such a model would be useful for PostgreSQL. If I understand Tom he is proposing what sounds roughly like what could be achieved in the Java memory model by keeping all code for a process within a single synchronized block, with the fence being a point where you end it (flushing all local writes to main memory) and start a new one (forcing a discard of locally cached data). Of course I'm ignoring the locking aspect of synchronized blocks and just discussing the memory access aspect of them. 
(A synchronized block in Java always references some [any] Object, and causes an exclusive lock to be held on the object from one end of the block to the other.) -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UNNEST ... WITH ORDINALITY (AND POSSIBLY OTHER STUFF)
Note the standard also supports unnesting multiple arrays concurrently; the rule for handling arrays with different lengths is to use null padding of the shorter array.

SELECT * FROM UNNEST( ARRAY[5,2,3,4], ARRAY['hello', 'world'] ) WITH ORDINALITY AS t(a,b,i);

 a    b        i
 ---  -------  --
 5    'hello'  1
 2    'world'  2
 3             3
 4             4
(4 rows)

Implementing this is not just a matter of substituting in the existing unnest(anyarray) function multiple times.

Regards, Caleb

On Nov 19, 2010, at 4:50 AM, pgsql-hackers-ow...@postgresql.org wrote:

From: David Fetter da...@fetter.org
Date: November 18, 2010 11:48:16 PM PST
To: Itagaki Takahiro itagaki.takah...@gmail.com
Cc: PG Hackers pgsql-hackers@postgresql.org
Subject: Re: UNNEST ... WITH ORDINALITY (AND POSSIBLY OTHER STUFF)

On Fri, Nov 19, 2010 at 11:40:05AM +0900, Itagaki Takahiro wrote:
> On Fri, Nov 19, 2010 at 08:33, David Fetter da...@fetter.org wrote:
>> In order to get WITH ORDINALITY, would it be better to change gram.y to account for both WITH ORDINALITY and without, or just for the WITH ORDINALITY case?
> We probably need to change gram.y and make UNNEST to be COL_NAME_KEYWORD. UNNEST (without ORDINALITY) will call the existing unnest() function, and UNNEST() WITH ORDINALITY will call unnest_with_ordinality().

Thanks for sketching that out :)

> BTW, what will we return for arrays with 2 or more dimensions?

At the moment, per the SQL standard, UNNEST without the WITH ORDINALITY clause flattens all dimensions.

SELECT * FROM UNNEST(ARRAY[[1,2],[3,4]]);
 unnest
 1
 2
 3
 4
(4 rows)

Unless we want to do something super wacky and contrary to the SQL standard, UNNEST(array) WITH ORDINALITY should do the same.

> There is no confusion in your two-argument version: UNNEST(anyarray, number_of_dimensions_to_unnest) but we will also support the one-argument version. Array indexes will be composite numbers in those cases. A possible design would be to just return sequential serial numbers of the values -- the following two queries return the same results:
> - SELECT i, v FROM UNNEST($1) WITH ORDINALITY AS t(v, i)
> - SELECT row_number() OVER () AS i, v FROM UNNEST($1) AS t(v)

Yes, that's what the standard says. Possible less-than-total unrolling schemes include:
- Flatten specified number of initial dimensions into one list, e.g. turn UNNEST(array_3d, 2) into SETOF(array_1d) with one column of ORDINALITY
- Flatten similarly, but have an ORDINALITY column for each flattened dimension.
- More exotic schemes, such as UNNEST(array_3d, [1,3]), with either of the two methods above.
And of course the all-important:
- Other possibilities I haven't thought of :)

Cheers, David.
-- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: That's really entirely the wrong way to think about it. You need a fence primitive, full stop. It's a sequence point, not an operation in itself. I was taking it to mean something similar to the memory guarantees around synchronized blocks in Java. At the start of a synchronized block you discard any cached data which you've previously read from or written to main memory, and must read everything fresh from that point. At the end of a synchronized block you must write any locally written values to main memory, although you retain them in your thread-local cache for possible re-use. That is basically the model that we have implemented in the spinlock primitives: taking a spinlock corresponds to starting a synchronized block and releasing the spinlock ends it. On processors that need it, the spinlock macros include memory fence instructions that implement the above semantics. However, for lock-free interactions I think this model isn't terribly helpful: it's not clear what is inside and what is outside the sync block, and forcing your code into that model doesn't improve either clarity or performance. What you typically need is a guarantee about the order in which writes become visible. To give a concrete example, the sinval bug I was mentioning earlier boiled down to assuming that a write into an element of the sinval message array would become visible to other processors before the change of the last-message pointer variable became visible to them. Without a fence instruction, that doesn't hold on WMO processors, and so they were able to fetch a stale message value. In some cases you also need to guarantee the order of reads. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
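The ordering requirement described above (the message slot must become visible before the last-message pointer does) can be sketched in portable C. This is only an illustration under stated assumptions: it uses C11 <stdatomic.h>, which postdates this thread, it assumes a single writer, and MsgQueue, queue_publish and queue_read are hypothetical names, not PostgreSQL's sinval structures or any existing PostgreSQL API.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_MSGS 16

/* Hypothetical stand-in for a shared message array plus "last message" counter. */
typedef struct
{
    int        msgs[MAX_MSGS];  /* message slots, filled by a single writer */
    atomic_int last;            /* number of messages published so far */
} MsgQueue;

static void
queue_init(MsgQueue *q)
{
    atomic_init(&q->last, 0);
}

/* Writer: fill the slot first, then publish the new count. */
static void
queue_publish(MsgQueue *q, int msg)
{
    int n = atomic_load_explicit(&q->last, memory_order_relaxed);

    assert(n < MAX_MSGS);
    q->msgs[n] = msg;
    /* Release store: the slot write above is ordered before the new count. */
    atomic_store_explicit(&q->last, n + 1, memory_order_release);
}

/* Reader: load the count, then read a slot below it. Returns 1 if a message was read. */
static int
queue_read(MsgQueue *q, int idx, int *msg)
{
    /* Acquire load: pairs with the release store in queue_publish(). */
    int n = atomic_load_explicit(&q->last, memory_order_acquire);

    if (idx < 0 || idx >= n)
        return 0;
    *msg = q->msgs[idx];
    return 1;
}
```

Without the release/acquire pair (or an explicit fence in the same places), a weakly ordered processor would be allowed to make the counter update visible before the slot contents, which is the stale-message symptom described in the message above.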
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
Tom Lane t...@sss.pgh.pa.us wrote: What you typically need is a guarantee about the order in which writes become visible. In some cases you also need to guarantee the order of reads. Doesn't that suggest different primitives? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Friday 19 November 2010 18:46:00 Tom Lane wrote: I wrote: Markus Wanner mar...@bluegap.ch writes: Well, that certainly doesn't apply to full fences, that are not specific to a particular piece of memory. I'm thinking of 'mfence' on x86_64 or 'mf' on ia64. Hm, what do those do exactly? I poked around in the Intel manuals a bit. They do have mfence (also lfence and sfence) but so far as I can tell, those are only used to manage loads and stores that are issued by special instructions that explicitly mark the operation as weakly ordered. So the reason we're not seeing bugs is presumably that C compilers don't generate such instructions.

Well. Some memcpy() implementations use string (or SIMD) operations which are weakly ordered though.

Also, Intel architectures do guarantee cache consistency across multiple processors (and it costs them a lot...)

Only if you are talking about the *same* locations though. See example 8.2.3.4 Combined with:

For the Intel486 and Pentium processors, the LOCK# signal is always asserted on the bus during a LOCK operation, even if the area of memory being locked is cached in the processor. For the P6 and more recent processor families, if the area of memory being locked during a LOCK operation is cached in the processor that is performing the LOCK operation as write-back memory and is completely contained in a cache line, the processor may not assert the LOCK# signal on the bus. Instead, it will modify the memory location internally and allow its cache coherency mechanism to ensure that the operation is carried out atomically. This operation is called “cache locking.” The cache coherency mechanism automatically prevents two or more processors that have cached the same area of memory from simultaneously modifying data in that area.

Which means something like (in intel's terminology) can happen:

initially x = 0
P1: mov [_X], 1
P1: lock xchg [_Y], 1
P2: lock xchg [_Z], 1
P2: mov r1, [_X]

A valid result is that r1 on P2 is 0.
I think that is not biting pg because it always uses the same spinlocks at the reading and writing side - but I am not that sure about that. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UNNEST ... WITH ORDINALITY (AND POSSIBLY OTHER STUFF)
On Fri, Nov 19, 2010 at 01:48:06PM -0500, caleb.wel...@emc.com wrote: Note the standard also supports unnesting multiple arrays concurrently, the rule for handling arrays with different lengths is to use null padding of the shorter array.

Interesting. I notice that our version doesn't support multiple-array UNNEST just yet.

SELECT * FROM UNNEST(ARRAY[1,2,3,4], ARRAY['hello','world']);
ERROR: function unnest(integer[], text[]) does not exist
LINE 1: SELECT * FROM UNNEST(ARRAY[1,2,3,4], ARRAY['hello','world'])...
                      ^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.

SELECT * FROM UNNEST( ARRAY[5,2,3,4], ARRAY['hello', 'world'] ) WITH ORDINALITY AS t(a,b,i);

 a   b        i
 --- -------- --
 5   'hello'  1
 2   'world'  2
 3            3
 4            4
(4 rows)

This looks a lot like an OUTER JOIN on the ORDINALITY column of each of the individual UNNEST...WITH ORDINALITYs. Given that we know the ORDINALITY in advance just by building the arrays, we could optimize this away from FULL JOIN to LEFT (or RIGHT) JOINs.

To implement this it is not just substituting the existing unnest(anyarray) function in multiple times.

Right.

Regards, Caleb

On Nov 19, 2010, at 4:50 AM, pgsql-hackers-ow...@postgresql.org wrote:

From: David Fetter da...@fetter.org
Date: November 18, 2010 11:48:16 PM PST
To: Itagaki Takahiro itagaki.takah...@gmail.com
Cc: PG Hackers pgsql-hackers@postgresql.org
Subject: Re: UNNEST ... WITH ORDINALITY (AND POSSIBLY OTHER STUFF)

On Fri, Nov 19, 2010 at 11:40:05AM +0900, Itagaki Takahiro wrote: On Fri, Nov 19, 2010 at 08:33, David Fetter da...@fetter.org wrote: In order to get WITH ORDINALITY, would it be better to change gram.y to account for both WITH ORDINALITY and without, or just for the WITH ORDINALITY case?
We probably need to change gram.y and make UNNEST to be COL_NAME_KEYWORD. UNNEST (without ORDINALITY) will call the existing unnest() function, and UNNEST() WITH ORDINALITY will call unnest_with_ordinality(). Thanks for sketching that out :) BTW, what will we return for arrays with 2 or more dimensions? At the moment, per the SQL standard, UNNEST without the WITH ORDINALITY clause flattens all dimensions. SELECT * FROM UNNEST(ARRAY[[1,2],[3,4]]); unnest 1 2 3 4 (4 rows) Unless we want to do something super wacky and contrary to the SQL standard, UNNEST(array) WITH ORDINALITY should do the same. There are no confusion in your two arguments version: UNNEST(anyarray, number_of_dimensions_to_unnest) but we will also support one argument version. Array indexes will be composite numbers in the cases. The possible design would be just return sequential serial numbers of the values -- the following two queries return the same results: - SELECT i, v FROM UNNEST($1) WITH ORDINALITY AS t(v, i) - SELECT row_number() OVER () AS i, v FROM UNNEST($1) AS t(v) Yes, that's what the standard says. Possible less-than-total unrolling schemes include: - Flatten specified number of initial dimensions into one list, e.g. turn UNNEST(array_3d, 2) into SETOF(array_1d) with one column of ORDINALITY - Flatten similarly, but have an ORDINALITY column for each flattened dimension. - More exotic schemes, such as UNNEST(array_3d, [1,3]), with either of the two methods above. And of course the all-important: - Other possibilities I haven't thought of :) Cheers, David. -- David Fetter da...@fetter.orgmailto:da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.commailto:david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! 
Consider donating to Postgres: http://www.postgresql.org/about/donate -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] how correctly detoast a Datum value?
Hello, I am trying to explicitly detoast a plpgsql variable, but this code corrupts the content. What is wrong?

switch (datum->dtype)
{
    case PLPGSQL_DTYPE_VAR:
        {
            PLpgSQL_var *var = (PLpgSQL_var *) datum;

            *typeid = var->datatype->typoid;
            *typetypmod = var->datatype->atttypmod;
            *isnull = var->isnull;

            /*
             * Explicitly detoast a possibly toasted value; this should
             * protect us from repeated detoasting and decompression.
             */
            if (!*isnull && !var->datatype->typbyval && var->datatype->typlen == -1)
            {
                struct varlena *datum = PG_DETOAST_DATUM(var->value);

                if ((Pointer) datum != DatumGetPointer(var->value))
                {
                    free_var(var);
                    var->value = PointerGetDatum(datum);
                }
                *value = var->value;
            }
            else
                *value = var->value;
            break;
        }

Regards Pavel Stehule -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UNNEST ... WITH ORDINALITY (AND POSSIBLY OTHER STUFF)
Excerpts from Caleb.Welton's message of vie nov 19 15:48:06 -0300 2010: Note the standard also supports unnesting multiple arrays concurrently, the rule for handling arrays with different lengths is to use null padding of the shorter array.

SELECT * FROM UNNEST( ARRAY[5,2,3,4], ARRAY['hello', 'world'] ) WITH ORDINALITY AS t(a,b,i);

 a   b        i
 --- -------- --
 5   'hello'  1
 2   'world'  2
 3            3
 4            4
(4 rows)

Hmm, this is pretty interesting and useful --- I had to deal with some XPath code not long ago and I had to turn to plpgsql; I think it could have been done with multi-array unnest. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
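The null-padding rule for multi-array UNNEST ... WITH ORDINALITY discussed in this thread can be modelled outside the database as a simple positional walk over both arrays. The sketch below only illustrates the semantics; UnnestRow and unnest_pair are hypothetical names and this is not how PostgreSQL implements (or would implement) the feature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One output row of a two-array unnest; hypothetical type for illustration. */
typedef struct
{
    bool        a_is_null;   /* true when the integer array ran out */
    int         a;
    const char *b;           /* NULL pointer stands for SQL NULL */
    size_t      ordinality;  /* 1-based row number */
} UnnestRow;

/*
 * Emit one row per position up to the length of the longer array,
 * padding the shorter array with nulls. Returns the number of rows.
 */
static size_t
unnest_pair(const int *a, size_t na,
            const char *const *b, size_t nb,
            UnnestRow *out, size_t max_rows)
{
    size_t nrows = (na > nb) ? na : nb;

    assert(nrows <= max_rows);
    for (size_t i = 0; i < nrows; i++)
    {
        out[i].a_is_null = (i >= na);
        out[i].a = (i < na) ? a[i] : 0;
        out[i].b = (i < nb) ? b[i] : NULL;
        out[i].ordinality = i + 1;
    }
    return nrows;
}
```

Called with {5,2,3,4} and {"hello","world"}, this yields four rows whose text column is null in rows 3 and 4, matching the result table quoted in the thread.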
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Friday 19 November 2010 20:03:27 Andres Freund wrote: Which means something like (in intel's terminology) can happen:

initially x = 0
P1: mov [_X], 1
P1: lock xchg [_Y], 1
P2: lock xchg [_Z], 1
P2: mov r1, [_X]

A valid result is that r1 on P2 is 0. I think that is not biting pg because it always uses the same spinlocks at the reading and writing side - but I am not that sure about that.

Which also seems to mean that a simple read memory barrier that does __asm__ __volatile__("lock; xaddl $0, ???") seems not to be enough unless you use the same address for all those barriers which would cause horrible cacheline bouncing. Am I missing something? Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] duplicate connection failure messages
Tom Lane wrote: Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Bruce Momjian's message of vie nov 19 00:17:59 -0300 2010: Alvaro Herrera wrote: I think we should use inet_ntop where available to print the address. Good idea because inet_ntop() is thread-safe. Does that work on IPv6? You indicated that inet_ntoa() does not. According to opengroup.org, IPv6 should work if the underlying libraries support it, whereas inet_ntoa explicitly does not. http://www.opengroup.org/onlinepubs/009695399/functions/inet_ntop.html http://www.opengroup.org/onlinepubs/009695399/functions/inet_addr.html I get the impression that you guys have forgotten the existence of src/backend/utils/adt/inet_net_ntop.c Yeah, that is nice, but we are calling this from libpq, not the backend. Let me work up a patch. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
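For reference, here is a minimal sketch of printing either an IPv4 or an IPv6 address with POSIX inet_ntop(), which writes into a caller-supplied buffer rather than a static one. The helper name format_sockaddr and the "???" fallback are assumptions made for this example; this is not the libpq patch being discussed.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: format the address held in 'sa' into 'buf'.
 * Always returns 'buf'; on failure it holds the placeholder "???".
 */
static const char *
format_sockaddr(const struct sockaddr *sa, char *buf, size_t buflen)
{
    const void *addr = NULL;

    assert(buf != NULL && buflen > 0);

    switch (sa->sa_family)
    {
        case AF_INET:
            addr = &((const struct sockaddr_in *) sa)->sin_addr;
            break;
        case AF_INET6:
            addr = &((const struct sockaddr_in6 *) sa)->sin6_addr;
            break;
        default:
            break;              /* unsupported address family */
    }

    if (addr == NULL ||
        inet_ntop(sa->sa_family, addr, buf, (socklen_t) buflen) == NULL)
    {
        /* Unsupported family or buffer too small: use a placeholder. */
        strncpy(buf, "???", buflen - 1);
        buf[buflen - 1] = '\0';
    }
    return buf;
}
```

A buffer of INET6_ADDRSTRLEN bytes is large enough for both address families.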
Re: [HACKERS] directory archive format for pg_dump
Hi, Sharing some thoughts after a first round of reviewing, where I only had time to read the patch itself. Joachim Wieland j...@mcknight.de writes: Since the compression is currently all down in the custom format backup code, the first thing I've done was refactoring the compression functions into a separate file. While at it, I have added support for liblzf compression. I think I'd like to see a separate patch for the new compression support. Sorry about that, I realize that's extra work… And it could be about personal preferences, but the way you added the liblzf support strikes me as odd, with all those #ifdefs everywhere. Is it possible to have a specific file for each supported compression format, then some routing code in src/bin/pg_dump/compress_io.c? The routing code already exists, but the file is full of #ifdef sections to define the right supporting function, when I think having separate compress_io_zlib and compress_io_lzf files would be better. Then there's the bulk of the new dump format feature in the other part of the patch, namely src/bin/pg_dump/pg_backup_directory.c. You have to update the copyright in the file header there, at least :) I have yet to devote more time to this part of the patch, but it seems like it's rewriting the full support without using the existing bits. That's something I have to check; I didn't have time to read the existing code for the other archive formats. I'm hesitant about marking the patch "Waiting on Author" to get it split. Joachim, what do you think? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
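The "one file per compression format plus routing code" layout suggested above could look roughly like the following. Everything here is hypothetical and only meant to show the shape of the idea: ComprAlg, ComprMethod, compr_lookup and none_compress are invented for this sketch and do not come from the patch or from pg_dump.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of algorithms the routing code knows about. */
typedef enum { COMPR_NONE, COMPR_ZLIB, COMPR_LZF } ComprAlg;

/* Each format file would export one of these instead of scattering #ifdefs. */
typedef struct
{
    const char *name;
    size_t    (*compress)(const char *in, size_t inlen, char *out, size_t outlen);
} ComprMethod;

/* Would live in its own file; a pass-through "compressor" for the sketch. */
static size_t
none_compress(const char *in, size_t inlen, char *out, size_t outlen)
{
    assert(in != NULL && out != NULL);
    if (inlen > outlen)
        return 0;               /* output buffer too small */
    memcpy(out, in, inlen);
    return inlen;
}

/*
 * Routing table. Formats that are not compiled in simply leave their slot
 * empty; in a real build the zlib and lzf entries would be filled in by
 * their own files when the libraries are available.
 */
static const ComprMethod methods[COMPR_LZF + 1] = {
    [COMPR_NONE] = {"none", none_compress},
};

/* Returns the method for 'alg', or NULL if that format is not supported. */
static const ComprMethod *
compr_lookup(ComprAlg alg)
{
    if ((int) alg < 0 || (int) alg > COMPR_LZF || methods[alg].compress == NULL)
        return NULL;
    return &methods[alg];
}
```

With this shape the callers need a single NULL check after compr_lookup() instead of conditional compilation at every call site.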
Re: [HACKERS] duplicate connection failure messages
Excerpts from Bruce Momjian's message of vie nov 19 16:43:33 -0300 2010: Tom Lane wrote: I get the impression that you guys have forgotten the existence of src/backend/utils/adt/inet_net_ntop.c Yeah, that is nice, but we are calling this from libpq, not the backend. Let me work up a patch. Actually the code seems agnostic (no ereport, palloc etc) so maybe it could just be moved to src/port. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] duplicate connection failure messages
Alvaro Herrera wrote: Excerpts from Bruce Momjian's message of vie nov 19 16:43:33 -0300 2010: Tom Lane wrote: I get the impression that you guys have forgotten the existence of src/backend/utils/adt/inet_net_ntop.c Yeah, that is nice, but we are calling this from libpq, not the backend. Let me work up a patch. Actually the code seems agnostic (no ereport, palloc etc) so maybe it could just be moved to src/port. I was wondering that. I am unclear if we need it though --- can we not assume inet_ntop() exists on all systems? We assumed inet_ntoa() did. Of course, the buildfarm will tell us. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changes to Linux OOM killer in 2.6.36
On Thu, Nov 18, 2010 at 19:43, Greg Smith g...@2ndquadrant.com wrote: Last month's new Linux kernel 2.6.36 includes a rewrite of the out of memory killer: http://lwn.net/Articles/391222/ http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a63d83f427fbce97a6cea0db2e64b0eb8435cd10

Yeah, I've been following this somewhat closely... Also of interest is the recent thread about reverting the new oom (don't know if it will happen, but maybe they won't deprecate oom_adj): http://lkml.org/lkml/2010/11/14/5

The new badness method totals the task's RSS and swap as a percentage of RAM, where the old one scored starting with the total memory used by the process. I *think* that this is an improvement for PostgreSQL, based on the sort of data I see with:

Well, it seems to be an improvement. If I look at the oom_score on a 2.6.36 box running postgres I get:

$ cd /proc; for a in [0-9]*; do echo `cat $a/oom_score` $a `perl -pes/'\0.*$'// $a/cmdline`; done|grep -v ^0|sort -n |less
1 1309 supervising syslog-ng
1 1310 /usr/sbin/syslog-ng
1 1336 /usr/sbin/crond
1 1368 /usr/sbin/irqbalance
1 1485 /usr/sbin/ntpd
1 1495 /usr/local/bin/pgbouncer
1 1506 /sbin/agetty
1 3391 /var/lib/postgres/pgsql-9.0/bin/postgres
1 3393 postgres: writer process
1 3394 postgres: wal writer process
1 3395 postgres: autovacuum launcher process
1 3396 postgres: stats collector process
1 4110 postgres: joshua wopr [local] idle
2 4109 postgres: joshua wopr [local] idle

So in this case it should kill one of the backends *before* the postmaster. Ignoring that backend... it looks like the postmaster has the same score as every other process on the system. It also has a higher RSS than most, so I suspect it will still get killed first:

$ ps ax -o rss,pid,size,vsize,args | sort -n
...
 2416 1680  588  46548 /usr/lib/postfix/master
 2424 1696  640  46748 qmgr -l -t fifo -u
 2956 3395 2416 244644 postgres: autovacuum launcher process
 3116 2216  720  65464 sshd: alex [priv]
 4096 3393 1088 243316 postgres: writer process
 6592 4110 2516 246808 postgres: joshua wopr [local] idle
11756 3391  900 243128 /var/lib/postgres/pgsql-9.0/bin/postgres
32640 4109 9084 255564 postgres: joshua wopr [local] idle in transaction

So I think we will still need to protect the postmaster from OOM :(.

One thing that's definitely changed is the interface used to control turning off the OOM killer.

Grr... Whatever happened to a stable userspace ABI?

I don't think it's worth doing anything to the database code until tests on the newer kernel confirm whether this whole thing is even necessary anymore.

+1 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
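As background for the interface change mentioned above: kernels up to 2.6.35 expose /proc/<pid>/oom_adj (range -17 to 15, where -17 exempts the process), while 2.6.36 adds /proc/<pid>/oom_score_adj (range -1000 to 1000, where -1000 exempts it). A process that wants to protect itself on either kernel could try the new file first and fall back to the old one. The sketch below is an illustration only; write_oom_value and protect_self_from_oom are hypothetical names, this is not PostgreSQL code, and lowering the value normally requires elevated privileges.

```c
#include <assert.h>
#include <stdio.h>

/* Write an integer to a /proc control file. Returns 0 on success, -1 on failure. */
static int
write_oom_value(const char *path, int value)
{
    FILE *f;
    int   ok;

    assert(path != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = (fprintf(f, "%d\n", value) > 0);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Try the 2.6.36+ interface first, then fall back to the older one. */
static int
protect_self_from_oom(void)
{
    if (write_oom_value("/proc/self/oom_score_adj", -1000) == 0)
        return 0;
    return write_oom_value("/proc/self/oom_adj", -17);
}
```

Child processes inherit the setting, so a server that protects its parent process this way would typically reset the value in the children it wants to remain killable.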
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Fri, Nov 19, 2010 at 1:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: However, for lock-free interactions I think this model isn't terribly helpful: it's not clear what is inside and what is outside the sync block, and forcing your code into that model doesn't improve either clarity or performance. What you typically need is a guarantee about the order in which writes become visible. To give a concrete example, the sinval bug I was mentioning earlier boiled down to assuming that a write into an element of the sinval message array would become visible to other processors before the change of the last-message pointer variable became visible to them. Without a fence instruction, that doesn't hold on WMO processors, and so they were able to fetch a stale message value. In some cases you also need to guarantee the order of reads. But what about timings vs. random other stuff? Like in this case there's a problem if the signal arrives before the memory update to latch-is_set becomes visible. I don't know what we need to do to guarantee that. This page seems to indicate that x86 is OK as far as this is concerned - we can simply store a 1 and everyone will see it: http://coding.derkeiler.com/Archive/Assembler/comp.lang.asm.x86/2004-08/0979.html ...but if we were to, say, increment a counter at that location, it would not be safe without a LOCK prefix (further messages in the thread indicate that you might also have a problem if the address in question is unaligned). It's not obvious to me, however, what might be required on other processors. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UNNEST ... WITH ORDINALITY (AND POSSIBLY OTHER STUFF)
The other aspect of the standard that the Postgres implementation does not currently support is the fact that unnest is supposed to be defined in terms of laterally derived subqueries, e.g. you should be able to unnest another element from a FROM-list entry laterally on the left.

CREATE TABLE t1(id int, values int[]);
SELECT id, a FROM t1, UNNEST(values) AS u(a);

If you consider it in terms of LATERAL, which Postgres also doesn't support, then you may find that it works out much more cleanly to consider the multi-array unnest in terms of that rather than in terms of an outer join. Specifically, since arrays are implicitly ordered on their ordinality, a simple array lookup is much easier/more efficient than performing a full-fledged join. E.g. the rewrite is:

SELECT id, values[i] AS a FROM t1, LATERAL (SELECT generate_series(array_lower(values, 1), array_upper(values, 1))) AS lat(i);

But then LATERAL support is something that has been discussed on and off for a while without seeing much progress.

Regards, Caleb

On Nov 19, 2010, at 11:06 AM, David Fetter wrote: On Fri, Nov 19, 2010 at 01:48:06PM -0500, caleb.wel...@emc.com wrote: Note the standard also supports unnesting multiple arrays concurrently, the rule for handling arrays with different lengths is to use null padding of the shorter array.

Interesting. I notice that our version doesn't support multiple-array UNNEST just yet.

SELECT * FROM UNNEST(ARRAY[1,2,3,4], ARRAY['hello','world']);
ERROR: function unnest(integer[], text[]) does not exist
LINE 1: SELECT * FROM UNNEST(ARRAY[1,2,3,4], ARRAY['hello','world'])...
                      ^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.

SELECT * FROM UNNEST( ARRAY[5,2,3,4], ARRAY['hello', 'world'] ) WITH ORDINALITY AS t(a,b,i);

 a   b        i
 --- -------- --
 5   'hello'  1
 2   'world'  2
 3            3
 4            4
(4 rows)

This looks a lot like an OUTER JOIN on the ORDINALITY column of each of the individual UNNEST...WITH ORDINALITYs.
Re: [HACKERS] [PATCH] Custom code int(32|64) = text conversions out of performance reasons
On Monday 15 November 2010 17:12:25 Robert Haas wrote: I notice that int8out isn't terribly consistent with int2out and int4out, in that it does an extra copy. Maybe that's justified given the greater potential memory wastage, but I'm not certain. One approach might be to pick some threshold value and allocate a buffer in one of two sizes based on how large the value is relative to that cutoff. But that might also be a stupid idea, not sure.

I removed the extra buffer - it's actually a tiny bit faster without it (I guess the allocation pattern is a bit nicer during copy as it will always take the same paths and eventually the same address). I couldn't measure any difference memory-usage wise. The code was that way before btw.

It would speed things up for me if you or someone else could take a quick pass over what remains here and fix the formatting and whitespace to be consistent with our general project style, and make the comment headers more consistent among the functions being added/modified.

I think I did most of those - the function comments in numutils weren't consistent before - now they are consistent with the unchanged pg_atoi.

Thanks for reviewing/applying the first part, Andres

From 55acfa4f971f5a0e33eb8b9e66d621c16be96d42 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Fri, 19 Nov 2010 21:44:29 +0100
Subject: [PATCH] Implement custom int[248]-string conversion routines out of speed reasons.
---
 src/backend/utils/adt/int8.c        |  10 +--
 src/backend/utils/adt/numutils.c    | 130
 src/include/utils/builtins.h        |   1 +
 src/test/regress/expected/int2.out  |  13
 src/test/regress/expected/int4.out  |  13
 src/test/regress/expected/int8.out  |  13
 src/test/regress/sql/int2.sql       |   4 +
 src/test/regress/sql/int4.sql       |   4 +
 src/test/regress/sql/int8.sql       |   4 +
 9 files changed, 172 insertions(+), 20 deletions(-)

diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 894110d..8f4ef5a 100644
*** a/src/backend/utils/adt/int8.c
--- b/src/backend/utils/adt/int8.c
***************
*** 20,25 ****
--- 20,26 ----
  #include "funcapi.h"
  #include "libpq/pqformat.h"
  #include "utils/int8.h"
+ #include "utils/builtins.h"

  #define MAXINT8LEN 25
*************** Datum
*** 157,170 ****
  int8out(PG_FUNCTION_ARGS)
  {
      int64 val = PG_GETARG_INT64(0);
!     char *result;
!     int len;
!     char buf[MAXINT8LEN + 1];
!
!     if ((len = snprintf(buf, MAXINT8LEN, INT64_FORMAT, val)) < 0)
!         elog(ERROR, "could not format int8");

!     result = pstrdup(buf);
      PG_RETURN_CSTRING(result);
  }
--- 158,166 ----
  int8out(PG_FUNCTION_ARGS)
  {
      int64 val = PG_GETARG_INT64(0);
!     char *result = palloc(MAXINT8LEN + 1);

!     pg_lltoa(val, result);
      PG_RETURN_CSTRING(result);
  }

diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 5f8083f..7b50549 100644
*** a/src/backend/utils/adt/numutils.c
--- b/src/backend/utils/adt/numutils.c
***************
*** 3,10 ****
   * numutils.c
   *    utility functions for I/O of built-in numeric types.
   *
-  *    integer: pg_atoi, pg_itoa, pg_ltoa
-  *
   * Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
   * Portions Copyright (c) 1994, Regents of the University of California
   *
--- 3,8 ----
*************** pg_atoi(char *s, int size, int c)
*** 109,135 ****
  }

  /*
!  * pg_itoa - converts a short int to its string represention
   *
!  * Note:
!  *    previously based on ~ingres/source/gutil/atoi.c
!  *    now uses vendor's sprintf conversion
   */
  void
  pg_itoa(int16 i, char *a)
  {
!     sprintf(a, "%hd", (short) i);
  }

  /*
!  * pg_ltoa - converts a long int to its string represention
   *
!  * Note:
!  *    previously based on ~ingres/source/gutil/atoi.c
!  *    now uses vendor's sprintf conversion
   */
  void
! pg_ltoa(int32 l, char *a)
  {
!     sprintf(a, "%d", l);
  }
--- 107,239 ----
  }

  /*
!  * pg_ltoa - convert a signed 16bit integer to its string representation
   *
!  * It doesnt seem worth implementing this separately.
   */
  void
  pg_itoa(int16 i, char *a)
  {
!     pg_ltoa((int32)i, a);
  }

+
  /*
!  * pg_ltoa: convert a signed 32bit integer to its string representation
   *
!  * 'buf' has to be 12 bytes long to fit the result of any 32bit integer.
!  *
!  * Its unfortunate to have this function twice - once for 32bit, once
!  * for 64bit, but incurring the cost of 64bit computation to 32bit
!  * platforms doesn't seem to be acceptable.
   */
  void
! pg_ltoa(int32 value, char *buf)
  {
!     char *bufstart = buf;
!     bool neg = false;
!
!     /*
!      * Avoid problems with the most negative not being representable
!      * as a positive integer
!      */
!     if (value == INT32_MIN)
!     {
!         memcpy(buf, "-2147483648", 12);
!         return;
!     }
!     else if (value < 0)
!     {
!         value = -value;
!         neg = true;
!     }
!
!     /* Build the string by computing the wanted
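The patch text above is cut off in this archive, so the rest of the routine is not shown. As a separate, self-contained illustration of the general technique (digit extraction by repeated division, with the most negative value special-cased as the patch does), here is a minimal sketch. The name int32_to_str and the returned length are assumptions of this example; it is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Convert 'value' to decimal text in 'buf', which must hold at least
 * 12 bytes (sign, 10 digits, terminating NUL). Returns the string length.
 */
static int
int32_to_str(int32_t value, char *buf)
{
    char tmp[10];       /* digits, least significant first */
    int  ndigits = 0;
    int  len = 0;
    bool neg = false;

    assert(buf != NULL);

    /* -INT32_MIN is not representable, so handle it separately. */
    if (value == INT32_MIN)
    {
        memcpy(buf, "-2147483648", 12);
        return 11;
    }
    if (value < 0)
    {
        value = -value;
        neg = true;
    }

    /* Peel off digits from the right. */
    do
    {
        tmp[ndigits++] = (char) ('0' + value % 10);
        value /= 10;
    } while (value != 0);

    /* Emit the sign, then the digits in the correct order. */
    if (neg)
        buf[len++] = '-';
    while (ndigits > 0)
        buf[len++] = tmp[--ndigits];
    buf[len] = '\0';
    return len;
}
```

Avoiding the format-string parsing done by sprintf() is where a hand-written routine of this kind would be expected to save time.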
Re: [HACKERS] directory archive format for pg_dump
Hi Dimitri and Joachim. I've looked at the patch too, and I want to share some thoughts as well. I've used http://wiki.postgresql.org/wiki/Reviewing_a_Patch to guide my review.

Submission review: I've applied and compiled the patch successfully using the current master.

Usability review: The directory format generated 60 files of different sizes for my database, and it looks very confusing. Is it possible to use the same trick as pigz and pbzip2, creating a concatenated file of streams?

Feature test: Just a partial review. I can dump / restore using lzf, but didn't stress it hard to check robustness.

Performance review: Didn't test it hard either, but it looks OK.

Coding review: Just a shallow review here. I think I'd like to see a separate patch for the new compression support. Sorry about that, I realize that's extra work… Same feeling here, this is the first thing that I noticed. Reusing md5.c and kwlookup.c through a link doesn't look nice either. This way you need to compile them twice, among other things, but I think that it's temporary, right?

-- José Arthur Benetasso Villanova -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changes to Linux OOM killer in 2.6.36
Kevin Grittner wrote: Greg Smith wrote: oom_adj is deprecated, scheduled for removal in August 2010: That surprised me so I checked the URL. I believe you have a typo there and it's August, 2012. This is why I include references, so that when the cold medicine hits me in the middle of proofreading my message and I send it anyway you aren't misled. Yes, 2012, only a few months before doomsday. The approaching end of the world then means any bugs left can be marked WONTFIX. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us
[HACKERS] Hot Standby: too many KnownAssignedXids
Hi, I am seeing the following here on 9.0.1 on Linux x86-64: LOG: redo starts at 1F8/FC00E978 FATAL: too many KnownAssignedXids CONTEXT: xlog redo insert: rel 1663/16384/18373; tid 3829898/23 and this is the complete history: postgres was running as HS in foreground, Ctrl-C'ed it for a restart. LOG: received fast shutdown request LOG: aborting any active transactions FATAL: terminating walreceiver process due to administrator command FATAL: terminating connection due to administrator command LOG: shutting down LOG: database system is shut down Started it up again: $ postgres -D /db/ LOG: database system was shut down in recovery at 2010-11-19 14:36:30 EST LOG: entering standby mode cp: cannot stat `/archive/000101F90001': No such file or directory cp: cannot stat `/archive/000101F800FC': No such file or directory LOG: redo starts at 1F8/FC00E978 FATAL: too many KnownAssignedXids CONTEXT: xlog redo insert: rel 1663/16384/18373; tid 3829898/23 LOG: startup process (PID 30052) exited with exit code 1 LOG: terminating any other active server processes (copied the log files over...) ./postgres -D /db/ LOG: database system was interrupted while in recovery at log time 2010-11-19 14:36:12 EST HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. LOG: entering standby mode LOG: restored log file 000101F90001 from archive LOG: restored log file 000101F800FC from archive LOG: redo starts at 1F8/FC00E978 FATAL: too many KnownAssignedXids CONTEXT: xlog redo insert: rel 1663/16384/18373; tid 3829898/23 LOG: startup process (PID 31581) exited with exit code 1 LOG: terminating any other active server processes Changing the line in the source code to give some more output gives me: FATAL: too many KnownAssignedXids. head: 0, tail: 0, nxids: 9978, pArray->maxKnownAssignedXids: 6890 I still have the server, if you want me to debug anything or send a patch against 9.0.1 that gives more output, just let me know.
Joachim
Re: [HACKERS] directory archive format for pg_dump
Excerpts from José Arthur Benetasso Villanova's message of Fri Nov 19 18:28:03 -0300 2010: The md5.c and kwlookup.c reuse using a link doesn't look nice either. This way you need to compile twice, among other things, but I think that it's temporary, right? Not sure what you mean here, but kwlookup.c is a symlink without this patch too. It's just the way it works; the compilation environments here and in the backend are different, so there is no other option but to compile twice. I guess md5.c is a new one (I didn't check), but I would assume it's the same thing. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] directory archive format for pg_dump
Hi Dimitri, thanks for reviewing my patch! On Fri, Nov 19, 2010 at 2:44 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: I think I'd like to see a separate patch for the new compression support. Sorry about that, I realize that's extra work… I guess it wouldn't be a very big deal but I also doubt that it makes the review that much easier. Basically the compression refactor patch would just touch pg_backup_custom.c (because this is the place where the libz compression is currently buried) and the two new compress_io.(c|h) files. Everything else is pretty much the directory stuff and is on top of these changes. And it could be about personal preferences, but the way you added the liblzf support strikes me as odd, with all those #ifdefs everywhere. Is it possible to have a specific file for each supported compression format, then some routing code in src/bin/pg_dump/compress_io.c? Sure we could. But I wanted to wait with any fancy function pointer stuff until we have decided if we want to include the liblzf support at all. The #ifdefs might be a bit ugly but in case we do not include liblzf support, it's the easiest way to take it out again. As written in my introduction, this patch is not really about liblzf; liblzf is just a proof of concept for factoring out the compression part, and I have included it so that people can use it and see how much speed improvement they get. The routing code already exists but then the file is full of #ifdef sections to define the right supporting function when I think having a compress_io_zlib and a compress_io_lzf files would be better. Sure! I completely agree... Then there's the bulk of the new dump format feature in the other part of the patch, namely src/bin/pg_dump/pg_backup_directory.c.
You have to update the copyright in the file header there, at least :) Well, not sure if we can just change the copyright notice, because in the end the structure was copied from one of the other files which all have the copyright notice in them, so my work is based on those other files... I'm hesitant as far as marking the patch Waiting on author to get it split. Joachim, what do you think? I will see if I can split it. Joachim
Re: [HACKERS] directory archive format for pg_dump
Dimitri Fontaine dimi...@2ndquadrant.fr writes: I think I'd like to see a separate patch for the new compression support. Sorry about that, I realize that's extra work… That part of the patch is likely to get rejected outright anyway, so I *strongly* recommend splitting it out. We have generally resisted adding random compression algorithms to pg_dump because of license and patent considerations, and I see no reason to suppose this one is going to pass muster. regards, tom lane
Re: [HACKERS] duplicate connection failure messages
Bruce Momjian br...@momjian.us writes: I was wondering that. I am unclear if we need it though --- can we not assume inet_ntop() exists on all systems? We assumed inet_ntoa() did. The Single Unix Spec includes inet_ntoa but not inet_ntop. Of course, the buildfarm will tell us. The buildfarm unfortunately contains only a subset of the platforms we care about. I don't think this problem is large enough to justify taking a portability risk by depending on non-SUS library functions. If you want to do this, please do it as suggested previously, ie depend on the copy of the code we have internally. regards, tom lane
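For readers comparing the two calls, here is a minimal sketch of what inet_ntop() offers over inet_ntoa(): it writes into a caller-supplied buffer (inet_ntoa() returns a pointer to static storage and handles IPv4 only) and it reports failure when the buffer is too small. The sketch assumes a POSIX system that provides inet_ntop(), which is exactly the assumption under debate above; the helper name format_ipv4 is made up for illustration and is not PostgreSQL code.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * format_ipv4 (hypothetical helper): render an IPv4 address, given in host
 * byte order, as dotted-decimal text in buf.  Returns buf on success and
 * NULL on failure (for example when buflen is too small).
 */
const char *
format_ipv4(uint32_t host_order_addr, char *buf, size_t buflen)
{
	struct in_addr addr;

	assert(buf != NULL);

	memset(&addr, 0, sizeof(addr));
	addr.s_addr = htonl(host_order_addr);	/* in_addr holds network byte order */

	/* Unlike inet_ntoa(), the result lands in the caller's buffer. */
	return inet_ntop(AF_INET, &addr, buf, (socklen_t) buflen);
}
```

A caller would pass a buffer of at least INET_ADDRSTRLEN bytes, e.g. `char buf[INET_ADDRSTRLEN]; format_ipv4(0x7F000001u, buf, sizeof(buf));`.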
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
Robert Haas robertmh...@gmail.com writes: But what about timings vs. random other stuff? Like in this case there's a problem if the signal arrives before the memory update to latch->is_set becomes visible. I don't know what we need to do to guarantee that. I don't believe there's an issue there. A context swap into the kernel is certainly going to include msync. If you're afraid otherwise, you could put an msync before the kill() call, but I think it's a waste of effort. regards, tom lane
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
Andres Freund and...@anarazel.de writes: On Friday 19 November 2010 18:46:00 Tom Lane wrote: I poked around in the Intel manuals a bit. They do have mfence (also lfence and sfence) but so far as I can tell, those are only used to manage loads and stores that are issued by special instructions that explicitly mark the operation as weakly ordered. So the reason we're not seeing bugs is presumably that C compilers don't generate such instructions. Well. Some memcpy() implementations use string (or SIMD) operations which are weakly ordered though. I'd expect memcpy to msync at completion of the move if it does that kind of thing. Otherwise it's failing to ensure that the move is really done before it returns. For the Intel486 and Pentium processors, the LOCK# signal is always asserted on the bus during a LOCK operation, even if the area of memory being locked is cached in the processor. For the P6 and more recent processor families, if the area of memory being locked during a LOCK operation is cached in the processor that is performing the LOCK operation as write-back memory and is completely contained in a cache line, the processor may not assert the LOCK# signal on the bus. Instead, it will modify the memory location internally and allow its cache coherency mechanism to ensure that the operation is carried out atomically. This operation is called "cache locking." The cache coherency mechanism automatically prevents two or more processors that have cached the same area of memory from simultaneously modifying data in that area. Like it says, the cache coherency mechanism prevents this from being a problem for us. Once the change is made in a processor's cache, it's the cache's job to ensure that all processors see it --- and on Intel architectures, the cache does take care of that. regards, tom lane
Re: [HACKERS] Fwd: patch: format function - fixed oid
On Thu, Nov 18, 2010 at 11:54 PM, Pavel Stehule pavel.steh...@gmail.com wrote: -- Forwarded message -- From: Pavel Stehule pavel.steh...@gmail.com Date: 2010/11/18 Subject: Re: patch: format function, next generation To: Jeff Janes jeff.ja...@gmail.com Cc: pgsql-hackers-ow...@postgresql.org Hello somebody takes my oid :) updated patch is in attachment Regards Pavel Stehule Dear Pavel and Hackers, I've reviewed this patch. It applied, makes, and passes make check. It has added regression tests that seem appropriate. I think the feature added matches the consensus that emerged from the very long email discussion. The C code seems fine (to my meager abilities to judge that). But I think the documentation does need some work. From func.sgml: This functions can be used to create a formated string or message. There are allowed three types of tags: %s as string, %I as SQL identifiers and %L as SQL literals. Attention: result for %I and %L must not be same as result of <function>quote_ident</function> and <function>quote_literal</function> functions, because this function doesn't try to coerce parameters to <type>text</type> type and directly use a type's output functions. The placeholder can be related to some explicit parameter with using a optional n$ specification inside format. Should we make it explicit that this is inspired by C's sprintf? Do we want to call them tags? This is introducing what seems to be a new word to describe what are usually (I think) called conversion specifiers. "Must not be the same" should be "Might not be the same". However, it does not appear that quote_ident is willing to use coercion at all, and the %L behavior is more comparable to quote_nullable. Maybe: This function can be used to create a formatted string suitable for use as dynamic SQL or as a message. There are three types of conversion specifiers: %s for literal strings, %I for SQL identifiers, and %L for SQL literals.
Note that the results of the %L conversion might not be the same as the results of the <function>quote_nullable</function> function, as the latter coerces its argument to <type>text</type> while <function>format</function> uses a type's output function. A conversion can reference an explicit parameter position by using an optional n$ in the format specification. Does "type's output function" need to cross-reference someplace? Coercion is described elsewhere in this section of docs, but output functions are not. And for the changes to plpgsql.sgml, I would propose: <para> Building a string for dynamic SQL statement can be simplified by using the <function>format</function> function (see <xref linkend="functions-string">): <programlisting> EXECUTE format('UPDATE tbl SET %I = %L WHERE key = %L', colname, newvalue, keyvalue); </programlisting> The <function>format</function> function can be used together with the <literal>USING</literal> clause: <programlisting> EXECUTE format('UPDATE tbl SET %I = $1 WHERE key = $2', colname) USING newvalue, keyvalue; </programlisting> This form is more efficient because the parameters <literal>newvalue</literal> and <literal>keyvalue</literal> are not converted to text. </para> These are mostly grammatical changes, but with the last three lines I may have missed the meaning of what you originally intended--I'm not sure on that. Thanks, Jeff
Re: [HACKERS] directory archive format for pg_dump
On Fri, Nov 19, 2010 at 11:53 PM, Tom Lane t...@sss.pgh.pa.us wrote: Dimitri Fontaine dimi...@2ndquadrant.fr writes: I think I'd like to see a separate patch for the new compression support. Sorry about that, I realize that's extra work… That part of the patch is likely to get rejected outright anyway, so I *strongly* recommend splitting it out. We have generally resisted adding random compression algorithms to pg_dump because of license and patent considerations, and I see no reason to suppose this one is going to pass muster. I was already anticipating that possibility and my initial patch description is along these lines. However, liblzf is BSD licensed so on the license side we should be fine. Regarding patents, your last comment was that you'd like to see if it's really worth it and so I have included support for lzf for anybody to go ahead and find that out. Will send an updated split up patch this weekend (which would actually be four patches already...). Joachim
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Saturday 20 November 2010 00:08:07 Tom Lane wrote: Andres Freund and...@anarazel.de writes: On Friday 19 November 2010 18:46:00 Tom Lane wrote: I poked around in the Intel manuals a bit. They do have mfence (also lfence and sfence) but so far as I can tell, those are only used to manage loads and stores that are issued by special instructions that explicitly mark the operation as weakly ordered. So the reason we're not seeing bugs is presumably that C compilers don't generate such instructions. Well. Some memcpy() implementations use string (or SIMD) operations which are weakly ordered though. Like it says, the cache coherency mechanism prevents this from being a problem for us. Once the change is made in a processor's cache, it's the cache's job to ensure that all processors see it --- and on Intel architectures, the cache does take care of that. Check example 8.2.3.4 of 3a. - in my opinion that makes my example correct. Andres
Re: [HACKERS] Patch to add a primary key using an existing index
On 10-11-07 01:54 PM, Gurjeet Singh wrote: Attached is the patch that extends the same feature for UNIQUE indexes. It also includes some doc changes for the ALTER TABLE command, but I could not verify the resulting changes since I don't have the doc-building infrastructure installed. Regards, Gurjeet, I've taken a stab at reviewing this. Submission Review: Tests: The expected output for the regression tests you added doesn't match what I'm getting when I run the tests with your patch applied. I think you just need to regenerate the expected results; they seem to be from a previous version of the patch (different error messages etc.). Documentation: I was able to generate the docs. The ALTER TABLE page under the synopsis has ADD table_constraint where table_constraint is defined on the CREATE TABLE page. On the CREATE TABLE page table_constraint isn't defined as having the WITH; the WITH is part of index_parameters. I propose the ALTER TABLE page instead have ADD table_constraint [index_parameters] where index_parameters also references the CREATE TABLE page like table_constraint. Usability Review: Behaviour: I feel that if the ALTER TABLE ... renames the index a NOTICE should be generated. We generate notices about creating an index for a new pkey. We should give them a notice that we are renaming an index on them. Coding Review: Error Messages: in tablecmds your errdetail messages often don't start with a capital letter. I believe the preference is to have the errdetail strings start with a capital letter and end with a period. tablecmds.c: get_constraint_index_oid contains the check /* Currently only B-tree indexes are suupported for primary keys */ if (index_rel->rd_rel->relam != BTREE_AM_OID) elog(ERROR, "\"%s\" is not a B-Tree index", index_name); but above we already validate that the index is a unique index with another check. Today only B-tree indexes support unique constraints.
If this changed at some point and we could have a unique index of some other type, would something in this patch need to be changed to support them? If we are only depending on the uniqueness property then I think this check is covered by the uniqueness one higher in the function. Also note the typo in your comment above (suupported). Comments: index.c: Line 671 and 694. Your indentation changes make the comments run over 80 characters. If you end up submitting a new version of the patch I'd reformat those two comments. Other than those issues the patch looks good to me. Steve
Re: [HACKERS] UNNEST ... WITH ORDINALITY (AND POSSIBLY OTHER STUFF)
On Fri, Nov 19, 2010 at 04:11:56PM -0500, caleb.wel...@emc.com wrote: The other aspect of the standard that the Postgres implementation does not currently support is the fact that unnest is supposed to be defined in terms of laterally derived subqueries, e.g. you should be able to unnest another element from a from list entry laterally on the left. CREATE TABLE t1(id int, values int[]); SELECT id, a FROM t1 UNNEST(values) as u(a); If you consider it in terms of LATERAL, which Postgres also doesn't support, then you may find that it works out much more cleanly to consider the multi-array unnest in terms of that rather than in terms of an outer join. Specifically since arrays are implicitly ordered on their ordinality a simple array lookup is much easier/more efficient than performing a full fledged join operator. E.g. the rewrite is: SELECT id, values[i] as a FROM t1 LATERAL(SELECT generate_series(array_lower(values, 1), array_upper(values, 1))) as lat(i); But then LATERAL support is something that has been discussed on and off for a while without seeing much progress. Is LATERAL something you'd like to put preliminary support in for? :) Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] [PATCH] Custom code int(32|64) = text conversions out of performance reasons
On Fri, Nov 19, 2010 at 4:16 PM, Andres Freund and...@anarazel.de wrote: On Monday 15 November 2010 17:12:25 Robert Haas wrote: I notice that int8out isn't terribly consistent with int2out and int4out, in that it does an extra copy. Maybe that's justified given the greater potential memory wastage, but I'm not certain. One approach might be to pick some threshold value and allocate a buffer in one of two sizes based on how large the value is relative to that cutoff. But that might also be a stupid idea, not sure. I removed the extra buffer - it's actually a tiny bit faster without it (I guess the allocation pattern is a bit nicer during copy as it will always take the same paths and eventually the same address). I couldn't measure any difference memory-usage wise. The code was that way before btw. Yeah, I know. After further thought I decided not to commit this part, because using 32 bytes when you only need 8 is sort of sucky. I'm not sure if it matters in real life, but if it's only a tiny speedup I guess I might as well play it safe. It would speed things up for me if you or someone else could take a quick pass over what remains here and fix the formatting and whitespace to be consistent with our general project style, and make the comment headers more consistent among the functions being added/modified. I think I did most of those - the function comments in numutils weren't consistent before - now it's consistent with the unchanged pg_atoi. Thanks for reviewing/applying the first part, Sure thing. Thanks for taking time to do this - very nice speedup. This part now committed, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Fix for seg picksplit function
On Tue, Nov 16, 2010 at 6:07 AM, Alexander Korotkov aekorot...@gmail.com wrote: On Tue, Nov 16, 2010 at 3:07 AM, Robert Haas robertmh...@gmail.com wrote: But on a broader note, I'm not very certain the sorting algorithm is sensible. For example, suppose you have 10 segments that are exactly '0' and 20 segments that are exactly '1'. Maybe I'm misunderstanding, but it seems like this will result in a 15/15 split when we almost certainly want a 10/20 split. I think there will be problems in more complex cases as well. The documentation says about the less-than and greater-than operators that These operators do not make a lot of sense for any practical purpose but sorting. In order to illustrate a real problem we should think about gist behavior with great enough amount of data. For example, I tried to extrapolate this case to 10 of segs where 40% are (0,1) segs and 60% are (1,2) segs. And this case doesn't seem a problem for me. Well, the problem with just comparing on < is that it takes very little account of the upper bounds. I think the cases where a simple split would hurt you the most are those where examining the upper bound is necessary to get a good split. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
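The arithmetic behind the 15/15 versus 10/20 example can be made concrete with a small sketch. Neither function below is the patch's algorithm or anything proposed in this thread: midpoint_split just cuts the sorted list in half, and boundary_split is a hypothetical alternative that cuts at the key change nearest the middle. Both look only at one sort key, so they say nothing about the upper-bound concern raised above.

```c
#include <assert.h>
#include <stddef.h>

/* Midpoint split: the left group gets the first n/2 sorted entries. */
size_t
midpoint_split(size_t n)
{
	return n / 2;
}

/*
 * boundary_split (hypothetical): among the positions where the sorted key
 * changes value, pick the one closest to the midpoint, so that equal keys
 * are never separated.  Falls back to the midpoint when every key is equal.
 * Returns the size of the left group.
 */
size_t
boundary_split(const double *sorted_keys, size_t n)
{
	size_t		mid = n / 2;
	size_t		best = mid;
	size_t		best_dist = n + 1;	/* larger than any real distance */
	size_t		i;

	assert(sorted_keys != NULL || n == 0);

	for (i = 1; i < n; i++)
	{
		if (sorted_keys[i] != sorted_keys[i - 1])
		{
			size_t		dist = (i > mid) ? i - mid : mid - i;

			if (dist < best_dist)
			{
				best_dist = dist;
				best = i;
			}
		}
	}
	return best;
}
```

For 10 keys equal to 0 followed by 20 keys equal to 1, the midpoint cut puts 15 entries on each side and so mixes the two values on the left, while the boundary-aware cut puts 10 on the left and 20 on the right.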
[HACKERS] Fwd: What do these terms mean in the SOURCE CODE?
Is no one ready to help on this? :( -Vaibhav -- Forwarded message -- From: Vaibhav Kaushal vaibhavkaushal...@gmail.com Date: Fri, Nov 19, 2010 at 9:11 PM Subject: What do these terms mean in the SOURCE CODE? To: pgsql-hackers@postgresql.org I am going through the Executor code and come across the following terms quite often. Can someone tell me what they mean (in a few, maybe a couple of, sentences)? 1. Scan State 2. Plan State 3. Tuple Projection 4. EState 5. Qual 6. Expression They sound quite ambiguous in the source code, especially since some of them are terms that already have multiple meanings. Thanks for your time. -Vaibhav (*_*)
Re: [HACKERS] directory archive format for pg_dump
Hi Jose, 2010/11/19 José Arthur Benetasso Villanova jose.art...@gmail.com: The dir format generated in my database 60 files, with different sizes, and it looks very confusing. Is it possible to use the same trick as pigz and pbzip2, creating a concatenated file of streams? What pigz is parallelizing is the actual computation of the compressed data. The directory archive format however is a preparation for a parallel pg_dump, dumping several tables (especially large tables of course) in parallel via multiple database connections and multiple pg_dump frontends. The idea of multiplexing their output into one file has been rejected on the grounds that it would probably slow down the whole process. Nevertheless pigz could be implemented as an alternative compression algorithm and that way the custom and the directory archive format could use it, but here as well, license and patent questions might be in the way, even though it is based on libz. The md5.c and kwlookup.c reuse using a link doesn't look nice either. This way you need to compile twice, among other things, but I think that it's temporary, right? No, it isn't. md5.c is used in the same way by e.g. libpq and there are other examples for links in core, check out src/bin/psql for example. Joachim
Re: [HACKERS] Isn't HANDLE 64 bits on Win64?
Magnus Hagander mag...@hagander.net writes: On Tue, Nov 16, 2010 at 11:01, Magnus Hagander mag...@hagander.net wrote: So yes, it looks completely broken. I guess Windows doesn't actually *assign* you a handle larger than 2^32 until you actually have that many open handles. Typical values on my test system (win64) come out at around 4000 in all tests. Patch applied for this and backpatched to 9.0. I did a bit of googling and found some references claiming that Win64 will never assign system handles that are outside the range representable as a signed long; and further stating there are standard macros HandleToLong and LongToHandle to perform those conversions. So I'd be comfortable with the original coding as long as we used those macros instead of random casting. Dunno if you think that'd be cleaner than what you did. (It's also a fair question whether those macros are available on Win32.) regards, tom lane
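What "representable as a signed long" buys can be shown with a portable sketch. It deliberately does not use the Windows API: int64_t stands in for a 64-bit HANDLE and int32_t for the 32-bit Win64 long, and the three function names are hypothetical. The assumption, in line with the claim quoted above, is that a conversion of this kind truncates to 32 bits on the way out and sign-extends on the way back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if a 64-bit handle value survives truncation to a signed 32-bit
 * integer followed by sign extension back to 64 bits.
 */
bool
handle_fits_in_long32(int64_t handle_value)
{
	return handle_value >= INT32_MIN && handle_value <= INT32_MAX;
}

/* Narrow a handle value; the caller must have checked that it fits. */
int32_t
handle_to_long32(int64_t handle_value)
{
	assert(handle_fits_in_long32(handle_value));
	return (int32_t) handle_value;
}

/* Widen again; converting a signed 32-bit value sign-extends it. */
int64_t
long32_to_handle(int32_t long_value)
{
	return (int64_t) long_value;
}
```

A value around 4000, as reported above, round-trips unchanged, and so does -1; a value of 2^32 or more would not, which is why the guarantee about the assigned range matters.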
Re: [HACKERS] UNNEST ... WITH ORDINALITY (AND POSSIBLY OTHER STUFF)
On Sat, Nov 20, 2010 at 03:48, caleb.wel...@emc.com wrote: Note the standard also supports unnesting multiple arrays concurrently, the rule for handling arrays with different lengths is to use null padding of the shorter array. UNNEST( ARRAY[5,2,3,4], ARRAY['hello', 'world'] ) WITH ORDINALITY AS t(a,b,i); Hmmm, that means we cannot support multi-array unnest() with our generic aggregate functions. The function prototype might be like below, but we don't support such definition. unnest(anyarray1, anyarray2, ..., OUT anyelement1, OUT anyelement2, ...) RETURNS SETOF record So, we would need a special representation for multi-array unnest(). -- Itagaki Takahiro
Re: [HACKERS] Isn't HANDLE 64 bits on Win64?
Magnus Hagander mag...@hagander.net writes: On Tue, Nov 16, 2010 at 16:23, Tom Lane t...@sss.pgh.pa.us wrote: What's not clear to me is whether the section title means that only certain handles have this guarantee, and if so whether we have to worry about running into ones that don't. I think it is pretty clear it does - the section has a list of different handles at the bottom. What we're using is a File Mapping Object, which is not on that list. And which is, AFAICT, not a user or gdi handle. That doesn't mean it's not guaranteed to be in the 32-bit space, but I'm pretty sure that specific page doesn't guarantee it. Well, the patch as-applied is fine with me. I just wanted to be sure we'd considered the alternatives, especially in view of the fact that we have not seen any clear failures of the previous coding. The reason this came to mind was http://archives.postgresql.org/pgsql-admin/2010-11/msg00128.php which looks for all the world like a handle transmission failure --- but that person claims to be running Win32, so unless he's wrong, this particular issue doesn't explain his problem. regards, tom lane