[HACKERS] BGWorkers, shared memory pointers, and postmaster restart
Hi all I've been using the dynamic BGWorker support for some recent work, and I think I've found an issue with how postmaster restarts are handled. TL;DR: I don't think there's a safe way to use a BGWorker (static or dynamic) with bgw_restart_time != BGW_NEVER_RESTART and a bgw_main_arg Datum that points into shared memory, and think we might need a API change to fix that. If the postmaster restarts due to a crash, it calls shared memory startup hooks and sets the found pointer to false, so they're re-inited. This makes sense as shm is presumed to be corrupt. Dynamic BGWorkers are killed as a part of restart. However, they're not unregistered, and are relaunched if there's a bgw_restart_time set. They're called with the same bgw_main_arg Datum as with the original launch. If this Datum is a pointer into the shared memory that just got zeroed and re-inited, exciting things happen. In my case the worker restart generally happens after shm is re-inited, and since it gets reinitialized with the same contents the dynamic bgworker registered during postmaster startup restarts normally. I only noticed the problem because my shared memory init hook launches bgworkers once shm is set up, and I was getting two copies of a bgworker after postmaster restart. This also affects static bgworkers that use a pointer into shared memory to pass data on EXEC_BACKEND platforms (Windows). For dynamic bgworkers, it seems like it'd be OK to just require extensions to re-register them after a postmaster restart, or add a BGW_UNREGISTER_ON_POSTMASTER_RESTART flag to let that be controlled so bgworkers that didn't receive pointers into shm didn't have to deal with it. So any pointer into shared memory that's being re-initialized would be discarded when the bgworker was unregistered during postmater restart. Dynamic bgworkers are new in 9.4, so we still have the freedom to change behaviour for them. But that won't fix static bgworkers that have a pointer into shm as an argument. In non-EXEC_BACKEND platforms we can just pass a pointer to palloc'd postmaster memory, but that won't work if we have to exec() after fork(). I don't think it's reasonable to say "well, don't do that" - if you can only send a single pass-by-value Datum to a bgworker's main function, their utility is greatly reduced. I could always set up an exit hook / SIGQUIT handler that tries to unregister dynamic bgworkers using their BGWorkerHandle s, from the worker that initially registered them. If the worker that registered them is the one that crashed and triggered a postmaster restart this won't do any good, so it's a half-solution at best. I can't stash the BGWorkerHandle s for the launched workers in shm and unregister them during postmaster restart either. For one thing, shm is presumed corrupt. For another, BGWorkerHandle is an incomplete struct with no exposed size, so I can't copy it into shm anyway. This seems like a bit of a pickle. Am I missing something obvious? The only way around it that I can currently see is to have a single static bgworker with no pointer argument launch and manage all the other workers required for the extension. It would launch them all with bgw_restart_time = BGW_NEVER_RESTART and set its self as the bgw_notify_pid . If they die, it unregisters them then registers a new one. Effectively, it's doing the work the bgworker code does already, except that it makes sure the bgworkers don't get relaunched on postmaster restart. Since it doesn't depend on shm being valid, this should work, but it's messy and seems like there should be a better way. Do we need to change how we define BGWorkers to require that a BGWorker with a Datum that points to shared memory be automatically unregistered by the postmaster on restart? This would have to apply to static BGWorkers too, so we'd want to think about adding a flag for it and maybe even backpatching the flag into 9.3; it'd only affect extensions that actually used the combination described above. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Create function prototype as part of PG_FUNCTION_INFO_V1
On 04/15/2014 10:17 PM, Tom Lane wrote: >> > I actually think we should *add* a LIBPQEXPORT that handles this for >> > libpq, much like PGDLLEXPORT does for postgres(.exe). And in the >> > process, rename PGDLLEXPORT to POSTGRESEXPORT or PGSERVEREXPORT or >> > something. > My reaction to that is "not bloody likely". I remarked on this upthread > already, but there is absolutely no way that I want to clutter our source > code with platform-specific markings like that. > > Perhaps somebody could try a Windows build with PGDLLEXPORT defined to > empty, and verify that it works, and if so do a pgbench comparison > against a build done the existing way? Good idea. Personally, I don't care about Windows enough. I want it to work, but performance optimisation is beyond what I'm bothered with. Another useful test would be to modify libpq as described above, so its headers set __declspec(dllexport) on its exports during compilation and its headers set __declspec(dllimport) when included while compiling other binaries that will link to libpq. Then use *that* in pgbench too and see if it makes any meaningful difference. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [doc] EXPLAIN CREATE MATERIALIZED VIEW AS?
On Wed, Apr 16, 2014 at 12:35 PM, Amit Langote wrote: > Hi, > > Attached adds CREATE MATERIALIZED VIEW AS to the list of statements > that can be EXPLAINed. Now that you mention that, REFRESH MATERIALIZED VIEW can be EXPLAIN'ed as well, except that it returns that and does not error out: =# explain refresh materialized view aam; QUERY PLAN --- Utility statements have no plan structure (1 row) -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How can we make beta testing better?
On Tue, Apr 15, 2014 at 5:47 PM, Josh Berkus wrote: > Hackers, > > I think 9.3 has given us evidence that our users aren't giving new > versions of PostgreSQL substantial beta testing, or if they are, they > aren't sharing the results with us. > > How can we make beta testing better and more effective? How can we get > more users to actually throw serious workloads at new versions and share > the results? > > I've tried a couple of things over the last two years and they haven't > worked all that well. Since we're about to go into another beta testing > period, we need something new. Ideas? > I think it boils down to making it really easy to create a workload generator. Most companies have simple single-threaded regression tests for functionality but very few companies have good parallel workload generators which reflect activities in their production environment. A documented beta test process/toolset which does the following would help: 1) Enables full query logging 2) Creates a replica of a production DB, record $TIME when it stops. 3) Allow user to make changes (upgrade to 9.4, change hardware, change kernel settings, ...) 4) Plays queries from the CSV logs starting from $TIME mimicking actual timing and transaction boundaries If Pg can make it easy to duplicate activities currently going on in production inside another environment, I would be pleased to fire a couple billion queries through it over the next few weeks. #4 should include reporting useful to the project, such as a sampling of queries which performed significantly worse and a few relative performance stats for overall execution time.
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Mon, Apr 14, 2014 at 1:11 PM, Peter Geoghegan wrote: > In the past, various hackers have noted problems they've observed with > this scheme. A common pathology is to see frantic searching for a > victim buffer only to find all buffer usage_count values at 5. It may > take multiple revolutions of the clock hand before a victim buffer is > found, as usage_count is decremented for each and every buffer. Also, > BufFreelistLock contention is considered a serious bottleneck [1], > which is related. I think that the basic problem here is that usage counts increase when buffers are referenced, but they decrease when buffers are evicted, and those two things are not in any necessary way connected to each other. In particular, if no eviction is happening, reference counts will converge to the maximum value. I've read a few papers about algorithms that attempt to segregate the list of buffers into "hot" and "cold" lists, and an important property of such algorithms is that they mustn't be allowed to make everything hot. It's easy to be too simplistic, here: an algorithm that requires that no more than half the list be hot will fall over badly on a workload where the working set exceeds the available cache and the really hot portion of the working set is 60% of the available cache. So you need a more sophisticated algorithm than that. But that core property that not all buffers can be hot must somehow be preserved, and our algorithm doesn't. This isn't a fundamental property of the usage-count idea; it's an artifact of the fact that usage count decreases are tied to eviction pressure rather than access pressure. For example, suppose we made a rule that if the total usage counts of all buffers exceed 3 * NBuffers, then every time you bump the usage count of a buffer from N to N+1, you're required to advance the clock sweep far enough to decrease the reference count of a buffer by one. When you want to reclaiim a buffer, you advance a separate clock sweep until you find a buffer with a zero usage count; if you circle the whole ring without finding one, then you reclaim the buffer you saw with the lowest usage count. There are obvious scalability problems here (everyone fighting over the right to advance the clock sweep) but ignore that for the sake of the thought experiment: now you have an algorithm where not all buffers can be hot. If some buffers are hotter than others, then whenever their usage count is decreased it will immediately get pushed back up again, but some other buffer then has to absorb the decrease. Only the buffers that are really hot can maintain high usage counts, because *somebody* has to have a low usage count. Even ignoring scalability concerns, this might not be (and probably isn't) exactly what we want to implement, but I think it illustrates an important control principle all the same: buffer "cooling" needs to be driven by the same underlying phenomenon - probably buffer access - as buffer "heating". If they're driven by unrelated phenomena, then the rates may be wildly incomparable, and you'll end up with everything hot or everything cold. If that happens, you lose, because with everything the same, there's no principled way to decide which things are actually best to evict. If we come up with some good solution for shared buffers, we should also consider it applying it to SLRU eviction. I believe that the current situation around CLOG eviction is none too pretty. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question about optimising (Postgres_)FDW
On 04/16/2014 06:12 AM, Hannu Krosing wrote: > On 04/16/2014 01:25 AM, Tom Lane wrote: >> Hannu Krosing writes: >>> Is there a way to force it to prefer a plan where the results of (select >>> id from onemillion where data > '0.9' limit 100) >>> are passed to FDW as a single IN ( = ANY(...)) query and are retrieved >>> all at once ? >> You could write the query like that: >> >> select * from onemillion_pgfdw where id = any (array(select id from >> onemillion where data > '0.9' limit 100)); > My actual use-case was about a join between a local and a remote table > and without rewriting the query (they come from ORM) > > I was hoping to be able to nudge postgresql towards a better plan via some > tuning of table/fdw options or GUCs. > > for example, would postgresql use the WHERE id IN (...) query on remote > side for a query like > > select r.data, l.data > from onemillion_pgfdw r > join onemillion l > on r.id = l.id and l.data > '0.999'; > > if it recognizes that the local side returns only 1000 rows ? > > or would it still use 1000 individual WHERE id = $1 queries. > > Is getting the foreign data via IN and then turning the data into a hash > for joining one of the plans it considers at all ? It sees that could we need an extra tuning parameter for choosing the ID IN (...) + HASH plan over individual SELECT .. WHERE ID = $1 something between `fdw_startup_cost` and `fdw_tuple_cost` to signify that an IN query returning 1000 rows runs faster than 1000 = queries as I understan currently they both would be estimated as fdw_startup_cost + 1000 * fdw_tuple_cost the new parameter could be fdw_call_cost or fdw_query_cost and would estimate how much each individual call to fdw costs, thus favouring calls which return more data in one call Cheers Hannu > > Best > Hannu > >> Or at least you should be able to, except when I try it I get >> >> explain analyze >> select * from onemillion_pgfdw where id = any (array(select id from >> onemillion where data > '0.9' limit 100)); >> ERROR: operator does not exist: integer = integer[] >> HINT: No operator matches the given name and argument type(s). You might >> need to add explicit type casts. >> CONTEXT: Remote SQL command: EXPLAIN SELECT id, inserted, data FROM >> public.onemillion WHERE ((id = ANY ((SELECT null::integer[] >> >> so there's something the remote-estimate code is getting wrong here. >> (It seems to work without remote_estimate, though.) >> >> regards, tom lane > -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Wed, Apr 16, 2014 at 5:00 AM, Peter Geoghegan wrote: > On Tue, Apr 15, 2014 at 3:59 PM, Ants Aasma wrote: >> There's a paper on a non blocking GCLOCK algorithm, that does lock >> free clock sweep and buffer pinning[7]. If we decide to stay with >> GCLOCK it may be interesting, although I still believe that some >> variant of buffer nailing[8] is a better idea, my experience shows >> that most of the locking overhead is cache line bouncing ignoring the >> extreme cases where our naive spinlock implementation blows up. > > You might be right about that, but lets handle one problem at a time. > Who knows what the bottleneck will end up being if and when we address > the naivety around frequency? I want to better characterize that > problem first. Just to summarize you about the previous discussion and the improvements that we decided to do in this area based on feedback are as follows: 1. Bgwriter needs to be improved so that it can help in reducing usage count and finding next victim buffer (run the clock sweep and add buffers to the free list). 2. SetLatch for bgwriter (wakeup bgwriter) when elements in freelist are less. 3. Split the workdone globallock (Buffreelist) in StrategyGetBuffer (a spinlock for the freelist, and an lwlock for the clock sweep). Here we can try to make it lock free based on atomic ops as well. 4. Bgwriter needs to be more aggressive, logic based on which it calculates how many buffers it needs to process needs to be improved. 5. Contention around buffer mapping locks. 6. Cacheline bouncing around the buffer header spinlocks, is there anything we can do to reduce this? 7. Choose Optimistically used buffer in StrategyGetBuffer(). 8. Don't bump the usage count every time buffer is pinned. I have already addressed some of these improvements in patch[1] and for other's, I have plan to work on them for 9.5. I think here you want to address the improvements related to usage count and see if it can get us win in some of commonly used scenario's, without affecting any other commonly used scenario. I feel this is good idea to pursue and see if we can get good benefits with it. Infact few days back, I had ran some tests manually to see the problems around BufFreeListLock (currently I don't have script ready) and more recently Jason Petersen has done some benchmarking in this area which you can refer it here[2]. I wonder if we can work together to improve things in this area. [1] http://www.postgresql.org/message-id/006e01ce926c$c7768680$56639380$@kap...@huawei.com [2] https://googledrive.com/host/0Bx33JCTmOADOeTIwaE9KX21yWEk/Concurrency%20Limits%20with%20Large%20Working%20Sets With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question about optimising (Postgres_)FDW
On 04/16/2014 01:25 AM, Tom Lane wrote: > Hannu Krosing writes: >> Is there a way to force it to prefer a plan where the results of (select >> id from onemillion where data > '0.9' limit 100) >> are passed to FDW as a single IN ( = ANY(...)) query and are retrieved >> all at once ? > You could write the query like that: > > select * from onemillion_pgfdw where id = any (array(select id from > onemillion where data > '0.9' limit 100)); My actual use-case was about a join between a local and a remote table and without rewriting the query (they come from ORM) I was hoping to be able to nudge postgresql towards a better plan via some tuning of table/fdw options or GUCs. for example, would postgresql use the WHERE id IN (...) query on remote side for a query like select r.data, l.data from onemillion_pgfdw r join onemillion l on r.id = l.id and l.data > '0.999'; if it recognizes that the local side returns only 1000 rows ? or would it still use 1000 individual WHERE id = $1 queries. Is getting the foreign data via IN and then turning the data into a hash for joining one of the plans it considers at all ? Best Hannu > > Or at least you should be able to, except when I try it I get > > explain analyze > select * from onemillion_pgfdw where id = any (array(select id from > onemillion where data > '0.9' limit 100)); > ERROR: operator does not exist: integer = integer[] > HINT: No operator matches the given name and argument type(s). You might > need to add explicit type casts. > CONTEXT: Remote SQL command: EXPLAIN SELECT id, inserted, data FROM > public.onemillion WHERE ((id = ANY ((SELECT null::integer[] > > so there's something the remote-estimate code is getting wrong here. > (It seems to work without remote_estimate, though.) > > regards, tom lane -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
> On Tue, Apr 15, 2014 at 10:44 AM, Tom Lane wrote: > > Robert Haas writes: > >> On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane wrote: > >>> Yeah. After a fast review of the custom-scan and cache-scan > >>> patches, it seems to me that my original fears are largely > >>> confirmed: the custom scan patch is not going to be sufficient to > >>> allow development of any truly new plan type. Yeah, you can plug in > >>> some new execution node types, but actually doing anything > >>> interesting is going to require patching other parts of the system. > > > >> Without prejudice to the rest of what you said, this argument doesn't > >> hold much water with me. I mean, anything that our extensibility > >> mechanism doesn't support today will require new hooks, but does that > >> mean we're never going to add any more hooks? I sure hope not. > > > > No, that's not what I said. ISTM that the argument for the > > custom-scan API is that it allows interesting new things to be done > > *without further modifying the core code*. But the example > > application (cache_scan) fails to demonstrate that, and indeed seems > > to be a counterexample. Whether we'd accept cache_scan on its own > > merits is a separate question. The problem for me is that custom-scan > > isn't showing that it can support what was claimed without doing > > serious damage to modularity and maintainability of the core code. > > I think there's two separate things in there, one of which I agree with > and one of which I disagree with. I agree that we must avoid damaging the > modularity and maintainability of the core code; I don't agree that > custom-scan needs to be able to do interesting things with zero additional > changes to the core code. If we come up with three interesting applications > for custom scan that require 5 new hooks between them, I'll consider that > a major success - assuming those hooks don't unduly limit future changes > we may wish to make in the core code. I think your concern about exposing > APIs that may not be terribly stable is well-founded, but I don't think > that means we shouldn't expose *anything*. > I agree 100%. We usually change hook definition release-by-release, and it is author's responsibility to follow the newer interface if he continues to maintain his extension on the newer release also. Probably, it is a gray stuff neither black nor white. If we can design a perfect interface, it might be good but has no evolution further. Of course, it does not justify poor designed interface, but an important stuff is to find out a best way at this moment. It may take core refactoring, not just exposing static functions. What I tried to implement is the only way to implement it. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
> Andres Freund writes: > > What I think this discussion shows that this patch isn't ready for > > 9.4. The first iteration of the patch came in 2013-11-06. Imo that's > > pretty damn late for a relatively complex patch. And obviously we > > don't have agreement on the course forward. > > I don't think we need to stop discussing, but I think it's pretty > > clear that this isn't 9.4 material. And that it's far from "Ready for > Committer". > Yep, today is the expected feature freeze date towards v9.4. It is little bit late to include v9.4 features, unfortunately. > Yeah. I'm still not exactly convinced that custom-scan will ever allow > independent development of new plan types (which, with all due respect to > Robert, is what it was being sold as last year in Ottawa). But I'm not > opposed in principle to committing it, if we can find a way to have a cleaner > API for things like setrefs.c. It seems like late-stage planner processing > in general is an issue for this patch (createplan.c and subselect.c are > also looking messy). EXPLAIN isn't too great either. > > I'm not sure exactly what to do about those cases, but I wonder whether > things would get better if we had the equivalent of > expression_tree_walker/mutator capability for plan nodes. The state of > affairs in setrefs and subselect, at least, is a bit reminiscent of the > bad old days when we had lots of different bespoke code for traversing > expression trees. > Hmm. If we have something like expression_tree_walker/mutator for plan nodes, we can pass a walker/mutator function's pointer instead of exposing static functions that takes recursive jobs. If custom-plan provider (that has sub-plans) got a callback with walker/ mutator pointer, all it has to do for sub-plans are calling this new plan-tree walking support routine with supplied walker/mutator. It seems to me more simple design than what I did. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Create function prototype as part of PG_FUNCTION_INFO_V1
Peter Eisentraut writes: > Let me point out again that my patch doesn't actually do anything about > PGDLLEXPORT or the like. It just adds automatic prototypes into > PG_FUNCTION_INFO_V1, to reduce compiler warnings in extensions and > reduce some boilerplate in general. Hmm ... for some reason I had gotten it in my head that you were adding PGDLLEXPORT to the autogenerated extern declarations, but at least the version of the patch in <1389762012.24046.2.ca...@vanquo.pezone.net> doesn't do that, so the point is moot. I still object to the aspect of the patch that moves the externs for _PG_init/_PG_fini into fmgr.h: that is conceptually wrong and will create more confusion than the trivial code savings is worth. But I won't complain if you commit the PG_FUNCTION_INFO_V1 changes. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [doc] EXPLAIN CREATE MATERIALIZED VIEW AS?
Hi, Attached adds CREATE MATERIALIZED VIEW AS to the list of statements that can be EXPLAINed. -- Amit explain-create-materialized-view-as.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Create function prototype as part of PG_FUNCTION_INFO_V1
On 4/14/14, 3:28 PM, Peter Eisentraut wrote: > On 4/4/14, 10:07 AM, Andres Freund wrote: >> If >> somebody previously tried to do the correct thing and attached >> PGDLLEXPORT to their own *function* prototoype, it would cause problems >> now. > > What is the difference (on affected platforms) between > > Datum funcname(PG_FUNCTION_ARGS); > > and writing (effectively) > > PGDLLEXPORT Datum funcname(PG_FUNCTION_ARGS); > Datum funcname(PG_FUNCTION_ARGS); > > or for that matter > > Datum funcname(PG_FUNCTION_ARGS); > PGDLLEXPORT Datum funcname(PG_FUNCTION_ARGS); > > > If there isn't a difference, then my patch is fine. Otherwise, it might > be good to document the issues for extension authors. Let me point out again that my patch doesn't actually do anything about PGDLLEXPORT or the like. It just adds automatic prototypes into PG_FUNCTION_INFO_V1, to reduce compiler warnings in extensions and reduce some boilerplate in general. If it turns out that this might help someone optimize the Windows build, then great. But I gather it won't, so so what. Or maybe it can be made to work, in which case extension authors will get compiler errors about places they need to clean up. So it could still help that way, but who knows, that's not the point. If there are still concerns in this area, we could commit just the part that adds the prototype to PG_FUNCTION_INFO_V1 and let that sit for a while, and then remove the (now redundant) explicit prototypes in a later release, so if there is a need to revert it, it won't be so big. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add TAP tests for client programs
On 4/15/14, 11:11 PM, Tom Lane wrote: > Peter Eisentraut writes: >> On 4/14/14, 10:30 PM, Andrew Dunstan wrote: >>> Yes. It probably won't be a huge change, but it will need a bit of code. > >> It might be more future-proof if the build farm just called make >> check-world and used some other way to identify the individual tests in >> that output. Otherwise, we'll need a new build farm release every time >> a test suite is added. > > That argument would be more convincing if "make check-world" worked > on Windows ... What about it doesn't work on Windows? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] The question about the type numeric
Hi hackers, I am learning about numeric . The comment of NumericShort format is: * In the NumericShort format, the remaining 14 bits of the header word * (n_short.n_header) are allocated as follows: 1 for sign (positive or * negative), 6 for dynamic scale, and 7 for weight. In practice, most * commonly-encountered values can be represented this way. So the Max of the NumericShort format should be up to 508 digits before the decimal point. So the sign of the number 12345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567 should be 0x807F. The number is 257 digits before the decimal point. But the sign is 0. So is there anything wrong? 2014-04-15 wangshuo HighGo Software Co.,Ltd. Address: A203 Block D QILU Soft Park, High-Tech Zone, Lixia district, Jinan Shandong, China(Head Office) Tel:+86-0531-55701530 Fax:+86-0531-55701544 Website:www.highgo.com Mobile:18766416137
Re: [HACKERS] New option in pg_basebackup to exclude pg_log files during base backup
On 4/9/14, 10:57 AM, Magnus Hagander wrote: > So it'd be an array, and by default you'd have something like: > basebackup_skip_path = $log_directory > ? > > Maybe use it to skip backup labels by default as well. > basebackup_skip_path = $log_directory, $backup_label_files > > > I hadn't considered any details, but yes, someting along that line. And > then you could also include arbitrary filenames or directories should > you want. What are the use cases for excluding anything else? pg_basebackup ought to have some intelligence about what files are appropriate to include or exclude, depending on what the user is trying to do. It shouldn't become a general file copying tool. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add TAP tests for client programs
Peter Eisentraut writes: > On 4/14/14, 10:30 PM, Andrew Dunstan wrote: >> Yes. It probably won't be a huge change, but it will need a bit of code. > It might be more future-proof if the build farm just called make > check-world and used some other way to identify the individual tests in > that output. Otherwise, we'll need a new build farm release every time > a test suite is added. That argument would be more convincing if "make check-world" worked on Windows ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add TAP tests for client programs
On 4/14/14, 10:30 PM, Andrew Dunstan wrote: > > On 04/14/2014 10:17 PM, Tom Lane wrote: >> Peter Eisentraut writes: >>> Add TAP tests for client programs >> I assume the buildfarm would need to be taught about this? >> >> > > > Yes. It probably won't be a huge change, but it will need a bit of code. It might be more future-proof if the build farm just called make check-world and used some other way to identify the individual tests in that output. Otherwise, we'll need a new build farm release every time a test suite is added. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C
On Tue, Apr 15, 2014 at 11:53 PM, Christian Ullrich wrote: > * From: Robert Haas >> Why not just pass a command-line switch? > > Because, as I wrote in the message you are quoting, I did not think that > having a command-line option for the sole purpose of telling the > postmaster who its parent is was a suitable solution. > > I had already given up on that idea based on Amit's advice, and I will > create a patch based on a command-line option. > > While I have you here, though, any suggestions on what the name of that > option should be? I think --background is about right. --background as switch name seems to be okay. > Also, how should > I treat the option on non-Windows platforms? Should it just not be there > (= error), or be ignored if present? I think ignored for non-windows is better way to proceed. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On Wed, Apr 16, 2014 at 3:01 AM, Robert Haas wrote: > On Tue, Apr 15, 2014 at 12:33 AM, Amit Kapila wrote: >> On Mon, Apr 14, 2014 at 10:03 PM, Robert Haas wrote: >>> For the create case, I'm wondering if we should put the block that >>> tests for !hmap *before* the _dosmaperr() and check for EEXIST. What >>> is your opinion? >> >> Either way is okay, but I think the way you are suggesting is better as it >> will make code consistent with other place (PGSharedMemoryCreate()). > > OK, can you prepare a patch? Please find attached patch to address this issue. One minor point to note is that now we have to call GetLastError() twice, once inside error path and once to check EEXIST, but I think that is okay as existing code in PGSharedMemoryCreate() does it that way. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_dsm_invalid_errcode_issue-v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The question about the type numeric
sure.postgres wrote > Hi hackers, > > I am learning about numeric . > The comment of NumericShort format is: > * In the NumericShort format, the remaining 14 bits of the header word > * (n_short.n_header) are allocated as follows: 1 for sign (positive or > * negative), 6 for dynamic scale, and 7 for weight. In practice, most > * commonly-encountered values can be represented this way. > > So the Max of the NumericShort format should be up to 508 digits before > the decimal point. > So the sign of the number > 12345678901234567890123456789012345678901234567890 > 12345678901234567890123456789012345678901234567890123456789012345678901234567890 > 12345678901234567890123456789012345678901234567890123456789012345678901234567890 > 12345678901234567890123456789012345678901234567 > should be 0x807F. > The number is 257 digits before the decimal point. > But the sign is 0. > So is there anything wrong? [1000 00][0][00111 0][001] I appreciate that you got no responses on the original e-mail but if you are going to re-raise the question at least have the courtesy to respond to your original thread and not go and start a new one. And maybe trying rephrasing the question since most likely your original question was not worded in such a way to garner a response. I may have this totally wrong but I don't see why the sign of your number should be anything but "zero" since that is, I presume, the value of the specific bit for a positive number - which yours is. So, in short, nothing seems to "be wrong". If you think something is wrong you should probably state what that is explicitly and ask someone to explain what is happening. I would have said all this when I saw the first e-mail but I wasn't (and still am not) totally clear on what you are asking and was hoping someone more familiar could make better sense of it. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/The-question-about-the-type-numeric-tp5800173p5800174.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] The question about the type numeric
Hi hackers, I am learning about numeric . The comment of NumericShort format is: * In the NumericShort format, the remaining 14 bits of the header word * (n_short.n_header) are allocated as follows: 1 for sign (positive or * negative), 6 for dynamic scale, and 7 for weight. In practice, most * commonly-encountered values can be represented this way. So the Max of the NumericShort format should be up to 508 digits before the decimal point. So the sign of the number 12345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567 should be 0x807F. The number is 257 digits before the decimal point. But the sign is 0. So is there anything wrong? Thank you! 2014-04-15 wangshuo HighGo Software Co.,Ltd. Address: A203 Block D QILU Soft Park, High-Tech Zone, Lixia district, Jinan Shandong, China(Head Office) Tel:+86-0531-55701530 Fax:+86-0531-55701544 Website:www.highgo.com Mobile:18766416137
Re: [HACKERS] Patch: iff -> if
Thom Brown-2 wrote > On 15 April 2014 23:19, Andreas 'ads' Scherbaum < > adsmail@ > > wrote: >> >> Hi, >> >> stumbled over a number of "iff" in the source where "if" is meant - not >> sure >> what the real story behind this is, but attached is a patch to fix the >> about >> 80 occurrences. >> >> This only appears in comments, not in any code path. > > Yeah, apparently those are intentional, and mean "if and only if" (i.e. > <=>) Just looking at the first few items someone's good intention is being ruined by bad execution...especially: "Add...references to the section...a block is in [iff] chapters aren't autolabelled..." Many of these are not mathematical propositions but flow-control logic for which "IF" is indeed the correct term-of-art; though re-reading the example above that is probably one of the more logical ones... Now: "...new tuple was inserted, and its HEAP_ONLY_TUPLE flag is set [if/iff] a HOT update was done" needs to be evaluated on its merits; namely does HEAP_ONLY_TUPLE ever get set if a HOT update was not done? If not then "IFF" is proper and necessary to convey that fact (though even this one is marginal and both versions are likely unambiguous in reality). In short "returns BOOLEAN" can reasonably be said to properly use "IF" in almost all cases - especially for those functions that are check-oriented (and thus obviously if the opposite condition is present the opposite result would be returned). "IFF" is not incorrect here but for me is misplaced and too-much-information. Given that it is unlikely the mis/over-use of IFF in the comments is a meaningful problem I would vote for letting the imprecision go and avoid the code churn. If the surrounding code was being altered anyway then having the corresponding comment updated seems acceptable; but a blanket find/replace doesn't seem helpful. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Patch-iff-if-tp5800153p5800172.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: iff -> if
On Tue, Apr 15, 2014 at 6:19 PM, Andreas 'ads' Scherbaum < adsm...@wars-nicht.de> wrote: > > Hi, > > stumbled over a number of "iff" in the source where "if" is meant - not > sure what the real story behind this is, but attached is a patch to fix the > about 80 occurrences. > > "IFF" is a common idiom in mathematics (and logic, which means it will be of interest in computer science, which shares boundaries with both) which stands for "If And Only If". Unless it has very recently fallen out of fashion in mathematics, it's almost certainly intentional to use "iff" in the comments, and not merely a typo. -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
Re: [HACKERS] Patch: iff -> if
Steve Crawford writes: > On 04/15/2014 05:36 PM, Andrew Dunstan wrote: >> On 04/15/2014 06:26 PM, Thom Brown wrote: >>> Yeah, apparently those are intentional, and mean "if and only if" >> This is a reasonably common idiom, or used to be. > If it has fallen into disuse the news has failed to reach me: > http://en.wikipedia.org/wiki/If_and_only_if > http://www.mathwords.com/i/if_and_only_if.htm > http://mathworld.wolfram.com/Iff.html The last discussion about this suggested that non-English speakers might be familiar with other abbreviations for the concept. But anyway, yes, those are not typos. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: iff -> if
On 04/15/2014 05:36 PM, Andrew Dunstan wrote: On 04/15/2014 06:26 PM, Thom Brown wrote: On 15 April 2014 23:19, Andreas 'ads' Scherbaum wrote: Hi, stumbled over a number of "iff" in the source where "if" is meant - not sure what the real story behind this is, but attached is a patch to fix the about 80 occurrences. This only appears in comments, not in any code path. Yeah, apparently those are intentional, and mean "if and only if" (i.e. <=>) This is a reasonably common idiom, or used to be. If it has fallen into disuse the news has failed to reach me: http://en.wikipedia.org/wiki/If_and_only_if http://www.mathwords.com/i/if_and_only_if.htm http://mathworld.wolfram.com/Iff.html ... Cheers, Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: iff -> if
On 04/15/2014 06:26 PM, Thom Brown wrote: On 15 April 2014 23:19, Andreas 'ads' Scherbaum wrote: Hi, stumbled over a number of "iff" in the source where "if" is meant - not sure what the real story behind this is, but attached is a patch to fix the about 80 occurrences. This only appears in comments, not in any code path. Yeah, apparently those are intentional, and mean "if and only if" (i.e. <=>) This is a reasonably common idiom, or used to be. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Tue, Apr 15, 2014 at 3:59 PM, Ants Aasma wrote: > PostgreSQL replacement algorithm is more similar to Generalized CLOCK > or GCLOCK, as described in [1]. CLOCK-Pro [2] is a different algorithm > that approximates LIRS[3]. LIRS is what MySQL implements[4] and > CLOCK-Pro is implemented by NetBSD [5] and there has been some work on > trying it on Linux [6]. Both LIRS and CLOCK-Pro work by keeping double > the cache size metadata entries and detect pages that have been > recently referenced. Basically they provide an adaptive tradeoff > between LRU and LFU. That's good to know. > There's a paper on a non blocking GCLOCK algorithm, that does lock > free clock sweep and buffer pinning[7]. If we decide to stay with > GCLOCK it may be interesting, although I still believe that some > variant of buffer nailing[8] is a better idea, my experience shows > that most of the locking overhead is cache line bouncing ignoring the > extreme cases where our naive spinlock implementation blows up. You might be right about that, but lets handle one problem at a time. Who knows what the bottleneck will end up being if and when we address the naivety around frequency? I want to better characterize that problem first. > There has been some research that indicates that for TPC-A workloads > giving index pages higher weights increases hitrates[1]. Frankly, there doesn't need to be any research on this, because it's just common sense that probabilistically, leaf pages are much more useful than heap pages in servicing index scan queries if we assume a uniform distribution. If we don't assume that, then they're still more useful on average. > I think the hardest hurdle for any changes in this area will be > showing that we don't have any nasty regressions. I think the best way > to do that would be to study separately the performance overhead of > the replacement algorithm and optimality of the replacement choices. > If we capture a bunch of buffer reference traces by instrumenting > PinBuffer, we can pretty accurately simulate the behavior of different > algorithm and tuning choices with different shared buffer sizes. > Obviously full scale tests are still needed due to interactions with > OS, controller and disk caches and other miscellaneous influences. But > even so, simulation would get us much better coverage of various > workloads and at least some confidence that it's a good change > overall. It will be very hard and time consuming to gather equivalent > evidence with full scale tests. I think I agree with all of that. The fact that we as a community don't appear to have too much to say about what workloads to prioritize somewhat frustrates this. The other problem is that sizing shared_buffers appropriately involves a surprising amount of deference to rules of thumb that in practice no one is quite prepared to rigorously defend - who is to say what apportionment of memory to Postgres is appropriate here? I too was hopeful that we could evaluate this work purely in terms of observed improvements to hit rate (at least initially), but now I doubt even that. It would be great to be able to say "here are the parameters of this discussion", and have everyone immediately agree with that, but in this instance that's legitimately not possible. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question about optimising (Postgres_)FDW
Hannu Krosing writes: > Is there a way to force it to prefer a plan where the results of (select > id from onemillion where data > '0.9' limit 100) > are passed to FDW as a single IN ( = ANY(...)) query and are retrieved > all at once ? You could write the query like that: select * from onemillion_pgfdw where id = any (array(select id from onemillion where data > '0.9' limit 100)); Or at least you should be able to, except when I try it I get explain analyze select * from onemillion_pgfdw where id = any (array(select id from onemillion where data > '0.9' limit 100)); ERROR: operator does not exist: integer = integer[] HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts. CONTEXT: Remote SQL command: EXPLAIN SELECT id, inserted, data FROM public.onemillion WHERE ((id = ANY ((SELECT null::integer[] so there's something the remote-estimate code is getting wrong here. (It seems to work without remote_estimate, though.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Mon, Apr 14, 2014 at 8:11 PM, Peter Geoghegan wrote: > PostgreSQL implements a clock sweep algorithm, which gets us something > approaching an LRU for the buffer manager in trade-off for less > contention on core structures. Buffers have a usage_count/"popularity" > that currently saturates at 5 (BM_MAX_USAGE_COUNT). The classic CLOCK > algorithm only has one bit for what approximates our "usage_count" (so > it's either 0 or 1). I think that at its core CLOCK is an algorithm > that has some very desirable properties that I am sure must be > preserved. Actually, I think it's more accurate to say we use a > variant of clock pro, a refinement of the original CLOCK. PostgreSQL replacement algorithm is more similar to Generalized CLOCK or GCLOCK, as described in [1]. CLOCK-Pro [2] is a different algorithm that approximates LIRS[3]. LIRS is what MySQL implements[4] and CLOCK-Pro is implemented by NetBSD [5] and there has been some work on trying it on Linux [6]. Both LIRS and CLOCK-Pro work by keeping double the cache size metadata entries and detect pages that have been recently referenced. Basically they provide an adaptive tradeoff between LRU and LFU. > In the past, various hackers have noted problems they've observed with > this scheme. A common pathology is to see frantic searching for a > victim buffer only to find all buffer usage_count values at 5. It may > take multiple revolutions of the clock hand before a victim buffer is > found, as usage_count is decremented for each and every buffer. Also, > BufFreelistLock contention is considered a serious bottleneck [1], > which is related. There's a paper on a non blocking GCLOCK algorithm, that does lock free clock sweep and buffer pinning[7]. If we decide to stay with GCLOCK it may be interesting, although I still believe that some variant of buffer nailing[8] is a better idea, my experience shows that most of the locking overhead is cache line bouncing ignoring the extreme cases where our naive spinlock implementation blows up. > Let's leave aside inner/root pages though, because they're so > dramatically useful when in a primary index on a tpb-b table that > they'll always be cached by any non-terrible algorithm. It beggars > belief that the still relatively dense (and consequently *popular*) > B+Tree leaf pages get so little credit for being of such long-term > utility (in the view of our clock sweep algorithm as implemented). The > algorithm has what could be loosely described as an excessively > short-term perspective. There is clearly a better balance to be had > here. I don't think the answer is that we have the B-Tree code give > its pages some special treatment that makes them harder to evict, > although I will admit that I briefly entertained the idea. There has been some research that indicates that for TPC-A workloads giving index pages higher weights increases hitrates[1]. I think the hardest hurdle for any changes in this area will be showing that we don't have any nasty regressions. I think the best way to do that would be to study separately the performance overhead of the replacement algorithm and optimality of the replacement choices. If we capture a bunch of buffer reference traces by instrumenting PinBuffer, we can pretty accurately simulate the behavior of different algorithm and tuning choices with different shared buffer sizes. Obviously full scale tests are still needed due to interactions with OS, controller and disk caches and other miscellaneous influences. But even so, simulation would get us much better coverage of various workloads and at least some confidence that it's a good change overall. It will be very hard and time consuming to gather equivalent evidence with full scale tests. [1] http://www.csd.uoc.gr/~hy460/pdf/p35-nicola.pdf [2] http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-05-3.pdf [3] http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/jiang02_LIRS.pdf [4] http://lists.mysql.com/commits/28601 [5] http://fxr.watson.org/fxr/source/uvm/uvm_pdpolicy_clockpro.c?v=NETBSD [6] http://lwn.net/Articles/147879/ [7] http://derby-nb.googlecode.com/svn-history/r41/trunk/derby-nb/ICDE10_conf_full_409.pdf [8] http://www.postgresql.org/message-id/ca+tgmozypeyhwauejvyy9a5andoulcf33wtnprfr9sycw30...@mail.gmail.com Regards, Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Need Multixact Freezing Docs
On 04/15/2014 02:25 PM, Josh Berkus wrote: > Hackers, > > We need documentation on how users should intelligently set the > multixact freeze settings. I'm happy to write the actual text, but I > definitely don't have any idea how to set these myself. Under what > circumstances should they be different from freeze_max_age? How? > Also: how do I check the multixact age of a table? There doesn't seem to be any data for this ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: iff -> if
On 15 April 2014 23:19, Andreas 'ads' Scherbaum wrote: > > Hi, > > stumbled over a number of "iff" in the source where "if" is meant - not sure > what the real story behind this is, but attached is a patch to fix the about > 80 occurrences. > > This only appears in comments, not in any code path. Yeah, apparently those are intentional, and mean "if and only if" (i.e. <=>) -- Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Get more from indices.
Kyotaro HORIGUCHI writes: > [ pathkey_and_uniqueindx_v10_20130411.patch ] I thought some more about this patch, and realized that it's more or less morally equivalent to allowing references to ungrouped variables when the query has a GROUP BY clause listing all the columns of the primary key. In that case the parser is effectively pretending that the GROUP BY list contains additional implicit entries that are functionally dependent on the entries that are actually there. In this patch, what we want to do is recognize that trailing entries in an ORDER BY list are semantically no-ops and can be ignored because they are functionally dependent on earlier entries. Now, the reason that the parser restricts the functional dependency deduction to a primary key is that it wants to be able to identify a constraint OID that the query is dependent on to be semantically valid. In this case, we don't need such an OID, so just finding any old unique index on not-null columns is good enough. (If someone drops the index, the optimization might become incorrect, but that would force replanning anyway.) However, this way of thinking about it shows that the patch is missing possible optimizations. If we have "ORDER BY a, b, c" and (a,b) is the primary key, then including c in the ORDER BY list is semantically redundant, *whether or not we use an indexscan on the pkey index at all*. More: if we have "ORDER BY a, b, c" and the primary key is (b,a), we can still discard c from the sort requirement, even though the pkey index as such isn't helpful for producing the required order. So hacking up the pathkeys attributed to the indexscan is the wrong thing. Rather, what we should be looking to do is decide that c is a useless pathkey and remove it from the query_pathkeys, much as we'd do if we found "c = constant" in WHERE. That would allow optimization of other query plans besides scan-the-pkey-index plans. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Question about optimising (Postgres_)FDW
Hi I am playing around with postgres_fdw and found that the following code ... -- CREATE EXTENSION postgres_fdw; CREATE SERVER loop foreign data wrapper postgres_fdw OPTIONS (port '5432', dbname 'testdb'); CREATE USER MAPPING FOR PUBLIC SERVER loop; create table onemillion ( id serial primary key, inserted timestamp default clock_timestamp(), data text ); insert into onemillion(data) select random() from generate_series(1,100); CREATE FOREIGN TABLE onemillion_pgfdw ( id int, inserted timestamp, data text ) SERVER loop OPTIONS (table_name 'onemillion', use_remote_estimate 'true'); testdb=# explain analyse select * from onemillion_pgfdw where id in (select id from onemillion where data > '0.9' limit 100); QUERY PLAN - Nested Loop (cost=122.49..10871.06 rows=50 width=44) (actual time=4.269..93.444 rows=100 loops=1) -> HashAggregate (cost=22.06..23.06 rows=100 width=4) (actual time=1.110..1.263 rows=100 loops=1) -> Limit (cost=0.00..20.81 rows=100 width=4) (actual time=0.038..1.026 rows=100 loops=1) -> Seq Scan on onemillion (cost=0.00..20834.00 rows=100115 width=4) (actual time=0.036..0.984 rows=100 loops=1) Filter: (data > '0.9'::text) Rows Removed by Filter: 805 -> Foreign Scan on onemillion_pgfdw (cost=100.43..108.47 rows=1 width=29) (actual time=0.772..0.773 rows=1 loops=100) Total runtime: 93.820 ms (8 rows) Time: 97.283 ms -- ... actually performs 100 distinct "SELECT * FROM onemillion WHERE id = $1" calls on "remote" side. Is there a way to force it to prefer a plan where the results of (select id from onemillion where data > '0.9' limit 100) are passed to FDW as a single IN ( = ANY(...)) query and are retrieved all at once ? If not, how hord would it be to add this feature ? -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] How can we make beta testing better?
Hackers, I think 9.3 has given us evidence that our users aren't giving new versions of PostgreSQL substantial beta testing, or if they are, they aren't sharing the results with us. How can we make beta testing better and more effective? How can we get more users to actually throw serious workloads at new versions and share the results? I've tried a couple of things over the last two years and they haven't worked all that well. Since we're about to go into another beta testing period, we need something new. Ideas? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On Tue, Apr 15, 2014 at 12:33 AM, Amit Kapila wrote: > On Mon, Apr 14, 2014 at 10:03 PM, Robert Haas wrote: >> On Sat, Apr 12, 2014 at 1:32 AM, Amit Kapila wrote: >>> I have checked that other place in code also check handle to >>> decide if API has failed. Refer function PGSharedMemoryIsInUse(). >>> So I think fix to call GetLastError() after checking handle is okay. >>> Attached patch fixes this issue. After patch, the server shows below >>> log which is exactly what is expected from test_shm_mq >> >> In PostgreSQL code, hmap == NULL, rather than !hmap, is the preferred >> way to test for a NULL pointer. I notice that the !hmap style is used >> throughout this code, so I guess cleaning that up is a matter for a >> separate commit. > > I think in that case we might want to cleanup some other similar usage > (PGSharedMemoryCreate) of !hmap. Ah. Well, in that case maybe we should just leave it alone, since it's been like that forever and nobody's cared until now. >> For the create case, I'm wondering if we should put the block that >> tests for !hmap *before* the _dosmaperr() and check for EEXIST. What >> is your opinion? > > Either way is okay, but I think the way you are suggesting is better as it > will make code consistent with other place (PGSharedMemoryCreate()). OK, can you prepare a patch? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Need Multixact Freezing Docs
Hackers, We need documentation on how users should intelligently set the multixact freeze settings. I'm happy to write the actual text, but I definitely don't have any idea how to set these myself. Under what circumstances should they be different from freeze_max_age? How? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Excessive WAL generation and related performance issue
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 04/15/2014 02:15 PM, Heikki Linnakangas wrote: > You're counting XLogRecData structs, not backup blocks. Each > backup block typically consists of three XLogRecData structs, one > to record a BkpBlock struct, one to record the data before the > unused "hole" in the middle of the page, and one for after. > Sometimes just two, if there is no hole to skip. See the loop just > before the CRC calculation, that's where the XLogRecData entries > for backup blocks are created. > > - Heikki Ah, thanks for the explanation! Joe - -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, & 24x7 Support -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.14 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJTTaLxAAoJEDfy90M199hlwuQP/3Tuea1TUe+4L21ZdProNIIF fUtejFNUwqhkyWNcnePlubgSyTEIfHGEG9hatrB5/MWdzpiyEvXdDkvV1ODakEhJ CVKZbnQ4dmnrevypy2f2YdhlbB9du/DDFhFcPOZGbn+vLywwM9oMPS8tQmsol37e aITe2GnD5LpEcmCSqzz04OL+xAxKLe8fXaI9dDsTRWXb9qdj4pDHI706CeixwSFb sGsGcIHXmnWieMby9qfWc0WGpc38iMRRkE+LeaEULhsycFP/2x09irXdhbl5T1SH 4PItwX0/ZgLskklG2gaD4HpNe75+Emj1i22PHDYhXSoAzpykHUf+kZZwMUr0AbaF 5QVCer071jHaMacpaVC7/qwUt8zISx4/1wtJuQzfk5H3P2q4L+b/xPmod5/cqs9z /wFp+9kjMT4349sSMe1eDPTDoIZKgRh8Eiag5IfJtrOAjoK+FN+k8uWNikiyFDMu z/3l+6mbfrl7FAmfeXLFC9fqhhGOiGLHoZufB/4qFgEikj4S94Hx9Q0nHqkMsFvM Fcd3qcpLI06Xku7LmBPRvdZ8OVFGWAirH1jBlrdsvC9E5VoZxgByxg90EaTlwjAQ 1ZaGOsbQoXdOPOwe/rGx2ONGwgZp8uFwHSXzUY+CJucvfYQh2AD67AlEhS7Jb9NC ummpulzJ6arce0815KaT =Y+m4 -END PGP SIGNATURE- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Excessive WAL generation and related performance issue
On 04/15/2014 11:53 PM, Joe Conway wrote: One more question before I get to that. I had applied the following patch to XLogInsert 8<-- diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 2f71590..e39cd37 100644 - --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -737,10 +737,12 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata) uint32 len, write_len; unsignedi; + unsignedinorm; boolupdrqst; booldoPageWrites; boolisLogSwitch = (rmid == RM_XLOG_ID && info == XLOG_SWITCH); uint8 info_orig = info; + uint32 xl_tot_len; /* cross-check on whether we should be here or not */ if (!XLogInsertAllowed()) @@ -924,8 +926,23 @@ begin:; * header. */ INIT_CRC32(rdata_crc); + i = 0; + inorm = 0; for (rdt = rdata; rdt != NULL; rdt = rdt->next) + { COMP_CRC32(rdata_crc, rdt->data, rdt->len); + + if (rdt_lastnormal == rdt) + { + inorm = i; + i = 0; + } + else + i++; + } + xl_tot_len = SizeOfXLogRecord + write_len; + if ((inorm + i) > 4 || xl_tot_len > 2000) + elog(LOG, "XLogInsert;tot_nml_blks;%d;tot_bkp_blks;%d;tot_Xlog_Len;%d", inorm, i, xl_tot_len); START_CRIT_SECTION(); 8<-- The idea was to record number of normal and backup blocks, and total size of the record. I have quite a few entries in the log from the test run which are like: 8<-- 2014-04-11 08:42:06.904 PDT;LOG: XLogInsert;tot_nml_blks;4;tot_bkp_blks;5;tot_Xlog_Len;16168 2014-04-11 09:03:12.790 PDT;LOG: XLogInsert;tot_nml_blks;4;tot_bkp_blks;5;tot_Xlog_Len;16172 2014-04-11 10:16:57.949 PDT;LOG: XLogInsert;tot_nml_blks;3;tot_bkp_blks;5;tot_Xlog_Len;16150 8<-- and 8<-- 2014-04-11 11:17:08.313 PDT;LOG: XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;12332 2014-04-11 11:17:08.338 PDT;LOG: XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;16020 2014-04-11 11:17:08.389 PDT;LOG: XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;12332 8<-- In other words, based on my inserted logic, it appears that there are 5 and 6 backup blocks on a fairly regular basis. However in xlog.h it says: 8<-- * If we backed up any disk blocks with the XLOG record, we use flag * bits in xl_info to signal it. We support backup of up to 4 disk * blocks per XLOG record. 8<-- So is my logic to record number of backup blocks wrong, or is the comment wrong, or am I otherwise misunderstanding something? You're counting XLogRecData structs, not backup blocks. Each backup block typically consists of three XLogRecData structs, one to record a BkpBlock struct, one to record the data before the unused "hole" in the middle of the page, and one for after. Sometimes just two, if there is no hole to skip. See the loop just before the CRC calculation, that's where the XLogRecData entries for backup blocks are created. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Excessive WAL generation and related performance issue
Joe Conway writes: > In other words, based on my inserted logic, it appears that there are > 5 and 6 backup blocks on a fairly regular basis. > However in xlog.h it says: > 8<-- > * If we backed up any disk blocks with the XLOG record, we use flag > * bits in xl_info to signal it. We support backup of up to 4 disk > * blocks per XLOG record. > 8<-- > So is my logic to record number of backup blocks wrong, or is the > comment wrong, or am I otherwise misunderstanding something? The comment is correct, so you did something wrong. From memory, there's a goto-label retry loop in that function; maybe you need to zero your counters after the retry label? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Excessive WAL generation and related performance issue
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 04/14/2014 04:34 PM, Joe Conway wrote: > On 04/14/2014 04:25 PM, Andres Freund wrote: >> On 2014-04-14 16:22:48 -0700, Joe Conway wrote: >>> That'll help performance, but lets say I generally keep WAL >>> files for PITR and don't turn that off before starting -- >>> shouldn't I be very surprised to need over 3TB of archive >>> storage when loading a 50GB table with a couple of indexes? > >> The point is that more frequent checkpoints will increase the >> WAL volume *significantly* because more full page writes will >> have to be generated. > > OK, I'll see how much it can be brought down through checkpoint > tuning and report back. One more question before I get to that. I had applied the following patch to XLogInsert 8<-- diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 2f71590..e39cd37 100644 - --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -737,10 +737,12 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata) uint32 len, write_len; unsignedi; + unsignedinorm; boolupdrqst; booldoPageWrites; boolisLogSwitch = (rmid == RM_XLOG_ID && info == XLOG_SWITCH); uint8 info_orig = info; + uint32 xl_tot_len; /* cross-check on whether we should be here or not */ if (!XLogInsertAllowed()) @@ -924,8 +926,23 @@ begin:; * header. */ INIT_CRC32(rdata_crc); + i = 0; + inorm = 0; for (rdt = rdata; rdt != NULL; rdt = rdt->next) + { COMP_CRC32(rdata_crc, rdt->data, rdt->len); + + if (rdt_lastnormal == rdt) + { + inorm = i; + i = 0; + } + else + i++; + } + xl_tot_len = SizeOfXLogRecord + write_len; + if ((inorm + i) > 4 || xl_tot_len > 2000) + elog(LOG, "XLogInsert;tot_nml_blks;%d;tot_bkp_blks;%d;tot_Xlog_Len;%d", inorm, i, xl_tot_len); START_CRIT_SECTION(); 8<-- The idea was to record number of normal and backup blocks, and total size of the record. I have quite a few entries in the log from the test run which are like: 8<-- 2014-04-11 08:42:06.904 PDT;LOG: XLogInsert;tot_nml_blks;4;tot_bkp_blks;5;tot_Xlog_Len;16168 2014-04-11 09:03:12.790 PDT;LOG: XLogInsert;tot_nml_blks;4;tot_bkp_blks;5;tot_Xlog_Len;16172 2014-04-11 10:16:57.949 PDT;LOG: XLogInsert;tot_nml_blks;3;tot_bkp_blks;5;tot_Xlog_Len;16150 8<-- and 8<-- 2014-04-11 11:17:08.313 PDT;LOG: XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;12332 2014-04-11 11:17:08.338 PDT;LOG: XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;16020 2014-04-11 11:17:08.389 PDT;LOG: XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;12332 8<-- In other words, based on my inserted logic, it appears that there are 5 and 6 backup blocks on a fairly regular basis. However in xlog.h it says: 8<-- * If we backed up any disk blocks with the XLOG record, we use flag * bits in xl_info to signal it. We support backup of up to 4 disk * blocks per XLOG record. 8<-- So is my logic to record number of backup blocks wrong, or is the comment wrong, or am I otherwise misunderstanding something? Thanks, Joe - -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, & 24x7 Support -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.14 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJTTZxJAAoJEDfy90M199hl15MQAKTcv9BoZTsXDleSu9JrU1ha UhHUnDALRmxWLgyPYsgtifxMQ3jLp5eLrkMHGnQbVD17619OgHckuOiEphc2bdQp MfZlv3jrEqxnmsh6qKhK1J23mHj0cohWXQ9EUoyjE6tlZueLPyMigaIV662KP1d2 pUXCh6IEJYMMaPfqhR5Mxi62s+HMkpAULhafWeEeAwcU1eYNijFWlyxJWlsv7D6X 9ZuDSmRtqnAP0g23GcbxNkL/I9Yv090Uxar7um2Rw5SEUV+Uv1kMY0GVCjHluE0k qZhSF1tE2jypThhSnv5xRHT3ZzdKoJtNmfLekjws7+dFZbSBLgNOj4EdV0H/wUgf NqO71kkeRhd44uMRzii0cr03LwBiwqC2apCYoZy7s0X3rl10hZfKgVEKkyhaZ4VJ QdfR1WdY/hC7mKW7NPnkycF+Es1ykEfuPnKHHsyJ3fHeFGxkKD3I6A8jGnNnS6VL ba+jx+t3qnrcKQAW8lqQ3rAij5Jkb97Ljibc7o6w8cgnGA4S0tqsE6jDrdDR1FO4 ns5uULTs4REU8clFwiKNZnQfINRUUfqY1mtlRneJMANeafm0j2CyIzvqLqB2mdOH YL9SS2lIngQlVSfgpu7EiSS7sJx8XGe3a3YFE9DoTBpq009scrscH40+kuN823wp yruufkzaBN6lyAjo3zoR =GQDN -END PGP SIGNATURE- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] test script, was Re: [COMMITTERS] pgsql: psql: conditionally display oids and replication identity
David Fetter writes: > On Tue, Apr 15, 2014 at 02:46:34PM -0400, Bruce Momjian wrote: >> Fixed. I added a personal script option that allows me to test contrib, >> but forgot to run it. > Is that script of general utility for committers? If so, it might be > good to include it in the distribution. I'd be happy to go through > and perl-ify it, document it, etc. Or maybe it could be a new make > target... I'm pretty sure "make check-world" would've covered this already. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] test script, was Re: [COMMITTERS] pgsql: psql: conditionally display oids and replication identity
On 2014-04-15 12:32:36 -0700, David Fetter wrote: > On Tue, Apr 15, 2014 at 02:46:34PM -0400, Bruce Momjian wrote: > > On Tue, Apr 15, 2014 at 02:32:53PM -0400, Tom Lane wrote: > > > Bruce Momjian writes: > > > > psql: conditionally display oids and replication identity > > > > > > Buildfarm isn't terribly pleased with this --- looks like you missed > > > contrib/test_decoding/ > > > > Fixed. I added a personal script option that allows me to test contrib, > > but forgot to run it. > > Is that script of general utility for committers? If so, it might be > good to include it in the distribution. I'd be happy to go through > and perl-ify it, document it, etc. Or maybe it could be a new make > target... "make check-world" Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] test script, was Re: [COMMITTERS] pgsql: psql: conditionally display oids and replication identity
On Tue, Apr 15, 2014 at 02:46:34PM -0400, Bruce Momjian wrote: > On Tue, Apr 15, 2014 at 02:32:53PM -0400, Tom Lane wrote: > > Bruce Momjian writes: > > > psql: conditionally display oids and replication identity > > > > Buildfarm isn't terribly pleased with this --- looks like you missed > > contrib/test_decoding/ > > Fixed. I added a personal script option that allows me to test contrib, > but forgot to run it. Is that script of general utility for committers? If so, it might be good to include it in the distribution. I'd be happy to go through and perl-ify it, document it, etc. Or maybe it could be a new make target... Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Something flaky in the "relfilenode mapping" infrastructure
Andres Freund writes: >> On 2014-03-27 08:02:35 -0400, Tom Lane wrote: >>> Buildfarm member prairiedog thinks there's something unreliable about >>> commit f01d1ae3a104019d6d68aeff85c4816a275130b3: > So I had made a notice to recheck on > this. > http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=prairiedog&br=HEAD > indicates there haven't been any further failures... So, for now I > assume this was caused by some problem fixed elsewhere. Hard to say. In any case, I agree we can't make any progress unless we see it again. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Something flaky in the "relfilenode mapping" infrastructure
On 2014-03-28 21:36:11 +0100, Andres Freund wrote: > Hi, > > On 2014-03-27 08:02:35 -0400, Tom Lane wrote: > > Buildfarm member prairiedog thinks there's something unreliable about > > commit f01d1ae3a104019d6d68aeff85c4816a275130b3: > > > *** > > /Users/buildfarm/bf-data/HEAD/pgsql.13462/src/test/regress/expected/alter_table.out > > Thu Mar 27 04:12:40 2014 > > --- > > /Users/buildfarm/bf-data/HEAD/pgsql.13462/src/test/regress/results/alter_table.out > > Thu Mar 27 04:52:02 2014 > > *** > > *** 2333,2339 > > ) mapped; > >incorrectly_mapped | have_mappings > > +--- > > ! 0 | t > > (1 row) > > That's rather odd. It has survived for a couple of months on the other > buildfarm animals now... Could one of you apply the attached patch > adding more details to eventual failures? So I had made a notice to recheck on this. http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=prairiedog&br=HEAD indicates there haven't been any further failures... So, for now I assume this was caused by some problem fixed elsewhere. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C
* From: Amit Kapila > On Mon, Apr 14, 2014 at 11:46 AM, Christian Ullrich > wrote: > > * From: Amit Kapila > >> Do you mean to say use some existing environment variable? > >> Introducing an environment variable to solve this issue or infact > >> using some existing environ variable doesn't seem to be the best way > >> to pass such information. > > > > I meant creating a new one, yes. If, say, PGSQL_BACKGROUND_JOB was > > set, the postmaster etc. would ignore the events. > > Do you plan to reset it and if yes when? > I think there is chance that after setting this environment variable, > some other instance of server (postmaster) can read it and missed the > signal which it should have otherwise processed. We have decided not to go this way, but just for completeness: Environment inheritance works the same way on Windows as on Unix. When a process is started with a modified environment (one of the plentiful arguments of CreateProcess() et al.), only that process and its descendants see the modification. I had not planned to set a system-level or user-level variable. -- Christian -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C
* From: Bruce Momjian > On Mon, Apr 14, 2014 at 09:34:14AM +0530, Amit Kapila wrote: > > The problem can be solved this way, but the only question here is > > whether it is acceptable for users to have a new console window for > > server. > > > > Can others also please share their opinion if this fix (start server > > in new console) seems acceptable or shall we try by passing some > > information from pg_ctl and then ignore CTRL+C && CTRL+BREAK for it? > > I wanted to point out that I think this patch is only appropriate for > head, not backpatching. Also, on Unix we have to handle signals that Yes, of course. > come from the kill command. Can you send CTRL+C from other applications > on Windows? Yes again, using GenerateConsoleCtrlEvent() you can send these events to any (console-attached) process you have the required permissions for, but that is not an issue for the same reason it isn't one on Unix. All the target process sees is the event, it cannot determine (nor does it care) where the event came from. -- Christian -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C
* From: Robert Haas > On Mon, Apr 14, 2014 at 2:16 AM, Christian Ullrich > wrote: > > I meant creating a new one, yes. If, say, PGSQL_BACKGROUND_JOB was > > set, the postmaster etc. would ignore the events. > > Why not just pass a command-line switch? Because, as I wrote in the message you are quoting, I did not think that having a command-line option for the sole purpose of telling the postmaster who its parent is was a suitable solution. I had already given up on that idea based on Amit's advice, and I will create a patch based on a command-line option. While I have you here, though, any suggestions on what the name of that option should be? I think --background is about right. Also, how should I treat the option on non-Windows platforms? Should it just not be there (= error), or be ignored if present? -- Christian -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Tue, Apr 15, 2014 at 9:30 AM, Merlin Moncure wrote: > There are many reports of improvement from lowering shared_buffers. > The problem is that it tends to show up on complex production > workloads and that there is no clear evidence pointing to problems > with the clock sweep; it could be higher up in the partition locks or > something else entirely (like the O/S). pgbench is also not the > greatest tool for sniffing out these cases: it's too random and for > large database optimization is generally an exercise in de-randomizing > i/o patterns. We really, really need a broader testing suite that > covers more usage patterns. I find it quite dissatisfying that we know so little about this. I'm finding that my patch helps much less when shared_buffers is sized large enough to fit the index entirely (although there are still some localized stalls on master, where there are none with patched). shared_buffers is still far too small to fit the entire heap. With shared_buffers=24GB (which still leaves just under 8GB of memory for the OS to use as cache, since this system has 32GB of main memory), the numbers are much less impressive relative to master with the same configuration. Both sets of numbers are still better than what you've already seen with shared_buffers=8GB, since of course the "no more than 8GB" recommendation is not an absolute, and as you say its efficacy seemingly cannot be demonstrated with pgbench. My guess is that the patch doesn't help because once there is more than enough room to cache the entire index (slightly over twice as many buffers as would be required to do so), even on master it becomes virtually impossible to evict those relatively popular index pages, since they still have an early advantage. It doesn't matter that master's clock sweep has what I've called an excessively short-term perspective, because there is always enough pressure relative to the number of leaf pages being pinned to prefer to evict heap pages. There is still a lot of buffers that can fit some moderate proportion of all heap pages even after buffering the entire index (something like ~13GB). You might say that with this new shared_buffers setting, clock sweep doesn't need to have a "good memory", because it can immediately observe the usefulness of B-Tree leaf pages. There is no need to limit myself to speculation here, of course. I'll check it out using pg_buffercache. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C
On Mon, Apr 14, 2014 at 2:16 AM, Christian Ullrich wrote: > I meant creating a new one, yes. If, say, PGSQL_BACKGROUND_JOB was set, > the postmaster etc. would ignore the events. Why not just pass a command-line switch? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql \d+ and oid display
On Thu, Apr 10, 2014 at 08:05:11PM -0400, Bruce Momjian wrote: > On Thu, Apr 10, 2014 at 07:58:55PM -0400, Tom Lane wrote: > > Bruce Momjian writes: > > > It also has changed the OID status to only display if it exists. One > > > question that came up with Robert is whether OID status should appear > > > for \d as well, now that is only shows up when present. > > > > Yeah, I was wondering about that too. If part of the argument here is > > to make these two displays act more alike, it seems inconsistent that > > one is emitted by \d while the other only comes out with \d+. > > > > Of course, there are two ways to fix that: maybe the replica info > > also only belongs in \d+? > > OK, I changed my patch to only show replica info for \d+. If we decide > to change them to both display for \d, I will update it again. OK, hearing only quiet, I have applied the patch. I like that we now document the replication identity default, which should go a long way to making this clearer. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Mon, Apr 14, 2014 at 7:45 PM, Peter Geoghegan wrote: > On Mon, Apr 14, 2014 at 5:30 PM, Bruce Momjian wrote: >> I am glad you are looking at this. You are right that it requires a >> huge amount of testing, but clearly our code needs improvement in this >> area. > > Thanks. > > Does anyone recall the original justification for the recommendation > that shared_buffers never exceed 8GiB? I'd like to revisit the test > case, if such a thing exists. There are many reports of improvement from lowering shared_buffers. The problem is that it tends to show up on complex production workloads and that there is no clear evidence pointing to problems with the clock sweep; it could be higher up in the partition locks or something else entirely (like the O/S). pgbench is also not the greatest tool for sniffing out these cases: it's too random and for large database optimization is generally an exercise in de-randomizing i/o patterns. We really, really need a broader testing suite that covers more usage patterns. I was suspicious for a while that spinlock contention inside the clocksweep was causing stalls and posted a couple of different patches to try and reduce the chance of that. I basically gave up when I couldn't demonstrate that case in simulated testing. I still think there is no good reason for the clock to pedantically adjust usage count on contented buffers...better to throw a single TTAS and bail to the next buffer if either 'T' signals a lock. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
Andres Freund writes: > What I think this discussion shows that this patch isn't ready for > 9.4. The first iteration of the patch came in 2013-11-06. Imo that's > pretty damn late for a relatively complex patch. And obviously we don't > have agreement on the course forward. > I don't think we need to stop discussing, but I think it's pretty clear > that this isn't 9.4 material. And that it's far from "Ready for Committer". Yeah. I'm still not exactly convinced that custom-scan will ever allow independent development of new plan types (which, with all due respect to Robert, is what it was being sold as last year in Ottawa). But I'm not opposed in principle to committing it, if we can find a way to have a cleaner API for things like setrefs.c. It seems like late-stage planner processing in general is an issue for this patch (createplan.c and subselect.c are also looking messy). EXPLAIN isn't too great either. I'm not sure exactly what to do about those cases, but I wonder whether things would get better if we had the equivalent of expression_tree_walker/mutator capability for plan nodes. The state of affairs in setrefs and subselect, at least, is a bit reminiscent of the bad old days when we had lots of different bespoke code for traversing expression trees. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
Hi, On 2014-04-15 11:07:11 -0400, Robert Haas wrote: > On Tue, Apr 15, 2014 at 10:44 AM, Tom Lane wrote: > > Robert Haas writes: > [ discussion ] What I think this discussion shows that this patch isn't ready for 9.4. The first iteration of the patch came in 2013-11-06. Imo that's pretty damn late for a relatively complex patch. And obviously we don't have agreement on the course forward. I don't think we need to stop discussing, but I think it's pretty clear that this isn't 9.4 material. And that it's far from "Ready for Committer". Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
On Tue, Apr 15, 2014 at 10:44 AM, Tom Lane wrote: > Robert Haas writes: >> On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane wrote: >>> Yeah. After a fast review of the custom-scan and cache-scan patches, it >>> seems to me that my original fears are largely confirmed: the custom scan >>> patch is not going to be sufficient to allow development of any truly new >>> plan type. Yeah, you can plug in some new execution node types, but >>> actually doing anything interesting is going to require patching other >>> parts of the system. > >> Without prejudice to the rest of what you said, this argument doesn't >> hold much water with me. I mean, anything that our extensibility >> mechanism doesn't support today will require new hooks, but does that >> mean we're never going to add any more hooks? I sure hope not. > > No, that's not what I said. ISTM that the argument for the custom-scan > API is that it allows interesting new things to be done *without further > modifying the core code*. But the example application (cache_scan) fails > to demonstrate that, and indeed seems to be a counterexample. Whether > we'd accept cache_scan on its own merits is a separate question. The > problem for me is that custom-scan isn't showing that it can support what > was claimed without doing serious damage to modularity and maintainability > of the core code. I think there's two separate things in there, one of which I agree with and one of which I disagree with. I agree that we must avoid damaging the modularity and maintainability of the core code; I don't agree that custom-scan needs to be able to do interesting things with zero additional changes to the core code. If we come up with three interesting applications for custom scan that require 5 new hooks between them, I'll consider that a major success - assuming those hooks don't unduly limit future changes we may wish to make in the core code. I think your concern about exposing APIs that may not be terribly stable is well-founded, but I don't think that means we shouldn't expose *anything*. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
* Tom Lane (t...@sss.pgh.pa.us) wrote: > A concrete example here is setrefs.c, whose responsibilities tend to > change from release to release. I think if we committed custom-scan > as is, we'd have great difficulty changing setrefs.c's transformations > ever again, at least if we hoped to not break users of the custom-scan > API. I'm not sure what the solution is --- but turning setrefs into > a white box instead of a black box isn't it. Yeah, this was my (general) complaint as well and the answer that I kept getting back is "well, it's ok, you can still break it between major releases and the custom scan users will just have to deal with it". I'm a bit on the fence about that, itself, but the other half of that coin is that we could end up with parts of the *core* code that think it's ok to go pulling in these functions, once they're exposed, and that could end up making things quite ugly and difficult to maintain going forward. Thanks, Stephen signature.asc Description: Digital signature
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
Robert Haas writes: > On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane wrote: >> Yeah. After a fast review of the custom-scan and cache-scan patches, it >> seems to me that my original fears are largely confirmed: the custom scan >> patch is not going to be sufficient to allow development of any truly new >> plan type. Yeah, you can plug in some new execution node types, but >> actually doing anything interesting is going to require patching other >> parts of the system. > Without prejudice to the rest of what you said, this argument doesn't > hold much water with me. I mean, anything that our extensibility > mechanism doesn't support today will require new hooks, but does that > mean we're never going to add any more hooks? I sure hope not. No, that's not what I said. ISTM that the argument for the custom-scan API is that it allows interesting new things to be done *without further modifying the core code*. But the example application (cache_scan) fails to demonstrate that, and indeed seems to be a counterexample. Whether we'd accept cache_scan on its own merits is a separate question. The problem for me is that custom-scan isn't showing that it can support what was claimed without doing serious damage to modularity and maintainability of the core code. What this may mean is that we need more attention to refactoring of the core code. But just removing "static" from any function that looks like it might be handy isn't my idea of well-considered refactoring. More the opposite in fact: if those things turn into APIs that we have to support, it's going to kill any ability to do such refactoring. A concrete example here is setrefs.c, whose responsibilities tend to change from release to release. I think if we committed custom-scan as is, we'd have great difficulty changing setrefs.c's transformations ever again, at least if we hoped to not break users of the custom-scan API. I'm not sure what the solution is --- but turning setrefs into a white box instead of a black box isn't it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane wrote: > Simon Riggs writes: >> [ assorted comments about custom-scan patch, but particularly ] > >> * The prune hook makes me feel very uneasy. It seems weirdly specific >> implementation detail, made stranger by the otherwise lack of data >> maintenance API calls. Calling that for every dirty page sounds like >> an issue and my patch rejection indicator is flashing red around that. > > Yeah. After a fast review of the custom-scan and cache-scan patches, it > seems to me that my original fears are largely confirmed: the custom scan > patch is not going to be sufficient to allow development of any truly new > plan type. Yeah, you can plug in some new execution node types, but > actually doing anything interesting is going to require patching other > parts of the system. Are we going to say to all comers, "sure, we'll put > a hook call anywhere you like, just ask"? I can't see this as being the > way to go. Without prejudice to the rest of what you said, this argument doesn't hold much water with me. I mean, anything that our extensibility mechanism doesn't support today will require new hooks, but does that mean we're never going to add any more hooks? I sure hope not. When hooks are proposed here, we evaluate on them on their merits and attempt to judge the likelihood that a hook in a particular place will be useful, but generally we're not averse to adding them, and as long as the paths aren't too performance-critical, I don't think we should be averse to adding them. We have a great system today for letting people add new data types and things of that sort, but anything that penetrates more deeply into the heart of the system pretty much can't be done; this is why various companies, such as our respective employers, have developed and maintained forks of the PostgreSQL code base instead of just hooking in to the existing code. We probably can't solve that problem completely, but that doesn't mean we should throw in the towel. And in particular, I think it's pretty normal that a new facility like custom scans might create additional demand for new hooks. If something was completely impossible before, and the new facility makes it almost-possible, then why shouldn't someone ask for a hook there? A prune hook probably has no business in the custom scan patch proper, but whether it's a good idea or a bad one should be decided on the merits. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Create function prototype as part of PG_FUNCTION_INFO_V1
Craig Ringer writes: > On 04/15/2014 03:39 AM, Tom Lane wrote: >> I still wish we could get rid of this problem by fixing the Windows build >> recipes so that the PGDLLEXPORT marking wasn't needed. We proved to >> ourselves recently that getting rid of PGDLLIMPORT on global variables >> wouldn't work, but I'm not sure that the function end of it was really >> investigated. > My understanding is that we *can* drop PGDLLEXPORT on functions without > actively breaking anything. But we probably shouldn't. > If we omit PGDLLEXPORT, the linker of the DLL/executable that imports > the extern function will generate a thunk from the .LIB file for the > target DLL during linkage; this thunk within the DLL/EXE with the > undefined extern then jumps to the real address within the defining DLL/EXE. TBH, if the only argument for this is a small efficiency difference, then to my mind it barely requires discussion. I don't give one hoot about micro-optimization for the Windows platform; I'm satisfied if it works at all there. And I seriously doubt that a couple more cycles to call any function implemented in a loadable module would matter anyway. > I actually think we should *add* a LIBPQEXPORT that handles this for > libpq, much like PGDLLEXPORT does for postgres(.exe). And in the > process, rename PGDLLEXPORT to POSTGRESEXPORT or PGSERVEREXPORT or > something. My reaction to that is "not bloody likely". I remarked on this upthread already, but there is absolutely no way that I want to clutter our source code with platform-specific markings like that. Perhaps somebody could try a Windows build with PGDLLEXPORT defined to empty, and verify that it works, and if so do a pgbench comparison against a build done the existing way? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Create function prototype as part of PG_FUNCTION_INFO_V1
On 04/15/2014 03:39 AM, Tom Lane wrote: > I still wish we could get rid of this problem by fixing the Windows build > recipes so that the PGDLLEXPORT marking wasn't needed. We proved to > ourselves recently that getting rid of PGDLLIMPORT on global variables > wouldn't work, but I'm not sure that the function end of it was really > investigated. My understanding is that we *can* drop PGDLLEXPORT on functions without actively breaking anything. But we probably shouldn't. If we omit PGDLLEXPORT, the linker of the DLL/executable that imports the extern function will generate a thunk from the .LIB file for the target DLL during linkage; this thunk within the DLL/EXE with the undefined extern then jumps to the real address within the defining DLL/EXE. Reference: http://msdn.microsoft.com/en-us/library/zw3za17w.aspx So in other words, it makes calls across DLL boundaries less efficient by adding a layer of indirection. (No idea how this works in the presence of link time base address randomization either). I actually think we should *add* a LIBPQEXPORT that handles this for libpq, much like PGDLLEXPORT does for postgres(.exe). And in the process, rename PGDLLEXPORT to POSTGRESEXPORT or PGSERVEREXPORT or something. PGDLLEXPORT is probably less important overall - it'll mainly impact extensions (like hstore, intarray, etc) that call into the server. I wonder if this thunking still really mattres with modern CPU architecures' smart branch prediction, TLB caches, etc. I haven't found much info on the real world impact. It would probably be reasonable to add PGDLLEXPORT within postgres.exe only on functions we've intentionally exposed for use by extensions, where those functions are likely to get called a lot and don't have bigget costs like disk I/O, network I/O, expensive memory allocations, etc, that make call time overheads irrelevant. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] The question about the type numeric
Hi hackers, I am learning about numeric . The comment of NumericShort format is: * In the NumericShort format, the remaining 14 bits of the header word * (n_short.n_header) are allocated as follows: 1 for sign (positive or * negative), 6 for dynamic scale, and 7 for weight. In practice, most * commonly-encountered values can be represented this way. So the Max of the NumericShort format should be up to 508 digits before the decimal point. So the sign of the number 12345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567 should be 0x807F. The number is 257 digits before the decimal point. But the sign is 0. So is there anything wrong? 2014-04-15 wangshuo HighGo Software Co.,Ltd. Address: A203 Block D QILU Soft Park, High-Tech Zone, Lixia district, Jinan Shandong, China(Head Office) Tel:+86-0531-55701530 Fax:+86-0531-55701544 Website:www.highgo.com Mobile:18766416137
Re: [HACKERS] UNION ALL on partitioned tables won't use indices.
Thank you for committing. At Fri, 28 Mar 2014 11:50:56 -0400, Tom Lane wrote in <21426.1396021...@sss.pgh.pa.us> tgl> Kyotaro HORIGUCHI writes: tgl> > Hello. Attached is the 2nd version of 'pushdown in UNION ALL on tgl> > partitioned tables' patch type 1 - fix in equiv-member version. tgl> tgl> Committed, thanks. tgl> tgl>regards, tom lane regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers