Re: [HACKERS] [WIP] showing index maintenance on EXPLAIN
On Wed, May 7, 2014 at 10:52 PM, Amit Kapila wrote:
> On Thu, May 8, 2014 at 5:30 AM, Jaime Casanova wrote:
>> Hi,
>>
>> This patch implements $subject only when ANALYZE and VERBOSE are on.
>> I made it that way because for years nobody seemed interested in this
>> info (at least no one did it), so I decided that maybe it is too much
>> information for most people (actually btree indexes are normally very
>> fast).
>
> Why capture this only for index insert/update and not for read; is it
> because reads will always be fast, or because of implementation complexity?

EXPLAIN ANALYZE already shows that on any SELECT that uses an index in some way. Or are you thinking of something else?

> Why not similar timings for heap?

Well, "actual time" shows us the total time of the operation, so just deducting the time spent on triggers, indexes and planning seems like a way to get "heap modification time". Yes, maybe we still need some additional data; for example, I could want to know how much time we spent extending a relation.

> Why can't we print it when only ANALYZE is used with EXPLAIN? The
> execution time is printed with the ANALYZE option.

I'm not sure the info is useful for everyone; I'm not opposed to showing it all the time, though.

> Could you please tell in what kinds of scenarios you expect it to be
> useful? One I could think of is that if there are multiple indexes on
> a table and the user wants to find out whether any particular index is
> consuming more time.

Exactly my use case. Consider this plan (we spent 78% of the time updating the index uniq_idx_on_text):

                          QUERY PLAN
 ------------------------------------------------------------
 Insert on public.t1 (actual time=0.540..0.540 rows=0 loops=1)
   ->  Result (actual time=0.046..0.049 rows=1 loops=1)
         Output:
 Index uniq_idx_on_text on t1: time=0.421 rows=1
 Index t1_pkey on t1: time=0.027 rows=1
 Total runtime: 0.643 ms
(6 rows)

So I want to answer questions like: how much is an index hurting write performance? Once I know that, I can look for alternative solutions.
In that vein, it was interesting to see how fastupdate affects performance in a GIN index using gin_trgm_ops (5 times slower with fastupdate=off):

(fastupdate=on)  Index idx_ggin on t1: time=0.418 rows=1
(fastupdate=off) Index idx_ggin on t1: time=2.205 rows=1

This is no different from showing trigger time info, which we already do.

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566         Cell: +593 987171157

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
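For reference, toggling the behavior being measured above is a one-line storage-parameter change. This is a sketch: it assumes the pg_trgm extension and a text column t1.v, which are stand-ins, not taken from the thread:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- fastupdate is on by default for GIN indexes
CREATE INDEX idx_ggin ON t1 USING gin (v gin_trgm_ops);
-- Disable the pending-list buffering to compare per-insert index cost:
ALTER INDEX idx_ggin SET (fastupdate = off);
```

With the patch applied, running EXPLAIN (ANALYZE, VERBOSE) on an INSERT before and after the ALTER would show per-index timing lines like the ones quoted above.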
[HACKERS] Ignore files in src/interfaces/libpq generated by windows builds
Hi all,

While doing some builds with mingw and 9.4, I noticed that a couple of files generated by the build are not ignored in the source tree. Those two files are system.c and win32setlocale.c in src/interfaces/libpq. Please find attached a patch fixing that. Note that this is recent and is partially caused by commit a692ee5.

Regards,
--
Michael

Attachment: 20140508_mingw_gitignore.patch (binary data)
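Presumably the fix amounts to adding the two generated files to libpq's ignore list, something like the fragment below (an assumption; the attachment itself isn't shown in the archive):

```
# src/interfaces/libpq/.gitignore (sketch of the likely additions)
/system.c
/win32setlocale.c
```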
[HACKERS] Compilation errors with mingw build caused by undefined optreset
Hi all,

Since commit 60ff2fd, which introduced the centralized getopt-related declarations in a global header file, building on Windows with mingw fails because some declarations guarded by HAVE_INT_OPTRESET cause optreset to become undefined:

postmaster.c: In function 'PostmasterMain':
postmaster.c:853:2: error: 'optreset' undeclared (first use in this function)
postmaster.c:853:2: note: each undeclared identifier is reported only once for each function it appears in

This failure is new with 9.4, and attached is a patch fixing it...

Regards,
--
Michael

Attachment: 20140508_minwg_getopt_error.patch (binary data)
[HACKERS] popen and pclose redefinitions causing many warning in Windows build
Hi all,

Since commit a692ee5, code compilation on Windows is full of warnings caused by the redefinitions of popen and pclose:

In file included from ../../../src/include/c.h:1028:0,
                 from ../../../src/include/postgres.h:47,
                 from analyze.c:25:
../../../src/include/port.h:312:0: warning: "popen" redefined [enabled by default]
In file included from ../../../src/include/c.h:81:0,
                 from ../../../src/include/postgres.h:47,
                 from analyze.c:25:
c:\mingw64\bin\../lib/gcc/x86_64-w64-mingw32/4.7.0/../../../../x86_64-w64-mingw32/include/stdio.h:489:0: note: this is the location of the previous definition
In file included from ../../../src/include/c.h:1028:0,
                 from ../../../src/include/postgres.h:47,
                 from analyze.c:25:
../../../src/include/port.h:313:0: warning: "pclose" redefined [enabled by default]
In file included from ../../../src/include/c.h:81:0,
                 from ../../../src/include/postgres.h:47,
                 from analyze.c:25

The patch attached fixes that.

Regards,
--
Michael

Attachment: 20140508_win_build_warnings.patch (binary data)
Re: [HACKERS] PGDLLEXPORTing all GUCs?
On 05/08/2014 10:53 AM, Tom Lane wrote:
> Craig Ringer writes:
>> On 05/08/2014 12:21 AM, Tom Lane wrote:
>>> If Craig has a concrete argument why all GUCs should be accessible
>>> to external modules, then let's see it
>
>> As for just GUCs: I suggested GUCs because GUCs are what's been coming
>> up repeatedly as an actual practical issue.
>
> Meh. A quick look through the commit logs says that GUC variables are not
> more than 50% of what we've had to PGDLLIMPORT'ify in the past year or
> two. Maybe that's different from 2ndQuadrant's internal experience,
> but then you've not showed us the use-case driving your changes.

That's because the use case isn't that interesting, really, only the hiccups it's caused. I'd just as happily mark all extern vars PGDLLIMPORT, and only suggested focusing on GUCs because I didn't expect to get far with what I really thought was best.

Re your concerns about exposing GUCs that should by rights be internal to the wider world, it'd be interesting to mark functions and vars as something like PGINTERNAL, expanding to:

    __attribute__((visibility("hidden")))

on gcc, so they're just not available for linkage in extensions. That's a weaker form of using -fvisibility=hidden, where we explicitly say "this is private" rather than treating everything as private unless explicitly marked public, which has already been rejected.

Right now they're already exported for !windows, and while it's IMO a bug to have that difference for windows, it doesn't mean the correct answer is to export for all. If we're confident it won't break anything interesting, it'd be OK to instead say "unexport on !windows too".

In terms of ugliness, would you be happier using a pg_extern instead of extern, where we:

    #ifdef WIN32
    #define pg_extern extern PGDLLIMPORT
    #else
    #define pg_extern extern
    #endif

? That makes it easier to pretend that there's nothing windows-y going on -- and despite appearances, I'm also pretty keen not to have to think about that platform's linkage horrors when I don't have to. However, it also makes backpatching ickier.

>> I'd be quite happy to
>> PGDLLEXPORT all extern vars, but I was confident that'd be rejected for
>> aesthetic reasons, and thought that exporting all GUCs would be a
>> reasonable compromise.
>
> From the aesthetic standpoint, what I'd like is to not have to blanket
> our source code with Windows-isms. But I guess I can't have that.

I'd rather prefer that as well, but without the ability to go knocking at Redmond and introduce them to ELF and sane linkage, I don't think any of us are going to get it.

If it weren't for backbranches etc. the first thing I'd do to make it less ugly, personally, would be to rename PGDLLIMPORT to PGEXPORT and BUILDING_DLL to BUILDING_POSTGRES. The current names are unfortunate.

--
Craig Ringer         http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [WIP] showing index maintenance on EXPLAIN
On Thu, May 8, 2014 at 5:30 AM, Jaime Casanova wrote:
> Hi,
>
> This patch implements $subject only when ANALYZE and VERBOSE are on.
> I made it that way because for years nobody seemed interested in this
> info (at least no one did it), so I decided that maybe it is too much
> information for most people (actually btree indexes are normally very
> fast).

Why capture this only for index insert/update and not for read; is it because reads will always be fast, or because of implementation complexity?

Why not similar timings for heap?

Why can't we print it when only ANALYZE is used with EXPLAIN? The execution time is printed with the ANALYZE option.

Could you please tell in what kinds of scenarios you expect it to be useful? One I could think of is that if there are multiple indexes on a table and the user wants to find out whether any particular index is consuming more time.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [v9.5] Custom Plan API
>>> * ForeignScan node that is not associated with a particular foreign-table.
>>>   Once we try to apply ForeignScan node instead of Sort or Aggregate, existing
>>>   FDW implementation needs to be improved. These nodes scan on a materialized
>>>   relation (generated on the fly), however, existing FDW code assumes
>>>   ForeignScan node is always associated with a particular foreign-table.
>>>   We need to eliminate this restriction.
>>
>> I don't think we need to do that, given the above.
>>
> It makes a problem if ForeignScan is chosen as an alternative path of Join.
>
> The target-list of a Join node is determined according to the query form
> on the fly, so we cannot expect a particular TupleDesc to be known
> beforehand. Once we try to apply ForeignScan instead of a Join node, it
> has to have its TupleDesc depend on the set of joined relations.
>
> I think it is the more straightforward approach to allow a ForeignScan
> that is not associated with a particular (cataloged) relation.

> From your description, my understanding is that you would like to stream
> data from 2 standard tables to the GPU, then perform a join on the GPU itself.
>
> I have been told that is not likely to be useful because of the data transfer
> overheads.

Here are two solutions. One is what I'm currently working on: in the case when the numbers of rows in the left and right tables are not well balanced, we can keep a hash table in the GPU DRAM, then transfer the data stream chunk-by-chunk from the other side. Kernel execution and data transfer can run asynchronously, so this allows us to hide the data transfer cost as long as we have enough chunks, like processor pipelining. The other solution is an "integrated" GPU that eliminates the need for data transfer, like Intel's Haswell, AMD's Kaveri or NVIDIA's Tegra K1; all the major vendors are moving in the same direction.

> Or did I misunderstand, and is this instead intended to get around the
> current lack of join pushdown into FDWs?

The logic above is obviously executed on the extension side, so it needs the ForeignScan node to perform like a Join node: it reads two input relation streams and outputs one joined relation stream. It is quite similar to the expected FDW join-pushdown design. It will consume two (remote) relations and generate one output stream, looking like a scan on a particular relation (but with no catalog definition here). It would probably be visible to the local backend as follows (this is a result of the previous prototype based on the custom-plan API):

postgres=# EXPLAIN VERBOSE
             SELECT count(*) FROM pgbench1_branches b
               JOIN pgbench1_accounts a ON a.bid = b.bid WHERE aid < 100;
                                QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=101.60..101.61 rows=1 width=0)
   Output: count(*)
   ->  Custom Scan (postgres-fdw)  (cost=100.00..101.43 rows=71 width=0)
         Remote SQL: SELECT NULL FROM (public.pgbench_branches r1 JOIN
                     public.pgbench_accounts r2 ON ((r1.bid = r2.bid)))
                     WHERE ((r2.aid < 100))
(4 rows)

The place of the "Custom Scan" node would be taken by a ForeignScan, if join pushdown were supported. At that time, what relation should be scanned by this ForeignScan? That is the reason why I proposed a ForeignScan node without a particular relation.

> Can you be specific about the actual architecture you wish for, so we can
> understand how to generalise that into an API?

If we push the role of the CustomPlan node into ForeignScan, I want to use this node to acquire control during query planning/execution. As I did in the custom-plan patch, first of all, I want extensions to have a chance to add alternative paths for a particular scan/join. If an extension can take over the execution, it will generate a ForeignPath (or CustomPath) node and then call add_path(). In the usual manner, the planner decides whether the alternative path is cheaper than the other candidates.

In the case when it replaces a scan of a relation with a ForeignScan, it is almost the same as what the existing API does, except that the underlying relation is a regular one, not a foreign table. In the case when it replaces joined relations with a ForeignScan, it will be almost the same as the expected ForeignScan with join pushdown. Unlike a usual table scan, it does not have an actual relation definition in the catalog, and its result tuple-slot is determined on the fly.

One thing different from a remote join is that this ForeignScan node may have sub-plans locally, if the FDW driver (e.g. GPU execution) has capability for the Join only, but not the relation-scan portion. So, unlike its naming, I want ForeignScan to be able to have sub-plans if the FDW driver supports the capability.

Does that make it clear? Or does it make you more confused??

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei
Re: [HACKERS] Unhappy with error handling in psql's handleCopyOut()
On Tue, Feb 11, 2014 at 03:43:08PM -0500, Tom Lane wrote:
> While looking at the pending patch to make psql report a line count
> after COPY, I came across this business in handleCopyOut():
>
>  * Check command status and return to normal libpq state. After a
>  * client-side error, the server will remain ready to deliver data. The
>  * cleanest thing is to fully drain and discard that data. If the
>  * client-side error happened early in a large file, this takes a long
>  * time. Instead, take advantage of the fact that PQexec() will silently
>  * end any ongoing PGRES_COPY_OUT state. This does cause us to lose the
>  * results of any commands following the COPY in a single command string.
>  * It also only works for protocol version 3. XXX should we clean up
>  * using the slow way when the connection is using protocol version 2?
>
> which git blames on commit 08146775 (committed by Alvaro on behalf of
> Noah).
>
> This does not make me happy. In the first place, we have not dropped
> support for protocol version 2.

As you may realize, that commit helped protocol version 3 most but also strictly improved things for protocol version 2. I didn't feel the need to improve them both to the same extent, a judgment that still seems reasonable.

> In the second place, I fail to see
> what the advantage is of kluging things like this. The main costs of
> draining the remaining COPY data are the server-side work of generating
> the data and the network transmission costs, neither of which will go
> away with this technique.

Agreed.

From your commit b8f00a4 of 2014-02-13:

> +  * If for some reason libpq is still reporting PGRES_COPY_OUT state, we
> +  * would like to forcibly exit that state, since our caller would be
> +  * unable to distinguish that situation from reaching the next COPY in a
> +  * command string that happened to contain two consecutive COPY TO STDOUT
> +  * commands. However, libpq provides no API for doing that, and in
> +  * principle it's a libpq bug anyway if PQgetCopyData() returns -1 or -2
> +  * but hasn't exited COPY_OUT state internally. So we ignore the
> +  * possibility here.
> +  */

PQgetCopyData() returns -2 without exiting COPY_OUT when its malloc() call fails. My submission of the patch that became commit 08146775 included a test procedure for getting an infinite loop in handleCopyOut(); that commit fixed the infinite loop. After commit b8f00a4, the same test procedure once again initiates an infinite loop. To a large extent, that's bad luck on the part of commit b8f00a4. I bet malloc() failure in pqCheckInBufferSpace() could initiate an infinite loop even with the longstanding 9.2/9.3 code.

> So I'm thinking we should revert this kluge
> and just drain the data straightforwardly, which would also eliminate
> the mentioned edge-case misbehavior when there were more commands in
> the query string.
>
> Is there a reason I'm overlooking not to do this?

The moral of the infinite-loop test case story is that draining the data straightforwardly has strictly more ways to fail. Consequently, commit b8f00a4 improved some scenarios while regressing others. Overall, I do think it was an improvement.

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com
Re: [HACKERS] [v9.5] Custom Plan API
2014-05-07 18:06 GMT+09:00 Kouhei Kaigai:
> Let me list up the things to be clarified / developed, in random order.
>
> * Join replacement by FDW; We still don't have consensus about join replacement
>   by FDW. Probably, it will be designed for remote-join implementation primarily,
>   however, the things to do are similar. We may need to revisit Hanada-san's
>   proposition from the past.

I can't recall the details right now, but the reason I gave up was about introducing a ForeignJoinPath node, IIRC. I'll revisit the discussion and my proposal.

--
Shigeru HANADA
Re: [HACKERS] PGDLLEXPORTing all GUCs?
Craig Ringer writes:
> On 05/08/2014 12:21 AM, Tom Lane wrote:
>> If Craig has a concrete argument why all GUCs should be accessible
>> to external modules, then let's see it
>
> As for just GUCs: I suggested GUCs because GUCs are what's been coming
> up repeatedly as an actual practical issue.

Meh. A quick look through the commit logs says that GUC variables are not more than 50% of what we've had to PGDLLIMPORT'ify in the past year or two. Maybe that's different from 2ndQuadrant's internal experience, but then you've not showed us the use-case driving your changes.

> I'd be quite happy to
> PGDLLEXPORT all extern vars, but I was confident that'd be rejected for
> aesthetic reasons, and thought that exporting all GUCs would be a
> reasonable compromise.

From the aesthetic standpoint, what I'd like is to not have to blanket our source code with Windows-isms. But I guess I can't have that.

regards, tom lane
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
Andrew Dunstan writes:
> On 05/07/2014 04:13 PM, Tom Lane wrote:
>> A README file would be better,
>> perhaps, but there's not a specific directory associated with the jsonb
>> code; so I think this sort of info belongs either in jsonb.h or in the
>> file header comment for jsonb_gin.c.
>
> Is there any reason we couldn't have a README.jsonb?

We could, but the only place I can see to put it would be in backend/utils/adt/, which seems like a poor precedent; I don't want to end up with forty-two README.foo files in there. The header comments for the jsonbxxx.c files are probably better candidates for collecting this sort of info.

(The larger problem here is that utils/adt/ has become a catchbasin for all kinds of stuff that can barely squeeze under the rubric of "abstract data type". But fixing that is something I don't care to tackle now.)

regards, tom lane
Re: [HACKERS] Minor improvement to fdwhandler.sgml
(2014/05/05 23:05), Robert Haas wrote:
> On Wed, Apr 30, 2014 at 4:10 AM, Etsuro Fujita wrote:
>> (2014/04/28 23:31), Robert Haas wrote:
>>> On Thu, Apr 24, 2014 at 7:59 AM, Etsuro Fujita wrote:
>>>> The patch attached improves the docs in fdwhandler.sgml a little bit.
>>>
>>> When you submit a patch, it's helpful to describe what the patch
>>> actually does, rather than just saying it makes things better. For
>>> example, I think that this patch could be described as "in
>>> fdwhandler.sgml, mark references to scan_clauses with tags".
>>
>> I thought so. Sorry, my explanation wasn't enough.
>>
>>> A problem with that idea is that scan_clauses is not a field in any
>>> struct.
>>
>> I was mistaken. I think those should be marked with tags.
>> Patch attached.
>
> OK, committed.

Thanks!

Best regards,
Etsuro Fujita
Re: [HACKERS] [v9.5] Custom Plan API
Simon,

* Simon Riggs (si...@2ndquadrant.com) wrote:
> I'm proposing something that is like an index, not like a plan node.
>
> The reason that proposal is being made is that we need to consider
> data structure, data location and processing details.
>
> * In the case of Mat Views, if there is no Mat View, then we can't use
> it - we can't replace that with just any mat view instead

I agree with you about matviews. There are clear trade-offs there, similar to those with indexes.

> * GPUs and other special processing units have finite data transfer
> rates, so other people have proposed that they retain data on the
> GPU/SPU - so we want to do a lookaside only for situations where the
> data is already prepared to handle a lookaside.

I've heard this and I'm utterly unconvinced that it could be made to work at all -- and it's certainly moving the bar of usefulness quite far away, making the whole thing much less practical. If we can't cost for this transfer rate and make use of GPUs for medium-to-large-size queries which are only transient, then perhaps shoving all GPU work out across an FDW is actually the right solution, and making that like some kind of matview as you're proposing -- but I don't see how you're going to manage updates and invalidation of that data in a sane way for a multi-user PG system.

> * The other cases I cited of in-memory data structures are all
> pre-arranged items with structures suited to processing particular
> types of query

If it's transient in-memory work, I'd like to see our generalized optimizer consider them all, instead of pushing the job on to the user to decide when the optimizer should consider certain methods.

> Given that I count 4-5 beneficial use cases for this index-like
> lookaside, it seems worth investing time in.

I'm all for making use of matviews and GPUs, but there's more than one way to get there, and look-asides feel like pushing the decision, unnecessarily, on to the user.

Thanks,

Stephen
Re: [HACKERS] [v9.5] Custom Plan API
Simon,

* Simon Riggs (si...@2ndquadrant.com) wrote:
> On 8 May 2014 01:49, Kouhei Kaigai wrote:
> From your description, my understanding is that you would like to
> stream data from 2 standard tables to the GPU, then perform a join on
> the GPU itself.
>
> I have been told that is not likely to be useful because of the data
> transfer overheads.

That was my original understanding and, I believe, the case at one point, however...

> Or did I misunderstand, and that this is intended to get around the
> current lack of join pushdown into FDWs?

I believe the issue with the transfer speeds to the GPU has been either eliminated or at least reduced to the point where it's practical now. This is all based on prior discussions with KaiGai -- I've not done any testing myself. In any case, this is exactly what they're looking to do, as I understand it, and to do the same with aggregates that work well on GPUs.

> Can you be specific about the actual architecture you wish for, so we
> can understand how to generalise that into an API?

It's something that *could* be done with FDWs, once they have the ability to do join push-down and aggregate push-down, but I (and, as I understand it, Tom) feel that isn't really the right answer for this, because the actual *data* is completely under PG in this scenario. It's just in-memory processing that's being done on the GPU and in the GPU's memory.

KaiGai has speculated about other possibilities (eg: having the GPU's memory also used as some kind of multi-query cache, which would reduce the transfer costs, but at a level of complexity regarding that cache that I'm not sure it'd be sensible to try to do; in any case, it could be done later and might make sense independently, if we could make it work for, say, a memcached environment too. I'm thinking it would be transaction-specific, but even that would be pretty tricky unless we held locks across every row...).

Thanks,

Stephen
Re: [HACKERS] [v9.5] Custom Plan API
On 7 May 2014 18:39, Stephen Frost wrote:
> * Simon Riggs (si...@2ndquadrant.com) wrote:
>> On 7 May 2014 17:43, Stephen Frost wrote:
>>> It's the optimizer's job to figure out which path to pick though, based
>>> on which will have the lowest cost.
>>
>> Of course. I'm not suggesting otherwise.
>>
>>>> If you do want that, you can write an Event Trigger that automatically
>>>> adds a lookaside for any table.
>>>
>>> This sounds terribly ugly and like we're pushing optimization decisions
>>> on to the user instead of just figuring out what the best answer is.
>>
>> I'm proposing that we use a declarative approach, just like we do when
>> we say CREATE INDEX.
>
> There's quite a few trade-offs when it comes to indexes though. I'm
> trying to figure out when you wouldn't want to use a GPU, if it's
> available to you and the cost model says it's faster? To me, that's
> kind of like saying you want a declarative approach for when to use a
> HashJoin.

I'm proposing something that is like an index, not like a plan node. The reason that proposal is being made is that we need to consider data structure, data location and processing details.

* In the case of mat views, if there is no mat view, then we can't use it -- we can't replace that with just any mat view instead.

* GPUs and other special processing units have finite data transfer rates, so other people have proposed that they retain data on the GPU/SPU -- so we want to do a lookaside only for situations where the data is already prepared to handle a lookaside.

* The other cases I cited of in-memory data structures are all pre-arranged items with structures suited to processing particular types of query.

Given that I count 4-5 beneficial use cases for this index-like lookaside, it seems worth investing time in. It appears that Kaigai wishes for something else in addition to this concept, so there may be some confusion from that. I'm sure it will take a while to really understand all the ideas and possibilities.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [v9.5] Custom Plan API
On 8 May 2014 01:49, Kouhei Kaigai wrote:
>>> * ForeignScan node that is not associated with a particular foreign-table.
>>>   Once we try to apply ForeignScan node instead of Sort or Aggregate, existing
>>>   FDW implementation needs to be improved. These nodes scan on a materialized
>>>   relation (generated on the fly), however, existing FDW code assumes
>>>   ForeignScan node is always associated with a particular foreign-table.
>>>   We need to eliminate this restriction.
>>
>> I don't think we need to do that, given the above.
>>
> It makes a problem if ForeignScan is chosen as an alternative path of Join.
>
> The target-list of a Join node is determined according to the query form
> on the fly, so we cannot expect a particular TupleDesc to be known
> beforehand. Once we try to apply ForeignScan instead of a Join node, it
> has to have its TupleDesc depend on the set of joined relations.
>
> I think it is the more straightforward approach to allow a ForeignScan
> that is not associated with a particular (cataloged) relation.

From your description, my understanding is that you would like to stream data from 2 standard tables to the GPU, then perform a join on the GPU itself.

I have been told that is not likely to be useful because of the data transfer overheads.

Or did I misunderstand, and is this instead intended to get around the current lack of join pushdown into FDWs?

Can you be specific about the actual architecture you wish for, so we can understand how to generalise that into an API?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] 9.4 release notes
> I have completed the initial version of the 9.4 release notes. You can
> view them here:
>
>     http://www.postgresql.org/docs/devel/static/release-9-4.html
>
> I will be adding additional markup in the next few days.
>
> Feedback expected and welcomed. I expect to be modifying this until we
> release 9.4 final. I have marked items where I need help with question
> marks.

E.1.3.7.1. System Information Functions

    Add functions for error-free pg_class, pg_proc, pg_type, and
    pg_operator lookups (Yugo Nagata, Nozomi Anzai, Robert Haas)

    For example, to_regclass() does error-free lookups of pg_class, and
    returns NULL for lookup failures.

Probably "error-free" is too strong wording, because these functions are not actually error-free:

test=# select to_regclass('a.b.c.d');
ERROR:  improper relation name (too many dotted names): a.b.c.d
STATEMENT:  select to_regclass('a.b.c.d');

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
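The contrast being pointed out can be sketched in two lines (the first behavior is what the release-note item describes; the second is the error case shown above):

```sql
SELECT to_regclass('no_such_table');  -- NULL, no error: the lookup simply failed
SELECT to_regclass('a.b.c.d');        -- still raises "improper relation name"
```

So the functions avoid errors for *failed lookups* of syntactically valid names, but malformed input still errors out, which is why "error-free" overstates it.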
Re: [HACKERS] [v9.5] Custom Plan API
> > Let me list up the things to be clarified / developed randomly. > > > > * Join replacement by FDW; We still don't have consensus about join > replacement > > by FDW. Probably, it will be designed to remote-join implementation > primarily, > > however, things to do is similar. We may need to revisit the Hanada-san's > > proposition in the past. > > Agreed. We need to push down joins into FDWs and we need to push down > aggregates also, so they can be passed to FDWs. I'm planning to look at > aggregate push down. > Probably, it's a helpful feature. > > * Lookaside for ANY relations; I want planner to try GPU-scan for any > relations > > once installed, to reduce user's administration cost. > > It needs lookaside allow to specify a particular foreign-server, not > foreign- > > table, then create ForeignScan node that is not associated with a > particular > > foreign-table. > > IMHO we would not want to add indexes to every column, on every table, nor > would we wish to use lookaside for all tables. It is a good thing to be > able to add optimizations for individual tables. GPUs are not good for > everything; it is good to be able to leverage their strengths, yet avoid > their weaknesses. > > If do you want that, you can write an Event Trigger that automatically adds > a lookaside for any table. > It may be a solution if we try to replace scan on a relation by a ForeignScan, in other words, a case when we can describe 1:1 relationship between a table and a foreign-table; being alternatively scanned. Is it possible to fit a case when a ForeignScan replaces a built-in Join plans? I don't think it is a realistic assumption to set up lookaside configuration for all the possible combination of joins, preliminary. I have an idea; if lookaside accept a function, foreign-server or something subjective entity as an alternative path, it will be able to create paths on the fly, not only preconfigured foreign-tables. 
This idea would take two forms of DDL command, roughly: CREATE LOOKASIDE ON <target> TO <alternative>; CREATE LOOKASIDE ON <target> EXECUTE <function>; What is done internally is the same. The TO-form kicks a built-in routine, instead of a user-defined function, to add alternative scan/join paths according to the supplied table/matview/foreign table and so on. > > * ForeignScan node that is not associated with a particular foreign-table. > > Once we try to apply ForeignScan node instead of Sort or Aggregate, existing > > FDW implementation needs to be improved. These nodes scan on a materialized > > relation (generated on the fly), however, existing FDW code assumes > > ForeignScan node is always associated with a particular foreign-table. > > We need to eliminate this restriction. > > I don't think we need to do that, given the above. > That becomes a problem if a ForeignScan is chosen as the alternative path for a Join. The target list of a Join node is determined on the fly according to the query, so we cannot expect a particular TupleDesc in advance. If we apply a ForeignScan in place of a Join node, its TupleDesc has to depend on the set of joined relations. I think the more straightforward approach is to allow a ForeignScan that is not associated with a particular (cataloged) relation. > > * FDW method for MultiExec. In case when we can stack multiple ForeignScan > > nodes, it's helpful to support to exchange scanned tuples in their own > > data format. Let's assume two ForeignScan nodes are stacked. One performs > > like Sort, another performs like Scan. If they internally handle column- > > oriented data format, TupleTableSlot is not a best way for data exchange. > > I agree TupleTableSlot may not be best way for bulk data movement. We > probably need to look at buffering/bulk movement between executor nodes > in general, which would be of benefit for the FDW case also. > This would be a problem even for Custom Scans as originally presented also, > so I don't see much change there. 
> Yes. It is the reason why my Custom Scan proposition supports the MultiExec method. > > * Lookaside on the INSERT/UPDATE/DELETE. Probably, it can be implemented > > using writable FDW feature. Not a big issue, but don't forget it... > > Yes, possible. > > > I hope these ideas make sense. This is early days and there may be other > ideas and much detail yet to come. > I agree with the general direction. My biggest concern with FDW is transparency for the application. If lookaside allows a reference to a scan/join on regular relations to be redirected to a ForeignScan (as an alternative execution method), there is no strong reason to stick with custom plans. However, the existing ForeignScan node cannot work without a particular foreign table. That may become a restriction if we try to replace a Join node with a ForeignScan, and that is my worry. (It may well be solved in the course of the join-replacement-by-FDW work.) One other point I noticed. * SubPlan support; if an extension support its specia
Re: [HACKERS] bgworker crashed or not?
On 05/08/2014 05:45 AM, Robert Haas wrote: > I've pushed your rebased patch now with some further kibitzing, > especially in regards to the documentation. Thanks for tackling that, Robert. It's great to have that API sorted out before 9.4 hits and it's locked in. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PGDLLEXPORTing all GUCs?
On 05/08/2014 12:21 AM, Tom Lane wrote: > If Craig has a concrete argument why all GUCs should be accessible > to external modules, then let's see it Because they already are. The only difference here is that that access works only on !windows. I agree (strongly) that we should have a better-defined API in terms of what is "accessible to external modules" and what is not. However, we don't, as you stressed in a prior discussion when I raised the idea of using -fvisibility=hidden to limit access to some symbols. Given that we don't have any kind of external vs internal API division, why pretend we do just for one platform? As for just GUCs: I suggested GUCs because GUCs are what's been coming up repeatedly as an actual practical issue. I'd be quite happy to PGDLLEXPORT all extern vars, but I was confident that'd be rejected for aesthetic reasons, and thought that exporting all GUCs would be a reasonable compromise. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 12:06 PM, Josh Berkus wrote: > For that matter, our advice on shared_buffers ... and our design for it > ... is going to need to change radically soon, since Linux is getting an > ARC with a frequency cache as well as a recency cache, and FreeBSD and > OpenSolaris already have them. I knew about ZFS, but Linux is implementing ARC? There are good reasons to avoid ARC. CAR seems like a more plausible candidate, since it apparently acknowledges ARC's shortcomings and fixes them. -- Peter Geoghegan
Re: [HACKERS] [WIP] showing index maintenance on EXPLAIN
On Wed, May 7, 2014 at 7:03 PM, Josh Berkus wrote: > On 05/07/2014 05:00 PM, Jaime Casanova wrote: >> Hi, >> >> This patch implements $subject only when ANALYZE and VERBOSE are on. >> I made it that way because for years nobody seemed interested in this >> info (at least no one did it) so i decided that maybe is to much >> information for most people (actually btree indexes are normally very >> fast). > > What's "index maintenance"? > Maybe it's not the right term... I mean: the time it takes to update indexes on an INSERT/UPDATE operation -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157
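As a sketch of what that means in practice (hypothetical table and timings; the per-index output lines follow the format shown elsewhere in this thread for the patched EXPLAIN):

```sql
CREATE TABLE t1 (id serial PRIMARY KEY, t text UNIQUE);

EXPLAIN (ANALYZE, VERBOSE) INSERT INTO t1 (t) VALUES ('hello');
--  Insert on public.t1 (actual time=... rows=0 loops=1)
--    ->  Result (actual time=... rows=1 loops=1)
--  Index t1_t_key on t1: time=... rows=1
--  Index t1_pkey on t1: time=... rows=1
--  Total runtime: ... ms
```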
Re: [HACKERS] [WIP] showing index maintenance on EXPLAIN
On 05/07/2014 05:00 PM, Jaime Casanova wrote: > Hi, > > This patch implements $subject only when ANALYZE and VERBOSE are on. > I made it that way because for years nobody seemed interested in this > info (at least no one did it) so i decided that maybe is to much > information for most people (actually btree indexes are normally very > fast). What's "index maintenance"? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
[HACKERS] [WIP] showing index maintenance on EXPLAIN
Hi, This patch implements $subject only when ANALYZE and VERBOSE are on. I made it that way because for years nobody seemed interested in this info (at least no one did it), so I decided that maybe it's too much information for most people (actually btree indexes are normally very fast). But now that we have GiST and GIN this has become an interesting piece of data (there are other cases even when using btree). The current patch doesn't have docs yet; I will add them soon. -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157 From be4d94a7ca49b176d9f67476f2edde9e1f3d2a21 Mon Sep 17 00:00:00 2001 From: Jaime Casanova Date: Wed, 7 May 2014 18:28:46 -0500 Subject: [PATCH] Make EXPLAIN show measurements of updating indexes. This adds a line in the output of EXPLAIN ANALYZE VERBOSE showing the time it took to update indexes in INSERT/UPDATE operations. Also, add that info to the auto_explain module using a new variable auto_explain.log_indexes. 
--- contrib/auto_explain/auto_explain.c| 14 ++ src/backend/catalog/indexing.c |2 +- src/backend/commands/copy.c|6 +-- src/backend/commands/explain.c | 75 src/backend/executor/execUtils.c | 19 +++- src/backend/executor/nodeModifyTable.c |6 +-- src/include/commands/explain.h |1 + src/include/executor/executor.h|4 +- src/include/nodes/execnodes.h |2 + 9 files changed, 118 insertions(+), 11 deletions(-) diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c index c8ca7c4..f2c11e5 100644 --- a/contrib/auto_explain/auto_explain.c +++ b/contrib/auto_explain/auto_explain.c @@ -25,6 +25,7 @@ static int auto_explain_log_min_duration = -1; /* msec or -1 */ static bool auto_explain_log_analyze = false; static bool auto_explain_log_verbose = false; static bool auto_explain_log_buffers = false; +static bool auto_explain_log_indexes = false; static bool auto_explain_log_triggers = false; static bool auto_explain_log_timing = false; static int auto_explain_log_format = EXPLAIN_FORMAT_TEXT; @@ -114,6 +115,17 @@ _PG_init(void) NULL, NULL); + DefineCustomBoolVariable("auto_explain.log_indexes", + "Include index statistics in plans.", + "This has no effect unless log_analyze is also set.", + &auto_explain_log_indexes, + false, + PGC_SUSET, + 0, + NULL, + NULL, + NULL); + DefineCustomBoolVariable("auto_explain.log_triggers", "Include trigger statistics in plans.", "This has no effect unless log_analyze is also set.", @@ -307,6 +319,8 @@ explain_ExecutorEnd(QueryDesc *queryDesc) ExplainBeginOutput(&es); ExplainQueryText(&es, queryDesc); ExplainPrintPlan(&es, queryDesc); + if (es.analyze && auto_explain_log_indexes) +ExplainPrintIndexes(&es, queryDesc); if (es.analyze && auto_explain_log_triggers) ExplainPrintTriggers(&es, queryDesc); ExplainEndOutput(&es); diff --git a/src/backend/catalog/indexing.c b/src/backend/catalog/indexing.c index 4bf412f..b2d3a1e 100644 --- a/src/backend/catalog/indexing.c +++ b/src/backend/catalog/indexing.c @@ -46,7 +46,7 @@ 
CatalogOpenIndexes(Relation heapRel) resultRelInfo->ri_RelationDesc = heapRel; resultRelInfo->ri_TrigDesc = NULL; /* we don't fire triggers */ - ExecOpenIndices(resultRelInfo); + ExecOpenIndices(resultRelInfo, 0); return resultRelInfo; } diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 70ee7e5..fbb1f13 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -2179,7 +2179,7 @@ CopyFrom(CopyState cstate) 1, /* dummy rangetable index */ 0); - ExecOpenIndices(resultRelInfo); + ExecOpenIndices(resultRelInfo, 0); estate->es_result_relations = resultRelInfo; estate->es_num_result_relations = 1; @@ -2333,7 +2333,7 @@ CopyFrom(CopyState cstate) if (resultRelInfo->ri_NumIndices > 0) recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self), - estate); + estate, NULL); /* AFTER ROW INSERT Triggers */ ExecARInsertTriggers(estate, resultRelInfo, tuple, @@ -2440,7 +2440,7 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid, ExecStoreTuple(bufferedTuples[i], myslot, InvalidBuffer, false); recheckIndexes = ExecInsertIndexTuples(myslot, &(bufferedTuples[i]->t_self), - estate); + estate, NULL); ExecARInsertTriggers(estate, resultRelInfo, bufferedTuples[i], recheckIndexes); diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 08f3167..8e2b10d 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -502,6 +502,10 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
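To enable the new setting the patch adds, the session setup would look something like this (a sketch; auto_explain.log_indexes is the new GUC from the patch above, the other parameters already exist in auto_explain):

```sql
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;  -- log plans for all statements
SET auto_explain.log_analyze = on;      -- required, or log_indexes has no effect
SET auto_explain.log_indexes = on;      -- new: include per-index update timings
```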
Re: [HACKERS] bgworker crashed or not?
On 07/05/14 23:45, Robert Haas wrote: It was, but I felt that the different pieces of this should be separated into separate commits, and (regardless of how I committed it) I needed to review each change separately. I wasn't saying it was your fault that it wasn't done; just that I hadn't gotten it committed yet. Oh ok, I got confused and thought you might have overlooked that part while modifying related parts of code. I've pushed your rebased patch now with some further kibitzing, especially in regards to the documentation. Cool, thanks :) -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pg_shmem_allocations view
On 2014-05-07 17:48:15 -0400, Robert Haas wrote: > On Tue, May 6, 2014 at 6:09 PM, Andres Freund wrote: > >> I guess I'd vote for > >> ditching the allocated column completely and outputting the memory > >> allocated without ShmemIndex using some fixed tag (like "ShmemIndex" > >> or "Bootstrap" or "Overhead" or something). > > > > My way feels slightly cleaner, but I'd be ok with that as well. There's > > no possible conflicts with an actual segment... In your variant the > > unallocated/slop memory would continue to have a NULL key? > > Yeah, that seems all right. Hm. Not sure what you're ACKing here ;). > One way to avoid conflict with an actual segment would be to add an > after-the-fact entry into ShmemIndex representing the amount of memory > that was used to bootstrap it. There's lots of allocations from shmem that cannot be associated with any index entry though. Not just ShmemIndex's own entry. Most prominently, most of the memory used for SharedBufHash isn't actually associated with the "Shared Buffer Lookup Table" entry - imo a dynahash.c deficiency. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 2014-05-07 16:24:53 -0500, Merlin Moncure wrote: > On Wed, May 7, 2014 at 4:15 PM, Andres Freund wrote: > > On 2014-05-07 13:51:57 -0700, Jeff Janes wrote: > >> On Wed, May 7, 2014 at 11:38 AM, Andres Freund > >> wrote: > >> > >> > On 2014-05-07 13:32:41 -0500, Merlin Moncure wrote: > >> > > > >> > > *) raising shared buffers does not 'give more memory to postgres for > >> > > caching' -- it can only reduce it via double paging > >> > > >> > That's absolutely not a necessary consequence. If pages are in s_b for a > >> > while the OS will be perfectly happy to throw them away. > >> > > >> > >> Is that an empirical observation? > > > > Yes. > > > >> I've run some simulations a couple years > >> ago, and also wrote some instrumentation to test that theory under > >> favorably engineered (but still plausible) conditions, and couldn't get > >> more than a small fraction of s_b to be so tightly bound in that the kernel > >> could forget about them. Unless of course the entire workload or close to > >> it fits in s_b. > > > > I think it depends on your IO access patterns. If the whole working set > > fits into the kernel's page cache and there's no other demand for pages > > it will stay in. If you constantly rewrite most all your pages they'll > > also stay in the OS cache because they'll get written out. If the churn > > in shared_buffers is so high (because it's so small in comparison to the > > core hot data set) that there'll be dozens if not hundreds clock sweeps > > a second you'll also have no locality. > > It's also *hugely* kernel version specific :( > > right. This is, IMNSHO, exactly the sort of language that belongs in the > docs. Well, that's just the tip of the iceberg though. Whether you can accept small shared_buffers to counteract double buffering or not is also a hard to answer question... That again heavily depends on the usage patterns. 
If you have high concurrency and your working set has some locality it's very important to have a high s_b lest you fall afoul of the freelist lock. If you have high concurrency but 90%+ of your page lookups *aren't* going to be in the cache, you need to be very careful with a large s_b, because the clock sweeps to lower the usage counts can enlarge the lock contention. Then there are both memory and cache efficiency questions around the PrivateRefCount array and the lwlocks. In short: I think it's pretty hard to transfer this into language that's a) agreed upon b) understandable to someone that hasn't discovered several of the facts for him/herself. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pg_shmem_allocations view
On Tue, May 6, 2014 at 6:09 PM, Andres Freund wrote: >> I guess I'd vote for >> ditching the allocated column completely and outputting the memory >> allocated without ShmemIndex using some fixed tag (like "ShmemIndex" >> or "Bootstrap" or "Overhead" or something). > > My way feels slightly cleaner, but I'd be ok with that as well. There's > no possible conflicts with an actual segment... In your variant the > unallocated/slop memory would continue to have a NULL key? Yeah, that seems all right. One way to avoid conflict with an actual segment would be to add an after-the-fact entry into ShmemIndex representing the amount of memory that was used to bootstrap it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
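Under that scheme a query against the proposed view might look like this (a sketch; the view is still WIP, and "Bootstrap" is just one of the fixed tags suggested upthread for the bootstrap memory):

```sql
SELECT key, size
FROM pg_shmem_allocations
ORDER BY size DESC;
-- e.g. rows for "Buffer Blocks", "Shared Buffer Lookup Table", "Bootstrap", ...
```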
Re: [HACKERS] bgworker crashed or not?
On Wed, May 7, 2014 at 4:55 PM, Petr Jelinek wrote: >> This isn't done yet. > > Unless I am missing something this change was included in every patch I sent > - setting rw->rw_terminate = true; in CleanupBackgroundWorker for zero exit > code + comment changes. Or do you have objections to this approach? > > Anyway missing parts attached. It was, but I felt that the different pieces of this should be separated into separate commits, and (regardless of how I committed it) I needed to review each change separately. I wasn't saying it was your fault that it wasn't done; just that I hadn't gotten it committed yet. I've pushed your rebased patch now with some further kibitzing, especially in regards to the documentation. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 2:24 PM, Merlin Moncure wrote: > right. This is, IMNSHO, exactly the sort of language that belongs in the > docs. +1 -- Peter Geoghegan
Re: [HACKERS] jsonb existence queries are misimplemented by jsonb_ops
On Wed, May 7, 2014 at 1:39 PM, Tom Lane wrote: > The indexscan is incorrectly returning rows where the queried key exists > but isn't at top-level. > > We could fix this either by giving up on no-recheck for existence queries, > or by changing the way that non-top-level keys get indexed. However > I suspect the latter would break containment queries, or at least make > their lives a lot more difficult. Thanks for looking into this. I don't know that it would make life all that much harder for containment. jsonb_ops is already not compelling for testing containment where there is a lot of nesting. ISTM that make_scalar_key() could be taught to respect that distinction. ISTM that containment will indifferently treat the top-level keys as top level, and the nested keys as nested, because if you're looking at something nested you have to put a top-level key in the rhs value to do so, and it will be marked as top level, or not top level in a bijective fashion. -- Peter Geoghegan
Re: [HACKERS] jsonb existence queries are misimplemented by jsonb_ops
Peter Geoghegan writes: > On Wed, May 7, 2014 at 1:47 PM, Tom Lane wrote: >> No, wait, containment *doesn't* look into sub-objects: >> >> regression=# select * from j where f1 @> '{"foo": {"bar": "baz"}}'; >> f1 >> - >> {"foo": {"bar": "baz"}} >> (1 row) >> >> regression=# select * from j where f1 @> '{"bar": "baz"}'; >> f1 >> >> (0 rows) > Yes it does. It's just not intended to work like that. You have to > "give a path" to what you're looking for. Right, so the top-level item in the query has to be at top level in the searched-for row. This means that we *could* distinguish top-level from non-top-level keys in the index items, and still be able to generate correct index probes for a containment search (since we could similarly mark the generated keys as to whether they came from top level in the query object). So the question is whether that's a good idea or not. It seems like it is a good idea for the current existence operator, and about a wash for the current containment operator, but perhaps a bad idea for other definitions of containment. In any case, something here needs to change. Thoughts? regards, tom lane
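To restate the bug in query form (a sketch; the operator semantics follow the documented definition of ?, which tests only top-level keys):

```sql
SELECT '{"foo": {"bar": "baz"}}'::jsonb ? 'foo';  -- true: top-level key
SELECT '{"foo": {"bar": "baz"}}'::jsonb ? 'bar';  -- false per the operator,
-- yet the current jsonb_ops indexscan wrongly returns such rows without a recheck
```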
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 4:15 PM, Andres Freund wrote: > On 2014-05-07 13:51:57 -0700, Jeff Janes wrote: >> On Wed, May 7, 2014 at 11:38 AM, Andres Freund wrote: >> >> > On 2014-05-07 13:32:41 -0500, Merlin Moncure wrote: >> > > >> > > *) raising shared buffers does not 'give more memory to postgres for >> > > caching' -- it can only reduce it via double paging >> > >> > That's absolutely not a necessary consequence. If pages are in s_b for a >> > while the OS will be perfectly happy to throw them away. >> > >> >> Is that an empirical observation? > > Yes. > >> I've run some simulations a couple years >> ago, and also wrote some instrumentation to test that theory under >> favorably engineered (but still plausible) conditions, and couldn't get >> more than a small fraction of s_b to be so tightly bound in that the kernel >> could forget about them. Unless of course the entire workload or close to >> it fits in s_b. > > I think it depends on your IO access patterns. If the whole working set > fits into the kernel's page cache and there's no other demand for pages > it will stay in. If you constantly rewrite most all your pages they'll > also stay in the OS cache because they'll get written out. If the churn > in shared_buffers is so high (because it's so small in comparison to the > core hot data set) that there'll be dozens if not hundreds clock sweeps > a second you'll also have no locality. > It's also *hugely* kernel version specific :( right. This is, IMNSHO, exactly the sort of language that belongs in the docs. merlin
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
On 05/07/2014 04:13 PM, Tom Lane wrote: A README file would be better, perhaps, but there's not a specific directory associated with the jsonb code; so I think this sort of info belongs either in jsonb.h or in the file header comment for jsonb_gin.c. Is there any reason we couldn't have a README.jsonb? cheers andrew
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 2014-05-07 13:51:57 -0700, Jeff Janes wrote: > On Wed, May 7, 2014 at 11:38 AM, Andres Freund wrote: > > > On 2014-05-07 13:32:41 -0500, Merlin Moncure wrote: > > > > > > *) raising shared buffers does not 'give more memory to postgres for > > > caching' -- it can only reduce it via double paging > > > > That's absolutely not a necessary consequence. If pages are in s_b for a > > while the OS will be perfectly happy to throw them away. > > > > Is that an empirical observation? Yes. > I've run some simulations a couple years > ago, and also wrote some instrumentation to test that theory under > favorably engineered (but still plausible) conditions, and couldn't get > more than a small fraction of s_b to be so tightly bound in that the kernel > could forget about them. Unless of course the entire workload or close to > it fits in s_b. I think it depends on your IO access patterns. If the whole working set fits into the kernel's page cache and there's no other demand for pages it will stay in. If you constantly rewrite most all your pages they'll also stay in the OS cache because they'll get written out. If the churn in shared_buffers is so high (because it's so small in comparison to the core hot data set) that there'll be dozens if not hundreds clock sweeps a second you'll also have no locality. It's also *hugely* kernel version specific :( Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] jsonb existence queries are misimplemented by jsonb_ops
On Wed, May 7, 2014 at 1:51 PM, Peter Geoghegan wrote: > Yes it does. It's just not intended to work like that. You have to > "give a path" to what you're looking for. There is exactly one exception explicitly noted as such in the user-visible docs, which is arrays and the containment of single elements. -- Peter Geoghegan
Re: [HACKERS] bgworker crashed or not?
On 07/05/14 22:32, Robert Haas wrote: On Tue, May 6, 2014 at 8:25 PM, Petr Jelinek wrote: I've committed the portion of your patch which does this, with pretty extensive changes. I moved the function which resets the crash times to bgworker.c, renamed it, and made it just reset all of the crash times unconditionally; there didn't seem to be any point in skipping the irrelevant ones, so it seemed best to keep things simple. I also moved the call from reaper() where you had it to PostmasterStateMachine(), because the old placement did not work for me when I carried out the following steps: 1. Kill a background worker with a 60-second restart time using SIGTERM, so that it does exit(1). 2. Before it restarts, kill a regular backend with SIGQUIT. Let me know if you think I've got it wrong. No I think it's fine, I didn't try that combination and just wanted to put it as deep in the call as possible. (2) If a shmem-connected backend fails to release the deadman switch or exits with an exit code other than 0 or 1, we crash-and-restart. A non-shmem-connected backend never causes a crash-and-restart. +1 I did this as a separate commit, e2ce9aa27bf20eff2d991d0267a15ea5f7024cd7, just moving the check for ReleasePostmasterChildSlot inside the if statement. Then I realized that was bogus, so I just pushed eee6cf1f337aa488a20e9111df446cdad770e645 to fix that. Hopefully it's OK now. The fixed one looks ok to me. (3) When a background worker exits without triggering a crash-and-restart, an exit code of precisely 0 causes the worker to be unregistered; any other exit code has no special effect, so bgw_restart_time controls. +1 This isn't done yet. Unless I am missing something this change was included in every patch I sent - setting rw->rw_terminate = true; in CleanupBackgroundWorker for zero exit code + comment changes. Or do you have objections to this approach? Anyway missing parts attached. 
-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services diff --git a/doc/src/sgml/bgworker.sgml b/doc/src/sgml/bgworker.sgml index fd32d6c..5f86100 100644 --- a/doc/src/sgml/bgworker.sgml +++ b/doc/src/sgml/bgworker.sgml @@ -166,10 +166,12 @@ typedef struct BackgroundWorker - Background workers are expected to be continuously running; if they exit - cleanly, postgres will restart them immediately. Consider doing - interruptible sleep when they have nothing to do; this can be achieved by - calling WaitLatch(). Make sure the + Background workers are normally expected to be running continuously; + they get restarted automaticaly in case of crash (see + bgw_restart_time documentation for details), + but if they exit cleanly, postgres will not restart them. + Consider doing interruptible sleep when they have nothing to do; this can be + achieved by calling WaitLatch(). Make sure the WL_POSTMASTER_DEATH flag is set when calling that function, and verify the return code for a prompt exit in the emergency case that postgres itself has terminated. diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c index 64c9722..b642f37 100644 --- a/src/backend/postmaster/bgworker.c +++ b/src/backend/postmaster/bgworker.c @@ -884,8 +884,9 @@ RegisterDynamicBackgroundWorker(BackgroundWorker *worker, * running but is no longer. * * In the latter case, the worker may be stopped temporarily (if it is - * configured for automatic restart, or if it exited with code 0) or gone - * for good (if it is configured not to restart and exited with code 1). + * configured for automatic restart and exited with code 1) or gone for + * good (if it exited with code other than 1 or if it is configured not + * to restart). 
*/ BgwHandleStatus GetBackgroundWorkerPid(BackgroundWorkerHandle *handle, pid_t *pidp) diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 79d1c50..a0331e6 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -2849,7 +2849,11 @@ CleanupBackgroundWorker(int pid, if (!EXIT_STATUS_0(exitstatus)) rw->rw_crashed_at = GetCurrentTimestamp(); else + { + /* Zero exit status means terminate */ rw->rw_crashed_at = 0; + rw->rw_terminate = true; + } /* * Additionally, for shared-memory-connected workers, just like a diff --git a/src/include/postmaster/bgworker.h b/src/include/postmaster/bgworker.h index c9550cc..d3dd1f2 100644 --- a/src/include/postmaster/bgworker.h +++ b/src/include/postmaster/bgworker.h @@ -16,10 +16,11 @@ * that the failure can only be transient (fork failure due to high load, * memory pressure, too many processes, etc); more permanent problems, like * failure to connect to a database, are detected later in the worker and dealt - * with just by having the worker exit normally. A worker which exits with a - * return code of 0 will be immediately re
Re: [HACKERS] jsonb existence queries are misimplemented by jsonb_ops
On Wed, May 7, 2014 at 1:47 PM, Tom Lane wrote: > No, wait, containment *doesn't* look into sub-objects: > > regression=# select * from j where f1 @> '{"foo": {"bar": "baz"}}'; >f1 > - > {"foo": {"bar": "baz"}} > (1 row) > > regression=# select * from j where f1 @> '{"bar": "baz"}'; > f1 > > (0 rows) Yes it does. It's just not intended to work like that. You have to "give a path" to what you're looking for. -- Peter Geoghegan
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 11:38 AM, Andres Freund wrote: > On 2014-05-07 13:32:41 -0500, Merlin Moncure wrote: > > > > *) raising shared buffers does not 'give more memory to postgres for > > caching' -- it can only reduce it via double paging > > That's absolutely not a necessary consequence. If pages are in s_b for a > while the OS will be perfectly happy to throw them away. > Is that an empirical observation? I ran some simulations a couple of years ago, and also wrote some instrumentation to test that theory under favorably engineered (but still plausible) conditions, and couldn't get more than a small fraction of s_b to be so tightly bound in that the kernel could forget about them. Unless of course the entire workload or close to it fits in s_b. Cheers, Jeff
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 05/07/2014 01:36 PM, Jeff Janes wrote: > On Wed, May 7, 2014 at 11:04 AM, Josh Berkus wrote: >> Unfortunately nobody has the time/resources to do the kind of testing >> required for a new recommendation for shared_buffers. > I think it is worse than that. I don't think we know what such testing > would even look like. SSD? BBU? max_connections=2 with 256 cores? > pgbench -N? capture and replay of Amazon's workload? > > If we could spell out/agree upon what kind of testing we would find > convincing, that would probably go a long way to getting some people to > work on carrying out the tests. Unless the conclusion was "please have 3TB > or RAM and a 50 disk RAID", then there might be few takers. Well, step #1 would be writing some easy-to-run benchmarks which carry out selected workloads and measure response times. The minimum starting set would include one OLTP/Web benchmark, and one DW benchmark. I'm not talking about the software to run the workload; we have that, in several varieties. I'm talking about the actual database generator and queries to run. That's the hard work. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb existence queries are misimplemented by jsonb_ops
I wrote: > Another idea would be to change the definition of the exists operator > so that it *does* look into sub-objects. It seems rather random to me > that containment looks into sub-objects but exists doesn't. However, > possibly there are good reasons for the non-orthogonality. No, wait, containment *doesn't* look into sub-objects: regression=# select * from j where f1 @> '{"foo": {"bar": "baz"}}'; f1 - {"foo": {"bar": "baz"}} (1 row) regression=# select * from j where f1 @> '{"bar": "baz"}'; f1 (0 rows) This is rather surprising in view of the way that section 8.14.4 goes on about nesting. But I guess the user-facing docs for jsonb are in little better shape than the internal docs. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
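The top-level-only containment behavior Tom demonstrates can be sketched in Python (a hypothetical simplification covering objects only; the real jsonb @> operator also handles arrays and scalar operands):

```python
def contains(doc, query):
    """Sketch of jsonb @> semantics for objects: every key/value pair in
    `query` must match starting at the top level of `doc`; the operator
    does not go hunting for query keys inside nested sub-objects."""
    if isinstance(query, dict):
        if not isinstance(doc, dict):
            return False
        return all(k in doc and contains(doc[k], v) for k, v in query.items())
    return doc == query

doc = {"foo": {"bar": "baz"}}
contains(doc, {"foo": {"bar": "baz"}})  # True: matches Tom's first query
contains(doc, {"bar": "baz"})           # False: the second query returns 0 rows
```

The recursion only descends along keys that the query itself names, which is why `{"bar": "baz"}` is not contained even though it appears verbatim one level down.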
Re: [HACKERS] 9.4 checksum errors in recovery with gin index
On 05/07/2014 10:35 AM, Jeff Janes wrote: When recovering from a crash (with injection of a partial page write at time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum verification failure. 16396 is a gin index. If I have it ignore checksum failures, there is no apparent misbehavior. I'm trying to bisect it, but it could take a while and I thought someone might have some theories based on the log: 29075 2014-05-06 23:29:51.411 PDT:LOG: 0: database system was not properly shut down; automatic recovery in progress 29075 2014-05-06 23:29:51.411 PDT:LOCATION: StartupXLOG, xlog.c:6361 29075 2014-05-06 23:29:51.412 PDT:LOG: 0: redo starts at 11/323FE1C0 29075 2014-05-06 23:29:51.412 PDT:LOCATION: StartupXLOG, xlog.c:6600 29075 2014-05-06 23:29:51.471 PDT:WARNING: 01000: page verification failed, calculated checksum 35967 but expected 7881 29075 2014-05-06 23:29:51.471 PDT:CONTEXT: xlog redo Delete list pages A-ha. The WAL record of list page deletion doesn't create a full-page images of the deleted pages. That's pretty sensible, as a deleted page doesn't contain any data that needs to be retained. However, if a full-page image is not taken, then the page should be completely recreated from scratch at replay. It's not doing that. Thanks for the testing! I'll have a stab at fixing that tomorrow. Basically, ginRedoDeleteListPages() needs to re-initialize the deleted pages. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] jsonb existence queries are misimplemented by jsonb_ops
I wrote: > The readability of that comment starts to go downhill with its use of > "reset" to refer to what everything else calls a "recheck" flag, and in > any case it's claiming that we *don't* need a recheck for exists (a > statement I suspect to be false, but more later). And, indeed, it's false: regression=# create table j (f1 jsonb); CREATE TABLE regression=# insert into j values ('{"foo": {"bar": "baz"}}'); INSERT 0 1 regression=# insert into j values ('{"foo": {"blah": "baz"}}'); INSERT 0 1 regression=# insert into j values ('{"fool": {"bar": "baz"}}'); INSERT 0 1 regression=# create index on j using gin(f1); CREATE INDEX regression=# select * from j where f1 ? 'bar'; f1 (0 rows) regression=# set enable_seqscan to 0; SET regression=# select * from j where f1 ? 'bar'; f1 -- {"foo": {"bar": "baz"}} {"fool": {"bar": "baz"}} (2 rows) The indexscan is incorrectly returning rows where the queried key exists but isn't at top-level. We could fix this either by giving up on no-recheck for existence queries, or by changing the way that non-top-level keys get indexed. However I suspect the latter would break containment queries, or at least make their lives a lot more difficult. Another idea would be to change the definition of the exists operator so that it *does* look into sub-objects. It seems rather random to me that containment looks into sub-objects but exists doesn't. However, possibly there are good reasons for the non-orthogonality. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
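The false positives above follow from what the index stores. A minimal Python sketch (not the actual GIN code; `extract_keys` and `exists_top_level` are illustrative names) of why a recheck is unavoidable when keys from every nesting level land in the index but `?` only tests the top level:

```python
def extract_keys(value, keys=None):
    """Collect object keys from every nesting level, roughly how
    jsonb_ops indexes keys (a deliberate simplification)."""
    if keys is None:
        keys = set()
    if isinstance(value, dict):
        for k, v in value.items():
            keys.add(k)
            extract_keys(v, keys)
    elif isinstance(value, list):
        for v in value:
            extract_keys(v, keys)
    return keys

def exists_top_level(doc, key):
    """The ? operator: the key must exist at the top level only."""
    return isinstance(doc, dict) and key in doc

rows = [{"foo": {"bar": "baz"}},
        {"foo": {"blah": "baz"}},
        {"fool": {"bar": "baz"}}]

# The index scan alone returns every row whose key set contains "bar"...
candidates = [r for r in rows if "bar" in extract_keys(r)]
# ...so a recheck against the actual row is needed to get the right answer:
matches = [r for r in candidates if exists_top_level(r, "bar")]
```

Here `candidates` holds the two rows the broken indexscan returned, while `matches` is empty, agreeing with the seqscan.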
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 11:04 AM, Josh Berkus wrote: > On 05/06/2014 10:35 PM, Peter Geoghegan wrote: > > +1. In my view, we probably should have set it to a much higher > > absolute default value. The main problem with setting it to any > > multiple of shared_buffers that I can see is that shared_buffers is a > > very poor proxy for what effective_cache_size is supposed to > > represent. In general, the folk wisdom around sizing shared_buffers > > has past its sell-by date. > > Unfortunately nobody has the time/resources to do the kind of testing > required for a new recommendation for shared_buffers. > I think it is worse than that. I don't think we know what such testing would even look like. SSD? BBU? max_connections=2 with 256 cores? pgbench -N? capture and replay of Amazon's workload? If we could spell out/agree upon what kind of testing we would find convincing, that would probably go a long way to getting some people to work on carrying out the tests. Unless the conclusion was "please have 3TB or RAM and a 50 disk RAID", then there might be few takers. Cheers, Jeff
Re: [HACKERS] bgworker crashed or not?
On Tue, May 6, 2014 at 8:25 PM, Petr Jelinek wrote: >> What I'm inclined to do is change the logic so that: >> >> (1) After a crash-and-restart sequence, zero rw->rw_crashed_at, so >> that anything which is still registered gets restarted immediately. > > Yes, that's quite obvious change which I missed completely :). I've committed the portion of your patch which does this, with pretty extensive changes. I moved the function which resets the crash times to bgworker.c, renamed it, and made it just reset all of the crash times unconditionally; there didn't seem to be any point in skipping the irrelevant ones, so it seemed best to keep things simple. I also moved the call from reaper() where you had it to PostmasterStateMachine(), because the old placement did not work for me when I carried out the following steps: 1. Kill a background worker with a 60-second restart time using SIGTERM, so that it does exit(1). 2. Before it restarts, kill a regular backend with SIGQUIT. Let me know if you think I've got it wrong. >> (2) If a shmem-connected backend fails to release the deadman switch >> or exits with an exit code other than 0 or 1, we crash-and-restart. A >> non-shmem-connected backend never causes a crash-and-restart. > > +1 I did this as a separate commit, e2ce9aa27bf20eff2d991d0267a15ea5f7024cd7, just moving the check for ReleasePostmasterChildSlot inside the if statement. Then I realized that was bogus, so I just pushed eee6cf1f337aa488a20e9111df446cdad770e645 to fix that. Hopefully it's OK now. >> (3) When a background worker exits without triggering a >> crash-and-restart, an exit code of precisely 0 causes the worker to be >> unregistered; any other exit code has no special effect, so >> bgw_restart_time controls. > > +1 This isn't done yet. 
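The rules (1)-(3) being agreed on here can be sketched as a small decision function (illustrative only, not the postmaster's actual code; `on_worker_exit` is a hypothetical name, though BGW_NEVER_RESTART is the real API's sentinel):

```python
BGW_NEVER_RESTART = -1  # sentinel value, as in PostgreSQL's bgworker API

def on_worker_exit(exit_code, shmem_connected, restart_time):
    """What the postmaster should do when a background worker exits,
    per the proposed semantics (a sketch under stated assumptions)."""
    if shmem_connected and exit_code not in (0, 1):
        return "crash-and-restart"           # rule (2): treat as a crash
    if exit_code == 0:
        return "unregister"                  # rule (3): exit 0 drops the worker
    if restart_time == BGW_NEVER_RESTART:
        return "stopped for good"
    return "restart after %ds" % restart_time  # bgw_restart_time controls
```

A non-shmem-connected worker never triggers the crash-and-restart branch, matching point (2).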
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
Peter Geoghegan writes: > On Wed, May 7, 2014 at 12:27 PM, Tom Lane wrote: >> The jsonb_ops storage format for values is inherently lossy, because it >> cannot distinguish the string values "n", "t", "f" from JSON null or >> boolean true, false respectively; nor does it know the difference between >> a number and a string containing digits. This appears to not quite be a >> bug because the consistent functions force recheck for all queries that >> care about values (as opposed to keys). But if it's documented anywhere >> I don't see where. > The fact that we *don't* set the reset flag for > JsonbExistsStrategyNumber, and why that's okay is prominently > documented. So I'd say that it is. Meh. Are you talking about the large comment block in gin_extract_jsonb? The readability of that comment starts to go downhill with its use of "reset" to refer to what everything else calls a "recheck" flag, and in any case it's claiming that we *don't* need a recheck for exists (a statement I suspect to be false, but more later). It entirely fails to explain that other query types *do* need a recheck because of arbitrary decisions about not representing JSON datatype information fully. There's another comment in gin_consistent_jsonb that's just as misleading, because it mentions (vaguely) some reasons why recheck is necessary without mentioning this one. A larger issue here is that, to the extent that this comment even has the information I'm after, the comment is in the wrong place. It is not attached to the code that's actually making the lossy representational choices (ie, make_scalar_key), nor to the code that decides whether or not a recheck is needed (ie, gin_consistent_jsonb). I don't think that basic architectural choices like these should be relegated to comment blocks inside specific functions to begin with. 
A README file would be better, perhaps, but there's not a specific directory associated with the jsonb code; so I think this sort of info belongs either in jsonb.h or in the file header comment for jsonb_gin.c. > Anyway, doing things that way for values won't obviate the need to set > the reset flag, unless you come up with a much more sophisticated > scheme. Existence (of keys) is only tested in respect of the > top-level. Containment (where values are tested) is more complicated. I'm not expecting that it will make things better for the current query operators. I am thinking that exact rather than lossy storage might help for future operators wanting to use this same index representation. Once we ship 9.4, it's going to be very hard to change the index representation, especially for the default opclass (cf the business about which opclass is default for type inet). So it behooves us to not throw information away needlessly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 checksum errors in recovery with gin index
On Wed, May 7, 2014 at 10:34 AM, Andres Freund wrote: > Hi, > > On 2014-05-07 10:21:26 -0700, Jeff Janes wrote: > > On Wed, May 7, 2014 at 12:48 AM, Andres Freund >wrote: > > > If you have the WAL a pg_xlogdump grepping for everything referring to > > > that block would be helpful. > > > > > > > The only record which mentions block 28486 by name is this one: > > Hm, try running it with -b specified. > Still nothing more. > > > rmgr: Gin len (rec/tot): 1576/ 1608, tx: 77882205, lsn: > > 11/30F4C2C0, prev 11/30F4C290, bkp: , desc: Insert new list page, > node: > > 1663/16384/16396 blkno: 28486 > > > > However, I think that that record precedes the recovery start point. > > If that's the case it seems likely that a PageSetLSN() or PageSetDirty() > are missing somewhere... > Wouldn't that result in corrupted data after crash recovery when checksum failures are ignored? I haven't seen any of that. Cheers, Jeff
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
Peter Geoghegan writes: > On Wed, May 7, 2014 at 12:08 PM, Tom Lane wrote: >> Is this an obsolete speculation about writing jsonb_hash_ops, or is there >> still something worth preserving there? If so, what? > This is not obsolete. It would equally apply to a GiST opclass that > wanted to support our current definition of existence. Array elements > are keys simply by virtue of being strings, but otherwise are treated > as values. See the large comment block within gin_extract_jsonb(). It's not that aspect I'm complaining about, it's the bit about "one day we may wish to write". This comment should restrict itself to describing what jsonb_ops does, not make already-or-soon-to-be-obsolete statements about what other opclasses might or might not do. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
On Wed, May 7, 2014 at 12:27 PM, Tom Lane wrote: > The jsonb_ops storage format for values is inherently lossy, because it > cannot distinguish the string values "n", "t", "f" from JSON null or > boolean true, false respectively; nor does it know the difference between > a number and a string containing digits. This appears to not quite be a > bug because the consistent functions force recheck for all queries that > care about values (as opposed to keys). But if it's documented anywhere > I don't see where. The fact that we *don't* set the reset flag for JsonbExistsStrategyNumber, and why that's okay is prominently documented. So I'd say that it is. > And in any case, is it a good idea? We could fairly > easily change things so that these cases are guaranteed distinguishable. > We're using an entire byte to convey one bit of information (key or > value); I'm inclined to redefine the flag byte so that it tells not just > that but which JSON datatype is involved. It seemed simpler to do it that way. As I've said before, jsonb_ops is mostly concerned with hstore-style indexing. It could also be particularly useful for expressional indexes on "tags" arrays of strings, which is a common use-case. jsonb_hash_ops on the other hand is for testing containment, which is useful for querying heavily nested documents, where typically there is a very low selectivity for keys. It's not the default largely because I was concerned about not supporting all indexable operators by default, and because I suspected that it would be preferred to have a default closer to hstore. Anyway, doing things that way for values won't obviate the need to set the reset flag, unless you come up with a much more sophisticated scheme. Existence (of keys) is only tested in respect of the top-level. Containment (where values are tested) is more complicated. 
-- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 2:58 PM, Peter Geoghegan wrote: > On Wed, May 7, 2014 at 11:50 AM, Robert Haas wrote: >> But that does not mean, as the phrase "folk >> wisdom" might be taken to imply, that we don't know anything at all >> about what actually works well in practice. > > Folk wisdom doesn't imply that. It implies that we think this works, > and we may well be right, but there isn't all that much rigor behind > some of it. I'm not blaming anyone for this state of affairs. I've > heard plenty of people repeat the "don't exceed 8GB" rule - I > regularly repeated it myself. I cannot find any rigorous defense of > this, though. If you're aware of one, please point it out to me. I'm not sure the level of rigor you'd like to see is going to be available here. Complex systems have complex behavior; that's life. At any rate, I'm not aware of any rigorous defense of the "don't exceed 8GB" rule. But, #1, I'd never put it that simply. What I've found is more like this: If it's possible to size shared_buffers so that the working set fits entirely within shared_buffers, that configuration is worthy of strong consideration. Otherwise, you probably want to keep shared_buffers low in order to avoid checkpoint-related I/O spikes and minimize double buffering; try 25% of system memory up to 512MB on Windows or up to 2GB on 32-bit Linux or up to 8GB on 64-bit Linux for starters, and then tune based on your workload. And #2, I think the origin of the 8GB number on 64-bit non-Windows systems is that people found that checkpoint-related I/O spikes became intolerable when you went too much above that number. On some systems, the threshold is lower than that - for example, I believe Merlin and others have reported numbers more like 2GB than 8GB - and on other systems, the threshold is higher - indeed, some people go way higher and never hit it at all. 
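Those starting points can be written down as a toy heuristic (assumption-laden: the caps are the starting values named above, not hard limits, and the final setting still has to be tuned against the actual workload):

```python
def shared_buffers_start_gb(ram_gb, platform):
    """Starting-point heuristic from this discussion: 25% of system
    memory, capped per platform; tune from there (a sketch, not an
    official recommendation)."""
    caps_gb = {"windows": 0.5, "linux32": 2.0, "linux64": 8.0}
    return min(ram_gb * 0.25, caps_gb[platform])

shared_buffers_start_gb(64, "linux64")  # 8.0 -- the cap kicks in
shared_buffers_start_gb(16, "linux64")  # 4.0 -- plain 25% of RAM
```

The point of the cap is exactly the checkpoint-related I/O spikes described above; on systems where those never bite, people go well past it.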
I agree that it would be nice to better-characterize why different users hit it at different levels, but it's probably highly dependent on hardware, workload, and kernel version, so I tend to doubt it can be characterized very simply. If I had go to guess, I'd bet that fixing Linux's abominable behavior around the fsync() call would probably go a long way toward making higher values of shared_buffers more practical. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
And while I'm looking at it ... The jsonb_ops storage format for values is inherently lossy, because it cannot distinguish the string values "n", "t", "f" from JSON null or boolean true, false respectively; nor does it know the difference between a number and a string containing digits. This appears to not quite be a bug because the consistent functions force recheck for all queries that care about values (as opposed to keys). But if it's documented anywhere I don't see where. And in any case, is it a good idea? We could fairly easily change things so that these cases are guaranteed distinguishable. We're using an entire byte to convey one bit of information (key or value); I'm inclined to redefine the flag byte so that it tells not just that but which JSON datatype is involved. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
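The collision, and the proposed fix of spending the flag byte on type information, can be illustrated with a small Python sketch (hypothetical names; the real flag is a byte in jsonb_ops's GIN key representation):

```python
def lossy_key(value):
    """Roughly the lossy scheme described above: null/bool/number all
    collapse to strings indistinguishable from genuine string values."""
    if value is None:
        return "n"
    if value is True:
        return "t"
    if value is False:
        return "f"
    return str(value)

def tagged_key(value):
    """Prefix a one-byte type tag so each JSON datatype stays distinct.
    (bool must be tested before int: True is an int in Python.)"""
    if value is None:
        tag = "N"
    elif isinstance(value, bool):
        tag = "B"
    elif isinstance(value, (int, float)):
        tag = "D"
    else:
        tag = "S"
    return tag + lossy_key(value)

lossy_key(None) == lossy_key("n")    # True: the collision forces a recheck
tagged_key(None) == tagged_key("n")  # False: the tag keeps them apart
```

With tagged keys, queries that care about values would no longer need a recheck merely to tell `null` from `"n"` or `42` from `"42"`.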
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
On Wed, May 7, 2014 at 12:08 PM, Tom Lane wrote: > * Jsonb Keys and string array elements are treated equivalently when > * serialized to text index storage. One day we may wish to create an opclass > * that only indexes values, but for now keys and values are stored in GIN > * indexes in a way that doesn't really consider their relationship to each > * other. > > Is this an obsolete speculation about writing jsonb_hash_ops, or is there > still something worth preserving there? If so, what? This is not obsolete. It would equally apply to a GiST opclass that wanted to support our current definition of existence. Array elements are keys simply by virtue of being strings, but otherwise are treated as values. See the large comment block within gin_extract_jsonb(). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 11:50 AM, Robert Haas wrote: >> Doesn't match my experience. Even with the current buffer manager >> there's usually enough locality to keep important pages in s_b for a >> meaningful time. I *have* seen workloads that should have fit into >> memory not fit because of double buffering. > > Same here. I think that it depends on whether or not you're thinking about the worst case. Most people are not going to be in the category you describe here. Plenty of people in the Postgres community run with very large shared_buffers settings, on non i/o bound workloads, and report good results - often massive, quickly apparent improvements. I'm mostly concerned with obsoleting the 8GB hard ceiling rule here. It probably doesn't matter whether and by how much one factor is worse than the other, though. I found the section "5.2 Temporal Control: Buffering" in the following paper, that speaks about the subject quite interesting: http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
btw ... in jsonb.h there is this comment: * Jsonb Keys and string array elements are treated equivalently when * serialized to text index storage. One day we may wish to create an opclass * that only indexes values, but for now keys and values are stored in GIN * indexes in a way that doesn't really consider their relationship to each * other. Is this an obsolete speculation about writing jsonb_hash_ops, or is there still something worth preserving there? If so, what? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 05/07/2014 11:52 AM, Peter Geoghegan wrote: > On Wed, May 7, 2014 at 11:40 AM, Josh Berkus wrote: >> So, as one of several people who put literally hundreds of hours into >> the original benchmarking which established the sizing recommendations >> for shared_buffers (and other settings), I find the phrase "folk wisdom" >> personally offensive. So, can we stop with this? > > I have also put a lot of time into benchmarking. No personal offence > was intended, and I'm glad that we have some advice to give to users, > but the fact of the matter is that current *official* recommendations > are very vague. Well, they should be vague; the only hard data we have is rather out-of-date (I think 8.2 was our last set of tests). If we gave users specific, detailed recommendations, we'd be misleading them. For that matter, our advice on shared_buffers ... and our design for it ... is going to need to change radically soon, since Linux is getting an ARC with a frequency cache as well as a recency cache, and FreeBSD and OpenSolaris already have them. FWIW, if someone could fund me for a month, I'd be happy to create a benchmarking setup where we could test these kinds of things; I have pretty clear ideas how to build one. I imagine some of our other consultants could make the same offer. However, it's too much work for anyone to get done "in their spare time". -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
On 05/07/2014 02:52 PM, Andres Freund wrote: On 2014-05-07 14:48:51 -0400, Tom Lane wrote: Andres Freund writes: * The jentry representation should be changed so it's possible to get the type of an entry without checking individual types. That'll make code like findJsonbValueFromSuperHeader() (well, whatever you've renamed it to) much easier to read. Preferably so it can have the same values (after shifting/masking) as the JBE variants. And it needs space for further types (or representations thereof). Further types? What further types? JSON seems unlikely to grow any other value types than what it's got. I am not thinking about user-level exposed types, but e.g. a hash/object representation that allows duplicated keys and keeps the original order. Because I am pretty sure that'll come. That makes one of you. We've had hstore for years and nobody that I recall has asked for preservation of key order or duplicates. And any app that relies on either in JSON is still, IMNSHO, broken by design. OTOH, there are suggestions of "superjson" types that support other scalar types such as timestamps. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] making bgworkers without shmem access actually not have shmem access
On 07/05/14 20:37, Robert Haas wrote: At a minimum, it's got to be better than the status quo, where shared memory is accessible throughout the entire lifetime of non-shmem-access background workers. Seems reasonable to me, it might need to be revisited to at least try to figure out if we can make EXEC_BACKEND safer, but it's definitely better than the current state. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 11:50 AM, Robert Haas wrote: > But that does not mean, as the phrase "folk > wisdom" might be taken to imply, that we don't know anything at all > about what actually works well in practice. Folk wisdom doesn't imply that. It implies that we think this works, and we may well be right, but there isn't all that much rigor behind some of it. I'm not blaming anyone for this state of affairs. I've heard plenty of people repeat the "don't exceed 8GB" rule - I regularly repeated it myself. I cannot find any rigorous defense of this, though. If you're aware of one, please point it out to me. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] making bgworkers without shmem access actually not have shmem access
On Wed, May 7, 2014 at 2:44 PM, Tom Lane wrote: > Robert Haas writes: >> I've complained about this problem a few times before: there's nothing >> to prevent a background worker which doesn't request shared memory >> access from calling InitProcess() and then accessing shared memory >> anyway. The attached patch is a first crack at fixing it. > >> Comments? > > Looks reasonable to me. Thanks for the fast review. Committed, after fixing one further bug I spotted. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 11:40 AM, Josh Berkus wrote: > So, as one of several people who put literally hundreds of hours into > the original benchmarking which established the sizing recommendations > for shared_buffers (and other settings), I find the phrase "folk wisdom" > personally offensive. So, can we stop with this? I have also put a lot of time into benchmarking. No personal offence was intended, and I'm glad that we have some advice to give to users, but the fact of the matter is that current *official* recommendations are very vague. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
On 2014-05-07 14:48:51 -0400, Tom Lane wrote: > Andres Freund writes: > > * The jentry representation should be changed so it's possible to get the > > type > > of an entry without checking individual types. That'll make code like > > findJsonbValueFromSuperHeader() (well, whatever you've renamed it to) > > much easier to read. Preferably so it can have the same values (after > > shifting/masking) as the JBE variants. And it needs space for further > > types (or representations thereof). > > Further types? What further types? JSON seems unlikely to grow any > other value types than what it's got. I am not thinking about user-level exposed types, but e.g. a hash/object representation that allows duplicated keys and keeps the original order. Because I am pretty sure that'll come. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 2014-05-07 11:45:04 -0700, Peter Geoghegan wrote: > On Wed, May 7, 2014 at 11:38 AM, Andres Freund wrote: > >> *) raising shared buffers does not 'give more memory to postgres for > >> caching' -- it can only reduce it via double paging > > > > That's absolutely not a necessary consequence. If pages are in s_b for a > > while the OS will be perfectly happy to throw them away. > > The biggest problem with double buffering is not that it wastes > memory. Rather, it's that it wastes memory bandwidth. Doesn't match my experience. Even with the current buffer manager there's usually enough locality to keep important pages in s_b for a meaningful time. I *have* seen workloads that should have fit into memory not fit because of double buffering. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 2:49 PM, Andres Freund wrote:
> On 2014-05-07 11:45:04 -0700, Peter Geoghegan wrote:
>> On Wed, May 7, 2014 at 11:38 AM, Andres Freund wrote:
>> >> *) raising shared buffers does not 'give more memory to postgres for caching' -- it can only reduce it via double paging
>> >
>> > That's absolutely not a necessary consequence. If pages are in s_b for a while the OS will be perfectly happy to throw them away.
>>
>> The biggest problem with double buffering is not that it wastes memory. Rather, it's that it wastes memory bandwidth.
>
> Doesn't match my experience. Even with the current buffer manager there's usually enough locality to keep important pages in s_b for a meaningful time. I *have* seen workloads that should have fit into memory not fit because of double buffering.

Same here.

--
Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 2:40 PM, Josh Berkus wrote:
> On 05/07/2014 11:13 AM, Peter Geoghegan wrote:
>> We ought to be realistic about the fact that the current recommendations around sizing shared_buffers are nothing more than folk wisdom. That's the best we have right now, but that seems quite unsatisfactory to me.
>
> So, as one of several people who put literally hundreds of hours into the original benchmarking which established the sizing recommendations for shared_buffers (and other settings), I find the phrase "folk wisdom" personally offensive. So, can we stop with this?
>
> Otherwise, I don't think I can usefully participate in this discussion.

+1.

I think it is quite accurate to say that we can't predict precisely what value of shared_buffers will perform best for a particular workload and on a particular system. There are people out there using very large values and very small ones, according to what they have found most effective. But that does not mean, as the phrase "folk wisdom" might be taken to imply, that we don't know anything at all about what actually works well in practice. Because we do know quite a bit about that. I and people I work with have been able to improve performance greatly on many systems by providing guidance based on what this community has been able to understand on this topic, and dismissing it as rubbish is wrong.

Also, I seriously doubt that a one-size-fits-all guideline about setting shared_buffers will ever be right for every workload. Workloads, by their nature, are complex beasts. The size of the workload varies, and which portions of it are how hot vary, and the read-write mix varies, and those are not problems with PostgreSQL; those are problems with data.

That is not to say that we can't do anything to make PostgreSQL work better across a wider range of settings for shared_buffers, but it is to say that no matter how much work we do on the code, setting this optimally for every workload will probably remain complex.

--
Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
Andres Freund writes:
> * The jentry representation should be changed so it's possible to get the type of an entry without checking individual types. That'll make code like findJsonbValueFromSuperHeader() (well, whatever you've renamed it to) much easier to read. Preferably so it can have the same values (after shifting/masking) as the JBE variants. And it needs space for further types (or representations thereof).

Further types? What further types? JSON seems unlikely to grow any other value types than what it's got.

regards, tom lane
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 11:38 AM, Andres Freund wrote:
>> *) raising shared buffers does not 'give more memory to postgres for caching' -- it can only reduce it via double paging
>
> That's absolutely not a necessary consequence. If pages are in s_b for a while the OS will be perfectly happy to throw them away.

The biggest problem with double buffering is not that it wastes memory. Rather, it's that it wastes memory bandwidth. I think that lessening that problem will be the major benefit of making larger shared_buffers settings practical.

--
Peter Geoghegan
Re: [HACKERS] making bgworkers without shmem access actually not have shmem access
Robert Haas writes:
> I've complained about this problem a few times before: there's nothing to prevent a background worker which doesn't request shared memory access from calling InitProcess() and then accessing shared memory anyway. The attached patch is a first crack at fixing it.
> Comments?

Looks reasonable to me.

regards, tom lane
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 05/07/2014 11:13 AM, Peter Geoghegan wrote:
> We ought to be realistic about the fact that the current recommendations around sizing shared_buffers are nothing more than folk wisdom. That's the best we have right now, but that seems quite unsatisfactory to me.

So, as one of several people who put literally hundreds of hours into the original benchmarking which established the sizing recommendations for shared_buffers (and other settings), I find the phrase "folk wisdom" personally offensive. So, can we stop with this?

Otherwise, I don't think I can usefully participate in this discussion.

--
Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 2014-05-07 13:32:41 -0500, Merlin Moncure wrote:
> On Wed, May 7, 2014 at 1:13 PM, Peter Geoghegan wrote:
> > On Wed, May 7, 2014 at 11:04 AM, Josh Berkus wrote:
> >> Unfortunately nobody has the time/resources to do the kind of testing required for a new recommendation for shared_buffers.
> >
> > I meant to suggest that the buffer manager could be improved to the point that the old advice becomes obsolete. Right now, it's much harder to analyze shared_buffers than it should be, presumably because of the problems with the buffer manager. I think that if we could formulate better *actionable* advice around what we have right now, that would have already happened.
> >
> > We ought to be realistic about the fact that the current recommendations around sizing shared_buffers are nothing more than folk wisdom. That's the best we have right now, but that seems quite unsatisfactory to me.
>
> I think the stock advice is worse than nothing because it is A. based on obsolete assumptions and B. doesn't indicate what the tradeoffs are or what kinds of symptoms adjusting the setting could alleviate. The documentation should be reduced to things that are known, for example:
>
> *) raising shared buffers does not 'give more memory to postgres for caching' -- it can only reduce it via double paging

That's absolutely not a necessary consequence. If pages are in s_b for a while the OS will be perfectly happy to throw them away.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] making bgworkers without shmem access actually not have shmem access
I've complained about this problem a few times before: there's nothing to prevent a background worker which doesn't request shared memory access from calling InitProcess() and then accessing shared memory anyway. The attached patch is a first crack at fixing it.

Unfortunately, there's still a window between when we fork() and when we detach shared memory, but that's also true for processes like syslogger, whose death nevertheless does not cause a crash-and-restart. Also unfortunately, in the EXEC_BACKEND case, we actually have to attach shared memory first, to get our background worker entry details. This is undesirable, but I'm not sure there's any good way to fix it at all, and certainly not in time for 9.4. Hopefully, the window when shared memory is attached with this patch is short enough to make things relatively safe. At a minimum, it's got to be better than the status quo, where shared memory is accessible throughout the entire lifetime of non-shmem-access background workers.

Attached is also a new background worker called say_hello which I used for testing this patch. That's obviously not for commit.

Comments?
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c index a6b25d8..8b8ae89 100644 --- a/src/backend/postmaster/bgworker.c +++ b/src/backend/postmaster/bgworker.c @@ -20,9 +20,11 @@ #include "postmaster/bgworker_internals.h" #include "postmaster/postmaster.h" #include "storage/barrier.h" +#include "storage/dsm.h" #include "storage/ipc.h" #include "storage/latch.h" #include "storage/lwlock.h" +#include "storage/pg_shmem.h" #include "storage/pmsignal.h" #include "storage/proc.h" #include "storage/procsignal.h" @@ -400,12 +402,16 @@ BackgroundWorkerStopNotifications(pid_t pid) BackgroundWorker * BackgroundWorkerEntry(int slotno) { + static BackgroundWorker myEntry; BackgroundWorkerSlot *slot; Assert(slotno < BackgroundWorkerData->total_slots); slot = &BackgroundWorkerData->slot[slotno]; Assert(slot->in_use); - return &slot->worker; /* can't become free while we're still here */ + + /* must copy this in case we don't intend to retain shmem access */ + memcpy(&myEntry, &slot->worker, sizeof myEntry); + return &myEntry; } #endif @@ -542,6 +548,19 @@ StartBackgroundWorker(void) snprintf(buf, MAXPGPATH, "bgworker: %s", worker->bgw_name); init_ps_display(buf, "", "", ""); + /* + * If we're not supposed to have shared memory access, then detach from + * shared memory. If we didn't request shared memory access, the + * postmaster won't force a cluster-wide restart if we exit unexpectedly, + * so we'd better make sure that we don't mess anything up that would + * require that sort of cleanup. 
+ */ + if ((worker->bgw_flags & BGWORKER_SHMEM_ACCESS) == 0) + { + dsm_detach_all(); + PGSharedMemoryDetach(); + } + SetProcessingMode(InitProcessing); /* Apply PostAuthDelay */ @@ -616,19 +635,29 @@ StartBackgroundWorker(void) /* We can now handle ereport(ERROR) */ PG_exception_stack = &local_sigjmp_buf; - /* Early initialization */ - BaseInit(); - /* - * If necessary, create a per-backend PGPROC struct in shared memory, - * except in the EXEC_BACKEND case where this was done in - * SubPostmasterMain. We must do this before we can use LWLocks (and in - * the EXEC_BACKEND case we already had to do some stuff with LWLocks). + * If the background worker request shared memory access, set that up now; + * else, detach all shared memory segments. */ -#ifndef EXEC_BACKEND if (worker->bgw_flags & BGWORKER_SHMEM_ACCESS) + { + /* + * Early initialization. Some of this could be useful even for + * background workers that aren't using shared memory, but they can + * call the individual startup routines for those subsystems if needed. + */ + BaseInit(); + + /* + * Create a per-backend PGPROC struct in shared memory, except in the + * EXEC_BACKEND case where this was done in SubPostmasterMain. We must + * do this before we can use LWLocks (and in the EXEC_BACKEND case we + * already had to do some stuff with LWLocks). + */ +#ifndef EXEC_BACKEND InitProcess(); #endif + } /* * If bgw_main is set, we use that value as the initial entrypoint. diff --git a/contrib/say_hello/Makefile b/contrib/say_hello/Makefile new file mode 100644 index 000..2da89d7 --- /dev/null +++ b/contrib/say_hello/Makefile @@ -0,0 +1,17 @@ +# contrib/say_hello/Makefile + +MODULES = say_hello + +EXTENSION = say_hello +DATA = say_hello--1.0.sql + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/say_hello +top_builddir = ../.. 
+include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/contrib/say_hello/say_hello--1.0.sql b/contrib/say_hello/say_hello--1.0.sql new file mode 100644 index 000..c82b7df --- /d
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 1:13 PM, Peter Geoghegan wrote:
> On Wed, May 7, 2014 at 11:04 AM, Josh Berkus wrote:
>> Unfortunately nobody has the time/resources to do the kind of testing required for a new recommendation for shared_buffers.
>
> I meant to suggest that the buffer manager could be improved to the point that the old advice becomes obsolete. Right now, it's much harder to analyze shared_buffers than it should be, presumably because of the problems with the buffer manager. I think that if we could formulate better *actionable* advice around what we have right now, that would have already happened.
>
> We ought to be realistic about the fact that the current recommendations around sizing shared_buffers are nothing more than folk wisdom. That's the best we have right now, but that seems quite unsatisfactory to me.

I think the stock advice is worse than nothing because it is A. based on obsolete assumptions and B. doesn't indicate what the tradeoffs are or what kinds of symptoms adjusting the setting could alleviate. The documentation should be reduced to things that are known, for example:

*) raising shared buffers does not 'give more memory to postgres for caching' -- it can only reduce it via double paging
*) hits in shared buffers are generally somewhat faster than faulting to o/s buffers
*) an s_b larger than the working dataset size can be a good configuration, especially for read-only loads
*) large settings have bad interplay with the o/s in some configurations
*) shared buffers can reduce write i/o in certain workloads
*) interplay with checkpoint
*) shared buffers have different mechanisms for managing contention than o/s buffers

merlin
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Tue, May 6, 2014 at 9:55 AM, Andres Freund wrote:
> On 2014-05-06 17:43:45 +0100, Simon Riggs wrote:
> > All this changes is the cost of IndexScans that would use more than 25% of shared_buffers worth of data. Hopefully not many of those in your workload. Changing the cost doesn't necessarily prevent index scans either. And if there are many of those in your workload AND you run more than one at same time, then the larger setting will work against you. So the benefit window for such a high setting is slim, at best.

Not only do you need to run more than one at a time, but they also must use mostly disjoint sets of data, in order for the larger estimate to be bad.

> Why? There's many workloads where indexes are larger than shared buffers but fit into the operating system's cache. And that's precisely what effective_cache_size is about.

It is more about the size of the table referenced by the index, rather than the size of the index. The point is that doing a large index scan might lead you to visit the same table blocks repeatedly within quick succession. (If a small index scan is on the inner side of a nested loop, then you might access the same index leaf blocks and the same table blocks repeatedly--that is why it is only mostly about the table size, rather than exclusively.)

Cheers, Jeff
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
On 2014-05-07 10:55:28 -0700, Peter Geoghegan wrote:
> On Wed, May 7, 2014 at 4:40 AM, Andres Freund wrote:
> > * The jentry representation should be changed so it's possible to get the type of an entry without checking individual types. That'll make code like findJsonbValueFromSuperHeader() (well, whatever you've renamed it to) much easier to read. Preferably so it can have the same values (after shifting/masking) as the JBE variants. And it needs space for further types (or representations thereof).
>
> I don't know what you mean. Every place that macros like JBE_ISNUMERIC() are used subsequently involves access to (say) numeric union fields. At best, you could have those functions check that the types match, and then handle each case in a switch that only looked at (say) the "candidate", but that doesn't really save much at all. It wouldn't take much to have the macros give enum constant values back as you suggest, though.

Compare

    for (i = 0; i < count; i++)
    {
        JEntry *e = array + i;

        if (JBE_ISNULL(*e) && key->type == jbvNull)
        {
            result->type = jbvNull;
            result->estSize = sizeof(JEntry);
        }
        else if (JBE_ISSTRING(*e) && key->type == jbvString)
        {
            result->type = jbvString;
            result->val.string.val = data + JBE_OFF(*e);
            result->val.string.len = JBE_LEN(*e);
            result->estSize = sizeof(JEntry) + result->val.string.len;
        }
        else if (JBE_ISNUMERIC(*e) && key->type == jbvNumeric)
        {
            result->type = jbvNumeric;
            result->val.numeric = (Numeric) (data + INTALIGN(JBE_OFF(*e)));
            result->estSize = 2 * sizeof(JEntry) + VARSIZE_ANY(result->val.numeric);
        }
        else if (JBE_ISBOOL(*e) && key->type == jbvBool)
        {
            result->type = jbvBool;
            result->val.boolean = JBE_ISBOOL_TRUE(*e) != 0;
            result->estSize = sizeof(JEntry);
        }
        else
            continue;

        if (compareJsonbScalarValue(key, result) == 0)
            return result;
    }

with

    for (i = 0; i < count; i++)
    {
        JEntry *e = array + i;

        if (!JBE_TYPE_IS_SCALAR(*e))
            continue;
        if (JBE_TYPE(*e) != key->type)
            continue;
        result = getJsonbValue(e);
        if (compareJsonbScalarValue(key, result) == 0)
            return result;
    }

Yes, it's not a fair comparison because I made up getJsonbValue(). But it's *much* more readable regardless. And there's more places that could use it. Like the second half of findJsonbValueFromSuperHeader(). FWIW, that's one of the requests I definitely made before.

> > * I wonder if the hash/object pair representation is wise and if it shouldn't be keys combined with offsets first, and then the values. That will make access much faster. And that's what jsonb essentially is about.
>
> I like that the physical layout reflects the layout of the original JSON document.

Don't see much value in that. This is a binary representation and it'd be bijective.

> Besides, I don't think this is obviously the case. Are you sure that it won't be more useful to make children as close to their parents as possible? Particularly given idiomatic usage.

Because - if done right - it would allow elementwise access without scanning previous values.

> > * I think both arrays and hashes should grow individual structs. With space for additional flags.
> Why?

Because a) it will make the code more readable b) it'll allow for adding different representations of hashes/arrays.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 11:04 AM, Josh Berkus wrote:
> Unfortunately nobody has the time/resources to do the kind of testing required for a new recommendation for shared_buffers.

I meant to suggest that the buffer manager could be improved to the point that the old advice becomes obsolete. Right now, it's much harder to analyze shared_buffers than it should be, presumably because of the problems with the buffer manager. I think that if we could formulate better *actionable* advice around what we have right now, that would have already happened.

We ought to be realistic about the fact that the current recommendations around sizing shared_buffers are nothing more than folk wisdom. That's the best we have right now, but that seems quite unsatisfactory to me.

--
Peter Geoghegan
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 05/06/2014 10:35 PM, Peter Geoghegan wrote:
> +1. In my view, we probably should have set it to a much higher absolute default value. The main problem with setting it to any multiple of shared_buffers that I can see is that shared_buffers is a very poor proxy for what effective_cache_size is supposed to represent. In general, the folk wisdom around sizing shared_buffers is past its sell-by date.

Unfortunately nobody has the time/resources to do the kind of testing required for a new recommendation for shared_buffers.

--
Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
On Wed, May 7, 2014 at 4:40 AM, Andres Freund wrote:
> * Imo we need space in jsonb ondisk values to indicate a format version. We won't fully get this right.

That would be nice.

> * The jentry representation should be changed so it's possible to get the type of an entry without checking individual types. That'll make code like findJsonbValueFromSuperHeader() (well, whatever you've renamed it to) much easier to read. Preferably so it can have the same values (after shifting/masking) as the JBE variants. And it needs space for further types (or representations thereof).

I don't know what you mean. Every place that macros like JBE_ISNUMERIC() are used subsequently involves access to (say) numeric union fields. At best, you could have those functions check that the types match, and then handle each case in a switch that only looked at (say) the "candidate", but that doesn't really save much at all. It wouldn't take much to have the macros give enum constant values back as you suggest, though.

> * I wonder if the hash/object pair representation is wise and if it shouldn't be keys combined with offsets first, and then the values. That will make access much faster. And that's what jsonb essentially is about.

I like that the physical layout reflects the layout of the original JSON document. Besides, I don't think this is obviously the case. Are you sure that it won't be more useful to make children as close to their parents as possible? Particularly given idiomatic usage. Jsonb, in point of fact, *is* fast.

> * I think both arrays and hashes should grow individual structs. With space for additional flags.

Why?

--
Peter Geoghegan
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 05/07/2014 07:31 AM, Andrew Dunstan wrote:
> +1. If we ever want to implement an auto-tuning heuristic it seems we're going to need some hard empirical evidence to support it, and that doesn't seem likely to appear any time soon.

4GB default it is, then.

--
Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] [v9.5] Custom Plan API
* Simon Riggs (si...@2ndquadrant.com) wrote:
> On 7 May 2014 17:43, Stephen Frost wrote:
> > It's the optimizer's job to figure out which path to pick though, based on which will have the lowest cost.
>
> Of course. I'm not suggesting otherwise.
>
> >> If you do want that, you can write an Event Trigger that automatically adds a lookaside for any table.
> >
> > This sounds terribly ugly and like we're pushing optimization decisions on to the user instead of just figuring out what the best answer is.
>
> I'm proposing that we use a declarative approach, just like we do when we say CREATE INDEX.

There's quite a few trade-offs when it comes to indexes though. I'm trying to figure out when you wouldn't want to use a GPU, if it's available to you and the cost model says it's faster? To me, that's kind of like saying you want a declarative approach for when to use a HashJoin.

> The idea is that we only consider a lookaside when a lookaside has been declared. Same as when we add an index, the optimizer considers whether to use that index. What we don't want to happen is that the optimizer considers a GIN plan, even when a GIN index is not available.

Yes, I understood your proposal- I just don't agree with it. ;) For MatViews and/or Indexes, there are trade-offs to be had as it relates to disk space, insert speed, etc.

Thanks, Stephen
Re: [HACKERS] 9.4 checksum errors in recovery with gin index
Hi,

On 2014-05-07 10:21:26 -0700, Jeff Janes wrote:
> On Wed, May 7, 2014 at 12:48 AM, Andres Freund wrote:
> > Hi,
> >
> > On 2014-05-07 00:35:35 -0700, Jeff Janes wrote:
> > > When recovering from a crash (with injection of a partial page write at time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum verification failure.
> > >
> > > 16396 is a gin index.
> >
> > Over which type? What was the load? make check?
>
> A gin index on text[].
>
> The load is a variation of the crash recovery tester I've been using the last few years, this time adapted to use a gin index in a rather unnatural way. I just increment a counter on a random row repeatedly via a unique key, but for this purpose that unique key is stuffed into text[] along with a bunch of cruft. The cruft is text representations of negative integers, the actual key is text representation of nonnegative integers.
>
> The test harness (patch to induce crashes, and two driving programs) and a preserved data directory are here:
>
> https://drive.google.com/folderview?id=0Bzqrh1SO9FcESDZVeFk5djJaeHM&usp=sharing
>
> (role jjanes, database jjanes)
>
> As far as I can tell, this problem goes back to the beginning of page checksums.

Interesting.

> > > If I have it ignore checksum failures, there is no apparent misbehavior. I'm trying to bisect it, but it could take a while and I thought someone might have some theories based on the log:
> >
> > If you have the WAL a pg_xlogdump grepping for everything referring to that block would be helpful.
>
> The only record which mentions block 28486 by name is this one:

Hm, try running it with -b specified.

> rmgr: Gin len (rec/tot): 1576/ 1608, tx: 77882205, lsn: 11/30F4C2C0, prev 11/30F4C290, bkp: , desc: Insert new list page, node: 1663/16384/16396 blkno: 28486
>
> However, I think that that record precedes the recovery start point.

If that's the case it seems likely that a PageSetLSN() or PageSetDirty() is missing somewhere...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [v9.5] Custom Plan API
On 7 May 2014 17:43, Stephen Frost wrote:
> * Simon Riggs (si...@2ndquadrant.com) wrote:
>> IMHO we would not want to add indexes to every column, on every table, nor would we wish to use lookaside for all tables. It is a good thing to be able to add optimizations for individual tables. GPUs are not good for everything; it is good to be able to leverage their strengths, yet avoid their weaknesses.
>
> It's the optimizer's job to figure out which path to pick though, based on which will have the lowest cost.

Of course. I'm not suggesting otherwise.

>> If you do want that, you can write an Event Trigger that automatically adds a lookaside for any table.
>
> This sounds terribly ugly and like we're pushing optimization decisions on to the user instead of just figuring out what the best answer is.

I'm proposing that we use a declarative approach, just like we do when we say CREATE INDEX.

The idea is that we only consider a lookaside when a lookaside has been declared. Same as when we add an index, the optimizer considers whether to use that index. What we don't want to happen is that the optimizer considers a GIN plan, even when a GIN index is not available.

I'll explain it more at the developer meeting. It probably sounds a bit weird at first.

--
Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PGDLLEXPORTing all GUCs?
On Wed, May 7, 2014 at 1:08 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Wed, May 7, 2014 at 12:21 PM, Tom Lane wrote:
>>> If Craig has a concrete argument why all GUCs should be accessible to external modules, then let's see it (after which we'd better debate exposing the few that are in fact static in guc.c).
>>
>> I think there's actually a very good reason to think that GUCs are good candidates for this treatment, which is that, by definition, the GUC is a public interface: you can change it with a SET command.
>
> Sure, and we provide public APIs for accessing/setting GUCs. The SET side of that is most emphatically *not* "just set the C variable". Yeah, you can get away with reading them like that, assuming you want the internal representation not the user-visible one. In any case, I've not heard the use-case why all (and only) GUCs might need to be readable in that way.

My experience is that GUCs are a common thing to want to expose to extensions, and that C code usually wants the internal form, not the string form. I'm not arguing that nothing else needs to be exposed, but if there's a better argument possible for exposing the GUC variables than the fact that a bunch of people with experience developing PostgreSQL extensions view that as a real need, I can't think what it is.

--
Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] 9.4 checksum errors in recovery with gin index
On Wed, May 7, 2014 at 12:48 AM, Andres Freund wrote:
> Hi,
>
> On 2014-05-07 00:35:35 -0700, Jeff Janes wrote:
> > When recovering from a crash (with injection of a partial page write at time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum verification failure.
> >
> > 16396 is a gin index.
>
> Over which type? What was the load? make check?

A gin index on text[].

The load is a variation of the crash recovery tester I've been using the last few years, this time adapted to use a gin index in a rather unnatural way. I just increment a counter on a random row repeatedly via a unique key, but for this purpose that unique key is stuffed into text[] along with a bunch of cruft. The cruft is text representations of negative integers, the actual key is text representation of nonnegative integers.

The test harness (patch to induce crashes, and two driving programs) and a preserved data directory are here:

https://drive.google.com/folderview?id=0Bzqrh1SO9FcESDZVeFk5djJaeHM&usp=sharing

(role jjanes, database jjanes)

As far as I can tell, this problem goes back to the beginning of page checksums.

> > If I have it ignore checksum failures, there is no apparent misbehavior. I'm trying to bisect it, but it could take a while and I thought someone might have some theories based on the log:
>
> If you have the WAL a pg_xlogdump grepping for everything referring to that block would be helpful.

The only record which mentions block 28486 by name is this one:

rmgr: Gin len (rec/tot): 1576/ 1608, tx: 77882205, lsn: 11/30F4C2C0, prev 11/30F4C290, bkp: , desc: Insert new list page, node: 1663/16384/16396 blkno: 28486

However, I think that that record precedes the recovery start point.

Cheers, Jeff
Re: [HACKERS] PGDLLEXPORTing all GUCs?
On 2014-05-07 13:08:52 -0400, Tom Lane wrote: > Robert Haas writes: > > On Wed, May 7, 2014 at 12:21 PM, Tom Lane wrote: > >> If Craig has a concrete argument why all GUCs should be accessible > >> to external modules, then let's see it (after which we'd better debate > >> exposing the few that are in fact static in guc.c). > > > I think there's actually a very good reason to think that GUCs are > > good candidates for this treatment, which is that, by definition, the > > GUC is a public interface: you can change it with a SET command. > > Sure, and we provide public APIs for accessing/setting GUCs. As strings. Not a useful representation for C code... And to avoid units getting tacked on you need to first get the config option number, then allocate an array on the stack, call GetConfigOptionByNum(), then parse the result into the underlying type. Meh. > The SET > side of that is most emphatically *not* "just set the C variable". > Yeah, you can get away with reading them like that, assuming you want > the internal representation not the user-visible one. In any case, > I've not heard the use-case why all (and only) GUCs might need to be > readable in that way. I think you're making up the 'and only' part. There's lots of variables that should/need to be exported. Just look at the amount of mess you cleaned up with various extensions not actually working on windows... Last time round you argued against exporting all variables. So Craig seems to have chosen a subset that's likely to be needed. > Again, I'm not arguing against a proposal that we should automatically > export all globally-declared variables for platform-levelling reasons. > I *am* saying that I find a proposal to do that just to GUCs to be > unsupported by any argument made so far. Well, then let's start discussing that proposal. I propose defining a 'pg_export' macro that's suitably defined by the build system.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PGDLLEXPORTing all GUCs?
Robert Haas writes: > On Wed, May 7, 2014 at 12:21 PM, Tom Lane wrote: >> If Craig has a concrete argument why all GUCs should be accessible >> to external modules, then let's see it (after which we'd better debate >> exposing the few that are in fact static in guc.c). > I think there's actually a very good reason to think that GUCs are > good candidates for this treatment, which is that, by definition, the > GUC is a public interface: you can change it with a SET command. Sure, and we provide public APIs for accessing/setting GUCs. The SET side of that is most emphatically *not* "just set the C variable". Yeah, you can get away with reading them like that, assuming you want the internal representation not the user-visible one. In any case, I've not heard the use-case why all (and only) GUCs might need to be readable in that way. Again, I'm not arguing against a proposal that we should automatically export all globally-declared variables for platform-levelling reasons. I *am* saying that I find a proposal to do that just to GUCs to be unsupported by any argument made so far. regards, tom lane
Re: [HACKERS] Schizophrenic coding in gin_extract_jsonb(_hash)
Heikki Linnakangas writes: > gin_extract_jsonb recursively extracts all the elements, keys and values > of any sub-object too, but JB_ROOT_COUNT only counts the top-level elements. Got it. So if the top level is empty, we can exit early, but otherwise we use its length * 2 as a guess at how big the output will be; which will be right if it's an object without further substructure, and otherwise might need enlargement. > (I hope this is made a bit more clear in the comments I added in the > patch I posted this morning) Didn't read that yet, but will incorporate this info into the jsonb_gin patch I'm working on. regards, tom lane
Re: [HACKERS] PGDLLEXPORTing all GUCs?
On Wed, May 7, 2014 at 12:21 PM, Tom Lane wrote: > Robert Haas writes: >> I don't accept this argument. In EnterpriseDB's Advanced Server fork >> of PostgreSQL, we've marked a bunch of extra things PGDLLEXPORT >> precisely because we have external modules that need access to them. > > Well, that's an argument for marking every darn global variable as > PGDLLEXPORT. But it's *not* an argument for marking GUCs in particular > that way. In particular, you are conveniently ignoring the point that > GUCs are much more likely to be global as an artifact of the way guc.c > is modularized than because we actually think they should be globally > accessible. > > If Craig has a concrete argument why all GUCs should be accessible > to external modules, then let's see it (after which we'd better debate > exposing the few that are in fact static in guc.c). Or if you want > to hang your hat on the platform-leveling argument, then we should be > re-debating exporting *all* global variables. But as far as the actually > proposed patch goes, all I'm hearing is very confused thinking. I think there's actually a very good reason to think that GUCs are good candidates for this treatment, which is that, by definition, the GUC is a public interface: you can change it with a SET command. It's certainly easier to imagine an extension wanting access to update_process_title than, say, criticalRelcachesBuilt. But maybe you're right and we should revisit the idea of exposing everything. A quick grep through src/include suggests that GUCs are a big percentage of what's not marked PGDLLEXPORT anyway, and the other stuff that's in there is stuff like PgStartTime and PostmasterPid which hardly seem like silly things to expose. I certainly think we should err on the side of exposing stuff that people think might be useful rather than pretending that we can stop them from using symbols by refusing to PGDLLEXPORT them. We can't. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [v9.5] Custom Plan API
* Simon Riggs (si...@2ndquadrant.com) wrote: > IMHO we would not want to add indexes to every column, on every table, > nor would we wish to use lookaside for all tables. It is a good thing > to be able to add optimizations for individual tables. GPUs are not > good for everything; it is good to be able to leverage their > strengths, yet avoid their weaknesses. It's the optimizer's job to figure out which path to pick though, based on which will have the lowest cost. > If do you want that, you can write an Event Trigger that automatically > adds a lookaside for any table. This sounds terribly ugly and like we're pushing optimization decisions onto the user instead of just figuring out what the best answer is. > I agree TupleTableSlot may not be best way for bulk data movement. We > probably need to look at buffering/bulk movement between executor > nodes in general, which would be of benefit for the FDW case also. > This would be a problem even for Custom Scans as originally presented > also, so I don't see much change there. Being able to do bulk movement would be useful, but (as I proposed months ago) being able to do asynchronous returns would be extremely useful also, when you consider FDWs and Append() - the main point there being that you want to keep the FDWs busy and working in parallel. Thanks, Stephen
Re: [HACKERS] PGDLLEXPORTing all GUCs?
On 2014-05-07 12:21:55 -0400, Tom Lane wrote: > Robert Haas writes: > > I don't accept this argument. In EnterpriseDB's Advanced Server fork > > of PostgreSQL, we've marked a bunch of extra things PGDLLEXPORT > > precisely because we have external modules that need access to them. > > Well, that's an argument for marking every darn global variable as > PGDLLEXPORT. But it's *not* an argument for marking GUCs in particular > that way. In particular, you are conveniently ignoring the point that > GUCs are much more likely to be global as an artifact of the way guc.c > is modularized than because we actually think they should be globally > accessible. GUCs in general are user configurable things. So it's not particularly unreasonable to assume that a significant fraction of them are of interest to extensions. And it's not like exporting them gives way too many additional dangers - they already can be overwritten. In fact, I am pretty sure that nearly all of these cases are about *reading* the underlying variable, not writing them. It's a darn sight less convenient and far slower to get the config variable as text and then convert it to the underlying type. > (after which we'd better debate exposing the few that are in fact > static in guc.c). I plan to do that for most of them - completely independently of this topic. I think 'export'ing a variable in several files is a horrible idea. > Or if you want > to hang your hat on the platform-leveling argument, then we should be > re-debating exporting *all* global variables. Yes. I am wondering whether that's not the most sensible course. It's a pita, but essentially we'd have to do a global s/export/pg_export/ in the headers. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Schizophrenic coding in gin_extract_jsonb(_hash)
On 05/07/2014 06:27 PM, Tom Lane wrote: I think you're just proving the point that this code is woefully underdocumented. If there were, somewhere, some comment explaining what the heck JB_ROOT_COUNT actually counts, maybe I wouldn't be asking this question. jsonb.h is certainly not divulging any such information. After having reverse-engineered the convertJsonb code, I think I can explain what JB_ROOT_COUNT is. If the root of the Jsonb datum is an array, it's the number of elements in that top-level array. If it's an object, it's the number of key/value pairs in that top-level object. Some of the elements of that array (or values of the object) can be arrays or objects themselves. gin_extract_jsonb recursively extracts all the elements, keys and values of any sub-object too, but JB_ROOT_COUNT only counts the top-level elements. (I hope this is made a bit more clear in the comments I added in the patch I posted this morning) - Heikki
Re: [HACKERS] [v9.5] Custom Plan API
* Simon Riggs (si...@2ndquadrant.com) wrote: > Agreed. My proposal is that if the planner allows the lookaside to an > FDW then we pass the query for full execution on the FDW. That means > that the scan, aggregate and join could take place via the FDW. i.e. > "Custom Plan" == lookaside + FDW How about we get that working for FDWs to begin with and then we can come back to this idea..? We're pretty far from join-pushdown or aggregate-pushdown to FDWs, last I checked, and having those would be a massive win for everyone using FDWs. Thanks, Stephen
Re: [HACKERS] PGDLLEXPORTing all GUCs?
Robert Haas writes: > I don't accept this argument. In EnterpriseDB's Advanced Server fork > of PostgreSQL, we've marked a bunch of extra things PGDLLEXPORT > precisely because we have external modules that need access to them. Well, that's an argument for marking every darn global variable as PGDLLEXPORT. But it's *not* an argument for marking GUCs in particular that way. In particular, you are conveniently ignoring the point that GUCs are much more likely to be global as an artifact of the way guc.c is modularized than because we actually think they should be globally accessible. If Craig has a concrete argument why all GUCs should be accessible to external modules, then let's see it (after which we'd better debate exposing the few that are in fact static in guc.c). Or if you want to hang your hat on the platform-leveling argument, then we should be re-debating exporting *all* global variables. But as far as the actually proposed patch goes, all I'm hearing is very confused thinking. regards, tom lane
Re: [HACKERS] PGDLLEXPORTing all GUCs?
On Wed, May 7, 2014 at 11:19 AM, Tom Lane wrote: > Andres Freund writes: >> On 2014-05-07 10:29:36 -0400, Tom Lane wrote: >>> To my mind, we PGDLLEXPORT some variable only after deciding that yeah, >>> we're okay with having third-party modules touching that. Craig's >>> proposal is to remove human judgement from that process. > >> It's just levelling the planefield between platforms. If I had an idea >> how I'd still like to just PGDDLIMPORT *all* 'export'ed variables on >> windows. >> The problem is that there's lot of variables which aren't exported and >> which we'll only discover after the release. Just look at what >> e.g. postgres_fdw needed. It's not particularly unlikely that others >> fdws need some of those as well. But they can't change the release at >> the same time. > > [ shrug... ] That problem is uncorrelated with GUC status, however. > If that's your argument for a patch, then the patch should DLLEXPORT > *every single non-static variable*. Which is a discussion we've already > had, and rejected. > > I'd not be against an automatic mechanism for that, and indeed put > considerable work into trying to make it happen a few months ago. But > I'll resist wholesale cluttering of the source code with those markers. > As long as we have to have them, I think we should use them in the way > I outlined, that we mark only variables that are "considered okay to > access". In fact, GUCs are exactly the *last* variables that should get > marked that way automatically, because so many of them are global only > because of the need for guc.c to communicate with the owning module, > not because we want anything else touching them. I don't accept this argument. In EnterpriseDB's Advanced Server fork of PostgreSQL, we've marked a bunch of extra things PGDLLEXPORT precisely because we have external modules that need access to them. 
Of course, what that means is that when PostgreSQL changes things around in a future release, we have to go revise those external modules as well. However, that just doesn't happen often enough to actually pose a significant problem for us. It's not really accurate to think that people are only going to rely on the things that we choose to PGDLLEXPORT. It's more accurate to think that when we don't mark things PGDLLEXPORT, we're forcing people to decide between (1) giving up on writing their extension, (2) having that extension not work on Windows, (3) submitting a patch to add a PGDLLEXPORT marking and waiting an entire release cycle for that to go G.A., and then still not being able to support older versions, or (4) forking PostgreSQL. That's an unappealing list of options. I would not go so far as to say that we should PGDLLEXPORT absolutely every non-static variable. But I think adding those markings to every GUC we've got is perfectly reasonable. I am quite sure that the 2ndQuadrant folks know that they'll be on the hook to update any code they write that uses those variables if and when a future version of PostgreSQL whacks things around. But that's not a new problem - the same thing happens when a function signature changes, or when a variable that does happen to have a PGDLLEXPORT marking changes. And at least in my experience it's also not a large problem. The amount of time EnterpriseDB spends updating our (large!) amount of proprietary code in response to such changes is a small percentage of our overall development time. Enabling extensibility is a far more important goal than keeping people from committing to interfaces that may change in the future, especially since the latter is a losing battle anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Schizophrenic coding in gin_extract_jsonb(_hash)
Peter Geoghegan writes: > On Tue, May 6, 2014 at 8:08 PM, Tom Lane wrote: >> The early-exit code path supposes that JB_ROOT_COUNT is absolutely >> reliable as an indicator that there's nothing in the jsonb value. >> On the other hand, the realloc logic inside the iteration loop implies >> that JB_ROOT_COUNT is just an untrustworthy estimate. Which theory is >> correct? And why is there not a comment to be seen anywhere? If the code >> is correct then this logic is certainly worthy of a comment or three. > JsonbIteratorNext() is passed "false" as its skipNested argument. It's > recursive. And? I think you're just proving the point that this code is woefully underdocumented. If there were, somewhere, some comment explaining what the heck JB_ROOT_COUNT actually counts, maybe I wouldn't be asking this question. jsonb.h is certainly not divulging any such information. regards, tom lane
Re: [HACKERS] PGDLLEXPORTing all GUCs?
Andres Freund writes: > On 2014-05-07 10:29:36 -0400, Tom Lane wrote: >> To my mind, we PGDLLEXPORT some variable only after deciding that yeah, >> we're okay with having third-party modules touching that. Craig's >> proposal is to remove human judgement from that process. > It's just levelling the planefield between platforms. If I had an idea > how I'd still like to just PGDDLIMPORT *all* 'export'ed variables on > windows. > The problem is that there's lot of variables which aren't exported and > which we'll only discover after the release. Just look at what > e.g. postgres_fdw needed. It's not particularly unlikely that others > fdws need some of those as well. But they can't change the release at > the same time. [ shrug... ] That problem is uncorrelated with GUC status, however. If that's your argument for a patch, then the patch should DLLEXPORT *every single non-static variable*. Which is a discussion we've already had, and rejected. I'd not be against an automatic mechanism for that, and indeed put considerable work into trying to make it happen a few months ago. But I'll resist wholesale cluttering of the source code with those markers. As long as we have to have them, I think we should use them in the way I outlined, that we mark only variables that are "considered okay to access". In fact, GUCs are exactly the *last* variables that should get marked that way automatically, because so many of them are global only because of the need for guc.c to communicate with the owning module, not because we want anything else touching them. regards, tom lane
Re: [HACKERS] PGDLLEXPORTing all GUCs?
On 2014-05-07 10:29:36 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2014-05-07 09:35:06 -0400, Tom Lane wrote: > >> Craig Ringer writes: > >>> Is there any reason _not_ to PGDLLEXPORT all GUCs, other than cosmetic > >>> concerns? > > >> That seems morally equivalent to "is there a reason not to make every > >> static variable global?". > > > I think what Craig actually tries to propose is to mark all GUCs > > currently exported in headers PGDLLIMPORT. > > There are few if any GUCs that aren't exposed in headers, just so that > guc.c can communicate with the owning modules. That doesn't mean that > we want everybody in the world messing with them. > > To my mind, we PGDLLEXPORT some variable only after deciding that yeah, > we're okay with having third-party modules touching that. Craig's > proposal is to remove human judgement from that process. It's just levelling the playing field between platforms. If I had an idea how, I'd still like to just PGDLLIMPORT *all* 'export'ed variables on windows. The problem is that there's a lot of variables which aren't exported and which we'll only discover after the release. Just look at what e.g. postgres_fdw needed. It's not particularly unlikely that other fdws need some of those as well. But they can't change the release at the same time. If we want to declare variables off limits to extension/external code we need a solution that works on !windows as well. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PGDLLEXPORTing all GUCs?
Tom Lane wrote: > Andres Freund writes: >> On 2014-05-07 09:35:06 -0400, Tom Lane wrote: >>> Craig Ringer writes: >>>> Is there any reason _not_ to PGDLLEXPORT all GUCs, other than cosmetic >>>> concerns? >>> That seems morally equivalent to "is there a reason not to make every >>> static variable global?". >> I think what Craig actually tries to propose is to mark all GUCs >> currently exported in headers PGDLLIMPORT. > There are few if any GUCs that aren't exposed in headers, just so that > guc.c can communicate with the owning modules. That doesn't mean that > we want everybody in the world messing with them. > To my mind, we PGDLLEXPORT some variable only after deciding that yeah, > we're okay with having third-party modules touching that. Craig's > proposal is to remove human judgement from that process. So third-party modules that use GUCs that are not PGDLLEXPORT are doing so improperly - even if it works for them because they only care about/test non-Windows platforms? David J.
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 7 May 2014 15:10, Merlin Moncure wrote: > The core issues are: > 1) There is no place to enter total system memory available to the > database in postgresql.conf > 2) Memory settings (except for the above) are given as absolute > amounts, not percentages. Those sound useful starting points. The key issue for me is that effective_cache_size is a USERSET. It applies per-query, just like work_mem (though work_mem is per query node). If we had "total system memory" we wouldn't know how to divide it up amongst users since we have no functionality for "workload management". It would be very nice to be able to tell Postgres that "I have 64GB RAM, use it wisely". At present, any and all users can set effective_cache_size and work_mem to any value they please, any time they wish and thus overuse available memory. Which is why I've had to write plugins to manage the memory allocations better in userspace. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 7 May 2014 15:07, Tom Lane wrote: > Simon Riggs writes: >> I think I'm arguing myself towards using a BufferAccessStrategy of >> BAS_BULKREAD for large IndexScans, BitMapIndexScans and >> BitMapHeapScans. > > As soon as you've got some hard evidence to present in favor of such > changes, we can discuss it. I've got other things to do besides > hypothesize. Now we have a theory to test, I'll write a patch and we can collect evidence for, or against. > In the meantime, it seems like there is an emerging consensus that nobody > much likes the existing auto-tuning behavior for effective_cache_size, > and that we should revert that in favor of just increasing the fixed > default value significantly. I see no problem with a value of say 4GB; > that's very unlikely to be worse than the pre-9.4 default (128MB) on any > modern machine. > > Votes for or against? +1 for fixed 4GB and remove the auto-tuning code. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 05/07/2014 10:12 AM, Andres Freund wrote: On 2014-05-07 10:07:07 -0400, Tom Lane wrote: In the meantime, it seems like there is an emerging consensus that nobody much likes the existing auto-tuning behavior for effective_cache_size, and that we should revert that in favor of just increasing the fixed default value significantly. I see no problem with a value of say 4GB; that's very unlikely to be worse than the pre-9.4 default (128MB) on any modern machine. Votes for or against? +1 for increasing it to 4GB and remove the autotuning. I don't like the current integration into guc.c much and a new static default doesn't seem to be worse than the current autotuning. +1. If we ever want to implement an auto-tuning heuristic it seems we're going to need some hard empirical evidence to support it, and that doesn't seem likely to appear any time soon. cheers andrew
Re: [HACKERS] PGDLLEXPORTing all GUCs?
Andres Freund writes: > On 2014-05-07 09:35:06 -0400, Tom Lane wrote: >> Craig Ringer writes: >>> Is there any reason _not_ to PGDLLEXPORT all GUCs, other than cosmetic >>> concerns? >> That seems morally equivalent to "is there a reason not to make every >> static variable global?". > I think what Craig actually tries to propose is to mark all GUCs > currently exported in headers PGDLLIMPORT. There are few if any GUCs that aren't exposed in headers, just so that guc.c can communicate with the owning modules. That doesn't mean that we want everybody in the world messing with them. To my mind, we PGDLLEXPORT some variable only after deciding that yeah, we're okay with having third-party modules touching that. Craig's proposal is to remove human judgement from that process. regards, tom lane