Re: [HACKERS] crash in plancache with subtransactions
On 22.10.2010 06:10, Tom Lane wrote:
> Right at the moment I'm not seeing any way that the present
> exec_eval_simple_expr approach can be fixed to work safely in the
> presence of recursion. What I think we might have to do is give up on
> the idea of caching execution state trees across calls, instead using
> them just for the duration of a single plpgsql function call. I'm not
> sure what sort of runtime penalty might ensue. The whole design
> predates the plancache, and I think it was mostly intended to prevent
> having to re-parse-and-plan simple expressions every time. So a lot of
> that overhead has gone away anyway given the plancache, and maybe we
> shouldn't sweat too much about paying what remains.

We should test and measure that.

> (But on the third hand, what are we gonna do for back-patching to
> versions without the plancache?)

One simple idea is to keep a flag along with the executor state to indicate that the executor state is currently in use. Set it just before calling ExecEvalExpr, and reset it afterwards. If the flag is already set at the beginning of exec_eval_simple_expr, we have recursed and must create a new executor state.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
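For illustration, a minimal C sketch of that in-use flag follows. The struct and function names are invented for this example and are not the actual plpgsql source; the ExecEvalExpr call uses the four-argument signature of this era.

    #include "postgres.h"
    #include "executor/executor.h"

    /*
     * Sketch only: a cached simple-expression entry that carries a flag
     * marking it as in use for the duration of its evaluation.
     */
    typedef struct CachedSimpleExpr
    {
        Expr       *raw_expr;       /* the expression tree itself */
        ExprState  *expr_state;     /* cached executor state, reused across calls */
        bool        in_use;         /* true while ExecEvalExpr is running */
    } CachedSimpleExpr;

    static Datum
    eval_simple_expr(CachedSimpleExpr *cse, ExprContext *econtext, bool *isNull)
    {
        bool        use_cached = !cse->in_use;
        ExprState  *estate;
        Datum       result;

        if (use_cached)
        {
            estate = cse->expr_state;
            cse->in_use = true;
        }
        else
        {
            /*
             * The cached state is mid-execution somewhere up the call
             * stack: we have recursed, so build a fresh, throwaway
             * executor state rather than reusing one that is in use.
             */
            estate = ExecPrepareExpr(cse->raw_expr, econtext->ecxt_estate);
        }

        PG_TRY();
        {
            result = ExecEvalExpr(estate, econtext, isNull, NULL);
        }
        PG_CATCH();
        {
            if (use_cached)
                cse->in_use = false;    /* don't leave the flag stuck on error */
            PG_RE_THROW();
        }
        PG_END_TRY();

        if (use_cached)
            cse->in_use = false;

        return result;
    }

The PG_TRY block matters in this sketch: without it, an error thrown during evaluation would leave the flag stuck and force every later call onto the slow path.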
Re: [HACKERS] Simplifying replication
On Fri, Oct 22, 2010 at 11:03 AM, Josh Berkus wrote: > >> I think it's pretty well explained in the fine manual. >> >> http://www.postgresql.org/docs/current/static/runtime-config-wal.html#GUC-WAL-KEEP-SEGMENTS > > Nope. No relationship to checkpoint_segments is explained there. Try > again? Please see http://archives.postgresql.org/pgsql-docs/2010-10/msg00038.php Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Timeout and wait-forever in sync rep
On Fri, Oct 22, 2010 at 7:33 AM, Bruce Momjian wrote: > Fujii Masao wrote: >> Hi, >> >> As a result of the discussion, I think that we need the following two >> parameters for the case where the standby goes down. > > Can we have a parameter that calls an operating system command when a > standby is declared dead, to notify the administrator? For me, that command would be useful to STONITH the standby when the master detects the disconnection. I agree we should add that parameter. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
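As a rough sketch of how such a parameter might be wired up, in the spirit of archive_command (all names here, including the GUC "standby_death_command", are invented for illustration; this is not code from any patch):

    #include <stdlib.h>
    #include "postgres.h"

    /* Hypothetical GUC; the name "standby_death_command" is invented here */
    char *standby_death_command = NULL;

    /*
     * Sketch: run the administrator-supplied command when a standby is
     * declared dead.  The command string is assumed to contain a single
     * %s placeholder for the standby name, so it can page an admin or
     * STONITH the node.
     */
    static void
    notify_standby_death(const char *standby_name)
    {
        char        command[1024];

        if (standby_death_command == NULL || standby_death_command[0] == '\0')
            return;

        snprintf(command, sizeof(command), standby_death_command, standby_name);

        if (system(command) != 0)
            ereport(WARNING,
                    (errmsg("standby death command failed: \"%s\"", command)));
    }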
Re: [HACKERS] Floating-point timestamps versus Range Types
On Thu, Oct 21, 2010 at 10:24 PM, Tom Lane wrote: > Bruce Momjian writes: >> Greg Stark wrote: >>> Did we have a solution for the problem that understanding which >>> columns are timestamps requires having a tuple descriptor and parsing >>> every tuple? That seems like it would a) be slow and b) require a >>> lot of high-level code in the middle of a low-level codepath. > >> Yep, that's what it requires. It would rewrite in the new format. > > In the case of the recent hstore fixes, we were able to put the burden > on the hstore functions themselves to do any necessary conversion. > I wonder if it'd be possible to do something similar here? I haven't > chased the bits in any detail, but I'm thinking that integer timestamps > in a plausible range might all look like denormalized floats, and > conversely plausible float timestamps would look like ridiculously large > integer timestamps. Would we be willing to make such assumptions to > support in-place upgrade of timestamps? This seems like it might not be entirely reliable, which would make me disinclined to do it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating-point timestamps versus Range Types
On Thu, Oct 21, 2010 at 7:24 PM, Tom Lane wrote:
> Would we be willing to make such assumptions to
> support in-place upgrade of timestamps?

If something like that is true (I'm not sure it is) then we could consider doing the equivalent of what we were talking about doing for changes that require more space in the new version:

1. Backpatch a change that refuses to create new values of the prohibited type.
2. Provide a tool which scans all the values in your old database and ensures there are no values of the prohibited type.
3. Only allow pg_migrator from the version that includes the backpatched check.

We would also have to make sure the new version's integer timestamps don't write out any instances that look like valid float timestamps. We could have a GUC to disable this check if you have a tool that scans all the pages and rewrites any old values.

I think #1 would be feasible if it's really as simple as checking the high bit, which I imagine is what you're hoping it is. I suppose it would require writing out a 0 or larger denormalized value, which would mean we wouldn't be able to handle values close to the epoch properly. I suppose for timestamps that doesn't really matter, since that's precision we never really have anyway.

I'm not sure the tool to rewrite all existing values is so feasible, though, considering that the values could be stuck in the middle of arrays or records or even custom data types. There are also tintervals and so on to worry about.

-- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
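For what it's worth, the bit-pattern intuition behind #1 can be illustrated with a standalone sketch. This is not PostgreSQL code; the thresholds are back-of-the-envelope assumptions based on integer timestamps being microseconds since 2000-01-01 and float timestamps being seconds since 2000-01-01:

    #include <stdint.h>
    #include <string.h>
    #include <stdbool.h>

    /*
     * Any plausible integer timestamp is far below the raw bit pattern of
     * the double 1.0 (0x3FF0000000000000, which as microseconds is about
     * 146,000 years from the epoch), so reinterpreted as an IEEE-754
     * double it looks like a (de)normalized value very close to zero.
     * Conversely, a float timestamp of magnitude >= 1.0 second has raw
     * bits above that mark, i.e. a ridiculously large integer timestamp.
     *
     * Caveat: float timestamps within one second of the epoch also land
     * below the threshold -- the near-the-epoch ambiguity noted above.
     */
    static bool
    looks_like_integer_timestamp(uint64_t raw_bits)
    {
        const uint64_t bits_of_one = UINT64_C(0x3FF0000000000000);
        uint64_t magnitude = raw_bits & UINT64_C(0x7FFFFFFFFFFFFFFF); /* drop sign */

        return magnitude < bits_of_one;
    }

    int
    main(void)
    {
        /* 2010-10-22 is roughly 3.4e14 microseconds after 2000-01-01 */
        uint64_t int_ts = UINT64_C(341000000000000);
        /* the same instant as a float timestamp: ~3.4e8 seconds */
        double   float_ts = 341000000.0;
        uint64_t bits;

        memcpy(&bits, &float_ts, sizeof(bits));

        /* int_ts classifies as integer; float_ts's bit pattern does not */
        return (looks_like_integer_timestamp(int_ts) &&
                !looks_like_integer_timestamp(bits)) ? 0 : 1;
    }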
Re: [HACKERS] Creation of temporary tables on read-only standby servers
On Thu, Oct 21, 2010 at 7:13 PM, Tom Lane wrote: > ... and as I recall, we got rid of it principally because the temp > tables weren't visible to ordinary catalog lookups, thus breaking > all sorts of client-side logic. > Though that wouldn't be the case if the catalogs held a template. Anyways I think this horse has been beaten to death. Whoever implements it will look at the pros and cons and decide which way they'll go. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] crash in plancache with subtransactions
Alvaro Herrera writes:
> Excerpts from Tom Lane's message of jue oct 21 19:36:07 -0300 2010:
>> I don't believe that it's plancache's fault; the real problem is that
>> plpgsql is keeping "simple expression" execution trees around longer
>> than it should. Your patch masks the problem by forcing those trees to
>> be rebuilt, but it's the execution trees not the plan trees that contain
>> stale data.

> Ahh, this probably explains why I haven't been able to reproduce the
> problem without involving subxacts, or prepared plans, that seemed to
> follow mostly the same paths around plancache cleanup.
> It's also the likely cause that this hasn't been reported earlier.

I traced through the details and found that the proximate cause of the crash is that this bit in fmgr_sql() gets confused:

    /*
     * Convert params to appropriate format if starting a fresh execution. (If
     * continuing execution, we can re-use prior params.)
     */
    if (es && es->status == F_EXEC_START)
        postquel_sub_params(fcache, fcinfo);

After the error in the first subtransaction, the execution state tree for the "public.dummy(p_name_table)" expression has a fn_extra link that is pointing to a SQLFunctionCache that's in F_EXEC_RUN state. So when the second call of broken() tries to re-use the state tree, fmgr_sql() thinks it's continuing the execution of a set-returning function, and doesn't bother to re-initialize its ParamListInfo struct. So it merrily tries to execute using a text datum that's pointing at long-since-pfree'd storage for the original 'nonexistant.stuffs' argument string.

I had always felt a tad uncomfortable with the way that plpgsql re-uses execution state trees for simple expressions; it seemed to me that it was entirely unsafe in recursive usage. But I'd never been able to prove that it was broken. Now that I've seen this example, I know how to break it: recurse indirectly through a SQL function. For instance, this will dump core immediately:

    create or replace function recurse(float8) returns float8 as $$
    begin
        raise notice 'recurse(%)', $1;
        if ($1 < 10) then
            return sql_recurse($1 + 1);
        else
            return $1;
        end if;
    end
    $$ language plpgsql;

    -- "limit" is to prevent this from being inlined
    create or replace function sql_recurse(float8) returns float8 as $$
        select recurse($1) limit 1;
    $$ language sql;

    select recurse(0);

Notice the lack of any subtransaction or error condition. The reason this fails is *not* plancache misfeasance or failure to clean up after error. Rather, it's that the inner execution of recurse() is trying to re-use an execution state tree that is pointing at an already-active execution of sql_recurse().

In general, what plpgsql is doing is entirely unsafe whenever a called function tries to keep changeable execution state in storage pointed to by fn_extra. We've managed to miss the problem because plpgsql doesn't try to use this technique on functions returning set (see exec_simple_check_node), and the vast majority of non-SRF functions that use fn_extra at all use it to cache catalog lookup results, which don't change from call to call. But there's no convention that says a function can't keep execution status data in fn_extra --- in fact, there isn't anyplace else for it to keep such data.

Right at the moment I'm not seeing any way that the present exec_eval_simple_expr approach can be fixed to work safely in the presence of recursion.
What I think we might have to do is give up on the idea of caching execution state trees across calls, instead using them just for the duration of a single plpgsql function call. I'm not sure what sort of runtime penalty might ensue. The whole design predates the plancache, and I think it was mostly intended to prevent having to re-parse-and-plan simple expressions every time. So a lot of that overhead has gone away anyway given the plancache, and maybe we shouldn't sweat too much about paying what remains. (But on the third hand, what are we gonna do for back-patching to versions without the plancache?) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extensions, this time with a patch
On Fri, Oct 22, 2010 at 1:31 AM, Dimitri Fontaine wrote:
> Of course, you know what that means? Yes, another version of the patch, that
> will build the control file out of the control.in at build time rather
> than install time, and that's back to using EXTVERSION both in the
> Makefile and in the .control.in file.

Here is a detailed report on the v9 patch.

* extension.v9.patch.gz seems to contain other changes in the repo. Did you use an old master to get the diff?

* Typo in doc/xfunc.sgml. They should probably be "replaceable".
  openjade:xfunc.sgml:2510:32:E: element "REPLACABLE" undefined
  openjade:xfunc.sgml:2523:46:E: element "REPLACABLE" undefined

* There are some inconsistencies between extension names in the \dx+ view and the actual names used by CREATE EXTENSION:
  - auto_username => insert_username
  - intarray => _int
  - xml2 => pgxml
  We might need to rename them, or add 'installer'/'uninstaller' entries to the control files to support extension names that differ from the .so name.

* pg_execute_from_file() and encoding: it expects the file to be in the server encoding, but that is not always true, because we support multiple encodings in the same installation. How about adding an encoding parameter to the function?
  => pg_execute_from_file( file, encoding )
  CREATE EXTENSION could have an optional ENCODING option:
  => CREATE EXTENSION name [ ENCODING 'which' ]
  I strongly hope for multi-encoding support for my Japanese text search extension. Comments in the extension are written in UTF-8, but both UTF-8 and EUC_JP are equally used as database encodings.

* Error messages in pg_execute_from_file():
  - "must be superuser to get file information" should be "must be superuser to execute file".
  - "File '%s' could not be executed" should be "could not execute file: '%s'".
  Our message style guide is here: http://www.postgresql.org/docs/9.0/static/error-style-guide.html
  Many messages in extension.c also need to be adjusted.

commands/extension.c needs to be cleaned up a bit more:
* fsize in read_extension_control_file() is not used.
* The ferror() test just after AllocateFile() is not needed; NULL checking is enough.
* malloc() in add_extension_custom_variable_classes(): I think the README says nothing about malloc() except for assign_hook cases; palloc would be better here.

BTW, did you register your patch for the next commitfest? It would be better to do so for tracking the activities. https://commitfest.postgresql.org/action/commitfest_view?id=8

-- Itagaki Takahiro -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
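To make the encoding suggestion concrete, here is a sketch of what the conversion step inside pg_execute_from_file() might look like. The function name read_script_in_encoding() and the helper read_whole_file() are assumptions for illustration; pg_char_to_encoding(), pg_do_encoding_conversion(), and GetDatabaseEncoding() are the existing backend facilities:

    #include "postgres.h"
    #include "mb/pg_wchar.h"

    /*
     * Sketch only: read a script assumed to be in the named encoding and
     * convert it to the database encoding before executing it.
     */
    static char *
    read_script_in_encoding(const char *filename, const char *encoding_name)
    {
        int     src_encoding = pg_char_to_encoding(encoding_name);
        int     len;
        char   *src;

        if (src_encoding < 0)
            ereport(ERROR,
                    (errmsg("invalid encoding name \"%s\"", encoding_name)));

        /* hypothetical helper: slurp the whole file into palloc'd memory */
        src = read_whole_file(filename, &len);

        return (char *) pg_do_encoding_conversion((unsigned char *) src, len,
                                                  src_encoding,
                                                  GetDatabaseEncoding());
    }

With something like this in place, the proposed syntax from the review, CREATE EXTENSION name ENCODING 'UTF8', would just pass the encoding name through to the file-reading step.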
Re: [HACKERS] Floating-point timestamps versus Range Types
Bruce Momjian writes: > Greg Stark wrote: >> Did we have a solution for the problem that understanding which >> columns are timestamps requires having a tuple descriptor and parsing >> every tuple? That seems like it would a) be slow and b) require a >> lot of high-level code in the middle of a low-level codepath. > Yep, that's what it requires. It would rewrite in the new format. In the case of the recent hstore fixes, we were able to put the burden on the hstore functions themselves to do any necessary conversion. I wonder if it'd be possible to do something similar here? I haven't chased the bits in any detail, but I'm thinking that integer timestamps in a plausible range might all look like denormalized floats, and conversely plausible float timestamps would look like ridiculously large integer timestamps. Would we be willing to make such assumptions to support in-place upgrade of timestamps? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simplifying replication
On Thu, Oct 21, 2010 at 10:03 PM, Josh Berkus wrote: > >> I think it's pretty well explained in the fine manual. >> >> http://www.postgresql.org/docs/current/static/runtime-config-wal.html#GUC-WAL-KEEP-SEGMENTS > > Nope. No relationship to checkpoint_segments is explained there. Try > again? Well, it says "This sets only the minimum number of segments retained in pg_xlog; the system might need to retain more segments for WAL archival or to recover from a checkpoint." So in other words, the relationship with checkpoint segments is that whichever one currently requires retaining a larger number of segments applies. That's all the relationship there is. I'm not sure I understand the question. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Creation of temporary tables on read-only standby servers
Bruce Momjian writes: > Greg Stark wrote: >> It seems to me simpler and more direct to just nail relcache >> entries for these objects into memory and manipulate them directly. >> They can be constructed from the global catalog tables and then >> tweaked to point to the backend local temporary tables. > Funny, but that is how I implemented temporary tables in 1999 and lasted > until 2002 when schema support was added. It actually worked because > all the lookups go through the syscache. ... and as I recall, we got rid of it principally because the temp tables weren't visible to ordinary catalog lookups, thus breaking all sorts of client-side logic. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simplifying replication
> I think it's pretty well explained in the fine manual. > > http://www.postgresql.org/docs/current/static/runtime-config-wal.html#GUC-WAL-KEEP-SEGMENTS Nope. No relationship to checkpoint_segments is explained there. Try again? >> If checkpoint_segments were a hard limit, then we could let admins set >> wal_keep_segments to -1, knowing that they'd set checkpoint_segments to >> the max space they had available. > > This assumes that more checkpoint segments is always better, which > isn't true. I might have 100 GB of disk space free, but not want to > replay WAL for 4 days if I have a crash. No, it assumes no such thing. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simplifying replication
On Thu, Oct 21, 2010 at 9:09 PM, Josh Berkus wrote: > >>> Very true. But the lack of a -1 setting for wal_keep_segments means >>> that if you would like to take a backup without archiving, you must >>> set wal_keep_segments to a value greater than or equal to the rate at >>> which you generate WAL segments multiplied by the time it takes you to >>> run a backup. If that doesn't qualify as requiring arcane knowledge, >>> I'm mystified as to what ever could. > > Speaking of which, what's the relationship between checkpoint_segments > and wal_keep_segments? PG seems perfectly willing to let me set the > latter higher than the former, and it's not documented. I think it's pretty well explained in the fine manual. http://www.postgresql.org/docs/current/static/runtime-config-wal.html#GUC-WAL-KEEP-SEGMENTS > If checkpoint_segments were a hard limit, then we could let admins set > wal_keep_segments to -1, knowing that they'd set checkpoint_segments to > the max space they had available. This assumes that more checkpoint segments is always better, which isn't true. I might have 100 GB of disk space free, but not want to replay WAL for 4 days if I have a crash. I do think that the current default of checkpoint_segments=3 is pathologically insane, but that's another can of worms. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating-point timestamps versus Range Types
Greg Stark wrote: > On Thu, Oct 21, 2010 at 4:49 PM, Bruce Momjian wrote: > > One thing we have talked about is converting the page on read-in from > > the backend. Since the timestamps are the same size as float or > > integer, that might be possible. > > Did we have a solution for the problem that understanding which > columns are timestamps requires having a tuple descriptor and parsing > every tuple? That seems like it would a) be slow and b) require a > lot of high-level code in the middle of a low-level codepath. Yep, that's what it requires. It would rewrite in the new format. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simplifying replication
>> Very true. But the lack of a -1 setting for wal_keep_segments means >> that if you would like to take a backup without archiving, you must >> set wal_keep_segments to a value greater than or equal to the rate at >> which you generate WAL segments multiplied by the time it takes you to >> run a backup. If that doesn't qualify as requiring arcane knowledge, >> I'm mystified as to what ever could. Speaking of which, what's the relationship between checkpoint_segments and wal_keep_segments? PG seems perfectly willing to let me set the latter higher than the former, and it's not documented. If checkpoint_segments were a hard limit, then we could let admins set wal_keep_segments to -1, knowing that they'd set checkpoint_segments to the max space they had available. Although we might want to rename those. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rawdump
Bruce Momjian wrote: >Greg Stark wrote: >> On Tue, Oct 19, 2010 at 1:12 PM, Stephen R. van den Berg >> wrote: >> > In order to simplify recovery at this point (enormously), it would >> > have been very helpful (at almost negligible cost), to have the name >> > of the table, the name of the columns, and the types of the >> > columns available. >> > Why don't we insert that data into the first page of a regular table >> > file, in the special data area? >I was thinking we could dump a flat file of recovery-useful information >into each database directory every 15 minutes. It wouldn't be >perfect, but would probably be sufficient for most forensics. It would definitely be better than the current state. But it still disconnects the information from the files it belongs to (a bit). From a cost/benefit ratio point of view, I'd still prefer to interlace the information into the tablefiles (which also scales better in case of numerous tables). -- Stephen. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rawdump
Greg Stark wrote:
>On Thu, Oct 21, 2010 at 11:30 AM, Stephen R. van den Berg wrote:
>> For the recovery information I'd like to reserve:
>> identifier: 00: table OID
>>             01: table layout
>So here's a proposal for something that could maybe be implemented. I
>think I'm leaning against this currently as there are just too many
>caveats even for this limited functionality. I'm more inclined to go
>with the idea someone else proposed of dumping an audit log of all DDL
>on the table or, after any DDL, dumping the create table statements as
>pg_dump would generate them to a separate fork.

Yes, but that has two distinct downsides I'd like to avoid:
- It can grow uncontrollably if someone uses ALTER TABLE on a (too) regular basis.
- It separates the data from the tablefile it pertains to (which could complicate recovery (a lot)).

>In this meta data object put:
>table oid
>table name
>number of columns
>array of typlen for those columns (-1 for varlena and size for fixed length)
[...]
>Including the type brings in a new set of complications. Even if you
>assume the built-in typoids never change there are going to be typoids
>that correspond to user defined types. Without the names of those
>types the oids are pretty useless.
>Just the typoids would put you over 8k in the worst case and the names
>would put you into the realm of needing arbitrary numbers of blocks
>for even average cases. Similarly including the column names would
>require potentially many blocks.

All valid points/concerns. But let's approach this from the side of the forensics analyst instead, and see what information typically would really be needed (for argument's sake, let's call the unfortunate sod that lost the catalog to his database "the customer"):

The customer usually still has access to the developer, or to some developer documentation which records which columns are used for what. It would most likely document most columns (especially the column names and, to a lesser extent, the column types), but might be lacking some of the more recent changes which (unfortunately) were done on the live database using ALTER TABLE and hadn't been documented properly yet (I'm talking about typical real-world cases I've encountered).

That means the analyst would primarily be concerned with getting back the column types and the column names. The next step would be to know about slight deviations from the documented columns (columns dropped or added since the last revision of the docs). Next are custom types. Custom types are likely to be better documented, hence the column name would usually be enough to recover the definition of a custom type from the docs.

Assuming the above holds in the common case, I would propose implementing something along the following lines.

Entries for the special area in tables:

  0007 00 tableoid                  To be sprinkled in every megabyte or so.
  01 00 cc tablename...             cc = number of columns
  01 01 ...                         column widths (-1 for varlena widths)
  01 02 ...                         type oids per column (is 16 bits wide
                                    enough for the common cases? If not,
                                    disregard my suggestion and make these
                                    32 bits wide each)
  01 03 nn col1name nn col2name ... nn = length of the column name that follows

End of filled special area; no need to parse beyond here in the current page. Whereas the 01 special-area types could be present in any number of pages.
If they are present, they shall be present starting at the first page of the file, and possibly will be present in the following page(s), until a page is encountered without them. Multiple occurrences of 01 01, 01 02, or 01 03 shall be concatenated to form the complete information set. -- Stephen. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
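Read as C, the proposed records might look roughly like the following sketch. All struct and macro names are invented here, and the exact field widths are whatever the proposal above settles on:

    #include <stdint.h>

    /*
     * Sketch of the proposed special-area records.  Each record starts
     * with a tag byte; the "01 xx" layout records may repeat across the
     * leading pages of the file and are concatenated to rebuild the full
     * table description.
     */
    #define RAWDUMP_TAG_TABLE_OID    0x00  /* "00": table OID, repeated ~1/MB */
    #define RAWDUMP_TAG_LAYOUT       0x01  /* "01 xx": table layout records */

    #define RAWDUMP_LAYOUT_NAME      0x00  /* "01 00 cc tablename", cc = column count */
    #define RAWDUMP_LAYOUT_WIDTHS    0x01  /* "01 01 ...": per-column widths, -1 = varlena */
    #define RAWDUMP_LAYOUT_TYPOIDS   0x02  /* "01 02 ...": per-column type OIDs */
    #define RAWDUMP_LAYOUT_COLNAMES  0x03  /* "01 03 nn name ...": length-prefixed names */

    /* "01 00 cc tablename..." */
    typedef struct RawdumpTableName
    {
        uint8_t  tag;           /* RAWDUMP_TAG_LAYOUT */
        uint8_t  subtag;        /* RAWDUMP_LAYOUT_NAME */
        uint8_t  ncolumns;      /* cc */
        char     tablename[];   /* runs to the end of the record */
    } RawdumpTableName;

    /* one entry of "01 03 nn col1name nn col2name ..." */
    typedef struct RawdumpColumnName
    {
        uint8_t  namelen;       /* nn */
        char     name[];        /* namelen bytes, not NUL-terminated */
    } RawdumpColumnName;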
Re: [HACKERS] Simplifying replication
Robert Haas wrote: > > However the user-unfriendliness isn't the fact that administrators > > need to determine how much disk they're willing to dedicate to > > Postgres. The user-unfriendliness is that they then have to specify > > this in terms of WAL log files and also have to know that we sometimes > > keep more than that and so on. > > > > We've done a good job in the past of converting GUC variables to > > meaningful units for administrators and users but it's an ongoing > > effort. If we need a GUC to control the amount of disk space we use it > > should be in units of MB/GB/TB. If we need a GUC for controlling how > > much WAL history to keep for recovering standbys or replicas then it > > should be specified in units of time. > > > > Units like "number of wal files" or worse in the case of > > checkpoint_segments "number of wal files / 2 - 1" or something like > > that are terrible. They require arcane knowledge for the > > administrator to have a clue how to set. > > Very true. But the lack of a -1 setting for wal_keep_segments means > that if you would like to take a backup without archiving, you must > set wal_keep_segments to a value greater than or equal to the rate at > which you generate WAL segments multiplied by the time it takes you to > run a backup. If that doesn't qualify as requiring arcane knowledge, > I'm mystified as to what ever could. LOL. Time machine required (both forward and backward time options). -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql autocompletion for \z and \dg
On Thu, Oct 21, 2010 at 8:45 PM, Robert Haas wrote: > Please add this in the usual spot: > https://commitfest.postgresql.org/action/commitfest_view/open Aye sir, added. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simplifying replication
On Thu, Oct 21, 2010 at 8:52 PM, Greg Stark wrote: > On Thu, Oct 21, 2010 at 5:46 PM, Josh Berkus wrote: >> >>> Agreed, but as a reality check: when I proposed that wal_keep_segments >>> = -1 would keep all WAL segments (for use while the file system was >>> being backed up), I was told administrators should compute how much free >>> disk space they had. Obviously ease of use is not our #1 priority. >> >> Depends. Running out of disk space isn't exactly user-friendly either. >> And detecting how much free space is available would be a painful bit >> of platform-dependent code ... > > Nor can we assume we're the only thing using disk space. > > However the user-unfriendliness isn't the fact that administrators > need to determine how much disk they're willing to dedicate to > Postgres. The user-unfriendliness is that they then have to specify > this in terms of WAL log files and also have to know that we sometimes > keep more than that and so on. > > We've done a good job in the past of converting GUC variables to > meaningful units for administrators and users but it's an ongoing > effort. If we need a GUC to control the amount of disk space we use it > should be in units of MB/GB/TB. If we need a GUC for controlling how > much WAL history to keep for recovering standbys or replicas then it > should be specified in units of time. > > Units like "number of wal files" or worse in the case of > checkpoint_segments "number of wal files / 2 - 1" or something like > that are terrible. They require arcane knowledge for the > administrator to have a clue how to set. Very true. But the lack of a -1 setting for wal_keep_segments means that if you would like to take a backup without archiving, you must set wal_keep_segments to a value greater than or equal to the rate at which you generate WAL segments multiplied by the time it takes you to run a backup. If that doesn't qualify as requiring arcane knowledge, I'm mystified as to what ever could. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
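The arcane formula Robert describes works out to something like this standalone illustration (not PostgreSQL code; 16 MB is the default WAL segment size):

    #include <math.h>
    #include <stdio.h>

    /*
     * To survive a backup without archiving, wal_keep_segments must cover
     * all WAL written while the backup runs.
     */
    static int
    min_wal_keep_segments(double wal_mb_per_minute, double backup_minutes)
    {
        const double segment_mb = 16.0;     /* default WAL segment size */

        return (int) ceil(wal_mb_per_minute * backup_minutes / segment_mb);
    }

    int
    main(void)
    {
        /* e.g. 40 MB of WAL per minute and a 90-minute base backup */
        printf("wal_keep_segments >= %d\n", min_wal_keep_segments(40.0, 90.0));
        return 0;
    }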
Re: [HACKERS] pg_rawdump
Greg Stark wrote: > On Tue, Oct 19, 2010 at 1:12 PM, Stephen R. van den Berg wrote: > > In order to simplify recovery at this point (enormously), it would > > have been very helpful (at almost negligible cost), to have the name > > of the table, the name of the columns, and the types of the > > columns available. > > > > Why don't we insert that data into the first page of a regular table > > file, in the special data area? > > > > I'd be willing to create a patch for that (should be pretty easy), > > if nobody considers it to be a bad idea. > > There isn't necessarily one value for these attributes. You can > rename columns and that rename may succeed and commit or fail and > rollback. You can drop or add columns and some rows will have or not > have the added columns at all. You could even add a column, insert > some rows, then abort -- all in a transaction. So some (aborted) rows > will have extra columns that aren't even present in the current table > definition. > > All this isn't to say the idea you're presenting is impossible or a > bad idea. If this meta information was only a hint for forensic > purposes and you take into account these caveats it might still be > useful. But I'm not sure how useful. I mean, you can't really decipher > everything properly without the data in the catalog -- and you have to > premise this on the idea that you've lost everything in the catalog > but not the data in other tables. Which seems like a narrow use case. I was thinking we could dump a flat file of recovery-useful information into each database directory every 15 minutes. It wouldn't be perfect, but would probably be sufficient for most forensics. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simplifying replication
On Thu, Oct 21, 2010 at 5:46 PM, Josh Berkus wrote: > >> Agreed, but as a reality check: when I proposed that wal_keep_segments >> = -1 would keep all WAL segments (for use while the file system was >> being backed up), I was told administrators should compute how much free >> disk space they had. Obviously ease of use is not our #1 priority. > > Depends. Running out of disk space isn't exactly user-friendly either. > And detecting how much free space is available would be a painful bit > of platform-dependent code ... Nor can we assume we're the only thing using disk space. However the user-unfriendliness isn't the fact that administrators need to determine how much disk they're willing to dedicate to Postgres. The user-unfriendliness is that they then have to specify this in terms of WAL log files and also have to know that we sometimes keep more than that and so on. We've done a good job in the past of converting GUC variables to meaningful units for administrators and users but it's an ongoing effort. If we need a GUC to control the amount of disk space we use it should be in units of MB/GB/TB. If we need a GUC for controlling how much WAL history to keep for recovering standbys or replicas then it should be specified in units of time. Units like "number of wal files" or worse in the case of checkpoint_segments "number of wal files / 2 - 1" or something like that are terrible. They require arcane knowledge for the administrator to have a clue how to set. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simplifying replication
> Agreed, but as a reality check: when I proposed that wal_keep_segments > = -1 would keep all WAL segments (for use while the file system was > being backed up), I was told administrators should compute how much free > disk space they had. Obviously ease of use is not our #1 priority. Depends. Running out of disk space isn't exactly user-friendly either. And detecting how much free space is available would be a painful bit of platform-dependent code ... -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql autocompletion for \z and \dg
On Wed, Oct 20, 2010 at 10:56 PM, Josh Kupershmidt wrote: > It looks like psql's tab completion for the \z and \dg commands in > psql is missing. I couldn't see a reason for this, so attached patch > fixes. > > Also, this patch proposes to change psql's "\?" help text to say that > \dg and \du are the same, since AFAICT they do exactly the same thing. Please add this in the usual spot: https://commitfest.postgresql.org/action/commitfest_view/open -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simplifying replication
On Thu, Oct 21, 2010 at 8:22 PM, Bruce Momjian wrote: > Josh Berkus wrote: >> Greg, >> >> > The way things stand you *always* need archived logs. Even if you have >> > streaming set up it might try to use archived logs if it falls too far >> > behind. >> >> Actually, you don't. If you're willing to accept possible >> desynchronization and recloning of the standbys, then you can skip the >> archive logs. > > Agreed, but as a reality check: when I proposed that wal_keep_segments > = -1 would keep all WAL segments (for use while the file system was > being backed up), I was told administrators should compute how much free > disk space they had. Obviously ease of use is not our #1 priority. Amen. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_wal_senders must die
Robert Haas wrote: > On Thu, Oct 21, 2010 at 4:21 PM, Josh Berkus wrote: > > On 10/20/10 6:54 PM, Robert Haas wrote: > >> I find it impossible to believe that's > >> a good decision, and IMHO we should be focusing on how to make the > >> parameters PGC_SIGHUP rather than PGC_POSTMASTER, which would give us > >> most of the same benefits without throwing away hard-won performance. > > > > I'd be happy to accept that. Is it possible, though? > > I sketched an outline of the problem AIUI here: > > http://archives.postgresql.org/pgsql-hackers/2010-10/msg01348.php > > I think it's possible; I'm not quite sure how hard it is. > Unfortunately, I've not had as much PG-hacking time lately as I'd > like... Have we documented these TODOs? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_wal_senders must die
Robert Haas wrote: > On Wed, Oct 20, 2010 at 3:40 PM, Greg Stark wrote: > > On Wed, Oct 20, 2010 at 6:29 AM, Robert Haas wrote: > >> Exactly. It doesn't take many 3-7% slowdowns to add up to being 50% > >> or 100% slower, and that sucks. In fact, I'm still not convinced that > >> we were wise to boost default_statistics_target as much as we did. I > >> argued for a smaller boost at the time. > > > > Well we don't want to let ourselves be paralyzed by FUD so it was > > important to identify specific concerns and then tackle those > > concerns. Once we identified the worst-case planning cases we profiled > > them and found that the inflection point of the curve was fairly > > clearly above 100 but that there were cases where values below 1,000 > > caused problems. So I'm pretty happy with the evidence-based approach. > > The inflection point of the curve was certainly a good thing for us to > look at but the fact remains that we took a hit on a trivial > benchmark, and we can't afford to take too many of those. Agreed. If people start wondering if our new major releases are perhaps _slower_ than previous ones, we have lost a huge amount of momentum. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] crash in plancache with subtransactions
Excerpts from Tom Lane's message of jue oct 21 19:36:07 -0300 2010: > I'm not immediately sure why plpgsql_subxact_cb is failing to clean up > correctly in this example, but that seems to be where to look. I think the reason is that one econtext is pushed for function execution, and another one for blocks that contain exceptions. The example function does not contain exceptions -- the savepoints are handled by the external SQL code. I'll have a closer look tomorrow. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simplifying replication
Josh Berkus wrote: > Greg, > > > The way things stand you *always* need archived logs. Even if you have > > streaming set up it might try to use archived logs if it falls too far > > behind. > > Actually, you don't. If you're willing to accept possible > desynchronization and recloning of the standbys, then you can skip the > archive logs. Agreed, but as a reality check: when I proposed that wal_keep_segments = -1 would keep all WAL segments (for use while the file system was being backed up), I was told administrators should compute how much free disk space they had. Obviously ease of use is not our #1 priority. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Creation of temporary tables on read-only standby servers
Tom Lane wrote: > Bruce Momjian writes: > > Currently it isn't possible to create temporary tables on read-only > > standby servers, and I don't see it listed on the TODO list. Can I add > > it? > > Not unless you have some credible concept for how it might ever be > implemented. You can't create temp tables because you can't modify > system catalogs, and if you did somehow create them you couldn't put > anything in them because you can't generate XIDs on a slave ... much > less commit them. We have talked about ways that temp tables might be > created without touching the "real" system catalogs, but the XID issue > seems a complete showstopper. So, this is one of those odd cases where we know people are going to ask for a feature (temp tables on slaves), but we are not ready to put it on our TODO list. Where do we document that this isn't going to happen? In "Features we don't want"? That title doesn't really match. "Features we don't know how to do" doesn't sound good. ;-) -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rawdump
Kevin Grittner wrote: >Tom Lane wrote: >> Most of the data-loss reports I've seen appeared to come from >> people who wouldn't be capable of doing such recovery work even if >> better tools were available. >No doubt; but the recovery work often winds up in the hands of >people with more skills than those responsible for the loss. >Whoever comes in to help with recovery is looking for every bit of >traction they can get. You'd be amazed at some of the odd straws >people can grasp to help recover data. And, obviously, the amount of time spent (by the professional) to recover the data goes down a lot when more information is available, which reduces cost and shortens the downtime for the sad sod that lost the data in the first place. -- Stephen. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Creation of temporary tables on read-only standby servers
Greg Stark wrote: > On Tue, Oct 19, 2010 at 12:03 PM, Robert Haas wrote: > > The trick is that it would require us to have two pg_class tables, two > > pg_attribute tables, two pg_attrdef tables, etc.: in each case, one > > permanent and one temporary. I am not sure how complex that will turn > > out to be. > > Tom suggested using inheritance for this. > > I find it strange to try constructing catalog tables to represent > these local definitions which never need to be read by any other > backend and in any case are 1:1 copies of the global catalog entries. > > It seems to me simpler and more direct to just nail relcache > entries for these objects into memory and manipulate them directly. > They can be constructed from the global catalog tables and then > tweaked to point to the backend-local temporary tables. Funny, but that is how I implemented temporary tables in 1999, and it lasted until 2002 when schema support was added. It actually worked because all the lookups go through the syscache. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating-point timestamps versus Range Types
On Thu, Oct 21, 2010 at 4:49 PM, Bruce Momjian wrote: > One thing we have talked about is converting the page on read-in from > the backend. Since the timestamps are the same size as float or > integer, that might be possible. Did we have a solution for the problem that understanding which columns are timestamps requires having a tuple descriptor and parsing every tuple? That seems like it would a) be slow and b) require a lot of high-level code in the middle of a low-level codepath. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Floating-point timestamps versus Range Types
Robert Haas wrote: > On Mon, Oct 18, 2010 at 2:29 PM, Jeff Davis wrote: > > A reasonable conversion path might be to offer integer timestamps using > > a different type name (e.g. inttimestamp) that always means integer > > timestamps. Then, they could convert using ALTER TABLE, then do an > > in-place upgrade. We could even make pg_upgrade optionally convert > > inttimestamp to timestamp in O(1) on an integer-timestamps build. > > I think in retrospect it would certainly have been better to make > integer timestamps and float timestamps two separate data types, > rather than two versions of the same data type. Whether it's worth > providing that now after the fact is not clear to me. I'd be inclined > to wait and see whether we get many complaints... > > One problem with changing types in pg_upgrade is that type OIDs can > get embedded in the on-disk representation - I believe that this > happens for arrays, for instance. So I think it's practical for > pg_upgrade to change type names during a version upgrade, but not type > OIDs. One thing we have talked about is converting the page on read-in from the backend. Since the timestamps are the same size as float or integer, that might be possible. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Serializable snapshot isolation patch
Jeff Davis wrote: > When using locks in an unconventional way, it would be helpful to > describe the invalid schedules that you're preventing. Perhaps an > example if you think it would be reasonably simple? Also some > indication of how another process is intended to modify the list > without walking it. I've just pushed some comment changes intended to address this. Did I hit the mark? -Kevin P.S. Sorry for the delay in responding to such simple requests -- I've been tied up with a family medical crisis; I hope to crank through much of what you've raised this weekend. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] crash in plancache with subtransactions
Excerpts from Tom Lane's message of jue oct 21 19:36:07 -0300 2010: > Alvaro Herrera writes: > > A customer was hitting some misbehavior in one of their internal tests and > > I tracked it down to plancache not behaving properly with > > subtransactions: in particular, a plan is not being marked "dead" when > > the subtransaction on which it is planned rolls back. > > I don't believe that it's plancache's fault; the real problem is that > plpgsql is keeping "simple expression" execution trees around longer > than it should. Your patch masks the problem by forcing those trees to > be rebuilt, but it's the execution trees not the plan trees that contain > stale data. Ahh, this probably explains why I haven't been able to reproduce the problem without involving subxacts, or prepared plans, that seemed to follow mostly the same paths around plancache cleanup. It's also the likely cause that this hasn't been reported earlier. > I'm not immediately sure why plpgsql_subxact_cb is failing to clean up > correctly in this example, but that seems to be where to look. Will take a look ... if the girls let me ... -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rawdump
On Thu, Oct 21, 2010 at 5:21 PM, Kevin Grittner wrote: > Tom Lane wrote: > >> Most of the data-loss reports I've seen appeared to come from >> people who wouldn't be capable of doing such recovery work even if >> better tools were available. > > No doubt; but the recovery work often winds up in the hands of > people with more skills than those responsible for the loss. > Whoever comes in to help with recovery is looking for every bit of > traction they can get. You'd be amazed at some of the odd straws > people can grasp to help recover data. > > I don't know how big the special area is, but if enough room could > be carved out to include even the relation ID or filename, it might > be a big help to someone. I'm pretty skeptical about including > information about attributes, though. Unfortunately, the use case for the relation ID or filename is much thinner than the case for the column definitions. You're less likely to lose that information, and if you do lose it you can probably guess by file size or by running strings on the data files. The really hard thing is to remember just exactly what columns you had in there, in what order... and oh yeah there was that column we dropped. But like you, I'm pretty dubious about trying to store all that stuff. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simplifying replication
On 19/10/10 13:16, Josh Berkus wrote:
> Robert asked me to write this up, so here it is. It is critical that we
> make replication easier to set up, administrate and monitor than it
> currently is. In my conversations with people, this is more important
> to our users and the adoption of PostgreSQL than synchronous
> replication is. First, I'm finding myself constantly needing to tutor
> people on how to set up replication. The mere fact that it requires a
> minimum 1-hour class to explain how to use it, or a 10-page tutorial,
> tells us it's too complex. As further evidence, Bruce and I explained
> binary replication to several MySQL geeks at OpenSQLCamp last weekend,
> and they were horrified at the number and complexity of the steps
> required. As it currently is, binary replication is not going to win us
> a lot of new users from the web development or virtualization world.

+1

I've been having the same experience - how to set this up and do failover and failback etc. occupies quite a bit of time in courses I've been teaching here in NZ and Australia. Having this whole replication business much simpler is definitely the way to go. A good example of how simple it can be is mongodb, where it is essentially one command to set up a 2-replica system with a voting arbiter:

$ mongo
> rs.initiate( {
    _id : "replication_set0",
    members : [
      { _id : 0, host : "192.168.2.100" },
      { _id : 1, host : "192.168.2.101" },
      { _id : 2, host : "192.168.2.103", arbiterOnly : true }
    ]
  } )

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] docs on contrib modules that can't pg_upgrade?
Robert Treat wrote: > Howdy folks, > > Was wondering if there are any docs on which contrib modules don't work with > pg_upgrade? I seem to remember discussion on this during the 9.0 cycle, but > couldn't find it in the mail archive, and don't see anything in the wiki. > What > brings this up is I'm currently working on an 8.3 upgrade and it has > pg_freespacemap which breaks things; I think easy enough to work-around in > this case, but I am sure for other contribs, or for folks with a lot of > machinery built on top of a contrib, that won't always be the case. If > something like this doesn't exist, I'll start a wiki page on it, but thought > I'd ask first. I don't know of any /contrib modules that will not upgrade; if we had any they would be mentioned in the pg_upgrade docs and checked in the source code. We had a possible issue with hstore, but that was changed in a backward-compatible way for 9.0. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] crash in plancache with subtransactions
Alvaro Herrera writes: > A customer was hitting some misbehavior in one of their internal tests and > I tracked it down to plancache not behaving properly with > subtransactions: in particular, a plan is not being marked "dead" when > the subtransaction on which it is planned rolls back. I don't believe that it's plancache's fault; the real problem is that plpgsql is keeping "simple expression" execution trees around longer than it should. Your patch masks the problem by forcing those trees to be rebuilt, but it's the execution trees not the plan trees that contain stale data. I'm not immediately sure why plpgsql_subxact_cb is failing to clean up correctly in this example, but that seems to be where to look. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Timeout and wait-forever in sync rep
Fujii Masao wrote: > Hi, > > As a result of the discussion, I think that we need the following two > parameters for the case where the standby goes down. Can we have a parameter that calls an operating system command when a standby is declared dead, to notify the administrator? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] pg_filedump binary for CentOS
Greg Stark wrote: > On Thu, Oct 14, 2010 at 2:53 PM, Tom Lane wrote: > > Bruce Momjian writes: > >> Should we consider moving pg_filedump into our /contrib? > > > > Can't: it's GPL. > > > > I don't particularly see a problem with having GPL'd contrib modules. > It would mean any users hoping to redistribute the package couldn't > include those modules except under the GPL. But most repackagers don't > include the contrib modules anyways. Even ones that do and want to > include those modules would only have to include the source to that > module. > > I can see not wanting to let that camel's nose in for fear of having > packagers always be uncertain about the status of each contrib module > though. I think we should just link to the tool from our docs so there is no license complexity. Where do we add it? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UNION ALL has higher cost than inheritance
On Oct 21, 2010, at 2:17 PM, Tom Lane wrote: > The oversight here is that we don't use appendrel planning for > a top-level UNION ALL construct. That didn't use to matter, > because you always got the same stupid Append plan either way. > Now it seems like we ought to have some more intelligence for the > top-level SetOp case. I smell some code refactoring coming up. Does it smell like chicken? Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why do we have a database specification in .pgpass?
Peter Eisentraut wrote: > On ons, 2010-10-13 at 14:32 -0400, Bruce Momjian wrote: > > We have a database specification in .pgpass: > > > > hostname:port:database:username:password > > > > What is the purpose of 'database' since username/password combinations > > are global, not per database? I would like to document its purpose. > > As a side note, the thing at the other end of a connection is not > necessarily a PostgreSQL server. It could be a connection pool proxy. > I don't know if any implementation could make use of the database > field at the moment, but it should be kept in mind. > > That said, it would probably be good to document that the database field > is currently only useful in certain limited circumstances. Agreed, done. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 70d9202..a911c50 100644
*** a/doc/src/sgml/libpq.sgml
--- b/doc/src/sgml/libpq.sgml
*** myEventProc(PGEventId evtId, void *evtIn
*** 6331,6336
--- 6331,6338
    or the default socket directory) connections coming from the local
    machine. In a standby server, a database name of
    replication matches streaming replication
    connections made to the master server.
+   The database field is of limited usefulness because
+   users have the same password for all databases in the same cluster.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rawdump
Tom Lane wrote: > Most of the data-loss reports I've seen appeared to come from > people who wouldn't be capable of doing such recovery work even if > better tools were available. No doubt; but the recovery work often winds up in the hands of people with more skills than those responsible for the loss. Whoever comes in to help with recovery is looking for every bit of traction they can get. You'd be amazed at some of the odd straws people can grasp to help recover data. I don't know how big the special area is, but if enough room could be carved out to include even the relation ID or filename, it might be a big help to someone. I'm pretty skeptical about including information about attributes, though. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UNION ALL has higher cost than inheritance
Greg Stark writes: > On Thu, Oct 21, 2010 at 6:57 AM, Tom Lane wrote: >>> Thanks. It also explains my another question why Merge Append cannot >>> be used for UNION ALL plans. >> Hmm, seems like the example you show ought to work. I wonder if there >> was an oversight in that patch... > Huh, that definitely worked in the earlier versions of the patch (as > much as it "worked" at all) Actually, it works as long as the UNION is in a subquery:

regression=# EXPLAIN select * from
  ((SELECT * FROM ONLY parent ORDER BY i)
   UNION ALL
   (SELECT * FROM child ORDER BY i)) ss
  ORDER BY i LIMIT 10;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Limit  (cost=168.76..169.13 rows=10 width=4)
   ->  Result  (cost=168.76..294.51 rows=3400 width=4)
         ->  Merge Append  (cost=168.76..294.51 rows=3400 width=4)
               Sort Key: parent.i
               ->  Sort  (cost=168.75..174.75 rows=2400 width=4)
                     Sort Key: parent.i
                     ->  Seq Scan on parent  (cost=0.00..34.00 rows=2400 width=4)
               ->  Index Scan using child_i_idx on child  (cost=0.00..43.25 rows=1000 width=4)
(8 rows)

The oversight here is that we don't use appendrel planning for a top-level UNION ALL construct. That didn't use to matter, because you always got the same stupid Append plan either way. Now it seems like we ought to have some more intelligence for the top-level SetOp case. I smell some code refactoring coming up. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] find -path isn't portable
Excerpts from Tom Lane's message of jue oct 21 17:48:18 -0300 2010: > buildfarm member koi, having recently been rescued from git purgatory, > is failing like this: > > configure: using CFLAGS=-O -Kinline > configure: using CPPFLAGS= -I/usr/local/include/libxml2 -I/usr/local/include > configure: using LDFLAGS= -L/usr/local/lib -L/usr/local/lib > preparing build tree... UX:find: ERROR: Illegal option -- -path > UX:find: TO FIX: Usage: find [path-list] [predicate-list] > UX:ln: ERROR: Cannot create > /home/ohp/pgfarmbuild/HEAD/pgsql.10464//config/Makefile: No such file or > directory > configure: error: failed > > which indicates it can't cope with your recent patch to prep_buildtree. > I was afraid that might be a problem. Can we please stick to the > find(1) options defined by the Single Unix Spec? > http://www.opengroup.org/onlinepubs/007908799/xcu/find.html Ugh. I'll look into this. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] crash in plancache with subtransactions
Hi, A customer was hitting some misbehavior in one of their internal tests and I tracked it down to plancache not behaving properly with subtransactions: in particular, a plan is not being marked "dead" when the subtransaction on which it is planned rolls back. It was reported in 8.4, but I can reproduce the problem on 9.0 too with this small script:

drop schema alvherre cascade;
drop schema test cascade;
create schema test;
create schema alvherre;
set search_path = 'alvherre';

create or replace function dummy(text) returns text language sql as
$$ SELECT relname::text FROM pg_class c WHERE c.oid = $1::regclass $$;

create or replace function broken(p_name_table text) returns void language plpgsql as
$$ declare v_table_full text := alvherre.dummy(p_name_table); begin return; end; $$;

BEGIN;
create table test.stuffs (stuff text);
SAVEPOINT a;
select broken('nonexistant.stuffs');
ROLLBACK TO a;
select broken('test.stuffs');
rollback;

The symptom is that the second call to broken() fails with this error message:

ERROR:  relation "" does not exist
CONTEXT:  SQL function "dummy" statement 1
PL/pgSQL function "broken" line 3 during statement block local variable initialization

Note that this is totally bogus, because the relation being referenced does indeed exist. In fact, if you commit the transaction and call the function again, it works. Also, the state after the first call is a bit bogus: if you repeat the whole sequence starting at the BEGIN line, it causes a crash on 8.4. I hacked up plancache a bit so that it marks plans as dead when the subtransaction resource owner releases it. It adds a new arg to ReleaseCachedPlan(); if true, the plan is marked dead. All current callers, except the one in ResourceOwnerReleaseInternal(), use false, thus preserving the current behavior. resowner sets this as true when aborting a (sub)transaction. I have to admit that it seems somewhat the wrong API, but I don't see a better way. (I thought about relcache or syscache inval, but as far as I can tell there isn't any here.) I'm open to suggestions. Patch attached. -- Álvaro Herrera 0001-Mark-a-cache-plan-as-dead-when-aborting-its-creating.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
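To see the shape of the proposed change, here is a minimal sketch based only on the description above, not on the attached patch; it assumes plancache's existing "dead" flag on CachedPlan is the one being set:

    /* Sketch only: release a reference, optionally killing the plan. */
    void
    ReleaseCachedPlan(CachedPlan *plan, bool useResOwner, bool markDead)
    {
        if (markDead)
            plan->dead = true;      /* force a replan on next use */
        if (useResOwner)
            ResourceOwnerForgetPlanCacheRef(CurrentResourceOwner, plan);
        Assert(plan->refcount > 0);
        if (--plan->refcount == 0)
            MemoryContextDelete(plan->context);
    }

ResourceOwnerReleaseInternal() would then pass the new argument as true only when aborting, roughly ReleaseCachedPlan(res, true, !isCommit).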
Re: [HACKERS] Exposing an installation's default value of unix_socket_directory
"A.M." writes: > On Oct 21, 2010, at 4:19 PM, Robert Haas wrote: >> I think adding this to pg_config is sensible. Sure, the user could >> have moved the socket directory. But it's a place to start looking. >> So why not? > Because pg_config is supposed to return the current state of a cluster? pg_config is not supposed to do any such thing. It exists specifically and solely to tell you about build options that were baked into the compiled code. Actually, the only reason this is even up for discussion is that there's no configure option to set DEFAULT_PGSOCKET_DIR. If there were, and debian were using it, then pg_config --configure would tell what I wish to know. I thought for a bit about proposing we add such an option, but given the current state of play it might be more misleading than helpful: as long as distros are accustomed to changing this setting via a patch, you couldn't trust pg_config --configure to tell you what a given installation actually has compiled into it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] find -path isn't portable
buildfarm member koi, having recently been rescued from git purgatory, is failing like this:

configure: using CFLAGS=-O -Kinline
configure: using CPPFLAGS= -I/usr/local/include/libxml2 -I/usr/local/include
configure: using LDFLAGS= -L/usr/local/lib -L/usr/local/lib
preparing build tree... UX:find: ERROR: Illegal option -- -path
UX:find: TO FIX: Usage: find [path-list] [predicate-list]
UX:ln: ERROR: Cannot create /home/ohp/pgfarmbuild/HEAD/pgsql.10464//config/Makefile: No such file or directory
configure: error: failed

which indicates it can't cope with your recent patch to prep_buildtree. I was afraid that might be a problem. Can we please stick to the find(1) options defined by the Single Unix Spec? http://www.opengroup.org/onlinepubs/007908799/xcu/find.html regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Per-column collation, work in progress
> I think that that would probably involve a whole lot more notational > busywork than just replacing typmod with something more complicated. > However, we're talking about breaking vast amounts of code in either > case, so maybe making it even vaster isn't a real consideration. Gods, yes. Please let's not extend typemod any further without an overhaul. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Per-column collation, work in progress
Robert Haas writes: > On Thu, Oct 21, 2010 at 4:28 PM, Tom Lane wrote: >> TypeName per se is completely inappropriate for use beyond the first >> stage of parsing, because it requires catalog lookups to make any sense >> of. I think the post-parsing representation should still start with a >> type OID. I can agree with replacing typmod with a struct, though. > I think we should have both the type OID and the typmod in the struct. > Carrying the type OID separately from the typmod has caused us enough > heartache already. No? I think that that would probably involve a whole lot more notational busywork than just replacing typmod with something more complicated. However, we're talking about breaking vast amounts of code in either case, so maybe making it even vaster isn't a real consideration. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing an installation's default value of unix_socket_directory
On Oct 21, 2010, at 4:19 PM, Robert Haas wrote: > 2010/10/21 Tom Lane : >> Alvaro Herrera writes: >>> Excerpts from Cédric Villemain's message of jue oct 21 16:01:30 -0300 2010: I agree this is interesting information to get, but wonder how pg_config can know that and it looks to me that this information as nothing to do in pg_config pg_config is all about installation, socket_dir is a postgresql.conf setting. >> >>> Yeah -- how is pg_config to know? All it can tell you is what was the >>> compiled-in default. >> >> That's what I wanted, actually. If you've set a non-default value in >> postgresql.conf, SHOW will tell you about that, but it fails to expose >> the default value. >> >>> Maybe you should go the SHOW route. The user could connect via TCP and >>> find out the socket directory that way. >> >> Yeah, the SHOW case is not useless by any means. > > I think adding this to pg_config is sensible. Sure, the user could > have moved the socket directory. But it's a place to start looking. > So why not? Because pg_config is supposed to return the current state of a cluster? Because it might indicate a connection to the wrong server? Cheers, M -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Per-column collation, work in progress
On Thu, Oct 21, 2010 at 4:28 PM, Tom Lane wrote: > Peter Eisentraut writes: >> We already have TypeName as a structure that contains type and typmod >> (and collation, in my patch). We could make that a primnode instead of >> a parsenode, and use it in more places, or we could make a new leaner >> structure that only contains the numeric info. > > TypeName per se is completely inappropriate for use beyond the first > stage of parsing, because it requires catalog lookups to make any sense > of. I think the post-parsing representation should still start with a > type OID. I can agree with replacing typmod with a struct, though. I think we should have both the type OID and the typmod in the struct. Carrying the type OID separately from the typmod has caused us enough heartache already. No? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Per-column collation, work in progress
Peter Eisentraut writes: > We already have TypeName as a structure that contains type and typmod > (and collation, in my patch). We could make that a primnode instead of > a parsenode, and use it in more places, or we could make a new leaner > structure that only contains the numeric info. TypeName per se is completely inappropriate for use beyond the first stage of parsing, because it requires catalog lookups to make any sense of. I think the post-parsing representation should still start with a type OID. I can agree with replacing typmod with a struct, though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_wal_senders must die
On Thu, Oct 21, 2010 at 4:21 PM, Josh Berkus wrote: > On 10/20/10 6:54 PM, Robert Haas wrote: >> I find it impossible to believe that's >> a good decision, and IMHO we should be focusing on how to make the >> parameters PGC_SIGHUP rather than PGC_POSTMASTER, which would give us >> most of the same benefits without throwing away hard-won performance. > > I'd be happy to accept that. Is it possible, though? I sketched an outline of the problem AIUI here: http://archives.postgresql.org/pgsql-hackers/2010-10/msg01348.php I think it's possible; I'm not quite sure how hard it is. Unfortunately, I've not had as much PG-hacking time lately as I'd like... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rawdump
"Stephen R. van den Berg" writes: > Tom Lane wrote: >> There are way too many scenarios where you'll have no hope of doing >> any such manual recovery anyway. > True. It's all a matter of statistics. Judging by the number of reports > I find by googling net-history, I'd have to conclude that the proposed > extra information would have helped more than half of them. Uh, no, it would have helped whatever minuscule fraction of them had the tools and the expertise to make use of the information. This should not be confused with a magic fix-it button. If you've lost your system catalogs you're facing a whole lot of work that will at best recover an untrustworthy version of some of your data. Most of the data-loss reports I've seen appeared to come from people who wouldn't be capable of doing such recovery work even if better tools were available. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL and HugePage
On Thu, Oct 21, 2010 at 12:10:22PM -0700, David Gould wrote: > On Thu, Oct 21, 2010 at 08:16:27AM -0700, Mark Wong wrote: > > On Tue, Oct 19, 2010 at 8:30 PM, daveg wrote: > > > On Wed, Oct 20, 2010 at 04:08:37PM +1300, Mark Kirkwood wrote: > > >> On 20/10/10 16:05, Mark Kirkwood wrote: > > >> > > > >> > > > >> >shmget and friends are hugetlbpage aware, so it seems it should 'just > > >> >work'. > > >> > > > >> > > >> Heh - provided you specify > > >> > > >> SHM_HUGETLB > > >> > > >> > > >> in the relevant call that is :-) > > > > > > I had a patch for this against 8.3 that I could update if there is any > > > interest. I suspect it is helpful. > > > > Oh, probably better than me digging up my broken one. Send it out as > > is if you don't want to update it. :) > > I'll update it and see if I can get a largish machine to test, at least with > pgbench on. But not today alas. If you'd be so kind as to update it, others can probably find the aforementioned largish machine to test it on :) Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
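The mechanics being discussed are small; as a rough illustration (not daveg's patch), a huge-page-aware allocation boils down to retrying shmget() without Linux's SHM_HUGETLB flag when huge pages are unavailable. Note that the requested size must be rounded up to a multiple of the huge page size for the flag to succeed:

    #include <sys/ipc.h>
    #include <sys/shm.h>

    /* Illustrative only: try huge pages first, fall back to plain pages.
     * A caller would pass flags such as IPC_CREAT | IPC_EXCL | 0600. */
    static int
    shmget_maybe_huge(key_t key, size_t size, int flags)
    {
    #ifdef SHM_HUGETLB
        int shmid = shmget(key, size, flags | SHM_HUGETLB);

        if (shmid >= 0)
            return shmid;
        /* ENOMEM, EPERM, etc.: huge pages not configured; fall through */
    #endif
        return shmget(key, size, flags);
    }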
Re: [HACKERS] max_wal_senders must die
On 10/20/10 6:54 PM, Robert Haas wrote: > I find it impossible to believe that's > a good decision, and IMHO we should be focusing on how to make the > parameters PGC_SIGHUP rather than PGC_POSTMASTER, which would give us > most of the same benefits without throwing away hard-won performance. I'd be happy to accept that. Is it possible, though? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing an installation's default value of unix_socket_directory
2010/10/21 Tom Lane : > Alvaro Herrera writes: >> Excerpts from Cédric Villemain's message of jue oct 21 16:01:30 -0300 2010: >>> I agree this is interesting information to get, but wonder how >>> pg_config can know that and it looks to me that this information as >>> nothing to do in pg_config >>> >>> pg_config is all about installation, socket_dir is a postgresql.conf >>> setting. > >> Yeah -- how is pg_config to know? All it can tell you is what was the >> compiled-in default. > > That's what I wanted, actually. If you've set a non-default value in > postgresql.conf, SHOW will tell you about that, but it fails to expose > the default value. > >> Maybe you should go the SHOW route. The user could connect via TCP and >> find out the socket directory that way. > > Yeah, the SHOW case is not useless by any means. I think adding this to pg_config is sensible. Sure, the user could have moved the socket directory. But it's a place to start looking. So why not? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing an installation's default value of unix_socket_directory
Alvaro Herrera writes: > Excerpts from Cédric Villemain's message of jue oct 21 16:01:30 -0300 2010: >> I agree this is interesting information to get, but wonder how >> pg_config can know that and it looks to me that this information as >> nothing to do in pg_config >> >> pg_config is all about installation, socket_dir is a postgresql.conf setting. > Yeah -- how is pg_config to know? All it can tell you is what was the > compiled-in default. That's what I wanted, actually. If you've set a non-default value in postgresql.conf, SHOW will tell you about that, but it fails to expose the default value. > Maybe you should go the SHOW route. The user could connect via TCP and > find out the socket directory that way. Yeah, the SHOW case is not useless by any means. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rawdump
On Thu, Oct 21, 2010 at 11:30 AM, Stephen R. van den Berg wrote: > For the recovery information I'd like to reserve: > identifier: 00: table OID > 01: table layout > So here's a proposal for something that could maybe be implemented. I think I'm leaning against this currently as there are just too many caveats even for this limited functionality. I'm more inclined to go with the idea someone else proposed: dumping an audit log of all DDL on the table, or dumping the create table statements as pg_dump would generate them after any DDL, to a separate fork. But here goes, just to give you an idea what I think is doable and how limited it would be.

In this meta data object put:
 - table oid
 - table name
 - number of columns
 - array of typlen for those columns (-1 for varlena and size for fixed length)

That would have a maximum size of just over 6k, which is too large for a 4k BLCKSZ database but fits on default 8k databases. It would have enough information to be able to find all the table columns, but not to interpret their contents: neither their names nor their types. Including the type brings in a new set of complications. Even if you assume the built-in typoids never change, there are going to be typoids that correspond to user defined types. Without the names of those types the oids are pretty useless. Just the typoids would put you over 8k in the worst case, and the names would put you into the realm of needing arbitrary numbers of blocks for even average cases. Similarly, including the column names would require potentially many blocks. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
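Rendered literally (with field names invented here for illustration), that proposal is roughly the struct below; using 4-byte entries for the per-column array is what yields the "just over 6k" worst case at 1600 attributes:

    /* Hypothetical layout for the recovery metadata block described above. */
    #define META_MAX_ATTRS 1600                 /* MaxHeapAttributeNumber */

    typedef struct TableRecoveryMeta
    {
        Oid     tableoid;                       /* table OID */
        char    tablename[NAMEDATALEN];         /* table name, 64 bytes */
        int32   natts;                          /* number of columns */
        int32   typlen[META_MAX_ATTRS];         /* -1 varlena, else fixed size */
    } TableRecoveryMeta;                        /* roughly 6.4kB worst case */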
Re: [HACKERS] Exposing an installation's default value of unix_socket_directory
Tom Lane writes: > One possible response would be to add an item to what pg_config knows > about, eg "pg_config --socketdir". This doesn't answer every possible > use-case either, but it would be helpful for some scenarios. > > Thoughts? Following some links one can find out http://packages.debian.org/source/unstable/postgresql-9.0 http://ftp.de.debian.org/debian/pool/main/p/postgresql-9.0/postgresql-9.0_9.0.1-1.debian.tar.gz and see that the Debian package patches src/include/pg_config_manual.h rather than using the other facility proposed in the comments:

 * here's where to twiddle it. You can also override this at runtime
 * with the postmaster's -k switch.

-#define DEFAULT_PGSOCKET_DIR "/tmp"
+#define DEFAULT_PGSOCKET_DIR "/var/run/postgresql"

But still, I wonder what role this -k switch gets here; pg_control certainly won't know about that. I guess it's worse to give a wrong value than none, but that's easy to fix by having a good label for the line, I guess. All in all it would be good to have that in pg_control. -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing an installation's default value of unix_socket_directory
Excerpts from Cédric Villemain's message of jue oct 21 16:01:30 -0300 2010: > I agree this is interesting information to get, but wonder how > pg_config can know that and it looks to me that this information as > nothing to do in pg_config > > pg_config is all about installation, socket_dir is a postgresql.conf setting. Yeah -- how is pg_config to know? All it can tell you is what was the compiled-in default. pg_ctl would be nice, but we'd have to make it parse the config file (there has been talk about that). In any case, if you don't know where the socket is, presumably you don't know where the config file is, either. You've just moved the problem. Maybe you should go the SHOW route. The user could connect via TCP and find out the socket directory that way. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL and HugePage
On Thu, Oct 21, 2010 at 08:16:27AM -0700, Mark Wong wrote: > On Tue, Oct 19, 2010 at 8:30 PM, daveg wrote: > > On Wed, Oct 20, 2010 at 04:08:37PM +1300, Mark Kirkwood wrote: > >> On 20/10/10 16:05, Mark Kirkwood wrote: > >> > > >> > > >> >shmget and friends are hugetlbpage aware, so it seems it should 'just > >> >work'. > >> > > >> > >> Heh - provided you specify > >> > >> SHM_HUGETLB > >> > >> > >> in the relevant call that is :-) > > > > I had a patch for this against 8.3 that I could update if there is any > > interest. I suspect it is helpful. > > Oh, probably better than me digging up my broken one. Send it out as > is if you don't want to update it. :) I'll update it and see if I can get a largish machine to test, at least with pgbench on. But not today alas. -dg -- David Gould da...@sonic.net 510 536 1443510 282 0869 If simplicity worked, the world would be overrun with insects. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Per-column collation, work in progress
On Thu, Oct 21, 2010 at 2:44 PM, Peter Eisentraut wrote: > On tor, 2010-10-14 at 22:54 -0400, Robert Haas wrote: >> and maybe not that bad, but I wonder if there is some preparatory >> refactoring that could be done to trim it down a bit. I notice, for >> example, that a lot of places that looked at <sortop, nulls first/last> >> now look at <collation, sortop, nulls first/last>. In >> particular, all the pathkey stuff is like this. And similarly places >> that used to care about <type, typmod> now have to care about <type, >> typmod, collation>. There might be ways to restructure some of this >> code so that these things can be changed without having to touch quite >> so many places. > > Yeah, I think that's what I'll try to do next. > > We already have TypeName as a structure that contains type and typmod > (and collation, in my patch). We could make that a primnode instead of > a parsenode, and use it in more places, or we could make a new leaner > structure that only contains the numeric info. > > We could then, for example, change things like this: > > typedef struct Var > { > Expr xpr; > ... > Oid vartype; > int32 vartypmod; > ... > } > > into this > > typedef struct Var > { > Expr xpr; > ... > TypeName/TypeFoo vartype; > ... > } > > This would save boatloads of duplicate code. I think that the idea of having a node that represents "a type in all its glory" is a very good one. I'm somewhat inclined not to reuse TypeName, because I think we'll end up wanting to use this in places where "names" and "location" are not available. In fact, judging by some of the logic in LookupTypeNames(), we have some cases like that already, which might be worth trying to clean up. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing an installation's default value of unix_socket_directory
2010/10/21 Tom Lane : > I just noticed that there doesn't seem to be any good way of finding > out what a postmaster's default value of unix_socket_directory is. > If you try to SHOW it you just get an empty string. We could probably > fix things so that SHOW exposes the actual setting, but (1) there might > be security arguments against that, and (2) if your problem is that you > would like to find out the value so's you can connect to said > postmaster, SHOW isn't going to help you. > > One possible response would be to add an item to what pg_config knows > about, eg "pg_config --socketdir". This doesn't answer every possible > use-case either, but it would be helpful for some scenarios. > > Thoughts? I agree this is interesting information to get, but I wonder how pg_config can know that, and it looks to me that this information has no business in pg_config: pg_config is all about the installation, while socket_dir is a postgresql.conf setting. I'd prefer a 'pg_ctl status' output. > > regards, tom lane > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in plpython's Python Generators
Excerpts from Alvaro Herrera's message of jue oct 21 15:32:53 -0300 2010: > Excerpts from Jean-Baptiste Quenot's message of jue oct 21 09:20:16 -0300 > 2010: > > > I get this error when calling the function: > > > > test=# select foobar(); > > ERROR: error fetching next item from iterator > > I can reproduce this here. The first bug to solve is, I think, getting > a more meaningful error report. Something like this. Somebody that really knows their way around Python has to clean this up.

alvherre=# select * from foobar();
ERROR:  error extrayendo el próximo elemento del iterador
CONTEXTO:  falló SPI_execute: SPI_ERROR_UNCONNECTED
función PL/Python «foobar»

(That is a Spanish-locale build; in English the report reads: ERROR: error fetching next item from iterator; CONTEXT: SPI_execute failed: SPI_ERROR_UNCONNECTED; PL/Python function "foobar".)

I think all error cases in plpython need some improvement so that they show the error message from Python. Right now they are ignored. ... and presumably somebody can fix the real bug that Jean-Baptiste hit, too. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support plpy.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
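The cleanup being asked for amounts to routing such failures through plpython's existing PLy_elog() helper, which folds the pending Python exception into the PostgreSQL error report. A rough sketch of the iterator call site (the surrounding variable names are assumed):

    /* Sketch: surface the Python exception instead of swallowing it. */
    next = PyIter_Next(iterator);
    if (next == NULL && PyErr_Occurred())
        PLy_elog(ERROR, "error fetching next item from iterator");

PyIter_Next() returns NULL both at normal exhaustion and on error; PyErr_Occurred() is what distinguishes the two cases.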
Re: [HACKERS] Per-column collation, work in progress
On tor, 2010-10-14 at 22:54 -0400, Robert Haas wrote: > and maybe not that bad, but I wonder if there is some preparatory > refactoring that could be done to trim it down a bit. I notice, for > example, that a lot of places that looked at <sortop, nulls first/last> > now look at <collation, sortop, nulls first/last>. In > particular, all the pathkey stuff is like this. And similarly places > that used to care about <type, typmod> now have to care about <type, > typmod, collation>. There might be ways to restructure some of this > code so that these things can be changed without having to touch quite > so many places.

Yeah, I think that's what I'll try to do next. We already have TypeName as a structure that contains type and typmod (and collation, in my patch). We could make that a primnode instead of a parsenode, and use it in more places, or we could make a new leaner structure that only contains the numeric info. We could then, for example, change things like this:

typedef struct Var
{
    Expr    xpr;
    ...
    Oid     vartype;
    int32   vartypmod;
    ...
}

into this

typedef struct Var
{
    Expr    xpr;
    ...
    TypeName/TypeFoo vartype;
    ...
}

This would save boatloads of duplicate code.

> It looks like you've defined collations as objects that exist within > particular namespaces, but there's no CREATE COLLATION statement, so I > don't see what purpose this serves. I suppose we could leave that to > be added later, but is there actually a use case for having collations > in individual schemas, or should we treat them more like we do casts - > i.e. as database-global objects?

The SQL standard defines it that way, and there should be a CREATE COLLATION statement later. Application-specific collation sequences might not be unreasonable in the future.

> Why does the executor ever need to see collate clauses?

Hmm, maybe not. I think it did in an earlier working draft. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
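Putting the pieces of this subthread together, the "leaner structure" would presumably bundle the three values that keep getting passed around separately; a sketch, using the thread's own placeholder name:

    /* Hypothetical bundled type descriptor (the name is the thread's own
     * placeholder, not a committed struct). */
    typedef struct TypeFoo
    {
        Oid     typid;      /* type OID: post-parsing, per Tom's point */
        int32   typmod;
        Oid     collation;  /* added by the per-column collation patch */
    } TypeFoo;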
Re: [HACKERS] Serializable snapshot isolation patch
Jeff Davis wrote: > in this case we do clearly have a problem, because the result is > not equal to the serial execution of the transactions in either > order. Yeah, you're right. I misread that example -- newbie with the PERIOD type. > So the question is: at what point is the logic wrong? It's either: > 1. PreCommit_CheckForSerializationFailure() is missing a failure > case. > 2. The state prior to entering that function (which I believe I > sufficiently described) is wrong. It's (2). For the reasons I described in my previous email. Even though I misread the specifics of your example, I was close enough to see where the problem was accurately. :-/ > If it's (2), then what should the state look like, and how is the > GiST code supposed to result in that state? The second insert should create conflicts similar to what the first did, but in the other direction -- simple write skew. How GiST is supposed to catch this is the big question. My logic that a conflicting insert will modify a page read by the other transaction only holds until someone inserts a conflicting entry. That's why it wasn't reliably failing until you had an example where both transactions access the same leaf page. In your example, session 1's insert creates the leaf entry and propagates entries up to the root. When session 2 inserts, it can just modify the leaf, so the conflict is missed. As I said, the most obvious way to fix this is to look for conflicts while descending to the leaf for an insert. I'm almost sure we can do better than that, but I haven't finished thinking it through. A rough idea might be that when we find a conflict on an insert, we acquire additional predicate locks on everything between the lowest point of conflict and the leaf; the rest of the logic would remain as-is. I haven't finished mulling that over, but it seems likely to work. If we did that, session 2 would detect the conflict on the insert to the leaf, and all would be well. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ask for review of MERGE
Robert Haas wrote: I think the right way to write UPSERT is something along the lines of: MERGE INTO Stock t USING (VALUES (10, 1)) s(item_id, balance) ON s.item_id = t.item_id ... That led in the right direction; after a bit more fiddling I was finally able to get something that does what I wanted: a single table UPSERT implemented with this MERGE implementation. Here's a log of a test session, suitable for eventual inclusion in the regression tests:

CREATE TABLE Stock(item_id int UNIQUE, balance int);
INSERT INTO Stock VALUES (10, 2200);
INSERT INTO Stock VALUES (20, 1900);

SELECT * FROM Stock ORDER BY item_id;
 item_id | balance
---------+---------
      10 |    2200
      20 |    1900

MERGE INTO Stock t
 USING (VALUES(10,100)) AS s(item_id,balance)
 ON s.item_id=t.item_id
 WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
 WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance);
MERGE 1

SELECT * FROM Stock ORDER BY item_id;
 item_id | balance
---------+---------
      10 |    2300
      20 |    1900

MERGE INTO Stock t
 USING (VALUES(30,2000)) AS s(item_id,balance)
 ON s.item_id=t.item_id
 WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
 WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance);
MERGE 1

SELECT * FROM Stock ORDER BY item_id;
 item_id | balance
---------+---------
      10 |    2300
      20 |    1900
      30 |    2000

I'm still a little uncertain as to whether any of my other examples should have worked under the spec but just didn't work here, but I'll worry about that later. Here's what the query plan looks like on a MATCH:

 Merge  (cost=0.00..8.29 rows=1 width=22) (actual time=0.166..0.166 rows=0 loops=1)
   Action 1: Update When Matched
   Action 2: Insert When Not Mactched
   MainPlan:
   ->  Nested Loop Left Join  (cost=0.00..8.29 rows=1 width=22) (actual time=0.050..0.061 rows=1 loops=1)
         ->  Values Scan on "*VALUES*"  (cost=0.00..0.01 rows=1 width=8) (actual time=0.009..0.010 rows=1 loops=1)
         ->  Index Scan using stock_item_id_key on stock t  (cost=0.00..8.27 rows=1 width=14) (actual time=0.026..0.030 rows=1 loops=1)
               Index Cond: ("*VALUES*".column1 = item_id)
 Total runtime: 0.370 ms

And here's a miss:

 Merge  (cost=0.00..8.29 rows=1 width=22) (actual time=0.145..0.145 rows=0 loops=1)
   Action 1: Update When Matched
   Action 2: Insert When Not Mactched
   MainPlan:
   ->  Nested Loop Left Join  (cost=0.00..8.29 rows=1 width=22) (actual time=0.028..0.033 rows=1 loops=1)
         ->  Values Scan on "*VALUES*"  (cost=0.00..0.01 rows=1 width=8) (actual time=0.004..0.005 rows=1 loops=1)
         ->  Index Scan using stock_item_id_key on stock t  (cost=0.00..8.27 rows=1 width=14) (actual time=0.015..0.015 rows=0 loops=1)
               Index Cond: ("*VALUES*".column1 = item_id)
 Total runtime: 0.255 ms

Next steps here: 1) Performance/concurrency tests against the trigger-based UPSERT approach. 2) Finish bit rot cleanup against HEAD. 3) Work out more complicated test cases to try and find more unexpected behavior edge cases and general bugs. -- Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rawdump
Tom Lane wrote: >"Stephen R. van den Berg" writes: >> If it's inserted in the "special" area, it will not break any >> compatibility. >I'll tell you what I really don't like about this proposal: we discuss >some scheme or other for taking over the "special space" in heap pages >at least once a year. None of them have been particularly compelling >so far, but one may come along that is; especially given that we're now >trying to maintain on-disk compatibility across versions. So I think >the opportunity cost of assigning a use to that space is mighty high. >I don't find this idea important enough to justify foreclosing future >uses for the special space. Well, I had (of course) thought of that, and the classical solution to this is to specify a certain attribute based format in order not to cannibalise the space and block it for further other use. I.e. in the special area, we could start using something like: 2-byte field length (including the length field), 1-byte identifier, field content. For the recovery information I'd like to reserve:

identifier: 00: table OID
            01: table layout

>The real bottom line is this: if you care enough about your data to >be willing to expend a large amount of effort on manual recovery >attempts, why didn't you have a decent backup scheme in place? Two obvious valid answers would be: stupidity and/or ignorance, sometimes a strain of bad luck. I know it is a sad state of affairs, but not all users of postgresql are equally knowledgeable/intelligent/responsible. >There are way too many scenarios where you'll have no hope of doing >any such manual recovery anyway. True. It's all a matter of statistics. Judging by the number of reports I find by googling net-history, I'd have to conclude that the proposed extra information would have helped more than half of them. -- Stephen. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
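As a concrete rendering of that framing (names invented here for illustration, and ignoring alignment/packing concerns), each entry in the special space would look something like:

    /* Hypothetical length/identifier/content framing for the special area. */
    #define RECOVERY_FIELD_TABLE_OID     0x00
    #define RECOVERY_FIELD_TABLE_LAYOUT  0x01

    typedef struct RecoveryField
    {
        uint16  len;        /* total length, including this 3-byte header */
        uint8   id;         /* RECOVERY_FIELD_* identifier */
        char    content[1]; /* variable-length payload follows */
    } RecoveryField;

Unknown identifiers can simply be skipped over via len, which is what keeps the format open for other future uses of the space.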
Re: [HACKERS] Bug in plpython's Python Generators
Excerpts from Jean-Baptiste Quenot's message of jue oct 21 09:20:16 -0300 2010: > I get this error when calling the function: > > test=# select foobar(); > ERROR: error fetching next item from iterator I can reproduce this here. The first bug to solve is, I think, getting a more meaningful error report. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UNION ALL has higher cost than inheritance
On Thu, Oct 21, 2010 at 6:57 AM, Tom Lane wrote: >> Thanks. It also explains my another question why Merge Append cannot >> be used for UNION ALL plans. > > Hmm, seems like the example you show ought to work. I wonder if there > was an oversight in that patch... > Huh, that definitely worked in the earlier versions of the patch (as much as it "worked" at all) -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Exposing an installation's default value of unix_socket_directory
I just noticed that there doesn't seem to be any good way of finding out what a postmaster's default value of unix_socket_directory is. If you try to SHOW it you just get an empty string. We could probably fix things so that SHOW exposes the actual setting, but (1) there might be security arguments against that, and (2) if your problem is that you would like to find out the value so's you can connect to said postmaster, SHOW isn't going to help you. One possible response would be to add an item to what pg_config knows about, eg "pg_config --socketdir". This doesn't answer every possible use-case either, but it would be helpful for some scenarios. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.
Kevin Grittner wrote: > Bruce Momjian wrote: > > > why would we do this client-side rather than server-side? > > Because the timeout is supposed to be a limit on the time allowed > for specific Java methods to complete, which might be running a > large number of SQL statements within one invocation, and which may > include significant network latency. It's a lot of work to get > "pretty close" on the server side, and you can never really > implement exactly what the JDBC API is requesting. > > What if you have an app which can draw data from any of a number of > remote databases, and you want to use this limit so if one becomes > unavailable for some reason you can re-run the request on another > within a reasonable time? The network connection goes down after > you submit your request, you've got a period of minutes or hours > until TCP gives up, and the user expects a response within a few > seconds... > > If you implement something with server-side semantics, there's > nothing to prevent an application which is PostgreSQL-aware from > accessing it through JDBC, of course. statement_timeout and other > GUCs can be set locally to your heart's content. OK, thanks. Just had to ask. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.
Bruce Momjian wrote: > why would we do this client-side rather than server-side? Because the timeout is supposed to be a limit on the time allowed for specific Java methods to complete, which might be running a large number of SQL statements within one invocation, and which may include significant network latency. It's a lot of work to get "pretty close" on the server side, and you can never really implement exactly what the JDBC API is requesting. What if you have an app which can draw data from any of a number of remote databases, and you want to use this limit so if one becomes unavailable for some reason you can re-run the request on another within a reasonable time? The network connection goes down after you submit your request, you've got a period of minutes or hours until TCP gives up, and the user expects a response within a few seconds... If you implement something with server-side semantics, there's nothing to prevent an application which is PostgreSQL-aware from accessing it through JDBC, of course. statement_timeout and other GUCs can be set locally to your heart's content. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.
Tom Lane wrote: > Bruce Momjian writes: > >> Tom Lane wrote: > >>> I'm all for doing this client-side. > > > Uh, why would we do this client-side rather than server-side? If we do > > it server-side, all interfaces get it. > > 1. The API that's being implemented is JDBC-specific. > > 2. Even if you want to argue that it would be generally useful to have > such a feature, it would certainly require additional client-side > programming to make each client interface make use of it. Wouldn't it be simpler to code this in the server and have the client use that facility, rather than have each client implement it? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.
Bruce Momjian writes: >> Tom Lane wrote: >>> I'm all for doing this client-side. > Uh, why would we do this client-side rather than server-side? If we do > it server-side, all interfaces get it. 1. The API that's being implemented is JDBC-specific. 2. Even if you want to argue that it would be generally useful to have such a feature, it would certainly require additional client-side programming to make each client interface make use of it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Domains versus arrays versus typmods
Tom Lane wrote: > you are confusing an array over a domain type with a domain over > an array type. Yes I was. Sorry. Argument withdrawn. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.
Kevin Grittner wrote: > Tom Lane wrote: > > > I'm all for doing this client-side. > > Well, that makes a big difference. Unless someone can make a > convincing argument for why we should modify the server side to > support this, I think we should just focus on this one client-side > patch. > > I'd be happy to give it a closer look, but I may not be able to do so > for a few weeks, and won't complain if someone beats me to it. Uh, why would we do this client-side rather than server-side? If we do it server-side, all interfaces get it. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Domains versus arrays versus typmods
"Kevin Grittner" writes: > Tom Lane wrote: >> So we will downcast myia to int[], or at least one might assume >> that's what's happening. But actually it's worse than that: the >> result of this operation is thought to be myia not int[], because >> myia itself is taken as matching ANYARRAY, and the operator result >> is the same ANYARRAY type. > That is actually what I would want and expect. Let's say I have an > array of attorney bar numbers, and I add one more as a literal. > While an argument could be made that the integer should be cast to > a bar number before being added to the array, we don't require that > for an assignment to a simple variable in the domain, so it would be > surprising to require a cast here, and even more surprising for the > concatenation to result in an array of primitive integers rather > than a array of attorney bar numbers. I disagree with that argument: you are confusing an array over a domain type with a domain over an array type. In the latter case, the domain could have additional constraints (such as the length constraint in my other example), and there's no reason to assume that || or other array operators would preserve those constraints. A perhaps comparable example is create domain verysmallint as int check (value < 10); select 9::verysmallint + 1; The result of the addition is int, not verysmallint, which is why you don't get an error. >From an abstract-data-type point of view, the fact that any of these operations are even allowed without an explicit downcast is a bit uncool: it exposes the implementation of the domain type, which one could argue shouldn't be allowed, at least not without some notational marker showing you know what you're doing. But the SQL committee seems to have decided to ignore that tradition and allow auto-downcasts. Nonetheless, when a domain type is fed to an operator that works on its base type, it has to be clearly understood that there *is* an implied downcast, and whatever special properties the domain may have had will be lost. As far as the operator and its result are concerned, the domain is just its base type. I'm not against fixing these cases so that auto downcasts happen, I'm just saying that it's outside the scope of what I'm going to do in response to the current bug. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Domains versus arrays versus typmods
Tom Lane wrote: > regression=# select array[1,2] || 3::myi; > ERROR: operator does not exist: integer[] || myi > > In this case, one might expect myi to be automatically downcast to > int so that it could be matched up with the int array, but that's > not happening. I guess it should allow that, although for my uses of domains it's hard to see a reasonable use case for it, so that part doesn't bother me too much. > regression=# create domain myia as int[]; > CREATE DOMAIN > regression=# select array[1,2]::myia || 3; > ?column? > -- > {1,2,3} > (1 row) > > So we will downcast myia to int[], or at least one might assume > that's what's happening. But actually it's worse than that: the > result of this operation is thought to be myia not int[], because > myia itself is taken as matching ANYARRAY, and the operator result > is the same ANYARRAY type. That is actually what I would want and expect. Let's say I have an array of attorney bar numbers, and I add one more as a literal. While an argument could be made that the integer should be cast to a bar number before being added to the array, we don't require that for an assignment to a simple variable in the domain, so it would be surprising to require a cast here, and even more surprising for the concatenation to result in an array of primitive integers rather than an array of attorney bar numbers. > regression=# create domain myia2 as int[] > check(array_length(value,1) = 2); > CREATE DOMAIN > regression=# select array[1,2]::myia2 || 3; > ?column? > -- > {1,2,3} > (1 row) > So we have a value that's claimed to belong to the domain, but it > doesn't meet the domain's constraints. Yeah, that's obviously wrong. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Serializable snapshot isolation patch
On Thu, 2010-10-21 at 10:29 -0500, Kevin Grittner wrote: > Basically, when we already have a pivot, but no transaction has yet > committed, we wait to see if TN commits first. If so, we have a > problem; if not, we don't. There's probably some room for improving > performance by cancelling T0 or T1 instead of TN, at least some of > the time; but in this pass we are always cancelling the transaction > in whose process we detect the need to cancel something. Well, in this case we do clearly have a problem, because the result is not equal to the serial execution of the transactions in either order. So the question is: at what point is the logic wrong? It's either: 1. PreCommit_CheckForSerializationFailure() is missing a failure case. 2. The state prior to entering that function (which I believe I sufficiently described) is wrong. If it's (2), then what should the state look like, and how is the GiST code supposed to result in that state? I know some of these questions are answered in the relevant research, but I'd like to discuss this concrete example specifically. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Domains versus arrays versus typmods
Robert Haas writes: > On Wed, Oct 20, 2010 at 10:03 AM, Tom Lane wrote: >> My point is that anyplace that is relying on the surface typelem, >> without drilling down to see what the base type is, is wrong. >> So yeah, those lookups are (will be) necessary. > OK. In that case, +1 from me. I've come across another interesting definitional issue, which is what properties should domains have with respect to matching to polymorphic arguments. Currently, the polymorphic matching functions take domains at face value (ie, without noticing their relationships to their base types), with one exception: because they use get_element_type() to decide if a type matches ANYARRAY, domains over arrays will be considered to match ANYARRAY. This leads to some weird inconsistencies and at least one genuine bug. Observe (this is with 9.0.x HEAD):

regression=# create domain myi as int;
CREATE DOMAIN
regression=# select array[1,2] || 3;
 ?column?
----------
 {1,2,3}
(1 row)

regression=# select array[1,2] || 3::myi;
ERROR:  operator does not exist: integer[] || myi

In this case, one might expect myi to be automatically downcast to int so that it could be matched up with the int array, but that's not happening. However:

regression=# create domain myia as int[];
CREATE DOMAIN
regression=# select array[1,2]::myia || 3;
 ?column?
----------
 {1,2,3}
(1 row)

So we will downcast myia to int[], or at least one might assume that's what's happening. But actually it's worse than that: the result of this operation is thought to be myia not int[], because myia itself is taken as matching ANYARRAY, and the operator result is the same ANYARRAY type. Thus, this case goes off the rails completely:

regression=# create domain myia2 as int[] check(array_length(value,1) = 2);
CREATE DOMAIN
regression=# select array[1,2]::myia2;
 array
-------
 {1,2}
(1 row)

regression=# select array[1,2,3]::myia2;
ERROR:  value for domain myia2 violates check constraint "myia2_check"
regression=# select array[1,2]::myia2 || 3;
 ?column?
----------
 {1,2,3}
(1 row)

The result of the || is considered to be myia2, as can be seen for example this way:

regression=# create view vvv as select array[1,2]::myia2 || 3 as x;
CREATE VIEW
regression=# \d vvv
    View "public.vvv"
 Column | Type  | Modifiers
--------+-------+-----------
 x      | myia2 |

So we have a value that's claimed to belong to the domain, but it doesn't meet the domain's constraints. What I am intending to do about this in the short run is to leave the anyarray-ness tests in the polymorphic-compatibility-checking functions as-is. That will mean (with the change in typelem for domains) that a domain over array doesn't match ANYARRAY unless you explicitly downcast it. I argue that this is consistent with the current behavior of not auto-downcasting domains to match the element type of an array. We could go back and change it later, but if we do, we should try to make both cases provide auto-downcast-when-needed behavior. I have not dug into just what code changes would be needed for that. Auto-downcast wouldn't be exactly compatible with the current behavior anyway, since it would result in a different claimed type for the operator result. Comments? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
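The "drill down to the base type" rule quoted at the top translates into a small idiom; here is a sketch using the existing lsyscache helpers (the wrapper name is invented here for illustration):

    /* Sketch: treat a domain over an array like its base array type. */
    static Oid
    get_base_element_type(Oid typid)
    {
        typid = getBaseType(typid);       /* look through any domain layers */
        return get_element_type(typid);   /* InvalidOid if not an array */
    }

Call sites that currently test get_element_type() directly to decide ANYARRAY matching would use this instead, so that a bare domain no longer matches by accident of its surface typelem.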
Re: [HACKERS] Extensions, this time with a patch
On Thu, Oct 21, 2010 at 11:12 AM, Dimitri Fontaine wrote: > "David E. Wheeler" writes: >> Sure. The reason to do it, though, is so that extension authors can create >> just one metadata file, instead of two (or three, if one must also put such >> data into the Makefile). > > That's a good idea, but my guess is that the implementation cost of > supporting the control format in your perl infrastructure is at least an > order of magnitude lower than the cost for me to support your current > JSON file format, so I lean towards you having an automated way to fill > in the json file from the control one... Ah, truth to power. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extensions, this time with a patch
Excerpts from Dimitri Fontaine's message of jue oct 21 12:53:18 -0300 2010: > This part of the problem hasn't received much thought yet, and it shows. > The rest of the patch has been in my head for months, so I expect fewer > problems there... Keep at it. You're doing a terrific job. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extensions, this time with a patch
"David E. Wheeler" writes: > Be aware that PGXS sets a $(VERSION) variable already, so you'll need > to use another name, I think, to guard from conflicts. This is in > Makefile.global: Of course that's not a problem for contribs, and I used EXTVERSION in a previous version of the patch. I guess I will get back to use $(EXTVERSION) in the Makefile next time I have to produce a patch. This part of the problem didn't receive much thoughts yet, and it shows up. About the rest of the patch have been in my head for months, I expect less problems there... Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extensions, this time with a patch
On Oct 21, 2010, at 8:12 AM, Dimitri Fontaine wrote: > That's a good idea, but my guess is that the implementation cost of > supporting the control format in your perl infrastructure is at least an > order of magnitude lower than the cost for me to support your current > JSON file format, so I lean towards you having an automated way to fill > in the json file from the control one...

Well, it *will* be easier. Eventually. Right now, the file has to be edited by hand. Which I can tell you from experience is rather…error-prone. Anyway, I wouldn't push for a JSON file format until a parser was just there for you to use without too much trouble.

> The Makefile supports $(VERSION) because chances are it's already there > (think packaging or tarball release targets). Having yet another place > to manually maintain a version number ain't appealing.

Be aware that PGXS sets a $(VERSION) variable already, so you'll need to use another name, I think, to guard against conflicts. This is in Makefile.global:

VERSION = 9.0.1
MAJORVERSION = 9.0

Maybe use EXTVERSION? You don't want to overwrite the core version, because a makefile author could use it to change the build (pgTAP does this, for example).

> In the latest patch, though, the only other thing you find in the > Makefile about the extension is its basename, which must match that of > both the .control and the .sql files. And it's possible for $(EXTENSION) > to be a list of them, too, because of contrib/spi.

Right, that makes sense. Best, David

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Serializable snapshot isolation patch
Jeff Davis wrote: > That looks like a reasonable state to me, but I'm not sure exactly > what the design calls for. I am guessing that the real problem is > in PreCommit_CheckForSerializationFailure(), where there are 6 > conditions that must be met for an error to be thrown. T2 falls > out right away at condition 1. T1 falls out on condition 4. I > don't really understand condition 4 at all -- can you explain it? > And can you explain conditions 5 and 6 too?

Since most transactions are rolled back on a conflict detection during a read or write attempt, there are only a few very specific conditions which can "slip through" to where they need to be detected on commit. Here's the code with the six conditions:

if (MySerializableXact->inConflict != InvalidSerializableXact
	&& MySerializableXact->inConflict != MySerializableXact
	&& !(MySerializableXact->inConflict->rolledBack)
	&& MySerializableXact->inConflict->inConflict != InvalidSerializableXact
	&& !SxactIsCommitted(MySerializableXact->inConflict)
	&& !SxactIsCommitted(MySerializableXact->inConflict->inConflict))

Condition 4 is testing whether MySerializableXact is on the "out" side of a pivot -- in the parlance of most examples, is MySerializableXact TN? Conditions 5 and 6 confirm that neither T0 nor T1 has committed first; we can only have a problem if TN commits first. Basically, when we already have a pivot but no transaction has yet committed, we wait to see whether TN commits first. If so, we have a problem; if not, we don't. There's probably some room for improving performance by cancelling T0 or T1 instead of TN, at least some of the time; but in this pass we always cancel the transaction in whose process we detect the need to cancel something. -Kevin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
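To see the shape of that test in isolation, here is a self-contained toy model -- made-up minimal types, not the real SSI structures (which live in shared memory and use SxactIsCommitted() etc.) -- wiring up the T0 -> T1 -> TN pivot described above:

#include <stdbool.h>
#include <stdio.h>

typedef struct Xact
{
	struct Xact *inConflict;	/* a reader of something we wrote (rw-conflict in) */
	struct Xact *outConflict;	/* a writer whose data we read (rw-conflict out) */
	bool		rolledBack;
	bool		committed;
} Xact;

/* The six commit-time conditions, with NULL standing in for InvalidSerializableXact */
static bool
dangerous_at_commit(const Xact *me)
{
	return me->inConflict != NULL						/* 1 */
		&& me->inConflict != me							/* 2 */
		&& !me->inConflict->rolledBack					/* 3 */
		&& me->inConflict->inConflict != NULL			/* 4: "me" is TN, the out side of a pivot */
		&& !me->inConflict->committed					/* 5: T1 has not committed first */
		&& !me->inConflict->inConflict->committed;		/* 6: T0 has not committed first */
}

int
main(void)
{
	Xact		t0 = {0}, t1 = {0}, tn = {0};

	/* T0 read data that T1 later wrote: writer T1 gets inConflict = T0 */
	t0.outConflict = &t1;
	t1.inConflict = &t0;
	/* T1 read data that TN later wrote: T1 is now the pivot */
	t1.outConflict = &tn;
	tn.inConflict = &t1;

	printf("TN committing first: %s\n", dangerous_at_commit(&tn) ? "dangerous" : "ok");
	t0.committed = true;
	printf("after T0 commits:    %s\n", dangerous_at_commit(&tn) ? "dangerous" : "ok");
	return 0;
}

Run as-is it reports "dangerous" for TN committing first, and "ok" once T0 is marked committed -- matching the commit-order rule described above.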
Re: [HACKERS] PostgreSQL and HugePage
On Tue, Oct 19, 2010 at 8:30 PM, daveg wrote: > On Wed, Oct 20, 2010 at 04:08:37PM +1300, Mark Kirkwood wrote: >> On 20/10/10 16:05, Mark Kirkwood wrote: >> > >> > >> >shmget and friends are hugetlbpage aware, so it seems it should 'just >> >work'. >> > >> >> Heh - provided you specify >> >> SHM_HUGETLB >> >> >> in the relevant call that is :-) > > I had a patch for this against 8.3 that I could update if there is any > interest. I suspect it is helpful. Oh, probably better than me digging up my broken one. Send it out as is if you don't want to update it. :) Regards, Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
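The flag in question is easy to experiment with outside the server. A minimal standalone sketch, assuming Linux with huge pages reserved via the vm.nr_hugepages sysctl -- just a demo, not anyone's patch:

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000		/* value from <linux/shm.h> */
#endif

int
main(void)
{
	size_t		size = 16 * 1024 * 1024;	/* must be a multiple of the huge page size */
	int			shmid;
	void	   *addr;

	shmid = shmget(IPC_PRIVATE, size, IPC_CREAT | SHM_HUGETLB | 0600);
	if (shmid < 0)
	{
		perror("shmget(SHM_HUGETLB)");	/* typical failure: no huge pages reserved */
		return 1;
	}
	addr = shmat(shmid, NULL, 0);
	if (addr == (void *) -1)
	{
		perror("shmat");
		return 1;
	}
	printf("attached huge-page segment at %p\n", addr);
	shmdt(addr);
	shmctl(shmid, IPC_RMID, NULL);
	return 0;
}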
Re: [HACKERS] Extensions, this time with a patch
"David E. Wheeler" writes: > Sure. The reason to do it, though, is so that extension authors can create > just one metadata file, instead of two (or three, if one must also put such > data into the Makefile). That's a good idea, but my guess is that the implementation cost of supporting the control format in your perl infrastructure is at least an order of magnitude lower than the cost for me to support your current JSON file format, so I lean towards you having an automated way to fill in the json file from the control one... The Makefile supports $(VERSION) because chances are it's already there (think packaging or tarball release targets). Having yet another place where to manually maintain a version number ain't appealing. In the latest patch, though, the only other thing you find in the Makefile about the extension is its basename, which must be the one of both the .control and the .sql files. And it's possible for $(EXTENSION) to be a list of them, too, because of contrib/spi. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL and HugePage
On Wed, Oct 20, 2010 at 1:13 PM, Robert Haas wrote: > On Wed, Oct 20, 2010 at 3:47 PM, daveg wrote: >> On Wed, Oct 20, 2010 at 12:28:25PM -0700, Greg Stark wrote: >>> On Wed, Oct 20, 2010 at 12:17 PM, Greg Stark wrote: >>> > I don't think it's a big cost once all the processes >>> > have been forked if you're reusing them beyond perhaps slightly more >>> > efficient cache usage. >>> >>> Hm, this site claims to get a 13% win just from the reduced tlb misses >>> using a preload hack with Pg 8.2. That would be pretty substantial. >>> >>> http://oss.linbit.com/hugetlb/ >> >> That was my motivation in trying a patch. TLB misses can be a substantial >> overhead. I'm not current on the state of play, but working at Sun's >> benchmark lab on a DB TPC-B benchmark for the first generation of MP >> systems, something like 30% of all bus traffic was TLB misses. The next >> iteration of the hardware had a much larger TLB. >> >> I have a client with 512GB memory systems, currently with 128GB configured >> as postgresql buffer cache. Which is 32M TLB entries trying to fit in the >> few dozen cpu TLB slots. I suspect there may be some contention. >> >> I'll benchmark of course. > > Do you mean 128GB shared buffers, or shared buffers + OS cache? I > think that the general wisdom is that performance tails off beyond > 8-10GB of shared buffers anyway, so a performance improvement on 128GB > shared buffers might not mean much unless you can also show that 128GB > shared buffers actually performs better than some smaller amount. I'm sure someone will correct me if I'm wrong, but when I looked at this a couple years ago I believe a side effect of using hugetlbs is that these segments are never swapped out. I made a weak attempt to patch postgres to use hugetlbs when allocating shared memory. If I can find that patch I'll send it out. Mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
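Before benchmarking, it's worth confirming the kernel actually has huge pages reserved and how large they are; otherwise shmget with SHM_HUGETLB simply fails. A trivial check (a convenience sketch, not from any patch in this thread):

#include <stdio.h>
#include <string.h>

int
main(void)
{
	FILE	   *f = fopen("/proc/meminfo", "r");
	char		line[256];

	if (f == NULL)
	{
		perror("fopen(/proc/meminfo)");
		return 1;
	}
	while (fgets(line, sizeof(line), f) != NULL)
	{
		/* e.g. HugePages_Total, HugePages_Free, Hugepagesize */
		if (strncmp(line, "HugePages", 9) == 0 ||
			strncmp(line, "Hugepagesize", 12) == 0)
			fputs(line, stdout);
	}
	fclose(f);
	return 0;
}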
Re: [HACKERS] Extensions, this time with a patch
On Oct 21, 2010, at 12:33 AM, Dimitri Fontaine wrote: > I don't see what it buys us in this context. The main thing to realize > here is that I wrote almost no code to parse the control file. I don't > think the extension patch should depend on the JSON-in-core patch. > > Now, once we have JSON and before the release, I guess given a good > reason to have much more complex configuration files that don't look at > all like postgresql.conf, we could revisit. Sure. The reason to do it, though, is so that extension authors can create just one metadata file, instead of two (or three, if one must also put such data into the Makefile). Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extensions, this time with a patch
Tom Lane writes: > ... well, that's just a bug in hstore. *All* the contrib modules > should be using PGXS, unless they have a damn good reason not to; > which is not apparent for hstore.

Here's a patch. -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

*** a/contrib/hstore/Makefile
--- b/contrib/hstore/Makefile
***************
*** 1,9 ****
  # contrib/hstore/Makefile
  
- subdir = contrib/hstore
- top_builddir = ../..
- include $(top_builddir)/src/Makefile.global
- 
  MODULE_big = hstore
  OBJS = hstore_io.o hstore_op.o hstore_gist.o hstore_gin.o hstore_compat.o \
  	crc32.o
--- 1,5 ----
***************
*** 12,15 ****
  DATA_built = hstore.sql
--- 8,21 ----
  DATA = uninstall_hstore.sql
  REGRESS = hstore
+ 
+ ifdef USE_PGXS
+ PG_CONFIG = pg_config
+ PGXS := $(shell $(PG_CONFIG) --pgxs)
+ include $(PGXS)
+ else
+ subdir = contrib/hstore
+ top_builddir = ../..
+ include $(top_builddir)/src/Makefile.global
  include $(top_srcdir)/contrib/contrib-global.mk
+ endif
+ 

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
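With a Makefile in this shape, the standard PGXS workflow applies: running make USE_PGXS=1 (and make USE_PGXS=1 install) builds the module against an already-installed server located via pg_config, while a plain make inside the source tree still goes through src/Makefile.global and contrib-global.mk as before.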
Re: [HACKERS] Serializable snapshot isolation patch
Jeff Davis wrote: >> Also, it appears to be non-deterministic, to a degree at least, >> so you may not observe the problem in the exact way that I do. > The SELECTs only look at the root and the predicate doesn't match. > So each SELECT sets an SIReadLock on block 0 and exits the search. > Looks good so far. > > T1 then inserts, and it has to modify page 0, so it does > FlagRWConflict(). That sets writer->inConflict = reader and > reader->outConflict = writer (where writer is T1 and reader is > T2); and T1->outConflict and T2->inConflict remain NULL. > > Then T2 inserts, and I didn't catch that part in as much detail in > gdb, but it apparently has no effect on that state, so we still > have T1->inConflict = T2, T1->outConflict = NULL, T2->inConflict = > NULL, and T2->outConflict = T1. I now see where the wheels fall off. The GiST query initially stops at a high level, so predicate locks only go that deep, and the *first* insert of a conflicting row must ripple up and modify a locked page; but *subsequent* inserts may only need to modify the leaf level. Even though your particular example doesn't involve a cycle and therefore doesn't require a rollback for correctness (although it might tend to generate a false positive if index page locks were working correctly), you've exposed a flaw in the GiST AM implementation of predicate locks. On a first glance, it appears that we would need to check for conflicts as we move down through the index to find the right spot for an insert, not as we modify pages for the insert. I hope there's some more subtle technique or some way to qualify it; otherwise a search which stops at the root page would generate a conflict out to just about any index insertion from a concurrent transaction. I will add this to my list of issues to fix based on your review, unless it's something you would like to tackle -- I'm not going to chase away anyone who wants to help with this. :-) -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extensions, this time with a patch
Dimitri Fontaine writes: > Itagaki Takahiro writes: >> Why does only hstore have version '9.1'? Any other modules have >> '9.1devel'. > It's the only contrib that's not using PGXS but instead directly > includes $(top_builddir)/src/Makefile.global, ... well, that's just a bug in hstore. *All* the contrib modules should be using PGXS, unless they have a damn good reason not to; which is not apparent for hstore. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rawdump
"Stephen R. van den Berg" writes: > Robert Haas wrote: >> and break on-disk compatibility just to make it easier to > If it's inserted in the "special" area, it will not break any > compatibility. I'll tell you what I really don't like about this proposal: we discuss some scheme or other for taking over the "special space" in heap pages at least once a year. None of them have been particularly compelling so far, but one may come along that is; especially given that we're now trying to maintain on-disk compatibility across versions. So I think the opportunity cost of assigning a use to that space is mighty high. I don't find this idea important enough to justify foreclosing future uses for the special space. The real bottom line is this: if you care enough about your data to be willing to expend a large amount of effort on manual recovery attempts, why didn't you have a decent backup scheme in place? There are way too many scenarios where you'll have no hope of doing any such manual recovery anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UNION ALL has higher cost than inheritance
Itagaki Takahiro writes: > On Thu, Oct 21, 2010 at 2:18 PM, Tom Lane wrote: >> The plan for UNION initially involves a couple of SubqueryScan nodes, >> which impose an extra cost of cpu_tuple_cost per tuple. Those later >> get optimized away, but we don't try to readjust the cost estimates >> for that. > Thanks. It also explains another question of mine: why Merge Append cannot > be used for UNION ALL plans. Hmm, seems like the example you show ought to work. I wonder if there was an oversight in that patch... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
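To put a number on the overhead being discussed: with the default cpu_tuple_cost of 0.01, a UNION ALL arm estimated to emit one million rows is charged an extra 1,000,000 * 0.01 = 10,000 cost units for a SubqueryScan that is later optimized away -- potentially enough to steer the planner away from a Merge Append when the alternatives are otherwise close. (The row count here is illustrative, not taken from the thread.)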
Re: [HACKERS] Review: Fix snapshot taking inconsistencies
Hi, Here's an updated patch. I'm still not too fond of the logic in spi.c, but I can't see a better way to do this. If someone sees a better way, I'm not going to object. I also made some changes to the SQL functions now that we have a different API. The previous code pushed and popped snapshots quite heavily. I'd also like to see more regression tests for SQL functions, but I'm going to submit my suggestions as a separate patch. Regards, Marko Tiikkaja

*** a/src/backend/catalog/pg_proc.c
--- b/src/backend/catalog/pg_proc.c
***************
*** 755,761 **** fmgr_sql_validator(PG_FUNCTION_ARGS)
--- 755,763 ----
  	Oid			funcoid = PG_GETARG_OID(0);
  	HeapTuple	tuple;
  	Form_pg_proc proc;
+ 	List	   *raw_parsetree_list;
  	List	   *querytree_list;
+ 	ListCell   *list_item;
  	bool		isnull;
  	Datum		tmp;
  	char	   *prosrc;
***************
*** 828,836 **** fmgr_sql_validator(PG_FUNCTION_ARGS)
  	 */
  	if (!haspolyarg)
  	{
! 		querytree_list = pg_parse_and_rewrite(prosrc,
! 											  proc->proargtypes.values,
! 											  proc->pronargs);
  		(void) check_sql_fn_retval(funcoid, proc->prorettype,
  								   querytree_list,
  								   NULL, NULL);
--- 830,854 ----
  	 */
  	if (!haspolyarg)
  	{
! 		/*
! 		 * Parse and rewrite the queries in the function text.
! 		 *
! 		 * Even though check_sql_fn_retval is only interested in the last
! 		 * query, we analyze all of them here to check for any errors.
! 		 */
! 		raw_parsetree_list = pg_parse_query(prosrc);
! 
! 		querytree_list = NIL;
! 		foreach(list_item, raw_parsetree_list)
! 		{
! 			Node	   *parsetree = (Node *) lfirst(list_item);
! 
! 			querytree_list = pg_analyze_and_rewrite(parsetree, prosrc,
! 							proc->proargtypes.values, proc->pronargs);
! 		}
! 
! 		Assert(querytree_list != NIL);
! 
  		(void) check_sql_fn_retval(funcoid, proc->prorettype,
  								   querytree_list,
  								   NULL, NULL);
*** a/src/backend/executor/functions.c
--- b/src/backend/executor/functions.c
***************
*** 84,89 **** typedef struct
--- 84,90 ----
  	bool		returnsSet;		/* true if returning multiple rows */
  	bool		returnsTuple;	/* true if returning whole tuple result */
  	bool		shutdown_reg;	/* true if registered shutdown callback */
+ 	bool		snapshot;		/* true if pushed an active snapshot */
  	bool		readonly_func;	/* true to run in "read only" mode */
  	bool		lazyEval;		/* true if using lazyEval for result query */
***************
*** 93,107 **** typedef struct
  	JunkFilter *junkFilter;		/* will be NULL if function returns VOID */
! 	/* head of linked list of execution_state records */
! 	execution_state *func_state;
  } SQLFunctionCache;
  
  typedef SQLFunctionCache *SQLFunctionCachePtr;
  
  /* non-export function prototypes */
! static execution_state *init_execution_state(List *queryTree_list,
  					 SQLFunctionCachePtr fcache,
  					 bool lazyEvalOK);
  static void init_sql_fcache(FmgrInfo *finfo, bool lazyEvalOK);
--- 94,107 ----
  	JunkFilter *junkFilter;		/* will be NULL if function returns VOID */
! 	List	   *func_state;
  } SQLFunctionCache;
  
  typedef SQLFunctionCache *SQLFunctionCachePtr;
  
  /* non-export function prototypes */
! static List *init_execution_state(List *queryTree_list,
  				  SQLFunctionCachePtr fcache,
  				  bool lazyEvalOK);
  static void init_sql_fcache(FmgrInfo *finfo, bool lazyEvalOK);
***************
*** 123,183 ****
  static void sqlfunction_destroy(DestReceiver *self);
  
  /* Set up the list of per-query execution_state records for a SQL function */
! static execution_state *
  init_execution_state(List *que