Re: [HACKERS] Built-in support for a memory consumption ulimit?
On Wed, Jun 18, 2014 at 10:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Amit Kapila amit.kapil...@gmail.com writes:
>> Won't it be possible if we convert malloc calls in backend code to go through a wrapper? We already have some precedents of the same, like guc_malloc and pg_malloc.
> We do not have control over mallocs done by third-party code (think pl/perl for example).

Yeah, mallocs done by third-party code would be difficult to track. One possibility could be that we expose a built-in memory allocator function, but I think that would require changes in any third-party code that wants to use this new feature. However, if that's not viable then we need to think about some OS-specific calls like the one you have suggested above (sbrk(0)), though I think that solution might also need a portable API for Windows. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
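The wrapper idea under discussion — routing backend allocations through one function that enforces a per-session limit — might look roughly like this. All names here (pg_limited_malloc, session_alloc_limit) are invented for illustration and are not PostgreSQL APIs; and, as noted above, direct malloc() calls from third-party code would bypass it entirely:

```c
#include <stdlib.h>

/* Illustrative sketch of a malloc wrapper enforcing a memory-consumption
 * ulimit. Simplifications: frees are not tracked, and a real backend would
 * ereport(ERROR, ...) instead of returning NULL on limit overrun. */

static size_t session_alloc_limit = 1024 * 1024;   /* e.g. a 1 MB ulimit */
static size_t session_alloc_total = 0;

void *
pg_limited_malloc(size_t size)
{
    void *p;

    /* Refuse allocations that would push the session past its limit. */
    if (session_alloc_total + size > session_alloc_limit)
        return NULL;

    p = malloc(size);
    if (p != NULL)
        session_alloc_total += size;
    return p;
}
```

Only allocations funneled through the wrapper count against the limit, which is exactly the pl/perl problem Tom raises: an extension calling malloc() directly is invisible to this accounting.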
[HACKERS] include_dir catch-22
There is, IMO, a significant oversight with the include_dir feature. If a distributor wants to enable it by default at initdb time, they can't just turn it on in postgresql.conf.sample: initdb will die when the postgres backend refuses to start because the configdir is missing. Yet the configdir cannot exist beforehand, or initdb will refuse to run. IMO we should just treat the configdir as implicitly if_exists; otherwise, there needs to be a separate if_exists option. I'll gladly submit a patch to fix this for 9.4, I just want an opinion on which way to go first. (Running initdb and then 'sed'-ing the config file afterwards is not an answer.) -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
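The situation can be sketched in postgresql.conf terms; the `if_exists` keyword in the last line is the hypothetical option being proposed here, not existing syntax:

```
# Existing directives:
include 'extra.conf'              # errors out if the file is missing
include_if_exists 'extra.conf'    # skipped (with a LOG message) if missing
include_dir 'conf.d'              # errors out if the directory is missing

# Proposed: treat a missing directory the way include_if_exists treats files
include_dir if_exists 'conf.d'
```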
[HACKERS] Re: [REVIEW] Re: Re: BUG #9578: Undocumented behaviour for temp tables created inside query language (SQL) functions
On Tue, Jun 17, 2014 at 12:30 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote:
> (Cc: to pgsql-bugs dropped.)
>
> At 2014-03-17 18:24:55 +1100, kommi.harib...@gmail.com wrote:
>> *** a/doc/src/sgml/xfunc.sgml
>> --- b/doc/src/sgml/xfunc.sgml
>> *** 153,159 ****  SELECT clean_emp();
>> --- 153,186 ----
>>     (literal\/) (assuming escape string syntax) in the body of the function (see xref linkend=sql-syntax-strings). /para
>> +
>> + sect2 id=xfunc-sql-function-parsing-mechanism
>> +  titleParsing mechanism of a function/title
>> +  indexterm
>> +   primaryfunction/primary
>> +   secondaryparsing mechanism/secondary
>> +  /indexterm
>
> I suggest "Catalog changes within functions" instead of the above title.
>
>> + para
>> +  The body of an SQL function is parsed as if it were a single multi-part
>> +  statement and thus uses a constant snapshot of the system catalogs for
>> +  every sub-statement therein. Commands that alter the catalog will likely not
>> +  work as expected.
>> + /para
>> +
>> + para
>> +  For example: Issuing CREATE TEMP TABLE within an SQL function will add
>> +  the table to the catalog but subsequent statements in the same function will
>> +  not see those additions and thus the temporary table will be invisible to them.
>> + /para
>> +
>> + para
>> +  Thus it is generally advised that applicationPL/pgSQL/ be used, instead of
>> +  acronymSQL/acronym, when any catalog visibilities are required in the same function.
>> + /para
>> + /sect2
>
> I don't think that much text is warranted. I suggest something like the following condensed wording:
>
>   para
>    The body of an SQL function is parsed as if it were a single multi-part
>    statement, using a constant snapshot of the system catalogs. The effect
>    of any commands that alter the catalogs (e.g. CREATE TEMP TABLE) will
>    therefore not be visible to subsequent commands in the function body.
>   /para
>   para
>    The recommended workaround is to use applicationPL/PgSQL/.
>   /para
>
> Does that seem sensible to you?

Looks good, thanks for the review. Updated patch attached.
Regards, Hari Babu Fujitsu Australia sql_functions_parsing_doc_v2.patch Description: Binary data
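The behavior the patch documents can be demonstrated with a pair of functions like the following (hypothetical session; note that with check_function_bodies enabled, even defining the SQL version may fail at CREATE FUNCTION time for the same snapshot reason):

```sql
-- The SQL function body is parsed as a single multi-part statement with one
-- constant catalog snapshot, so the second statement cannot see the temp
-- table created by the first:
CREATE FUNCTION make_and_read_temp() RETURNS integer AS $$
    CREATE TEMP TABLE t (x integer);
    SELECT count(*)::integer FROM t;   -- fails: relation "t" does not exist
$$ LANGUAGE sql;

-- The PL/pgSQL equivalent works, because each statement is planned and
-- executed with a fresh catalog snapshot:
CREATE FUNCTION make_and_read_temp_plpgsql() RETURNS integer AS $$
BEGIN
    CREATE TEMP TABLE t (x integer);
    RETURN (SELECT count(*) FROM t);
END;
$$ LANGUAGE plpgsql;
```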
Re: [HACKERS] include_dir catch-22
On 06/18/2014 03:30 PM, Craig Ringer wrote:
> There is, IMO, a significant oversight with the include_dir feature. If a distributor wants to enable it by default at initdb time, they can't just turn it on in postgresql.conf.sample, because initdb will die when the postgres backend refuses to start because the configdir is missing. Yet the configdir cannot exist, or initdb will refuse to run. IMO we should just treat the configdir as implicitly if_exists, but otherwise, there needs to be a separate if_exists option. I'll gladly submit a patch to fix this for 9.4, I just want an opinion on which way to go first.

Oh, and while it's possible with include_if_exists, you get a spammy initdb like:

creating template1 database in testdb/base/1 ... LOG: skipping missing configuration file /home/ec2-user/testdb/some.conf
2014-06-18 04:05:15.194 EDT LOG: skipping missing configuration file /home/ec2-user/testdb/some.conf
ok
initializing pg_authid ... LOG: skipping missing configuration file /home/ec2-user/testdb/some.conf
2014-06-18 04:05:15.894 EDT LOG: skipping missing configuration file /home/ec2-user/testdb/some.conf
ok
initializing dependencies ... LOG: skipping missing configuration file /home/ec2-user/testdb/some.conf
2014-06-18 04:05:15.927 EDT LOG: skipping missing configuration file /home/ec2-user/testdb/some.conf
ok

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Selectivity estimation for inet operators
I wanted to check the patch one last time and found a bug affecting the MCV vs MCV part of the join selectivity. Fixed version attached. Emre Hasegeli e...@hasegeli.com:
> Comparing the lists with each other slows down the function when the statistics target is set to higher values. To avoid this problem I only use log(n) values of the lists. It is the first log(n) values for the MCV list, and evenly separated values for histograms. In my tests, this optimization does not affect the planning time when the statistics target is 100, but it does affect the accuracy of the estimation. I can send the version without this optimization, if the slowdown with larger statistics targets is not a problem which should be solved in the selectivity estimation function.

Also, I changed this from log(n) to sqrt(n). It seems much better now. I tried to explain the reasons for processing some of the values with more comments. I hope it is understandable. inet-selfuncs-v5.patch Description: Binary data
Re: [HACKERS] Minmax indexes
On 06/17/2014 09:16 PM, Andres Freund wrote: On 2014-06-17 12:14:00 -0400, Robert Haas wrote: On Tue, Jun 17, 2014 at 12:04 PM, Andres Freund and...@2ndquadrant.com wrote: Well, I'm not the guy who does things with geometric data, but I don't want to ignore the significant percentage of our users who are. As you must surely know, the GIST implementations for geometric data types store bounding boxes on internal pages, and that seems to be useful to people. What is your reason for thinking that it would be any less useful in this context? For me minmax indexes are helpful because they allow generating *small* 'coarse' indexes over large volumes of data. From my pov that's possible because they don't contain item pointers for every contained row. That'll imo work well if there are consecutive rows in the table that can be summarized into one min/max range. That's quite likely to happen for common applications of a number of scalar datatypes. But the likelihood of placing sufficiently many rows with very similar bounding boxes close together seems much less relevant in practice. And I think that's generally likely for operations which can't be well represented as btree opclasses - the substructure implied inside a Datum will make correlation between consecutive rows less likely. Well, I don't know: suppose you're loading geospatial data showing the location of every building in some country. It might easily be the case that the data is or can be loaded in an order that provides pretty good spatial locality, leading to tight bounding boxes over physically consecutive data ranges. Well, it might be doable to correlate them along one axis, but along both? That's more complicated... And even alongside one axis you already get into problems if your geometries are irregularly sized. Sure, there are cases where it would be useless. 
But it's easy to imagine scenarios where it would work well, where points are loaded in clusters and points that are close to each other also end up physically close to each other. A single large polygon will completely destroy indexability for anything stored physically close by, because suddenly the minmax range will be huge... So you'll need to cleverly sort for that as well. That's an inherent risk with minmax indexes: insert a few rows to the wrong locations in the heap, and the selectivity of the index degrades rapidly. The main problem with using it for geometric types is that you can't easily CLUSTER the table to make the minmax index effective again. But there are ways around that. But I'm not trying to say that we absolutely have to support that kind of thing; what I am trying to say is that there should be a README or a mailing list post or some such that says: We thought about how generic to make this. We considered A, B, and C. We rejected C as too narrow, and A because if we made it that general it would have greatly enlarged the disk footprint for the following reasons. Therefore we selected B. Isn't 'simpler implementation' a valid reason that's already been discussed onlist? Obviously simpler implementation doesn't trump everything, but it's one valid reason... Note that I have zap to do with the design of this feature. I work for the same company as Alvaro, but that's pretty much it... Without some analysis (e.g implementing it and comparing), I don't buy that it makes the implementation simpler to restrict it in this way. Maybe it does, but often it's actually simpler to solve the general case. - Heikki
Re: [HACKERS] Re: [REVIEW] Re: Re: BUG #9578: Undocumented behaviour for temp tables created inside query language (SQL) functions
Thanks, I've marked this ready for committer. -- Abhijit
[HACKERS] RHEL6 packaging issue for 9.4 beta - libevent conflict
Hi all I posted about a possible packaging issue with RHEL 6 PGDG packages for 9.4beta on pgsql-yum-pkg, but things are pretty quiet over there (a prior mail asking about what happened with moving to git hasn't had a response). Given that time is a concern with 9.4, I thought I'd raise it here. http://www.postgresql.org/message-id/53a10ef9.9060...@2ndquadrant.com Hopefully it's just PEBKAC. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] datistemplate of pg_database does not behave as per description in documentation
From: Rajeev rastogi rajeev.rast...@huawei.com Please find the attached patch with only documentation change. I marked this as ready for committer. The patch is good because it matches the description in the following page: http://www.postgresql.org/docs/devel/static/manage-ag-templatedbs.html Regards MauMau
Re: [HACKERS] Minmax indexes
On 2014-06-18 12:18:26 +0300, Heikki Linnakangas wrote: On 06/17/2014 09:16 PM, Andres Freund wrote: Well, it might be doable to correlate them along one axis, but along both? That's more complicated... And even alongside one axis you already get into problems if your geometries are irregularly sized. Sure, there are cases where it would be useless. But it's easy to imagine scenarios where it would work well, where points are loaded in clusters and points that are close to each other also end up physically close to each other. A single large polygon will completely destroy indexability for anything stored physically close by because suddenly the minmax range will be huge... So you'll need to cleverly sort for that as well. That's an inherent risk with minmax indexes: insert a few rows to the wrong locations in the heap, and the selectivity of the index degrades rapidly. Sure. But it's fairly normal to have natural clusteredness in many columns (surrogate keys, date-series type of data). Even if you insert geometric types in geographic clusters you'll have worse results because some bounding boxes will be big and such. And: The main problem with using it for geometric types is that you can't easily CLUSTER the table to make the minmax index effective again. But there are ways around that. Which are? Sure you can try stuff like recreating the table, sorting rows with bounding box area above a threshold first, and then going on to sort by the top left corner of the bounding box. But that'll be neither builtin, nor convenient, nor perfect. In contrast to a normal CLUSTER for types with a btree opclass, which will yield the perfect order. Isn't 'simpler implementation' a valid reason that's already been discussed onlist? Obviously simpler implementation doesn't trump everything, but it's one valid reason... Note that I have zap to do with the design of this feature. I work for the same company as Alvaro, but that's pretty much it... 
Without some analysis (e.g implementing it and comparing), I don't buy that it makes the implementation simpler to restrict it in this way. Maybe it does, but often it's actually simpler to solve the general case. So to implement a feature one now has to implement the most generic variant as a prototype first? Really? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Minmax indexes
On 2014-06-17 16:48:07 -0700, Greg Stark wrote: On Tue, Jun 17, 2014 at 11:16 AM, Andres Freund and...@2ndquadrant.com wrote: Well, it might be doable to correlate them along one axis, but along both? That's more complicated... And even alongside one axis you already get into problems if your geometries are irregularly sized. A single large polygon will completely destroy indexability for anything stored physically close by because suddenly the minmax range will be huge... So you'll need to cleverly sort for that as well. I think there's a misunderstanding here, possibly mine. My understanding is that a min/max index will always be exactly the same size for a given size table. It stores the minimum and maximum value of the key for each page. Then you can do a bitmap scan by comparing the search key with each page's minimum and maximum to see if that page needs to be included in the scan. The failure mode is not that the index is large but that a page that has an outlier will be included in every scan as a false positive, incurring an extra iop. I just rechecked, and no, it doesn't, by default, store a range for each page. It's MINMAX_DEFAULT_PAGES_PER_RANGE=128 pages by default... Haven't checked what's the lowest it can be set to. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
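The summarization being discussed — one [min, max] entry per range of heap pages (128 pages per range by default in the patch), consulted up front to decide which ranges a scan must actually visit — can be sketched for an integer key as follows (illustrative only; the names are not from the patch):

```c
#include <stdbool.h>
#include <stddef.h>

/* One summary per range of heap pages (e.g. 128 pages per range). */
typedef struct
{
    int min;
    int max;
} RangeSummary;

/* Mark which page ranges might contain rows with value == key. A range
 * whose [min, max] does not cover the key is skipped entirely; a range
 * that does cover it is only a *maybe* and must still be scanned. This
 * is the failure mode discussed above: one outlier row widens a range's
 * summary and turns it into a permanent false positive. Returns the
 * number of ranges that must be scanned. */
static size_t
ranges_to_scan(const RangeSummary *summaries, size_t nranges,
               int key, bool match[])
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < nranges; i++)
    {
        match[i] = (key >= summaries[i].min && key <= summaries[i].max);
        if (match[i])
            count++;
    }
    return count;
}
```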
Re: [HACKERS] Removing dependency to wsock32.lib when compiling code on WIndows
From: Michael Paquier michael.paqu...@gmail.com When doing some work on Windows, I noticed that the msvc specs in src/tools/msvc use wsock32.lib, which is as far as I understand an old, obsolete version of the Windows socket library. Wouldn't it make sense to update the specs to build only with ws2_32.lib like in the patch attached? Yes, that makes sense, because wsock32.dll is an obsolete WinSock 1.1 library, while ws2_32.dll is a successor WinSock 2.0 library which is fully compatible with the old one. Doing grep -lir wsock32 . shows the following files. Could you modify and test these files as well for code cleanup? ./configure ./configure.in ./contrib/pgcrypto/Makefile ./src/interfaces/libpq/Makefile ./src/interfaces/libpq/win32.c ./src/interfaces/libpq/win32.mak ./src/test/thread/README ./src/tools/msvc/Mkvcbuild.pm Regards MauMau
Re: [HACKERS] btreecheck extension
* Peter Geoghegan (p...@heroku.com) wrote: On Tue, Jun 17, 2014 at 5:11 PM, Robert Haas robertmh...@gmail.com wrote: Now, we could. We could come up with an extensible syntax, like this: CHECK relation [ USING { checktype [ '(' arg [, ...] ')' ] } [, ...] ]; That's what I had in mind. Using the same trick that you came up with for EXPLAIN, to avoid grammar bloat and let the am figure out for itself what to name the various check types, with a generic default check. I'm fine with having these start out as external tools which are doing checks, but I've been specifically asked about (and have desired myself from time to time...) an in-core capability to check index/heap/etc validity. Folks coming from certain other RDBMSes find it amazing that we don't have any support for that, when what they really want is a background worker which is just automatically going around doing these checks. Now, perhaps we could have the background worker without the syntax for running these by hand, but I don't particularly like that idea. Being able to run these checks by hand is extremely useful, and I'd much prefer to be able to force that than to have some mechanism where I have to submit a request for a check to another process through a queue or something along those lines. Thanks, Stephen
Re: [HACKERS] Minmax indexes
On 06/18/2014 01:46 PM, Andres Freund wrote: On 2014-06-18 12:18:26 +0300, Heikki Linnakangas wrote: The main problem with using it for geometric types is that you can't easily CLUSTER the table to make the minmax index effective again. But there are ways around that. Which are? Sure you can try stuff like recreating the table, sorting rows with bounding box area above a threshold first, and then going on to sort by the top left corner of the bounding box. Right, something like that. Or cluster using some other column that correlates with the geometry, like a zip code. But that'll be neither builtin, nor convenient, nor perfect. In contrast to a normal CLUSTER for types with a btree opclass, which will yield the perfect order. Sure. BTW, CLUSTERing by a geometric type would be useful anyway, even without minmax indexes. Isn't 'simpler implementation' a valid reason that's already been discussed onlist? Obviously simpler implementation doesn't trump everything, but it's one valid reason... Note that I have zap to do with the design of this feature. I work for the same company as Alvaro, but that's pretty much it... Without some analysis (e.g implementing it and comparing), I don't buy that it makes the implementation simpler to restrict it in this way. Maybe it does, but often it's actually simpler to solve the general case. So to implement a feature one now has to implement the most generic variant as a prototype first? Really? Implementing something is a good way to demonstrate what it would look like. But no, I don't insist on implementing every possible design whenever a new feature is proposed. I liked Greg's sketch of what the opclass support functions would be. It doesn't seem significantly more complicated than what's in the patch now. - Heikki
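The opclass-support-function sketch referenced above amounts to letting each datatype supply its own "fold a value into the summary" and "could this summary match the key?" operations, with min/max as just one instance (a bounding-box opclass would supply box-union and box-overlap instead). A hedged illustration for an integer key — all names and signatures are invented for this sketch, not taken from the patch:

```c
#include <stdbool.h>

/* Hypothetical generalized summary-index opclass interface. Instead of
 * hardcoding min/max comparisons, the index calls through these two
 * function pointers, so geometric types could plug in bounding-box
 * union/overlap without changing the core index code. */
typedef struct IntRange
{
    int  lo;
    int  hi;
    bool empty;                 /* true until the first value is added */
} IntRange;

typedef struct SummaryOpclass
{
    void (*add_value)(IntRange *summary, int value);      /* widen summary */
    bool (*consistent)(const IntRange *summary, int key); /* might match?  */
} SummaryOpclass;

static void
minmax_add_value(IntRange *s, int v)
{
    if (s->empty)
    {
        s->lo = s->hi = v;
        s->empty = false;
    }
    else
    {
        if (v < s->lo)
            s->lo = v;
        if (v > s->hi)
            s->hi = v;
    }
}

static bool
minmax_consistent(const IntRange *s, int key)
{
    return !s->empty && key >= s->lo && key <= s->hi;
}

static const SummaryOpclass minmax_ops = { minmax_add_value, minmax_consistent };
```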
Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?
* Matheus de Oliveira (matioli.math...@gmail.com) wrote:
> Hi Hackers,

Having read only the subject: +1 from me on the idea. Maybe +1000.

> I was facing a situation where we wanted to set temp_tablespaces to a tablespace on an ephemeral disk (yes, an AWS ephemeral disk), and I know many users have faced the same situation. Although it seems safe to create a tablespace on ephemeral disks if you use it to store only temporary files (either created by queries or temp tables), PostgreSQL does not have such information, and can't safely prevent a careless user from creating a non-temporary relation on this tablespace.

Right.

> So, what do you guys think about letting PG know somehow that a tablespace is temporary?

PG would need to enforce that it's only used for temporary objects as well, of course.. Or at least, that was my thinking on this.

> I took some time to make a PoC just to see if that is doable. And so I did a new syntax like: CREATE TABLESPACE spcname [TEMP | TEMPORARY] LOCATION ... So, if TEMP or TEMPORARY is present, I mark a new column at pg_tablespace as true. On every table creation or move to a new tablespace, I just check this, and fail if the tablespace is temporary but relpersistence says the table is not.

Not sure about that specific syntax (don't we have SET options now?) but I do like the general idea.

> The other part is: every time some query or relation is asked to be stored on this tablespace, I create (on demand) the PG_TEMP_FILES_DIR inside it (also if it is temporary). The attached patch (also on my GitHub, [1]) shows the PoC. For now, I'm not worried about code quality; there is still a lot of work to be done there (like ALTER TABLESPACE, better testing, using relcache, etc...), and this is my first time hacking on PG (so I'm a newbie). But I'd like to hear from you whether such a feature is desirable and whether I could start working on it for real. Also some thoughts about better ways of implementing it.
Yeah, +1 here and it should go into an appropriate commitfest to get a proper review. This would be *really* nice to have. Thanks, Stephen
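The PoC syntax described above would look like this in use (the TEMPORARY keyword on CREATE TABLESPACE is the proposed extension, not existing PostgreSQL syntax):

```sql
-- Proposed: mark the tablespace as holding only temporary objects
CREATE TABLESPACE ephemeral_spc TEMPORARY LOCATION '/mnt/ephemeral/pgtemp';

-- Allowed: temp tables and query spill files may use it
SET temp_tablespaces = 'ephemeral_spc';
CREATE TEMP TABLE scratch (x integer) TABLESPACE ephemeral_spc;

-- Rejected under the proposal: a permanent relation on a temporary tablespace
CREATE TABLE persistent (x integer) TABLESPACE ephemeral_spc;  -- should fail
```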
Re: [HACKERS] comparison operators
* Andrew Dunstan (and...@dunslane.net) wrote: I think I'd rather just say "for many data types" or something along those lines, rather than imply that there is some obvious rule that users should be able to intuit. Perhaps with a link to where the information about which ones exist is available..? Or a query to get the list? In general, I agree with you. Thanks, Stephen
Re: [HACKERS] Minmax indexes
On Tue, Jun 17, 2014 at 2:16 PM, Andres Freund and...@2ndquadrant.com wrote: But I'm not trying to say that we absolutely have to support that kind of thing; what I am trying to say is that there should be a README or a mailing list post or some such that says: We thought about how generic to make this. We considered A, B, and C. We rejected C as too narrow, and A because if we made it that general it would have greatly enlarged the disk footprint for the following reasons. Therefore we selected B. Isn't 'simpler implementation' a valid reason that's already been discussed onlist? Obviously simpler implementation doesn't trump everything, but it's one valid reason... Note that I have zap to do with the design of this feature. I work for the same company as Alvaro, but that's pretty much it... It really *hasn't* been discussed on-list. See these emails, discussing the same ideas, from 8 months ago: http://www.postgresql.org/message-id/5249b2d3.6030...@vmware.com http://www.postgresql.org/message-id/ca+tgmoyscbw-uc8lqv96szignxsuzayqbfdqmk-fgu22hdk...@mail.gmail.com Now, Alvaro did not respond to those emails, nor did anyone involved in the development of the feature. There may be an argument that implementing that would be too complicated, but Heikki said he didn't think it would be, and nobody's made a concrete argument as to why he's wrong (and Heikki knows a lot about indexing). Basically, I think Heikki asked a good question - which was could we abstract this more? - and I can't recall seeing a clear answer explaining why we could or couldn't and what the trade-offs would be. 'could we abstract more' imo is a pretty bad design guideline. It's 'is there benefit in abstracting more'. Otherwise you end up with way too complicated systems. On the flip side, if you don't abstract enough, you end up being able to cover only a small set of the relevant use cases, or else you end up with a bunch of poorly-coordinated tools to cover slightly different use cases. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] datistemplate of pg_database does not behave as per description in documentation
On Wed, Jun 18, 2014 at 7:47 PM, MauMau maumau...@gmail.com wrote: From: Rajeev rastogi rajeev.rast...@huawei.com Please find the attached patch with only documentation change. I marked this as ready for committer. The patch is good because it matches the description in the following page: http://www.postgresql.org/docs/devel/static/manage-ag-templatedbs.html ISTM that this patch has already been committed by Bruce. Please see the commit 72590b3a69baaf24d1090a2c2ceb9181be34043e Regards, -- Fujii Masao
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

> I have a few preliminary comments about your patch

Thank you for the review comments.

> the patch creates src/common/lz4/.travis.yml, which it shouldn't.

Agree. I will remove it.

> Shouldn't this use palloc?

palloc() is disallowed in critical sections, and we are already in a CS while executing this code. So we use malloc(). It's OK since the memory is allocated just once per session and it stays till the end.

> At the very minimum, I would move the if (!compressed_pages_allocated) block outside the for (i = 0; i < XLR_MAX_BKP_BLOCKS; i++) loop.

Yes, the code for allocating memory is executed just once per run of the program, so it can be taken out of the for loop. But as the condition if (compress_backup_block != BACKUP_BLOCK_COMPRESSION_OFF && !compressed_pages_allocated) evaluates to true only on the first iteration, I am not sure the change will be a significant improvement from a performance point of view, beyond saving a few condition checks.

> and add some comments. I think we could live with that

I will add comments.

> If we were going to keep multiple compression algorithms around, I'd be inclined to create a pg_compress(…, compression_algorithm) function to hide these return-value differences from the callers, and a pg_decompress() function that does error checking.

+1 for abstracting out the differences in the return values and arguments and providing a common interface for all compression algorithms.
> if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_SNAPPY)
> {
>     if (pg_snappy_compress(page, BLCKSZ, buf) == EIO)
>         return NULL;
> }
> else if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_LZ4)
> {
>     if (pg_LZ4_compress(page, BLCKSZ, buf) == 0)
>         return NULL;
> }
> else if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_PGLZ)
> {
>     if (pglz_compress(page, BLCKSZ, (PGLZ_Header *) buf,
>                       PGLZ_strategy_default) != 0)
>         return NULL;
> }
> else
>     elog(ERROR, "Wrong value for compress_backup_block GUC");
>
> /*
>  * …comment about insisting on saving at least two bytes…
>  */
> if (VARSIZE(buf) >= orig_len - 2)
>     return NULL;
>
> *len = VARHDRSIZE + VARSIZE(buf);
> return buf;
>
> I guess it doesn't matter *too* much if the intention is to have all these compression algorithms only during development/testing and pick just one in the end. But the above is considerably easier to read in the meanwhile.

The above version is better, as it avoids the goto statement.

> I don't mind the suggestion elsewhere in this thread to use full_page_compression = y (as a setting alongside torn_page_protection = x).

This change of GUC is in the ToDo for this patch.

Thank you, Rahila

On Tue, Jun 17, 2014 at 5:17 PM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: At 2014-06-13 20:07:29 +0530, rahilasye...@gmail.com wrote:
> Patch named Support-for-lz4-and-snappy adds support for LZ4 and Snappy in PostgreSQL.

I haven't looked at this in any detail yet, but I note that the patch creates src/common/lz4/.travis.yml, which it shouldn't. I have a few preliminary comments about your patch.

@@ -84,6 +87,7 @@
 bool    XLogArchiveMode = false;
 char   *XLogArchiveCommand = NULL;
 bool    EnableHotStandby = false;
 bool    fullPageWrites = true;
+int     compress_backup_block = false;

I think compress_backup_block should be initialised to BACKUP_BLOCK_COMPRESSION_OFF. (But see below.)

+ for (j = 0; j < XLR_MAX_BKP_BLOCKS; j++)
+     compressed_pages[j] = (char *) malloc(buffer_size);

Shouldn't this use palloc? 
+/*
+ * Create a compressed version of a backup block
+ *
+ * If successful, return a compressed result and set 'len' to its length.
+ * Otherwise (ie, compressed result is actually bigger than original),
+ * return NULL.
+ */
+static char *
+CompressBackupBlock(char *page, uint32 orig_len, char *dest, uint32 *len)
+{

First, the calling convention is a bit strange. I understand that you're pre-allocating compressed_pages[] so as to avoid repeated allocations; and that you're doing it outside CompressBackupBlock so as to avoid passing in the index i. But the result is a little weird. At the very minimum, I would move the if (!compressed_pages_allocated) block outside the for (i = 0; i < XLR_MAX_BKP_BLOCKS; i++) loop, and add some comments. I think we could live with that. But I'm not at all fond of the code in this function either. I'd write it like this:

struct varlena *buf = (struct varlena *) dest;

if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_SNAPPY)
{
    if (pg_snappy_compress(page, BLCKSZ, buf) == EIO)
        return NULL;
}
else if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_LZ4)
{
    if (pg_LZ4_compress(page, BLCKSZ, buf) == 0)
        return NULL;
}
else if
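The pg_compress()/pg_decompress() abstraction suggested in the review could hide the differing return conventions behind one interface. A rough, runnable sketch with a stand-in compressor — the real code would call pglz, LZ4, or snappy here; the stub run-length encoder exists only so the sketch compiles without linking those libraries, and all names are illustrative, not the patch's:

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
    COMPRESSION_OFF,
    COMPRESSION_PGLZ,
    COMPRESSION_LZ4,
    COMPRESSION_SNAPPY
} CompressionAlgo;

/* Unified convention: return true and set *dst_len on success, false on
 * failure (including "output would not be smaller than input"), hiding
 * each library's own return-value style from the callers. */
static bool
stub_rle_compress(const char *src, size_t src_len, char *dst, size_t *dst_len)
{
    size_t out = 0;
    size_t i = 0;

    while (i < src_len)
    {
        size_t run = 1;

        while (i + run < src_len && src[i + run] == src[i] && run < 255)
            run++;
        if (out + 2 > src_len)       /* would not save space: give up */
            return false;
        dst[out++] = (char) run;     /* run length (<= 255 fits a byte) */
        dst[out++] = src[i];         /* the repeated byte */
        i += run;
    }
    *dst_len = out;
    return true;
}

static bool
pg_compress(CompressionAlgo algo, const char *src, size_t src_len,
            char *dst, size_t *dst_len)
{
    switch (algo)
    {
        case COMPRESSION_PGLZ:
        case COMPRESSION_LZ4:
        case COMPRESSION_SNAPPY:
            /* each branch would call the corresponding library here */
            return stub_rle_compress(src, src_len, dst, dst_len);
        default:
            return false;        /* COMPRESSION_OFF: caller writes raw page */
    }
}
```

A matching pg_decompress() would do the inverse with its own error checking, so callers never see EIO-vs-zero-vs-nonzero conventions at all.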
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2014-06-18 18:10:34 +0530, Rahila Syed wrote:
> > Hello, I have a few preliminary comments about your patch
> Thank you for the review comments.
> > the patch creates src/common/lz4/.travis.yml, which it shouldn't.
> Agree. I will remove it.
> > Shouldn't this use palloc?
> palloc() is disallowed in critical sections and we are already in a CS while executing this code. So we use malloc(). It's OK since the memory is allocated just once per session and it stays till the end.

malloc() isn't allowed either. You'll need to make sure all memory is allocated beforehand.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
At 2014-06-18 18:10:34 +0530, rahilasye...@gmail.com wrote: palloc() is disallowed in critical sections and we are already in CS while executing this code. So we use malloc(). Are these allocations actually inside a critical section? It seems to me that the critical section starts further down, but perhaps I am missing something. Second, as Andres says, you shouldn't malloc() inside a critical section either; and anyway, certainly not without checking the return value. I am not sure if the change will be a significant improvement from a performance point of view except it will save a few condition checks. Moving that allocation out of the outer for loop it's currently in has *nothing* to do with performance; it's about making the code easier to read. -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
At 2014-06-18 18:25:34 +0530, a...@2ndquadrant.com wrote: Are these allocations actually inside a critical section? It seems to me that the critical section starts further down, but perhaps I am missing something. OK, I was missing that XLogInsert() itself can be called from inside a critical section. So the allocation has to be moved somewhere else altogether. -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Wed, Jun 18, 2014 at 6:25 PM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: At 2014-06-18 18:10:34 +0530, rahilasye...@gmail.com wrote: palloc() is disallowed in critical sections and we are already in CS while executing this code. So we use malloc(). Are these allocations actually inside a critical section? It seems to me that the critical section starts further down, but perhaps I am missing something. ISTM XLogInsert() itself is called from other critical sections. See heapam.c for example. Second, as Andres says, you shouldn't malloc() inside a critical section either; and anyway, certainly not without checking the return value. I was actually surprised to see Andres's comment. But he is right. OOM inside a CS will result in a PANIC. I wonder if we can or if we really do enforce that though. The code within #ifdef WAL_DEBUG in the same function is surely doing a palloc(). That will be caught since there is an assert inside palloc(). Maybe nobody has tried building with WAL_DEBUG since that assert was added. Maybe Rahila can move that code to InitXLogAccess or, even better, check the malloc() return value and proceed without compression. There is code in snappy.c which will need similar handling, if we decide to finally add that to core. I am not sure if the change will be a significant improvement from a performance point of view except it will save a few condition checks. Moving that allocation out of the outer for loop it's currently in has *nothing* to do with performance; it's about making the code easier to read. +1. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
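A minimal sketch of the allocate-up-front approach discussed here. The names (AllocateCompressionBuffers, the buffer size) are illustrative assumptions, not the patch's actual code: the buffers are obtained once, outside any critical section, and a malloc() failure just disables compression instead of risking a PANIC later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define XLR_MAX_BKP_BLOCKS 4
#define COMPRESS_BUFSIZE   8192

static char *compressed_pages[XLR_MAX_BKP_BLOCKS];
static bool compression_usable = false;

/*
 * Hypothetical once-per-backend initialization, called well before any
 * critical section is entered (e.g. from something like InitXLogAccess).
 * On allocation failure we quietly fall back to writing uncompressed
 * full-page images rather than failing later inside XLogInsert().
 */
static void
AllocateCompressionBuffers(void)
{
    for (int i = 0; i < XLR_MAX_BKP_BLOCKS; i++)
    {
        compressed_pages[i] = malloc(COMPRESS_BUFSIZE);
        if (compressed_pages[i] == NULL)
        {
            compression_usable = false; /* degrade, don't PANIC */
            return;
        }
    }
    compression_usable = true;
}
```

XLogInsert() would then only consult compression_usable, never allocate.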
Re: [HACKERS] Proposal for CSN based snapshots
On Tue, Jun 17, 2014 at 9:00 PM, Craig Ringer cr...@2ndquadrant.com wrote: On 06/18/2014 12:41 AM, Robert Haas wrote: On Mon, Jun 16, 2014 at 12:58 AM, Craig Ringer cr...@2ndquadrant.com wrote: On 05/30/2014 11:14 PM, Heikki Linnakangas wrote: Yeah. To recap, the failure mode is that if the master crashes and restarts, the transaction becomes visible in the master even though it was never replicated. Wouldn't another pg_clog bit for the transaction be able to sort that out? How? A flag to indicate that the tx is locally committed but hasn't been confirmed by a streaming synchronous replica, so it must not become visible until the replica confirms it or SR is disabled. Then scan pg_clog on start / replica connect and ask the replica to confirm local commit for those tx's. No? No. Otherwise, one of those bits could get changed after a backend takes a snapshot and before it finishes using it - so that the transaction snapshot is in effect changing underneath it. You could avoid that by memorizing the contents of CLOG when taking a snapshot, but that would defeat the whole purpose of CSN-based snapshots, which is to make the snapshot small and fixed-size. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL replay bugs
On Tue, Jun 17, 2014 at 5:40 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Wed, Jun 18, 2014 at 1:40 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jun 2, 2014 at 8:55 AM, Michael Paquier michael.paqu...@gmail.com wrote: I'm not sure if this is reasonably possible, but one thing that would make this tool a whole lot easier to use would be if you could make all the magic happen in a single server. For example, suppose you had a background process that somehow got access to the pre and post images for every buffer change, and the associated WAL record, and tried applying the WAL record to the pre-image to see whether it got the corresponding post-image. Then you could run 'make check' or so and afterwards do something like psql -c 'SELECT * FROM wal_replay_problems()' and hopefully get no rows back. So your point is to have a 3rd independent server in the process that would compare images taken from a master and its standby? Seems to complicate the machinery. No, I was trying to get it down from 2 servers to 1, not 2 servers up to 3. Don't get me wrong, having this tool at all sounds great. But I think to really get the full benefit out of it we need to be able to run it in the buildfarm, so that if people break stuff it gets noticed quickly. The patch I sent includes a regression test suite that makes testing straightforward: it's just a matter of actually running make check in the contrib module containing the binary that compares buffer captures between a master and a standby. Cool! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] comparison operators
Stephen Frost sfr...@snowman.net writes: * Andrew Dunstan (and...@dunslane.net) wrote: I think I'd rather just say for many data types or something along those lines, rather than imply that there is some obvious rule that users should be able to intuit. Perhaps with a link to where the information about which ones exist is available? Or a query to get the list? Queries for this sort of thing are covered in the chapter about index opclasses. The basic query would be like select opcintype::regtype from pg_opclass where opcmethod = 403 and opcdefault; but I'm not sure if this is completely useful; it's not obvious for example that the text opclass is also used for varchar. Another point is that some of the operators aren't named in the conventional way. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?
On Tue, Jun 17, 2014 at 9:45 PM, Brightwell, Adam adam.brightw...@crunchydatasolutions.com wrote: I absolutely appreciate all of the feedback that has been provided. It has been educational. To your point above, I started putting together a wiki page, as Stephen has spoken to, that is meant to capture these concerns and considerations as well as to capture ideas around solutions. https://wiki.postgresql.org/wiki/Row_Security_Considerations This page is obviously not complete, but I think it is a good start. Hopefully this document will help to continue the conversation and assist in addressing all the concerns that have been brought to the table. As well, I hope that this document serves to demonstrate our intent and that we *are* taking these concerns seriously. I assure you that as one of the individuals who is working towards the acceptance of this feature/patch, I am very much concerned about meeting the expected standards of quality and security. Cool, thanks for weighing in. I think that page is a good start. An item that I think should be added there is the potential overlap between security_barrier views and row-level security. How can we reuse code (and SQL syntax?) for existing features like WITH CHECK OPTION instead of writing new code (and inventing new syntax) for very similar concepts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for CSN based snapshots
On 06/18/2014 09:12 PM, Robert Haas wrote: No. Otherwise, one of those bits could get changed after a backend takes a snapshot and before it finishes using it - so that the transaction snapshot is in effect changing underneath it. You could avoid that by memorizing the contents of CLOG when taking a snapshot, but that would defeat the whole purpose of CSN-based snapshots, which is to make the snapshot small and fixed-size. Ah, thank you. I appreciate the explanation. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Cube distance patch?
Hi, In September 2013, there was a patch sent by Stas Kelvich ( http://www.postgresql.org/message-id/9e07e159-e405-41e2-9889-a04f534fc...@gmail.com) that adds indexable kNN searches to the cube contrib module. What is needed so that it can get committed? Regards, depesz
Re: [HACKERS] RHEL6 packaging issue for 9.4 beta - libevent conflict
Craig Ringer cr...@2ndquadrant.com writes: I posted about a possible packaging issue with RHEL 6 PGDG packages for 9.4beta on pgsql-yum-pkg, but things are pretty quiet over there (a prior mail asking about what happened with moving to git hasn't had a response). http://www.postgresql.org/message-id/53a10ef9.9060...@2ndquadrant.com Why in the world is PGDG installing something named libevent at all? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] datistemplate of pg_database does not behave as per description in documentation
From: Fujii Masao masao.fu...@gmail.com On Wed, Jun 18, 2014 at 7:47 PM, MauMau maumau...@gmail.com wrote: I marked this as ready for committer. The patch is good because it matches the description in the following page: http://www.postgresql.org/docs/devel/static/manage-ag-templatedbs.html ISTM that this patch has already been committed by Bruce. Please see the commit 72590b3a69baaf24d1090a2c2ceb9181be34043e Oh, the devel doc certainly reflects the change: http://www.postgresql.org/docs/devel/static/catalog-pg-database.html I marked this as committed. I hope Magnus-san's enhancements to the CommitFest app will enable the CommitFest entry to be automatically updated at git commit. Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RHEL6 packaging issue for 9.4 beta - libevent conflict
On 06/18/2014 09:34 AM, Tom Lane wrote: Craig Ringer cr...@2ndquadrant.com writes: I posted about a possible packaging issue with RHEL 6 PGDG packages for 9.4beta on pgsql-yum-pkg, but things are pretty quiet over there (a prior mail asking about what happened with moving to git hasn't had a response). http://www.postgresql.org/message-id/53a10ef9.9060...@2ndquadrant.com Why in the world is PGDG installing something named libevent at all? The only thing I can think of offhand that would require it is pgbouncer. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cube distance patch?
At 2014-06-18 15:33:37 +0200, dep...@gmail.com wrote: Hi, In September 2013, there was patch sent by Stas Kelvich ( http://www.postgresql.org/message-id/9e07e159-e405-41e2-9889-a04f534fc...@gmail.com) that adds indexable kNN searches to cube contrib module. What is needed so that it could get committed? I suggest adding it to the 2014-08 CF so that it gets attention in the next round of reviews. I see some correspondence about it from Sergey Konoplev and Alexander Korotkov (added to Cc:) in March suggesting that it'll be resubmitted with changes for 9.5: http://www.postgresql.org/message-id/CAL_0b1uaGVBnsqx-G6G2Q+N6eGppG89ry6FF0n=yel+yq9a...@mail.gmail.com Perhaps Alexander can comment if there's an updated version of the patch in the works. -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?
On Tue, Jun 17, 2014 at 10:06 PM, Stephen Frost sfr...@snowman.net wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Fri, Jun 13, 2014 at 3:11 AM, Dean Rasheed dean.a.rash...@gmail.com wrote: Yeah, I was thinking something like this could work, but I would go further. Suppose you had separate GRANTable privileges for direct access to individual tables, bypassing RLS, e.g. GRANT DIRECT SELECT|INSERT|UPDATE|DELETE ON table_name TO role_name So, is this one new privilege (DIRECT) or four separate new privileges that are variants of the existing privileges (DIRECT SELECT, DIRECT INSERT, DIRECT UPDATE, DIRECT DELETE)? I had taken it to be a single privilege, but you're right, it could be done for each of those.. I really don't think we have the bits for more than one case here though (if that) without a fair bit of additional rework. I'm not against that rework (and called for it wayyy back when I proposed the TRUNCATE privilege, as I recall) but that's a whole different challenge and no small bit of work.. Technically, there are 4 bits left, and that's what we need for separate privileges. We last consumed bits in 2008 (for TRUNCATE) and 2006 (for GRANT ON DATABASE), so even if we used all of the remaining bits it might be another 5 years before anyone has to do that refactoring. But even if the refactoring needs to be done now for some reason, it's only June, and the last CommitFest doesn't start until February 15th. I think we're being way too quick to jump to talking about what can and can't be done in time for 9.5. Let's start by figuring out how we'd really like it to work and then, if it's too ambitious, we can scale it back. My main concern about using only one bit is that someone might want to allow a user to bypass RLS on SELECT while still enforcing it for data-modifying operations. That seems like a plausible use case to me. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clarification of FDW API Documentation
--On 13. Juni 2014 13:46:38 -0400 Tom Lane t...@sss.pgh.pa.us wrote: Imagine if `BeginForeignScan` set up a remote cursor and `IterateForeignScan` just fetched _one tuple at a time_ (unlike the current behavior where they are fetched in batches). The tuple would be passed to `ExecForeignDelete` (as is required), but the remote cursor would remain pointing at that tuple. Couldn't `ExecForeignDelete` just call `DELETE FROM table WHERE CURRENT OF cursor` to then delete that tuple? No. This is not guaranteed (or even likely) to work in join cases: the tuple to be updated/deleted might no longer be the current one of the scan. You *must* arrange for the scan to return enough information to uniquely identify the tuple later, and that generally means adding some resjunk columns. Yeah, this is exactly the trap i ran into while implementing the informix_fdw driver. It used an updatable cursor to implement the modify actions as you proposed first. Consider a query like UPDATE remote SET f1 = t.id FROM local t WHERE t.id = f1 The planner might choose a hash join where the hash table is built by forwarding the cursor via the foreign scan. You'll end up with the cursor positioned at the end and you have no way to get it back in sync when the modify action is actually called. -- Thanks Bernd -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?
On Wed, Jun 18, 2014 at 9:00 AM, Stephen Frost sfr...@snowman.net wrote: I took some time to make a PoC just to see if that is doable. So I added new syntax like: CREATE TABLESPACE spcname [TEMP | TEMPORARY] LOCATION ... So, if TEMP or TEMPORARY is present, I mark a new column in pg_tablespace as true. On every table creation or move to a new tablespace, I just check this, and fail if the tablespace is temporary but the relpersistence says the table is not. Not sure about that specific syntax (don't we have SET options now?) but I do like the general idea. Maybe something like this: CREATE TABLESPACE spcname LOCATION '/foo/bar' WITH (only_temp_relations = true); Keep in mind that if you use ALTER TABLESPACE spcname SET (...), you must guarantee that only temp objects are stored in the target tablespace, and throw an exception if a regular object exists there. Does that make sense? Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] comparison operators
* Tom Lane (t...@sss.pgh.pa.us) wrote: Stephen Frost sfr...@snowman.net writes: * Andrew Dunstan (and...@dunslane.net) wrote: I think I'd rather just say for many data types or something along those lines, rather than imply that there is some obvious rule that users should be able to intuit. Perhaps with a link to where the information about which ones exist is available? Or a query to get the list? Queries for this sort of thing are covered in the chapter about index opclasses. The basic query would be like Right, a link to there from this would be useful, imv. select opcintype::regtype from pg_opclass where opcmethod = 403 and opcdefault; but I'm not sure if this is completely useful; it's not obvious for example that the text opclass is also used for varchar. Another point is that some of the operators aren't named in the conventional way. Good point. Hopefully a link over to index-opclasses.html would be helpful to users exploring these questions. Thanks, Stephen
Re: [HACKERS] comparison operators
Andrew Dunstan wrote On 06/17/2014 07:25 PM, Andres Freund wrote: On 2014-06-17 19:22:07 -0400, Tom Lane wrote: Andrew Dunstan andrew@ writes: I went to have a look at documenting the jsonb comparison operators, and found that the docs on comparison operators contain this: Comparison operators are available for all relevant data types. They neglect to specify further, however. This doesn't seem very satisfactory. How is a user to know which are relevant? I know they are not available for xml and json, but are for jsonb. Just talking about all relevant types seems rather hand-wavy. Well, there are 38 default btree opclasses in the standard system ATM. Are we worried enough about this to list them all explicitly? Given the lack of complaints to date, I'm not. I think I'd rather just say for many data types or something along those lines, rather than imply that there is some obvious rule that users should be able to intuit. Ideal world for me: we'd list the data types that do not provide comparison operators (or not a full set) by default with links to the section in the documentation where the reasoning for said omission is explained and/or affirmed. My other reaction is that referring to data types at all in this section is unnecessary - other than maybe to state (which it does not currently) that both sides of the comparison must be of the same (or binary equivalent, like text/varchar) type or there must exist an implicit cast for one of the operands. Much of that knowledge is implied and well understood though, as is the fact that operators are closely associated with data types. IOW - I would be fine with removing Comparison operators are available for all relevant data types and not replacing it with anything. Though for many data types is my preferred equivalent phrase for the same reasons Andrew noted. David J. 
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] comparison operators
David G Johnston david.g.johns...@gmail.com writes: Andrew Dunstan wrote I think I'd rather just say for many data types or something along those lines, rather than imply that there is some obvious rule that users should be able to intuit. Ideal world for me: we'd list the data types that do not provide comparison operators (or not a full set) by default with links to the section in the documentation where the reasoning for said omission is explained and/or affirmed. I was just wondering whether that wouldn't be a shorter list. It's not hard to get the base types that don't have btree opclasses:

select typname from pg_type
where not exists (select 1 from pg_opclass
                  where opcmethod = 403 and opcdefault
                        and opcintype = pg_type.oid)
      and typtype = 'b' and not (typelem!=0 and typlen=-1)
order by 1;

    typname
---------------
 aclitem
 box
 cid
 cidr
 circle
 gtsvector
 json
 line
 lseg
 path
 pg_node_tree
 point
 polygon
 refcursor
 regclass
 regconfig
 regdictionary
 regoper
 regoperator
 regproc
 regprocedure
 regtype
 smgr
 txid_snapshot
 unknown
 varchar
 xid
 xml
(28 rows)

although this is misleading because some of these are binary-coercible to indexable types, which means that the indexable type's opclass works for them.

Eliminating those, we get

select typname from pg_type
where not exists (select 1 from pg_opclass
                  where opcmethod = 403 and opcdefault
                        and binary_coercible(pg_type.oid, opcintype))
      and typtype = 'b' and not (typelem!=0 and typlen=-1)
order by 1;

    typname
---------------
 aclitem        haven't bothered, no obvious sort order anyway
 box            no linear sort order
 cid            haven't bothered
 circle         no linear sort order
 gtsvector      internal type, wouldn't be useful
 json
 line           no linear sort order
 lseg           no linear sort order
 path           no linear sort order
 point          no linear sort order
 polygon        no linear sort order
 refcursor      haven't bothered
 smgr           useless legacy type
 txid_snapshot  no linear sort order
 unknown        there are no operations for 'unknown'
 xid            no linear sort order (yes, really)
 xml
(17 rows)

So really we're pretty close to being able to say there are comparison operators for every built-in type for which it's sensible. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RHEL6 packaging issue for 9.4 beta - libevent conflict
Hi, On Wed, 2014-06-18 at 09:41 -0400, Andrew Dunstan wrote: The only thing I can think of offhand that would require it is pgbouncer. Yeah. Recent pgbouncer versions require libevent 2.0+, and RHEL 6 has 1.4. Regards, -- Devrim GÜNDÜZ Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer Twitter: @DevrimGunduz , @DevrimGunduzTR
Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?
* Robert Haas (robertmh...@gmail.com) wrote: On Tue, Jun 17, 2014 at 10:06 PM, Stephen Frost sfr...@snowman.net wrote: I had taken it to be a single privilege, but you're right, it could be done for each of those.. I really don't think we have the bits for more than one case here though (if that) without a fair bit of additional rework. I'm not against that rework (and called for it wayyy back when I proposed the TRUNCATE privilege, as I recall) but that's a whole different challenge and no small bit of work.. Technically, there are 4 bits left, and that's what we need for separate privileges. I'd really hate to chew them all up.. We last consumed bits in 2008 (for TRUNCATE) and 2006 (for GRANT ON DATABASE), so even if we used all of the remaining bits it might be another 5 years before anyone has to do that refactoring. Perhaps, or we might come up with some new whiz-bang permission to add next year. :/ But even if the refactoring needs to be done now for some reason, it's only June, and the last CommitFest doesn't start until February 15th. I think we're being way too quick to jump to talking about what can and can't be done in time for 9.5. Let's start by figuring out how we'd really like it to work and then, if it's too ambitious, we can scale it back. Alright- perhaps we can discuss what kind of refactoring would be needed for such a change then, to get a better idea as to the scope of the change and the level of effort required. My thoughts on how to address this were to segregate the ACL bits by object type. That is to say, the AclMode stored for databases might only use bits 0-2 (create/connect/temporary), while tables would use bits 0-7 (insert/select/update/delete/references/trigger). This would allow us to more easily add more rights at the database and/or tablespace level too. My main concern about using only one bit is that someone might want to allow a user to bypass RLS on SELECT while still enforcing it for data-modifying operations. 
That seems like a plausible use case to me. I absolutely agree that it's a real use-case and one which we should support, just trying to avoid biting off more than can be done between now and February. Still, if we get things hammered out and more-or-less agreement on the way forward, getting the code written may move quickly. Thanks, Stephen
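Stephen's idea of segregating AclMode bits by object type can be pictured like this: since a given AclMode value is always interpreted relative to one object type, the same low bits can mean different privileges for databases and for tables. The constants below are invented for illustration and deliberately do not match PostgreSQL's actual ACL_* bit assignments.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t AclMode;

/* Database-scoped bits (illustrative values only). */
#define DB_ACL_CREATE       (1u << 0)
#define DB_ACL_CONNECT      (1u << 1)
#define DB_ACL_TEMPORARY    (1u << 2)

/*
 * Table-scoped bits reuse the same low positions without conflict,
 * leaving headroom for e.g. per-command DIRECT variants.
 */
#define TBL_ACL_INSERT          (1u << 0)
#define TBL_ACL_SELECT          (1u << 1)
#define TBL_ACL_UPDATE          (1u << 2)
#define TBL_ACL_DELETE          (1u << 3)
#define TBL_ACL_TRUNCATE        (1u << 4)
#define TBL_ACL_REFERENCES      (1u << 5)
#define TBL_ACL_TRIGGER         (1u << 6)
#define TBL_ACL_DIRECT_SELECT   (1u << 7)   /* hypothetical RLS-bypass bit */

/* True when all bits in 'priv' are present in 'acl'. */
static int
has_priv(AclMode acl, AclMode priv)
{
    return (acl & priv) == priv;
}
```

Note how DB_ACL_CREATE and TBL_ACL_INSERT occupy the same bit; the object type of the catalog row carrying the AclMode disambiguates them, which is what frees up room compared to one global bit space.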
Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?
On Tue, Jun 17, 2014 at 10:25 PM, Stephen Frost sfr...@snowman.net wrote: Yeah, if we have to ask an external security module a question for each row, there's little hope of any real optimization. However, I think there will be a significant number of cases where people will want filtering clauses that can be realized by doing an index scan instead of a sequential scan, and if we end up forcing a sequential scan anyway, the feature will be useless to those people. I agree that we want to support that, if we can do so reasonably. What I was trying to get at is simply this- don't we provide that already with the leakproof attribute and functions? If we don't have enough there to allow index scans then we should be looking to add more, I'm thinking. So the reason why we got onto this particular topic was because of the issue of multiple security policies for a single table. Of course, multiple security policies can always be merged into a single more-complex policy, but the resulting policy may be so complex that the query-planner is no longer capable of doing a good job optimizing it. I won't mention here exactly what a certain large commercial database vendor has implemented here; suffice it to say, however, that their design avoids this pitfall, and ours currently does not. I agree on this point, but I'm still hopeful that we'll be able to get a good feature into 9.5. There are quite a few resources available for the 'just programming' part, so the long pole in the tent here is absolutely hashing out what we want and how it should function. Agreed. I'd be happy to host or participate in a conference call or similar if that would be useful to move this along- or we can continue to communicate via email. There's a bit of a lull in conferences to which I'm going to right now, so in person is unlikely, unless folks want to get together somewhere on the east coast (I'd be happy to travel to Philly, Pittsburgh, NYC, etc, if it'd help..). 
For me, email is easiest; but there are other options, too. What solution did you come up with for this case, which performed well and was also secure..? I put the logic in the client. :-( Well, that's not helpful here. ;) Sure. The reason I brought it up is to say - hey, look, I had this come up in the real world. What would it take to be able to do actually do it in the database server? And the answer is - something that will handle multiple security policies cleanly. But I'm not sure; that feels like it's giving something up that might be important. And I think that the kinds of syntax we're discussing won't support leaving that out of the initial version and adding it later, so if we commit to this syntax, we're stuck with that behavior. To avoid that, we'd need something like this: ALTER TABLE tab ADD POLICY polname WHERE quals; GRANT SELECT (polname) ON TABLE tab TO role; Right, if we were to support multiple policies on a given table then we would have to support adding and removing them individually, as well as specify when they are to be applied- and what if that when overlaps? Do we apply both and only a row which passed them all gets sent to the user? Essentially we'd be defining the RLS policies to be AND'd together, right? Would we want to support both AND-based and OR-based, and allow users to pick what set of conditionals they want applied to their various overlapping RLS policies? AND is not a sensible policy; it would need to be OR. If you grant someone access to two different subsets of the rows in a table, it stands to reason that they will expect to have access to all of the rows that are in at least one of those subsets. If you give someone your car key and your house key, that means they can operate your car or enter your house; it does not mean that they can operate your car but only when it's inside your garage. 
Alternatively, we could: - Require the user to specify in some way which of the available policies they want applied, and then apply only that one. or - Decide that such scenarios constitute misconfiguration. Throw an error and make the table owner or other relevant local authority fix it. Sounds all rather painful and much better done programatically by the user in a language which is suited to that task- eg: pl/pgsql, perl, C, or something besides our ALTER syntax + catalog representation. I think exactly the opposite, for the query planning reasons previously stated. I think the policies will quickly get so complicated that they're no longer optimizable. Here's a simple example: - Policy 1 allows the user to access rows for which complexfunc() returns true. - Policy 2 allows the user to access rows for which a = 1. Most users have access only through policy 2, but some have access through policy 1. Users who have access through policy 1 will always get a sequential scan, but users who have access through policy 2 have an excellent chance of getting an index
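The OR semantics argued for above can be made concrete with a toy sketch. Row, PolicyFn, and the two example policies are invented purely for illustration; nothing here is proposed syntax or real executor code.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct Row
{
    int     a;
    bool    complex_ok;     /* stands in for complexfunc()'s verdict */
} Row;

typedef bool (*PolicyFn) (const Row *);

/* Policy 1: rows for which "complexfunc()" returns true. */
static bool
policy_complexfunc(const Row *r)
{
    return r->complex_ok;
}

/* Policy 2: rows with a = 1 (the index-friendly qual). */
static bool
policy_a_eq_1(const Row *r)
{
    return r->a == 1;
}

/*
 * A user granted several policies sees a row if ANY of them admits it:
 * the policies are OR'd together, never AND'd.
 */
static bool
row_visible(const Row *r, PolicyFn *policies, int npolicies)
{
    for (int i = 0; i < npolicies; i++)
    {
        if (policies[i] (r))
            return true;
    }
    return false;
}
```

A user who only holds policy 2 gets just the a = 1 qual, which a planner can still satisfy with an index scan; it is the OR across policies for multi-policy users that threatens optimization.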
Re: [HACKERS] RHEL6 packaging issue for 9.4 beta - libevent conflict
On 06/18/2014 06:34 AM, Tom Lane wrote: Craig Ringer cr...@2ndquadrant.com writes: I posted about a possible packaging issue with RHEL 6 PGDG packages for 9.4beta on pgsql-yum-pkg, but things are pretty quiet over there (a prior mail asking about what happened with moving to git hasn't had a response). http://www.postgresql.org/message-id/53a10ef9.9060...@2ndquadrant.com Why in the world is PGDG installing something named libevent at all? pg_bouncer. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, @cmdpromptinc If we send our children to Caesar for their education, we should not be surprised when they come back as Romans. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?
On Wed, Jun 18, 2014 at 10:40 AM, Stephen Frost sfr...@snowman.net wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Tue, Jun 17, 2014 at 10:06 PM, Stephen Frost sfr...@snowman.net wrote: I had taken it to be a single privilege, but you're right, it could be done for each of those.. I really don't think we have the bits for more than one case here though (if that) without a fair bit of additional rework. I'm not against that rework (and called for it wayyy back when I proposed the TRUNCATE privilege, as I recall) but that's a whole different challenge and no small bit of work.. Technically, there are 4 bits left, and that's what we need for separate privileges. I'd really hate to chew them all up.. Usually it's the patch author who WANTS to chew up all the available bit space and OTHER people who say no. :-) We last consumed bits in 2008 (for TRUNCATE) and 2006 (for GRANT ON DATABASE), so even if we used all of the remaining bits it might be another 5 years before anyone has to do that refactoring. Perhaps, or we might come up with some new whiz-bang permission to add next year. :/ Well, people proposed separate permissions for things like VACUUM and ANALYZE around the time TRUNCATE was added, and those were rejected on the grounds that they didn't add enough value to justify wasting bits on them. I think we should see whether there's a workable system such that marginal permissions (like TRUNCATE) that won't be checked often don't have to consume bits. But even if the refactoring needs to be done now for some reason, it's only June, and the last CommitFest doesn't start until February 15th. I think we're being way too quick to jump to talking about what can and can't be done in time for 9.5. Let's start by figuring out how we'd really like it to work and then, if it's too ambitious, we can scale it back. 
Alright- perhaps we can discuss what kind of refactoring would be needed for such a change then, to get a better idea as to the scope of the change and the level of effort required. My thoughts on how to address this were to segregate the ACL bits by object type. That is to say, the AclMode stored for databases might only use bits 0-2 (create/connect/temporary), while tables would use bits 0-7 (insert/select/update/delete/references/trigger). This would allow us to more easily add more rights at the database and/or tablespace level too. Yeah, that's another idea. But it really deserves its own thread. I'm still not convinced we have to do this at all to meet this need, but that should be argued back and forth on that other thread. My main concern about using only one bit is that someone might want to allow a user to bypass RLS on SELECT while still enforcing it for data-modifying operations. That seems like a plausible use case to me. I absolutely agree that it's a real use-case and one which we should support, just trying to avoid biting off more than can be done between now and February. Still, if we get things hammered out and more-or-less agreement on the way forward, getting the code written may move quickly. Nifty. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Atomics hardware support table supported architectures
On Tue, Jun 17, 2014 at 1:55 PM, Andres Freund and...@2ndquadrant.com wrote: But the concern is more whether 1 byte can actually be written without also writing neighbouring values. I.e. there's hardware out there that'll implement a 1-byte store as reading 4 bytes, changing one of the bytes in a register, and then writing the 4 bytes out again. Which would mean that a neighbouring write will possibly cause a wrong value to be written... Ah, OK. I've heard about that before, but had forgotten. What happens is that gcc will do a syscall triggering the kernel to turn off scheduling; perform the math and store the result; turn scheduling on again. That way there cannot be another operation influencing the calculation/store. Imagine if you have hardware that, internally, only does stores in 4-byte units. Even if it's a single CPU machine, which most of those are, the kernel could schedule a separate process after the first 4 bytes have been written. Oops. The kernel has ways to prevent that, userspace doesn't... Interesting. Turn off scheduling sounds like a pretty dangerous syscall. Does somebody want other columns in there? I think the main question at the developer meeting was how far we want to go with supporting primitives like atomic add, atomic and, atomic or, etc. So I think we should add columns for those. Well, once CAS is available, atomic add etc is all trivially implementable - without further hardware support. It might be more efficient to use the native instruction (e.g. xadd can be much better than a cmpxchg loop because there's no retries), but that's just optimization that won't matter unless you have a fair bit of concurrency. 
There are currently fallbacks like:

#ifndef PG_HAS_ATOMIC_FETCH_ADD_U32
#define PG_HAS_ATOMIC_FETCH_ADD_U32
STATIC_IF_INLINE uint32
pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 add_)
{
    uint32 old;
    while (true)
    {
        old = pg_atomic_read_u32_impl(ptr);
        if (pg_atomic_compare_exchange_u32_impl(ptr, old, old + add_))
            break;
    }
    return old;
}

I understand, but the performance characteristics are quite different. My understanding from the developer meeting was that we'd be OK with having, say, three levels of support for atomic ops: all ops supported, only TAS, none. Or maybe four: all ops, CAS + TAS, only TAS, none. But I think there was resistance (in which I participate) to the idea of, say, having platform 1 with add but not and and or, platform 2 with and and or but not add, platform 3 with both, platform 4 with neither, etc. Then it becomes too hard for developers to predict whether something that is a win on their platform will be a loss on some other platform. 3) sparcv8: Last released model 1997. I seem to recall hearing about this in a customer situation relatively recently, so there may be a few of these still kicking around out there. Really? As I'd written in a reply, solaris 10 (released 2005) dropped support for it. Dropping support for a platform that was desupported 10 years ago by its manufacturer doesn't sound bad imo... We definitely have at least one customer using Solaris 9. I don't know their architecture for certain, but they did recently install a new version of PostgreSQL. 4) i386: Support dropped from windows 98 (yes, really), linux, openbsd (yes, really), netbsd (yes, really). No code changes needed. Wow, OK. In that case, yeah, let's dump it. But let's make sure we adequately document that someplace in the code comments, along with the reasons, because not everyone may realize how dead it is. I'm generally wondering how to better document the supported os/platform combinations. E.g. 
it's not apparent that we only support certain platforms on a rather limited set of compilers... Maybe a table with columns like: platform, platform version, supported-OSs, supported-compilers? Sounds worth a try. 6) armv5 I think this is also a bit less dead than the other ones; Red Hat's Bugzilla shows people filing bugs for platform-specific problems as recently as January of 2013: https://bugzilla.redhat.com/show_bug.cgi?id=892378 Closed as WONTFIX :P. Joking aside, I think there are still use cases for armv5 - but it's embedded stuff without a real OS and such. Nothing you'd install PG on. There are distributions that are dropping ARMv6 support already... My biggest problem is that it's not even documented whether v5 has atomic 4-byte stores - while it's documented for v6. I think in doubtful cases we might as well keep the support in. If you've got the fallback to non-atomics, keeping the other code around doesn't hurt much, and might make it easier for someone who is interested in one of those platforms. It's fine and good to kill things that are totally dead, but I think it's better for a user of some obscure platform to find that it
Re: [HACKERS] Quantify small changes to predicate evaluation
On Tue, Jun 17, 2014 at 10:23 AM, Dennis Butterstein soullinu...@web.de wrote: Hi Marti, thank you for your quick reply. I tried the proposed tweaks and see some differences regarding the measurements. It seems as if the overall query performance dropped a little, which I think the disabled turbo boost mode is responsible for (all measurements are single query only). I think that kind of behaviour is somewhat expected. Run1: 26.559s Run2: 28.280s Unfortunately the variance between the runs seems to remain high. In fact I have exclusive access to the machine and I made sure not to run into any I/O-related problems regarding buffer caches. Are there any other stumbling blocks I'm missing at the moment? Maybe I have to rethink my (end-to-end) measurement strategy. In your experience, how small a measurement variance is it possible to get on Postgres? Thank you very much. Kind regards, Dennis I find that it's possible to get smaller variations than what you're experiencing on read-only workloads without any special tuning, but variation on workloads that write data is much higher, similar to what you're seeing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] buildfarm client release 4.13
I have released version 4.13 of the PostgreSQL Buildfarm client. This can be downloaded from http://www.pgbuildfarm.org/downloads/releases/build-farm-4_13.tgz Changes in this release (from the git log): fcc182b Don't run TestCollateLinuxUTF8 on unsupported branches. 273af50 Don't run FileTextArrayFDW tests if not wanted. 9944a4a Don't remove the ccache directory on failure, unless configured to. 73e4187 Make ccache and vpath builds play nicely together. 9ff8c22 Work around path idiocy in msysGit. 0948ac7 revert naming change for log files ca68525 Exclude ecpg/tests from find_typedefs code. If you are using ccache, please note that there are adjustments to the recommended use pattern. The sample config file no longer suggests that the ccache directory have the branch name at the end. It is now recommended that you use a single cache for all branches for a particular member. To do this remove /$branch from the relevant line in your config file, if you have it, and remove those directories in the cache root. Your first run on each branch will rebuild some or all of the cache. My unified cache on crake is now at 615 MB, rather than the multiples of GB it had been previously. It is recommended that this release be deployed by all users fairly quickly, because of the fix for log file naming, which was discarding some log files that were quite important. cheers andrew
Re: [HACKERS] 9.5 CF1
On Tue, Jun 17, 2014 at 6:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Robert Haas wrote: On Mon, Jun 16, 2014 at 2:36 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: P.S. If you tag your reviews with [REVIEW] in the Subject, it'll be easier to keep track of them. I and, I believe, various other people hate that style, because at least in Gmail, it breaks the threading. It is much easier to find things if they are all posted on one thread. Yes, please don't do that. A simple, normal reply to the message that submits the patch is much better from my point of view as a subsequent reviewer and committer. Worth noting also is that Magnus is working on a new version of the commitfest app that will be able to automatically keep track of threads about patches --- so long as they *are* threads according to our mailing list archives. I'm not sure if the archives recognize replies with a changed Subject: as being the same thread or not. The archives code does threading based on the headers (in-reply-to and references, in priority order). It completely ignores the subject when it comes to the threading. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
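Magnus's description of the archives' threading rule - In-Reply-To first, then References, with Subject ignored entirely - amounts to something like the following sketch. This is an illustration of the stated rule only, not the actual archives code; the function name and the "last entry of References" detail are this example's assumptions.

```c
#include <stddef.h>

/* Illustrative only: choose a message's thread parent the way the
 * archives are described to above. In-Reply-To takes priority, then
 * References; the Subject header plays no part at all, which is why a
 * changed Subject: does not break the thread. */
static const char *pick_parent(const char *in_reply_to,
                               const char *last_reference)
{
    if (in_reply_to != NULL && in_reply_to[0] != '\0')
        return in_reply_to;       /* highest priority header */
    if (last_reference != NULL && last_reference[0] != '\0')
        return last_reference;    /* fall back to References */
    return NULL;                  /* neither set: starts a new thread */
}
```

This is also why the upcoming commitfest app can track patch threads only as long as reviews are posted as real replies carrying these headers.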
[HACKERS] Is analyze_new_cluster.sh still useful?
Hi, now that we have vacuumdb --all --analyze-in-stages in 9.4, wouldn't it make sense to get rid of the analyze_new_cluster.sh file which pg_upgrade writes? The net content is a single line which could as well be printed by pg_upgrade itself. Instead of a lengthy explanation of how to invoke that manually, there should be a short note and a pointer to some manual section. I think the chances of people reading that would even be increased. Similarly, I don't really see the usefulness of delete_old_cluster.sh as a file, when rm -rf could just be presented on the console for the admin to execute by cut-and-paste. Christoph -- Senior Berater, Tel.: +49 (0)21 61 / 46 43-187 credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209 Hohenzollernstr. 133, 41061 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer pgp fingerprint: 5C48 FE61 57F4 9179 5970 87C6 4C5A 6BAB 12D2 A7AE
Re: [HACKERS] BUG #10680 - ldapbindpasswd leaks to postgresql log
On Wed, Jun 18, 2014 at 4:50 AM, Tom Lane t...@sss.pgh.pa.us wrote: Steven Siebert smsi...@gmail.com writes: Attached is a proposed patch for BUG #10680. It's a simple fix to the problem of the ldapbindpasswd leaking in clear text to the postgresql log. The patch simply removes the raw pg_hba.conf line from the log message, but retains the log line number to assist admins in troubleshooting. You haven't exactly explained why this is a problem. The proposed patch would impede diagnosis of many other problems, so it's not going to get committed without a thoroughly compelling rationale. Yes, properly logging that was intentional, in commit 7f49a67f954db3e92fd96963169fb8302959576e. Hint: "I don't store my postmaster log securely" is not compelling. We've been over that ground before; there are far too many reasons why access to the postmaster log is a potential security hazard to justify concluding that this particular one is worse. Yeah, and the password is already in cleartext in a file next to it. If we actually feel the need to get rid of it, we should do a better job. Such as actively blanking it out, replacing it with something else. Since we know the password (we parsed it out), it shouldn't be impossible to actually blank out *just the password*, without ruining all the other diagnostics usage of it. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
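Magnus's suggestion - blank out just the password while keeping the rest of the line for diagnostics - can be sketched as below. This is an illustration, not the actual fix: the function name is invented, and it simply assumes the parser has already extracted the ldapbindpasswd value from the line.

```c
#include <string.h>

/* Illustrative sketch (invented name, not the committed fix): since
 * pg_hba.conf parsing already extracted the ldapbindpasswd value, the
 * raw line can still be logged with just that secret overwritten,
 * preserving everything else for troubleshooting. */
static void mask_password(char *line, const char *password)
{
    size_t plen = strlen(password);
    char  *p = line;

    if (plen == 0)
        return;                   /* nothing to mask */
    while ((p = strstr(p, password)) != NULL)
    {
        memset(p, '*', plen);     /* blank out just the password */
        p += plen;
    }
}
```

A caveat a real implementation would need to handle: a short or common password substring could also match (and mask) unrelated parts of the line.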
Re: [HACKERS] Atomics hardware support table supported architectures
On 2014-06-18 11:15:15 -0400, Robert Haas wrote: On Tue, Jun 17, 2014 at 1:55 PM, Andres Freund and...@2ndquadrant.com wrote: What happens is that gcc will do a syscall triggering the kernel to turn off scheduling; perform the math and store the result; turn scheduling on again. That way there cannot be another operation influencing the calculation/store. Imagine if you have hardware that, internally, only does stores in 4-byte units. Even if it's a single CPU machine, which most of those are, the kernel could schedule a separate process after the first 4 bytes have been written. Oops. The kernel has ways to prevent that, userspace doesn't... Interesting. Turn off scheduling sounds like a pretty dangerous syscall. The kernel does that internally, just during cmpxchg. It'll reenable interrupts before returning. There are hundreds of places inside at least Linux doing that internally... Should you be interested in the details: https://www.kernel.org/doc/Documentation/arm/kernel_user_helpers.txt Does somebody want other columns in there? I think the main question at the developer meeting was how far we want to go with supporting primitives like atomic add, atomic and, atomic or, etc. So I think we should add columns for those. Well, once CAS is available, atomic add etc is all trivially implementable - without further hardware support. It might be more efficient to use the native instruction (e.g. xadd can be much better than a cmpxchg loop because there's no retries), but that's just optimization that won't matter unless you have a fair bit of concurrency. 
There are currently fallbacks like:

#ifndef PG_HAS_ATOMIC_FETCH_ADD_U32
#define PG_HAS_ATOMIC_FETCH_ADD_U32
STATIC_IF_INLINE uint32
pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 add_)
{
    uint32 old;
    while (true)
    {
        old = pg_atomic_read_u32_impl(ptr);
        if (pg_atomic_compare_exchange_u32_impl(ptr, old, old + add_))
            break;
    }
    return old;
}

I understand, but the performance characteristics are quite different. There might be cases where that's true, but in the majority of cases where the variable isn't highly contended it's just about the same. In many cases the microcode will implement atomic ops using ll/sc operations internally anyway. My understanding from the developer meeting was that we'd be OK with having, say, three levels of support for atomic ops: all ops supported, only TAS, none. Or maybe four: all ops, CAS + TAS, only TAS, none. But I think there was resistance (in which I participate) to the idea of, say, having platform 1 with add but not and and or, platform 2 with and and or but not add, platform 3 with both, platform 4 with neither, etc. Then it becomes too hard for developers to predict whether something that is a win on their platform will be a loss on some other platform. I don't think developers really have to care. In nearly all the cases the above cmpxchg loop will loop precisely once. Usually the cacheline will stay in the exclusive state on that process, and that's it. In most cases what it'll be replacing will be something protected by a spinlock anyways - and the above isn't any worse than that. Note that we're far from the only ones falling back to cmpxchg for all these ops - that's pretty much a standard fallback. I'm fine with always implementing everything but tas, cas, add on top of cas for now. I want or/and/sub/... to be conveniently available to C code, but they don't need to be based on hardware ops. Having open-coded versions of the above in several places sounds like a horrible idea to me. 
Both, because we might want to optimize it at some point and because it's horrible readability-wise. 3) sparcv8: Last released model 1997. I seem to recall hearing about this in a customer situation relatively recently, so there may be a few of these still kicking around out there. Really? As I'd written in a reply, solaris 10 (released 2005) dropped support for it. Dropping support for a platform that was desupported 10 years ago by its manufacturer doesn't sound bad imo... We definitely have at least one customer using Solaris 9. I don't know their architecture for certain, but they did recently install a new version of PostgreSQL. But Solaris 9 doesn't imply sparcv8. Ultrasparc (first sparcv9) support was added in solaris 2.5... That's a *long* time ago. If my googling turned up the right information, you needed to use special -xarch flags for sun studio to even compile binaries that are pure v8 (v8plus is fine btw) since sun studio 9... Same for gcc apparently (-mno-v8plus), although I don't know how far back. The reason I'd like to kill it is that there's basically zero chance of ever testing it, and we'd need to write a configure check that not only detects the suncc intrinsics but also runs
Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?
* Fabrízio de Royes Mello (fabriziome...@gmail.com) wrote: On Wed, Jun 18, 2014 at 9:00 AM, Stephen Frost sfr...@snowman.net wrote: Not sure about that specific syntax (don't we have SET options now?) but I do like the general idea. Maybe something like this: CREATE TABLESPACE spcname LOCATION '/foo/bar' WITH (only_temp_relations = true); Yeah, that's more along the lines of what I was thinking. Keep in mind that if you use ALTER TABLESPACE spcname SET (...) you must take care to guarantee that only temporary objects are stored in the target tablespace, and throw an exception if a regular object exists. Makes sense? Yes. I'd definitely like to see an ALTER TABLESPACE option, with an ERROR that lists out all of the non-temporary objects which are found (and lists any other databases which have objects in those tablespaces..). That would allow administrators who have existing notionally temporary-only tablespaces to go clean things up to make them actually temporary-only. Thanks, Stephen
Re: [HACKERS] Set new system identifier using pg_resetxlog
On Tue, Jun 17, 2014 at 12:50 PM, Andres Freund and...@2ndquadrant.com wrote: I can clearly understand the utility of being able to reset the system ID to a new, randomly-generated system ID - but giving the user the ability to set a particular value of their own choosing seems like a pretty sharp tool. What is the use case for that? I've previously hacked this up adhoc during data recovery when I needed to make another cluster similar enough that I could replay WAL. Another use case is to mark a database as independent from its origin. Imagine a database that gets sharded across several servers. It's not uncommon to do that by initially basebackup'ing the database to several nodes and then use them separately from then on. It's quite useful to actually mark them as being distinct. Especially as several of them right now would end up with the same timeline id... Sure, but that only requires being able to reset the ID randomly, not to a particular value. But it seems to me that we might need to have a process discussion here, because, while I'm all in favor of incremental feature proposals that build towards a larger goal, it currently appears that the larger goal toward which you are building is not something that's been publicly discussed and debated on this list. And I really think we need to have that conversation. Obviously, individual patches will still need to be debated, but I feel like 2ndQuadrant is trying to construct a castle without showing the community the floor plan. I believe that there is relatively broad agreement that we would all like a castle, but different people may have legitimately different ideas about how it should be constructed. If the work arrives as a series of disconnected pieces (user-specified system ID, event triggers for CREATE, etc.), then everyone outside of 2ndQuadrant has to take it on faith that those pieces are going to eventually fit together in a way that we'll all be happy with. 
In some cases, that's fine, because the feature is useful on its own merits whether it ends up being part of the castle or not. Uh. Right now this patch has been written because it's needed for an out-of-core replication solution. That's what BDR is at this point. The patch is unobtrusive, has other use cases than just our internal one, and doesn't make pg_resetxlog even more dangerous than it already is. I don't see much problem with considering it on its own cost/benefit? Well, I think it *does* make pg_resetxlog more dangerous; see previous discussion of pg_computemaxlsn. So this seems to be a concern that's relatively independent of this patch. Am I seeing that right? Partially; not completely. I actually don't think any of the discussions I was involved in had the externally visible version of replication identifiers limited to 16 bits? If you are referring to my patch, 16 bits was just the width of the *internal* name that should basically never be looked at. User-visible replication identifiers are always identified by an arbitrary string - whose format is determined by the user of the replication identifier facility. *BDR* currently stores the system identifier, the database id and a name in there - but that's nothing core needs to concern itself with. I don't think you're going to be able to avoid users needing to know about those IDs. The configuration table is going to have to be the same on all nodes, and how are you going to get that set up without those IDs being user-visible? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] delta relations in AFTER triggers
On Tue, Jun 17, 2014 at 04:07:55PM -0400, Robert Haas wrote: On Sat, Jun 14, 2014 at 7:56 PM, Kevin Grittner kgri...@ymail.com wrote: I looked at the standard, and initially tried to implement the standard syntax for this; however, it appeared that the reasons given for not using standard syntax for the row variables also apply to the transition relations (the term used by the standard). There isn't an obvious way to tie that in to all the PLs we support. It could be done, but it seems like it would be intolerably ugly, and more fragile than what we have done so far. I'm not too familiar with this area. Can you describe what the standard syntax for the row variables is, as opposed to our syntax? Also, what's the standard syntax for the transition relations? The good: - Generating the tuplestores. Yay! The bad: - Generating them exactly and only for AFTER triggers - Requiring that the tuplestores both be generated or not at all. There are real use cases described below where only one would be relevant. - Generating the tuplestores unconditionally. The ugly: - Attaching tuplestore generation to tables rather than callers (triggers, DML, etc.) The SQL standard says:

<trigger definition> ::=
  CREATE TRIGGER <trigger name> <trigger action time> <trigger event>
  ON <table name> [ REFERENCING <transition table or variable list> ]
  <triggered action>

<trigger action time> ::= BEFORE | AFTER | INSTEAD OF

<trigger event> ::= INSERT | DELETE | UPDATE [ OF <trigger column list> ]

<trigger column list> ::= <column name list>

<triggered action> ::=
  [ FOR EACH { ROW | STATEMENT } ]
  [ <triggered when clause> ]
  <triggered SQL statement>

<triggered when clause> ::= WHEN <left paren> <search condition> <right paren>

<triggered SQL statement> ::=
  <SQL procedure statement>
  | BEGIN ATOMIC { <SQL procedure statement> <semicolon> }... END

<transition table or variable list> ::= <transition table or variable>...
<transition table or variable> ::=
  OLD [ ROW ] [ AS ] <old transition variable name>
  | NEW [ ROW ] [ AS ] <new transition variable name>
  | OLD TABLE [ AS ] <old transition table name>
  | NEW TABLE [ AS ] <new transition table name>

<old transition table name> ::= <transition table name>
<new transition table name> ::= <transition table name>
<transition table name> ::= <identifier>
<old transition variable name> ::= <correlation name>
<new transition variable name> ::= <correlation name>

Sorry that was a little verbose, but what it does do is give us what we need at trigger definition time. I'd say it's pilot error if a trigger definition says "make these tuplestores" and the trigger body then does nothing with them, which goes to Robert's point below re: unconditional overhead. Some things which I *did* follow from the standard: these new relations are only allowed within AFTER triggers, but are available in both AFTER STATEMENT and AFTER ROW triggers. That is, an AFTER UPDATE ... FOR EACH ROW trigger could use both the OLD and NEW row variables as well as the delta relations (under whatever names we pick). That probably won't be used very often, but I can imagine some cases where it might be useful. I expect that these will normally be used in FOR EACH STATEMENT triggers. I'm concerned about the performance implications of capturing the delta relations unconditionally. Along that same line, we don't always need to capture both the before tuplestores and the after ones. Two examples of this come to mind: - BEFORE STATEMENT triggers accessing rows, where there is no after part to use, and - DML (RETURNING BEFORE, e.g.) which only touches one of them. This applies both to extant use cases of RETURNING and to planned ones. I'm sure if I can think of two, there are more. In summary, I'd like to propose that the tuplestores be generated separately in general and attached to callers. We can optimize this by not generating redundant tuplestores. Cheers, David. 
-- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] replication identifier format
On 2014-06-18 12:36:13 -0400, Robert Haas wrote: I actually don't think any of the discussions I was involved in had the externally visible version of replication identifiers limited to 16 bits? If you are referring to my patch, 16 bits was just the width of the *internal* name that should basically never be looked at. User-visible replication identifiers are always identified by an arbitrary string - whose format is determined by the user of the replication identifier facility. *BDR* currently stores the system identifier, the database id and a name in there - but that's nothing core needs to concern itself with. I don't think you're going to be able to avoid users needing to know about those IDs. The configuration table is going to have to be the same on all nodes, and how are you going to get that set up without those IDs being user-visible? Why? Users and other systems only ever see the external ID. Everything leaving the system is converted to the external form. The short id basically is only used in shared memory and in wal records. For both, using longer strings would be problematic. In the patch I have, the user can actually see them as they're stored in pg_replication_identifier, but there should never be a need for that. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Is analyze_new_cluster.sh still useful?
Christoph Berg christoph.b...@credativ.de writes: now that we have vacuumdb --all --analyze-in-stages in 9.4, wouldn't it make sense to get rid of the analyze_new_cluster.sh file which pg_upgrade writes? The net content is a single line which could as well be printed by pg_upgrade itself. Instead of an lengthy explanation how to invoke that manually, there should be a short note and a pointer to some manual section. I think the chances of people reading that would even be increased. Similary, I don't really see the usefulness of delete_old_cluster.sh as a file, when rm -rf could just be presented on the console for the admin to execute by cut-and-paste. There are contexts where pg_upgrade is executed by some wrapper script and the user doesn't normally see its output directly. This is the case in the Red Hat packaging (unless Honza changed it since I left ;-)) and I think Debian might be similar. I generally don't like the amount of cruft pg_upgrade leaves lying around, so I'd be glad to see these script files go away if possible; but we need to think about how this will play when there's a wrapper script between pg_upgrade and the human user. In the Red Hat wrapper script, the pg_upgrade output is dumped into a log file, which the user can look at if he wants, but I'd bet the average user doesn't read it --- that was certainly the expectation. Of course, said user probably never notices the separate shell scripts either, so maybe it's a wash. Another angle is that some folks might have tried to automate things even more, with a wrapper script that starts up the new postmaster and runs analyze_new_cluster.sh all by itself. I guess they could make the wrapper do vacuumdb --all --analyze-in-stages directly, though, so maybe that's not a fatal objection either. regards, tom lane
Re: [HACKERS] Set new system identifier using pg_resetxlog
On 2014-06-18 12:36:13 -0400, Robert Haas wrote: On Tue, Jun 17, 2014 at 12:50 PM, Andres Freund and...@2ndquadrant.com wrote: I can clearly understand the utility of being able to reset the system ID to a new, randomly-generated system ID - but giving the user the ability to set a particular value of their own choosing seems like a pretty sharp tool. What is the use case for that? I've previously hacked this up ad hoc during data recovery when I needed to make another cluster similar enough that I could replay WAL. Another usecase is to mark a database as independent from its origin. Imagine a database that gets sharded across several servers. It's not uncommon to do that by initially basebackup'ing the database to several nodes and then using them separately from there on. It's quite useful to actually mark them as being distinct. Especially as several of them right now would end up with the same timeline id... Sure, but that only requires being able to reset the ID randomly, not to a particular value. I can definitely see a point in a version of the option that generates the id randomly. But the use case one up actually does require setting it to a specific value... Uh. Right now this patch has been written because it's needed for an out-of-core replication solution. That's what BDR is at this point. The patch is unobtrusive, has other usecases than just our internal one and doesn't make pg_resetxlog even more dangerous than it already is. I don't see much problem with considering it on its own cost/benefit? Well, I think it *does* make pg_resetxlog more dangerous; see previous discussion of pg_computemaxlsn. Wasn't the thing around pg_computemaxlsn that there's actually no case where it could be used safely? And that experienced people suggested to use it in an unsafe fashion? I don't see how the proposed ability makes it more dangerous. It *already* has the ability to reset the timelineid. 
That's the case where users are much more likely to think about resetting it because that's much more plausible than taking an unrelated cluster and resetting its sysid, timeline and LSN. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Is analyze_new_cluster.sh still useful?
On 2014-06-18 12:51:43 -0400, Tom Lane wrote: Another angle is that some folks might have tried to automate things even more, with a wrapper script that starts up the new postmaster and runs analyze_new_cluster.sh all by itself. I guess they could make the wrapper do vacuumdb --all --analyze-in-stages directly, though, so maybe that's not a fatal objection either. Wouldn't that be quite counterproductive? The reason we don't normally do that and why --analyze-in-stages exists is that the cluster should be started up as fast as possible. Restarting it after ANALYZE went through would be defeating that purpose, no? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?
Stephen Frost sfr...@snowman.net writes: Yes. I'd definitely like to see an ALTER TABLESPACE option, with an ERROR that lists out all of the non-temporary objects which are found (and lists any other databases which have objects in those tablespaces..). That would allow administrators who have existing notionally temporary-only tablespaces to go clean things up to make them actually temporary-only. That seems just about impossible from a concurrency standpoint (ie, what if someone is creating a new table in the tablespace concurrently with your check? Possibly in a different database?) I would certainly suggest that the first version of the patch not undertake to allow this property to be ALTERed; the cost-benefit ratio isn't there IMO. regards, tom lane
Re: [HACKERS] Set new system identifier using pg_resetxlog
On 2014-06-18 18:54:05 +0200, Andres Freund wrote: On 2014-06-18 12:36:13 -0400, Robert Haas wrote: Sure, but that only requires being able to reset the ID randomly, not to a particular value. I can definitely see a point in a version of the option that generates the id randomly. That's actually included in the patch btw (thanks Abhijit)... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Is analyze_new_cluster.sh still useful?
Andres Freund and...@2ndquadrant.com writes: On 2014-06-18 12:51:43 -0400, Tom Lane wrote: Another angle is that some folks might have tried to automate things even more, with a wrapper script that starts up the new postmaster and runs analyze_new_cluster.sh all by itself. I guess they could make the wrapper do vacuumdb --all --analyze-in-stages directly, though, so maybe that's not a fatal objection either. Wouldn't that be quite counterproductive? The reason we don't normally do that and why --analyze-in-stages exists is that the cluster should be started up as fast as possible. Restarting it after ANALYZE went through would be defeating that purpose, no? How so? Once you've started the postmaster, you're open for business, no? regards, tom lane
Re: [HACKERS] Set new system identifier using pg_resetxlog
On Wed, Jun 18, 2014 at 12:54 PM, Andres Freund and...@2ndquadrant.com wrote: Well, I think it *does* make pg_resetxlog more dangerous; see previous discussion of pg_computemaxlsn. Wasn't the thing around pg_computemaxlsn that there's actually no case where it could be used safely? And that experienced people suggested to use it in an unsafe fashion? There is a use case - to determine whether WAL has been replayed from the future relative to the WAL stream and control file you have on disk. But the rest is true enough. I don't see how the proposed ability makes it more dangerous. It *already* has the ability to reset the timelineid. That's the case where users are much more likely to think about resetting it because that's much more plausible than taking an unrelated cluster and resetting its sysid, timeline and LSN. All right, well, I've said my piece. I don't have anything to add to that that isn't sheer repetition. My vote is to hold off on this until we've talked about replication identifiers and other related topics in more depth. But if that position doesn't garner majority support ... so be it! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Is analyze_new_cluster.sh still useful?
On 2014-06-18 13:24:14 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-06-18 12:51:43 -0400, Tom Lane wrote: Another angle is that some folks might have tried to automate things even more, with a wrapper script that starts up the new postmaster and runs analyze_new_cluster.sh all by itself. I guess they could make the wrapper do vacuumdb --all --analyze-in-stages directly, though, so maybe that's not a fatal objection either. Wouldn't that be quite counterproductive? The reason we don't normally do that and why --analyze-in-stages exists is that the cluster should be started up as fast as possible. Restarting it after ANALYZE went through would be defeating that purpose, no? How so? Once you've started the postmaster, you're open for business, no? Wasn't there lots of talk about making the server inaccessible while pg_upgrade is doing its thing? Also, many people are desperately unhappy if postgres has to be restarted (to return to being OS-managed) after their application already has connected. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Set new system identifier using pg_resetxlog
On 18/06/14 19:26, Robert Haas wrote: On Wed, Jun 18, 2014 at 12:54 PM, Andres Freund and...@2ndquadrant.com wrote: I don't see how the proposed ability makes it more dangerous. It *already* has the ability to reset the timelineid. That's the case where users are much more likely to think about resetting it because that's much more plausible than taking an unrelated cluster and resetting its sysid, timeline and LSN. All right, well, I've said my piece. I don't have anything to add to that that isn't sheer repetition. My vote is to hold off on this until we've talked about replication identifiers and other related topics in more depth. But if that position doesn't garner majority support ... so be it! I am not sure I see what this has to do with replication identifiers. The patch has several use-cases, one of them being that you can know the future system id before you set it, which is useful for automating some things... -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Set new system identifier using pg_resetxlog
On 2014-06-18 13:26:37 -0400, Robert Haas wrote: My vote is to hold off on this until we've talked about replication identifiers and other related topics in more depth. I really don't understand this. We're *NOT* proposing this patch as an underhanded way of preempting the discussion of whether/how replication identifiers are going to be used. We're proposing it because we currently have a need for the facility and this will reduce the number of patches we have to keep around after 9.5. And more importantly because there are several other use cases than our internal one for it. To settle one more point: I am not planning to propose BDR's generation of replication identifier names for integration. It works well enough for BDR but I think we can come up with something better for core. If I had my current knowledge two years back I'd not have chosen the current scheme. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Set new system identifier using pg_resetxlog
On 06/13/2014 05:31 PM, Petr Jelinek wrote: Hello, attached is a simple patch which makes it possible to change the system identifier of the cluster in pg_control. This is useful for individualization of the instance that is started on top of data directory produced by pg_basebackup - something that's helpful for logical replication setup where you need to easily identify each node (it's used by Bidirectional Replication for example). I'm unclear on why we would overload pg_resetxlog for this. Wouldn't it be better design to have an independent function, pg_set_system_identifier? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Set new system identifier using pg_resetxlog
Josh Berkus wrote: On 06/13/2014 05:31 PM, Petr Jelinek wrote: Hello, attached is a simple patch which makes it possible to change the system identifier of the cluster in pg_control. This is useful for individualization of the instance that is started on top of data directory produced by pg_basebackup - something that's helpful for logical replication setup where you need to easily identify each node (it's used by Bidirectional Replication for example). I'm unclear on why we would overload pg_resetxlog for this. Wouldn't it be better design to have an independent function, pg_set_system_identifier? We have overloaded pg_resetxlog for all pg_control-tweaking needs. I don't see any reason to do differently here. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Set new system identifier using pg_resetxlog
On 2014-06-18 10:44:56 -0700, Josh Berkus wrote: On 06/13/2014 05:31 PM, Petr Jelinek wrote: Hello, attached is a simple patch which makes it possible to change the system identifier of the cluster in pg_control. This is useful for individualization of the instance that is started on top of data directory produced by pg_basebackup - something that's helpful for logical replication setup where you need to easily identify each node (it's used by Bidirectional Replication for example). I'm unclear on why we would overload pg_resetxlog for this. Wouldn't it be better design to have an independent function, pg_set_system_identifier? You mean an independent binary? Because it's not possible to change this at runtime. If so, it's because pg_resetxlog already has the option to change many related things (e.g. the timeline id). And it'd require another copy of several hundred lines of code. It's all stored in the control file/checkpoints. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
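For context, the value under discussion is the "Database system identifier" field that pg_controldata prints from pg_control. A sketch of extracting it; the sample_output variable stands in for real pg_controldata output (the identifier value is made up), so the sketch can run without a data directory:

```shell
#!/bin/sh
# Running `pg_controldata "$PGDATA"` prints, among other fields, a line
# like the sample below; this is the system identifier that the proposed
# pg_resetxlog option would overwrite. The numeric value here is
# illustrative only.
sample_output="Database system identifier:           6028967049988913274"

# Strip everything up to and including the colon and following spaces.
sysid=$(echo "$sample_output" | sed 's/^[^:]*:[[:space:]]*//')
echo "$sysid"
```

Against a real cluster one would pipe the actual pg_controldata output through the same filter rather than a canned string.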
Re: [HACKERS] Set new system identifier using pg_resetxlog
At 2014-06-18 10:44:56 -0700, j...@agliodbs.com wrote: I'm unclear on why we would overload pg_resetxlog for this. Because pg_resetxlog already does something very similar, so the patch is small. If it were independent, it would have to copy quite some code from pg_resetxlog. Wouldn't it be better design to have an independent function, pg_set_system_identifier? A *function*? Why? -- Abhijit
Re: [HACKERS] How to change the pgsql source code and build it??
Well, the initdb issue looks to be resolved. After making the changes suggested by Kyotaro Horiguchi, I had copied only the initdb binary to my platform rather than all of the binaries; hence the dependencies were still unresolved and I was getting the error. The database server is now started and is up and running. But when I try to connect the client to the server, I get the following error: /switch/pgsql/bin # ./psql FATAL: database root does not exist psql: FATAL: database root does not exist Upon browsing a couple of links, I learned that to get past this issue we should log in as the actual postgres user, so that the client can connect to the default database. However, in my case I don't know why a default database named 'root' wasn't created, or why the server rejects the client when it tries to connect to the default database. Can anyone shed some light on 1) when the default database gets created 2) how the default database name is decided 3) whether there are any other places in the database server code where this check is applied? Looking at the error, I believe the code is searching for a database whose name matches the user name. If anyone can give some input on the code, it would be helpful! Thanks, Shreesha On Mon, Jun 16, 2014 at 6:58 PM, Craig Ringer cr...@2ndquadrant.com wrote: On 06/17/2014 02:02 AM, Shreesha wrote: But I believe I am looking for the exact opposite of it. In other words, a possible workaround for a root user to execute certain executable(s) as an unprivileged user. So you want the su or sudo commands? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- ~Shreesha.
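The failure above follows from psql's defaulting rules: with no -U, psql uses the OS login name as the user, and with no -d it uses the user name as the database name, so running as root tries to connect to a database named "root", which initdb never creates (initdb creates postgres, template0, and template1). Below is a hypothetical model of that defaulting chain, for illustration only, not the actual libpq code:

```shell
#!/bin/sh
# Typical fixes for "FATAL: database root does not exist":
#
#   ./psql -d postgres      # connect to a database initdb did create
#   ./createdb root         # or create the missing one, then ./psql
#
# Illustrative model of psql's name defaulting (not the real libpq code):
effective_dbname() {
    osuser="$1"; user="$2"; dbname="$3"
    user="${user:-$osuser}"      # no -U given -> fall back to OS login
    echo "${dbname:-$user}"      # no -d given -> fall back to user name
}

effective_dbname root "" ""        # -> root
effective_dbname root "" postgres  # -> postgres
```

This also answers question 2) above: there is no per-user default database created anywhere; the "default" is purely a client-side name fallback.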
Re: [HACKERS] Is analyze_new_cluster.sh still useful?
Andres Freund and...@2ndquadrant.com writes: On 2014-06-18 13:24:14 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-06-18 12:51:43 -0400, Tom Lane wrote: Another angle is that some folks might have tried to automate things even more, with a wrapper script that starts up the new postmaster and runs analyze_new_cluster.sh all by itself. I guess they could make the wrapper do vacuumdb --all --analyze-in-stages directly, though, so maybe that's not a fatal objection either. Wouldn't that be quite counterproductive? The reason we don't normally do that and why --analyze-in-stages exists is that the cluster should be started up as fast as possible. Restarting it after ANALYZE went through would be defeating that purpose, no? How so? Once you've started the postmaster, you're open for business, no? Wasn't there lots of talk about making the server inaccessible while pg_upgrade is doing its thing? Also, many people are desperately unhappy if postgres has to be restarted (to return to being OS-managed) after their application already has connected. I think we're not on the same page. My point is that someone might want to automate the whole sequence: stop old postmaster, run pg_upgrade, start the updated postmaster normally (hence it *is* open for business), kick off the analyze runs. If you're concerned about minimal downtime you would not want to be waiting around for the admin to issue a perfectly predictable series of commands. regards, tom lane
Re: [HACKERS] Is analyze_new_cluster.sh still useful?
On 2014-06-18 13:54:16 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-06-18 13:24:14 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-06-18 12:51:43 -0400, Tom Lane wrote: Another angle is that some folks might have tried to automate things even more, with a wrapper script that starts up the new postmaster and runs analyze_new_cluster.sh all by itself. I guess they could make the wrapper do vacuumdb --all --analyze-in-stages directly, though, so maybe that's not a fatal objection either. Wouldn't that be quite counterproductive? The reason we don't normally do that and why --analyze-in-stages exists is that the cluster should be started up as fast as possible. Restarting it after ANALYZE went through would be defeating that purpose, no? How so? Once you've started the postmaster, you're open for business, no? Wasn't there lots of talk about making the server inaccessible while pg_upgrade is doing its thing? Also, many people are desperately unhappy if postgres has to be restarted (to return to being OS-managed) after their application already has connected. I think we're not on the same page. My point is that someone might want to automate the whole sequence: stop old postmaster, run pg_upgrade, start the updated postmaster normally (hence it *is* open for business), kick off the analyze runs. If you're concerned about minimal downtime you would not want to be waiting around for the admin to issue a perfectly predictable series of commands. Oh, yea. Definitely. I think that's what I've seen happen in pretty much *all* usages of pg_upgrade. I somehow misread that you wanted to add that into pg_upgrade. Not really sure how, sorry. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Set new system identifier using pg_resetxlog
On 06/18/2014 10:48 AM, Abhijit Menon-Sen wrote: At 2014-06-18 10:44:56 -0700, j...@agliodbs.com wrote: I'm unclear on why we would overload pg_resetxlog for this. Because pg_resetxlog already does something very similar, so the patch is small. If it were independent, it would have to copy quite some code from pg_resetxlog. Aha. In that case, it seems like it's time to rename pg_resetxlog, if it does a bunch of things that aren't resetting the xlog. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Set new system identifier using pg_resetxlog
On 2014-06-18 10:59:59 -0700, Josh Berkus wrote: On 06/18/2014 10:48 AM, Abhijit Menon-Sen wrote: At 2014-06-18 10:44:56 -0700, j...@agliodbs.com wrote: I'm unclear on why we would overload pg_resetxlog for this. Because pg_resetxlog already does something very similar, so the patch is small. If it were independent, it would have to copy quite some code from pg_resetxlog. Aha. In that case, it seems like it's time to rename pg_resetxlog, if it does a bunch of things that aren't resetting the xlog. Well, all those actually do write to the xlog (to write a new checkpoint, containing the updated control file). Since pg_resetxlog has done all this pretty much since forever, renaming it now seems to be a big hassle for users for pretty much no benefit? This isn't a tool the average user should ever touch. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?
On Wed, Jun 18, 2014 at 1:59 PM, Tom Lane t...@sss.pgh.pa.us wrote: Stephen Frost sfr...@snowman.net writes: Yes. I'd definitely like to see an ALTER TABLESPACE option, with an ERROR that lists out all of the non-temporary objects which are found (and lists any other databases which have objects in those tablespaces..). That would allow administrators who have existing notionally temporary-only tablespaces to go clean things up to make them actually temporary-only. That seems just about impossible from a concurrency standpoint (ie, what if someone is creating a new table in the tablespace concurrently with your check? Possibly in a different database?) You are correct, I completely forgot that CREATE TABLESPACE is not transaction safe. I would certainly suggest that the first version of the patch not undertake to allow this property to be ALTERed; the cost-benefit ratio isn't there IMO. +1 Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] Set new system identifier using pg_resetxlog
On 06/18/2014 11:03 AM, Andres Freund wrote: Well, all those actually do write to the xlog (to write a new checkpoint, containing the updated control file). Since pg_resetxlog has done all this pretty much since forever, renaming it now seems to be a big hassle for users for pretty much no benefit? This isn't a tool the average user should ever touch. If we're using it to create a unique system ID which can be used to orchestrate replication and clustering systems, a lot more people are going to be touching it than ever did before -- and not just for BDR. Or are you saying that we have to destroy the data by resetting the xlog before we can change the system identifier? If so, this feature is less than completely useful ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?
* Robert Haas (robertmh...@gmail.com) wrote: On Tue, Jun 17, 2014 at 10:25 PM, Stephen Frost sfr...@snowman.net wrote: I agree that we want to support that, if we can do so reasonably. What I was trying to get at is simply this- don't we provide that already with the leakproof attribute and functions? If we don't have enough there to allow index scans then we should be looking to add more, I'm thinking. So the reason why we got onto this particular topic was because of the issue of multiple security policies for a single table. Of course, multiple security policies can always be merged into a single more-complex policy, but the resulting policy may be so complex that the query-planner is no longer capable of doing a good job optimizing it. Yeah, I could see that happening with some use-cases. ALTER TABLE tab ADD POLICY polname WHERE quals; GRANT SELECT (polname) ON TABLE tab TO role; Right, if we were to support multiple policies on a given table then we would have to support adding and removing them individually, as well as specifying when they are to be applied- and what if those 'when' conditions overlap? Do we apply both, so that only a row which passes them all gets sent to the user? Essentially we'd be defining the RLS policies to be AND'd together, right? Would we want to support both AND-based and OR-based, and allow users to pick what set of conditionals they want applied to their various overlapping RLS policies? AND is not a sensible policy; it would need to be OR. If you grant someone access to two different subsets of the rows in a table, it stands to reason that they will expect to have access to all of the rows that are in at least one of those subsets. I think I can buy off on this. What that also means is that any 'short-circuiting' that we try to do here would be based on stopping once we get back a 'true'. 
This could seriously change how we actually implement RLS though as doing it all through query rewrites and making this work with multiple security policies which are OR'd together and yet keeping the optimization and qual push-down and index-based plans is looking pretty daunting. I'm also of the opinion that this isn't strictly necessary for the initial RLS offering in PG- there's a clear way we could migrate existing users to a multi-policy system from a single-policy system. Sure, to get the performance and optimization benefits that we'd presumably have in the multi-policy case they'd need to re-work their RLS configuration, but for users who care, they'll likely be very happy to do so to gain those benefits. Perhaps the question here is- if we implement RLS one way for the single case and then change the implementation all around for the multi case, will we end up breaking the single case? Or destroying the performance for it? I can't see either of those cases being allowed- if and when we support multi, we must still support single and the whole point of multi would be to allow more performant implementations and that solution will require the single case to be at least as performant as what we're proposing to do today, I believe. Or are you thinking that we would never support calling user-defined functions in any RLS scheme because we want to be able to do that optimization? I don't see that being acceptable from a feature standpoint. Alternatively, we could: - Require the user to specify in some way which of the available policies they want applied, and then apply only that one. I'd want to at least see a way to apply an ordering to the policies being applied, or have PG work out which one is cheapest and try that one first. - Decide that such scenarios constitute misconfiguration. Throw an error and make the table owner or other relevant local authority fix it. 
Having them all be OR'd together feels simpler and easier to work with than trying to provide the user with all the knobs necessary to select which subset of users they want the policy applied to when (user X from IP range a.b.c.d/24 at time 1500). We could probably make it work with exclusion constraints, range types, etc, and perhaps it'd be a reason to bring btree_gist into core (which I'm all for) and make it work with catalog tables, but... just 'yuck' all around, for my part. Sounds all rather painful and much better done programmatically by the user in a language which is suited to that task- eg: pl/pgsql, perl, C, or something besides our ALTER syntax + catalog representation. I think exactly the opposite, for the query planning reasons previously stated. I think the policies will quickly get so complicated that they're no longer optimizable. Here's a simple example: - Policy 1 allows the user to access rows for which complexfunc() returns true. - Policy 2 allows the user to access rows for which a = 1. Most users have access only through policy 2, but some have access through policy 1. Users who have access through policy 1
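The OR-merge semantics being discussed can be sketched mechanically: each policy contributes one qual, and the effective row filter is the parenthesized disjunction of them all. A small illustration (the policy texts are taken from the example in the thread; the merging function itself is hypothetical, not PostgreSQL code):

```shell
#!/bin/sh
# Hypothetical sketch: combine several row-security policy quals with OR,
# producing the kind of filter a rewriter might attach to the query.
merge_policies() {
    # Join all arguments with " OR ", wrapping each qual in parentheses
    # so operator precedence inside individual quals cannot leak out.
    out=""
    for qual in "$@"; do
        if [ -z "$out" ]; then
            out="($qual)"
        else
            out="$out OR ($qual)"
        fi
    done
    echo "$out"
}

merge_policies "complexfunc()" "a = 1"
```

The short-circuiting point above falls out of this shape: an OR chain can stop evaluating as soon as one qual returns true, which is also why a cheap qual like a = 1 ordered before an expensive complexfunc() matters for planning.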
Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?
* Tom Lane (t...@sss.pgh.pa.us) wrote: Stephen Frost sfr...@snowman.net writes: Yes. I'd definitely like to see an ALTER TABLESPACE option, with an ERROR that lists out all of the non-temporary objects which are found (and lists any other databases which have objects in those tablespaces..). That would allow administrators who have existing notionally temporary-only tablespaces to go clean things up to make them actually temporary-only. That seems just about impossible from a concurrency standpoint (ie, what if someone is creating a new table in the tablespace concurrently with your check? Possibly in a different database?) Yeah, that's definitely an annoying complexity. I would certainly suggest that the first version of the patch not undertake to allow this property to be ALTERed; the cost-benefit ratio isn't there IMO. I suppose scheduling downtime to do the check manually across all databases, then drop and recreate the tablespace, would work. As someone who's working with a couple of these cases, it'd be awful nice if there was a way PG would handle it for me. Thanks, Stephen
Re: [HACKERS] pg_control is missing a field for LOBLKSIZE
On Tue, Jun 17, 2014 at 8:46 PM, Bruce Momjian br...@momjian.us wrote: On Tue, Jun 17, 2014 at 07:12:16PM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: On Tue, Jun 17, 2014 at 03:55:02PM -0400, Alvaro Herrera wrote: Can't you compare it to the historic default value? I mean, add an assumption that people thus far have never tweaked it. Well, if they did tweak it, then they would be unable to use pg_upgrade because it would complain about a mismatch if they actually matched the old and new servers. What about comparing to the symbolic value LOBLKSIZE? This would make pg_upgrade assume that the old installation had been tweaked the same as in its own build. This ends up being the same as what you said, ie, effectively no comparison ... but it might be less complicated to code/understand. OK, assume the compiled-in default is the value for an old cluster that has no value --- yeah, I could do that. I'm not really sure why this is better than Bruce's original proposal, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] idle_in_transaction_timeout
Robert Haas robertmh...@gmail.com wrote: Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I thought the reason why this hasn't been implemented before now is that sending an ErrorResponse to the client will result in a loss of protocol sync. Hmm ... you are right that this isn't as simple as an ereport(ERROR), but I'm not sure it's impossible. We could for instance put the backend into skip-till-Sync state so that it effectively ignored the next command message. Causing that to happen might be impracticably messy, though. Another thing we could maybe do is AbortCurrentTransaction() and send the client a NoticeResponse saying hey, expect all of your future commands to fail with complaints about the transaction being aborted. Wow, until I read through the thread on this patch and the old thread that Robert linked to I had forgotten the attempt by Andres to deal with this three and a half years ago. That patch died because of how complicated it was to do the right thing if the connection was left open. Personally, where I have seen a use for this, treating it as FATAL would be better anyway; but was OK with treating it as an ERROR if that didn't sink the patch. I guess that puts me in the same camp as Josh and Robert. Vik has submitted v1 and v2 of this patch; the only difference between them is that v1 makes a timeout FATAL and v2 makes it an ERROR (with appropriate corresponding changes to code comments, documentation, and message text). 
It's not hard to show the problems with an ERROR under v2, confirming that the simple approach of just changing the ereport severity is not enough: test=# set idle_in_transaction_timeout = '3s'; SET test=# begin; BEGIN test=# select 1; ERROR: current transaction is aborted due to idle-in-transaction timeout test=# commit; ERROR: current transaction is aborted, commands ignored until end of transaction block test=# commit; ROLLBACK The first thing is that I don't think a delay between the BEGIN and the SELECT should cause a timeout to trigger, but more importantly there should not be two ERROR responses to one SELECT statement. I'm inclined to abandon the ERROR approach as not worth the effort and fragility, and focus on v1 of the patch. If we can't get to consensus on that, I think that this patch should be flagged Returned with Feedback, noting that any follow-up version requires some way to deal with the issues raised regarding multiple ERROR messages. Abridged for space, hopefully without losing the main points of the authors, so far: Josh Berkus: My $0.018: terminating the connection is the preferred behavior. Tom Lane: FWIW, I think aborting the transaction is probably better, especially if the patch is designed to do nothing to already-aborted transactions. If the client is still there, it will see the abort as a failure in its next query, which is less likely to confuse it completely than a connection loss. (I think, anyway.) Álvaro Herrera: I think if we want this at all it should abort the xact, not drop the connection. Andrew Dunstan: [quotes Tom's argument] Yes, I had the same thought. David G Johnston: Default to dropping the connection but give the users/administrators the capability to decide for themselves? Robert Haas: Assuming that the state of play hasn't changed in some way I'm unaware of since 2010, I think the best argument for FATAL here is that it's what we can implement.
I happen to think it's better anyway, because the cases I've seen where this would actually be useful involve badly-written applications that are not under the same administrative control as the database and supposedly have built-in connection poolers, except sometimes they seem to forget about some of the connections they themselves opened. The DBAs can't make the app developers fix the app; they just have to try to survive. Aborting the transaction is a step in the right direction but since the guy at the other end of the connection is actually or in effect dead, that just leaves you with an eternally idle connection. Tom Lane (reprise): I'm not sure whether cancel-transaction behavior is enough better than cancel-session to warrant extra effort here. Andres Freund: [quotes Tom's latest (above)] I don't think idle_in_transaction_timeout is worth this, but if we could implement it we could finally have recovery conflicts against idle in txn sessions not use FATAL... Kevin Grittner: Personally, where I have seen a use for this, treating it as FATAL would be better anyway A few ideas were put forward on how a much more complicated patch could allow transaction rollback with an ERROR to work acceptably, but I think that would probably need to be a new patch in a later CF, so that is omitted here. So, can we agree to use FATAL to terminate the connection? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [bug fix] Memory leak in dblink
On 06/10/2014 02:57 AM, MauMau wrote: From: Amit Kapila amit.kapil...@gmail.com Is there a need to free memory context in PG_CATCH()? In error path the memory due to this temporary context will get freed as it will delete the transaction context and this temporary context will definitely be in the hierarchy of it, so it should also get deleted. I think such handling can be useful in case we use PG_CATCH() to suppress the error. I thought the same, but I also felt that I should make an effort to release resources as soon as possible, considering the memory context auto deletion as a last resort. However, looking at other places where PG_CATCH() is used, memory context is not deleted. So, I removed the modification from PG_CATCH() block. Thanks. I think the context deletion was missed in the first place because storeRow() is the wrong place to create the context. Rather than creating the context in storeRow() and deleting it two levels up in materializeQueryResult(), I think it should be created and deleted in the interim layer, storeQueryResult(). Patch along those lines attached. Any objections to me back-patching it this way?
Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, 24x7 Support diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index a81853f..46c4a51 100644 *** a/contrib/dblink/dblink.c --- b/contrib/dblink/dblink.c *** storeQueryResult(storeInfo *sinfo, PGcon *** 1076,1081 --- 1076,1089 if (!PQsetSingleRowMode(conn)) /* shouldn't fail */ elog(ERROR, "failed to set single-row mode for dblink query"); + /* Create short-lived memory context for data conversions */ + if (!sinfo->tmpcontext) + sinfo->tmpcontext = AllocSetContextCreate(CurrentMemoryContext, + "dblink temporary context", + ALLOCSET_DEFAULT_MINSIZE, + ALLOCSET_DEFAULT_INITSIZE, + ALLOCSET_DEFAULT_MAXSIZE); + for (;;) { CHECK_FOR_INTERRUPTS(); *** storeQueryResult(storeInfo *sinfo, PGcon *** 1119,1124 --- 1127,1136 /* clean up GUC settings, if we changed any */ restoreLocalGucs(nestlevel); + /* clean up data conversion short-lived memory context */ + if (sinfo->tmpcontext != NULL) + MemoryContextDelete(sinfo->tmpcontext); + /* return last_res */ res = sinfo->last_res;
sinfo->last_res = NULL; *** storeRow(storeInfo *sinfo, PGresult *res *** 1204,1218 if (sinfo->cstrs) pfree(sinfo->cstrs); sinfo->cstrs = (char **) palloc(nfields * sizeof(char *)); - - /* Create short-lived memory context for data conversions */ - if (!sinfo->tmpcontext) - sinfo->tmpcontext = - AllocSetContextCreate(CurrentMemoryContext, - "dblink temporary context", - ALLOCSET_DEFAULT_MINSIZE, - ALLOCSET_DEFAULT_INITSIZE, - ALLOCSET_DEFAULT_MAXSIZE); } /* Should have a single-row result if we get here */ --- 1216,1221
Re: [HACKERS] pg_control is missing a field for LOBLKSIZE
Robert Haas robertmh...@gmail.com writes: On Tue, Jun 17, 2014 at 8:46 PM, Bruce Momjian br...@momjian.us wrote: On Tue, Jun 17, 2014 at 07:12:16PM -0400, Tom Lane wrote: What about comparing to the symbolic value LOBLKSIZE? This would make pg_upgrade assume that the old installation had been tweaked the same as in its own build. This ends up being the same as what you said, ie, effectively no comparison ... but it might be less complicated to code/understand. OK, assume the compiled-in default is the value for an old cluster that has no value --- yeah, I could do that. I'm not really sure why this is better than Bruce's original proposal, though. The net behavior would be the same, but I thought it might be easier to code by thinking of it this way. Or maybe it wouldn't --- it's just a suggestion. regards, tom lane
Re: [HACKERS] [bug fix] Memory leak in dblink
Joe Conway m...@joeconway.com writes: I think the context deletion was missed in the first place because storeRow() is the wrong place to create the context. Rather than creating the context in storeRow() and deleting it two levels up in materializeQueryResult(), I think it should be created and deleted in the interim layer, storeQueryResult(). Patch along those lines attached. Since the storeInfo struct is longer-lived than storeQueryResult(), it'd probably be better if the cleanup looked like + if (sinfo->tmpcontext != NULL) + MemoryContextDelete(sinfo->tmpcontext); + sinfo->tmpcontext = NULL; I find myself a bit suspicious of this whole thing though. If it's necessary to explicitly clean up the tmpcontext, why not also the sinfo->cstrs allocation? And what about the tupdesc, attinmeta, etc created further up in that if (first) block? I'd have expected that all this stuff gets allocated in a context that's short-lived enough that we don't really need to clean it up explicitly. regards, tom lane
Re: [HACKERS] [bug fix] Memory leak in dblink
On 06/18/2014 12:09 PM, Tom Lane wrote: Joe Conway m...@joeconway.com writes: I think the context deletion was missed in the first place because storeRow() is the wrong place to create the context. Rather than creating the context in storeRow() and deleting it two levels up in materializeQueryResult(), I think it should be created and deleted in the interim layer, storeQueryResult(). Patch along those lines attached. Since the storeInfo struct is longer-lived than storeQueryResult(), it'd probably be better if the cleanup looked like + if (sinfo->tmpcontext != NULL) + MemoryContextDelete(sinfo->tmpcontext); + sinfo->tmpcontext = NULL; good point I find myself a bit suspicious of this whole thing though. If it's necessary to explicitly clean up the tmpcontext, why not also the sinfo->cstrs allocation? And what about the tupdesc, attinmeta, etc created further up in that if (first) block? I'd have expected that all this stuff gets allocated in a context that's short-lived enough that we don't really need to clean it up explicitly. Yeah, I thought about that too. My testing showed a small amount of remaining leakage -- like 20 MB on 100,000 iterations -- I wasn't sure that it was worthwhile to worry about. The memory context leak was much larger.
Joe
Re: [HACKERS] pg_control is missing a field for LOBLKSIZE
On Wed, Jun 18, 2014 at 2:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jun 17, 2014 at 8:46 PM, Bruce Momjian br...@momjian.us wrote: On Tue, Jun 17, 2014 at 07:12:16PM -0400, Tom Lane wrote: What about comparing to the symbolic value LOBLKSIZE? This would make pg_upgrade assume that the old installation had been tweaked the same as in its own build. This ends up being the same as what you said, ie, effectively no comparison ... but it might be less complicated to code/understand. OK, assume the compiled-in default is the value for an old cluster that has no value --- yeah, I could do that. I'm not really sure why this is better than Bruce's original proposal, though. The net behavior would be the same, but I thought it might be easier to code by thinking of it this way. Or maybe it wouldn't --- it's just a suggestion. Well, the difference is that if we just don't check it, there can never be an error. Basically, it's the user's job to DTRT. If we check it against some semi-arbitrary value, we'll catch the case where the old cluster was modified with a custom setting and the new one was not - but couldn't we also get false positives under obscure circumstances? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] idle_in_transaction_timeout
On 06/18/2014 11:50 AM, Kevin Grittner wrote: The first thing is that I don't think a delay between the BEGIN and the SELECT should cause a timeout to trigger, but more importantly there should not be two ERROR responses to one SELECT statement. I do think a delay between BEGIN and SELECT should trigger the timeout. There are plenty of badly-written applications which auto-begin, that is, they issue a BEGIN; immediately after every COMMIT; whether or not there's any additional work to do. This is a major source of IIT and the timeout should not ignore it. I'm inclined to abandon the ERROR approach as not worth the effort and fragility, and focus on v1 of the patch. If we can't get to consensus on that, I think that this patch should be flagged Returned with Feedback, noting that any follow-up version requires some way to deal with the issues raised regarding multiple ERROR messages. +1 -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] pg_control is missing a field for LOBLKSIZE
Robert Haas robertmh...@gmail.com writes: On Wed, Jun 18, 2014 at 2:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: The net behavior would be the same, but I thought it might be easier to code by thinking of it this way. Or maybe it wouldn't --- it's just a suggestion. Well, the difference is that if we just don't check it, there can never be an error. Basically, it's the user's job to DTRT. If we check it against some semi-arbitrary value, we'll catch the case where the old cluster was modified with a custom setting and the new one was not - but couldn't we also get false positives under obscure circumstances? Huh? What we'd be checking is the LOBLKSIZE compiled into pg_upgrade versus that stored into pg_control by the new postmaster. If those are different, then pg_upgrade didn't come from the same build as the new postmaster, which is already a pretty hazardous situation (especially if the user is fooling with low-level stuff like this). regards, tom lane
Re: [HACKERS] idle_in_transaction_timeout
Josh Berkus j...@agliodbs.com writes: There are plenty of badly-written applications which auto-begin, that is, they issue a BEGIN; immediately after every COMMIT; whether or not there's any additional work to do. This is a major source of IIT and the timeout should not ignore it. Nonsense. We explicitly don't do anything useful until the first actual command arrives, precisely to avoid that problem. It might be that we should slap such apps' wrists anyway, but given that we've gone to the trouble of working around the behavior at the system structural level, I'd be inclined to say not. What you'd be doing is preventing people who have to deal with such apps from using the timeout in any useful fashion. regards, tom lane
Re: [HACKERS] pg_control is missing a field for LOBLKSIZE
On Wed, Jun 18, 2014 at 3:28 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Jun 18, 2014 at 2:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: The net behavior would be the same, but I thought it might be easier to code by thinking of it this way. Or maybe it wouldn't --- it's just a suggestion. Well, the difference is that if we just don't check it, there can never be an error. Basically, it's the user's job to DTRT. If we check it against some semi-arbitrary value, we'll catch the case where the old cluster was modified with a custom setting and the new one was not - but couldn't we also get false positives under obscure circumstances? Huh? What we'd be checking is the LOBLKSIZE compiled into pg_upgrade versus that stored into pg_control by the new postmaster. If those are different, then pg_upgrade didn't come from the same build as the new postmaster, which is already a pretty hazardous situation (especially if the user is fooling with low-level stuff like this). OK, I agree that checking that wouldn't hurt anything. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Is analyze_new_cluster.sh still useful?
On Wed, Jun 18, 2014 at 10:58 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-06-18 13:54:16 -0400, Tom Lane wrote: I think we're not on the same page. My point is that someone might want to automate the whole sequence: stop old postmaster, run pg_upgrade, start the updated postmaster normally (hence it *is* open for business), kick off the analyze runs. If you're concerned about minimal downtime you would not want to be waiting around for the admin to issue a perfectly predictable series of commands. Oh, yea. Definitely. I think that's what I've seen happen in pretty much *all* usages of pg_upgrade. I think it is a popular way to do it not because it is a particularly good way, but because the better alternatives are not readily available. If your database needs statistics badly enough that you want to do a coarse pre-pass with default_statistics_target=1, why would you want that pass to be done on an open database? Surely you don't want 100 open connections all doing giant seq scans (that should be single-row look up, but without stats they are not) competing with the analyze. Having a database which is open to queries but they have such deranged execution plans that they never actually finish is not truly open, and the attempts to service those futile queries just delay the true opening even further. If you really need a multi pass ANALYZE, you probably need the first pass to be before the database opens because otherwise the open will be a disaster, and the 2nd pass to be after the database opens but before your bulk queries (mass deletes, EOM reports, etc.) kick in. Having both passes be on the same side of the opening seems unlikely to do much good for most use cases. Fortunately it probably doesn't do much harm to most people, either, simply because most databases are not terribly sensitive to the issue.
Cheers, Jeff
Re: [HACKERS] [COMMITTERS] pgsql: Reduce the number of semaphores used under --disable-spinlocks.
On Wed, Jun 18, 2014 at 3:32 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2014-01-08 23:58:16 +0000, Robert Haas wrote: Reduce the number of semaphores used under --disable-spinlocks. Instead of allocating a semaphore from the operating system for every spinlock, allocate a fixed number of semaphores (by default, 1024) from the operating system and multiplex all the spinlocks that get created onto them. This could self-deadlock if a process attempted to acquire more than one spinlock at a time, but since processes aren't supposed to execute anything other than short stretches of straight-line code while holding a spinlock, that shouldn't happen. One motivation for this change is that, with the introduction of dynamic shared memory, it may be desirable to create spinlocks that last for less than the lifetime of the server. Without this change, attempting to use such facilities under --disable-spinlocks would quickly exhaust any supply of available semaphores. Quite apart from that, it's desirable to contain the quantity of semaphores needed to run the server simply on convenience grounds, since using too many may make it harder to get PostgreSQL running on a new platform, which is mostly the point of --disable-spinlocks in the first place. I'm looking at the way you did this in the context of the atomics patch. Won't: s_init_lock_sema(volatile slock_t *lock) { static int counter = 0; *lock = (++counter) % NUM_SPINLOCK_SEMAPHORES; } lead to bad results if spinlocks are initialized after startup? Why? Essentially mapping new spinlocks to the same semaphore? Yeah, but so what? If we're mapping a bajillion spinlocks to the same semaphore already, what's a few more? That's a restriction I can live with, especially as this is only for super old platforms. But it might be worth mentioning somewhere? Dunno. What restriction? I've essentially reimplemented that kind of logic in the atomics patch. Looking to get rid of the duplication...
There I used something like slot = ((uintptr_t) addr_of_atomic >> (sizeof(void*) + 5)) % NUM_LOCKS but I think your method is actually better because it allows spinlocks/atomics to be placed in dsm segments mapped at different locations in individual processes. Right. My current plan to get rid of the duplication is to simply embed the spinlock inside the atomic variable instead of having a separate array of spinlocks protecting atomic variables. Doesn't sound crazy at first glance. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?
Stephen Frost wrote: * Tom Lane (t...@sss.pgh.pa.us) wrote: Stephen Frost sfr...@snowman.net writes: Yes. I'd definitely like to see an ALTER TABLESPACE option, with an ERROR that lists out all of the non-temporary objects which are found (and lists any other databases which have objects in those tablespaces..). That would allow administrators who have existing notionally temporary-only tablespaces to go clean things up to make them actually temporary-only. I would certainly suggest that the first version of the patch not undertake to allow this property to be ALTERed; the cost-benefit ratio isn't there IMO. I suppose scheduling downtime to do the check manually across all databases, then drop and recreate the tablespace, would work. As someone who's working with a couple of these cases, it'd be awful nice if there was a way PG would handle it for me. I wonder if some form of NOT VALID marking could be useful here. Of course, this is not a constraint. But a mechanism of a similar spirit seems apropos. It seems reasonable to leave such a thing for future improvement. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] idle_in_transaction_timeout
On 06/18/2014 12:32 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: There are plenty of badly-written applications which auto-begin, that is, they issue a BEGIN; immediately after every COMMIT; whether or not there's any additional work to do. This is a major source of IIT and the timeout should not ignore it. Nonsense. We explicitly don't do anything useful until the first actual command arrives, precisely to avoid that problem. Oh, we don't allocate a snapshot? If not, then no objection here. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
[HACKERS] Re: [COMMITTERS] pgsql: Reduce the number of semaphores used under --disable-spinlocks.
On 2014-06-18 15:52:49 -0400, Robert Haas wrote: On Wed, Jun 18, 2014 at 3:32 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2014-01-08 23:58:16 +0000, Robert Haas wrote: Reduce the number of semaphores used under --disable-spinlocks. Instead of allocating a semaphore from the operating system for every spinlock, allocate a fixed number of semaphores (by default, 1024) from the operating system and multiplex all the spinlocks that get created onto them. This could self-deadlock if a process attempted to acquire more than one spinlock at a time, but since processes aren't supposed to execute anything other than short stretches of straight-line code while holding a spinlock, that shouldn't happen. One motivation for this change is that, with the introduction of dynamic shared memory, it may be desirable to create spinlocks that last for less than the lifetime of the server. Without this change, attempting to use such facilities under --disable-spinlocks would quickly exhaust any supply of available semaphores. Quite apart from that, it's desirable to contain the quantity of semaphores needed to run the server simply on convenience grounds, since using too many may make it harder to get PostgreSQL running on a new platform, which is mostly the point of --disable-spinlocks in the first place. I'm looking at the way you did this in the context of the atomics patch. Won't: s_init_lock_sema(volatile slock_t *lock) { static int counter = 0; *lock = (++counter) % NUM_SPINLOCK_SEMAPHORES; } lead to bad results if spinlocks are initialized after startup? Why? Because every further process will start with a copy of the postmaster's counter or with 0 (EXEC_BACKEND)? Essentially mapping new spinlocks to the same semaphore? Yeah, but so what? If we're mapping a bajillion spinlocks to the same semaphore already, what's a few more? Well, imagine something like parallel query creating new segments, including a spinlock (possibly via a lwlock) at runtime.
If there were several backends processing such queries, they'd all map to the same semaphore. Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [bug fix] Memory leak in dblink
Joe Conway m...@joeconway.com writes: On 06/18/2014 12:09 PM, Tom Lane wrote: I find myself a bit suspicious of this whole thing though. If it's necessary to explicitly clean up the tmpcontext, why not also the sinfo->cstrs allocation? And what about the tupdesc, attinmeta, etc created further up in that if (first) block? I'd have expected that all this stuff gets allocated in a context that's short-lived enough that we don't really need to clean it up explicitly. Yeah, I thought about that too. My testing showed a small amount of remaining leakage -- like 20 MB on 100,000 iterations -- I wasn't sure that it was worthwhile to worry about. The memory context leak was much larger. [ Thinks for awhile... ] Ah. The memory context is a child of the econtext's ecxt_per_tuple_memory, which is supposed to be short-lived, but we only do MemoryContextReset() on that after each tuple, not MemoryContextResetAndDeleteChildren(). I recall there's been debate about changing that, but it's not something we can risk changing in back branches, for sure. So I concur we need an explicit context delete here. I do see growth in the per-query context as well. I'm not sure where that's coming from, but we probably should try to find out. A couple hundred bytes per iteration is going to add up, even if it's not as fast as 8K per iteration. I'm not sure it's dblink's fault, because I don't see anything in dblink.c that is allocating anything in the per-query context, except for the returned tuplestores, which somebody else should clean up. regards, tom lane
Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?
On Wed, Jun 18, 2014 at 4:53 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Stephen Frost wrote: * Tom Lane (t...@sss.pgh.pa.us) wrote: Stephen Frost sfr...@snowman.net writes: Yes. I'd definitely like to see an ALTER TABLESPACE option, with an ERROR that lists out all of the non-temporary objects which are found (and lists any other databases which have objects in those tablespaces..). That would allow administrators who have existing notionally temporary-only tablespaces to go clean things up to make them actually temporary-only. I would certainly suggest that the first version of the patch not undertake to allow this property to be ALTERed; the cost-benefit ratio isn't there IMO. I suppose scheduling downtime to do the check manually across all databases, then drop and recreate the tablespace, would work. As someone who's working with a couple of these cases, it'd be awful nice if there was a way PG would handle it for me. I wonder if some form of NOT VALID marking could be useful here. Of course, this is not a constraint. But a mechanism of a similar spirit seems apropos. It seems reasonable to leave such a thing for future improvement. +1 Then, to summarize, Matheus must do: * use an option instead of changing the syntax and catalog to indicate that a tablespace is used to store temp objects * throw an exception if we try to ALTER the option only_temp_relations * add regression tests * add documentation And, of course, register to the next open commitfest [1] to get detailed feedback and review. Regards, [1] https://commitfest.postgresql.org/ -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] [COMMITTERS] pgsql: Reduce the number of semaphores used under --disable-spinlocks.
On Wed, Jun 18, 2014 at 3:56 PM, Andres Freund and...@2ndquadrant.com wrote: I'm looking at the way you did this in the context of the atomics patch. Won't:

    s_init_lock_sema(volatile slock_t *lock)
    {
        static int counter = 0;

        *lock = (++counter) % NUM_SPINLOCK_SEMAPHORES;
    }

lead to bad results if spinlocks are initialized after startup?

Why? Because every further process will start with a copy of the postmaster's counter or with 0 (EXEC_BACKEND)?

Oh, true. Maybe we should randomize that.

Essentially mapping new spinlocks to the same semaphore?

Yeah, but so what? If we're mapping a bajillion spinlocks to the same semaphore already, what's a few more?

Well, imagine something like parallel query creating new segments, including a spinlock (possibly via a lwlock), at runtime. If there were several backends processing such queries, then they'd all map to the same semaphore.

Yeah.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] idle_in_transaction_timeout
On Wed, Jun 18, 2014 at 3:53 PM, Josh Berkus j...@agliodbs.com wrote: On 06/18/2014 12:32 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: There are plenty of badly-written applications which auto-begin, that is, they issue a BEGIN; immediately after every COMMIT; whether or not there's any additional work to do. This is a major source of IIT and the timeout should not ignore it.

Nonsense. We explicitly don't do anything useful until the first actual command arrives, precisely to avoid that problem.

Oh, we don't allocate a snapshot? If not, then no objection here.

The only problem I see is that it makes the semantics kind of weird and confusing. Kill connections that are idle in transaction for too long is a pretty clear spec; kill connections that are idle in transaction except if they haven't executed any commands yet because we think you don't care about that case is not quite as clear, and not really what the GUC name says, and maybe not what everybody wants, and maybe masterminding.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Atomics hardware support table supported architectures
On Wed, Jun 18, 2014 at 12:13 PM, Andres Freund and...@2ndquadrant.com wrote: There might be cases where that's true, but in the majority of cases where the variable isn't highly contended it's just about the same. In many cases the microcode will implement atomic ops using ll/sc operations internally anyway.

But if the variable isn't highly contended, then why are we messing around with atomic ops in the first place?

I'm fine with always implementing everything but tas, cas, and add on top of cas for now. I want or/and/sub/... to be conveniently available to C code, but they don't need to be based on hardware ops. Having open-coded versions of the above in several places sounds like a horrible idea to me, both because we might want to optimize it at some point and because it's horrible readability-wise.

OK, so if we're only going to have TAS, CAS, and fetch-and-add as primitives (which sounds sensible to me) and implement the rest in terms of those for now, then the table on the wiki only needs one more column: information about support for fetch-and-add.

More than that, I actually really hate things that don't have a configure option, like WAL_DEBUG, because you have to change a checked-in file, which shows up as a diff, and if you're not careful you check it in, and if you are careful it still gets blown away every time you git reset --hard, which I do a lot. I think the fact that both Heikki and I on separate occasions have made commits enabling WAL_DEBUG shows pretty clearly the weaknesses of that method of doing business.

Why don't you pass it to configure or add it in Makefile.custom? That's what I do.

Yeah, I should probably do that instead. But I still think the point that two different committers have managed to flip that flag by accident is telling.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Atomics hardware support table supported architectures
On 2014-06-18 17:04:49 -0400, Robert Haas wrote: On Wed, Jun 18, 2014 at 12:13 PM, Andres Freund and...@2ndquadrant.com wrote: There might be cases where that's true, but in the majority of cases where the variable isn't highly contended it's just about the same. In many cases the microcode will implement atomic ops using ll/sc operations internally anyway.

But if the variable isn't highly contended, then why are we messing around with atomic ops in the first place?

a) Quite often the strange atomic op is only needed to allow a more performance-critical code part to use atomic ops (e.g. setting flags atomically is only required to make atomic pins work).

b) A spinlock-protected region is often more likely to take longer than a single atomic op because it'll often touch more cachelines and such. For such cmpxchg loops there are pretty much no intermediary instructions in between during which the cacheline could be stolen by another core.

c) The overall amount of bus traffic is often noticeably lower with an atomic op because the average total number of locked instructions is lower (unlocking often enough requires another atomic op or barrier).

Why don't you pass it to configure or add it in Makefile.custom? That's what I do.

Yeah, I should probably do that instead. But I still think the point that two different committers have managed to flip that flag by accident is telling.

Not arguing against having the configure flag here (already decided), just wondering whether your life could be made easier :)

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services