Re: [HACKERS] Escaping from blocked send() reprised.
Hmm. Sorry for my stupidity.

Why is that necessary? It seems really rather wrong to make BIO_set_retry_write() dependent on ProcDiePending? Especially as, at least in my testing, it's not even required because be_tls_write() can just check the error properly?

I mistook the previous conversation as saying it doesn't work as expected. I have confirmed that it works fine; after all, it works as I expected. The parameter for ProcessClientWriteInterrupt() looks somewhat awkward, but patch 4 looks fine as a whole. Is there anything in the patch that worries you?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On Fri, Oct 10, 2014 at 1:08 AM, Andres Freund and...@2ndquadrant.com wrote:

On 2014-10-09 16:01:55 +0200, Andres Freund wrote: I don't think OLTP really is the best test case for this. Especially not pgbench with relatively small rows *and* a uniform distribution of access. Try parallel COPY TO. Batch write loads are where I've seen this hurt badly.

just by switching shared_buffers from 1 to 8GB. I haven't tried, but I hope that with an approach like yours this might become better.

psql -f /tmp/prepare.sql
pgbench -P5 -n -f /tmp/copy.sql -c 8 -j 8 -T 100

Thanks for providing the scripts. You haven't specified how much data is present in the 'large' file used in the tests. I have tried with different sets of rows, but I could not see the dip that is present in your data when you increased shared buffers from 1GB to 8GB; I also don't see any difference with the patch. BTW, why do you think this patch can be helpful for such workloads? According to my understanding, it can be helpful mainly for read-mostly workloads when all the data doesn't fit in shared buffers.
Performance Data
----------------
IBM POWER-8: 24 cores, 192 hardware threads, RAM = 492GB

For 50 rows, data was populated using the statements below:

insert into largedata_64 values ('aa', generate_series(1,50));
copy largedata_64 to '/tmp/large' binary;

pgbench -P5 -n -f /tmp/copy.sql -c 8 -j 8 -T 100

shared_buffers - 1GB
--------------------
progress: 7.0 s, 2.7 tps, lat 2326.645 ms stddev 173.506
progress: 11.5 s, 3.5 tps, lat 2295.577 ms stddev 78.949
progress: 15.8 s, 3.7 tps, lat 2298.217 ms stddev 223.346
progress: 20.4 s, 3.5 tps, lat 2350.187 ms stddev 192.312
progress: 25.1 s, 3.4 tps, lat 2280.206 ms stddev 54.580
progress: 31.9 s, 3.4 tps, lat 2408.593 ms stddev 243.230
progress: 45.2 s, 1.1 tps, lat 5120.151 ms stddev 3913.561
progress: 50.5 s, 1.3 tps, lat 8967.954 ms stddev 3384.229
progress: 52.7 s, 2.7 tps, lat 3883.788 ms stddev 1733.293
progress: 55.6 s, 3.2 tps, lat 2684.282 ms stddev 348.615
progress: 58.2 s, 3.4 tps, lat 2602.355 ms stddev 268.718
progress: 60.8 s, 3.1 tps, lat 2361.937 ms stddev 302.643
progress: 65.3 s, 3.5 tps, lat 2341.903 ms stddev 162.338
progress: 74.1 s, 2.6 tps, lat 2720.182 ms stddev 716.425
progress: 76.4 s, 3.5 tps, lat 3023.234 ms stddev 670.473
progress: 80.4 s, 2.0 tps, lat 2795.323 ms stddev 820.429
progress: 85.6 s, 1.9 tps, lat 4756.217 ms stddev 844.284
progress: 91.9 s, 2.2 tps, lat 3996.001 ms stddev 1301.143
progress: 96.6 s, 3.5 tps, lat 2284.419 ms stddev 85.013
progress: 101.1 s, 3.5 tps, lat 2282.848 ms stddev 71.388
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 8
duration: 100 s
number of transactions actually processed: 275
latency average: 2939.784 ms
latency stddev: 1739.974 ms
tps = 2.710138 (including connections establishing)
tps = 2.710208 (excluding connections establishing)

shared_buffers - 8GB
--------------------
progress: 6.7 s, 2.7 tps, lat 2349.816 ms stddev 212.889
progress: 11.0 s, 3.5 tps, lat 2257.364 ms stddev 141.148
progress: 15.2 s, 3.8 tps, lat 2209.669 ms stddev 127.101
progress: 21.7 s, 3.7 tps, lat 2159.838 ms stddev 92.205
progress: 25.8 s, 3.9 tps, lat 2221.072 ms stddev 283.362
progress: 30.1 s, 3.5 tps, lat 2179.611 ms stddev 152.741
progress: 39.3 s, 2.1 tps, lat 2768.609 ms stddev 1265.508
progress: 50.9 s, 1.1 tps, lat 9361.388 ms stddev 2657.885
progress: 52.9 s, 1.0 tps, lat 2036.098 ms stddev 3.599
progress: 55.2 s, 4.3 tps, lat 2167.688 ms stddev 91.183
progress: 57.6 s, 3.0 tps, lat 2399.219 ms stddev 173.535
progress: 60.2 s, 4.1 tps, lat 2427.273 ms stddev 198.698
progress: 65.2 s, 3.4 tps, lat 2441.630 ms stddev 123.773
progress: 72.4 s, 2.9 tps, lat 2534.631 ms stddev 254.162
progress: 75.0 s, 3.9 tps, lat 2468.266 ms stddev 221.969
progress: 82.3 s, 3.0 tps, lat 2548.690 ms stddev 404.852
progress: 86.7 s, 1.4 tps, lat 3980.576 ms stddev 1205.743
progress: 92.5 s, 1.4 tps, lat 5174.340 ms stddev 643.415
progress: 97.1 s, 3.7 tps, lat 3252.847 ms stddev 1689.268
progress: 101.8 s, 3.4 tps, lat 2346.690 ms stddev 138.251
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 8
duration: 100 s
number of transactions actually processed: 284
latency average: 2856.195 ms
latency stddev: 1740.699 ms
tps = 2.781603 (including connections establishing)
tps = 2.781682 (excluding connections establishing)

For 5000 rows

shared_buffers - 1GB
--------------------
progress: 5.0 s, 357.7 tps, lat 22.295 ms stddev 3.511
progress: 10.0 s, 339.0 tps, lat 23.606 ms stddev 4.388
progress: 15.0 s, 323.4 tps, lat 24.733 ms stddev 5.001
progress: 20.0 s, 329.6 tps, lat 24.258 ms stddev 4.407
progress: 25.0 s, 334.3 tps, lat 23.963 ms stddev 4.126
progress: 30.0 s, 337.5 tps, lat 23.699 ms
Re: [HACKERS] Obsolete reference to _bt_tuplecompare() within tuplesort.c
On 10/10/2014 02:04 AM, Peter Geoghegan wrote:

I found a reference made obsolete by commit 9e85183b, which is from way back in 2000. comparetup_index_btree() says:

/*
 * This is similar to _bt_tuplecompare(), but we have already done the
 * index_getattr calls for the first column, and we need to keep track of
 * whether any null fields are present.  Also see the special treatment
 * for equal keys at the end.
 */

I think that this comment should simply indicate that the routine is similar to comparetup_heap(), except that it takes care of the special tie-break stuff for B-Tree sorts, as well as enforcing uniqueness during unique index builds. It should not reference _bt_compare() at all (which is arguably the successor to _bt_tuplecompare()), since _bt_compare() is concerned with a bunch of stuff highly specific to the B-Tree implementation (e.g. having a hard-wired return value for comparisons involving the first data item on an internal page).

Yeah. Want to write that into a patch?

- Heikki
Re: [HACKERS] Obsolete reference to _bt_tuplecompare() within tuplesort.c
On Thu, Oct 9, 2014 at 11:58 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Yeah. Want to write that into a patch?

Attached.

-- 
Peter Geoghegan

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
new file mode 100644
index 8e57505..2d15da0
*** a/src/backend/utils/sort/tuplesort.c
--- b/src/backend/utils/sort/tuplesort.c
*************** comparetup_index_btree(const SortTuple *
*** 3176,3184 ****
  				  Tuplesortstate *state)
  {
  	/*
! 	 * This is similar to _bt_tuplecompare(), but we have already done the
! 	 * index_getattr calls for the first column, and we need to keep track of
! 	 * whether any null fields are present.  Also see the special treatment
  	 * for equal keys at the end.
  	 */
  	ScanKey		scanKey = state->indexScanKey;
--- 3176,3183 ----
  				  Tuplesortstate *state)
  {
  	/*
! 	 * This is similar to comparetup_heap(), but expects index tuples.  There
! 	 * is also special handling for enforcing uniqueness, and special treatment
  	 * for equal keys at the end.
  	 */
  	ScanKey		scanKey = state->indexScanKey;
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
Hi,

On 2014-10-10 10:13:03 +0530, Amit Kapila wrote:

I have done a few performance tests for the above patches, and the results are below:

Cool, thanks.

Performance Data
----------------
IBM POWER-7: 16 cores, 64 hardware threads, RAM = 64GB
max_connections = 210
Database Locale = C
checkpoint_segments = 256
checkpoint_timeout = 35min
shared_buffers = 8GB
Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8)
Duration of each individual run = 5 mins
Test type - read-only pgbench with -M prepared

Other related information about the tests:
a. This is the data for the median of 3 runs; the detailed data of the individual runs is attached to the mail.
b. I have applied both patches to take the performance data.

Scale Factor - 100

Patch_ver/Client_count      1       8      16      32      64     128
HEAD                    13344  106921  196629  295123  377846  333928
PATCH                   13662  106179  203960  298955  452638  465671

Scale Factor - 3000

Patch_ver/Client_count      8      16      32      64     128     160
HEAD                    86920  152417  231668  280827  257093  255122
PATCH                   87552  160313  230677  276186  248609  244372

Observations
------------
a. The patch performs really well (an increase of up to ~40%) in case all the data fits in shared buffers (scale factor 100).
b. In case the data doesn't fit in shared buffers but fits in RAM (scale factor 3000), there is a performance increase up to a client count of 16; after that it starts dipping (in the above config, up to ~4.4%).

Hm. Interesting. I don't see that dip on x86.

The above data shows that the patch improves performance for cases where there is shared LWLock contention; however, there is a slight performance dip in the case of exclusive LWLocks (at scale factor 3000, it needs exclusive LWLocks for the buffer mapping tables). Now I am not sure if this is the worst-case dip, or whether under similar configurations the dip can be higher, because the trend shows that the dip increases with higher client counts.

Brief analysis of the code w.r.t. the performance dip: the extra instructions w.r.t. HEAD in the exclusive-lock acquire path are
a. attempt the lock twice;
b. atomic operations for nwaiters in LWLockQueueSelf() and LWLockAcquireCommon();
c. we now need to take the spinlock twice, once for queueing ourselves and then again for setting releaseOK;
d. a few function calls and some extra checks.

Hm. I can't really see the number of atomics itself mattering; a spinning lock will do many more atomic ops than this. But I wonder whether we could get rid of the releaseOK lock. That should be quite possible.

Now, these probably shouldn't matter much in case a backend needs to wait for another exclusive locker, but I am not sure what else could be the reason for the dip in case we need exclusive LWLocks.

Any chance to get a profile?

Greetings,

Andres Freund

-- 
Andres Freund	                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
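The "attempt the lock twice" pattern discussed in this exchange can be illustrated with a toy model. This is a hypothetical, single-threaded Python sketch, not the patch's C code: cas_acquire stands in for an atomic compare-and-swap, sleep for blocking on the process semaphore, and the between_attempts hook merely simulates a concurrent release by another backend. It shows why the second attempt exists: a release that lands just after we queue ourselves would otherwise be missed until the next wakeup.

```python
class ToyLWLock:
    """Toy, single-threaded model of a queued exclusive-lock acquire.

    Illustration only: 'cas_acquire' pretends to be an atomic
    test-and-set, and 'sleep' pretends to block on a semaphore.
    """

    def __init__(self):
        self.locked = False
        self.nwaiters = 0           # maintained with atomic ops in the patch

    def cas_acquire(self):
        """Pretend-atomic test-and-set of the lock bit."""
        if not self.locked:
            self.locked = True
            return True
        return False

    def acquire_exclusive(self, sleep, between_attempts=lambda: None):
        if self.cas_acquire():      # attempt 1: uncontended fast path
            return "fast"
        self.nwaiters += 1          # queue ourselves (cf. LWLockQueueSelf())
        between_attempts()          # window where the holder may release
        if self.cas_acquire():      # attempt 2: catch a release that
            self.nwaiters -= 1      # happened while we were queueing
            return "second-try"
        sleep()                     # block until woken by the releaser
        self.nwaiters -= 1          # (the real code then re-checks; omitted)
        return "slept"

lock = ToyLWLock()
assert lock.acquire_exclusive(sleep=lambda: None) == "fast"
```

In the real patch the queueing step and the nwaiters bookkeeping are done with atomics and a spinlock, which is exactly the extra-instruction overhead items (a) through (d) describe.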
[HACKERS] alter user/role CURRENT_USER
Hello, while considering ALTER ROLE SET ..., I found that ALTER ROLE/USER cannot take CURRENT_USER as the role name. In addition to that, the documents for ALTER ROLE and ALTER USER are inconsistent with each other, in spite of ALTER USER being said to be an alias for ALTER ROLE. Plus, ALTER USER cannot take ALL as the user name although ALTER ROLE can.

This patch does the following things:

- ALTER USER/ROLE now takes CURRENT_USER as the user name.
- Rewrite the synopses of the documents for ALTER USER and ALTER ROLE so that they have exactly the same syntax.
- Modify the syntax of ALTER USER so as to be an alias of ALTER ROLE.
- Added CURRENT_USER/CURRENT_ROLE syntax to both.
- Added ALL syntax as the user name to ALTER USER.
- Added IN DATABASE syntax to ALTER USER.

x Integrating the ALTER ROLE/USER syntax could not be done because of a shift/reduce error in bison.
x This patch contains no additional regression tests. Are they needed?

SESSION_USER/USER could also be made usable for this command, but this patch doesn't do so (yet).

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

From d12f479de845f55f77096e79fea69930bd665416 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi horiguchi.kyot...@lab.ntt.co.jp
Date: Tue, 9 Sep 2014 19:26:33 +0900
Subject: [PATCH 2/2] ALTER ROLE CURRENT_USER document

---
 doc/src/sgml/ref/alter_role.sgml | 15 ---
 doc/src/sgml/ref/alter_user.sgml | 13 +++--
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/ref/alter_role.sgml b/doc/src/sgml/ref/alter_role.sgml
index 0471daa..e6f8093 100644
--- a/doc/src/sgml/ref/alter_role.sgml
+++ b/doc/src/sgml/ref/alter_role.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
 <refsynopsisdiv>
 <synopsis>
-ALTER ROLE <replaceable class="PARAMETER">name</replaceable> [ [ WITH ] <replaceable class="PARAMETER">option</replaceable> [ ... ] ]
+ALTER ROLE { <replaceable class="parameter">name</replaceable> | CURRENT_USER } [ [ WITH ] <replaceable class="PARAMETER">option</replaceable> [ ... ] ]
 
 <phrase>where <replaceable class="PARAMETER">option</replaceable> can be:</phrase>
 
@@ -37,12 +37,12 @@ ALTER ROLE <replaceable class="PARAMETER">name</replaceable> [ [ WITH ] <replace
     | [ ENCRYPTED | UNENCRYPTED ] PASSWORD '<replaceable class="PARAMETER">password</replaceable>'
     | VALID UNTIL '<replaceable class="PARAMETER">timestamp</replaceable>'
 
-ALTER ROLE <replaceable class="PARAMETER">name</replaceable> RENAME TO <replaceable>new_name</replaceable>
+ALTER ROLE <replaceable class="parameter">name</replaceable> RENAME TO <replaceable>new_name</replaceable>
 
-ALTER ROLE <replaceable class="PARAMETER">name</replaceable> [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] SET <replaceable>configuration_parameter</replaceable> { TO | = } { <replaceable>value</replaceable> | DEFAULT }
-ALTER ROLE { <replaceable class="PARAMETER">name</replaceable> | ALL } [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] SET <replaceable>configuration_parameter</replaceable> FROM CURRENT
-ALTER ROLE { <replaceable class="PARAMETER">name</replaceable> | ALL } [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] RESET <replaceable>configuration_parameter</replaceable>
-ALTER ROLE { <replaceable class="PARAMETER">name</replaceable> | ALL } [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] RESET ALL
+ALTER ROLE { <replaceable class="parameter">name</replaceable> | CURRENT_USER | ALL } [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] SET <replaceable>configuration_parameter</replaceable> { TO | = } { <replaceable>value</replaceable> | DEFAULT }
+ALTER ROLE { <replaceable class="PARAMETER">name</replaceable> | CURRENT_USER | ALL } [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] SET <replaceable>configuration_parameter</replaceable> FROM CURRENT
+ALTER ROLE { <replaceable class="PARAMETER">name</replaceable> | CURRENT_USER | ALL } [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] RESET <replaceable>configuration_parameter</replaceable>
+ALTER ROLE { <replaceable class="PARAMETER">name</replaceable> | CURRENT_USER | ALL } [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] RESET ALL
 </synopsis>
 </refsynopsisdiv>
 
@@ -123,7 +123,8 @@ ALTER ROLE { <replaceable class="PARAMETER">name</replaceable> | ALL } [ IN DATA
      <term><replaceable class="PARAMETER">name</replaceable></term>
      <listitem>
       <para>
-        The name of the role whose attributes are to be altered.
+        The name of the role whose attributes are to be
+        altered. <literal>CURRENT_USER</> matches the name of the current user.
       </para>
      </listitem>
     </varlistentry>
diff --git a/doc/src/sgml/ref/alter_user.sgml b/doc/src/sgml/ref/alter_user.sgml
index 58ae1da..feb1197 100644
--- a/doc/src/sgml/ref/alter_user.sgml
+++ b/doc/src/sgml/ref/alter_user.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
 <refsynopsisdiv>
 <synopsis>
-ALTER USER <replaceable class="PARAMETER">name</replaceable> [ [ WITH ] <replaceable class="PARAMETER">option</replaceable> [ ... ] ]
+ALTER USER {
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
On Thu, Oct 9, 2014 at 6:42 PM, furu...@pm.nttdata.co.jp wrote:

If we remove --fsync-interval, response time to the user will not be delayed, although fsync will be executed multiple times in a short period. And there is no way to solve that problem without --fsync-interval; what should we do about it?

I'm sorry, I didn't understand that.

Here is an example. When WAL is sent at 100ms intervals, fsync() is executed 10 times per second. If --fsync-interval is set to 1 second, we have to wait up to 1 second for the SQL response (for the kind of SQL that makes a WAL record), but fsync() won't be executed several times within that second. I think --fsync-interval meets the demands of people who want to keep fsync() from happening several times in a short period, but what do you think? Is it OK to delete --fsync-interval?

I still don't see the problem. In synchronous mode, pg_receivexlog should have similar logic to walreceiver's: it should read as much WAL from the socket as it can without blocking, and fsync() and send a reply after that. It should also fsync whenever switching to a new segment. If the master sends WAL every 100ms, then pg_receivexlog will indeed fsync() 10 times per second. There's nothing wrong with that. In asynchronous mode, only fsync when switching to a new segment.

Yeah. Or rather, add a new message type, to indicate the synchronous/asynchronous status.

What kind of message type do we have to add? Do we need to separate 'w' into two types, synchronous and asynchronous? Or add a new message type, a kind of 'notify synchronous', and inform pg_receivexlog of the synchronous status when it connects to the server?

Better to add a new notify message type, and pg_receivexlog should be prepared to receive it at any time. The status might change on the fly, if the server's configuration is reloaded.

Thanks for the reply.

In synchronous mode, pg_receivexlog should have similar logic as walreceiver does.

OK. I understand that removing --fsync-interval is not a problem.
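The drain-then-sync loop suggested for synchronous mode can be sketched as follows. This is a hypothetical Python model, not pg_receivexlog's actual C code: the names receive_loop and send_reply are invented for illustration, 'incoming' models WAL chunks already readable without blocking, and segment switches and timeouts are ignored.

```python
from collections import deque

def receive_loop(incoming, fsync, send_reply):
    """One iteration of a drain-then-sync receive loop.

    Every chunk already available is written first; only then is a
    single fsync() issued and a single status reply sent, so a burst
    of ten chunks arriving together costs one fsync, not ten.
    """
    wrote = False
    while incoming:                # read all WAL available without blocking
        chunk = incoming.popleft()
        # ... write 'chunk' to the current WAL segment file ...
        wrote = True
    if wrote:
        fsync()                    # one fsync per batch, not per chunk
        send_reply()               # report the new flush position upstream

calls = {"fsync": 0, "reply": 0}
pending = deque("wal-chunk-%d" % i for i in range(10))
receive_loop(pending,
             lambda: calls.__setitem__("fsync", calls["fsync"] + 1),
             lambda: calls.__setitem__("reply", calls["reply"] + 1))
assert calls == {"fsync": 1, "reply": 1}   # ten chunks, one fsync
```

The same shape extends naturally to the asynchronous mode described here, where the fsync() would instead be issued only on a segment switch.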
+1 for adding something like a --synchronous option.

To me, it sounds like a walreceiver-compatible mode rather than synchronous.

Better to add a new notify message type. And pg_receivexlog should be prepared to receive it at any time. The status might change on the fly, if the server's configuration is reloaded.

OK. I'll consider it.

I don't think that's a good idea because it prevents us from using pg_receivexlog as an async walreceiver (i.e., received WAL data is fsynced and feedback is sent back to the server immediately, but transaction commit in the server doesn't wait for the feedback).

Regards,

-- 
Fujii Masao
Re: [HACKERS] alter user set local_preload_libraries.
Hello, I overlooked this thread.

On Mon, Sep 15, 2014 at 1:33 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Peter Eisentraut pete...@gmx.net writes: On 9/1/14 7:51 AM, Kyotaro HORIGUCHI wrote: The attached patch simply changes the context for local_... to PGC_USERSET and edits the doc.

I had this ready to commit, but then "Invent PGC_SU_BACKEND and mark log_connections/log_disconnections that way." was committed in the meantime. Does this affect what we should do with this change? I guess one thing to look into would be whether we could leave local_preload_libraries as PGC_BACKEND and change session_preload_libraries to PGC_SU_BACKEND, and then investigate whether we could allow settings made with ALTER ROLE / SET to change PGC_BACKEND settings.

Yeah, I was wondering about that while I was making the other commit. I did not touch those variables at the time, but it would make sense to restrict them as you suggest.

+1

Also I think that it's useful to allow ALTER ROLE/DATABASE SET to set PGC_BACKEND and PGC_SU_BACKEND parameters. So, what about applying the attached patch? This patch allows that and changes the context of session_preload_libraries to PGC_SU_BACKEND.

It's not my business to decide whether to apply it, but I don't see any obvious problem in it so far.

By the way, I became unable to log in at all after wrongly setting *_preload_libraries for all available users. Are there any relief measures for this situation? I think it's okay even if there's no way to log in again, but I want to know if there are any.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
[HACKERS] Column Redaction
Postgres currently supports column level SELECT privileges.

1. If we want to confirm a credit card number, we can issue SELECT 1 FROM customer WHERE stored_card_number = '1234 5678 5344 7733'
2. If we want to look for card fraud, we need to be able to use the full card number to join to transaction data and look up blocked card lists etc..
3. We want to block the direct retrieval of card numbers for additional security. In some cases, we might want to return an answer like ' * 7733'

We can't do all of the above with current facilities inside the database. The ability to mask output for data in certain cases, for the purpose of security, is known lately as data redaction, or column-level data redaction.

The best way to support this requirement would be to allow columns to have an additional output formatting function. This would be executed only when data is about to be returned by a query; all other uses of the data would not be restricted. This would have other uses as well, such as default report formats, so we can store financial amounts as NUMERIC, but format them on retrieval as $12,345.78 etc..

Suggested user interface would be...

FORMAT functionname(parameters, if any)

e.g.
CREATE TABLE customer
( id ...
 ...
, stored_card_number  NUMERIC FORMAT pci_card_number_redaction()
 ...
);

We'd need to implement something to allow pg_dump to ignore format functions. I suggest the best way to do that is by providing a BACKUP role that can be delegated to other users. We would then allow a parameter, SET output_formatting = on | off, which can only be set by a superuser or the BACKUP role, and have pg_dump issue SET output_formatting = off explicitly when it runs.

Do we want redaction in PostgreSQL? Do we want it generalised into output format functions?
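To make the proposal concrete, here is the sort of masking such a format function might perform. This is a hypothetical Python sketch: the FORMAT clause does not exist in PostgreSQL, the function name is taken from the CREATE TABLE example above, and the exact masking policy shown is an assumption, not part of the proposal.

```python
def pci_card_number_redaction(value):
    """Mask all but the last four digits of a stored card number.

    Illustrative policy only: keep the final four digits, replace
    everything else with asterisks grouped like a card number.
    """
    digits = "".join(ch for ch in str(value) if ch.isdigit())
    return "**** **** **** " + digits[-4:]

assert pci_card_number_redaction("1234 5678 5344 7733") == "**** **** **** 7733"
```

Under the proposal this transformation would run only at output time; WHERE-clause comparisons and joins would still see the raw NUMERIC value.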
-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Column Redaction
On 10/10/2014 11:57 AM, Simon Riggs wrote:

Postgres currently supports column level SELECT privileges.

1. If we want to confirm a credit card number, we can issue SELECT 1 FROM customer WHERE stored_card_number = '1234 5678 5344 7733'
2. If we want to look for card fraud, we need to be able to use the full card number to join to transaction data and look up blocked card lists etc..
3. We want to block the direct retrieval of card numbers for additional security. In some cases, we might want to return an answer like ' * 7733'

We can't do all of the above with current facilities inside the database.

Deny access to the underlying tables. Write SQL functions to do 1. and 2., and grant privileges to the functions, instead. For 3. create views that do the redaction.

The ability to mask output for data in certain cases, for the purpose of security, is known lately as data redaction, or column-level data redaction. The best way to support this requirement would be to allow columns to have an additional output formatting function. This would be executed only when data is about to be returned by a query. All other uses of that would not restrict the data.

I don't see how that could work. Once you have access to the datum, you can find its value in many indirect ways, without invoking the output function. For example, write a PL/pgSQL function that takes the card number as argument. Use < and > to binary search its value. If you block < and >, I'm sure there are countless other ways. And messing with output functions seems pretty, well, messy, in general.

I think the only solution that's going to work in practice is to implement the redaction at a higher level. Don't allow direct access to the tables with card numbers. Create functions that do whatever joins, etc. you need to do with them, and grant privileges to only the functions.
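The binary-search leak Heikki describes is easy to demonstrate. The sketch below is a hypothetical Python model: is_less_than stands in for a comparison probe the attacker can still run against the redacted column, since output-only redaction does not restrict comparisons.

```python
def leak_by_binary_search(is_less_than, lo=0, hi=10**16):
    """Recover a hidden integer using only '<' comparisons.

    'is_less_than(x)' models a probe such as
        SELECT 1 FROM customer WHERE stored_card_number < x
    which a redaction layer that only masks output cannot block.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if is_less_than(mid + 1):   # true exactly when hidden <= mid
            hi = mid
        else:
            lo = mid + 1
    return lo

# A 16-digit card number "protected" by output redaction:
secret = 1234567853447733
assert leak_by_binary_search(lambda x: secret < x) == secret
```

About 54 probes suffice for any 16-digit number, which is the kind of indirect extraction an output-time hook cannot prevent.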
- Heikki
Re: [HACKERS] Column Redaction
This would have other uses as well, such as default report formats, so we can store financial amounts as NUMERIC, but format them on retrieval as $12,345.78 etc..

Nice idea, but what if you need to do further calculations? If you output the value of credit card transactions it works fine, but in case you want to SUM up the values, you need to cast it back from text(?) to numeric, calculate it, and cast it to text(?) again. And if you do, for any reason, need the credit card number in your application (for example, sending it to the credit card company to deduct money), how can you retrieve its original value? Moreover, what if you SELECT from a sub-SELECT which already has the formatted information and not the plain data?

Maybe you should restrict access to tables for a certain user and only allow the user to use a view which formats the output. Modern applications have a presentation layer which should take care of data formatting; I am not sure it is a good idea to mix data storage and data presentation in the database.

Regards,
Damian Wolgast (irc:asymetrixs)
Re: [HACKERS] alter user set local_preload_libraries.
On Fri, Oct 10, 2014 at 5:36 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote:

Hello, I overlooked this thread.

On Mon, Sep 15, 2014 at 1:33 AM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Eisentraut pete...@gmx.net writes: On 9/1/14 7:51 AM, Kyotaro HORIGUCHI wrote: The attached patch simply changes the context for local_... to PGC_USERSET and edits the doc. I had this ready to commit, but then "Invent PGC_SU_BACKEND and mark log_connections/log_disconnections that way." was committed in the meantime. Does this affect what we should do with this change? I guess one thing to look into would be whether we could leave local_preload_libraries as PGC_BACKEND and change session_preload_libraries to PGC_SU_BACKEND, and then investigate whether we could allow settings made with ALTER ROLE / SET to change PGC_BACKEND settings. Yeah, I was wondering about that while I was making the other commit. I did not touch those variables at the time, but it would make sense to restrict them as you suggest.

+1

Also I think that it's useful to allow ALTER ROLE/DATABASE SET to set PGC_BACKEND and PGC_SU_BACKEND parameters. So, what about applying the attached patch? This patch allows that and changes the context of session_preload_libraries to PGC_SU_BACKEND.

It's not my business to decide whether to apply it, but I don't see any obvious problem in it so far.

By the way, I became unable to log in at all after wrongly setting *_preload_libraries for all available users. Are there any relief measures for this situation? I think it's okay even if there's no way to log in again, but I want to know if there are any.

Yep, that's a problem. You can still log in to the server even in that case by, for example, executing the following commands:

$ export PGOPTIONS="-c *_preload_libraries="
$ psql

Regards,

-- 
Fujii Masao
Re: [HACKERS] Column Redaction
On 10 October 2014 10:29, Heikki Linnakangas hlinnakan...@vmware.com wrote:

On 10/10/2014 11:57 AM, Simon Riggs wrote: Postgres currently supports column level SELECT privileges. 1. If we want to confirm a credit card number, we can issue SELECT 1 FROM customer WHERE stored_card_number = '1234 5678 5344 7733' 2. If we want to look for card fraud, we need to be able to use the full card number to join to transaction data and look up blocked card lists etc.. 3. We want to block the direct retrieval of card numbers for additional security. In some cases, we might want to return an answer like ' * 7733' We can't do all of the above with current facilities inside the database.

Deny access to the underlying tables. Write SQL functions to do 1. and 2., and grant privileges to the functions, instead. For 3. create views that do the redaction.

If everything were easy to lock down, the approach you suggest would of course be the best way. The problem there is that the SQL for (2) changes frequently, so we want to give people SQL access, just not the ability to retrieve data in a usable form.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Column Redaction
On 10 October 2014 10:15, Thom Brown t...@linux.com wrote:

On 10 October 2014 09:57, Simon Riggs si...@2ndquadrant.com wrote: Postgres currently supports column level SELECT privileges. 1. If we want to confirm a credit card number, we can issue SELECT 1 FROM customer WHERE stored_card_number = '1234 5678 5344 7733' 2. If we want to look for card fraud, we need to be able to use the full card number to join to transaction data and look up blocked card lists etc.. 3. We want to block the direct retrieval of card numbers for additional security. In some cases, we might want to return an answer like ' * 7733'

One question that immediately springs to mind is: would the format apply when passing columns to other functions? If not, wouldn't something like SELECT upper(redacted_column::text) ... just bypass the formatting?

Yes, it would. As would SELECT redacted_column || ' '. I'm not sure how to block such usage, other than to apply it prior to the final calculation of functions, i.e. we apply it in the SELECT clause, but not in the other clauses: FROM ON/WHERE/GROUP/ORDER/HAVING etc..

Also, how would casting be handled? Would it be forbidden for such cases? And couldn't the card number be worked out using:

SELECT 1 FROM customer WHERE stored_card_number LIKE '%1 7733';
 ?column?
----------
(0 rows)

SELECT 1 FROM customer WHERE stored_card_number LIKE '%2 7733';
 ?column?
----------
        1
(1 row)

SELECT 1 FROM customer WHERE stored_card_number LIKE '%12 7733';
 ?column?
----------
(0 rows)

... and so on, which could be scripted in a DO statement? Not so much a challenge to the idea, but just wishing to understand how it would work.

Yes, covert channels would always exist. It would really be down to auditing to control such exploits. Redaction is aimed at minimising access in normal usage.
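Thom's probe sequence can indeed be scripted end to end. The following is a hypothetical Python model of it: matches stands in for issuing the LIKE query, and the database side is simulated here with a plain suffix check.

```python
def leak_by_like(matches, length, alphabet="0123456789"):
    """Recover a hidden string right to left using only LIKE probes.

    'matches(pattern)' models
        SELECT 1 FROM customer WHERE stored_card_number LIKE pattern
    returning True when a row matches: the same probes as
    '%1 7733', '%2 7733', ... above, just automated.
    """
    known = ""
    while len(known) < length:
        for ch in alphabet:
            if matches("%" + ch + known):   # extend the known suffix by one
                known = ch + known
                break
        else:
            raise RuntimeError("no character matched; widen the alphabet")
    return known

# Simulate the database side: LIKE '%<suffix>' is a suffix test.
secret = "1234567853447733"
probe = lambda pattern: secret.endswith(pattern.lstrip("%"))
assert leak_by_like(probe, len(secret)) == secret
```

Sixteen digits fall in at most 160 probes with this alphabet, which supports the point that redaction limits casual exposure in normal usage rather than stopping a determined, scriptable attacker.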
-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Column Redaction
The problem there is that the SQL for (2) changes frequently, so we want to give people SQL access.

So you want to give people access to your SQL database, and you worry that they could see specific information (credit card numbers) in plain form, and therefore you want to format it so that people cannot see the real data. Is that correct?

I'd either do that by only letting them access a view, or by reconsidering whether it is really a good idea to give them SQL access to the server, as they could do other things which could, e.g., slow down the server enormously. Never trust the user.

So I see what you want to achieve, but I am not sure the reason for doing it is good. Can you explain, please? Maybe you should provide them an interface (e.g. a web app) that restricts access to certain functions and takes care of formatting.

Regards
Damian Wolgast (irc:asymetrixs)
Re: [HACKERS] Scaling shared buffer eviction
On 2014-10-10 12:28:13 +0530, Amit Kapila wrote: On Fri, Oct 10, 2014 at 1:08 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-09 16:01:55 +0200, Andres Freund wrote: I don't think OLTP really is the best test case for this. Especially not pgbench with relatively small rows *and* a uniform distribution of access. Try parallel COPY TO. Batch write loads is where I've seen this hurt badly. just by switching shared_buffers from 1 to 8GB. I haven't tried, but I hope that with an approach like yours this might become better. psql -f /tmp/prepare.sql pgbench -P5 -n -f /tmp/copy.sql -c 8 -j 8 -T 100 Thanks for providing the scripts. You haven't specified how much data is present in the 'large' file used in tests. I don't think it matters much. It should be small enough that you get a couple TPS over all backends. I have tried with different sets of rows, but I could not see the dip that is present in your data when you increased shared buffers from 1GB to 8GB; also I don't see any difference with the patch. Interesting. I wonder whether that's because the concurrency wasn't high enough for that machine to show the problem. I ran the test on my workstation which has 8 actual cores. BTW, why do you think that for such workloads this patch can be helpful? According to my understanding it can be helpful mainly for read-mostly workloads when all the data doesn't fit in shared buffers. The performance dip comes from all the backends performing the clock sweep. As the access is pretty uniform, all the buffers start with some usage count (IIRC 3 in this example; much worse if 5). Due to the uniform usage count the clock sweep frequently has to go several times through *all* the buffers. That leads to quite horrible performance in some cases. I had hoped that bgreclaimer could make that workload less horrible by funneling most of the accesses through the freelist.
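The cost Andres describes can be illustrated with a toy model of the clock sweep (a deliberately simplified sketch, not the actual freelist.c logic: every buffer starts at the same usage count, the sweep hand decrements each buffer it passes, and a buffer becomes evictable at zero):

```python
def sweep_cost(n_buffers: int, start_usage: int) -> int:
    """Steps the clock hand takes before finding the first victim buffer,
    assuming all buffers start at the same usage count and no concurrent
    pins or usage-count bumps (toy model)."""
    usage = [start_usage] * n_buffers
    hand = 0
    steps = 0
    while True:
        steps += 1
        if usage[hand] == 0:
            return steps          # found an evictable buffer
        usage[hand] -= 1          # otherwise decrement and advance the hand
        hand = (hand + 1) % n_buffers

# With a uniform usage count of 3, the hand must pass *every* buffer three
# full times before the first eviction; with 5, five times.
print(sweep_cost(1000, 3))  # 3001
print(sweep_cost(1000, 5))  # 5001
```

This is the "go several times through *all* the buffers" effect: the first eviction after such a plateau costs O(usage_count × NBuffers) hand movements, which every backend pays concurrently under the sketch above, whereas a populated freelist hands out victims in O(1).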
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Column Redaction
On 10 October 2014 11:08, Damian Wolgast damian.wolg...@si-co.net wrote: The problem there is that the SQL for (2) changes frequently, so we want to give people SQL access. So you want to give people access to your SQL database and worry that they could see specific information (credit card numbers) in the clear, and therefore you want to format it so that people cannot see the real data. Is that correct? I'd either do that by only letting them access a view, or reconsider whether it is really a good idea to give them SQL access to the server, as they could do other things which e.g. could slow down the server enormously. Never trust the user. So I see what you want to achieve, but I am not sure if the reason to do that is good. Can you explain please? Maybe you should provide them with an interface (e.g. a web app) that restricts access to certain functions and handles the formatting. The requirement for redaction cannot be provided by a view. A view provides a single value for each column, no matter whether it is used in the SELECT or WHERE clause. Redaction requires output formatting only, with the data unchanged for other purposes. Redaction is now a feature available in other databases. I guess it's possible it's all smoke and mirrors, but that's why we discuss stuff before we build it. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Column Redaction
On 10/10/2014 01:21 PM, Simon Riggs wrote: Redaction is now a feature available in other databases. I guess it's possible it's all smoke and mirrors, but that's why we discuss stuff before we build it. I googled for Oracle Data Redaction, and found these General Usage Guidelines: * Oracle Data Redaction is not intended to protect against attacks by privileged database users who run ad hoc queries directly against the database. * Oracle Data Redaction is not intended to protect against users who run exhaustive SQL queries that attempt to determine the actual values by inference. So it's not actually suitable for the example you gave. I don't think we want this feature... - Heikki
Re: [HACKERS] Column Redaction
Simon, * Simon Riggs (si...@2ndquadrant.com) wrote: The requirement for redaction cannot be provided by a view. A view provides a single value for each column, no matter whether it is used in SELECT or WHERE clause. Redaction requires output formatting only, but unchanged for other purposes. Redaction is now a feature available in other databases. I guess its possible its all smoke and mirrors, but thats why we discuss stuff before we build it. In general, I'm on-board with the idea and similar requests have come from users I've talked with. Is there any additional information available on how these other databases deal with the questions and concerns which have been raised? Regarding functions, 'leakproof' functions should be alright to allow, though Heikki brings up a good point regarding binary search being possible in a plpgsql function (or even directly by a client). Of course, that approach also requires that you have a specific item in mind. Methods to mitigate would include not allowing regular users to create functions or run DO blocks and rate-limiting their queries, along with appropriate auditing. Thanks, Stephen
Re: [HACKERS] Column Redaction
Hi 2014-10-10 10:57 GMT+02:00 Simon Riggs si...@2ndquadrant.com: Postgres currently supports column level SELECT privileges. 1. If we want to confirm a credit card number, we can issue SELECT 1 FROM customer WHERE stored_card_number = '1234 5678 5344 7733' 2. If we want to look for card fraud, we need to be able to use the full card number to join to transaction data and look up blocked card lists etc.. 3. We want to block the direct retrieval of card numbers for additional security. In some cases, we might want to return an answer like ' * 7733' We can't do all of the above with current facilities inside the database. The ability to mask output for data in certain cases, for the purpose of security, is known lately as data redaction, or column-level data redaction. The best way to support this requirement would be to allow columns to have an additional output formatting function. This would be executed only when data is about to be returned by a query. All other uses of that would not restrict the data. This would have other uses as well, such as default report formats, so we can store financial amounts as NUMERIC, but format them on retrieval as $12,345.78 etc.. Suggested user interface would be... FORMAT functionname(parameters, if any) e.g. CREATE TABLE customer ( id ... ... , stored_card_number NUMERIC FORMAT pci_card_number_redaction() ... ); We'd need to implement something to allow pg_dump to ignore format functions. I suggest the best way to do that is by providing a BACKUP role that can be delegated to other users. We would then allow a parameter for SET output_formatting = on | off, which can only be set by superuser and BACKUP role, then have pg_dump issue SET output_formatting = off explicitly when it runs. I see a benefit of this feature as an alternative output function .. I remember a talk about the output format of the boolean type. But how can this feature help security? 
You would have to disallow any expression over a column marked this way, or you would have to enforce the alternative output function early. And when you require an alternative output format function (which would have to be implemented in C), it is not much less work than implementing a new type. So it would probably be much more practical if any expression could be used, like stored_card_number NUMERIC FORMAT (right(stored_card_number::text, 4)) Regards Pavel Do we want redaction in PostgreSQL? Do we want it generalised into output format functions? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-10-08 20:07:35 -0400, Robert Haas wrote: On Wed, Oct 8, 2014 at 2:04 PM, Andres Freund and...@2ndquadrant.com wrote: So, what makes it work for me (among other unrelated stuff) seems to be the following in .gdbinit, defining away some things that gdb doesn't handle: macro define __builtin_offsetof(T, F) ((int) &(((T *) 0)->F)) macro define __extension__ macro define AssertVariableIsOfTypeMacro(x, y) ((void)0) Additionally I have -ggdb -g3 in CFLAGS. That way gdb knows about postgres' macros. At least if you're in the right scope. As an example, the following works: (gdb) p dlist_is_empty(BackendList) ? NULL : dlist_head_element(Backend, elem, BackendList) Ah, cool. I'll try that. If that works for you, should we put it somewhere in the docs? If so, where? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Column Redaction
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote: On 10/10/2014 01:21 PM, Simon Riggs wrote: Redaction is now a feature available in other databases. I guess its possible its all smoke and mirrors, but thats why we discuss stuff before we build it. I googled for Oracle Data redaction, and found General Usage Guidelines: * Oracle Data Redaction is not intended to protect against attacks by privileged database users who run ad hoc queries directly against the database. * Oracle Data Redaction is not intended to protect against users who run exhaustive SQL queries that attempt to determine the actual values by inference. So it's not actually suitable for the example you gave. I don't think we want this feature... Or, we need to consider how Oracle addresses these risks and consider if we can provide a similar capability. Those capabilities may include specific configuration and could be a prerequisite for this feature, but I don't think it's sensible to say we don't want this feature simply because it can't stand alone as a perfect answer to these risks. As has been discussed before, we are likely in a better position to identify the concerns and problem areas, come up with recommendations for configuration and/or develop new capabilities to mitigate those risks, than the every-day user or DBA. If we provide it and address these issues in a central location which is generally available, then fixes and problems can be addressed and fixed rather than every database implementation faced with these concerns having to address them independently with, most likely, poorer quality solutions. While we don't want every feature of every database, this deserves more consideration. Thanks, Stephen
Re: [HACKERS] Column Redaction
* Thom Brown (t...@linux.com) wrote: To be honest, this all sounds rather flaky. Even if you do rate-limit their queries, they can use methods that avoid rate-limiting, such as recursive queries. And if you're only after one credit card number (to use the original example), you'd get it in a relatively short amount of time, despite some rate-limiting system. The discussion about looking up specific card numbers in the original email from Simon was actually an allowed use-case, as I understood it, not a risk concern. Indeed, if you know a valid credit card number already, as in this example, then why are you bothering with the search? Perhaps it would provide confirmation, but it's not the database's responsibility to make you forget the number you already have. Doing a random walk through a keyspace of 10^16 and extracting a significant enough number of results to be useful should be difficult. I agree that if we're completely unable to make it difficult then this is less useful, but I feel it's a bit early to jump to that conclusion. This gives the vague impression of security, but it really seems just the placing of a few obstacles in the way. One might consider that all security is just placing obstacles in the way. And auditing sounds like a euphemism for pass the problem of security on elsewhere anyway. Auditing is a known requirement for good security.. There's certainly different levels of it, but if you aren't at least auditing your security configuration for the attack vectors you're concerned about, then you're unlikely to have any real security. Thanks, Stephen
Re: [HACKERS] Column Redaction
On 10/10/2014 01:35 PM, Stephen Frost wrote: Regarding functions, 'leakproof' functions should be alright to allow, though Heikki brings up a good point regarding binary search being possible in a plpgsql function (or even directly by a client). Of course, that approach also requires that you have a specific item in mind. It doesn't require that you have a specific item in mind. Binary search is cheap, O(log n). It's easy to write a function to do a binary search on a single item, passed as argument, and then apply that to all rows: SELECT binary_search_reveal(cardnumber) FROM redacted_table; Really, I don't see how this can possibly be made to work. You can't allow ad hoc processing of data, and still avoid revealing it to the user. - Heikki
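Heikki's O(log n) claim is easy to make concrete. A toy sketch (all names invented for illustration), assuming only that an attacker can get yes/no answers to "hidden value < x" — e.g. through any comparison that leaks into a query's behaviour:

```python
def reveal(oracle_less_than, lo=0, hi=10 ** 16):
    """Recover a hidden integer in [lo, hi) by binary search, given an
    oracle answering 'hidden < x'. No prior knowledge of the value needed."""
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if oracle_less_than(mid + 1):  # hidden < mid+1, i.e. hidden <= mid
            hi = mid
        else:
            lo = mid + 1
    return lo, probes

secret = 1234567853447733  # stands in for a stored card number
value, probes = reveal(lambda x: secret < x)
print(value, probes <= 54)  # 1234567853447733 True
```

Against a 16-digit keyspace this needs at most ceil(log2(10^16)) = 54 probes per row, which is why allowing ordering comparisons on a redacted column defeats the redaction entirely.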
Re: [HACKERS] Column Redaction
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote: On 10/10/2014 01:35 PM, Stephen Frost wrote: Regarding functions, 'leakproof' functions should be alright to allow, though Heikki brings up a good point regarding binary search being possible in a plpgsql function (or even directly by a client). Of course, that approach also requires that you have a specific item in mind. It doesn't require that you have a specific item in mind. Binary search is cheap, O(log n). It's easy to write a function to do a binary search on a single item, passed as argument, and then apply that to all rows: SELECT binary_search_reveal(cardnumber) FROM redacted_table; Note that your binary_search_reveal wouldn't be marked as leakproof and therefore this wouldn't be allowed. If this was allowed, you'd simply do raise notice inside the function and call it a day. Thanks, Stephen
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
Hi Robert, On 2014-10-08 16:01:53 -0400, Robert Haas wrote: [ comment fixes ] Thanks, I've incorporated these + a bit more. Could you otherwise make sense of the explanation and the algorithm? +/* yipeyyahee */ Although this will be clear to individuals with a good command of English, I suggest avoiding such usages. I've removed them with a heavy heart. These are heartfelt emotions from getting the algorithm to work (:P) I've attached these fixes + the removal of spinlocks around releaseOK as follow-up patches. Obviously they'll be merged into the other patch, but it sounds useful to be able to see them separately. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 6885a15cc6f2e193ff575a4463d90ad252d74f5e Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Tue, 7 Oct 2014 15:32:34 +0200 Subject: [PATCH 1/4] Convert the PGPROC->lwWaitLink list into a dlist instead of open coding it. Besides being shorter and much easier to read, it: * changes the logic in LWLockRelease() to release all shared lockers when waking up any. This can yield some significant performance improvements - and the fairness isn't really much worse than before, as we always allowed new shared lockers to jump the queue. * adds a pg_write_barrier() memory barrier in the wakeup paths between dequeuing and unsetting ->lwWaiting. That was always required on weakly ordered machines, but f4077cda2 made it more urgent. 
Author: Andres Freund --- src/backend/access/transam/twophase.c | 1 - src/backend/storage/lmgr/lwlock.c | 151 +- src/backend/storage/lmgr/proc.c | 2 - src/include/storage/lwlock.h | 5 +- src/include/storage/proc.h| 3 +- 5 files changed, 60 insertions(+), 102 deletions(-) diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index d5409a6..6401943 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -389,7 +389,6 @@ MarkAsPreparing(TransactionId xid, const char *gid, proc-roleId = owner; proc-lwWaiting = false; proc-lwWaitMode = 0; - proc-lwWaitLink = NULL; proc-waitLock = NULL; proc-waitProcLock = NULL; for (i = 0; i NUM_LOCK_PARTITIONS; i++) diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index 9fe6855..e6f9158 100644 --- a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c @@ -35,6 +35,7 @@ #include miscadmin.h #include pg_trace.h #include replication/slot.h +#include storage/barrier.h #include storage/ipc.h #include storage/predicate.h #include storage/proc.h @@ -115,9 +116,9 @@ inline static void PRINT_LWDEBUG(const char *where, const LWLock *lock) { if (Trace_lwlocks) - elog(LOG, %s(%s %d): excl %d shared %d head %p rOK %d, + elog(LOG, %s(%s %d): excl %d shared %d rOK %d, where, T_NAME(lock), T_ID(lock), - (int) lock-exclusive, lock-shared, lock-head, + (int) lock-exclusive, lock-shared, (int) lock-releaseOK); } @@ -475,8 +476,7 @@ LWLockInitialize(LWLock *lock, int tranche_id) lock-exclusive = 0; lock-shared = 0; lock-tranche = tranche_id; - lock-head = NULL; - lock-tail = NULL; + dlist_init(lock-waiters); } @@ -615,12 +615,7 @@ LWLockAcquireCommon(LWLock *lock, LWLockMode mode, uint64 *valptr, uint64 val) proc-lwWaiting = true; proc-lwWaitMode = mode; - proc-lwWaitLink = NULL; - if (lock-head == NULL) - lock-head = proc; - else - lock-tail-lwWaitLink = proc; - lock-tail = proc; + dlist_push_head(lock-waiters, 
proc-lwWaitLink); /* Can release the mutex now */ SpinLockRelease(lock-mutex); @@ -836,12 +831,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode) proc-lwWaiting = true; proc-lwWaitMode = LW_WAIT_UNTIL_FREE; - proc-lwWaitLink = NULL; - if (lock-head == NULL) - lock-head = proc; - else - lock-tail-lwWaitLink = proc; - lock-tail = proc; + dlist_push_head(lock-waiters, proc-lwWaitLink); /* Can release the mutex now */ SpinLockRelease(lock-mutex); @@ -997,13 +987,8 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval) */ proc-lwWaiting = true; proc-lwWaitMode = LW_WAIT_UNTIL_FREE; - proc-lwWaitLink = NULL; - /* waiters are added to the front of the queue */ - proc-lwWaitLink = lock-head; - if (lock-head == NULL) - lock-tail = proc; - lock-head = proc; + dlist_push_head(lock-waiters, proc-lwWaitLink); /* Can release the mutex now */ SpinLockRelease(lock-mutex); @@ -1079,9 +1064,10 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval) void LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val) { - PGPROC *head; - PGPROC *proc; - PGPROC *next; + dlist_head wakeup; + dlist_mutable_iter iter; + + dlist_init(wakeup); /* Acquire mutex. Time spent holding mutex
Re: [HACKERS] Column Redaction
On 10/10/2014 11:38 AM, Simon Riggs wrote: On 10 October 2014 10:29, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 10/10/2014 11:57 AM, Simon Riggs wrote: Postgres currently supports column level SELECT privileges. 1. If we want to confirm a credit card number, we can issue SELECT 1 FROM customer WHERE stored_card_number = '1234 5678 5344 7733' 2. If we want to look for card fraud, we need to be able to use the full card number to join to transaction data and look up blocked card lists etc.. 3. We want to block the direct retrieval of card numbers for additional security. In some cases, we might want to return an answer like ' * 7733' We can't do all of the above with current facilities inside the database. Deny access to the underlying tables. Write SQL functions to do 1. and 2., and grant privileges to the functions, instead. For 3. create views that do the redaction. If everything were easy to lock down, the approach you suggest is of course the best way. The problem there is that the SQL for (2) changes frequently, so we want to give people SQL access. 1. Give people access to a development system with safe data where they write their functions 2. once a function is working, pass it to auditors 3. deploy and use the function. Just not the ability to retrieve data in a usable form. For an attacker any access is in a usable form; for honest people you can just provide a view or set-returning function. btw, one way to do the redaction you suggested above is to write a special type, which redacts data on output. You can even make the type output function dependent on the backup role. Just make sure that users are aware that it is not really a security feature which protects against attackers. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
Re: [HACKERS] Column Redaction
On 10/10/2014 02:05 PM, Stephen Frost wrote: * Heikki Linnakangas (hlinnakan...@vmware.com) wrote: On 10/10/2014 01:35 PM, Stephen Frost wrote: Regarding functions, 'leakproof' functions should be alright to allow, though Heikki brings up a good point regarding binary search being possible in a plpgsql function (or even directly by a client). Of course, that approach also requires that you have a specific item in mind. It doesn't require that you have a specific item in mind. Binary search is cheap, O(log n). It's easy to write a function to do a binary search on a single item, passed as argument, and then apply that to all rows: SELECT binary_search_reveal(cardnumber) FROM redacted_table; Note that your binary_search_reveal wouldn't be marked as leakproof and therefore this wouldn't be allowed. If this was allowed, you'd simply do raise notice inside the function and call it a day. *shrug*, just do the same with a more complicated query, then. Even if you can't create a function that does that, you can still execute the same logic without a function. - Heikki
Re: [HACKERS] Column Redaction
On 10 October 2014 12:00, Stephen Frost sfr...@snowman.net wrote: * Thom Brown (t...@linux.com) wrote: To be honest, this all sounds rather flaky. Even if you do rate-limit their queries, they can use methods that avoid rate-limiting, such as recursive queries. And if you're only after one credit card number (to use the original example), you'd get it in a relatively short amount of time, despite some rate-limiting system. The discussion about looking up specific card numbers in the original email from Simon was actually an allowed use-case, as I understood it, not a risk concern. Indeed, if you know a valid credit card number already, as in this example, then why are you bothering with the search? The topic being column redaction rather than column formatting leads me to believe that the main use-case of the feature would be to prevent the user from discovering the full value of the column. It's not so much point 1 I was responding to, rather point 3, where you don't know the card number, but you get information about it in the results. The purpose of this feature would be to prevent the user from seeing all that data, which is a security feature, but at the moment it just seems to be a way of making it a little less easy to get at that data. This gives the vague impression of security, but it really seems just the placing of a few obstacles in the way. One might consider that all security is just placing obstacles in the way. There's a difference between intending that there shouldn't be a way past security and just making access a matter of walking a longer route. I wouldn't be against formatting per se, but for the purposes of that, I would say that views can already serve that purpose. -- Thom
Re: [HACKERS] Column Redaction
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote: On 10/10/2014 02:05 PM, Stephen Frost wrote: * Heikki Linnakangas (hlinnakan...@vmware.com) wrote: On 10/10/2014 01:35 PM, Stephen Frost wrote: Regarding functions, 'leakproof' functions should be alright to allow, though Heikki brings up a good point regarding binary search being possible in a plpgsql function (or even directly by a client). Of course, that approach also requires that you have a specific item in mind. It doesn't require that you have a specific item in mind. Binary search is cheap, O(log n). It's easy to write a function to do a binary search on a single item, passed as argument, and then apply that to all rows: SELECT binary_search_reveal(cardnumber) FROM redacted_table; Note that your binary_search_reveal wouldn't be marked as leakproof and therefore this wouldn't be allowed. If this was allowed, you'd simply do raise notice inside the function and call it a day. *shrug*, just do the same with a more complicated query, then. Even if you can't create a function that does that, you can still execute the same logic without a function. Not sure I see what you're getting at here..? My point was that you'd need a target number and the system would only provide confirmation that the number exists, or does not. Your argument was that the table itself would provide the target number, which was flawed. I don't see how just do the same with a more complicated query removes the need to have a target number for the binary search. A better argument would be the equality case than the binary search if you're simply looking for confirmation of existence. If the user can define a table of targets, or use a VALUES construct, and then join to it, then we might build a hash table and provide those results faster than a binary search, though this again means that the user is providing the list of keys to check. 
As mentioned elsewhere on the thread, I agree that this capability wouldn't be useful if a random search (which is providing the 'targets') through a 10^16 keyspace generated a significant number of results (I'd also throw in there in a reasonable amount of time - clearly it'd be possible to extract all keys given sufficient time, even with a random search). The sketch that Simon outlined won't obviously provide that guarantee, but I'm not prepared to say we couldn't provide it at all while meeting the goal he outlined. Thanks, Stephen
Re: [HACKERS] Column Redaction
* Thom Brown (t...@linux.com) wrote: On 10 October 2014 12:00, Stephen Frost sfr...@snowman.net wrote: The discussion about looking up specific card numbers in the original email from Simon was actually an allowed use-case, as I understood it, not a risk concern. Indeed, if you know a valid credit card number already, as in this example, then why are you bothering with the search? The topic being column redaction rather than column formatting leads me to believe that the main use-case of the feature would be to prevent the user from discovering the full value of the column. I believe the idea is to limit the chances that a user with limited pre-existing knowledge would be able to determine the full value of items in the column, especially in bulk. It's not so much point 1 I was responding do, rather point 3, where you don't know the card number, but you get information about it in the results. We'd certainly want to prevent that to the limit possible. Do you have a specific thought about how they'd be able to find a full number beyond a random search..? The purpose of this feature would be to prevent the user from seeing all that data, which is a security feature, but at the moment it just seems to be a way of making it a little less easy to get at that data. I certainly appreciate the thought challenges and critique and I'm hopeful we could make it more than a little less easy to get at the information. If we aren't able to do that, then the feature isn't useful, certainly. This gives the vague impression of security, but it really seems just the placing of a few obstacles in the way. One might consider that all security is just placing obstacles in the way. There's a difference between intending that there shouldn't be a way past security and just making access a matter of walking a longer route. Throwing random 16-digit numbers and associated information at a credit card processor could be viewed as walking a longer route too. 
The same goes for random key searches or password guesses. Thanks, Stephen
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-10 10:13:03 +0530, Amit Kapila wrote: I have done a few performance tests for the above patches and the results are as below: Cool, thanks. Performance Data -- IBM POWER-7 16 cores, 64 hardware threads RAM = 64GB max_connections = 210 Database Locale = C checkpoint_segments=256 checkpoint_timeout=35min shared_buffers=8GB Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8) Duration of each individual run = 5mins Test type - read-only pgbench with -M prepared Other related information about the test: a. This is the data for the median of 3 runs; the detailed data of each individual run is attached with the mail. b. I have applied both the patches to take performance data. Observations -- a. The patch performs really well (increase up to ~40%) in case all the data fits in shared buffers (scale factor 100). b. In case data doesn't fit in shared buffers, but fits in RAM (scale factor 3000), there is a performance increase up to 16 client count, however after that it starts dipping (in the above config up to ~4.4%). Hm. Interesting. I don't see that dip on x86. Is it possible that the implementation of some atomic operation is costlier for a particular architecture? I have tried again for scale factor 3000 and could see the dip, and this time I have even tried with 175 client count; the dip is approximately 5%, which is slightly more than at 160 client count. Patch_ver/Client_count 175 HEAD 248374 PATCH 235669 Now probably these shouldn't matter much in case a backend needs to wait for another exclusive locker, but I am not sure what else could be the reason for the dip in case we need to have exclusive LWLocks. Any chance to get a profile? Here it goes.. HEAD - client_count=128 - + 7.53% postgres postgres [.] GetSnapshotData + 3.41% postgres postgres [.] AllocSetAlloc + 2.61% postgres postgres [.] AllocSetFreeIndex + 2.49% postgres postgres [.] 
_bt_compare + 2.43% postgres [kernel.kallsyms] [k] .__copy_tofrom_user + 2.40% postgres postgres [.] hash_search_with_hash_value + 1.83% postgres postgres [.] tas + 1.29% postgres postgres [.] pg_encoding_mbcliplen + 1.27% postgres postgres [.] MemoryContextCreate + 1.22% postgres postgres [.] MemoryContextAllocZeroAligned + 1.17% postgres postgres [.] hash_seq_search + 0.97% postgres postgres [.] LWLockRelease + 0.96% postgres postgres [.] MemoryContextAllocZero + 0.91% postgres postgres [.] GetPrivateRefCountEntry + 0.82% postgres postgres [.] AllocSetFree + 0.79% postgres postgres [.] LWLockAcquireCommon + 0.78% postgres postgres [.] pfree Detailed Data - - 7.53% postgres postgres [.] GetSnapshotData - GetSnapshotData - 7.46% GetSnapshotData - 7.46% GetTransactionSnapshot - 3.74% exec_bind_message PostgresMain BackendRun BackendStartup ServerLoop PostmasterMain main generic_start_main.isra.0 __libc_start_main 0 - 3.72% PortalStart exec_bind_message PostgresMain BackendRun BackendStartup ServerLoop PostmasterMain main generic_start_main.isra.0 __libc_start_main 0 - 3.41% postgres postgres [.] AllocSetAlloc - AllocSetAlloc - 2.01% AllocSetAlloc 0.81% palloc 0.63% MemoryContextAlloc - 2.61% postgres postgres [.] AllocSetFreeIndex - AllocSetFreeIndex 1.59% AllocSetAlloc 0.79% AllocSetFree - 2.49% postgres postgres [.] _bt_compare - _bt_compare - 1.80% _bt_binsrch - 1.80% _bt_binsrch - 1.21% _bt_search _bt_first Lwlock_contention patches - client_count=128 -- + 7.95% postgres postgres [.] GetSnapshotData + 3.58% postgres postgres [.] AllocSetAlloc + 2.51% postgres postgres [.] _bt_compare + 2.44% postgres postgres [.] hash_search_with_hash_value + 2.33% postgres [kernel.kallsyms] [k]
Re: [HACKERS] Column Redaction
On 10/10/2014 02:27 PM, Stephen Frost wrote: * Heikki Linnakangas (hlinnakan...@vmware.com) wrote: On 10/10/2014 02:05 PM, Stephen Frost wrote: * Heikki Linnakangas (hlinnakan...@vmware.com) wrote: On 10/10/2014 01:35 PM, Stephen Frost wrote: Regarding functions, 'leakproof' functions should be alright to allow, though Heikki brings up a good point regarding binary search being possible in a plpgsql function (or even directly by a client). Of course, that approach also requires that you have a specific item in mind. It doesn't require that you have a specific item in mind. Binary search is cheap, O(log n). It's easy to write a function to do a binary search on a single item, passed as argument, and then apply that to all rows: SELECT binary_search_reveal(cardnumber) FROM redacted_table; Note that your binary_search_reveal wouldn't be marked as leakproof and therefore this wouldn't be allowed. If this was allowed, you'd simply do a raise notice inside the function and call it a day. *shrug*, just do the same with a more complicated query, then. Even if you can't create a function that does that, you can still execute the same logic without a function. Not sure I see what you're getting at here...? My point was that you'd need a target number and the system would only provide confirmation that the number exists, or does not. Your argument was that the table itself would provide the target number, which was flawed. I don't see how "just do the same with a more complicated query" removes the need to have a target number for the binary search. You said above that it's OK to pass the card numbers to leakproof functions. But if you allow that, you can write a function that takes as argument a redacted card number, and unredacts it (using the < and = operators in a binary search). And then you can just do SELECT unredact(card_number) from redacted_table. You seem to have something stronger in mind: only allow the equality operator on the redacted column, and nothing else.
That might be better, although I'm not really convinced. There are just too many ways you could still leak the datum. Just a random example, inspired by the recent CRIME attack on SSL: build a row with the redacted datum, and another guess datum, and store it along with 1k of other data in a temporary table. The row gets toasted. Observe how much it compressed; if the guess datum is close to the original datum, it compresses well. Now, you can probably stop that particular attack with more restrictions on what you can do with the datum, but that just shows that pretty much any computation you allow with the datum can be used to reveal its value. - Heikki
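The compression side channel Heikki describes can be demonstrated outside the database. The sketch below is illustrative Python, not from the thread: zlib stands in for the TOAST compressor, and the card number and guesses are made up. A guess sharing a long prefix with the secret turns into a cheap back-reference, so the compressed row comes out measurably smaller.

```python
import zlib

def toasted_size(secret: bytes, guess: bytes) -> int:
    # Stand-in for "store the secret and a guess in one row along with
    # 1k of other data, and observe how far the row compresses".
    filler = bytes(1024)
    return len(zlib.compress(secret + guess + filler))

secret = b"4937-1234-5678-7733"       # hypothetical card number
close_guess = b"4937-1234-5678-0000"  # shares a 15-byte prefix with the secret
far_guess = b"8152-9073-6641-2210"    # shares nothing with the secret

size_close = toasted_size(secret, close_guess)
size_far = toasted_size(secret, far_guess)
# The closer guess compresses into a back-reference to the secret, so
# size_close ends up smaller than size_far; comparing row sizes thus
# leaks how close a guess is, one prefix byte at a time.
```

The attacker never reads the column directly; the row's storage size is the oracle, which is exactly why restricting operators on the datum is not enough.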
Re: [HACKERS] Column Redaction
On 10 October 2014 12:45, Stephen Frost sfr...@snowman.net wrote: This gives the vague impression of security, but it really seems just the placing of a few obstacles in the way. One might consider that all security is just placing obstacles in the way. There's a difference between intending that there shouldn't be a way past security and just making access a matter of walking a longer route. Throwing random 16-digit numbers and associated information at a credit card processor could be viewed as walking a longer route too. The same goes for random key searches or password guesses. But those would need to be exhaustive, and in nearly all cases, impractical. Data such as plain credit card numbers stored in a column, even with all its data masked, would be easy to determine. Salted and hashed passwords, even with complete visibility of the value, aren't vulnerable to scrutiny of particular character values. If they were, no-one would use them. Thom
Re: [HACKERS] Column Redaction
On 10 October 2014 11:45, Thom Brown t...@linux.com wrote: To be honest, this all sounds rather flaky. To be honest, suggesting anything at all is rather difficult and I recommend people try it. Everything sounds crap when you didn't think of it and you've given it an hour's thought. I'm not blind to the difficulties raised and I thank you for your input, but I think it's too early to make sweeping generalisations. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Column Redaction
On 10 October 2014 12:01, Heikki Linnakangas hlinnakan...@vmware.com wrote: Really, I don't see how this can possibly be made to work. You can't allow ad hoc processing of data, and still avoid revealing it to the user. Anyone with unmonitored access and sufficient time can break through security. I think that is true of any kind of security, and so it is true here also. Auditing and controls are required also; that's why I suggested those first. This proposal was looking beyond that to what we might need next. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Column Redaction
On Fri, Oct 10, 2014 at 7:00 AM, Stephen Frost sfr...@snowman.net wrote: * Thom Brown (t...@linux.com) wrote: To be honest, this all sounds rather flaky. Even if you do rate-limit their queries, they can use methods that avoid rate-limiting, such as recursive queries. And if you're only after one credit card number (to use the original example), you'd get it in a relatively short amount of time, despite some rate-limiting system. The discussion about looking up specific card numbers in the original email from Simon was actually an allowed use-case, as I understood it, not a risk concern. Indeed, if you know a valid credit card number already, as in this example, then why are you bothering with the search? Perhaps it would provide confirmation, but it's not the database's responsibility to make you forget the number you already have. Doing a random walk through a keyspace of 10^16 and extracting a significant enough number of results to be useful should be difficult. I agree that if we're completely unable to make it difficult then this is less useful, but I feel it's a bit early to jump to that conclusion. You are obviously wearing your rose-colored glasses this morning. I predict a competent SQL programmer could write an SQL function, or client-side code, to pump the data out of the database using binary search in milliseconds per row. And I think it's more likely than not that there are other techniques that are much faster. The idea that you're going to be able to let people query the data but not actually retrieve it should be viewed with great skepticism. This is the equivalent of telling a child that she can't open her Christmas presents until Christmas, but she can shake them, hold them up to a bright light, and/or X-ray the packages. If she doesn't know what's in there by the time she opens it, it's just for lack of effort. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
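Robert's "milliseconds per row" estimate is easy to check in a toy model. The sketch below is illustrative Python, not from the thread: leq_oracle stands in for one round trip of a hypothetical query such as SELECT 1 FROM redacted_table WHERE cc <= $1 AND id = $2 (assuming ordering comparisons still work on the column), and a 16-digit number falls to a plain binary search in at most 54 probes, since ceil(log2(10^16)) = 54.

```python
def leq_oracle(secret: int, probe: int) -> bool:
    # Stand-in for one query round trip that only confirms
    # whether the hidden value is <= the probe.
    return secret <= probe

def extract(secret: int, lo: int = 0, hi: int = 10**16 - 1) -> tuple[int, int]:
    # Recover the hidden value via binary search over the keyspace,
    # counting how many oracle probes it takes.
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if leq_oracle(secret, mid):
            hi = mid        # secret is in [lo, mid]
        else:
            lo = mid + 1    # secret is in [mid + 1, hi]
    return lo, probes

value, probes = extract(4937123456787733)
# value equals the secret; probes never exceeds 54, so even with a
# network round trip per probe, each row falls in well under a second.
```

That is the mechanism behind both the plpgsql-function attack and the recursive-query version shown later in the thread: any ordering operator on the column is a complete leak.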
Re: [HACKERS] Column Redaction
On 10 October 2014 13:43, Simon Riggs si...@2ndquadrant.com wrote: On 10 October 2014 11:45, Thom Brown t...@linux.com wrote: To be honest, this all sounds rather flaky. To be honest, suggesting anything at all is rather difficult and I recommend people try it. I have, and most ideas I've had have been justifiably shot down or picked apart (scheduled background tasks, offloading stats collection to standby, index maintenance in DML query plans, expression statistics... to name but a few). Everything sounds crap when you didn't think of it and you've given it an hour's thought. I'm not sure that means my concerns aren't valid. I don't think it sounds crap, but I also can't see any use-case for it where we don't already have things covered, or where it's going to offer any useful level of security. Like with RLS, it may be that I'm just looking at things from the wrong perspective. -- Thom
Re: [HACKERS] Column Redaction
On Fri, Oct 10, 2014 at 5:57 AM, Simon Riggs si...@2ndquadrant.com wrote: 1. If we want to confirm a credit card number, we can issue SELECT 1 FROM customer WHERE stored_card_number = '1234 5678 5344 7733' ... 3. We want to block the direct retrieval of card numbers for additional security. In some cases, we might want to return an answer like ' * 7733'

I wouldn't want to allow that:

select ref.ref, customer.name
from (select generate_series as ref from generate_series(0, )) ref, customer
where ref.ref = stored_card_number.ref

May take a long while. Just disable everything except nestloop and suck up the data as it comes. Can be optimized. Not sure how you'd avoid this; not trivial at all. Not possible at all, I'd venture. But if you really, really want to allow this, encrypt the column, and provide a C function that can decrypt it. You can join encrypted columns, and you can even include the last 4 digits unencrypted if you want (I wouldn't want to). It has to be a C function to be able to avoid leaking the key, btw. 2. If we want to look for card fraud, we need to be able to use the full card number to join to transaction data and look up blocked card lists etc. A view works pretty well for this.
[HACKERS] Inconsistencies in documentation of row-level locking
Hi all, Currently all the row-level lock modes are described in the page for the SELECT query: http://www.postgresql.org/docs/devel/static/explicit-locking.html#LOCKING-ROWS However, after browsing the documentation, I noticed in the page describing all the explicit locks of the system that there is a portion dedicated to row-level locks, and this section does not mention FOR KEY SHARE and FOR NO KEY UPDATE at all. This seems rather misleading for the user: http://www.postgresql.org/docs/devel/static/explicit-locking.html#LOCKING-ROWS

Attached is a patch that refactors the whole area and improves the documentation:
- Addition of a table showing the conflicts between each lock
- Moved description of each row-level lock mode to the explicit locking page
- Addition of a link in the SELECT portion to redirect the user to the explicit locking page

Regards, -- Michael

From 974b360cc8bfdade607154c94d0a78f5cf466e0a Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Fri, 10 Oct 2014 22:10:10 +0900
Subject: [PATCH] Refactor documentation of row-level locking

All the details about each row-level lock mode were described in the page dedicated to the SELECT query, which seems rather incorrect because there is a section in the MVCC chapter describing in detail all the locks that can be used in the system. Note that this portion was not updated for the implementation of FOR KEY SHARE and FOR NO KEY UPDATE while it should have been, making the information provided to the user inconsistent and misleading. This patch refactors the whole area, adding a link in SELECT to redirect the user to the page describing all the explicit locks of the system.
--- doc/src/sgml/mvcc.sgml | 172 +-- doc/src/sgml/ref/select.sgml | 60 +-- 2 files changed, 152 insertions(+), 80 deletions(-) diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml index cd55be8..478279d 100644 --- a/doc/src/sgml/mvcc.sgml +++ b/doc/src/sgml/mvcc.sgml @@ -1106,30 +1106,107 @@ ERROR: could not serialize access due to read/write dependencies among transact para In addition to table-level locks, there are row-level locks, which - can be exclusive or shared locks. An exclusive row-level lock on a - specific row is automatically acquired when the row is updated or - deleted. The lock is held until the transaction commits or rolls - back, just like table-level locks. Row-level locks do - not affect data querying; they block only emphasiswriters to the same - row/emphasis. + are listed as below with the contexts in which they are used + automatically by productnamePostgreSQL/productname. The main + differences between one lock mode and another is the set of lock modes + with each conflicts (see xref linkend=row-lock-compatibility). + Two transactions cannot hold locks of conflicting modes on the same rows + of the same table at the same time. Also, a transaction never conflicts + with itself. Row-level locks do not affect data querying; they block only + emphasiswriters to the same row/emphasis. /para -para - To acquire an exclusive row-level lock on a row without actually - modifying the row, select the row with commandSELECT FOR - UPDATE/command. Note that once the row-level lock is acquired, - the transaction can update the row multiple times without - fear of conflicts. -/para + variablelist + titleRow-level Lock Modes/title + varlistentry + term +literalFOR UPDATE/literal + /term + listitem +para + literalFOR UPDATE/literal causes the rows retrieved by the + commandSELECT/command statement to be locked as though for + update. This prevents them from being modified or deleted by + other transactions until the current transaction ends. 
That is, + other transactions that attempt commandUPDATE/command, + commandDELETE/command, + commandSELECT FOR UPDATE/command, + commandSELECT FOR NO KEY UPDATE/command, + commandSELECT FOR SHARE/command or + commandSELECT FOR KEY SHARE/command + of these rows will be blocked until the current transaction ends. + The literalFOR UPDATE/ lock mode + is also acquired by any commandDELETE/ on a row, and also by an + commandUPDATE/ that modifies the values on certain columns. Currently, + the set of columns considered for the commandUPDATE/ case are those that + have a unique index on them that can be used in a foreign key (so partial + indexes and expressional indexes are not considered), but this may change + in the future. + Also, if an commandUPDATE/command, commandDELETE/command, + or commandSELECT FOR UPDATE/command from another transaction + has already locked a selected row or rows, commandSELECT FOR
Re: [HACKERS] [9.4 bug] The database server hangs with write-heavy workload on Windows
From: Craig Ringer cr...@2ndquadrant.com It sounds like they've produced a test case, so they should be able to with a bit of luck. Or even better, send you the test case. I asked the user about this. It sounds like the relevant test case consists of many scripts. He explained to me that the simplified test steps are:

1. initdb
2. pg_ctl start
3. Create 16 tables. Each of those tables consists of around 10 columns.
4. Insert 1000 rows into each of those 16 tables.
5. Launch 16 psql sessions concurrently. Each session updates all 1000 rows of one table, e.g., session 1 updates table 1, session 2 updates table 2, and so on.
6. Repeat step 5 50 times.

This sounds a bit complicated, but I understood that the core part is 16 concurrent updates, which should lead to contention on xlog insert slots and/or spinlocks. Your next step here really needs to be to make this reproducible against a debug build. Then see if reverting the xlog scalability work actually changes the behaviour, given that you hypothesised that it could be involved. Thank you, but that may be labor-intensive and time-consuming. In addition, the user uses a machine with multiple CPU cores, while I only have a desktop PC with two CPU cores. So I doubt I can reproduce the problem on my PC. I asked the user to change S_UNLOCK to something like the following and run the test during this weekend (the next Monday is a national holiday in Japan).

#define S_UNLOCK(lock) InterlockedExchange(lock, 0)

FYI, the user reported today that the problem didn't occur when he ran the same test for 24 hours on 9.3.5. Do you see something relevant in 9.4? Regards MauMau
Re: [HACKERS] [9.4 bug] The database server hangs with write-heavy workload on Windows
On 2014-10-10 23:08:34 +0900, MauMau wrote: From: Craig Ringer cr...@2ndquadrant.com It sounds like they've produced a test case, so they should be able to with a bit of luck. Or even better, send you the test case. I asked the user about this. It sounds like the relevant test case consists of many scripts. He explained to me that the simplified test steps are:

1. initdb
2. pg_ctl start
3. Create 16 tables. Each of those tables consists of around 10 columns.
4. Insert 1000 rows into each of those 16 tables.
5. Launch 16 psql sessions concurrently. Each session updates all 1000 rows of one table, e.g., session 1 updates table 1, session 2 updates table 2, and so on.
6. Repeat step 5 50 times.

This sounds a bit complicated, but I understood that the core part is 16 concurrent updates, which should lead to contention on xlog insert slots and/or spinlocks.

Hm. I've run similar loads on linux for long enough that I'm relatively sure I'd have seen this. Could you get them to print out the contents of the lwlock all these processes are waiting for?

Your next step here really needs to be to make this reproducible against a debug build. Then see if reverting the xlog scalability work actually changes the behaviour, given that you hypothesised that it could be involved.

I don't think you can trivially revert the xlog scalability stuff.

Thank you, but that may be labor-intensive and time-consuming. In addition, the user uses a machine with multiple CPU cores, while I only have a desktop PC with two CPU cores. So I doubt I can reproduce the problem on my PC.

Well, it'll also be labor-intensive for the community to debug.

I asked the user to change S_UNLOCK to something like the following and run the test during this weekend (the next Monday is a national holiday in Japan).

#define S_UNLOCK(lock) InterlockedExchange(lock, 0)

That shouldn't be required. For one, on 9.4 (not 9.5!) spinlock releases only need to prevent reordering at the CPU level. As x86 is a TSO architecture (total store order), that doesn't require doing anything special. And even if it required more, on msvc volatile reads/stores act as acquire/release fences unless you monkey with the compiler settings. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-10 17:18:46 +0530, Amit Kapila wrote: On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund and...@2ndquadrant.com wrote: Observations -- a. The patch performs really well (increase up to ~40%) in case all the data fits in shared buffers (scale factor 100). b. In case the data doesn't fit in shared buffers, but fits in RAM (scale factor 3000), there is a performance increase up to 16 client count; however after that it starts dipping (in the above config up to ~4.4%). Hm. Interesting. I don't see that dip on x86. Is it possible that the implementation of some atomic operation is costlier for a particular architecture?

Yes, sure. And IIRC POWER improved atomics performance considerably for POWER8...

I have tried again for scale factor 3000 and could see the dip, and this time I have even tried with 175 client count; the dip is approximately 5%, which is slightly more than at 160 client count.

FWIW, the profile always looks like

- 48.61% postgres postgres           [.] s_lock
   - s_lock
      + 96.67% StrategyGetBuffer
      + 1.19% UnpinBuffer
      + 0.90% PinBuffer
      + 0.70% hash_search_with_hash_value
+ 3.11% postgres postgres           [.] GetSnapshotData
+ 2.47% postgres postgres           [.] StrategyGetBuffer
+ 1.93% postgres [kernel.kallsyms]  [k] copy_user_generic_string
+ 1.28% postgres postgres           [.] hash_search_with_hash_value
- 1.27% postgres postgres           [.] LWLockAttemptLock
   - LWLockAttemptLock
      - 97.78% LWLockAcquire
         + 38.76% ReadBuffer_common
         + 28.62% _bt_getbuf
         + 8.59% _bt_relandgetbuf
         + 6.25% GetSnapshotData
         + 5.93% VirtualXactLockTableInsert
         + 3.95% VirtualXactLockTableCleanup
         + 2.35% index_fetch_heap
         + 1.66% StartBufferIO
         + 1.56% LockReleaseAll
         + 1.55% _bt_next
         + 0.78% LockAcquireExtended
      + 1.47% _bt_next
   + 0.75% _bt_relandgetbuf

to me. Now that's with a client count of 496, but it's similar with lower counts. BTW, that profile *clearly* indicates we should make StrategyGetBuffer() smarter.

Patch_ver/Client_count  175
HEAD                    248374
PATCH                   235669

Now probably these shouldn't matter much in case a backend needs to wait for another exclusive locker, but I am not sure what else could be the reason for the dip in case we need to have exclusive LWLocks. Any chance to get a profile? Here it goes..

Lwlock_contention patches - client_count=128
--
+ 7.95% postgres postgres           [.] GetSnapshotData
+ 3.58% postgres postgres           [.] AllocSetAlloc
+ 2.51% postgres postgres           [.] _bt_compare
+ 2.44% postgres postgres           [.] hash_search_with_hash_value
+ 2.33% postgres [kernel.kallsyms]  [k] .__copy_tofrom_user
+ 2.24% postgres postgres           [.] AllocSetFreeIndex
+ 1.75% postgres postgres           [.] pg_atomic_fetch_add_u32_impl

Uh. Huh? Normally that'll be inline. That's compiled with gcc? What were the compiler settings you used? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Column Redaction
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote: You said above that it's OK to pass the card numbers to leakproof functions. But if you allow that, you can write a function that takes as argument a redacted card number, and unredacts it (using the < and = operators in a binary search). And then you can just do SELECT unredact(card_number) from redacted_table.

Not sure I'm following what you mean by 'redacted'. The original proposal provided ' 1234' as the 'redacted' number, and I'm not seeing how you can get the rest of the number trivially with just equality and binary search. If you start with a complete number then you can get the system to tell you if it exists or not with a binary search or even just doing an equality check.

You seem to have something stronger in mind: only allow the equality operator on the redacted column, and nothing else.

That wasn't my suggestion - I was merely pointing out that if you have a complete number (perhaps by pulling out a random number, with a filter against the last four digits, reducing the search space to 10^12) which you want to check for existence, you can do that directly. No need for a binary search at all.

That might be better, although I'm not really convinced. There are just too many ways you could still leak the datum. Just a random example, inspired by the recent CRIME attack on SSL: build a row with the redacted datum, and another guess datum, and store it along with 1k of other data in a temporary table. The row gets toasted. Observe how much it compressed; if the guess datum is close to the original datum, it compresses well. Now, you can probably stop that particular attack with more restrictions on what you can do with the datum, but that just shows that pretty much any computation you allow with the datum can be used to reveal its value.

One concept I've been thinking about is a notion of 'trusted' data sources to allow comparison against. Perhaps individual values are allowed from the user also, but my thought is that you have:

master_table
trusted_table

Such that you can't view the sensitive column in either the master or the trusted table, but you can join between the two on the sensitive column and view other, non-sensitive, attributes of the two tables. You might even allow other transformations on the sensitive column, provided it always results in a boolean comparison to another sensitive column. Not sure if that really solves Simon's use-case exactly, but it might tease out other thoughts. Thanks! Stephen
Re: [HACKERS] Column Redaction
* Thom Brown (t...@linux.com) wrote: On 10 October 2014 12:45, Stephen Frost sfr...@snowman.net wrote: There's a difference between intending that there shouldn't be a way past security and just making access a matter of walking a longer route. Throwing random 16-digit numbers and associated information at a credit card processor could be viewed as walking a longer route too. The same goes for random key searches or password guesses. But those would need to be exhaustive, and in nearly all cases, impractical. That would be exactly the idea with this - we make it impractical to get at the unredacted information. Data such as plain credit card numbers stored in a column, even with all its data masked, would be easy to determine. I'm not as convinced of that as you are.. Though I'll point out that in the use-cases which I've been talking to users about, it isn't credit cards under discussion. Salted and hashed passwords, even with complete visibility of the value, aren't vulnerable to scrutiny of particular character values. If they were, no-one would use them. I wasn't suggesting otherwise, but I don't see it as particularly relevant to the discussion regardless. Thanks, Stephen
Re: [HACKERS] Column Redaction
Robert, * Robert Haas (robertmh...@gmail.com) wrote: On Fri, Oct 10, 2014 at 7:00 AM, Stephen Frost sfr...@snowman.net wrote: The discussion about looking up specific card numbers in the original email from Simon was actually an allowed use-case, as I understood it, not a risk concern. Indeed, if you know a valid credit card number already, as in this example, then why are you bothering with the search? Perhaps it would provide confirmation, but it's not the database's responsibility to make you forget the number you already have. Doing a random walk through a keyspace of 10^16 and extracting a significant enough number of results to be useful should be difficult. I agree that if we're completely unable to make it difficult then this is less useful, but I feel it's a bit early to jump to that conclusion. Thanks much for the laugh. :) You are obviously wearing your rose-colored glasses this morning. I predict a competent SQL programmer could write an SQL function, or client-side code, to pump the data out of the database using binary search in milliseconds per row. Clearly, if we're unable to prevent that, then this feature wouldn't be useful. What would be helpful is to consider what we could provide along these lines without allowing the data to be trivially recovered. Thanks! Stephen
Re: [HACKERS] Column Redaction
On 10 October 2014 15:56, Stephen Frost sfr...@snowman.net wrote: * Thom Brown (t...@linux.com) wrote: Data such as plain credit card numbers stored in a column, even with all its data masked, would be easy to determine. I'm not as convinced of that as you are.. Though I'll point out that in the use-cases which I've been talking to users about, it isn't credit cards under discussion. I think credit card numbers are a good example. If we're talking about format functions here, there has to be something in addition to that which determines permitted comparison operations. If not, and we were going to remove all but = operations, we'd effectively cripple the functionality of anything that's been formatted that wasn't intended as a security measure. It almost sounds like an extension to domains rather than column-level functionality. But then if operators such as <, > and ~~ aren't hindered, it sounds like no protection at all. Also, joining to foreign tables could be an issue, copying data to temporary tables could possibly remove any redaction, and materialised views would need to support it somehow. Although just because I can't picture how that would work, it's no indication that it couldn't. Salted and hashed passwords, even with complete visibility of the value, aren't vulnerable to scrutiny of particular character values. If they were, no-one would use them. I wasn't suggesting otherwise, but I don't see it as particularly relevant to the discussion regardless. I guess I was trying to illustrate that the security in a hashed password is acceptable because it requires exhaustive searching to break. If comparison operators worked on it, it would be broken out of the box. -- Thom
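The contrast Thom draws can be made concrete. The sketch below is illustrative Python, not from the thread; store and verify are hypothetical helpers. A salted, iterated hash supports nothing but equality — ordering two digests says nothing about how the plaintexts order — so an attacker is reduced to exhaustive guessing, with no O(log n) binary search available the way there is once comparison operators work on the raw value.

```python
import hashlib
import hmac
import os

def store(password: str) -> tuple[bytes, bytes]:
    # Salted, iterated hash: the stored value supports only an
    # equality test, never an ordering comparison.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify(salt: bytes, digest: bytes, attempt: str) -> bool:
    # Recompute with the same salt; constant-time equality is the
    # only meaningful operation on the stored digest.
    candidate = hashlib.pbkdf2_hmac("sha256", attempt.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)

salt, digest = store("correct horse battery")
ok = verify(salt, digest, "correct horse battery")
bad = verify(salt, digest, "correct horse batters")
# Digest byte order is unrelated to plaintext order, so a "<="-style
# probe against the stored value buys an attacker nothing.
```

Redacted-but-comparable card numbers sit at the opposite end: the value keeps its ordering semantics, which is precisely what makes the binary-search extraction elsewhere in this thread work.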
Re: [HACKERS] Column Redaction
On Fri, Oct 10, 2014 at 11:58 AM, Stephen Frost sfr...@snowman.net wrote: You are obviously wearing your rose-colored glasses this morning. I predict a competent SQL programmer could write an SQL function, or client-side code, to pump the data out of the database using binary search in milliseconds per row. Clearly, if we're unable to prevent that, then this feature wouldn't be useful. What would be helpful is to consider what we could provide along these lines without allowing the data to be trivially recovered. Joins are way too powerful to allow arbitrary joins to untrusted users. The only somewhat secure way is to allow administrators define which joins are possible, and untrusted users use those. You get that with views. I'm not sure you can allow more than that, and not have lots of leaks. Is there a use case where redaction is the only solution really? Nothing mentioned till now really is: * Transaction logs and blocked card lists can be joined against users and a view can be provided that includes the user, and not the credit card. So you can join freely between the views just fine, by user, and do all the analysis you need without exposing credit card numbers in any way, not even redacted. * If not users, you can join against a random but unique per card value generated at some point when the card is first inserted in the records, and you get a random token for the card. Still works, and can be done with triggers, and is far less leaky than the proposed redaction. * Credit card number verification is a leak on its own, but if you really want it, you can provide a function that does it. And I think it's perfectly reasonable that defining leaking functions has to be an admin thing. * Views can expose the redacted value just fine for direct use. A generically usable user-id or random-token to redacted number mapping view would provide all the freedom you could want. 
* Functions defined as leakproof (even when they're not, which is an admin decision to throw data safety out the window, but it's a possible decision), which allows fetching redacted columns that way from security barrier views. Is there anything not covered by the above that can be done by built-in redacting? If the answer is yes, then maybe the feature has value. If the feature's value is ease of use, I'd weight that with the security loss. False sense of security is a net security loss in most (if not all) cases. Having to flesh out the logic through security barrier views, leakproof redacting functions and triggers can have the good side-effect of making all the possible leaks obvious to the admin. On Fri, Oct 10, 2014 at 12:27 PM, Thom Brown t...@linux.com wrote: Also, joining to foreign tables could be an issue, copying data to temporary tables could possibly remove any redaction, materialised views would need to support it somehow. Although just because I can't picture how that would work, it's no indication that it couldn't. Well, that's why encryption is usually regulatorily required on credit card data. Way too many ways to leak, and way too valuable to expect lack of knowledgeable and motivated people trying to get them. On Fri, Oct 10, 2014 at 12:27 PM, Thom Brown t...@linux.com wrote: Salted and hashed passwords, even with complete visibility of the value, isn't vulnerable to scrutiny of particular character values. If it were, no-one would use it. I wasn't suggesting otherwise, but I don't see it as particularly relevant to the discussion regardless. I guess I was trying to illustrate that the security in a hashed password is acceptable because it requires exhaustive searching to break. If comparison operators worked on it, it would be broken out of the box. Lately, the security of password-based authentication is being put into question very often. So I wouldn't hold credit card numbers or any other sensible information to the password standard. 
But let's use the password example: it's widely accepted that holding onto cleartext passwords, or transmitting them or their plain hashes over any channel, is extremely bad practice. So redaction isn't good enough for passwords, and neither is salted hashing. The only generally accepted approach in the security community is a password proof in the context of a zero-knowledge password proof protocol[0]. You'd want something like that for any bit of info you need to join or compare but can't accept leaking. [0] http://en.wikipedia.org/wiki/Zero-knowledge_password_proof
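The random-token alternative proposed in the list above is easy to sketch outside the database. A minimal illustration (hypothetical names; Python standing in for a trigger-maintained card-to-token mapping table):

```python
import secrets

class CardTokenizer:
    """Sketch of the per-card random token idea: when a card is first
    seen, mint a random but unique token for it. Joins and equality
    tests work on the token while the real number stays hidden."""

    def __init__(self):
        self._tokens = {}

    def token_for(self, card_number):
        # Reuse the token minted when the card was first inserted.
        if card_number not in self._tokens:
            self._tokens[card_number] = secrets.token_hex(16)
        return self._tokens[card_number]

t = CardTokenizer()
a = t.token_for(4111111111111111)
b = t.token_for(4111111111111111)  # same card -> same token, so joins work
c = t.token_for(5500000000000004)  # different card -> unrelated token
```

Unlike a redacted or hashed column, the token reveals nothing about the number itself, yet analysis by card (grouping, joining across views) still works.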
Re: [HACKERS] Column Redaction
On Fri, Oct 10, 2014 at 10:56 AM, Stephen Frost sfr...@snowman.net wrote: * Thom Brown (t...@linux.com) wrote: On 10 October 2014 12:45, Stephen Frost sfr...@snowman.net wrote: There's a difference between intending that there shouldn't be a way past security and just making access a matter of walking a longer route. Throwing random 16-digit numbers and associated information at a credit card processor could be viewed as walking a longer route too. The same goes for random key searches or password guesses. But those would need to be exhaustive, and in nearly all cases, impractical. That would be exactly the idea with this - we make it impractical to get at the unredacted information. For fun I gave the search a try. create table cards (id serial, cc bigint); insert into cards (cc) SELECT CAST(random() * 10000000000000000 AS bigint) FROM generate_series(1,10000); \timing on WITH RECURSIVE t(id, range_min, range_max) AS ( SELECT id, 1::bigint, 10000000000000000::bigint FROM cards UNION ALL SELECT id , CASE WHEN cc <= range_avg THEN range_min ELSE range_avg + 1 END , CASE WHEN cc <= range_avg THEN range_avg ELSE range_max END FROM (SELECT id, (range_min + range_max) / 2 AS range_avg, range_min, range_max FROM t ) AS t_avg JOIN cards USING (id) WHERE range_min != range_max ) SELECT id, range_min AS cc FROM t WHERE range_min = range_max; On my laptop I can pull all 10,000 card numbers in less than 1 second. For a text-based item I don't imagine it would be much different. Numbers are pretty easy to work with though.
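The recursive CTE is just a bisection: each iteration halves the candidate range for every row using one comparison. The same attack in a few lines of Python makes the cost explicit — a 16-digit space falls in at most ~54 comparisons per row (the constants here are illustrative, matching the toy table above):

```python
def extract(leq_oracle, lo=1, hi=10**16):
    """Recover a hidden value given only answers to 'value <= x?',
    mirroring the CTE's range_min/range_max bisection."""
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if leq_oracle(mid):
            hi = mid          # value is in the lower half
        else:
            lo = mid + 1      # value is in the upper half
    return lo, probes

secret = 4532015112830366  # a made-up 16-digit number
value, probes = extract(lambda x: secret <= x)
```

Any interface that answers ordering (or even just equality, via repeated guessing over a small space) queries against the real value leaks it at this logarithmic rate, which is the crux of the objection to redaction.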
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-10 16:41:39 +0200, Andres Freund wrote: FWIW, the profile always looks like - 48.61% postgres postgres [.] s_lock - s_lock + 96.67% StrategyGetBuffer + 1.19% UnpinBuffer + 0.90% PinBuffer + 0.70% hash_search_with_hash_value + 3.11% postgres postgres [.] GetSnapshotData + 2.47% postgres postgres [.] StrategyGetBuffer + 1.93% postgres [kernel.kallsyms] [k] copy_user_generic_string + 1.28% postgres postgres [.] hash_search_with_hash_value - 1.27% postgres postgres [.] LWLockAttemptLock - LWLockAttemptLock - 97.78% LWLockAcquire + 38.76% ReadBuffer_common + 28.62% _bt_getbuf + 8.59% _bt_relandgetbuf + 6.25% GetSnapshotData + 5.93% VirtualXactLockTableInsert + 3.95% VirtualXactLockTableCleanup + 2.35% index_fetch_heap + 1.66% StartBufferIO + 1.56% LockReleaseAll + 1.55% _bt_next + 0.78% LockAcquireExtended + 1.47% _bt_next + 0.75% _bt_relandgetbuf to me. Now that's with the client count 496, but it's similar with lower counts. BTW, that profile *clearly* indicates we should make StrategyGetBuffer() smarter. Which is nearly trivial now that atomics are in. Check out the attached WIP patch which eliminates the spinlock from StrategyGetBuffer() unless there's buffers on the freelist. Test: pgbench -M prepared -P 5 -S -c 496 -j 496 -T 5000 on a scale=1000 database, with 4GB of shared buffers. Before: progress: 40.0 s, 136252.3 tps, lat 3.628 ms stddev 4.547 progress: 45.0 s, 135049.0 tps, lat 3.660 ms stddev 4.515 progress: 50.0 s, 135788.9 tps, lat 3.640 ms stddev 4.398 progress: 55.0 s, 135268.4 tps, lat 3.654 ms stddev 4.469 progress: 60.0 s, 134991.6 tps, lat 3.661 ms stddev 4.739 after: progress: 40.0 s, 207701.1 tps, lat 2.382 ms stddev 3.018 progress: 45.0 s, 208022.4 tps, lat 2.377 ms stddev 2.902 progress: 50.0 s, 209187.1 tps, lat 2.364 ms stddev 2.970 progress: 55.0 s, 206462.7 tps, lat 2.396 ms stddev 2.871 progress: 60.0 s, 210263.8 tps, lat 2.351 ms stddev 2.914 Yes, no kidding. 
The results are similar, but less extreme, for smaller client counts like 80 or 160. Amit, since your test seems to be currently completely bottlenecked within StrategyGetBuffer(), could you compare with that patch applied to HEAD and the LW_SHARED patch for one client count? That may allow us to see a more meaningful profile... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 6b486e5b467e94ab9297d7656a5b39b816c5c55a Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Fri, 10 Oct 2014 17:36:46 +0200 Subject: [PATCH] WIP: lockless StrategyGetBuffer hotpath --- src/backend/storage/buffer/freelist.c | 154 -- 1 file changed, 90 insertions(+), 64 deletions(-) diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c index 5966beb..0c634a0 100644 --- a/src/backend/storage/buffer/freelist.c +++ b/src/backend/storage/buffer/freelist.c @@ -18,6 +18,12 @@ #include "storage/buf_internals.h" #include "storage/bufmgr.h" +#include "port/atomics.h" + + +#define LATCHPTR_ACCESS_ONCE(var) ((Latch *)(*((volatile Latch **)&(var)))) +#define INT_ACCESS_ONCE(var) ((int)(*((volatile int *)&(var)))) + /* * The shared freelist control information. @@ -27,8 +33,12 @@ typedef struct /* Spinlock: protects the values below */ slock_t buffer_strategy_lock; - /* Clock sweep hand: index of next buffer to consider grabbing */ - int nextVictimBuffer; + /* + * Clock sweep hand: index of next buffer to consider grabbing. Note that + * this isn't a concrete buffer - we only ever increase the value. So, to + * get an actual buffer, it needs to be used modulo NBuffers. + */ + pg_atomic_uint32 nextVictimBuffer; int firstFreeBuffer; /* Head of list of unused buffers */ int lastFreeBuffer; /* Tail of list of unused buffers */ @@ -42,8 +52,8 @@ typedef struct * Statistics. These counters should be wide enough that they can't * overflow during a single bgwriter cycle. 
*/ - uint32 completePasses; /* Complete cycles of the clock sweep */ - uint32 numBufferAllocs; /* Buffers allocated since last reset */ + pg_atomic_uint32 completePasses; /* Complete cycles of the clock sweep */ + pg_atomic_uint32 numBufferAllocs; /* Buffers allocated since last reset */ /* * Notification latch, or NULL if none. See StrategyNotifyBgWriter. @@ -124,87 +134,107 @@ StrategyGetBuffer(BufferAccessStrategy strategy) return buf; } - /* Nope, so lock the freelist */ - SpinLockAcquire(&StrategyControl->buffer_strategy_lock); - /* * We count buffer allocation requests so that the bgwriter can estimate * the rate of buffer consumption.
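The heart of the patch is that nextVictimBuffer becomes a counter that only ever increases: the concrete buffer is the counter modulo NBuffers, and complete clock passes fall out of integer division, so the hot path needs no spinlock. A sketch of that arithmetic (Python's itertools.count standing in for pg_atomic_fetch_add_u32; the uint32 wraparound handling in the real patch is omitted):

```python
import itertools

NBUFFERS = 16

# Monotonically increasing "atomic" ticket counter; in the patch this is
# a fetch-and-add on StrategyControl->nextVictimBuffer.
next_victim = itertools.count()

def clock_sweep_victim():
    ticket = next(next_victim)
    buf = ticket % NBUFFERS               # actual buffer id on the ring
    complete_passes = ticket // NBUFFERS  # full sweeps of the clock so far
    return buf, complete_passes

# Two full sweeps of the ring, no lock taken anywhere on the hot path.
victims = [clock_sweep_victim()[0] for _ in range(NBUFFERS * 2)]
```

Because each caller gets a distinct ticket, concurrent backends never contend on a lock to advance the sweep hand, which is exactly what removes s_lock from the profile above.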
Re: [HACKERS] schema-only -n option in pg_restore fails
On Thu, Oct 9, 2014 at 6:19 PM, Josh Berkus j...@agliodbs.com wrote: On 10/09/2014 12:36 PM, Josh Berkus wrote: Summary: pg_restore -n attempts to restore objects to pg_catalog schema Versions Tested: 9.3.5, 9.3.0, 9.2.4 Explored this some with Andrew offlist. Turns out this is going to be a PITA to fix, so it should go on the big pile of TODOs for when we overhaul search_path. Here's what's happening under the hood: pg_restore generates this SQL text: SET search_path = schem_a, pg_catalog; CREATE TABLE tab_a ( test text ); Since schem_a doesn't exist, it's skipped over and pg_restore attempts to create the objects in pg_catalog. So this is Yet Another Issue caused by the ten meter tall tar baby which is search_path. So, my proposal for a resolution: 1) In current versions, patch the docs to explicitly say that -n does not create the schema, and that if the user doesn't create the schema pg_restore will fail. 2) Patch 9.5's pg_restore to do CREATE SCHEMA IF NOT EXISTS when -n is used. This will be 100% backwards-compatible with current behavior. I agree with this solution. Whenever I restore a schema from a dump I need to create the schemas beforehand, and it sucks. I'm working on the 2nd item [1] together with another friend (Sebastian, in cc) to introduce him to the PostgreSQL development process. We'll register it soon in the next commitfest. Regards, [1] https://github.com/fabriziomello/postgres/tree/create_schema_during_pg_restore_schema_only -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
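The failure mode boils down to the creation-target rule: the first schema in search_path that actually exists wins. A toy model of that resolution (a hypothetical helper, not server code) shows why the missing schema falls through silently rather than erroring:

```python
def resolve_creation_schema(search_path, existing_schemas):
    """The first existing schema in search_path is where CREATE TABLE
    lands; a missing schem_a silently falls through to pg_catalog."""
    for schema in search_path:
        if schema in existing_schemas:
            return schema
    return None

# pg_restore -n schem_a emits: SET search_path = schem_a, pg_catalog;
target = resolve_creation_schema(["schem_a", "pg_catalog"],
                                 {"pg_catalog", "public"})
```

Proposal (2) fixes this by making the schema exist before the SET runs, so resolution stops at schem_a instead of pg_catalog.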
Re: [HACKERS] [GENERAL] Understanding and implementing a GiST Index
I'd be ok with either SP-GiST or GiST, though there are tree structures (BK tree) that are particularly suited to this application that I /think/ map to GiST better than SP-GiST. Re: hamming in the RD-tree implementation: Where? I cloned the smlar git repo, and poked around, and it looks like it can only operate on set intersections, which is a non-starter for what I need to do. I spent a while looking through the source, and I didn't see anything that looked like hamming distance calculation (though I probably missed some stuff. The indirection through `FCall2()` makes things very hard to follow). Anyways, right now, smlar is not useful to me, because it completely ignores the structure of incoming arrays: postgres=# SELECT smlar( postgres(# '{1,0,0,0, 1,1,1,1, 1,1,1,0, 0,1,0,0}'::int[], postgres(# '{1,0,0,1, 0,1,0,0, 0,1,0,0, 0,1,1,0}' postgres(# ); smlar --- 1 (1 row) Two radically different hashes (shortened for example purposes) compare as identical. I spent some time seeing whether I could easily disable the array uniquification; by disabling the calls to `qsort_arg` and modifying `numOfIntersect` so it just counts the number of times the array elements do not match, I'm getting the behaviour I want out of `smlar()` at this point: postgres=# SELECT smlar( postgres(# '{1,0,0,0, 1,1,1,1, 1,1,1,0, 0,1,0,0}'::int[], postgres(# '{1,0,0,1, 0,1,0,0, 0,1,0,0, 0,1,1,0}' postgres(# ); smlar 0.4375 (1 row) postgres=# SELECT smlar( '{0,1,1,0}'::int[], '{1,0,1,0}' ); smlar --- 0.5 (1 row) But the index does not seem to work after the changes I made, and the debugging print statements (well, `elog()` statements) I inserted into `cmpArrayElem()` and `numOfIntersect()` are not being hit, so I don't understand how the index is even being built. Anyways, I'm rambling a bit. Any input would be great. Thanks, Connor On 10/9/2014 8:11 AM, Oleg Bartunov wrote: Just a quick recommendation. You need to ask the -hackers mailing list for that. 
Also, do you want GiST or SP-GiST? We already use hamming distance in the rd-tree, which is implemented in GiST; see our presentation http://www.sai.msu.su/~megera/postgres/talks/pgcon-2012.pdf, for example. Oleg On Thu, Oct 9, 2014 at 11:09 AM, Connor Wolf w...@imaginaryindustries.com wrote: Hi there! I'm trying to implement a custom indexing system using the GiST index framework, and I have a few questions. Basically, I'm trying to implement a system that allows me to search across a set of 64 bit integers by hamming distance. This is intended to be used in searching for similar images, where the 64 bit integer is actually a phash http://www.hackerfactor.com/blog/?/archives/432-Looks-Like-It.html of an image, and similarity between two images is reflected in the hamming distance between two integers. Anyways, the appropriate approach here is to use something called a BK tree http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees, for which I've written some test code https://github.com/fake-name/MangaCMS/blob/master/deduplicator/hamDb.py#L27 and I think I have a decent grip on (my test code seems to work, in any event). That said: * Is there a /simple/ piece of example code for implementing a GiST index for a single type? I've been looking through the /contrib/ directory, particularly the /contrib/btree_gist/ files, but it looks like 90% of the complexity there is related to supporting all the column types Postgres has, rather than actually being tied to producing a functional index. * Once I have something compiling, how can I check to be sure that I'm actually using the indexing module I created? I can enable my compiled extension using `CREATE EXTENSION`, but is there an option for `CREATE INDEX test_index ON testing USING gist (val);` that lets me specify *which* GiST index is actually employed? Is this even a valid question? 
This is probably something that's available in one of the system tables somewhere, but I haven't had much luck with Google finding out where. * Testing: What's the appropriate way to examine the generated tree structure of the index? I certainly went through a few bugs with my test tree system (and that was in Python!). Is there any way to examine the index structure for debugging purposes? * Also, it looks like `ereport()` is the proper way to emit debugging information. Is this correct? * In that vein, is there any way to get information on the operations of an entire query? Looking at the number of tree nodes touched for a scan would be nice (and I would not be surprised if there is
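Both halves of this thread — the positional mismatch score Connor coaxed out of smlar(), and the BK tree he wants to map onto (SP-)GiST — can be sketched in a few lines of Python (illustrative code in the spirit of his linked test script, not any PostgreSQL API; the sample phashes are made up):

```python
def hamming(a, b):
    """Hamming distance between two 64-bit phash integers."""
    return bin(a ^ b).count("1")

def mismatch_fraction(a, b):
    """Positional disagreement rate: the behaviour wanted from smlar()."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

class BKNode:
    """Minimal BK tree: children are keyed by distance to the node value."""
    def __init__(self, value):
        self.value = value
        self.children = {}

    def insert(self, value):
        d = hamming(self.value, value)
        if d in self.children:
            self.children[d].insert(value)
        else:
            self.children[d] = BKNode(value)

    def search(self, target, radius, hits):
        d = hamming(self.value, target)
        if d <= radius:
            hits.append(self.value)
        # Triangle inequality: only subtrees keyed within d +/- radius can
        # contain matches -- the pruning that makes the index worthwhile.
        for dist, child in self.children.items():
            if d - radius <= dist <= d + radius:
                child.search(target, radius, hits)

h1 = [1,0,0,0, 1,1,1,1, 1,1,1,0, 0,1,0,0]
h2 = [1,0,0,1, 0,1,0,0, 0,1,0,0, 0,1,1,0]

root = BKNode(0xF0F0)
for v in (0xF0F1, 0x0F0F, 0xF0FF):
    root.insert(v)
hits = []
root.search(0xF0F0, 2, hits)   # neighbours within hamming distance 2
```

mismatch_fraction reproduces the 0.4375 and 0.5 values from the modified smlar() session above, and the radius search visits only the subtrees the triangle inequality cannot rule out.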
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Wed, Oct 8, 2014 at 5:01 PM, Kevin Grittner kgri...@ymail.com wrote: Peter Geoghegan p...@heroku.com wrote: On Mon, Oct 6, 2014 at 8:35 AM, Robert Haas robertmh...@gmail.com wrote: I think the problem is that it's not possible to respect the usual guarantees even at READ COMMITTED level when performing an INSERT OR UPDATE operation (however spelled). You may find that there's a tuple with the same PK which is committed but not visible to the snapshot you took at the beginning of the statement. Can you please comment on this, Kevin? It would be nice to converge on an agreement on syntax here Robert said however spelled -- which I take to mean that he at least sees that the MERGE-like UPSERT syntax can be turned into the desired semantics. I have no idea why anyone would think otherwise. Although the last go-around does suggest that there is at least one point of difference on the semantics. You seem to want to fire the BEFORE INSERT triggers before determining whether this will be an INSERT or an UPDATE. OK, I have a comment on this. Anything we do about triggers will by definition be novel. Right now, we have INSERT, UPDATE, and DELETE. If we add a new operation, whether it's called UPSERT or MERGE or FROB, or if we add a flag to insert that makes it possibly do something other than inserting (e.g. INSERT OR UPDATE), or if we add a flag to update that makes it do something other than updating (e.g. UPDATE or INSERT), then some people's triggers are going to get broken by that change no matter how we do it. When the new operation is invoked, we can fire the insert triggers, the update triggers, some new kind of trigger, or no trigger at all - and any decision we make there will not in all cases be backward-compatible. We can try to minimize the damage (and we probably should) but we can't make it go to zero. 
INSERT INTO targettable(key, quantity, inserted_at) VALUES(123, quantity, now()) ON CONFLICT WITHIN targettable_pkey UPDATE SET quantity = quantity + CONFLICTING(quantity), updated_at = now(); I actually like this syntax reasonably well in some ways, but I don't like that we're mentioning the index name, and the CONFLICTING() notation is decidedly odd. Maybe something like this: INSERT INTO targettable(key, quantity, inserted_at) VALUES(123, quantity, now()) ON DUPLICATE (key) UPDATE SET quantity = quantity + OLD.quantity, updated_at = now(); (Perhaps OLD should reference the tuple already in the table, and NEW the value from the VALUES clause. That would be snazzy indeed.) Also, how about making the SET clause optional, with the semantics that we just update all of the fields for which a value is explicitly specified: INSERT INTO overwrite_with_abandon (key, value) VALUES (42, 'meaning of life') ON DUPLICATE (key) UPDATE; While the ability to specify a SET clause there explicitly is useful, I bet a lot of key-value store users would love the heck out of a syntax that let them omit it when they want to overwrite. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
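The semantics Robert sketches — an explicit SET with an OLD reference, or SET omitted meaning "overwrite exactly the columns the INSERT named" — can be pinned down with a plain dict store (a hypothetical model of the proposed behaviour, not PostgreSQL code):

```python
def upsert(table, key, row, set_clause=None):
    """table: dict keyed by the conflict key; row: the VALUES tuple as a
    dict; set_clause(old, new): explicit SET; None: bare ON DUPLICATE
    UPDATE, overwriting only the columns named in the INSERT."""
    k = row[key]
    if k not in table:
        table[k] = dict(row)                        # plain INSERT
    elif set_clause is not None:
        table[k].update(set_clause(table[k], row))  # ON DUPLICATE ... SET
    else:
        table[k].update(row)                        # overwrite named columns

store = {}
# The overwrite_with_abandon case: bare UPDATE replaces the named columns.
upsert(store, "key", {"key": 42, "value": "meaning of life"})
upsert(store, "key", {"key": 42, "value": "overwritten"})
# The accumulating case: explicit SET can reference OLD (the stored row).
upsert(store, "key", {"key": 123, "quantity": 1, "inserted_at": "t0"})
upsert(store, "key", {"key": 123, "quantity": 5, "inserted_at": "t1"},
       set_clause=lambda old, new: {"quantity": old["quantity"] + new["quantity"],
                                    "updated_at": "t1"})
```

Note that with an explicit SET, columns not mentioned in it (inserted_at here) keep their stored values — which is what distinguishes this from MySQL's delete-then-insert REPLACE.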
Re: [HACKERS] Column Redaction
Thom Brown t...@linux.com wrote: On 10 October 2014 15:56, Stephen Frost sfr...@snowman.net wrote: Thom Brown (t...@linux.com) wrote: Data such as plain credit card numbers stored in a column, even with all its data masked, would be easy to determine. I'm not as convinced of that as you are.. Though I'll point out that in the use-cases which I've been talking to users about, it isn't credit cards under discussion. I think credit card numbers are a good example. I'm not so sure. Aren't credit card numbers generally required by law to be stored in an encrypted form? If we're talking about format functions here, there has to be something in addition to that which determines permitted comparison operations. If not, and we were going to remove all but = operations, we'd effectively cripple the functionality of anything that's been formatted that wasn't intended as a security measure. It almost sounds like an extension to domains rather than column-level functionality. I have to say that my first thought was that format functions associated with types with domain override would be a very nice capability. But I don't see where that has much to do with security. I have seen many places where redaction is necessary (and in fact done), but I don't see how that could be addressed by what Simon is proposing. Perhaps I'm missing something; if so, a more concrete exposition of a use case might allow things to click. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins
Hi! On 9.10.2014 22:28, Kevin Grittner wrote: Tomas Vondra t...@fuzzy.cz wrote: The only case I've been able to come up with is when the hash table fits into work_mem only thanks to not counting the buckets. The new code will start batching in this case. Hmm. If you look at the timings in my posting from 2014-10-01 I had cases where the patched version started with one batch and went to two, while the unpatched used just one batch, and the patched version was still more than twice as fast. I'm sure the on-disk batch was fully cached, however; it might work out differently if disk speed actually came into the picture. I think the case with a large outer table and memory pressure, forcing the batches to actually be written to disk (instead of just sitting in the page cache), is the main concern here. That is mostly luck, however, because it depends on the cardinality estimates, and the best way to fix it is increasing work_mem (which is safer thanks to reducing the overhead). Also, Robert proposed a way to mitigate this: if we realize we'd have to do batching during the initial sizing, we can peek at whether reducing the number of buckets (to 1/2 or maybe 1/4) would help. I believe this is a good approach, and will look into that after pgconf.eu (i.e. early November), unless someone else is interested. Sure, but it would be good to confirm that it's actually needed first. Yeah. But with a cleverly crafted test case it's possible to confirm almost everything ;-) When given a generous work_mem setting the patched version often uses more of what it is allowed than the unpatched version (which is probably one of the reasons it tends to do better). If someone has set a high work_mem and has gotten by only because the configured amount is not all being used when it would benefit performance, they may need to adjust work_mem down to avoid memory problems. That doesn't seem to me to be a reason to reject the patch. I'm not entirely sure I understand this paragraph. 
What do you mean by configured amount is not all being used when it would benefit performance? Can you give an example? Again, the earlier post showed that while the unpatched used 3516kB whether it had work_mem set to 4MB or 1GB, the patched version used 3073kB when work_mem was set to 4MB and 4540kB when work_mem was set to 1GB. The extra memory allowed the patched version to stay at 1 batch, improving performance over the other setting. OK, so with 4MB the new version was batching, while the original code keeps a single batch (likely due to not counting buckets into work_mem). I believe that's expected behavior. Also, in the e-mail from Oct 2 that got lost [1], I pointed out that by testing only with small hash tables the results are likely influenced by high CPU cache hit ratio. I think benchmarking with larger inner relation (thus increasing the memory occupied by the hash table) might be interesting. [1] http://www.postgresql.org/message-id/c84680e818f6a0f4a26223cd750ff252.squir...@2.emaily.eu The only thing I can think of is the batching behavior described above. This is in Waiting on Author status only because I never got an answer about why the debug code used printf() rather the elog() at a DEBUG level. Other than that, I would say this patch is Ready for Committer. Tomas? You there? I think I responded to that on October 2, quoting: ... Ah, I never received that email. That tends to happen every now and then. :-( I believe it's safe to switch the logging to elog(). IMHO the printf logging is there from some very early version of the code, before elog was introduced. Or something like that. I guess it might be best to remain consistent in this patch and change that in a separate patch. I just wanted to make sure you didn't see any reason not to do so. OK, I'll put that on my TODO list for the next CF. With that addressed I will move this to Ready for Committer. 
Since both Heikki and Robert spent time on this patch earlier, I'll give either of them a shot at committing it if they want; otherwise I'll do it. Great, thanks! Tomas
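The batching behaviour discussed above — the unpatched code not charging the bucket array against work_mem, the patched code counting it and therefore sometimes going from one batch to two — can be caricatured in a few lines (made-up sizes and a crude 1-bucket-per-tuple model, not the planner's real cost accounting):

```python
def choose_nbatch(ntuples, tuple_bytes, bucket_bytes, work_mem,
                  count_buckets):
    """Double nbatch until one batch's hash table fits in work_mem.
    count_buckets toggles whether the bucket array is charged to the
    budget (the patched behaviour) or ignored (the old behaviour)."""
    nbatch = 1
    while True:
        size = (ntuples / nbatch) * tuple_bytes
        if count_buckets:
            size += (ntuples / nbatch) * bucket_bytes  # ~1 bucket per tuple
        if size <= work_mem:
            return nbatch
        nbatch *= 2

work_mem = 4 * 1024 * 1024  # 4MB
old = choose_nbatch(100_000, 40, 8, work_mem, count_buckets=False)
new = choose_nbatch(100_000, 40, 8, work_mem, count_buckets=True)
```

With these illustrative numbers the tuples alone just fit, so the old accounting stays at one batch while the bucket-aware accounting doubles to two — the borderline case Kevin's timings exercised.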
Re: [HACKERS] Column Redaction
On Fri, Oct 10, 2014 at 02:05:05PM +0100, Thom Brown wrote: On 10 October 2014 13:43, Simon Riggs si...@2ndquadrant.com wrote: On 10 October 2014 11:45, Thom Brown t...@linux.com wrote: To be honest, this all sounds rather flaky. To be honest, suggesting anything at all is rather difficult and I recommend people try it. I have, and most ideas I've had have been justifiably shot down or picked apart (scheduled background tasks, offloading stats collection to standby, index maintenance in DML query plans, expression statistics... to name but a few). Everything sounds crap when you didn't think of it and you've given it an hour's thought. I'm not sure that means my concerns aren't valid. I don't think it sounds crap, but I also can't see any use-case for it where we don't already have things covered, or where it's going to offer any useful level of security. Like with RLS, it may be that I'm just looking at things from the wrong perspective. Agreed. The problem isn't giving it only an hour's thought --- it is that we can come up with serious problems in five _seconds_ of thought. Unless you can come up with a solution to those issues, I am not sure why we are even talking about it. My other concern is you must have realized these issues in five seconds too, so why didn't you mention them? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 10, 2014 at 11:05 AM, Robert Haas robertmh...@gmail.com wrote: Anything we do about triggers will by definition be novel. Right now, we have INSERT, UPDATE, and DELETE. If we add a new operation, whether it's called UPSERT or MERGE or FROB, or if we add a flag to insert that makes it possibly do something other than inserting (e.g. INSERT OR UPDATE), or if we add a flag to update that makes it do something other than updating (e.g. UPDATE or INSERT), then some people's triggers are going to get broken by that change no matter how we do it. When the new operation is invoked, we can fire the insert triggers, the update triggers, some new kind of trigger, or no trigger at all - and any decision we make there will not in all cases be backward-compatible. We can try to minimize the damage (and we probably should) but we can't make it go to zero. +1. It's inevitable that someone isn't going to be entirely happy with whatever we do. Let's be realistic about that, while minimizing the risk. I actually like this syntax reasonably well in some ways, but I don't like that we're mentioning the index name, and the CONFLICTING() notation is decidedly odd. Maybe something like this: People keep remarking that they don't like that you can (optionally) name a unique index explicitly, and I keep telling them why I've done it that way [1]. There is a trade-off here. I am willing to go another way in that trade-off, but let's have a realistic conversation about it. Also, how about making the SET clause optional, with the semantics that we just update all of the fields for which a value is explicitly specified: INSERT INTO overwrite_with_abandon (key, value) VALUES (42, 'meaning of life') ON DUPLICATE (key) UPDATE; While the ability to specify a SET clause there explicitly is useful, I bet a lot of key-value store users would love the heck out of a syntax that let them omit it when they want to overwrite. While I initially like that idea, now I'm not so sure about it [2]. 
MySQL doesn't allow this (with ON DUPLICATE KEY UPDATE). Their REPLACE command allows it, since it is defined as a DELETE followed by an INSERT (or something like that), but I think that in general that feature has been a disaster for them. [1] http://www.postgresql.org/message-id/CAM3SWZQpGf6_ME6D1vWqdCFadD7Nprdx2JrE=Hcf=93kxeh...@mail.gmail.com [2] http://www.postgresql.org/message-id/CAM3SWZT=vxbj7qkaidamybu40ap10udsqooqhvix3ykj7wb...@mail.gmail.com -- Peter Geoghegan
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
Robert Haas robertmh...@gmail.com wrote: Anything we do about triggers will by definition be novel. Right now, we have INSERT, UPDATE, and DELETE. If we add a new operation, whether it's called UPSERT or MERGE or FROB, [ ... ] If we call it MERGE, then we had better follow the rules the standard lays out for triggers on a MERGE statement -- which is that UPDATE triggers fire for rows updated and INSERT triggers fire for rows inserted. That kinda makes sense, since the MERGE command is almost like a row-by-row CASE statement allowing one or the other. If we make up our own command verb, we are free to do as we like. Maybe something like this: INSERT INTO targettable(key, quantity, inserted_at) VALUES(123, quantity, now()) ON DUPLICATE (key) UPDATE SET quantity = quantity + OLD.quantity, updated_at = now(); That seems a lot cleaner than the proposal on the Wiki page. If we go that route, it makes sense to fire the BEFORE INSERT triggers before attempting the insert and then fire BEFORE UPDATE triggers before attempting the UPDATE. If either succeeds, I think we should fire the corresponding AFTER triggers. We already allow a BEFORE trigger to run and then omit the triggering operation without an error, so I don't see that as a problem. This makes a lot more sense to me than attempting to add a new UPSERT trigger type. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 10, 2014 at 11:30 AM, Kevin Grittner kgri...@ymail.com wrote: That seems a lot cleaner than the proposal on the Wiki page. If we go that route, it makes sense to fire the BEFORE INSERT triggers before attempting the insert and then fire BEFORE UPDATE triggers before attempting the UPDATE. If either succeeds, I think we should fire the corresponding AFTER triggers. We already allow a BEFORE trigger to run and then omit the triggering operation without an error, so I don't see that as a problem. This makes a lot more sense to me than attempting to add a new UPSERT trigger type. You realize that that's exactly what my patch does, right? AFAICT the only confusion that my patch has is with statement-level triggers, as discussed on the Wiki page. I think that the row-level trigger behavior is fine; a lot more thought has gone into it. -- Peter Geoghegan
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
Peter Geoghegan p...@heroku.com wrote: On Fri, Oct 10, 2014 at 11:05 AM, Robert Haas robertmh...@gmail.com wrote: I actually like this syntax reasonably well in some ways, but I don't like that we're mentioning the index name, and the CONFLICTING() notation is decidedly odd. People keep remarking that they don't like that you can (optionally) name a unique index explicitly, and I keep telling them why I've done it that way [1]. There is a trade-off here. I am willing to go another way in that trade-off, but let's have a realistic conversation about it. We've all read that, and your repeated arguments for that point of view. We disagree and have said why. What in that is not a realistic conversation? To restate: to do so is conflating the logical definition of the database with a particular implementation detail. As just one reason that is a bad idea: we can look up unique indexes on the specified columns, but if we implement other storage techniques where there is no such thing as a unique index on the columns, yet manage to duplicate the semantics (yes, stranger things have happened), people can't migrate to the new structure without rewriting their queries. If the syntax references logical details (like column names) there is no need to rewrite. We don't want to be painted into a corner. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 10, 2014 at 11:38 AM, Peter Geoghegan p...@heroku.com wrote: That seems a lot cleaner than the proposal on the Wiki page. If we go that route, it makes sense to fire the BEFORE INSERT triggers before attempting the insert and then fire BEFORE UPDATE triggers before attempting the UPDATE. By the way, there is no problem with failing to UPDATE, because we lock rows ahead of the UPDATE. Once a row is locked, the UPDATE cannot conflict. There is no danger of BEFORE UPDATE row-level triggers firing without the row then being updated (unless the xact aborts, but you know what I mean). In general, there is no danger of triggers firing more often than you might think they should, with the sole exception of the fact that we always fire before-insert row-level triggers, even when an UPDATE is the ultimate outcome. Handling conflicts (e.g. concurrent insertions with promise tuples) necessitates a restart, but only back to the point just after the before-insert row-level triggers fire, and before any other triggers have had the opportunity to fire. -- Peter Geoghegan
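The row-level firing order described above can be written down as a simple trace (a model of the described semantics, not the patch itself):

```python
def upsert_row_triggers(conflict):
    """Row-level trigger trace for one upserted row: BEFORE INSERT always
    fires first; only the path actually taken fires the rest, and because
    the conflicting row is locked before BEFORE UPDATE fires, the UPDATE
    itself cannot subsequently fail due to concurrency."""
    fired = ["BEFORE INSERT"]
    if conflict:
        fired += ["BEFORE UPDATE", "AFTER UPDATE"]
    else:
        fired += ["AFTER INSERT"]
    return fired
```

The asymmetry — BEFORE INSERT fires even on the UPDATE path, but no UPDATE trigger ever fires without an actual update — is the one novelty existing trigger authors would need to know about.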
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 10, 2014 at 11:44 AM, Kevin Grittner kgri...@ymail.com wrote: We've all read that, and your repeated arguments for that point of view. We disagree and have said why. What in that is not a realistic conversation? Because, as I've said many times, there are problems with naming columns directly that need to be addressed. There are corner-cases. Are you going to complain about those once I go implement something? Let's talk about that. I think we can just throw an error in these edge-cases, but let's decide if we're comfortable with that. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
Peter Geoghegan p...@heroku.com wrote: On Fri, Oct 10, 2014 at 11:30 AM, Kevin Grittner kgri...@ymail.com wrote: That seems a lot cleaner than the proposal on the Wiki page. If we go that route, it makes sense to fire the BEFORE INSERT triggers before attempting the insert and then fire BEFORE UPDATE triggers before attempting the UPDATE. If either succeeds, I think we should fire the corresponding AFTER triggers. We already allow a BEFORE trigger to run and then omit the triggering operation without an error, so I don't see that as a problem. This makes a lot more sense to me than attempting to add a new UPSERT trigger type. You realize that that's exactly what my patch does, right? I haven't read the patch, but the discussion has made that clear, yes. Try to take agreement on a point gracefully, will ya? ;-) AFAICT the only confusion that my patch has is with statement-level triggers, as discussed on the Wiki page. I think that the row-level trigger behavior is fine; a lot more thought has gone into it. IMV it is clear that since the standard says that update or insert operations which affect zero rows fire the corresponding trigger, the statement-level triggers for both should fire for an UPSERT, even if the set of affected rows for one or the other (or both!) is zero. If we ever get transition relations, it will be easy to check the count of affected rows or otherwise behave appropriately based on what was done. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
Peter Geoghegan p...@heroku.com wrote: There is no danger of UPDATE before row-level triggers firing without then updating (unless the xact aborts, but you know what I mean). Well, let's make sure I do know what you mean. If a BEFORE UPDATE ... FOR EACH ROW trigger returns NULL, the UPDATE will be skipped without error, right? The BEFORE UPDATE triggers will run before the UPDATE if a duplicate is found, and the return value will be treated in the usual manner, replacing values and potentially skipping the update? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 10, 2014 at 2:29 PM, Peter Geoghegan p...@heroku.com wrote: People keep remarking that they don't like that you can (optionally) name a unique index explicitly, and I keep telling them why I've done it that way [1]. There is a trade-off here. I am willing to go another way in that trade-off, but let's have a realistic conversation about it. I think what's realistic here is that the patch isn't going to get committed the way you have it today, because nobody else likes that design. That may be harsh, but I think it's accurate. But on the substance of the issue, I'm totally unconvinced by your argument in the linked email. Let's suppose you have this: create table foo (a int, b int); create unique index foo1 on foo (a); create unique index foo2 on foo (a); create unique index foo3 on foo (a) where b = 1; Anything that conflicts in foo1 or foo2 is going to conflict in the other one, and with foo3. Anything that conflicts in foo3 is perhaps also going to conflict in foo1 and foo2, or perhaps not. Yet, if the statement simply says ON DUPLICATE (a) UPDATE, it's entirely clear what to do: when we hit a conflict in any of those three indexes, update the row. No matter which index has the conflict, updating is the right answer, and solves the conflict in all three indexes. I dunno what you'd expect someone to write in your syntax to solve this problem - presumably either (1) they can list any of the indexes that might conflict, (2) they must list all of the indexes that might conflict, or (3) it just doesn't work. Whatever the actual behavior of your patch is today, it seems to me that the conflict is, fundamentally, over a set of column values, NOT over a particular index. The index is simply a mechanism for noticing that conflict. If you want to allow this to work with expression indexes, that's not really a problem; you can let the user write INSERT .. ON CONFLICT (upper(foo)) UPDATE ... and match that to the index. 
It wouldn't bother me if the first version of this feature couldn't target expression indexes anyway, but if you want to include that, having the user mention the actual expression rather than the index name shouldn't make much difference. Also, how about making the SET clause optional, with the semantics that we just update all of the fields for which a value is explicitly specified: INSERT INTO overwrite_with_abandon (key, value) VALUES (42, 'meaning of life') ON DUPLICATE (key) UPDATE; While the ability to specify a SET clause there explicitly is useful, I bet a lot of key-value store users would love the heck out of a syntax that let them omit it when they want to overwrite. While I initially liked that idea, now I'm not so sure about it [2]. MySQL doesn't allow this (with ON DUPLICATE KEY UPDATE). Their REPLACE command allows it, since it is defined as a DELETE followed by an INSERT (or something like that), but I think that in general that feature has been a disaster for them. Your syntax allows the exact same thing; it simply requires the user to be more verbose in order to get that behavior. If we think that people wanting that behavior will be rare, then it's fine to require them to spell it out when they want it. If we think it will be the overwhelmingly common application of this feature, and I do, then making people spell it out when we could just as well infer it is pointless. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
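[Editor's note] The "insert, or update on a duplicate key" semantics debated above can be exercised concretely. The sketch below uses SQLite's ON CONFLICT ... DO UPDATE clause via Python's sqlite3 module (assuming a bundled SQLite of 3.24 or newer); the syntax is not what PostgreSQL would necessarily adopt, and the table name is taken from Robert's hypothetical example, but the row-routing behavior is the one under discussion:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE overwrite_with_abandon (key INTEGER PRIMARY KEY, value TEXT)"
)

# SQLite's upsert form: the conflict target names a unique column, and the
# SET clause is spelled out explicitly (Robert's proposal is to infer it).
upsert = (
    "INSERT INTO overwrite_with_abandon (key, value) VALUES (?, ?) "
    "ON CONFLICT (key) DO UPDATE SET value = excluded.value"
)

# First execution: no conflict, so this is a plain INSERT.
conn.execute(upsert, (42, "meaning of life"))
# Second execution: key 42 already exists, so the UPDATE arm fires
# instead of raising a unique-violation error.
conn.execute(upsert, (42, "overwritten"))

rows = conn.execute("SELECT key, value FROM overwrite_with_abandon").fetchall()
print(rows)  # [(42, 'overwritten')] -- exactly one row survives
```

Note that the SET clause here is explicit; the question in the thread is whether omitting it should implicitly mean "SET every column that the INSERT supplied."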
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
Peter Geoghegan p...@heroku.com wrote: On Fri, Oct 10, 2014 at 11:44 AM, Kevin Grittner kgri...@ymail.com wrote: [discussion on committers rejecting the notion of a syntax involving specification of an index name] as I've said many times, there are problems with naming columns directly that need to be addressed. There are corner-cases. Are you going to complain about those once I go implement something? If I don't like what I see, yes. I think it will be entirely possible for you to write something along those lines that I won't complain about. I think we can just throw an error in these edge-cases, but let's decide if we're comfortable with that. If the list of columns does not allow a choice of a suitable full-table unique index on only non-null columns, an error seems appropriate. What other problem cases do you see? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 10, 2014 at 11:55 AM, Kevin Grittner kgri...@ymail.com wrote: I haven't read the patch, but the discussion has made that clear, yes. Try to take agreement on a point gracefully, will ya? ;-) Heh, sorry. I did literally mean what I said - it wasn't 100% clear to me that you got that. It's safest to not make any assumptions like that. AFAICT the only confusion that my patch has is with statement-level triggers, as discussed on the Wiki page. I think that the row-level trigger behavior is fine; a lot more thought has gone into it. IMV it is clear that since the standard says that update or insert operations which affect zero rows fire the corresponding trigger, the statement level triggers for both should fire for an UPSERT, even if the set of affected rows for one or the other (or both!) is zero. If we ever get transition relations, it will be easy to check the count of affected rows or otherwise behave appropriately based on what was done. I think that you're probably right about that. Note that UPDATE rules will not be applied to the UPDATE portion of the statement. Not sure what to do about that, but I'm pretty sure that it's inevitable, because the UPDATE is effectively the same top-level statement for the purposes of rules. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 10, 2014 at 12:04 PM, Kevin Grittner kgri...@ymail.com wrote: Peter Geoghegan p...@heroku.com wrote: There is no danger of UPDATE before row-level triggers firing without then updating (unless the xact aborts, but you know what I mean). Well, let's make sure I do know what you mean. If a BEFORE UPDATE ... FOR EACH ROW trigger returns NULL, the UPDATE will be skipped without error, right? The BEFORE UPDATE triggers will run before the UPDATE if a duplicate is found, and the return value will be treated in the usual manner, replacing values and potentially skipping the update? That's exactly right, yes (in particular, you can return NULL from before row update triggers and have that cancel the update in the usual manner). And potentially, the final value could be affected by both the before row insert and before row update triggers (by way of CONFLICTING()). That's the most surprising aspect of what I've done (although I would argue it's the least surprising behavior possible given my constraints). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 10, 2014 at 12:09 PM, Robert Haas robertmh...@gmail.com wrote: I think what's realistic here is that the patch isn't going to get committed the way you have it today, because nobody else likes that design. That may be harsh, but I think it's accurate. I do think that's harsh, because it's unnecessary: I am looking for the best design. If you want to propose alternatives, great, but there is a reason why I've done things that way, and that should be acknowledged. I too think naming the unique index is ugly as sin, and have said as much multiple times. We're almost on the same page here. But on the substance of the issue, I'm totally unconvinced by your argument in the linked email. Let's suppose you have this: create table foo (a int, b int); create unique index foo1 on foo (a); create unique index foo2 on foo (a); create unique index foo3 on foo (a) where b = 1; Anything that conflicts in foo1 or foo2 is going to conflict in the other one, and with foo3. Anything that conflicts in foo3 is perhaps also going to conflict in foo1 and foo2, or perhaps not. Yet, if the statement simply says ON DUPLICATE (a) UPDATE, it's entirely clear what to do: when we hit a conflict in any of those three indexes, update the row. No matter which index has the conflict, updating is the right answer, and solves the conflict in all three indexes. I dunno what you'd expect someone to write in your syntax to solve this problem - presumably either (1) they can list any of the indexes that might conflict, (2) they must list all of the indexes that might conflict, or (3) it just doesn't work. I expect them to name exactly one unique index. They should either do that explicitly, or have one in mind and make the behavior implicit at the risk of miscalculating (and having a surprising outcome). It doesn't matter if it's foo1 or foo2 in this example (but foo3 is different, obviously). 
Whatever the actual behavior of your patch is today, it seems to me that the conflict is, fundamentally, over a set of column values, NOT over a particular index. The index is simply a mechanism for noticing that conflict. I think that this is the kernel of our disagreement. The index is not simply a mechanism for noticing that conflict. The user had better have one unique index in mind to provoke taking the alternative path in the event of a would-be unique violation. Clearly it doesn't matter much in this particular example. But what if there were two partial unique indexes, that were actually distinct, but only in terms of the constant appearing in their predicate? And what about having that changed by a before insert row-level trigger? Are we to defer deciding which unique index to use at the last possible minute? As always with this stuff, the devil is in the details. If we work out the details, then I can come up with something that's acceptable to everyone. Would you be okay with this never working with partial unique indexes? That gives me something to work with. If you want to allow this to work with expression indexes, that's not really a problem; you can let the user write INSERT .. ON CONFLICT (upper(foo)) UPDATE ... and match that to the index. It wouldn't bother me if the first version of this feature couldn't target expression indexes anyway, but if you want to include that, having the user mention the actual expression rather than the index name shouldn't make much difference. I'm not that worried about expression indexes, actually. I'm mostly worried about partial unique indexes, particularly when before insert row-level triggers are in play (making *that* play nice with expression indexes is harder still, but expression indexes on their own are probably not that much of a problem). 
Also, how about making the SET clause optional, with the semantics that we just update all of the fields for which a value is explicitly specified: INSERT INTO overwrite_with_abandon (key, value) VALUES (42, 'meaning of life') ON DUPLICATE (key) UPDATE; Your syntax allows the exact same thing; it simply require the user to be more verbose in order to get that behavior. If we think that people wanting that behavior will be rare, then it's fine to require them to spell it out when they want it. If we think it will be the overwhelming common application of this feature, and I do, then making people spell it out when we could just as well infer it is pointless. Did you consider my example? I think that people will like this idea, too - that clearly isn't the only consideration, though. As you say, it would be very easy to implement this. However, IMV, we shouldn't, because it is hazardous. MySQL doesn't allow this, and they tend to find expedients like this useful. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Column Redaction
Rod, * Rod Taylor (rod.tay...@gmail.com) wrote: For fun I gave the search a try. Neat! On my laptop I can pull all 10,000 card numbers in less than 1 second. For a text based item I don't imagine it would be much different. Numbers are pretty easy to work with though. I had been planning to give something like this a shot once I got back from various meetings today- so thanks! Being able to use the CC # *as* the target for the binary search is definitely an issue, though looking back on the overall problem space, CC's are less than 54 bits, and it's actually a smaller space than that if you know how they're put together. My thought on an attack was more along these lines: select * from cards join (SELECT CAST(random() * AS bigint) a from generate_series(1,100)) as foo on (cards.cc = foo.a); Which could pretty quickly find ~500 CC #s in a second or so (with a 'cards' table of about 1M entries) based on my testing. That's clearly sufficient to make it a viable attack also. The next question I have is- do(es) the other vendor(s) provide a way to address this or is it simply known that this doesn't offer any protection at all from adhoc queries and it's strictly for formatting? I can certainly imagine it actually being a way to simply avoid *inadvertent* exposure rather than providing any security from the individual running the commands. I'm not sure that would make it genuinely different enough from simply maintaining a view which does that filtering to make it useful on its own as a feature though, but I'm not particularly strongly against it either. Thanks! Stephen signature.asc Description: Digital signature
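[Editor's note] The binary-search attack Rod demonstrated needs nothing beyond the ability to use the redacted column in a comparison predicate. A minimal sketch of why a ~54-bit card number falls in at most 54 probes: the hypothetical `oracle()` function below stands in for a SQL round trip such as `SELECT count(*) FROM cards WHERE cc <= x`, which redaction of the *displayed* value does nothing to prevent:

```python
import random

# Hypothetical secret: a 16-digit card number the attacker cannot SELECT
# directly, but can still reference in WHERE clauses.
SECRET_CC = random.randrange(4_000_000_000_000_000, 5_000_000_000_000_000)

def oracle(x):
    # Stands in for one round trip of: does any row satisfy cc <= x?
    return SECRET_CC <= x

# Standard binary search over the ~2^54 space of card numbers: the secret
# falls out in at most 54 oracle queries, i.e. well under a second of
# SQL round trips.
lo, hi = 0, 1 << 54
queries = 0
while lo < hi:
    mid = (lo + hi) // 2
    queries += 1
    if oracle(mid):
        hi = mid  # secret is at or below mid
    else:
        lo = mid + 1  # secret is above mid

print(lo == SECRET_CC, queries)  # True, with queries <= 54
```

The random-equality join quoted above is just a probabilistic variant of the same principle: any predicate over the true value leaks it, regardless of how the value is formatted on output.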
Re: [HACKERS] Column Redaction
On 10 October 2014 19:26, Bruce Momjian br...@momjian.us wrote: On Fri, Oct 10, 2014 at 02:05:05PM +0100, Thom Brown wrote: On 10 October 2014 13:43, Simon Riggs si...@2ndquadrant.com wrote: On 10 October 2014 11:45, Thom Brown t...@linux.com wrote: To be honest, this all sounds rather flaky. My other concern is you must have realized these issues in five seconds too, so why didn't you mention them? Because the problems that you come up with in 5 seconds aren't necessarily problems. You just think they are, given 5 seconds thought. I think my first impression of the concept was poor also though it would be wonderful if I had remembered all of my initial objections. I didn't have any problem with Thom's first post, which was helpful in allowing me to explain the context and details. As I said in reply at that point, this is not in itself a barrier; other measures are necessary. The rest of the thread has descended into a massive misunderstanding of the purpose and role of redaction. When any of us move too quickly to a value judgement about a new concept then we're probably missing the point. All of us will be asked at some time in the next few years why Postgres doesn't have redaction. When you get it, post back here please. Or if you win the argument on it not being useful in any circumstance, post that here also. I'm not in a rush. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On 10/9/14, 6:59 PM, Gavin Flower wrote: On 10/10/14 12:38, Jim Nasby wrote: On 10/8/14, 5:51 PM, Peter Geoghegan wrote: On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner kgri...@ymail.com wrote: Although the last go-around does suggest that there is at least one point of difference on the semantics. You seem to want to fire the BEFORE INSERT triggers before determining whether this will be an INSERT or an UPDATE. That seems like a bad idea to me, but if the consensus is that we want to do that, it does argue for your plan of making UPSERT a variant of the INSERT command. Well, it isn't that I'm doing it because I think that it is a great idea, with everything to recommend it. It's more like I don't see any practical alternative. We need the before row insert triggers to fire to figure out what to insert (or if we should update instead). No way around that. At the same time, those triggers are at liberty to do almost anything, and so in general we have no way of totally nullifying their effects (or side effects). Surely you see the dilemma. FWIW, if each row was handled in a subtransaction then an insert that turned out to be an update could be rolled back... but the performance impact of going that route might be pretty horrid. :( There's also the potential to get stuck in a loop where a BEFORE INSERT trigger turns the tuple into an UPDATE and a BEFORE UPDATE trigger turns it into an INSERT. Perhaps you need an UPSERT trigger? I would think that a BEFORE UPSERT trigger would very likely want to know whether we were inserting or updating, which basically puts us back where we started. That said, since the use case for UPSERT is different than both INSERT and UPDATE maybe it would be a good idea to have a separate trigger for them anyway. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! 
http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
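[Editor's note] The per-row subtransaction approach Jim mentions (and its cost) can be sketched mechanically. The example below uses SQLite savepoints via Python's sqlite3 module purely to illustrate the control flow — attempt the insert in a subtransaction, roll it back on a unique violation, then retry as an update. It is not concurrency-safe (a competing session can still insert between the rollback and the update), which is part of why the thread treats this as an unattractive fallback:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions manually
conn.execute("CREATE TABLE t (key INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'old')")

def upsert(key, value):
    conn.execute("SAVEPOINT row_attempt")  # per-row "subtransaction"
    try:
        conn.execute("INSERT INTO t VALUES (?, ?)", (key, value))
        conn.execute("RELEASE row_attempt")
    except sqlite3.IntegrityError:
        # The insert turned out to be an update: undo anything done inside
        # the subtransaction, then retry the row as an UPDATE.
        conn.execute("ROLLBACK TO row_attempt")
        conn.execute("RELEASE row_attempt")
        conn.execute("UPDATE t SET value = ? WHERE key = ?", (value, key))

upsert(1, "new")    # conflicts, becomes an UPDATE
upsert(2, "fresh")  # no conflict, plain INSERT
rows = conn.execute("SELECT key, value FROM t ORDER BY key").fetchall()
print(rows)  # [(1, 'new'), (2, 'fresh')]
```

Even setting aside the race, each conflicting row pays for a failed insert plus a subtransaction rollback, which is the "pretty horrid" overhead being alluded to.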
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
Peter Geoghegan p...@heroku.com wrote: On Fri, Oct 10, 2014 at 12:09 PM, Robert Haas robertmh...@gmail.com wrote: I think what's realistic here is that the patch isn't going to get committed the way you have it today, because nobody else likes that design. That may be harsh, but I think it's accurate. I do think that's harsh, because it's unnecessary: I am looking for the best design. If you want to propose alternatives, great, but there is a reason why I've done things that way, and that should be acknowledged. I too think naming the unique index is ugly as sin, and have said as much multiple times. We're almost on the same page here. I hope you can adjust to the feedback, because it would be disappointing to have this feature slip from the release, which is what will happen if the index name remains part of the syntax. Would you be okay with this never working with partial unique indexes? That gives me something to work with. That seems like the only sensible course, to me. If you want to allow this to work with expression indexes, that's not really a problem; you can let the user write INSERT .. ON CONFLICT (upper(foo)) UPDATE ... and match that to the index. It wouldn't bother me if the first version of this feature couldn't target expression indexes anyway, but if you want to include that, having the user mention the actual expression rather than the index name shouldn't make much difference. I'm not that worried about expression indexes, actually. I'm mostly worried about partial unique indexes, particularly when before insert row-level triggers are in play (making *that* play nice with expression indexes is harder still, but expression indexes on their own are probably not that much of a problem). There is a problem if any column of the index allows nulls, since that would allow multiple rows in the target table to match an argument. Proving that an expression cannot produce nulls could be tricky. 
I suggest that, at least for a first cut, we restrict this to matching an index on NOT NULL columns. Also, how about making the SET clause optional, with the semantics that we just update all of the fields for which a value is explicitly specified: INSERT INTO overwrite_with_abandon (key, value) VALUES (42, 'meaning of life') ON DUPLICATE (key) UPDATE; Your syntax allows the exact same thing; it simply require the user to be more verbose in order to get that behavior. If we think that people wanting that behavior will be rare, then it's fine to require them to spell it out when they want it. If we think it will be the overwhelming common application of this feature, and I do, then making people spell it out when we could just as well infer it is pointless. +1 Did you consider my example? I think that people will like this idea, too - that clearly isn't the only consideration, though. As you say, it would be very easy to implement this. However, IMV, we shouldn't, because it is hazardous. To quote the cited case: 1. Developer writes the query, and it works fine. 2. Some time later, the DBA adds an inserted_at column (those are common). The DBA is not aware of the existence of this particular query. The new column has a default value of now(), say. Should we UPDATE the inserted_at column to be NULL? Or (more plausibly) the default value filled in by the INSERT? Or leave it be? I think there is a case to be made for all of these behaviors, and that tension makes me prefer to not do this at all. It's like encouraging SELECT * queries in production, only worse. That does point out the problem, in general, with carrying values from a BEFORE INSERT trigger into an UPDATE. Perhaps if the INSERT fails the UPDATE phase should start with the values specified in the first place, and not try to use anything returned by the BEFORE INSERT triggers? 
-- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 10, 2014 at 2:17 PM, Jim Nasby jim.na...@bluetreble.com wrote: That said, since the use case for UPSERT is different than both INSERT and UPDATE maybe it would be a good idea to have a separate trigger for them anyway. -1. I think that having a separate UPSERT trigger is a bad idea. I'll need to change the statement-level behavior to match what Kevin described (fires twice, always, when INSERT ON CONFLICT UPDATE is used). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 10, 2014 at 2:16 PM, Kevin Grittner kgri...@ymail.com wrote: Would you be okay with this never working with partial unique indexes? That gives me something to work with. That seems like the only sensible course, to me. Okay, great. If that's the consensus, then that's all I need to know. I'm not that worried about expression indexes, actually. I'm mostly worried about partial unique indexes, particularly when before insert row-level triggers are in play There is a problem if any column of the index allows nulls, since that would allow multiple rows in the target table to match an argument. Proving that an expression does not could be tricky. I suggest that, at least for a first cut, we restrict this to matching an index on NOT NULL columns. Hmm. That could definitely be a problem. But the problem is clearly the user's fault. It could allow multiple rows to match, as you say - but you're talking about a different concept of matching. Unlike with certain other systems, that isn't a would-be unique violation, and so it isn't a match in the sense I intend, and so you insert. You don't match more than one row, because (in the sense peculiar to this feature) there are no matches, and insertion really is the correct outcome. As a user, you have no more right to be surprised by this than by the fact that you wouldn't see a dup violation with a similar vanilla INSERT - some people are surprised by that, with regularity. Maybe I need to emphasize the distinction in the documentation between the two slightly different ideas of matching that are in tension here. I think it would be a mistake to forcibly normalize things to remove that distinction (by adding an artificial restriction), at any rate. Did you consider my example? I think that people will like this idea, too - that clearly isn't the only consideration, though. As you say, it would be very easy to implement this. However, IMV, we shouldn't, because it is hazardous. To quote the cited case: 1. 
Developer writes the query, and it works fine. 2. Some time later, the DBA adds an inserted_at column (those are common). The DBA is not aware of the existence of this particular query. The new column has a default value of now(), say. Should we UPDATE the inserted_at column to be NULL? Or (more plausibly) the default value filled in by the INSERT? Or leave it be? I think there is a case to be made for all of these behaviors, and that tension makes me prefer to not do this at all. It's like encouraging SELECT * queries in production, only worse. That does point out the problem, in general, with carrying values from a BEFORE INSERT trigger into an UPDATE. Perhaps if the INSERT fails the UPDATE phase should start with the values specified in the first place, and not try to use anything returned by the BEFORE INSERT triggers? I think that that behavior is far more problematic for numerous other reasons, though. Potentially, you're not seeing what *really* caused the conflict at all. I just don't think it's worth it to save a bit of typing. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Materialized views don't show up in information_schema
Hi, I've been testing out some of the new materialized view functionality in 9.4 and noticed that they don't show up in the information_schema data dictionary views, specifically information_schema.tables or information_schema.views (sample output is below and it's the same for 9.3 as well). Is this on purpose or just an oversight? I think they should at least be in the information_schema.views view, though the case could be made that they should be in information_schema.tables as well (as they can have additional indexes). If they're added to the tables view then they should have a separate table_type (say, MATERIALIZED VIEW), similar to how foreign tables show up. Looking at the view definitions, it looks like it's just a matter of adding 'm' values to the IN clauses that filter pg_class.relkind. It doesn't look like they're used internally by anything other than tests. Only client apps that query the data dictionary views (ex: GUI clients or dynamic SQL generators) would be impacted. I'd argue it's for the better as they can now see those objects, even if they think they're regular tables or views. If this sounds fine I can put together a patch for this. Regards, -- Sehrope Sarkuni

postgres@vagrant-ubuntu-trusty-64:~$ psql test
psql (9.4beta3)
Type "help" for help.
test=# CREATE TABLE some_table (x text);
CREATE TABLE
test=# CREATE MATERIALIZED VIEW some_mview AS SELECT * FROM some_table;
SELECT 0
test=# CREATE VIEW some_view AS SELECT * FROM some_table;
CREATE VIEW
test=# \d some_mview
Materialized view "public.some_mview"
 Column | Type | Modifiers
--------+------+-----------
 x      | text |

test=# SELECT table_name, table_type FROM information_schema.tables WHERE table_schema = 'public';
 table_name | table_type
------------+------------
 some_table | BASE TABLE
 some_view  | VIEW
(2 rows)

test=# SELECT table_name, view_definition FROM information_schema.views WHERE table_schema = 'public';
 table_name |   view_definition
------------+----------------------
 some_view  |  SELECT some_table.x+
            |    FROM some_table;
(1 row)

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views don't show up in information_schema
Sehrope Sarkuni sehr...@jackdb.com writes: I've been testing out some of the new materialized view functionality in 9.4 and noticed that they don't show up in the information_schema data dictionary views, specifically information_schema.tables or information_schema.views (sample output is below and it's the same for 9.3 as well). Is this on purpose or just an oversight? It's on purpose. The information_schema can only show objects that exist in the SQL standard. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views don't show up in information_schema
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Sehrope Sarkuni sehr...@jackdb.com writes:
> > I've been testing out some of the new materialized view functionality in 9.4 and noticed that they don't show up in the information_schema data dictionary views, specifically information_schema.tables or information_schema.views (sample output is below and it's the same for 9.3 as well). Is this on purpose or just an oversight?
>
> It's on purpose. The information_schema can only show objects that exist in the SQL standard.

I'm not particularly thrilled with this answer. I'd argue that the 'materialized' part of mat views isn't relevant to the standard, which does not concern itself with such performance-oriented considerations, and therefore, to the standard's view (pun rather intended), they're views.

Thanks,

Stephen
Re: [HACKERS] orangutan seizes up during isolation-check
On Mon, Sep 15, 2014 at 12:51:14AM -0400, Noah Misch wrote:
> 1. Fork, call setlocale(LC_x, "") in the child, pass back the effective locale name through a pipe, and pass that name to setlocale() in the original process. The short-lived child will get the extra threads, and the postmaster will remain clean.

Here's an implementation thereof covering both backend and frontend use of setlocale(). A setlocale() wrapper, pg_setlocale(), injects use of the child process where necessary. I placed the new function in src/common/exec.c, because every executable already needed that file's code and translatable strings for set_pglocale_pgservice() and callees. (Some executables do miss exec.c translations; I did not update them.) src/port/chklocale.c was closer in subject matter, but libpq imports it; this function does not belong in libpq. Also, this function would benefit from a frontend ereport() implementation, which is likely to land in libpgcommon if it lands at all.

The runtime changes are conditional on __darwin__ but not on --enable-nls. NLS builds are the production norm; I'd like non-NLS builds to consistently exercise this code rather than shave its bytes and cycles.

pg_setlocale() relies on main() setting all six LC_category environment variables. The second attached patch seals the cracks in main()'s longstanding attempt to do so. This improves preexisting user-visible behavior in weird cases: LANG=pt_BR.utf8 LC_ALL=invalid postgres -D nosuch will now print untranslated text, not pt_BR text.

Documentation for each of lc_monetary, lc_numeric and lc_time says "If this variable is set to the empty string (which is the default) then the value is inherited from the execution environment of the server in a system-dependent way." Not so; by design, setlocale(LC_x, "") just re-selects the last value L passed to pg_perm_setlocale(LC_x, L). I have not touched this; I mention it for the sake of the archives.
commit cd27ff7 (HEAD)
Author: Noah Misch n...@leadboat.com
AuthorDate: Fri Oct 10 04:04:03 2014 -0400
Commit: Noah Misch n...@leadboat.com
CommitDate: Fri Oct 10 04:04:03 2014 -0400

When setlocale(LC_x, "") will start a thread, run it in a child process.

Only Darwin, --enable-nls builds use a setlocale() that can start a thread. Buildfarm member orangutan experienced BackendList corruption on account of different postmaster threads executing signal handlers simultaneously. Furthermore, a multithreaded postmaster risks undefined behavior from sigprocmask() and fork(). Introduce pg_setlocale(), a wrapper around setlocale(). When the corresponding raw setlocale() call could start a thread, it forks, retrieves the setlocale() return value from the child, and uses that value to make an equivalent setlocale() call in the original process. The short-lived child does become multithreaded, but it uses no feature for which that poses a problem. Use of pg_setlocale() in the frontend protects forking executables, such as pg_dump, from undefined behavior.

As the usage guideline comment implies, whether to call setlocale() or pg_setlocale() is a mere style question for most code sites. Opt for pg_setlocale() at indifferent code sites, leaving raw setlocale() calls in src/interfaces, in pg_setlocale() itself, and in callees of pg_setlocale().

Back-patch to 9.0 (all supported versions).
diff --git a/configure b/configure
index f0580ce..ee08963 100755
--- a/configure
+++ b/configure
@@ -11298,7 +11298,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-for ac_func in cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat readlink setproctitle setsid shm_open sigprocmask symlink sync_file_range towlower utime utimes wcstombs wcstombs_l
+for ac_func in cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_np readlink setproctitle setsid shm_open sigprocmask symlink sync_file_range towlower utime utimes wcstombs wcstombs_l
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
   ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 527b076..2f8adfc 100644
--- a/configure.in
+++ b/configure.in
@@ -1257,7 +1257,7 @@ PGAC_FUNC_GETTIMEOFDAY_1ARG
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-AC_CHECK_FUNCS([cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat readlink setproctitle setsid shm_open sigprocmask symlink sync_file_range towlower utime utimes wcstombs wcstombs_l])
+AC_CHECK_FUNCS([cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_np readlink setproctitle setsid shm_open sigprocmask symlink sync_file_range towlower utime utimes wcstombs wcstombs_l])
[HACKERS] B-Tree index builds, CLUSTER, and sortsupport
Both Robert and Heikki expressed dissatisfaction with the fact that B-Tree index builds don't use sortsupport. Because B-Tree index builds cannot really use the onlyKey optimization, the historic oversight of not supporting B-Tree builds (and CLUSTER) might have been at least a little understandable [1]. But with the recent work on sortsupport for text, it's likely that B-Tree index builds will be missing out on rather a lot by not taking advantage of sortsupport.

Attached patch modifies tuplesort to correct this oversight. What's really nice about it is that there is actually a net negative code footprint:

 src/backend/access/nbtree/nbtsort.c  |  63 +++---
 src/backend/utils/sort/sortsupport.c |  77 ++--
 src/backend/utils/sort/tuplesort.c   | 274 +++
 src/include/utils/sortsupport.h      |   3 +
 4 files changed, 203 insertions(+), 214 deletions(-)

I've been able to throw out a lot of code, including two virtually identical inlinable comparators (that have logic for NULL-wise comparisons). This patch pretty much justifies itself as a refactoring exercise, without performance improvements entering into it, which is nice. I haven't benchmarked it yet, but it seems reasonable to assume that much the same benefits will be seen as were seen for the MinimalTuple case (for multiple-attribute sorts, that similarly cannot use the onlyKey optimization).

With this patch, all callers of _bt_mkscankey_nodata() now use sortsupport. This raises the question: Why not just have _bt_mkscankey_nodata() do it all for us? I think that continuing to have B-Tree provide a scanKey, and working off of that, is a better abstraction. It would save a few cycles to have the index_getprocinfo() call currently within _bt_mkscankey_nodata() look for BTSORTSUPPORT_PROC rather than BTORDER_PROC and be done with it, but I'd call that a modularity violation.
Tuplesort decides a strategy (BTGreaterStrategyNumber or BTLessStrategyNumber) based on the scanKey B-Tree private flag SK_BT_DESC, and it's appropriate for that to live in tuplesort's head if possible, because tuplesort/sortsupport tracks sort direction in a fairly intimate way. Besides, I think that sortsupport already has enough clients for what it is. I imagine it makes sense to directly access a sortsupport-installed comparator through a scanKey, for actual index scans (think abbreviated keys in internal B-Tree pages), but I still want uniformity with the other cases within tuplesort (i.e. the MinimalTuple and Datum cases), so I'm not about to change scanKeys to have a comparator that doesn't need a fmgr trampoline for the benefit of this patch. Note that I don't even store a scanKey within Tuplesortstate anymore. That uniformity is what allowed me to throw out the per-tuplesort-case reversedirection() logic, and use generic reversedirection() logic instead (as anticipated by current comments within Tuplesortstate). Thoughts?
[1] http://www.postgresql.org/message-id/cam3swzqlg8nx2yeb+67xx0gig+-folfbtqjkg+jl15_zhi1...@mail.gmail.com
--
Peter Geoghegan

diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
new file mode 100644
index e8a89d2..a6c44a7
*** a/src/backend/access/nbtree/nbtsort.c
--- b/src/backend/access/nbtree/nbtsort.c
*** _bt_load(BTWriteState *wstate, BTSpool *
*** 684,689 ****
--- 684,690 ----
  	int			i,
  				keysz = RelationGetNumberOfAttributes(wstate->index);
  	ScanKey		indexScanKey = NULL;
+ 	SortSupport sortKeys;
  
  	if (merge)
  	{
*** _bt_load(BTWriteState *wstate, BTSpool *
*** 699,704 ****
--- 700,730 ----
  							true, &should_free2);
  		indexScanKey = _bt_mkscankey_nodata(wstate->index);
  
+ 		/* Prepare SortSupport data for each column */
+ 		sortKeys = (SortSupport) palloc0(keysz * sizeof(SortSupportData));
+ 
+ 		for (i = 0; i < keysz; i++)
+ 		{
+ 			SortSupport sortKey = sortKeys + i;
+ 			ScanKey		scanKey = indexScanKey + i;
+ 			int16		strategy;
+ 
+ 			sortKey->ssup_cxt = CurrentMemoryContext;
+ 			sortKey->ssup_collation = scanKey->sk_collation;
+ 			sortKey->ssup_nulls_first =
+ 				(scanKey->sk_flags & SK_BT_NULLS_FIRST) != 0;
+ 			sortKey->ssup_attno = scanKey->sk_attno;
+ 
+ 			AssertState(sortKey->ssup_attno != 0);
+ 
+ 			strategy = (scanKey->sk_flags & SK_BT_DESC) != 0 ?
+ 				BTGreaterStrategyNumber : BTLessStrategyNumber;
+ 
+ 			PrepareSortSupportFromIndexRel(wstate->index, strategy, sortKey);
+ 		}
+ 
+ 		_bt_freeskey(indexScanKey);
+ 
  	for (;;)
  	{
  		load1 = true;		/* load BTSpool next ? */
*** _bt_load(BTWriteState *wstate, BTSpool *
*** 711,753 ****
  		{
  			for (i = 1; i <= keysz; i++)
  			{
! 				ScanKey		entry;
  				Datum		attrDatum1,
  							attrDatum2;
  				bool		isNull1,
  							isNull2;
  				int32		compare;
  
! 				entry = indexScanKey + i - 1;
  				attrDatum1 = index_getattr(itup, i, tupdes, &isNull1);
  				attrDatum2 = index_getattr(itup2, i, tupdes, &isNull2);
- 				if (isNull1)
- 				{
Re: [HACKERS] Materialized views don't show up in information_schema
On 10/10/14 6:53 PM, Stephen Frost wrote:
> I'm not particularly thrilled with this answer. I'd argue that the 'materialized' part of mat views isn't relevant to the standard, which does not concern itself with such performance-oriented considerations, and therefore, to the standard's view (pun rather intended), they're views.

For example, you can't drop a materialized view with DROP VIEW. So any tool that offers a list of views to manipulate based on the information schema would be confused. This is different from temporary views, for example.
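[Editor's note] The DROP VIEW point can be demonstrated in the same session shown earlier in the thread. The error and hint text below is a sketch of what current PostgreSQL releases report; exact wording may vary by version.

```sql
-- A tool that saw some_mview in information_schema.views and
-- generated DROP VIEW for it would fail:
DROP VIEW some_mview;
-- ERROR:  "some_mview" is not a view
-- HINT:  Use DROP MATERIALIZED VIEW to remove a materialized view.

DROP MATERIALIZED VIEW some_mview;  -- the form that succeeds
```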
Re: [HACKERS] Materialized views don't show up in information_schema
* Peter Eisentraut (pete...@gmx.net) wrote:
> On 10/10/14 6:53 PM, Stephen Frost wrote:
> > I'm not particularly thrilled with this answer. I'd argue that the 'materialized' part of mat views isn't relevant to the standard, which does not concern itself with such performance-oriented considerations, and therefore, to the standard's view (pun rather intended), they're views.
>
> For example, you can't drop a materialized view with DROP VIEW. So any tool that offers a list of views to manipulate based on the information schema would be confused. This is different from temporary views, for example.

And users will be confused when using a tool which doesn't see mat views, which is what started this thread. Overall, I'm inclined to view materialized views as a subset of views, which would mean that we'd perhaps add the ability to drop them with 'drop view'.

As a comparison, what about unlogged tables? They're not normal tables and they aren't defined by the SQL standard either.

Thanks,

Stephen
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Fri, Oct 10, 2014 at 8:11 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-10 17:18:46 +0530, Amit Kapila wrote: On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund and...@2ndquadrant.com wrote:

Observations
--
a. The patch performs really well (increase up to ~40%) in case all the data fits in shared buffers (scale factor -100).
b. In case data doesn't fit in shared buffers, but fits in RAM (scale factor -3000), there is a performance increase up to 16 client count, however after that it starts dipping (in above config up to ~4.4%).

Hm. Interesting. I don't see that dip on x86.

Is it possible that the implementation of some atomic operation is costlier for a particular architecture?

Yes, sure. And IIRC POWER improved atomics performance considerably for POWER8...

I have tried again for scale factor 3000 and could see the dip, and this time I have even tried with 175 client count; the dip is approximately 5%, which is slightly more than at 160 client count.

FWIW, the profile always looks like:

For my tests on Power8, the profile looks somewhat similar to the below profile mentioned by you, please see this mail:
http://www.postgresql.org/message-id/caa4ek1je9zblhsfiavhd18gdwxux21zfqpjgq_dz_zoa35n...@mail.gmail.com
However on Power7, the profile looks different, which I have posted in the above thread.

BTW, that profile *clearly* indicates we should make StrategyGetBuffer() smarter.

Yeah, even the bgreclaimer patch is able to achieve the same, however after that the contention moves somewhere else, as you can see in the above link.

Here it goes..

Lwlock_contention patches - client_count=128
--
+ 7.95% postgres postgres [.] GetSnapshotData
+ 3.58% postgres postgres [.] AllocSetAlloc
+ 2.51% postgres postgres [.] _bt_compare
+ 2.44% postgres postgres [.] hash_search_with_hash_value
+ 2.33% postgres [kernel.kallsyms] [k] .__copy_tofrom_user
+ 2.24% postgres postgres [.] AllocSetFreeIndex
+ 1.75% postgres postgres [.] pg_atomic_fetch_add_u32_impl

Uh. Huh? Normally that'll be inline.
That's compiled with gcc? What were the compiler settings you used?

Nothing specific, for performance tests where I have to take profiles I use below:
./configure --prefix=installation_path CFLAGS=-fno-omit-frame-pointer
make

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-11 06:18:11 +0530, Amit Kapila wrote: On Fri, Oct 10, 2014 at 8:11 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-10 17:18:46 +0530, Amit Kapila wrote: On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund and...@2ndquadrant.com wrote:

Observations
--
a. The patch performs really well (increase up to ~40%) in case all the data fits in shared buffers (scale factor -100).
b. In case data doesn't fit in shared buffers, but fits in RAM (scale factor -3000), there is a performance increase up to 16 client count, however after that it starts dipping (in above config up to ~4.4%).

Hm. Interesting. I don't see that dip on x86.

Is it possible that the implementation of some atomic operation is costlier for a particular architecture?

Yes, sure. And IIRC POWER improved atomics performance considerably for POWER8...

I have tried again for scale factor 3000 and could see the dip, and this time I have even tried with 175 client count; the dip is approximately 5%, which is slightly more than at 160 client count.
I've run some short tests on hydra:

scale 1000:

base:
4GB:
tps = 296273.004800 (including connections establishing)
tps = 296373.978100 (excluding connections establishing)
8GB:
tps = 338001.455970 (including connections establishing)
tps = 338177.439106 (excluding connections establishing)

base + freelist:
4GB:
tps = 297057.523528 (including connections establishing)
tps = 297156.987418 (excluding connections establishing)
8GB:
tps = 335123.867097 (including connections establishing)
tps = 335239.122472 (excluding connections establishing)

base + LW_SHARED:
4GB:
tps = 296262.164455 (including connections establishing)
tps = 296357.524819 (excluding connections establishing)
8GB:
tps = 336988.744742 (including connections establishing)
tps = 337097.836395 (excluding connections establishing)

base + LW_SHARED + freelist:
4GB:
tps = 296887.981743 (including connections establishing)
tps = 296980.231853 (excluding connections establishing)
8GB:
tps = 345049.062898 (including connections establishing)
tps = 345161.947055 (excluding connections establishing)

I've also run some preliminary tests using scale=3000 - and I couldn't see a performance difference either. Note that all these are noticeably faster than your results.

Lwlock_contention patches - client_count=128
--
+ 7.95% postgres postgres [.] GetSnapshotData
+ 3.58% postgres postgres [.] AllocSetAlloc
+ 2.51% postgres postgres [.] _bt_compare
+ 2.44% postgres postgres [.] hash_search_with_hash_value
+ 2.33% postgres [kernel.kallsyms] [k] .__copy_tofrom_user
+ 2.24% postgres postgres [.] AllocSetFreeIndex
+ 1.75% postgres postgres [.] pg_atomic_fetch_add_u32_impl

Uh. Huh? Normally that'll be inline.

That's compiled with gcc? What were the compiler settings you used?

Nothing specific, for performance tests where I have to take profiles I use below:
./configure --prefix=installation_path CFLAGS=-fno-omit-frame-pointer
make

Hah. Doing so overwrites the CFLAGS configure normally sets.
Check:

# CFLAGS are selected so:
# If the user specifies something in the environment, that is used.
# else: If the template file set something, that is used.
# else: If coverage was enabled, don't set anything.
# else: If the compiler is GCC, then we use -O2.
# else: If the compiler is something else, then we use -O, unless debugging.

so, if you do like above, you're compiling without optimizations... So, include at least -O2 as well.

Greetings,

Andres Freund

--
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
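[Editor's note] Per the selection rules quoted above, overriding CFLAGS suppresses configure's default -O2, so an optimized profiling build needs it stated explicitly. installation_path is a placeholder, as in the quoted commands.

```shell
# Keep -O2 when adding extra flags for profiling, otherwise the build
# is unoptimized and profiles/benchmarks are misleading.
./configure --prefix=installation_path CFLAGS="-O2 -fno-omit-frame-pointer"
make
```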
Re: [HACKERS] orangutan seizes up during isolation-check
On 10/10/14 8:24 PM, Noah Misch wrote:
> Here's an implementation thereof covering both backend and frontend use of setlocale(). A setlocale() wrapper, pg_setlocale(), injects use of the child process where necessary.

Would such a change not be better in gnulib itself?
Re: [HACKERS] Materialized views don't show up in information_schema
Another example is an ETL tool or ORM that queries the data dictionary to generate SQL or object models. If mviews show up in the regular information_schema views (specifically the .columns view) then it'll just work as if it's a regular table or view (ex: like foreign tables). If not, it's possible to use the internal pg_ catalog tables to get the column details, but it'd require PG-specific coding. In our case I already plan on using the internal catalog tables to display mview metadata. I brought this up thinking about existing apps and tools being able to work with mviews without additional code changes.

Regards,
-- Sehrope Sarkuni
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Sat, Oct 11, 2014 at 6:29 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-11 06:18:11 +0530, Amit Kapila wrote: I've run some short tests on hydra: scale 1000: base: 4GB: tps = 296273.004800 (including connections establishing) tps = 296373.978100 (excluding connections establishing) 8GB: tps = 338001.455970 (including connections establishing) tps = 338177.439106 (excluding connections establishing) base + freelist: 4GB: tps = 297057.523528 (including connections establishing) tps = 297156.987418 (excluding connections establishing) 8GB: tps = 335123.867097 (including connections establishing) tps = 335239.122472 (excluding connections establishing) base + LW_SHARED: 4GB: tps = 296262.164455 (including connections establishing) tps = 296357.524819 (excluding connections establishing) 8GB: tps = 336988.744742 (including connections establishing) tps = 337097.836395 (excluding connections establishing) base + LW_SHARED + freelist: 4GB: tps = 296887.981743 (including connections establishing) tps = 296980.231853 (excluding connections establishing) 8GB: tps = 345049.062898 (including connections establishing) tps = 345161.947055 (excluding connections establishing) I've also run some preliminary tests using scale=3000 - and I couldn't see a performance difference either. Note that all these are noticeably faster than your results. What is the client count? Could you please post numbers you are getting for 3000 scale factor for client count 128 and 175? Nothing specific, for performance tests where I have to take profiles I use below: ./configure --prefix=installation_path CFLAGS=-fno-omit-frame-pointer make Hah. Doing so overwrites the CFLAGS configure normally sets. Check # CFLAGS are selected so: # If the user specifies something in the environment, that is used. # else: If the template file set something, that is used. # else: If coverage was enabled, don't set anything. # else: If the compiler is GCC, then we use -O2. 
# else: If the compiler is something else, then we use -O, unless debugging. so, if you do like above, you're compiling without optimizations... So, include at least -O2 as well. Hmm. okay, but is this required when we do actual performance tests, because for that currently I don't use CFLAGS. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Materialized views don't show up in information_schema
On 10/10/14 8:44 PM, Stephen Frost wrote:
> As a comparison, what about unlogged tables? They're not normal tables and they aren't defined by the SQL standard either.

They are normal tables when considered within the scope of the SQL standard. The only difference to normal tables is their crash recovery behavior, which is outside the scope of the SQL standard.
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-11 06:49:54 +0530, Amit Kapila wrote: On Sat, Oct 11, 2014 at 6:29 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-11 06:18:11 +0530, Amit Kapila wrote: I've run some short tests on hydra: scale 1000: base: 4GB: tps = 296273.004800 (including connections establishing) tps = 296373.978100 (excluding connections establishing) 8GB: tps = 338001.455970 (including connections establishing) tps = 338177.439106 (excluding connections establishing) base + freelist: 4GB: tps = 297057.523528 (including connections establishing) tps = 297156.987418 (excluding connections establishing) 8GB: tps = 335123.867097 (including connections establishing) tps = 335239.122472 (excluding connections establishing) base + LW_SHARED: 4GB: tps = 296262.164455 (including connections establishing) tps = 296357.524819 (excluding connections establishing) 8GB: tps = 336988.744742 (including connections establishing) tps = 337097.836395 (excluding connections establishing) base + LW_SHARED + freelist: 4GB: tps = 296887.981743 (including connections establishing) tps = 296980.231853 (excluding connections establishing) 8GB: tps = 345049.062898 (including connections establishing) tps = 345161.947055 (excluding connections establishing) I've also run some preliminary tests using scale=3000 - and I couldn't see a performance difference either. Note that all these are noticeably faster than your results. What is the client count? 160, because that was the one you reported the biggest regression. Could you please post numbers you are getting for 3000 scale factor for client count 128 and 175? Yes, although not tonight Also from hydra? Nothing specific, for performance tests where I have to take profiles I use below: ./configure --prefix=installation_path CFLAGS=-fno-omit-frame-pointer make Hah. Doing so overwrites the CFLAGS configure normally sets. Check # CFLAGS are selected so: # If the user specifies something in the environment, that is used. 
# else: If the template file set something, that is used. # else: If coverage was enabled, don't set anything. # else: If the compiler is GCC, then we use -O2. # else: If the compiler is something else, then we use -O, unless debugging. so, if you do like above, you're compiling without optimizations... So, include at least -O2 as well. Hmm. okay, but is this required when we do actual performance tests, because for that currently I don't use CFLAGS. I'm not sure what you mean? You need to include -O2 in CFLAGS whenever you override it. Your profile was clearly without inlining... And since your general performance numbers are a fair bit lower than what I see with, hopefully, the same code on the same machine... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Sat, Oct 11, 2014 at 7:00 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-11 06:49:54 +0530, Amit Kapila wrote: On Sat, Oct 11, 2014 at 6:29 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-11 06:18:11 +0530, Amit Kapila wrote: I've run some short tests on hydra:

Could you please post numbers you are getting for 3000 scale factor for client count 128 and 175?

Yes, although not tonight.

No issues, whenever you get it.

Also from hydra?

Yes. One more thing I would like to share with you is that while doing tests, there are some other settings changed in postgresql.conf:
maintenance_work_mem = 1GB
synchronous_commit = off
wal_writer_delay = 20ms
checkpoint_segments = 256
checkpoint_timeout = 35min
I don't think these parameters matter for the tests we are doing, but still I thought it is good to share, because I forgot to send some of these non-default settings in the previous mail.

Nothing specific, for performance tests where I have to take profiles I use below:
./configure --prefix=installation_path CFLAGS=-fno-omit-frame-pointer
make

Hah. Doing so overwrites the CFLAGS configure normally sets.

Check:
# CFLAGS are selected so:
# If the user specifies something in the environment, that is used.
# else: If the template file set something, that is used.
# else: If coverage was enabled, don't set anything.
# else: If the compiler is GCC, then we use -O2.
# else: If the compiler is something else, then we use -O, unless debugging.

so, if you do like above, you're compiling without optimizations... So, include at least -O2 as well.

Hmm. okay, but is this required when we do actual performance tests, because for that currently I don't use CFLAGS.

I'm not sure what you mean? You need to include -O2 in CFLAGS whenever you override it.

okay, that's what I wanted to ask you, so that we should not see different numbers due to the way the code is built.
When I do performance tests where I don't want to see a profile, I use the below statement:
./configure --prefix=installation_path

And since your general performance numbers are a fair bit lower than what I see with, hopefully, the same code on the same machine...

You have reported numbers at 1000 scale factor and mine were at 3000 scale factor, so I think the difference is expected.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] orangutan seizes up during isolation-check
On Fri, Oct 10, 2014 at 09:14:38PM -0400, Peter Eisentraut wrote:
> On 10/10/14 8:24 PM, Noah Misch wrote:
> > Here's an implementation thereof covering both backend and frontend use of setlocale(). A setlocale() wrapper, pg_setlocale(), injects use of the child process where necessary.
>
> Would such a change not be better in gnulib itself?

Good question. It would be nice to make the change there, for the benefit of other consumers. The patch's setlocale_native_forked() assumes it never runs in a multithreaded process, but libintl_setlocale() must not assume that. I see a few ways libintl/gnulib might proceed:

1) exec() a helper program to run CFLocaleCopyCurrent(). Distributing that executable alongside libintl and locating it at runtime is daunting.

2) Audit the CFLocaleCopyCurrent() code to see if it will, in practice, run reliably in a fork of a multithreaded process. (That conclusion could change without notice.) Use the setlocale_native_forked() technique.

3) Don't change libintl_setlocale(), but add a libintl_setlocale_nothread() for single-threaded consumers to adopt. Each consumer will need to modify code. libintl has a lean API; I expect adding this would be controversial.

Among those, I would probably adopt (2). We know much about the process conditions in which pg_setlocale() will run, so placing the workaround in PostgreSQL erases the problem of (2). I'm inclined to stick with placing the workaround in PostgreSQL, but there are fair arguments on both sides.