Re: [HACKERS] Escaping from blocked send() reprised.
Hmm. Sorry for my stupidity.

Why is that necessary? It seems really rather wrong to make BIO_set_retry_write() dependent on ProcDiePending? Especially as, at least in my testing, it's not even required because be_tls_write() can just check the error properly?

I mistook the previous conversation as saying it doesn't work as expected. I have confirmed that it works fine; after all, it works as I expected. The parameter for ProcessClientWriteInterrupt() looks somewhat awkward, but patch 4 looks fine as a whole. Is there anything in the patch that worries you?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling shared buffer eviction
On Fri, Oct 10, 2014 at 1:08 AM, Andres Freund and...@2ndquadrant.com wrote:

On 2014-10-09 16:01:55 +0200, Andres Freund wrote: I don't think OLTP really is the best test case for this. Especially not pgbench with relatively small rows *and* a uniform distribution of access. Try parallel COPY TO. Batch write loads are where I've seen this hurt badly.

just by switching shared_buffers from 1 to 8GB. I haven't tried, but I hope that with an approach like yours this might become better.

psql -f /tmp/prepare.sql
pgbench -P5 -n -f /tmp/copy.sql -c 8 -j 8 -T 100

Thanks for providing the scripts. You haven't specified how much data is present in the 'large' file used in the tests. I have tried with different sets of rows, but I could not see the dip that is present in your data when you increased shared buffers from 1GB to 8GB; I also don't see any difference with the patch. BTW, why do you think this patch can be helpful for such workloads? According to my understanding, it can be helpful mainly for read-mostly workloads when all the data doesn't fit in shared buffers.
Performance Data
----------------
IBM POWER-8: 24 cores, 192 hardware threads, RAM = 492GB

For 50 rows, data was populated using the statements below:

insert into largedata_64 values ('aa', generate_series(1,50));
copy largedata_64 to '/tmp/large' binary;

pgbench -P5 -n -f /tmp/copy.sql -c 8 -j 8 -T 100

shared_buffers - 1GB
--------------------
progress: 7.0 s, 2.7 tps, lat 2326.645 ms stddev 173.506
progress: 11.5 s, 3.5 tps, lat 2295.577 ms stddev 78.949
progress: 15.8 s, 3.7 tps, lat 2298.217 ms stddev 223.346
progress: 20.4 s, 3.5 tps, lat 2350.187 ms stddev 192.312
progress: 25.1 s, 3.4 tps, lat 2280.206 ms stddev 54.580
progress: 31.9 s, 3.4 tps, lat 2408.593 ms stddev 243.230
progress: 45.2 s, 1.1 tps, lat 5120.151 ms stddev 3913.561
progress: 50.5 s, 1.3 tps, lat 8967.954 ms stddev 3384.229
progress: 52.7 s, 2.7 tps, lat 3883.788 ms stddev 1733.293
progress: 55.6 s, 3.2 tps, lat 2684.282 ms stddev 348.615
progress: 58.2 s, 3.4 tps, lat 2602.355 ms stddev 268.718
progress: 60.8 s, 3.1 tps, lat 2361.937 ms stddev 302.643
progress: 65.3 s, 3.5 tps, lat 2341.903 ms stddev 162.338
progress: 74.1 s, 2.6 tps, lat 2720.182 ms stddev 716.425
progress: 76.4 s, 3.5 tps, lat 3023.234 ms stddev 670.473
progress: 80.4 s, 2.0 tps, lat 2795.323 ms stddev 820.429
progress: 85.6 s, 1.9 tps, lat 4756.217 ms stddev 844.284
progress: 91.9 s, 2.2 tps, lat 3996.001 ms stddev 1301.143
progress: 96.6 s, 3.5 tps, lat 2284.419 ms stddev 85.013
progress: 101.1 s, 3.5 tps, lat 2282.848 ms stddev 71.388
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 8
duration: 100 s
number of transactions actually processed: 275
latency average: 2939.784 ms
latency stddev: 1739.974 ms
tps = 2.710138 (including connections establishing)
tps = 2.710208 (excluding connections establishing)

shared_buffers - 8GB
--------------------
progress: 6.7 s, 2.7 tps, lat 2349.816 ms stddev 212.889
progress: 11.0 s, 3.5 tps, lat 2257.364 ms stddev 141.148
progress: 15.2 s, 3.8 tps, lat 2209.669 ms stddev 127.101
progress: 21.7 s, 3.7 tps, lat 2159.838 ms stddev 92.205
progress: 25.8 s, 3.9 tps, lat 2221.072 ms stddev 283.362
progress: 30.1 s, 3.5 tps, lat 2179.611 ms stddev 152.741
progress: 39.3 s, 2.1 tps, lat 2768.609 ms stddev 1265.508
progress: 50.9 s, 1.1 tps, lat 9361.388 ms stddev 2657.885
progress: 52.9 s, 1.0 tps, lat 2036.098 ms stddev 3.599
progress: 55.2 s, 4.3 tps, lat 2167.688 ms stddev 91.183
progress: 57.6 s, 3.0 tps, lat 2399.219 ms stddev 173.535
progress: 60.2 s, 4.1 tps, lat 2427.273 ms stddev 198.698
progress: 65.2 s, 3.4 tps, lat 2441.630 ms stddev 123.773
progress: 72.4 s, 2.9 tps, lat 2534.631 ms stddev 254.162
progress: 75.0 s, 3.9 tps, lat 2468.266 ms stddev 221.969
progress: 82.3 s, 3.0 tps, lat 2548.690 ms stddev 404.852
progress: 86.7 s, 1.4 tps, lat 3980.576 ms stddev 1205.743
progress: 92.5 s, 1.4 tps, lat 5174.340 ms stddev 643.415
progress: 97.1 s, 3.7 tps, lat 3252.847 ms stddev 1689.268
progress: 101.8 s, 3.4 tps, lat 2346.690 ms stddev 138.251
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 8
number of threads: 8
duration: 100 s
number of transactions actually processed: 284
latency average: 2856.195 ms
latency stddev: 1740.699 ms
tps = 2.781603 (including connections establishing)
tps = 2.781682 (excluding connections establishing)

For 5000 rows

shared_buffers - 1GB
--------------------
progress: 5.0 s, 357.7 tps, lat 22.295 ms stddev 3.511
progress: 10.0 s, 339.0 tps, lat 23.606 ms stddev 4.388
progress: 15.0 s, 323.4 tps, lat 24.733 ms stddev 5.001
progress: 20.0 s, 329.6 tps, lat 24.258 ms stddev 4.407
progress: 25.0 s, 334.3 tps, lat 23.963 ms stddev 4.126
progress: 30.0 s, 337.5 tps, lat 23.699 ms
Re: [HACKERS] Obsolete reference to _bt_tuplecompare() within tuplesort.c
On 10/10/2014 02:04 AM, Peter Geoghegan wrote:

I found a reference made obsolete by commit 9e85183b, which is from way back in 2000. comparetup_index_btree() says:

/*
 * This is similar to _bt_tuplecompare(), but we have already done the
 * index_getattr calls for the first column, and we need to keep track of
 * whether any null fields are present.  Also see the special treatment
 * for equal keys at the end.
 */

I think that this comment should simply indicate that the routine is similar to comparetup_heap(), except that it takes care of the special tie-break stuff for B-Tree sorts, as well as enforcing uniqueness during unique index builds. It should not reference _bt_compare() at all (which is arguably the successor to _bt_tuplecompare()), since _bt_compare() is concerned with a bunch of stuff highly specific to the B-Tree implementation (e.g. having a hard-wired return value for comparisons involving the first data item on an internal page).

Yeah. Want to write that into a patch?

- Heikki
Re: [HACKERS] Obsolete reference to _bt_tuplecompare() within tuplesort.c
On Thu, Oct 9, 2014 at 11:58 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Yeah. Want to write that into a patch?

Attached.

-- 
Peter Geoghegan

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
new file mode 100644
index 8e57505..2d15da0
*** a/src/backend/utils/sort/tuplesort.c
--- b/src/backend/utils/sort/tuplesort.c
*************** comparetup_index_btree(const SortTuple *
*** 3176,3184 ****
  				  Tuplesortstate *state)
  {
  	/*
! 	 * This is similar to _bt_tuplecompare(), but we have already done the
! 	 * index_getattr calls for the first column, and we need to keep track of
! 	 * whether any null fields are present.  Also see the special treatment
  	 * for equal keys at the end.
  	 */
  	ScanKey		scanKey = state->indexScanKey;
--- 3176,3183 ----
  				  Tuplesortstate *state)
  {
  	/*
! 	 * This is similar to comparetup_heap(), but expects index tuples.  There
! 	 * is also special handling for enforcing uniqueness, and special treatment
  	 * for equal keys at the end.
  	 */
  	ScanKey		scanKey = state->indexScanKey;
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
Hi,

On 2014-10-10 10:13:03 +0530, Amit Kapila wrote:

I have done a few performance tests for the above patches, and the results are below:

Cool, thanks.

Performance Data
----------------
IBM POWER-7: 16 cores, 64 hardware threads, RAM = 64GB
max_connections = 210
Database Locale = C
checkpoint_segments = 256
checkpoint_timeout = 35min
shared_buffers = 8GB
Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8)
Duration of each individual run = 5 mins
Test type - read-only pgbench with -M prepared

Other related information about the tests:
a. This is the data for the median of 3 runs; the detailed data of the individual runs is attached to the mail.
b. I have applied both patches to take the performance data.

Scale Factor - 100

Patch_ver/Client_count      1       8      16      32      64     128
HEAD                    13344  106921  196629  295123  377846  333928
PATCH                   13662  106179  203960  298955  452638  465671

Scale Factor - 3000

Patch_ver/Client_count      8      16      32      64     128     160
HEAD                    86920  152417  231668  280827  257093  255122
PATCH                   87552  160313  230677  276186  248609  244372

Observations
------------
a. The patch performs really well (an increase of up to ~40%) in case all the data fits in shared buffers (scale factor 100).
b. In case the data doesn't fit in shared buffers but fits in RAM (scale factor 3000), there is a performance increase up to a client count of 16; after that it starts dipping (in the above config, up to ~4.4%).

Hm. Interesting. I don't see that dip on x86.

The above data shows that the patch improves performance for cases where there is shared LWLock contention; however, there is a slight performance dip in the case of exclusive LWLocks (at scale factor 3000, it needs exclusive LWLocks for the buffer mapping tables). Now I am not sure if this is the worst-case dip, or whether under similar configurations the dip can be higher, because the trend shows that the dip increases with higher client counts.

Brief analysis of the code w.r.t. the performance dip: the extra instructions w.r.t. HEAD in the exclusive-lock acquire path are
a. attempt the lock twice;
b. atomic operations for nwaiters in LWLockQueueSelf() and LWLockAcquireCommon();
c. we now need to take the spinlock twice, once for queueing ourselves and then again for setting releaseOK;
d. a few function calls and some extra checks.

Hm. I can't really see the number of atomics itself mattering; a spinning lock will do many more atomic ops than this. But I wonder whether we could get rid of the releaseOK lock. That should be quite possible.

Now, these probably shouldn't matter much in case a backend needs to wait for another exclusive locker, but I am not sure what else could be the reason for the dip in case we need exclusive LWLocks.

Any chance to get a profile?

Greetings,

Andres Freund

-- 
Andres Freund	                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
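The "attempt the lock twice" pattern discussed in this exchange can be illustrated with a toy model. This is a hypothetical, single-threaded Python sketch, not the patch's C code: cas_acquire stands in for an atomic compare-and-swap, sleep for blocking on the process semaphore, and the between_attempts hook merely simulates a concurrent release by another backend. It shows why the second attempt exists: a release that lands just after we queue ourselves would otherwise be missed until the next wakeup.

```python
class ToyLWLock:
    """Toy, single-threaded model of a queued exclusive-lock acquire.

    Illustration only: 'cas_acquire' pretends to be an atomic
    test-and-set, and 'sleep' pretends to block on a semaphore.
    """

    def __init__(self):
        self.locked = False
        self.nwaiters = 0           # maintained with atomic ops in the patch

    def cas_acquire(self):
        """Pretend-atomic test-and-set of the lock bit."""
        if not self.locked:
            self.locked = True
            return True
        return False

    def acquire_exclusive(self, sleep, between_attempts=lambda: None):
        if self.cas_acquire():      # attempt 1: uncontended fast path
            return "fast"
        self.nwaiters += 1          # queue ourselves (cf. LWLockQueueSelf())
        between_attempts()          # window where the holder may release
        if self.cas_acquire():      # attempt 2: catch a release that
            self.nwaiters -= 1      # happened while we were queueing
            return "second-try"
        sleep()                     # block until woken by the releaser
        self.nwaiters -= 1          # (the real code then re-checks; omitted)
        return "slept"

lock = ToyLWLock()
assert lock.acquire_exclusive(sleep=lambda: None) == "fast"
```

In the real patch the queueing step and the nwaiters bookkeeping are done with atomics and a spinlock, which is exactly the extra-instruction overhead items (a) through (d) describe.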
[HACKERS] alter user/role CURRENT_USER
Hello, while considering ALTER ROLE SET ..., I found that ALTER ROLE/USER cannot take CURRENT_USER as the role name. In addition to that, the documents for ALTER ROLE and ALTER USER are inconsistent with each other, in spite of ALTER USER being said to be an alias for ALTER ROLE. Plus, ALTER USER cannot take ALL as the user name although ALTER ROLE can.

This patch does the following things:

- ALTER USER/ROLE now takes CURRENT_USER as the user name.
- Rewrite the synopses of the documents for ALTER USER and ALTER ROLE so that they have exactly the same syntax.
- Modify the syntax of ALTER USER so as to be an alias of ALTER ROLE.
- Added CURRENT_USER/CURRENT_ROLE syntax to both.
- Added ALL syntax as the user name to ALTER USER.
- Added IN DATABASE syntax to ALTER USER.

x Integrating the ALTER ROLE/USER syntax could not be done because of a shift/reduce error in bison.
x This patch contains no additional regression tests. Are they needed?

SESSION_USER/USER could also be made usable for this command, but this patch doesn't do so (yet).

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

From d12f479de845f55f77096e79fea69930bd665416 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi horiguchi.kyot...@lab.ntt.co.jp
Date: Tue, 9 Sep 2014 19:26:33 +0900
Subject: [PATCH 2/2] ALTER ROLE CURRENT_USER document

---
 doc/src/sgml/ref/alter_role.sgml | 15 ---
 doc/src/sgml/ref/alter_user.sgml | 13 +++--
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/ref/alter_role.sgml b/doc/src/sgml/ref/alter_role.sgml
index 0471daa..e6f8093 100644
--- a/doc/src/sgml/ref/alter_role.sgml
+++ b/doc/src/sgml/ref/alter_role.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
 <refsynopsisdiv>
 <synopsis>
-ALTER ROLE <replaceable class="PARAMETER">name</replaceable> [ [ WITH ] <replaceable class="PARAMETER">option</replaceable> [ ... ] ]
+ALTER ROLE { <replaceable class="parameter">name</replaceable> | CURRENT_USER } [ [ WITH ] <replaceable class="PARAMETER">option</replaceable> [ ... ] ]
 
 <phrase>where <replaceable class="PARAMETER">option</replaceable> can be:</phrase>
 
@@ -37,12 +37,12 @@ ALTER ROLE <replaceable class="PARAMETER">name</replaceable> [ [ WITH ] <replace
     | [ ENCRYPTED | UNENCRYPTED ] PASSWORD '<replaceable class="PARAMETER">password</replaceable>'
     | VALID UNTIL '<replaceable class="PARAMETER">timestamp</replaceable>'
 
-ALTER ROLE <replaceable class="PARAMETER">name</replaceable> RENAME TO <replaceable>new_name</replaceable>
+ALTER ROLE <replaceable class="parameter">name</replaceable> RENAME TO <replaceable>new_name</replaceable>
 
-ALTER ROLE <replaceable class="PARAMETER">name</replaceable> [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] SET <replaceable>configuration_parameter</replaceable> { TO | = } { <replaceable>value</replaceable> | DEFAULT }
-ALTER ROLE { <replaceable class="PARAMETER">name</replaceable> | ALL } [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] SET <replaceable>configuration_parameter</replaceable> FROM CURRENT
-ALTER ROLE { <replaceable class="PARAMETER">name</replaceable> | ALL } [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] RESET <replaceable>configuration_parameter</replaceable>
-ALTER ROLE { <replaceable class="PARAMETER">name</replaceable> | ALL } [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] RESET ALL
+ALTER ROLE { <replaceable class="parameter">name</replaceable> | CURRENT_USER | ALL } [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] SET <replaceable>configuration_parameter</replaceable> { TO | = } { <replaceable>value</replaceable> | DEFAULT }
+ALTER ROLE { <replaceable class="PARAMETER">name</replaceable> | CURRENT_USER | ALL } [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] SET <replaceable>configuration_parameter</replaceable> FROM CURRENT
+ALTER ROLE { <replaceable class="PARAMETER">name</replaceable> | CURRENT_USER | ALL } [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] RESET <replaceable>configuration_parameter</replaceable>
+ALTER ROLE { <replaceable class="PARAMETER">name</replaceable> | CURRENT_USER | ALL } [ IN DATABASE <replaceable class="PARAMETER">database_name</replaceable> ] RESET ALL
 </synopsis>
 </refsynopsisdiv>
 
@@ -123,7 +123,8 @@ ALTER ROLE { <replaceable class="PARAMETER">name</replaceable> | ALL } [ IN DATA
      <term><replaceable class="PARAMETER">name</replaceable></term>
      <listitem>
       <para>
-        The name of the role whose attributes are to be altered.
+        The name of the role whose attributes are to be
+        altered. <literal>CURRENT_USER</> matches the name of the current user.
       </para>
      </listitem>
     </varlistentry>
diff --git a/doc/src/sgml/ref/alter_user.sgml b/doc/src/sgml/ref/alter_user.sgml
index 58ae1da..feb1197 100644
--- a/doc/src/sgml/ref/alter_user.sgml
+++ b/doc/src/sgml/ref/alter_user.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
 <refsynopsisdiv>
 <synopsis>
-ALTER USER <replaceable class="PARAMETER">name</replaceable> [ [ WITH ] <replaceable class="PARAMETER">option</replaceable> [ ... ] ]
+ALTER USER {
Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback
On Thu, Oct 9, 2014 at 6:42 PM, furu...@pm.nttdata.co.jp wrote:

If we remove --fsync-interval, response time to the user will not be delayed, although fsync will be executed multiple times in a short period. And there is no way to solve that problem without --fsync-interval; what should we do about it?

I'm sorry, I didn't understand that.

Here is an example. When WAL is sent at 100ms intervals, fsync() is executed 10 times per second. If --fsync-interval is set to 1 second, we have to wait up to 1 second for the SQL response (for the kind of SQL that makes a WAL record), but fsync() won't be executed several times within that second. I think --fsync-interval meets the demands of people who want to keep fsync() from happening several times in a short period, but what do you think? Is it OK to delete --fsync-interval?

I still don't see the problem. In synchronous mode, pg_receivexlog should have similar logic to walreceiver's: it should read as much WAL from the socket as it can without blocking, and fsync() and send a reply after that. It should also fsync whenever switching to a new segment. If the master sends WAL every 100ms, then pg_receivexlog will indeed fsync() 10 times per second. There's nothing wrong with that. In asynchronous mode, only fsync when switching to a new segment.

Yeah. Or rather, add a new message type, to indicate the synchronous/asynchronous status.

What kind of message type do we have to add? Do we need to separate 'w' into two types, synchronous and asynchronous? Or add a new message type, a kind of 'notify synchronous', and inform pg_receivexlog of the synchronous status when it connects to the server?

Better to add a new notify message type, and pg_receivexlog should be prepared to receive it at any time. The status might change on the fly, if the server's configuration is reloaded.

Thanks for the reply.

In synchronous mode, pg_receivexlog should have similar logic as walreceiver does.

OK. I understand that removing --fsync-interval is not a problem.
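The drain-then-sync loop suggested for synchronous mode can be sketched as follows. This is a hypothetical Python model, not pg_receivexlog's actual C code: the names receive_loop and send_reply are invented for illustration, 'incoming' models WAL chunks already readable without blocking, and segment switches and timeouts are ignored.

```python
from collections import deque

def receive_loop(incoming, fsync, send_reply):
    """One iteration of a drain-then-sync receive loop.

    Every chunk already available is written first; only then is a
    single fsync() issued and a single status reply sent, so a burst
    of ten chunks arriving together costs one fsync, not ten.
    """
    wrote = False
    while incoming:                # read all WAL available without blocking
        chunk = incoming.popleft()
        # ... write 'chunk' to the current WAL segment file ...
        wrote = True
    if wrote:
        fsync()                    # one fsync per batch, not per chunk
        send_reply()               # report the new flush position upstream

calls = {"fsync": 0, "reply": 0}
pending = deque("wal-chunk-%d" % i for i in range(10))
receive_loop(pending,
             lambda: calls.__setitem__("fsync", calls["fsync"] + 1),
             lambda: calls.__setitem__("reply", calls["reply"] + 1))
assert calls == {"fsync": 1, "reply": 1}   # ten chunks, one fsync
```

The same shape extends naturally to the asynchronous mode described here, where the fsync() would instead be issued only on a segment switch.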
+1 for adding something like a --synchronous option.

To me, it sounds like a walreceiver-compatible mode rather than synchronous.

Better to add a new notify message type. And pg_receivexlog should be prepared to receive it at any time. The status might change on the fly, if the server's configuration is reloaded.

OK. I'll consider it.

I don't think that's a good idea because it prevents us from using pg_receivexlog as an async walreceiver (i.e., received WAL data is fsynced and feedback is sent back to the server immediately, but transaction commit in the server doesn't wait for the feedback).

Regards,

-- 
Fujii Masao
Re: [HACKERS] alter user set local_preload_libraries.
Hello, I overlooked this thread.

On Mon, Sep 15, 2014 at 1:33 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Peter Eisentraut pete...@gmx.net writes: On 9/1/14 7:51 AM, Kyotaro HORIGUCHI wrote: The attached patch simply changes the context for local_... to PGC_USERSET and edits the doc.

I had this ready to commit, but then "Invent PGC_SU_BACKEND and mark log_connections/log_disconnections that way." was committed in the meantime. Does this affect what we should do with this change? I guess one thing to look into would be whether we could leave local_preload_libraries as PGC_BACKEND and change session_preload_libraries to PGC_SU_BACKEND, and then investigate whether we could allow settings made with ALTER ROLE / SET to change PGC_BACKEND settings.

Yeah, I was wondering about that while I was making the other commit. I did not touch those variables at the time, but it would make sense to restrict them as you suggest.

+1

Also I think that it's useful to allow ALTER ROLE/DATABASE SET to set PGC_BACKEND and PGC_SU_BACKEND parameters. So, what about applying the attached patch? This patch allows that and changes the context of session_preload_libraries to PGC_SU_BACKEND.

It's not my business to decide whether to apply it, but I don't see any obvious problem in it so far.

By the way, I became unable to log in at all after wrongly setting *_preload_libraries for all available users. Are there any relief measures for this situation? I think it's okay even if there's no way to log in again, but I want to know if there are any.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
[HACKERS] Column Redaction
Postgres currently supports column level SELECT privileges.

1. If we want to confirm a credit card number, we can issue SELECT 1 FROM customer WHERE stored_card_number = '1234 5678 5344 7733'
2. If we want to look for card fraud, we need to be able to use the full card number to join to transaction data and look up blocked card lists etc..
3. We want to block the direct retrieval of card numbers for additional security. In some cases, we might want to return an answer like ' * 7733'

We can't do all of the above with current facilities inside the database. The ability to mask output for data in certain cases, for the purpose of security, is known lately as data redaction, or column-level data redaction.

The best way to support this requirement would be to allow columns to have an additional output formatting function. This would be executed only when data is about to be returned by a query; all other uses of the data would not be restricted. This would have other uses as well, such as default report formats, so we can store financial amounts as NUMERIC, but format them on retrieval as $12,345.78 etc..

Suggested user interface would be...

FORMAT functionname(parameters, if any)

e.g.
CREATE TABLE customer
( id ...
 ...
, stored_card_number  NUMERIC FORMAT pci_card_number_redaction()
 ...
);

We'd need to implement something to allow pg_dump to ignore format functions. I suggest the best way to do that is by providing a BACKUP role that can be delegated to other users. We would then allow a parameter, SET output_formatting = on | off, which can only be set by a superuser or the BACKUP role, and have pg_dump issue SET output_formatting = off explicitly when it runs.

Do we want redaction in PostgreSQL? Do we want it generalised into output format functions?
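To make the proposal concrete, here is the sort of masking such a format function might perform. This is a hypothetical Python sketch: the FORMAT clause does not exist in PostgreSQL, the function name is taken from the CREATE TABLE example above, and the exact masking policy shown is an assumption, not part of the proposal.

```python
def pci_card_number_redaction(value):
    """Mask all but the last four digits of a stored card number.

    Illustrative policy only: keep the final four digits, replace
    everything else with asterisks grouped like a card number.
    """
    digits = "".join(ch for ch in str(value) if ch.isdigit())
    return "**** **** **** " + digits[-4:]

assert pci_card_number_redaction("1234 5678 5344 7733") == "**** **** **** 7733"
```

Under the proposal this transformation would run only at output time; WHERE-clause comparisons and joins would still see the raw NUMERIC value.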
-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Column Redaction
On 10/10/2014 11:57 AM, Simon Riggs wrote:

Postgres currently supports column level SELECT privileges.

1. If we want to confirm a credit card number, we can issue SELECT 1 FROM customer WHERE stored_card_number = '1234 5678 5344 7733'
2. If we want to look for card fraud, we need to be able to use the full card number to join to transaction data and look up blocked card lists etc..
3. We want to block the direct retrieval of card numbers for additional security. In some cases, we might want to return an answer like ' * 7733'

We can't do all of the above with current facilities inside the database.

Deny access to the underlying tables. Write SQL functions to do 1. and 2., and grant privileges to the functions, instead. For 3. create views that do the redaction.

The ability to mask output for data in certain cases, for the purpose of security, is known lately as data redaction, or column-level data redaction. The best way to support this requirement would be to allow columns to have an additional output formatting function. This would be executed only when data is about to be returned by a query. All other uses of that would not restrict the data.

I don't see how that could work. Once you have access to the datum, you can find its value in many indirect ways, without invoking the output function. For example, write a PL/pgSQL function that takes the card number as argument. Use < and > to binary search its value. If you block < and >, I'm sure there are countless other ways. And messing with output functions seems pretty, well, messy, in general.

I think the only solution that's going to work in practice is to implement the redaction at a higher level. Don't allow direct access to the tables with card numbers. Create functions that do whatever joins, etc. you need to do with them, and grant privileges to only the functions.
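The binary-search leak Heikki describes is easy to demonstrate. The sketch below is a hypothetical Python model: is_less_than stands in for a comparison probe the attacker can still run against the redacted column, since output-only redaction does not restrict comparisons.

```python
def leak_by_binary_search(is_less_than, lo=0, hi=10**16):
    """Recover a hidden integer using only '<' comparisons.

    'is_less_than(x)' models a probe such as
        SELECT 1 FROM customer WHERE stored_card_number < x
    which a redaction layer that only masks output cannot block.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if is_less_than(mid + 1):   # true exactly when hidden <= mid
            hi = mid
        else:
            lo = mid + 1
    return lo

# A 16-digit card number "protected" by output redaction:
secret = 1234567853447733
assert leak_by_binary_search(lambda x: secret < x) == secret
```

About 54 probes suffice for any 16-digit number, which is the kind of indirect extraction an output-time hook cannot prevent.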
- Heikki
Re: [HACKERS] Column Redaction
This would have other uses as well, such as default report formats, so we can store financial amounts as NUMERIC, but format them on retrieval as $12,345.78 etc..

Nice idea, but what if you need to do further calculations? If you output the value of credit card transactions it works fine, but in case you want to SUM up the values, you need to cast it back from text(?) to numeric, calculate it, and cast it to text(?) again. And if you do, for any reason, need the credit card number in your application (for example, sending it to the credit card company to deduct money), how can you retrieve its original value? Moreover, what if you SELECT from a sub-SELECT which already has the formatted information and not the plain data?

Maybe you should restrict access to tables for a certain user and only allow the user to use a view which formats the output. Modern applications have a presentation layer which should take care of data formatting; I am not sure it is a good idea to mix data storage and data presentation in the database.

Regards,
Damian Wolgast (irc:asymetrixs)
Re: [HACKERS] alter user set local_preload_libraries.
On Fri, Oct 10, 2014 at 5:36 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote:

Hello, I overlooked this thread.

On Mon, Sep 15, 2014 at 1:33 AM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Eisentraut pete...@gmx.net writes: On 9/1/14 7:51 AM, Kyotaro HORIGUCHI wrote: The attached patch simply changes the context for local_... to PGC_USERSET and edits the doc. I had this ready to commit, but then "Invent PGC_SU_BACKEND and mark log_connections/log_disconnections that way." was committed in the meantime. Does this affect what we should do with this change? I guess one thing to look into would be whether we could leave local_preload_libraries as PGC_BACKEND and change session_preload_libraries to PGC_SU_BACKEND, and then investigate whether we could allow settings made with ALTER ROLE / SET to change PGC_BACKEND settings. Yeah, I was wondering about that while I was making the other commit. I did not touch those variables at the time, but it would make sense to restrict them as you suggest.

+1

Also I think that it's useful to allow ALTER ROLE/DATABASE SET to set PGC_BACKEND and PGC_SU_BACKEND parameters. So, what about applying the attached patch? This patch allows that and changes the context of session_preload_libraries to PGC_SU_BACKEND.

It's not my business to decide whether to apply it, but I don't see any obvious problem in it so far.

By the way, I became unable to log in at all after wrongly setting *_preload_libraries for all available users. Are there any relief measures for this situation? I think it's okay even if there's no way to log in again, but I want to know if there are any.

Yep, that's a problem. You can still log in to the server even in that case by, for example, executing the following commands:

$ export PGOPTIONS="-c *_preload_libraries="
$ psql

Regards,

-- 
Fujii Masao
Re: [HACKERS] Column Redaction
On 10 October 2014 10:29, Heikki Linnakangas hlinnakan...@vmware.com wrote:

On 10/10/2014 11:57 AM, Simon Riggs wrote: Postgres currently supports column level SELECT privileges. 1. If we want to confirm a credit card number, we can issue SELECT 1 FROM customer WHERE stored_card_number = '1234 5678 5344 7733' 2. If we want to look for card fraud, we need to be able to use the full card number to join to transaction data and look up blocked card lists etc.. 3. We want to block the direct retrieval of card numbers for additional security. In some cases, we might want to return an answer like ' * 7733' We can't do all of the above with current facilities inside the database.

Deny access to the underlying tables. Write SQL functions to do 1. and 2., and grant privileges to the functions, instead. For 3. create views that do the redaction.

If everything were easy to lock down, the approach you suggest would of course be the best way. The problem there is that the SQL for (2) changes frequently, so we want to give people SQL access, just not the ability to retrieve data in a usable form.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Column Redaction
On 10 October 2014 10:15, Thom Brown t...@linux.com wrote:

On 10 October 2014 09:57, Simon Riggs si...@2ndquadrant.com wrote: Postgres currently supports column level SELECT privileges. 1. If we want to confirm a credit card number, we can issue SELECT 1 FROM customer WHERE stored_card_number = '1234 5678 5344 7733' 2. If we want to look for card fraud, we need to be able to use the full card number to join to transaction data and look up blocked card lists etc.. 3. We want to block the direct retrieval of card numbers for additional security. In some cases, we might want to return an answer like ' * 7733'

One question that immediately springs to mind is: would the format apply when passing columns to other functions? If not, wouldn't something like SELECT upper(redacted_column::text) ... just bypass the formatting?

Yes, it would. As would SELECT redacted_column || ' '. I'm not sure how to block such usage, other than to apply it prior to the final calculation of functions, i.e. we apply it in the SELECT clause, but not in the other clauses: FROM ON/WHERE/GROUP/ORDER/HAVING etc..

Also, how would casting be handled? Would it be forbidden for such cases? And couldn't the card number be worked out using:

SELECT 1 FROM customer WHERE stored_card_number LIKE '%1 7733';
 ?column?
----------
(0 rows)

SELECT 1 FROM customer WHERE stored_card_number LIKE '%2 7733';
 ?column?
----------
        1
(1 row)

SELECT 1 FROM customer WHERE stored_card_number LIKE '%12 7733';
 ?column?
----------
(0 rows)

... and so on, which could be scripted in a DO statement? Not so much a challenge to the idea, but just wishing to understand how it would work.

Yes, covert channels would always exist. It would really be down to auditing to control such exploits. Redaction is aimed at minimising access in normal usage.
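Thom's probe sequence can indeed be scripted end to end. The following is a hypothetical Python model of it: matches stands in for issuing the LIKE query, and the database side is simulated here with a plain suffix check.

```python
def leak_by_like(matches, length, alphabet="0123456789"):
    """Recover a hidden string right to left using only LIKE probes.

    'matches(pattern)' models
        SELECT 1 FROM customer WHERE stored_card_number LIKE pattern
    returning True when a row matches: the same probes as
    '%1 7733', '%2 7733', ... above, just automated.
    """
    known = ""
    while len(known) < length:
        for ch in alphabet:
            if matches("%" + ch + known):   # extend the known suffix by one
                known = ch + known
                break
        else:
            raise RuntimeError("no character matched; widen the alphabet")
    return known

# Simulate the database side: LIKE '%<suffix>' is a suffix test.
secret = "1234567853447733"
probe = lambda pattern: secret.endswith(pattern.lstrip("%"))
assert leak_by_like(probe, len(secret)) == secret
```

Sixteen digits fall in at most 160 probes with this alphabet, which supports the point that redaction limits casual exposure in normal usage rather than stopping a determined, scriptable attacker.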
-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Column Redaction
The problem there is that the SQL for (2) changes frequently, so we want to give people SQL access.

So you want to give people access to your SQL database, and you worry that they could see specific information (credit card numbers) in plain form, and therefore you want to format it so that people cannot see the real data. Is that correct?

I'd either do that by only letting them access a view, or by reconsidering whether it is really a good idea to give them SQL access to the server, as they could do other things which could, e.g., slow down the server enormously. Never trust the user.

So I see what you want to achieve, but I am not sure the reason for doing it is good. Can you explain, please? Maybe you should provide them an interface (e.g. a web app) that restricts access to certain functions and takes care of formatting.

Regards
Damian Wolgast (irc:asymetrixs)
Re: [HACKERS] Scaling shared buffer eviction
On 2014-10-10 12:28:13 +0530, Amit Kapila wrote: On Fri, Oct 10, 2014 at 1:08 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-09 16:01:55 +0200, Andres Freund wrote: I don't think OLTP really is the best test case for this. Especially not pgbench with relatively small rows *and* a uniform distribution of access. Try parallel COPY TO. Batch write loads is where I've seen this hurt badly. just by switching shared_buffers from 1 to 8GB. I haven't tried, but I hope that with an approach like yours this might become better. psql -f /tmp/prepare.sql pgbench -P5 -n -f /tmp/copy.sql -c 8 -j 8 -T 100 Thanks for providing the scripts. You haven't specified how much data is present in the 'large' file used in tests. I don't think it matters much. It should be small enough that you get a couple TPS over all backends. I have tried with different sets of rows, but I could not see the dip that is present in your data when you increased shared buffers from 1GB to 8GB; also I don't see any difference with the patch. Interesting. I wonder whether that's because the concurrency wasn't high enough for that machine to show the problem. I ran the test on my workstation which has 8 actual cores. BTW, why do you think that for such workloads this patch can be helpful? According to my understanding it can be helpful mainly for read-mostly workloads when all the data doesn't fit in shared buffers. The performance dip comes from all the backends performing the clock sweep. As the access is pretty uniform, all the buffers start with some usage count (IIRC 3 in this example; much worse if 5). Due to the uniform usage count the clock sweep frequently has to go several times through *all* the buffers. That leads to quite horrible performance in some cases. I had hoped that bgreclaimer could make that workload less horrible by funneling most of the accesses through the freelist.
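The cost Andres describes can be illustrated with a toy model of the clock sweep (a deliberately simplified sketch, not the actual freelist.c logic: every buffer starts at the same usage count, the sweep hand decrements each buffer it passes, and a buffer becomes evictable at zero):

```python
def sweep_cost(n_buffers: int, start_usage: int) -> int:
    """Steps the clock hand takes before finding the first victim buffer,
    assuming all buffers start at the same usage count and no concurrent
    pins or usage-count bumps (toy model)."""
    usage = [start_usage] * n_buffers
    hand = 0
    steps = 0
    while True:
        steps += 1
        if usage[hand] == 0:
            return steps          # found an evictable buffer
        usage[hand] -= 1          # otherwise decrement and advance the hand
        hand = (hand + 1) % n_buffers

# With a uniform usage count of 3, the hand must pass *every* buffer three
# full times before the first eviction; with 5, five times.
print(sweep_cost(1000, 3))  # 3001
print(sweep_cost(1000, 5))  # 5001
```

This is the "go several times through *all* the buffers" effect: the first eviction after such a plateau costs O(usage_count × NBuffers) hand movements, which every backend pays concurrently under the sketch above, whereas a populated freelist hands out victims in O(1).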
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Column Redaction
On 10 October 2014 11:08, Damian Wolgast damian.wolg...@si-co.net wrote: The problem there is that the SQL for (2) changes frequently, so we want to give people SQL access. So you want to give people access to your SQL database and worry that they could see specific information (credit card numbers) in the clear, and therefore you want to format it so that people cannot see the real data. Is that correct? I'd either do that by only letting them access a view, or reconsider whether it is really a good idea to give them SQL access to the server, as they could do other things which e.g. could slow down the server enormously. Never trust the user. So I see what you want to achieve, but I am not sure if the reason to do that is good. Can you explain please? Maybe you should provide them with an interface (e.g. a web app) that restricts access to certain functions and handles the formatting. The requirement for redaction cannot be provided by a view. A view provides a single value for each column, no matter whether it is used in the SELECT or WHERE clause. Redaction requires output formatting only, with the data unchanged for other purposes. Redaction is now a feature available in other databases. I guess it's possible it's all smoke and mirrors, but that's why we discuss stuff before we build it. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Column Redaction
On 10/10/2014 01:21 PM, Simon Riggs wrote: Redaction is now a feature available in other databases. I guess it's possible it's all smoke and mirrors, but that's why we discuss stuff before we build it. I googled for Oracle Data Redaction, and found these General Usage Guidelines: * Oracle Data Redaction is not intended to protect against attacks by privileged database users who run ad hoc queries directly against the database. * Oracle Data Redaction is not intended to protect against users who run exhaustive SQL queries that attempt to determine the actual values by inference. So it's not actually suitable for the example you gave. I don't think we want this feature... - Heikki
Re: [HACKERS] Column Redaction
Simon, * Simon Riggs (si...@2ndquadrant.com) wrote: The requirement for redaction cannot be provided by a view. A view provides a single value for each column, no matter whether it is used in SELECT or WHERE clause. Redaction requires output formatting only, but unchanged for other purposes. Redaction is now a feature available in other databases. I guess its possible its all smoke and mirrors, but thats why we discuss stuff before we build it. In general, I'm on-board with the idea and similar requests have come from users I've talked with. Is there any additional information available on how these other databases deal with the questions and concerns which have been raised? Regarding functions, 'leakproof' functions should be alright to allow, though Heikki brings up a good point regarding binary search being possible in a plpgsql function (or even directly by a client). Of course, that approach also requires that you have a specific item in mind. Methods to mitigate would include not allowing regular users to create functions or run DO blocks and rate-limiting their queries, along with appropriate auditing. Thanks, Stephen
Re: [HACKERS] Column Redaction
Hi 2014-10-10 10:57 GMT+02:00 Simon Riggs si...@2ndquadrant.com: Postgres currently supports column level SELECT privileges. 1. If we want to confirm a credit card number, we can issue SELECT 1 FROM customer WHERE stored_card_number = '1234 5678 5344 7733' 2. If we want to look for card fraud, we need to be able to use the full card number to join to transaction data and look up blocked card lists etc.. 3. We want to block the direct retrieval of card numbers for additional security. In some cases, we might want to return an answer like ' * 7733' We can't do all of the above with current facilities inside the database. The ability to mask output for data in certain cases, for the purpose of security, is known lately as data redaction, or column-level data redaction. The best way to support this requirement would be to allow columns to have an additional output formatting function. This would be executed only when data is about to be returned by a query. All other uses of that would not restrict the data. This would have other uses as well, such as default report formats, so we can store financial amounts as NUMERIC, but format them on retrieval as $12,345.78 etc.. Suggested user interface would be... FORMAT functionname(parameters, if any) e.g. CREATE TABLE customer ( id ... ... , stored_card_number NUMERIC FORMAT pci_card_number_redaction() ... ); We'd need to implement something to allow pg_dump to ignore format functions. I suggest the best way to do that is by providing a BACKUP role that can be delegated to other users. We would then allow a parameter for SET output_formatting = on | off, which can only be set by superuser and BACKUP role, then have pg_dump issue SET output_formatting = off explicitly when it runs. I see a benefit of this feature as an alternative output function .. I remember a talk about the output format of the boolean type. But how can this feature help security? 
You would have to disallow any expression over a column marked this way, or you would have to enforce the alternative output function early. And when you require an alternative output format function (which would have to be implemented in C), it is not much less work than implementing a new type. So it would probably be much more practical if any expression could be used, like stored_card_number NUMERIC FORMAT (right(stored_card_number::text, 4)) Regards Pavel Do we want redaction in PostgreSQL? Do we want it generalised into output format functions? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 2014-10-08 20:07:35 -0400, Robert Haas wrote: On Wed, Oct 8, 2014 at 2:04 PM, Andres Freund and...@2ndquadrant.com wrote: So, what makes it work for me (among other unrelated stuff) seems to be the following in .gdbinit, defining away some things that gdb doesn't handle: macro define __builtin_offsetof(T, F) ((int) &(((T *) 0)->F)) macro define __extension__ macro define AssertVariableIsOfTypeMacro(x, y) ((void)0) Additionally I have -ggdb -g3 in CFLAGS. That way gdb knows about postgres' macros. At least if you're in the right scope. As an example, the following works: (gdb) p dlist_is_empty(BackendList) ? NULL : dlist_head_element(Backend, elem, BackendList) Ah, cool. I'll try that. If that works for you, should we put it somewhere in the docs? If so, where? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Column Redaction
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote: On 10/10/2014 01:21 PM, Simon Riggs wrote: Redaction is now a feature available in other databases. I guess its possible its all smoke and mirrors, but thats why we discuss stuff before we build it. I googled for Oracle Data redaction, and found General Usage Guidelines: * Oracle Data Redaction is not intended to protect against attacks by privileged database users who run ad hoc queries directly against the database. * Oracle Data Redaction is not intended to protect against users who run exhaustive SQL queries that attempt to determine the actual values by inference. So it's not actually suitable for the example you gave. I don't think we want this feature... Or, we need to consider how Oracle addresses these risks and consider if we can provide a similar capability. Those capabilities may include specific configuration and could be a prerequisite for this feature, but I don't think it's sensible to say we don't want this feature simply because it can't stand alone as a perfect answer to these risks. As has been discussed before, we are likely in a better position to identify the concerns and problem areas, come up with recommendations for configuration and/or develop new capabilities to mitigate those risks, than the every-day user or DBA. If we provide it and address these issues in a central location which is generally available, then fixes and problems can be addressed and fixed rather than every database implementation faced with these concerns having to address them independently with, most likely, poorer quality solutions. While we don't want every feature of every database, this deserves more consideration. Thanks, Stephen
Re: [HACKERS] Column Redaction
* Thom Brown (t...@linux.com) wrote: To be honest, this all sounds rather flaky. Even if you do rate-limit their queries, they can use methods that avoid rate-limiting, such as recursive queries. And if you're only after one credit card number (to use the original example), you'd get it in a relatively short amount of time, despite some rate-limiting system. The discussion about looking up specific card numbers in the original email from Simon was actually an allowed use-case, as I understood it, not a risk concern. Indeed, if you know a valid credit card number already, as in this example, then why are you bothering with the search? Perhaps it would provide confirmation, but it's not the database's responsibility to make you forget the number you already have. Doing a random walk through a keyspace of 10^16 and extracting a significant enough number of results to be useful should be difficult. I agree that if we're completely unable to make it difficult then this is less useful, but I feel it's a bit early to jump to that conclusion. This gives the vague impression of security, but it really seems just the placing of a few obstacles in the way. One might consider that all security is just placing obstacles in the way. And auditing sounds like a euphemism for pass the problem of security on elsewhere anyway. Auditing is a known requirement for good security.. There's certainly different levels of it, but if you aren't at least auditing your security configuration for the attack vectors you're concerned about, then you're unlikely to have any real security. Thanks, Stephen
Re: [HACKERS] Column Redaction
On 10/10/2014 01:35 PM, Stephen Frost wrote: Regarding functions, 'leakproof' functions should be alright to allow, though Heikki brings up a good point regarding binary search being possible in a plpgsql function (or even directly by a client). Of course, that approach also requires that you have a specific item in mind. It doesn't require that you have a specific item in mind. Binary search is cheap, O(log n). It's easy to write a function to do a binary search on a single item, passed as argument, and then apply that to all rows: SELECT binary_search_reveal(cardnumber) FROM redacted_table; Really, I don't see how this can possibly be made to work. You can't allow ad hoc processing of data, and still avoid revealing it to the user. - Heikki
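Heikki's O(log n) claim is easy to make concrete. A toy sketch (all names invented for illustration), assuming only that an attacker can get yes/no answers to "hidden value < x" — e.g. through any comparison that leaks into a query's behaviour:

```python
def reveal(oracle_less_than, lo=0, hi=10 ** 16):
    """Recover a hidden integer in [lo, hi) by binary search, given an
    oracle answering 'hidden < x'. No prior knowledge of the value needed."""
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if oracle_less_than(mid + 1):  # hidden < mid+1, i.e. hidden <= mid
            hi = mid
        else:
            lo = mid + 1
    return lo, probes

secret = 1234567853447733  # stands in for a stored card number
value, probes = reveal(lambda x: secret < x)
print(value, probes <= 54)  # 1234567853447733 True
```

Against a 16-digit keyspace this needs at most ceil(log2(10^16)) = 54 probes per row, which is why allowing ordering comparisons on a redacted column defeats the redaction entirely.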
Re: [HACKERS] Column Redaction
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote: On 10/10/2014 01:35 PM, Stephen Frost wrote: Regarding functions, 'leakproof' functions should be alright to allow, though Heikki brings up a good point regarding binary search being possible in a plpgsql function (or even directly by a client). Of course, that approach also requires that you have a specific item in mind. It doesn't require that you have a specific item in mind. Binary search is cheap, O(log n). It's easy to write a function to do a binary search on a single item, passed as argument, and then apply that to all rows: SELECT binary_search_reveal(cardnumber) FROM redacted_table; Note that your binary_search_reveal wouldn't be marked as leakproof and therefore this wouldn't be allowed. If this was allowed, you'd simply do raise notice inside the function and call it a day. Thanks, Stephen
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
Hi Robert, On 2014-10-08 16:01:53 -0400, Robert Haas wrote: [ comment fixes ] Thanks, I've incorporated these + a bit more. Could you otherwise make sense of the explanation and the algorithm? +/* yipeyyahee */ Although this will be clear to individuals with a good command of English, I suggest avoiding such usages. I've removed them with a heavy heart. These are heartfelt emotions from getting the algorithm to work (:P) I've attached these fixes + the removal of spinlocks around releaseOK as follow-up patches. Obviously they'll be merged into the other patch, but it sounds useful to be able to see them separately. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 6885a15cc6f2e193ff575a4463d90ad252d74f5e Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Tue, 7 Oct 2014 15:32:34 +0200 Subject: [PATCH 1/4] Convert the PGPROC->lwWaitLink list into a dlist instead of open coding it. Besides being shorter and much easier to read, it: * changes the logic in LWLockRelease() to release all shared lockers when waking up any. This can yield some significant performance improvements - and the fairness isn't really much worse than before, as we always allowed new shared lockers to jump the queue. * adds a pg_write_barrier() memory barrier in the wakeup paths between dequeuing and unsetting ->lwWaiting. That was always required on weakly ordered machines, but f4077cda2 made it more urgent. 
Author: Andres Freund --- src/backend/access/transam/twophase.c | 1 - src/backend/storage/lmgr/lwlock.c | 151 +- src/backend/storage/lmgr/proc.c | 2 - src/include/storage/lwlock.h | 5 +- src/include/storage/proc.h| 3 +- 5 files changed, 60 insertions(+), 102 deletions(-) diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index d5409a6..6401943 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -389,7 +389,6 @@ MarkAsPreparing(TransactionId xid, const char *gid, proc-roleId = owner; proc-lwWaiting = false; proc-lwWaitMode = 0; - proc-lwWaitLink = NULL; proc-waitLock = NULL; proc-waitProcLock = NULL; for (i = 0; i NUM_LOCK_PARTITIONS; i++) diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index 9fe6855..e6f9158 100644 --- a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c @@ -35,6 +35,7 @@ #include miscadmin.h #include pg_trace.h #include replication/slot.h +#include storage/barrier.h #include storage/ipc.h #include storage/predicate.h #include storage/proc.h @@ -115,9 +116,9 @@ inline static void PRINT_LWDEBUG(const char *where, const LWLock *lock) { if (Trace_lwlocks) - elog(LOG, %s(%s %d): excl %d shared %d head %p rOK %d, + elog(LOG, %s(%s %d): excl %d shared %d rOK %d, where, T_NAME(lock), T_ID(lock), - (int) lock-exclusive, lock-shared, lock-head, + (int) lock-exclusive, lock-shared, (int) lock-releaseOK); } @@ -475,8 +476,7 @@ LWLockInitialize(LWLock *lock, int tranche_id) lock-exclusive = 0; lock-shared = 0; lock-tranche = tranche_id; - lock-head = NULL; - lock-tail = NULL; + dlist_init(lock-waiters); } @@ -615,12 +615,7 @@ LWLockAcquireCommon(LWLock *lock, LWLockMode mode, uint64 *valptr, uint64 val) proc-lwWaiting = true; proc-lwWaitMode = mode; - proc-lwWaitLink = NULL; - if (lock-head == NULL) - lock-head = proc; - else - lock-tail-lwWaitLink = proc; - lock-tail = proc; + dlist_push_head(lock-waiters, 
proc-lwWaitLink); /* Can release the mutex now */ SpinLockRelease(lock-mutex); @@ -836,12 +831,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode) proc-lwWaiting = true; proc-lwWaitMode = LW_WAIT_UNTIL_FREE; - proc-lwWaitLink = NULL; - if (lock-head == NULL) - lock-head = proc; - else - lock-tail-lwWaitLink = proc; - lock-tail = proc; + dlist_push_head(lock-waiters, proc-lwWaitLink); /* Can release the mutex now */ SpinLockRelease(lock-mutex); @@ -997,13 +987,8 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval) */ proc-lwWaiting = true; proc-lwWaitMode = LW_WAIT_UNTIL_FREE; - proc-lwWaitLink = NULL; - /* waiters are added to the front of the queue */ - proc-lwWaitLink = lock-head; - if (lock-head == NULL) - lock-tail = proc; - lock-head = proc; + dlist_push_head(lock-waiters, proc-lwWaitLink); /* Can release the mutex now */ SpinLockRelease(lock-mutex); @@ -1079,9 +1064,10 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval) void LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val) { - PGPROC *head; - PGPROC *proc; - PGPROC *next; + dlist_head wakeup; + dlist_mutable_iter iter; + + dlist_init(wakeup); /* Acquire mutex. Time spent holding mutex
Re: [HACKERS] Column Redaction
On 10/10/2014 11:38 AM, Simon Riggs wrote: On 10 October 2014 10:29, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 10/10/2014 11:57 AM, Simon Riggs wrote: Postgres currently supports column level SELECT privileges. 1. If we want to confirm a credit card number, we can issue SELECT 1 FROM customer WHERE stored_card_number = '1234 5678 5344 7733' 2. If we want to look for card fraud, we need to be able to use the full card number to join to transaction data and look up blocked card lists etc.. 3. We want to block the direct retrieval of card numbers for additional security. In some cases, we might want to return an answer like ' * 7733' We can't do all of the above with current facilities inside the database. Deny access to the underlying tables. Write SQL functions to do 1. and 2., and grant privileges to the functions, instead. For 3. create views that do the redaction. If everything were easy to lock down, the approach you suggest is of course the best way. The problem there is that the SQL for (2) changes frequently, so we want to give people SQL access. 1. Give people access to a development system with safe data where they write their functions 2. once a function is working, pass it to auditors 3. deploy and use the function. Just not the ability to retrieve data in a usable form. For an attacker any access is in a usable form; for honest people you can just provide a view or set-returning function. btw, one way to do the redaction you suggested above is to write a special type, which redacts data on output. You can even make the type output function dependent on the backup role. Just make sure that users are aware that it is not really a security feature which protects against attackers. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
Re: [HACKERS] Column Redaction
On 10/10/2014 02:05 PM, Stephen Frost wrote: * Heikki Linnakangas (hlinnakan...@vmware.com) wrote: On 10/10/2014 01:35 PM, Stephen Frost wrote: Regarding functions, 'leakproof' functions should be alright to allow, though Heikki brings up a good point regarding binary search being possible in a plpgsql function (or even directly by a client). Of course, that approach also requires that you have a specific item in mind. It doesn't require that you have a specific item in mind. Binary search is cheap, O(log n). It's easy to write a function to do a binary search on a single item, passed as argument, and then apply that to all rows: SELECT binary_search_reveal(cardnumber) FROM redacted_table; Note that your binary_search_reveal wouldn't be marked as leakproof and therefore this wouldn't be allowed. If this was allowed, you'd simply do raise notice inside the function and call it a day. *shrug*, just do the same with a more complicated query, then. Even if you can't create a function that does that, you can still execute the same logic without a function. - Heikki
Re: [HACKERS] Column Redaction
On 10 October 2014 12:00, Stephen Frost sfr...@snowman.net wrote: * Thom Brown (t...@linux.com) wrote: To be honest, this all sounds rather flaky. Even if you do rate-limit their queries, they can use methods that avoid rate-limiting, such as recursive queries. And if you're only after one credit card number (to use the original example), you'd get it in a relatively short amount of time, despite some rate-limiting system. The discussion about looking up specific card numbers in the original email from Simon was actually an allowed use-case, as I understood it, not a risk concern. Indeed, if you know a valid credit card number already, as in this example, then why are you bothering with the search? The topic being column redaction rather than column formatting leads me to believe that the main use-case of the feature would be to prevent the user from discovering the full value of the column. It's not so much point 1 I was responding to, rather point 3, where you don't know the card number, but you get information about it in the results. The purpose of this feature would be to prevent the user from seeing all that data, which is a security feature, but at the moment it just seems to be a way of making it a little less easy to get at that data. This gives the vague impression of security, but it really seems just the placing of a few obstacles in the way. One might consider that all security is just placing obstacles in the way. There's a difference between intending that there shouldn't be a way past security and just making access a matter of walking a longer route. I wouldn't be against formatting per se, but for the purposes of that, I would say that views can already serve that purpose. -- Thom
Re: [HACKERS] Column Redaction
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote: On 10/10/2014 02:05 PM, Stephen Frost wrote: * Heikki Linnakangas (hlinnakan...@vmware.com) wrote: On 10/10/2014 01:35 PM, Stephen Frost wrote: Regarding functions, 'leakproof' functions should be alright to allow, though Heikki brings up a good point regarding binary search being possible in a plpgsql function (or even directly by a client). Of course, that approach also requires that you have a specific item in mind. It doesn't require that you have a specific item in mind. Binary search is cheap, O(log n). It's easy to write a function to do a binary search on a single item, passed as argument, and then apply that to all rows: SELECT binary_search_reveal(cardnumber) FROM redacted_table; Note that your binary_search_reveal wouldn't be marked as leakproof and therefore this wouldn't be allowed. If this was allowed, you'd simply do raise notice inside the function and call it a day. *shrug*, just do the same with a more complicated query, then. Even if you can't create a function that does that, you can still execute the same logic without a function. Not sure I see what you're getting at here..? My point was that you'd need a target number and the system would only provide confirmation that the number exists, or does not. Your argument was that the table itself would provide the target number, which was flawed. I don't see how just do the same with a more complicated query removes the need to have a target number for the binary search. A better argument would be the equality case than the binary search if you're simply looking for confirmation of existence. If the user can define a table of targets, or use a VALUES construct, and then join to it, then we might build a hash table and provide those results faster than a binary search, though this again means that the user is providing the list of keys to check. 
As mentioned elsewhere on the thread, I agree that this capability wouldn't be useful if a random search (which is providing the 'targets') through a 10^16 keyspace generated a significant number of results (I'd also throw in there in a reasonable amount of time - clearly it'd be possible to extract all keys given sufficient time, even with a random search). The sketch that Simon outlined won't obviously provide that guarantee, but I'm not prepared to say we couldn't provide it at all while meeting the goal he outlined. Thanks, Stephen
Re: [HACKERS] Column Redaction
* Thom Brown (t...@linux.com) wrote: On 10 October 2014 12:00, Stephen Frost sfr...@snowman.net wrote: The discussion about looking up specific card numbers in the original email from Simon was actually an allowed use-case, as I understood it, not a risk concern. Indeed, if you know a valid credit card number already, as in this example, then why are you bothering with the search? The topic being column redaction rather than column formatting leads me to believe that the main use-case of the feature would be to prevent the user from discovering the full value of the column. I believe the idea is to limit the chances that a user with limited pre-existing knowledge would be able to determine the full value of items in the column, especially in bulk. It's not so much point 1 I was responding do, rather point 3, where you don't know the card number, but you get information about it in the results. We'd certainly want to prevent that to the limit possible. Do you have a specific thought about how they'd be able to find a full number beyond a random search..? The purpose of this feature would be to prevent the user from seeing all that data, which is a security feature, but at the moment it just seems to be a way of making it a little less easy to get at that data. I certainly appreciate the thought challenges and critique and I'm hopeful we could make it more than a little less easy to get at the information. If we aren't able to do that, then the feature isn't useful, certainly. This gives the vague impression of security, but it really seems just the placing of a few obstacles in the way. One might consider that all security is just placing obstacles in the way. There's a difference between intending that there shouldn't be a way past security and just making access a matter of walking a longer route. Throwing random 16-digit numbers and associated information at a credit card processor could be viewed as walking a longer route too. 
The same goes for random key searches or password guesses. Thanks, Stephen
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-10 10:13:03 +0530, Amit Kapila wrote: I have done a few performance tests for the above patches and the results are as below: Cool, thanks. Performance Data -- IBM POWER-7 16 cores, 64 hardware threads RAM = 64GB max_connections = 210 Database Locale = C checkpoint_segments=256 checkpoint_timeout=35min shared_buffers=8GB Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8) Duration of each individual run = 5mins Test type - read-only pgbench with -M prepared Other related information about the test: a. This is the data for the median of 3 runs; the detailed data of each individual run is attached with the mail. b. I have applied both the patches to take performance data. Observations -- a. The patch performs really well (increase up to ~40%) in case all the data fits in shared buffers (scale factor 100). b. In case data doesn't fit in shared buffers, but fits in RAM (scale factor 3000), there is a performance increase up to 16 client count, however after that it starts dipping (in the above config up to ~4.4%). Hm. Interesting. I don't see that dip on x86. Is it possible that the implementation of some atomic operation is costlier for a particular architecture? I have tried again for scale factor 3000 and could see the dip, and this time I have even tried with 175 client count; the dip is approximately 5%, which is slightly more than at 160 client count. Patch_ver/Client_count 175 HEAD 248374 PATCH 235669 Now probably these shouldn't matter much in case a backend needs to wait for another exclusive locker, but I am not sure what else could be the reason for the dip in case we need to have exclusive LWLocks. Any chance to get a profile? Here it goes.. HEAD - client_count=128 - + 7.53% postgres postgres [.] GetSnapshotData + 3.41% postgres postgres [.] AllocSetAlloc + 2.61% postgres postgres [.] AllocSetFreeIndex + 2.49% postgres postgres [.] 
_bt_compare + 2.43% postgres [kernel.kallsyms] [k] .__copy_tofrom_user + 2.40% postgres postgres [.] hash_search_with_hash_value + 1.83% postgres postgres [.] tas + 1.29% postgres postgres [.] pg_encoding_mbcliplen + 1.27% postgres postgres [.] MemoryContextCreate + 1.22% postgres postgres [.] MemoryContextAllocZeroAligned + 1.17% postgres postgres [.] hash_seq_search + 0.97% postgres postgres [.] LWLockRelease + 0.96% postgres postgres [.] MemoryContextAllocZero + 0.91% postgres postgres [.] GetPrivateRefCountEntry + 0.82% postgres postgres [.] AllocSetFree + 0.79% postgres postgres [.] LWLockAcquireCommon + 0.78% postgres postgres [.] pfree Detailed Data - - 7.53% postgres postgres [.] GetSnapshotData - GetSnapshotData - 7.46% GetSnapshotData - 7.46% GetTransactionSnapshot - 3.74% exec_bind_message PostgresMain BackendRun BackendStartup ServerLoop PostmasterMain main generic_start_main.isra.0 __libc_start_main 0 - 3.72% PortalStart exec_bind_message PostgresMain BackendRun BackendStartup ServerLoop PostmasterMain main generic_start_main.isra.0 __libc_start_main 0 - 3.41% postgres postgres [.] AllocSetAlloc - AllocSetAlloc - 2.01% AllocSetAlloc 0.81% palloc 0.63% MemoryContextAlloc - 2.61% postgres postgres [.] AllocSetFreeIndex - AllocSetFreeIndex 1.59% AllocSetAlloc 0.79% AllocSetFree - 2.49% postgres postgres [.] _bt_compare - _bt_compare - 1.80% _bt_binsrch - 1.80% _bt_binsrch - 1.21% _bt_search _bt_first Lwlock_contention patches - client_count=128 -- + 7.95% postgres postgres [.] GetSnapshotData + 3.58% postgres postgres [.] AllocSetAlloc + 2.51% postgres postgres [.] _bt_compare + 2.44% postgres postgres [.] hash_search_with_hash_value + 2.33% postgres [kernel.kallsyms] [k]
Re: [HACKERS] Column Redaction
On 10/10/2014 02:27 PM, Stephen Frost wrote: * Heikki Linnakangas (hlinnakan...@vmware.com) wrote: On 10/10/2014 02:05 PM, Stephen Frost wrote: * Heikki Linnakangas (hlinnakan...@vmware.com) wrote: On 10/10/2014 01:35 PM, Stephen Frost wrote: Regarding functions, 'leakproof' functions should be alright to allow, though Heikki brings up a good point regarding binary search being possible in a plpgsql function (or even directly by a client). Of course, that approach also requires that you have a specific item in mind. It doesn't require that you have a specific item in mind. Binary search is cheap, O(log n). It's easy to write a function to do a binary search on a single item, passed as argument, and then apply that to all rows: SELECT binary_search_reveal(cardnumber) FROM redacted_table; Note that your binary_search_reveal wouldn't be marked as leakproof and therefore this wouldn't be allowed. If this was allowed, you'd simply do a raise notice inside the function and call it a day. *shrug*, just do the same with a more complicated query, then. Even if you can't create a function that does that, you can still execute the same logic without a function. Not sure I see what you're getting at here...? My point was that you'd need a target number and the system would only provide confirmation that the number exists, or does not. Your argument was that the table itself would provide the target number, which was flawed. I don't see how "just do the same with a more complicated query" removes the need to have a target number for the binary search. You said above that it's OK to pass the card numbers to leakproof functions. But if you allow that, you can write a function that takes as argument a redacted card number, and unredacts it (using the < and = operators in a binary search). And then you can just do SELECT unredact(card_number) from redacted_table. You seem to have something stronger in mind: only allow the equality operator on the redacted column, and nothing else.
That might be better, although I'm not really convinced. There are just too many ways you could still leak the datum. Just a random example, inspired by the recent CRIME attack on SSL: build a row with the redacted datum, and another guess datum, and store it along with 1k of other data in a temporary table. The row gets toasted. Observe how much it compressed; if the guess datum is close to the original datum, it compresses well. Now, you can probably stop that particular attack with more restrictions on what you can do with the datum, but that just shows that pretty much any computation you allow with the datum can be used to reveal its value. - Heikki
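The compression side channel Heikki describes can be demonstrated outside the database. The sketch below is illustrative Python, not from the thread: zlib stands in for the TOAST compressor, and the card number and guesses are made up. A guess sharing a long prefix with the secret turns into a cheap back-reference, so the compressed row comes out measurably smaller.

```python
import zlib

def toasted_size(secret: bytes, guess: bytes) -> int:
    # Stand-in for "store the secret and a guess in one row along with
    # 1k of other data, and observe how far the row compresses".
    filler = bytes(1024)
    return len(zlib.compress(secret + guess + filler))

secret = b"4937-1234-5678-7733"       # hypothetical card number
close_guess = b"4937-1234-5678-0000"  # shares a 15-byte prefix with the secret
far_guess = b"8152-9073-6641-2210"    # shares nothing with the secret

size_close = toasted_size(secret, close_guess)
size_far = toasted_size(secret, far_guess)
# The closer guess compresses into a back-reference to the secret, so
# size_close ends up smaller than size_far; comparing row sizes thus
# leaks how close a guess is, one prefix byte at a time.
```

The attacker never reads the column directly; the row's storage size is the oracle, which is exactly why restricting operators on the datum is not enough.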
Re: [HACKERS] Column Redaction
On 10 October 2014 12:45, Stephen Frost sfr...@snowman.net wrote: This gives the vague impression of security, but it really seems just the placing of a few obstacles in the way. One might consider that all security is just placing obstacles in the way. There's a difference between intending that there shouldn't be a way past security and just making access a matter of walking a longer route. Throwing random 16-digit numbers and associated information at a credit card processor could be viewed as walking a longer route too. The same goes for random key searches or password guesses. But those would need to be exhaustive, and in nearly all cases, impractical. Data such as plain credit card numbers stored in a column, even with all its data masked, would be easy to determine. Salted and hashed passwords, even with complete visibility of the value, aren't vulnerable to scrutiny of particular character values. If they were, no-one would use them. Thom
Re: [HACKERS] Column Redaction
On 10 October 2014 11:45, Thom Brown t...@linux.com wrote: To be honest, this all sounds rather flaky. To be honest, suggesting anything at all is rather difficult and I recommend people try it. Everything sounds crap when you didn't think of it and you've given it an hour's thought. I'm not blind to the difficulties raised and I thank you for your input, but I think it's too early to make sweeping generalisations. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Column Redaction
On 10 October 2014 12:01, Heikki Linnakangas hlinnakan...@vmware.com wrote: Really, I don't see how this can possibly be made to work. You can't allow ad hoc processing of data, and still avoid revealing it to the user. Anyone with unmonitored access and sufficient time can break through security. I think that is true of any kind of security, and so it is true here also. Auditing and controls are required also; that's why I suggested those first. This proposal was looking beyond that to what we might need next. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Column Redaction
On Fri, Oct 10, 2014 at 7:00 AM, Stephen Frost sfr...@snowman.net wrote: * Thom Brown (t...@linux.com) wrote: To be honest, this all sounds rather flaky. Even if you do rate-limit their queries, they can use methods that avoid rate-limiting, such as recursive queries. And if you're only after one credit card number (to use the original example), you'd get it in a relatively short amount of time, despite some rate-limiting system. The discussion about looking up specific card numbers in the original email from Simon was actually an allowed use-case, as I understood it, not a risk concern. Indeed, if you know a valid credit card number already, as in this example, then why are you bothering with the search? Perhaps it would provide confirmation, but it's not the database's responsibility to make you forget the number you already have. Doing a random walk through a keyspace of 10^16 and extracting a significant enough number of results to be useful should be difficult. I agree that if we're completely unable to make it difficult then this is less useful, but I feel it's a bit early to jump to that conclusion. You are obviously wearing your rose-colored glasses this morning. I predict a competent SQL programmer could write an SQL function, or client-side code, to pump the data out of the database using binary search in milliseconds per row. And I think it's more likely than not that there are other techniques that are much faster. The idea that you're going to be able to let people query the data but not actually retrieve it should be viewed with great skepticism. This is the equivalent of telling a child that she can't open her Christmas presents until Christmas, but she can shake them, hold them up to a bright light, and/or X-ray the packages. If she doesn't know what's in there by the time she opens it, it's just for lack of effort. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
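Robert's "milliseconds per row" estimate is easy to check in a toy model. The sketch below is illustrative Python, not from the thread: leq_oracle stands in for one round trip of a hypothetical query such as SELECT 1 FROM redacted_table WHERE cc <= $1 AND id = $2 (assuming ordering comparisons still work on the column), and a 16-digit number falls to a plain binary search in at most 54 probes, since ceil(log2(10^16)) = 54.

```python
def leq_oracle(secret: int, probe: int) -> bool:
    # Stand-in for one query round trip that only confirms
    # whether the hidden value is <= the probe.
    return secret <= probe

def extract(secret: int, lo: int = 0, hi: int = 10**16 - 1) -> tuple[int, int]:
    # Recover the hidden value via binary search over the keyspace,
    # counting how many oracle probes it takes.
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if leq_oracle(secret, mid):
            hi = mid        # secret is in [lo, mid]
        else:
            lo = mid + 1    # secret is in [mid + 1, hi]
    return lo, probes

value, probes = extract(4937123456787733)
# value equals the secret; probes never exceeds 54, so even with a
# network round trip per probe, each row falls in well under a second.
```

That is the mechanism behind both the plpgsql-function attack and the recursive-query version shown later in the thread: any ordering operator on the column is a complete leak.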
Re: [HACKERS] Column Redaction
On 10 October 2014 13:43, Simon Riggs si...@2ndquadrant.com wrote: On 10 October 2014 11:45, Thom Brown t...@linux.com wrote: To be honest, this all sounds rather flaky. To be honest, suggesting anything at all is rather difficult and I recommend people try it. I have, and most ideas I've had have been justifiably shot down or picked apart (scheduled background tasks, offloading stats collection to standby, index maintenance in DML query plans, expression statistics... to name but a few). Everything sounds crap when you didn't think of it and you've given it an hour's thought. I'm not sure that means my concerns aren't valid. I don't think it sounds crap, but I also can't see any use-case for it where we don't already have things covered, or where it's going to offer any useful level of security. Like with RLS, it may be that I'm just looking at things from the wrong perspective. -- Thom
Re: [HACKERS] Column Redaction
On Fri, Oct 10, 2014 at 5:57 AM, Simon Riggs si...@2ndquadrant.com wrote: 1. If we want to confirm a credit card number, we can issue SELECT 1 FROM customer WHERE stored_card_number = '1234 5678 5344 7733' ... 3. We want to block the direct retrieval of card numbers for additional security. In some cases, we might want to return an answer like ' * 7733'

I wouldn't want to allow that:

select ref.ref, customer.name
from (select generate_series as ref from generate_series(0, )) ref, customer
where ref.ref = stored_card_number.ref

May take a long while. Just disable everything except nestloop and suck up the data as it comes. Can be optimized. Not sure how you'd avoid this; not trivial at all. Not possible at all, I'd venture. But if you really, really want to allow this, encrypt the column, and provide a C function that can decrypt it. You can join encrypted columns, and you can even include the last 4 digits unencrypted if you want (I wouldn't want to). It has to be a C function to be able to avoid leaking the key, btw. 2. If we want to look for card fraud, we need to be able to use the full card number to join to transaction data and look up blocked card lists etc. A view works pretty well for this.
[HACKERS] Inconsistencies in documentation of row-level locking
Hi all, Currently all the row-level lock modes are described in the page for the SELECT query: http://www.postgresql.org/docs/devel/static/explicit-locking.html#LOCKING-ROWS However, after browsing the documentation, I noticed in the page describing all the explicit locks of the system that there is a portion dedicated to row-level locks, and this section does not mention FOR KEY SHARE and FOR NO KEY UPDATE at all. This seems rather misleading for the user: http://www.postgresql.org/docs/devel/static/explicit-locking.html#LOCKING-ROWS

Attached is a patch that refactors the whole area and improves the documentation:
- Addition of a table showing the conflicts between each lock
- Moved description of each row-level lock mode to the explicit locking page
- Addition of a link in the SELECT portion to redirect the user to the explicit locking page

Regards, -- Michael

From 974b360cc8bfdade607154c94d0a78f5cf466e0a Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Fri, 10 Oct 2014 22:10:10 +0900
Subject: [PATCH] Refactor documentation of row-level locking

All the details about each row-level lock mode were described in the page dedicated to the SELECT query, which seems rather incorrect because there is a section in the MVCC chapter describing in detail all the locks that can be used in the system. Note that this portion was not updated for the implementation of FOR KEY SHARE and FOR NO KEY UPDATE while it should have been, making the information provided to the user inconsistent and misleading. This patch refactors the whole area, adding a link in SELECT to redirect the user to the page describing all the explicit locks of the system.
--- doc/src/sgml/mvcc.sgml | 172 +-- doc/src/sgml/ref/select.sgml | 60 +-- 2 files changed, 152 insertions(+), 80 deletions(-) diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml index cd55be8..478279d 100644 --- a/doc/src/sgml/mvcc.sgml +++ b/doc/src/sgml/mvcc.sgml @@ -1106,30 +1106,107 @@ ERROR: could not serialize access due to read/write dependencies among transact para In addition to table-level locks, there are row-level locks, which - can be exclusive or shared locks. An exclusive row-level lock on a - specific row is automatically acquired when the row is updated or - deleted. The lock is held until the transaction commits or rolls - back, just like table-level locks. Row-level locks do - not affect data querying; they block only emphasiswriters to the same - row/emphasis. + are listed as below with the contexts in which they are used + automatically by productnamePostgreSQL/productname. The main + differences between one lock mode and another is the set of lock modes + with each conflicts (see xref linkend=row-lock-compatibility). + Two transactions cannot hold locks of conflicting modes on the same rows + of the same table at the same time. Also, a transaction never conflicts + with itself. Row-level locks do not affect data querying; they block only + emphasiswriters to the same row/emphasis. /para -para - To acquire an exclusive row-level lock on a row without actually - modifying the row, select the row with commandSELECT FOR - UPDATE/command. Note that once the row-level lock is acquired, - the transaction can update the row multiple times without - fear of conflicts. -/para + variablelist + titleRow-level Lock Modes/title + varlistentry + term +literalFOR UPDATE/literal + /term + listitem +para + literalFOR UPDATE/literal causes the rows retrieved by the + commandSELECT/command statement to be locked as though for + update. This prevents them from being modified or deleted by + other transactions until the current transaction ends. 
That is, + other transactions that attempt commandUPDATE/command, + commandDELETE/command, + commandSELECT FOR UPDATE/command, + commandSELECT FOR NO KEY UPDATE/command, + commandSELECT FOR SHARE/command or + commandSELECT FOR KEY SHARE/command + of these rows will be blocked until the current transaction ends. + The literalFOR UPDATE/ lock mode + is also acquired by any commandDELETE/ on a row, and also by an + commandUPDATE/ that modifies the values on certain columns. Currently, + the set of columns considered for the commandUPDATE/ case are those that + have a unique index on them that can be used in a foreign key (so partial + indexes and expressional indexes are not considered), but this may change + in the future. + Also, if an commandUPDATE/command, commandDELETE/command, + or commandSELECT FOR UPDATE/command from another transaction + has already locked a selected row or rows, commandSELECT FOR
Re: [HACKERS] [9.4 bug] The database server hangs with write-heavy workload on Windows
From: Craig Ringer cr...@2ndquadrant.com It sounds like they've produced a test case, so they should be able to with a bit of luck. Or even better, send you the test case. I asked the user about this. It sounds like the relevant test case consists of many scripts. He explained to me that the simplified test steps are:

1. initdb
2. pg_ctl start
3. Create 16 tables. Each of those tables consists of around 10 columns.
4. Insert 1000 rows into each of those 16 tables.
5. Launch 16 psql sessions concurrently. Each session updates all 1000 rows of one table, e.g., session 1 updates table 1, session 2 updates table 2, and so on.
6. Repeat step 5 50 times.

This sounds a bit complicated, but I understood that the core part is 16 concurrent updates, which should lead to contention on xlog insert slots and/or spinlocks. Your next step here really needs to be to make this reproducible against a debug build. Then see if reverting the xlog scalability work actually changes the behaviour, given that you hypothesised that it could be involved. Thank you, but that may be labor-intensive and time-consuming. In addition, the user uses a machine with multiple CPU cores, while I only have a desktop PC with two CPU cores. So I doubt I can reproduce the problem on my PC. I asked the user to change S_UNLOCK to something like the following and run the test during this weekend (the next Monday is a national holiday in Japan).

#define S_UNLOCK(lock) InterlockedExchange(lock, 0)

FYI, the user reported today that the problem didn't occur when he ran the same test for 24 hours on 9.3.5. Do you see something relevant in 9.4? Regards MauMau
Re: [HACKERS] [9.4 bug] The database server hangs with write-heavy workload on Windows
On 2014-10-10 23:08:34 +0900, MauMau wrote: From: Craig Ringer cr...@2ndquadrant.com It sounds like they've produced a test case, so they should be able to with a bit of luck. Or even better, send you the test case. I asked the user about this. It sounds like the relevant test case consists of many scripts. He explained to me that the simplified test steps are:

1. initdb
2. pg_ctl start
3. Create 16 tables. Each of those tables consists of around 10 columns.
4. Insert 1000 rows into each of those 16 tables.
5. Launch 16 psql sessions concurrently. Each session updates all 1000 rows of one table, e.g., session 1 updates table 1, session 2 updates table 2, and so on.
6. Repeat step 5 50 times.

This sounds a bit complicated, but I understood that the core part is 16 concurrent updates, which should lead to contention on xlog insert slots and/or spinlocks.

Hm. I've run similar loads on linux for long enough that I'm relatively sure I'd have seen this. Could you get them to print out the contents of the lwlock all these processes are waiting for?

Your next step here really needs to be to make this reproducible against a debug build. Then see if reverting the xlog scalability work actually changes the behaviour, given that you hypothesised that it could be involved.

I don't think you can trivially revert the xlog scalability stuff.

Thank you, but that may be labor-intensive and time-consuming. In addition, the user uses a machine with multiple CPU cores, while I only have a desktop PC with two CPU cores. So I doubt I can reproduce the problem on my PC.

Well, it'll also be labor-intensive for the community to debug.

I asked the user to change S_UNLOCK to something like the following and run the test during this weekend (the next Monday is a national holiday in Japan).

#define S_UNLOCK(lock) InterlockedExchange(lock, 0)

That shouldn't be required. For one, on 9.4 (not 9.5!) spinlock releases only need to prevent reordering at the CPU level. As x86 is a TSO architecture (total store order), that doesn't require doing anything special. And even if it required more, on msvc volatile reads/stores act as acquire/release fences unless you monkey with the compiler settings. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-10 17:18:46 +0530, Amit Kapila wrote: On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund and...@2ndquadrant.com wrote: Observations -- a. The patch performs really well (increase up to ~40%) in case all the data fits in shared buffers (scale factor 100). b. In case the data doesn't fit in shared buffers, but fits in RAM (scale factor 3000), there is a performance increase up to 16 client count; however after that it starts dipping (in the above config up to ~4.4%). Hm. Interesting. I don't see that dip on x86. Is it possible that the implementation of some atomic operation is costlier for a particular architecture?

Yes, sure. And IIRC POWER improved atomics performance considerably for POWER8...

I have tried again for scale factor 3000 and could see the dip, and this time I have even tried with 175 client count; the dip is approximately 5%, which is slightly more than at 160 client count.

FWIW, the profile always looks like

- 48.61% postgres postgres           [.] s_lock
   - s_lock
      + 96.67% StrategyGetBuffer
      + 1.19% UnpinBuffer
      + 0.90% PinBuffer
      + 0.70% hash_search_with_hash_value
+ 3.11% postgres postgres           [.] GetSnapshotData
+ 2.47% postgres postgres           [.] StrategyGetBuffer
+ 1.93% postgres [kernel.kallsyms]  [k] copy_user_generic_string
+ 1.28% postgres postgres           [.] hash_search_with_hash_value
- 1.27% postgres postgres           [.] LWLockAttemptLock
   - LWLockAttemptLock
      - 97.78% LWLockAcquire
         + 38.76% ReadBuffer_common
         + 28.62% _bt_getbuf
         + 8.59% _bt_relandgetbuf
         + 6.25% GetSnapshotData
         + 5.93% VirtualXactLockTableInsert
         + 3.95% VirtualXactLockTableCleanup
         + 2.35% index_fetch_heap
         + 1.66% StartBufferIO
         + 1.56% LockReleaseAll
         + 1.55% _bt_next
         + 0.78% LockAcquireExtended
      + 1.47% _bt_next
   + 0.75% _bt_relandgetbuf

to me. Now that's with a client count of 496, but it's similar with lower counts. BTW, that profile *clearly* indicates we should make StrategyGetBuffer() smarter.

Patch_ver/Client_count  175
HEAD                    248374
PATCH                   235669

Now probably these shouldn't matter much in case a backend needs to wait for another exclusive locker, but I am not sure what else could be the reason for the dip in case we need to have exclusive LWLocks. Any chance to get a profile? Here it goes..

Lwlock_contention patches - client_count=128
--
+ 7.95% postgres postgres           [.] GetSnapshotData
+ 3.58% postgres postgres           [.] AllocSetAlloc
+ 2.51% postgres postgres           [.] _bt_compare
+ 2.44% postgres postgres           [.] hash_search_with_hash_value
+ 2.33% postgres [kernel.kallsyms]  [k] .__copy_tofrom_user
+ 2.24% postgres postgres           [.] AllocSetFreeIndex
+ 1.75% postgres postgres           [.] pg_atomic_fetch_add_u32_impl

Uh. Huh? Normally that'll be inline. That's compiled with gcc? What were the compiler settings you used? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Column Redaction
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote: You said above that it's OK to pass the card numbers to leakproof functions. But if you allow that, you can write a function that takes as argument a redacted card number, and unredacts it (using the < and = operators in a binary search). And then you can just do SELECT unredact(card_number) from redacted_table.

Not sure I'm following what you mean by 'redacted'. The original proposal provided ' 1234' as the 'redacted' number, and I'm not seeing how you can get the rest of the number trivially with just equality and binary search. If you start with a complete number then you can get the system to tell you if it exists or not with a binary search or even just doing an equality check.

You seem to have something stronger in mind: only allow the equality operator on the redacted column, and nothing else.

That wasn't my suggestion - I was merely pointing out that if you have a complete number (perhaps by pulling out a random number, with a filter against the last four digits, reducing the search space to 10^12) which you want to check for existence, you can do that directly. No need for a binary search at all.

That might be better, although I'm not really convinced. There are just too many ways you could still leak the datum. Just a random example, inspired by the recent CRIME attack on SSL: build a row with the redacted datum, and another guess datum, and store it along with 1k of other data in a temporary table. The row gets toasted. Observe how much it compressed; if the guess datum is close to the original datum, it compresses well. Now, you can probably stop that particular attack with more restrictions on what you can do with the datum, but that just shows that pretty much any computation you allow with the datum can be used to reveal its value.

One concept I've been thinking about is a notion of 'trusted' data sources to allow comparison against. Perhaps individual values are allowed from the user also, but my thought is that you have:

master_table
trusted_table

Such that you can't view the sensitive column in either the master or the trusted table, but you can join between the two on the sensitive column and view other, non-sensitive, attributes of the two tables. You might even allow other transformations on the sensitive column, provided it always results in a boolean comparison to another sensitive column. Not sure if that really solves Simon's use-case exactly, but it might tease out other thoughts. Thanks! Stephen
Re: [HACKERS] Column Redaction
* Thom Brown (t...@linux.com) wrote: On 10 October 2014 12:45, Stephen Frost sfr...@snowman.net wrote: There's a difference between intending that there shouldn't be a way past security and just making access a matter of walking a longer route. Throwing random 16-digit numbers and associated information at a credit card processor could be viewed as walking a longer route too. The same goes for random key searches or password guesses. But those would need to be exhaustive, and in nearly all cases, impractical. That would be exactly the idea with this - we make it impractical to get at the unredacted information. Data such as plain credit card numbers stored in a column, even with all its data masked, would be easy to determine. I'm not as convinced of that as you are.. Though I'll point out that in the use-cases which I've been talking to users about, it isn't credit cards under discussion. Salted and hashed passwords, even with complete visibility of the value, aren't vulnerable to scrutiny of particular character values. If they were, no-one would use them. I wasn't suggesting otherwise, but I don't see it as particularly relevant to the discussion regardless. Thanks, Stephen
Re: [HACKERS] Column Redaction
Robert, * Robert Haas (robertmh...@gmail.com) wrote: On Fri, Oct 10, 2014 at 7:00 AM, Stephen Frost sfr...@snowman.net wrote: The discussion about looking up specific card numbers in the original email from Simon was actually an allowed use-case, as I understood it, not a risk concern. Indeed, if you know a valid credit card number already, as in this example, then why are you bothering with the search? Perhaps it would provide confirmation, but it's not the database's responsibility to make you forget the number you already have. Doing a random walk through a keyspace of 10^16 and extracting a significant enough number of results to be useful should be difficult. I agree that if we're completely unable to make it difficult then this is less useful, but I feel it's a bit early to jump to that conclusion. Thanks much for the laugh. :) You are obviously wearing your rose-colored glasses this morning. I predict a competent SQL programmer could write an SQL function, or client-side code, to pump the data out of the database using binary search in milliseconds per row. Clearly, if we're unable to prevent that, then this feature wouldn't be useful. What would be helpful is to consider what we could provide along these lines without allowing the data to be trivially recovered. Thanks! Stephen
Re: [HACKERS] Column Redaction
On 10 October 2014 15:56, Stephen Frost sfr...@snowman.net wrote: * Thom Brown (t...@linux.com) wrote: Data such as plain credit card numbers stored in a column, even with all its data masked, would be easy to determine. I'm not as convinced of that as you are.. Though I'll point out that in the use-cases which I've been talking to users about, it isn't credit cards under discussion. I think credit card numbers are a good example. If we're talking about format functions here, there has to be something in addition to that which determines permitted comparison operations. If not, and we were going to remove all but = operations, we'd effectively cripple the functionality of anything that's been formatted that wasn't intended as a security measure. It almost sounds like an extension to domains rather than column-level functionality. But then if operators such as <, > and ~~ aren't hindered, it sounds like no protection at all. Also, joining to foreign tables could be an issue, copying data to temporary tables could possibly remove any redaction, and materialised views would need to support it somehow. Although just because I can't picture how that would work, it's no indication that it couldn't. Salted and hashed passwords, even with complete visibility of the value, aren't vulnerable to scrutiny of particular character values. If they were, no-one would use them. I wasn't suggesting otherwise, but I don't see it as particularly relevant to the discussion regardless. I guess I was trying to illustrate that the security in a hashed password is acceptable because it requires exhaustive searching to break. If comparison operators worked on it, it would be broken out of the box. -- Thom
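The contrast Thom draws can be made concrete. The sketch below is illustrative Python, not from the thread; store and verify are hypothetical helpers. A salted, iterated hash supports nothing but equality — ordering two digests says nothing about how the plaintexts order — so an attacker is reduced to exhaustive guessing, with no O(log n) binary search available the way there is once comparison operators work on the raw value.

```python
import hashlib
import hmac
import os

def store(password: str) -> tuple[bytes, bytes]:
    # Salted, iterated hash: the stored value supports only an
    # equality test, never an ordering comparison.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify(salt: bytes, digest: bytes, attempt: str) -> bool:
    # Recompute with the same salt; constant-time equality is the
    # only meaningful operation on the stored digest.
    candidate = hashlib.pbkdf2_hmac("sha256", attempt.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)

salt, digest = store("correct horse battery")
ok = verify(salt, digest, "correct horse battery")
bad = verify(salt, digest, "correct horse batters")
# Digest byte order is unrelated to plaintext order, so a "<="-style
# probe against the stored value buys an attacker nothing.
```

Redacted-but-comparable card numbers sit at the opposite end: the value keeps its ordering semantics, which is precisely what makes the binary-search extraction elsewhere in this thread work.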
Re: [HACKERS] Column Redaction
On Fri, Oct 10, 2014 at 11:58 AM, Stephen Frost sfr...@snowman.net wrote: You are obviously wearing your rose-colored glasses this morning. I predict a competent SQL programmer could write an SQL function, or client-side code, to pump the data out of the database using binary search in milliseconds per row. Clearly, if we're unable to prevent that, then this feature wouldn't be useful. What would be helpful is to consider what we could provide along these lines without allowing the data to be trivially recovered. Joins are way too powerful to allow arbitrary joins to untrusted users. The only somewhat secure way is to allow administrators define which joins are possible, and untrusted users use those. You get that with views. I'm not sure you can allow more than that, and not have lots of leaks. Is there a use case where redaction is the only solution really? Nothing mentioned till now really is: * Transaction logs and blocked card lists can be joined against users and a view can be provided that includes the user, and not the credit card. So you can join freely between the views just fine, by user, and do all the analysis you need without exposing credit card numbers in any way, not even redacted. * If not users, you can join against a random but unique per card value generated at some point when the card is first inserted in the records, and you get a random token for the card. Still works, and can be done with triggers, and is far less leaky than the proposed redaction. * Credit card number verification is a leak on its own, but if you really want it, you can provide a function that does it. And I think it's perfectly reasonable that defining leaking functions has to be an admin thing. * Views can expose the redacted value just fine for direct use. A generically usable user-id or random-token to redacted number mapping view would provide all the freedom you could want. 
* Functions defined as leakproof (even when they're not, which is an admin decision to throw data safety out the window, but it's a possible decision), which allows fetching redacted columns that way from security barrier views. Is there anything not covered by the above that can be done by built-in redacting? If the answer is yes, then maybe the feature has value. If the feature's value is ease of use, I'd weight that with the security loss. False sense of security is a net security loss in most (if not all) cases. Having to flesh out the logic through security barrier views, leakproof redacting functions and triggers can have the good side-effect of making all the possible leaks obvious to the admin. On Fri, Oct 10, 2014 at 12:27 PM, Thom Brown t...@linux.com wrote: Also, joining to foreign tables could be an issue, copying data to temporary tables could possibly remove any redaction, materialised views would need to support it somehow. Although just because I can't picture how that would work, it's no indication that it couldn't. Well, that's why encryption is usually regulatorily required on credit card data. Way too many ways to leak, and way too valuable to expect lack of knowledgeable and motivated people trying to get them. On Fri, Oct 10, 2014 at 12:27 PM, Thom Brown t...@linux.com wrote: Salted and hashed passwords, even with complete visibility of the value, isn't vulnerable to scrutiny of particular character values. If it were, no-one would use it. I wasn't suggesting otherwise, but I don't see it as particularly relevant to the discussion regardless. I guess I was trying to illustrate that the security in a hashed password is acceptable because it requires exhaustive searching to break. If comparison operators worked on it, it would be broken out of the box. Lately, the security of password-based authentication is being put into question very often. So I wouldn't hold credit card numbers or any other sensible information to the password standard. 
But let's use the password example: it's widely accepted that holding onto cleartext passwords, or transmitting them or their plain hashes over any channel, is extremely bad practice. So redaction isn't good enough for passwords, and neither is salted hashing. The only generally accepted approach in the security community is a password proof in the context of a zero-knowledge password proof protocol[0]. You'd want something like that for any bit of info you need to join or compare but can't accept leaking. [0] http://en.wikipedia.org/wiki/Zero-knowledge_password_proof
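The random-token alternative proposed in the list above is easy to sketch outside the database. A minimal illustration (hypothetical names; Python standing in for a trigger-maintained card-to-token mapping table):

```python
import secrets

class CardTokenizer:
    """Sketch of the per-card random token idea: when a card is first
    seen, mint a random but unique token for it. Joins and equality
    tests work on the token while the real number stays hidden."""

    def __init__(self):
        self._tokens = {}

    def token_for(self, card_number):
        # Reuse the token minted when the card was first inserted.
        if card_number not in self._tokens:
            self._tokens[card_number] = secrets.token_hex(16)
        return self._tokens[card_number]

t = CardTokenizer()
a = t.token_for(4111111111111111)
b = t.token_for(4111111111111111)  # same card -> same token, so joins work
c = t.token_for(5500000000000004)  # different card -> unrelated token
```

Unlike a redacted or hashed column, the token reveals nothing about the number itself, yet analysis by card (grouping, joining across views) still works.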
Re: [HACKERS] Column Redaction
On Fri, Oct 10, 2014 at 10:56 AM, Stephen Frost sfr...@snowman.net wrote: * Thom Brown (t...@linux.com) wrote: On 10 October 2014 12:45, Stephen Frost sfr...@snowman.net wrote: There's a difference between intending that there shouldn't be a way past security and just making access a matter of walking a longer route. Throwing random 16-digit numbers and associated information at a credit card processor could be viewed as walking a longer route too. The same goes for random key searches or password guesses. But those would need to be exhaustive, and in nearly all cases, impractical. That would be exactly the idea with this - we make it impractical to get at the unredacted information. For fun I gave the search a try. create table cards (id serial, cc bigint); insert into cards (cc) SELECT CAST(random() * 10000000000000000 AS bigint) FROM generate_series(1,10000); \timing on WITH RECURSIVE t(id, range_min, range_max) AS ( SELECT id, 1::bigint, 10000000000000000::bigint FROM cards UNION ALL SELECT id , CASE WHEN cc <= range_avg THEN range_min ELSE range_avg + 1 END , CASE WHEN cc <= range_avg THEN range_avg ELSE range_max END FROM (SELECT id, (range_min + range_max) / 2 AS range_avg, range_min, range_max FROM t ) AS t_avg JOIN cards USING (id) WHERE range_min != range_max ) SELECT id, range_min AS cc FROM t WHERE range_min = range_max; On my laptop I can pull all 10,000 card numbers in less than 1 second. For a text-based item I don't imagine it would be much different. Numbers are pretty easy to work with though.
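The recursive CTE is just a bisection: each iteration halves the candidate range for every row using one comparison. The same attack in a few lines of Python makes the cost explicit — a 16-digit space falls in at most ~54 comparisons per row (the constants here are illustrative, matching the toy table above):

```python
def extract(leq_oracle, lo=1, hi=10**16):
    """Recover a hidden value given only answers to 'value <= x?',
    mirroring the CTE's range_min/range_max bisection."""
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if leq_oracle(mid):
            hi = mid          # value is in the lower half
        else:
            lo = mid + 1      # value is in the upper half
    return lo, probes

secret = 4532015112830366  # a made-up 16-digit number
value, probes = extract(lambda x: secret <= x)
```

Any interface that answers ordering (or even just equality, via repeated guessing over a small space) queries against the real value leaks it at this logarithmic rate, which is the crux of the objection to redaction.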
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-10 16:41:39 +0200, Andres Freund wrote: FWIW, the profile always looks like - 48.61% postgres postgres [.] s_lock - s_lock + 96.67% StrategyGetBuffer + 1.19% UnpinBuffer + 0.90% PinBuffer + 0.70% hash_search_with_hash_value + 3.11% postgres postgres [.] GetSnapshotData + 2.47% postgres postgres [.] StrategyGetBuffer + 1.93% postgres [kernel.kallsyms] [k] copy_user_generic_string + 1.28% postgres postgres [.] hash_search_with_hash_value - 1.27% postgres postgres [.] LWLockAttemptLock - LWLockAttemptLock - 97.78% LWLockAcquire + 38.76% ReadBuffer_common + 28.62% _bt_getbuf + 8.59% _bt_relandgetbuf + 6.25% GetSnapshotData + 5.93% VirtualXactLockTableInsert + 3.95% VirtualXactLockTableCleanup + 2.35% index_fetch_heap + 1.66% StartBufferIO + 1.56% LockReleaseAll + 1.55% _bt_next + 0.78% LockAcquireExtended + 1.47% _bt_next + 0.75% _bt_relandgetbuf to me. Now that's with the client count 496, but it's similar with lower counts. BTW, that profile *clearly* indicates we should make StrategyGetBuffer() smarter. Which is nearly trivial now that atomics are in. Check out the attached WIP patch which eliminates the spinlock from StrategyGetBuffer() unless there's buffers on the freelist. Test: pgbench -M prepared -P 5 -S -c 496 -j 496 -T 5000 on a scale=1000 database, with 4GB of shared buffers. Before: progress: 40.0 s, 136252.3 tps, lat 3.628 ms stddev 4.547 progress: 45.0 s, 135049.0 tps, lat 3.660 ms stddev 4.515 progress: 50.0 s, 135788.9 tps, lat 3.640 ms stddev 4.398 progress: 55.0 s, 135268.4 tps, lat 3.654 ms stddev 4.469 progress: 60.0 s, 134991.6 tps, lat 3.661 ms stddev 4.739 after: progress: 40.0 s, 207701.1 tps, lat 2.382 ms stddev 3.018 progress: 45.0 s, 208022.4 tps, lat 2.377 ms stddev 2.902 progress: 50.0 s, 209187.1 tps, lat 2.364 ms stddev 2.970 progress: 55.0 s, 206462.7 tps, lat 2.396 ms stddev 2.871 progress: 60.0 s, 210263.8 tps, lat 2.351 ms stddev 2.914 Yes, no kidding. 
The results are similar, but less extreme, for smaller client counts like 80 or 160. Amit, since your test seems to be currently completely bottlenecked within StrategyGetBuffer(), could you compare with that patch applied to HEAD and the LW_SHARED patch for one client count? That may allow us to see a more meaningful profile... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 6b486e5b467e94ab9297d7656a5b39b816c5c55a Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Fri, 10 Oct 2014 17:36:46 +0200 Subject: [PATCH] WIP: lockless StrategyGetBuffer hotpath --- src/backend/storage/buffer/freelist.c | 154 -- 1 file changed, 90 insertions(+), 64 deletions(-) diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c index 5966beb..0c634a0 100644 --- a/src/backend/storage/buffer/freelist.c +++ b/src/backend/storage/buffer/freelist.c @@ -18,6 +18,12 @@ #include "storage/buf_internals.h" #include "storage/bufmgr.h" +#include "port/atomics.h" + + +#define LATCHPTR_ACCESS_ONCE(var) ((Latch *)(*((volatile Latch **)&(var)))) +#define INT_ACCESS_ONCE(var) ((int)(*((volatile int *)&(var)))) + /* * The shared freelist control information. @@ -27,8 +33,12 @@ typedef struct /* Spinlock: protects the values below */ slock_t buffer_strategy_lock; - /* Clock sweep hand: index of next buffer to consider grabbing */ - int nextVictimBuffer; + /* + * Clock sweep hand: index of next buffer to consider grabbing. Note that + * this isn't a concrete buffer - we only ever increase the value. So, to + * get an actual buffer, it needs to be used modulo NBuffers. + */ + pg_atomic_uint32 nextVictimBuffer; int firstFreeBuffer; /* Head of list of unused buffers */ int lastFreeBuffer; /* Tail of list of unused buffers */ @@ -42,8 +52,8 @@ typedef struct * Statistics. These counters should be wide enough that they can't * overflow during a single bgwriter cycle. 
*/ - uint32 completePasses; /* Complete cycles of the clock sweep */ - uint32 numBufferAllocs; /* Buffers allocated since last reset */ + pg_atomic_uint32 completePasses; /* Complete cycles of the clock sweep */ + pg_atomic_uint32 numBufferAllocs; /* Buffers allocated since last reset */ /* * Notification latch, or NULL if none. See StrategyNotifyBgWriter. @@ -124,87 +134,107 @@ StrategyGetBuffer(BufferAccessStrategy strategy) return buf; } - /* Nope, so lock the freelist */ - SpinLockAcquire(&StrategyControl->buffer_strategy_lock); - /* * We count buffer allocation requests so that the bgwriter can estimate * the rate of buffer consumption.
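The heart of the patch is that nextVictimBuffer becomes a counter that only ever increases: the concrete buffer is the counter modulo NBuffers, and complete clock passes fall out of integer division, so the hot path needs no spinlock. A sketch of that arithmetic (Python's itertools.count standing in for pg_atomic_fetch_add_u32; the uint32 wraparound handling in the real patch is omitted):

```python
import itertools

NBUFFERS = 16

# Monotonically increasing "atomic" ticket counter; in the patch this is
# a fetch-and-add on StrategyControl->nextVictimBuffer.
next_victim = itertools.count()

def clock_sweep_victim():
    ticket = next(next_victim)
    buf = ticket % NBUFFERS               # actual buffer id on the ring
    complete_passes = ticket // NBUFFERS  # full sweeps of the clock so far
    return buf, complete_passes

# Two full sweeps of the ring, no lock taken anywhere on the hot path.
victims = [clock_sweep_victim()[0] for _ in range(NBUFFERS * 2)]
```

Because each caller gets a distinct ticket, concurrent backends never contend on a lock to advance the sweep hand, which is exactly what removes s_lock from the profile above.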
Re: [HACKERS] schema-only -n option in pg_restore fails
On Thu, Oct 9, 2014 at 6:19 PM, Josh Berkus j...@agliodbs.com wrote: On 10/09/2014 12:36 PM, Josh Berkus wrote: Summary: pg_restore -n attempts to restore objects to pg_catalog schema Versions Tested: 9.3.5, 9.3.0, 9.2.4 Explored this some with Andrew offlist. Turns out this is going to be a PITA to fix, so it should go on the big pile of TODOs for when we overhaul search_path. Here's what's happening under the hood: pg_restore generates this SQL text: SET search_path = schem_a, pg_catalog; CREATE TABLE tab_a ( test text ); Since schem_a doesn't exist, it's skipped over and pg_restore attempts to create the objects in pg_catalog. So this is Yet Another Issue caused by the ten meter tall tar baby which is search_path. So, my proposal for a resolution: 1) In current versions, patch the docs to explicitly say that -n does not create the schema, and that if the user doesn't create the schema pg_restore will fail. 2) Patch 9.5's pg_restore to do CREATE SCHEMA IF NOT EXISTS when -n is used. This will be 100% backwards-compatible with current behavior. I agree with this solution. Whenever I restore a schema from a dump I need to create the schemas beforehand, and it sucks. I'm working on the 2nd item [1] together with another friend (Sebastian, in cc) to introduce him to the PostgreSQL development process. We'll register it soon in the next commitfest. Regards, [1] https://github.com/fabriziomello/postgres/tree/create_schema_during_pg_restore_schema_only -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
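The failure mode boils down to the creation-target rule: the first schema in search_path that actually exists wins. A toy model of that resolution (a hypothetical helper, not server code) shows why the missing schema falls through silently rather than erroring:

```python
def resolve_creation_schema(search_path, existing_schemas):
    """The first existing schema in search_path is where CREATE TABLE
    lands; a missing schem_a silently falls through to pg_catalog."""
    for schema in search_path:
        if schema in existing_schemas:
            return schema
    return None

# pg_restore -n schem_a emits: SET search_path = schem_a, pg_catalog;
target = resolve_creation_schema(["schem_a", "pg_catalog"],
                                 {"pg_catalog", "public"})
```

Proposal (2) fixes this by making the schema exist before the SET runs, so resolution stops at schem_a instead of pg_catalog.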
Re: [HACKERS] [GENERAL] Understanding and implementing a GiST Index
I'd be ok with either SP-GiST or GiST, though there are tree structures (BK tree) that are particularly suited to this application that I /think/ map to GiST better than SP-GiST. Re: hamming in the RD-tree implementation: Where? I cloned the smlar git repo, and poked around, and it looks like it can only operate on set intersections, which is a non-starter for what I need to do. I spent a while looking through the source, and I didn't see anything that looked like hamming distance calculation (though I probably missed some stuff. The indirection through `FCall2()` makes things very hard to follow). Anyways, right now, smlar is not useful to me, because it completely ignores the structure of incoming arrays: postgres=# SELECT smlar( postgres(# '{1,0,0,0, 1,1,1,1, 1,1,1,0, 0,1,0,0}'::int[], postgres(# '{1,0,0,1, 0,1,0,0, 0,1,0,0, 0,1,1,0}' postgres(# ); smlar --- 1 (1 row) Two radically different hashes (shortened for example purposes) compare as identical. I spent some time seeing whether I could easily disable the array uniquification; by disabling the calls to `qsort_arg` and modifying `numOfIntersect` so it just counts the number of times the array elements do not match, I'm getting the behaviour I want out of `smlar()` at this point: postgres=# SELECT smlar( postgres(# '{1,0,0,0, 1,1,1,1, 1,1,1,0, 0,1,0,0}'::int[], postgres(# '{1,0,0,1, 0,1,0,0, 0,1,0,0, 0,1,1,0}' postgres(# ); smlar 0.4375 (1 row) postgres=# SELECT smlar( '{0,1,1,0}'::int[], '{1,0,1,0}' ); smlar --- 0.5 (1 row) But the index does not seem to work after the changes I made, and the debugging print statements (well, `elog()` statements) I inserted into `cmpArrayElem()` and `numOfIntersect()` are not being hit, so I don't understand how the index is even being built. Anyways, I'm rambling a bit. Any input would be great. Thanks, Connor On 10/9/2014 8:11 AM, Oleg Bartunov wrote: Just a quick recommendation. You need to ask the -hackers mailing list for that. 
Also, do you want GiST or SP-GiST? We already use hamming distance in the rd-tree, which is implemented in GiST; see our presentation http://www.sai.msu.su/~megera/postgres/talks/pgcon-2012.pdf, for example. Oleg On Thu, Oct 9, 2014 at 11:09 AM, Connor Wolf w...@imaginaryindustries.com wrote: Hi there! I'm trying to implement a custom indexing system using the GiST index framework, and I have a few questions. Basically, I'm trying to implement a system that allows me to search across a set of 64 bit integers by hamming distance. This is intended to be used in searching for similar images, where the 64 bit integer is actually a phash http://www.hackerfactor.com/blog/?/archives/432-Looks-Like-It.html of an image, and similarity between two images is reflected in the hamming distance between two integers. Anyways, the appropriate approach here is to use something called a BK tree http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees, for which I've written some test code https://github.com/fake-name/MangaCMS/blob/master/deduplicator/hamDb.py#L27 and I think I have a decent grip on (my test code seems to work, in any event). That said: * Is there a /simple/ piece of example code for implementing a GiST index for a single type? I've been looking through the /contrib/ directory, particularly the /contrib/btree_gist/ files, but it looks like 90% of the complexity there is related to supporting all the column types Postgres has, rather than actually being tied to producing a functional index. * Once I have something compiling, how can I check to be sure that I'm actually using the indexing module I created? I can enable my compiled extension using `CREATE EXTENSION`, but is there an option for `CREATE INDEX test_index ON testing USING gist (val);` that lets me specify *which* GiST index is actually employed? Is this even a valid question? 
This is probably something that's available in one of the system tables somewhere, but I haven't had much luck with Google finding out where. * Testing: What's the appropriate way to examine the generated tree structure of the index? I certainly went through a few bugs with my test tree system (and that was in Python!). Is there any way to examine the index structure for debugging purposes? * Also, it looks like `ereport()` is the proper way to emit debugging information. Is this correct? * In that vein, is there any way to get information on the operations of an entire query? Looking at the number of tree nodes touched for a scan would be nice (and I would not be surprised if there is
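Both halves of this thread — the positional mismatch score Connor coaxed out of smlar(), and the BK tree he wants to map onto (SP-)GiST — can be sketched in a few lines of Python (illustrative code in the spirit of his linked test script, not any PostgreSQL API; the sample phashes are made up):

```python
def hamming(a, b):
    """Hamming distance between two 64-bit phash integers."""
    return bin(a ^ b).count("1")

def mismatch_fraction(a, b):
    """Positional disagreement rate: the behaviour wanted from smlar()."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

class BKNode:
    """Minimal BK tree: children are keyed by distance to the node value."""
    def __init__(self, value):
        self.value = value
        self.children = {}

    def insert(self, value):
        d = hamming(self.value, value)
        if d in self.children:
            self.children[d].insert(value)
        else:
            self.children[d] = BKNode(value)

    def search(self, target, radius, hits):
        d = hamming(self.value, target)
        if d <= radius:
            hits.append(self.value)
        # Triangle inequality: only subtrees keyed within d +/- radius can
        # contain matches -- the pruning that makes the index worthwhile.
        for dist, child in self.children.items():
            if d - radius <= dist <= d + radius:
                child.search(target, radius, hits)

h1 = [1,0,0,0, 1,1,1,1, 1,1,1,0, 0,1,0,0]
h2 = [1,0,0,1, 0,1,0,0, 0,1,0,0, 0,1,1,0]

root = BKNode(0xF0F0)
for v in (0xF0F1, 0x0F0F, 0xF0FF):
    root.insert(v)
hits = []
root.search(0xF0F0, 2, hits)   # neighbours within hamming distance 2
```

mismatch_fraction reproduces the 0.4375 and 0.5 values from the modified smlar() session above, and the radius search visits only the subtrees the triangle inequality cannot rule out.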
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Wed, Oct 8, 2014 at 5:01 PM, Kevin Grittner kgri...@ymail.com wrote: Peter Geoghegan p...@heroku.com wrote: On Mon, Oct 6, 2014 at 8:35 AM, Robert Haas robertmh...@gmail.com wrote: I think the problem is that it's not possible to respect the usual guarantees even at READ COMMITTED level when performing an INSERT OR UPDATE operation (however spelled). You may find that there's a tuple with the same PK which is committed but not visible to the snapshot you took at the beginning of the statement. Can you please comment on this, Kevin? It would be nice to converge on an agreement on syntax here Robert said however spelled -- which I take to mean that he at least sees that the MERGE-like UPSERT syntax can be turned into the desired semantics. I have no idea why anyone would think otherwise. Although the last go-around does suggest that there is at least one point of difference on the semantics. You seem to want to fire the BEFORE INSERT triggers before determining whether this will be an INSERT or an UPDATE. OK, I have a comment on this. Anything we do about triggers will by definition be novel. Right now, we have INSERT, UPDATE, and DELETE. If we add a new operation, whether it's called UPSERT or MERGE or FROB, or if we add a flag to insert that makes it possibly do something other than inserting (e.g. INSERT OR UPDATE), or if we add a flag to update that makes it do something other than updating (e.g. UPDATE or INSERT), then some people's triggers are going to get broken by that change no matter how we do it. When the new operation is invoked, we can fire the insert triggers, the update triggers, some new kind of trigger, or no trigger at all - and any decision we make there will not in all cases be backward-compatible. We can try to minimize the damage (and we probably should) but we can't make it go to zero. 
INSERT INTO targettable(key, quantity, inserted_at) VALUES(123, quantity, now()) ON CONFLICT WITHIN targettable_pkey UPDATE SET quantity = quantity + CONFLICTING(quantity), updated_at = now(); I actually like this syntax reasonably well in some ways, but I don't like that we're mentioning the index name, and the CONFLICTING() notation is decidedly odd. Maybe something like this: INSERT INTO targettable(key, quantity, inserted_at) VALUES(123, quantity, now()) ON DUPLICATE (key) UPDATE SET quantity = quantity + OLD.quantity, updated_at = now(); (Perhaps OLD should reference the tuple already in the table, and NEW the value from the VALUES clause. That would be snazzy indeed.) Also, how about making the SET clause optional, with the semantics that we just update all of the fields for which a value is explicitly specified: INSERT INTO overwrite_with_abandon (key, value) VALUES (42, 'meaning of life') ON DUPLICATE (key) UPDATE; While the ability to specify a SET clause there explicitly is useful, I bet a lot of key-value store users would love the heck out of a syntax that let them omit it when they want to overwrite. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
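The semantics Robert sketches — an explicit SET with an OLD reference, or SET omitted meaning "overwrite exactly the columns the INSERT named" — can be pinned down with a plain dict store (a hypothetical model of the proposed behaviour, not PostgreSQL code):

```python
def upsert(table, key, row, set_clause=None):
    """table: dict keyed by the conflict key; row: the VALUES tuple as a
    dict; set_clause(old, new): explicit SET; None: bare ON DUPLICATE
    UPDATE, overwriting only the columns named in the INSERT."""
    k = row[key]
    if k not in table:
        table[k] = dict(row)                        # plain INSERT
    elif set_clause is not None:
        table[k].update(set_clause(table[k], row))  # ON DUPLICATE ... SET
    else:
        table[k].update(row)                        # overwrite named columns

store = {}
# The overwrite_with_abandon case: bare UPDATE replaces the named columns.
upsert(store, "key", {"key": 42, "value": "meaning of life"})
upsert(store, "key", {"key": 42, "value": "overwritten"})
# The accumulating case: explicit SET can reference OLD (the stored row).
upsert(store, "key", {"key": 123, "quantity": 1, "inserted_at": "t0"})
upsert(store, "key", {"key": 123, "quantity": 5, "inserted_at": "t1"},
       set_clause=lambda old, new: {"quantity": old["quantity"] + new["quantity"],
                                    "updated_at": "t1"})
```

Note that with an explicit SET, columns not mentioned in it (inserted_at here) keep their stored values — which is what distinguishes this from MySQL's delete-then-insert REPLACE.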
Re: [HACKERS] Column Redaction
Thom Brown t...@linux.com wrote: On 10 October 2014 15:56, Stephen Frost sfr...@snowman.net wrote: Thom Brown (t...@linux.com) wrote: Data such as plain credit card numbers stored in a column, even with all its data masked, would be easy to determine. I'm not as convinced of that as you are.. Though I'll point out that in the use-cases which I've been talking to users about, it isn't credit cards under discussion. I think credit card numbers are a good example. I'm not so sure. Aren't credit card numbers generally required by law to be stored in an encrypted form? If we're talking about format functions here, there has to be something in addition to that which determines permitted comparison operations. If not, and we were going to remove all but = operations, we'd effectively cripple the functionality of anything that's been formatted that wasn't intended as a security measure. It almost sounds like an extension to domains rather than column-level functionality. I have to say that my first thought was that format functions associated with types with domain override would be a very nice capability. But I don't see where that has much to do with security. I have seen many places where redaction is necessary (and in fact done), but I don't see how that could be addressed by what Simon is proposing. Perhaps I'm missing something; if so, a more concrete exposition of a use case might allow things to click. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins
Hi! On 9.10.2014 22:28, Kevin Grittner wrote: Tomas Vondra t...@fuzzy.cz wrote: The only case I've been able to come up with is when the hash table fits into work_mem only thanks to not counting the buckets. The new code will start batching in this case. Hmm. If you look at the timings in my posting from 2014-10-01 I had cases where the patched version started with one batch and went to two, while the unpatched used just one batch, and the patched version was still more than twice as fast. I'm sure the on-disk batch was fully cached, however; it might work out differently if disk speed actually came into the picture. I think the case with a large outer table and memory pressure, forcing the batches to actually be written to disk (instead of just sitting in the page cache), is the main concern here. That is mostly luck, however, because it depends on the cardinality estimates, and the best way to fix it is increasing work_mem (which is safer thanks to reducing the overhead). Also, Robert proposed a way to mitigate this: if we realize we'd have to do batching during the initial sizing, we can peek at whether reducing the number of buckets (to 1/2 or maybe 1/4) would help. I believe this is a good approach, and will look into that after pgconf.eu (i.e. early November), unless someone else is interested. Sure, but it would be good to confirm that it's actually needed first. Yeah. But with a cleverly crafted test case it's possible to confirm almost everything ;-) When given a generous work_mem setting the patched version often uses more of what it is allowed than the unpatched version (which is probably one of the reasons it tends to do better). If someone has set a high work_mem and has gotten by only because the configured amount is not all being used when it would benefit performance, they may need to adjust work_mem down to avoid memory problems. That doesn't seem to me to be a reason to reject the patch. I'm not entirely sure I understand this paragraph. 
What do you mean by configured amount is not all being used when it would benefit performance? Can you give an example? Again, the earlier post showed that while the unpatched used 3516kB whether it had work_mem set to 4MB or 1GB, the patched version used 3073kB when work_mem was set to 4MB and 4540kB when work_mem was set to 1GB. The extra memory allowed the patched version to stay at 1 batch, improving performance over the other setting. OK, so with 4MB the new version was batching, while the original code keeps a single batch (likely due to not counting buckets into work_mem). I believe that's expected behavior. Also, in the e-mail from Oct 2 that got lost [1], I pointed out that by testing only with small hash tables the results are likely influenced by high CPU cache hit ratio. I think benchmarking with larger inner relation (thus increasing the memory occupied by the hash table) might be interesting. [1] http://www.postgresql.org/message-id/c84680e818f6a0f4a26223cd750ff252.squir...@2.emaily.eu The only thing I can think of is the batching behavior described above. This is in Waiting on Author status only because I never got an answer about why the debug code used printf() rather the elog() at a DEBUG level. Other than that, I would say this patch is Ready for Committer. Tomas? You there? I think I responded to that on October 2, quoting: ... Ah, I never received that email. That tends to happen every now and then. :-( I believe it's safe to switch the logging to elog(). IMHO the printf logging is there from some very early version of the code, before elog was introduced. Or something like that. I guess it might be best to remain consistent in this patch and change that in a separate patch. I just wanted to make sure you didn't see any reason not to do so. OK, I'll put that on my TODO list for the next CF. With that addressed I will move this to Ready for Committer. 
Since both Heikki and Robert spent time on this patch earlier, I'll give either of them a shot at committing it if they want; otherwise I'll do it. Great, thanks! Tomas
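The batching behaviour discussed above — the unpatched code not charging the bucket array against work_mem, the patched code counting it and therefore sometimes going from one batch to two — can be caricatured in a few lines (made-up sizes and a crude 1-bucket-per-tuple model, not the planner's real cost accounting):

```python
def choose_nbatch(ntuples, tuple_bytes, bucket_bytes, work_mem,
                  count_buckets):
    """Double nbatch until one batch's hash table fits in work_mem.
    count_buckets toggles whether the bucket array is charged to the
    budget (the patched behaviour) or ignored (the old behaviour)."""
    nbatch = 1
    while True:
        size = (ntuples / nbatch) * tuple_bytes
        if count_buckets:
            size += (ntuples / nbatch) * bucket_bytes  # ~1 bucket per tuple
        if size <= work_mem:
            return nbatch
        nbatch *= 2

work_mem = 4 * 1024 * 1024  # 4MB
old = choose_nbatch(100_000, 40, 8, work_mem, count_buckets=False)
new = choose_nbatch(100_000, 40, 8, work_mem, count_buckets=True)
```

With these illustrative numbers the tuples alone just fit, so the old accounting stays at one batch while the bucket-aware accounting doubles to two — the borderline case Kevin's timings exercised.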
Re: [HACKERS] Column Redaction
On Fri, Oct 10, 2014 at 02:05:05PM +0100, Thom Brown wrote: On 10 October 2014 13:43, Simon Riggs si...@2ndquadrant.com wrote: On 10 October 2014 11:45, Thom Brown t...@linux.com wrote: To be honest, this all sounds rather flaky. To be honest, suggesting anything at all is rather difficult and I recommend people try it. I have, and most ideas I've had have been justifiably shot down or picked apart (scheduled background tasks, offloading stats collection to standby, index maintenance in DML query plans, expression statistics... to name but a few). Everything sounds crap when you didn't think of it and you've given it an hour's thought. I'm not sure that means my concerns aren't valid. I don't think it sounds crap, but I also can't see any use-case for it where we don't already have things covered, or where it's going to offer any useful level of security. Like with RLS, it may be that I'm just looking at things from the wrong perspective. Agreed. The problem isn't giving it only an hour's thought --- it is that we can come up with serious problems in five _seconds_ of thought. Unless you can come up with a solution to those issues, I am not sure why we are even talking about it. My other concern is you must have realized these issues in five seconds too, so why didn't you mention them? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 10, 2014 at 11:05 AM, Robert Haas robertmh...@gmail.com wrote: Anything we do about triggers will by definition be novel. Right now, we have INSERT, UPDATE, and DELETE. If we add a new operation, whether it's called UPSERT or MERGE or FROB, or if we add a flag to insert that makes it possibly do something other than inserting (e.g. INSERT OR UPDATE), or if we add a flag to update that makes it do something other than updating (e.g. UPDATE or INSERT), then some people's triggers are going to get broken by that change no matter how we do it. When the new operation is invoked, we can fire the insert triggers, the update triggers, some new kind of trigger, or no trigger at all - and any decision we make there will not in all cases be backward-compatible. We can try to minimize the damage (and we probably should) but we can't make it go to zero. +1. It's inevitable that someone isn't going to be entirely happy with whatever we do. Let's be realistic about that, while minimizing the risk. I actually like this syntax reasonably well in some ways, but I don't like that we're mentioning the index name, and the CONFLICTING() notation is decidedly odd. Maybe something like this: People keep remarking that they don't like that you can (optionally) name a unique index explicitly, and I keep telling them why I've done it that way [1]. There is a trade-off here. I am willing to go another way in that trade-off, but let's have a realistic conversation about it. Also, how about making the SET clause optional, with the semantics that we just update all of the fields for which a value is explicitly specified: INSERT INTO overwrite_with_abandon (key, value) VALUES (42, 'meaning of life') ON DUPLICATE (key) UPDATE; While the ability to specify a SET clause there explicitly is useful, I bet a lot of key-value store users would love the heck out of a syntax that let them omit it when they want to overwrite. While I initially like that idea, now I'm not so sure about it [2]. 
MySQL doesn't allow this (with ON DUPLICATE KEY UPDATE). Their REPLACE command allows it, since it is defined as a DELETE followed by an INSERT (or something like that), but I think that in general that feature has been a disaster for them. [1] http://www.postgresql.org/message-id/CAM3SWZQpGf6_ME6D1vWqdCFadD7Nprdx2JrE=Hcf=93kxeh...@mail.gmail.com [2] http://www.postgresql.org/message-id/CAM3SWZT=vxbj7qkaidamybu40ap10udsqooqhvix3ykj7wb...@mail.gmail.com -- Peter Geoghegan
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
Robert Haas robertmh...@gmail.com wrote: Anything we do about triggers will by definition be novel. Right now, we have INSERT, UPDATE, and DELETE. If we add a new operation, whether it's called UPSERT or MERGE or FROB, [ ... ] If we call it MERGE, then we had better follow the rules the standard lays out for triggers on a MERGE statement -- which is that UPDATE triggers fire for rows updated and INSERT triggers fire for rows inserted. That kinda makes sense, since the MERGE command is almost like a row-by-row CASE statement allowing one or the other. If we make up our own command verb, we are free to do as we like. Maybe something like this: INSERT INTO targettable(key, quantity, inserted_at) VALUES(123, quantity, now()) ON DUPLICATE (key) UPDATE SET quantity = quantity + OLD.quantity, updated_at = now(); That seems a lot cleaner than the proposal on the Wiki page. If we go that route, it makes sense to fire the BEFORE INSERT triggers before attempting the insert and then fire BEFORE UPDATE triggers before attempting the UPDATE. If either succeeds, I think we should fire the corresponding AFTER triggers. We already allow a BEFORE trigger to run and then omit the triggering operation without an error, so I don't see that as a problem. This makes a lot more sense to me than attempting to add a new UPSERT trigger type. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 10, 2014 at 11:30 AM, Kevin Grittner kgri...@ymail.com wrote: That seems a lot cleaner than the proposal on the Wiki page. If we go that route, it makes sense to fire the BEFORE INSERT triggers before attempting the insert and then fire BEFORE UPDATE triggers before attempting the UPDATE. If either succeeds, I think we should fire the corresponding AFTER triggers. We already allow a BEFORE trigger to run and then omit the triggering operation without an error, so I don't see that as a problem. This makes a lot more sense to me than attempting to add a new UPSERT trigger type. You realize that that's exactly what my patch does, right? AFAICT the only confusion that my patch has is with statement-level triggers, as discussed on the Wiki page. I think that the row-level trigger behavior is fine; a lot more thought has gone into it. -- Peter Geoghegan
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
Peter Geoghegan p...@heroku.com wrote: On Fri, Oct 10, 2014 at 11:05 AM, Robert Haas robertmh...@gmail.com wrote: I actually like this syntax reasonably well in some ways, but I don't like that we're mentioning the index name, and the CONFLICTING() notation is decidedly odd. People keep remarking that they don't like that you can (optionally) name a unique index explicitly, and I keep telling them why I've done it that way [1]. There is a trade-off here. I am willing to go another way in that trade-off, but let's have a realistic conversation about it. We've all read that, and your repeated arguments for that point of view. We disagree and have said why. What in that is not a realistic conversation? To restate: to do so is conflating the logical definition of the database with a particular implementation detail. As just one reason that is a bad idea: we can look up unique indexes on the specified columns, but if we implement other storage techniques where there is no such thing as a unique index on the columns, yet manage to duplicate the semantics (yes, stranger things have happened), people can't migrate to the new structure without rewriting their queries. If the syntax references logical details (like column names) there is no need to rewrite. We don't want to be painted into a corner. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 10, 2014 at 11:38 AM, Peter Geoghegan p...@heroku.com wrote: That seems a lot cleaner than the proposal on the Wiki page. If we go that route, it makes sense to fire the BEFORE INSERT triggers before attempting the insert and then fire BEFORE UPDATE triggers before attempting the UPDATE. By the way, there is no problem with failing to UPDATE, because we lock rows ahead of the UPDATE. Once a row is locked, the UPDATE cannot conflict. There is no danger of BEFORE UPDATE row-level triggers firing without the row then being updated (unless the xact aborts, but you know what I mean). In general, there is no danger of triggers firing more often than you might think they should, with the sole exception of the fact that we always fire before-insert row-level triggers, even when an UPDATE is the ultimate outcome. Handling conflicts (e.g. concurrent insertions with promise tuples) necessitates a restart, but only back to the point just after the before-insert row-level triggers fire, and before any other triggers have had the opportunity to fire. -- Peter Geoghegan
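The row-level firing order described above can be written down as a simple trace (a model of the described semantics, not the patch itself):

```python
def upsert_row_triggers(conflict):
    """Row-level trigger trace for one upserted row: BEFORE INSERT always
    fires first; only the path actually taken fires the rest, and because
    the conflicting row is locked before BEFORE UPDATE fires, the UPDATE
    itself cannot subsequently fail due to concurrency."""
    fired = ["BEFORE INSERT"]
    if conflict:
        fired += ["BEFORE UPDATE", "AFTER UPDATE"]
    else:
        fired += ["AFTER INSERT"]
    return fired
```

The asymmetry — BEFORE INSERT fires even on the UPDATE path, but no UPDATE trigger ever fires without an actual update — is the one novelty existing trigger authors would need to know about.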
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 10, 2014 at 11:44 AM, Kevin Grittner kgri...@ymail.com wrote: We've all read that, and your repeated arguments for that point of view. We disagree and have said why. What in that is not a realistic conversation? Because, as I've said many times, there are problems with naming columns directly that need to be addressed. There are corner-cases. Are you going to complain about those once I go implement something? Let's talk about that. I think we can just throw an error in these edge-cases, but let's decide if we're comfortable with that. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
Peter Geoghegan p...@heroku.com wrote: On Fri, Oct 10, 2014 at 11:30 AM, Kevin Grittner kgri...@ymail.com wrote: That seems a lot cleaner than the proposal on the Wiki page. If we go that route, it makes sense to fire the BEFORE INSERT triggers before attempting the insert and then fire BEFORE UPDATE triggers before attempting the UPDATE. If either succeeds, I think we should fire the corresponding AFTER triggers. We already allow a BEFORE trigger to run and then omit the triggering operation without an error, so I don't see that as a problem. This makes a lot more sense to me than attempting to add a new UPSERT trigger type. You realize that that's exactly what my patch does, right? I haven't read the patch, but the discussion has made that clear, yes. Try to take agreement on a point gracefully, will ya? ;-) AFAICT the only confusion that my patch has is with statement-level triggers, as discussed on the Wiki page. I think that the row-level trigger behavior is fine; a lot more thought has gone into it. IMV it is clear that since the standard says that update or insert operations which affect zero rows fire the corresponding trigger, the statement-level triggers for both should fire for an UPSERT, even if the set of affected rows for one or the other (or both!) is zero. If we ever get transition relations, it will be easy to check the count of affected rows or otherwise behave appropriately based on what was done. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
Peter Geoghegan p...@heroku.com wrote: There is no danger of UPDATE before row-level triggers firing without then updating (unless the xact aborts, but you know what I mean). Well, let's make sure I do know what you mean. If a BEFORE UPDATE ... FOR EACH ROW trigger returns NULL, the UPDATE will be skipped without error, right? The BEFORE UPDATE triggers will run before the UPDATE if a duplicate is found, and the return value will be treated in the usual manner, replacing values and potentially skipping the update? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 10, 2014 at 2:29 PM, Peter Geoghegan p...@heroku.com wrote: People keep remarking that they don't like that you can (optionally) name a unique index explicitly, and I keep telling them why I've done it that way [1]. There is a trade-off here. I am willing to go another way in that trade-off, but let's have a realistic conversation about it. I think what's realistic here is that the patch isn't going to get committed the way you have it today, because nobody else likes that design. That may be harsh, but I think it's accurate. But on the substance of the issue, I'm totally unconvinced by your argument in the linked email. Let's suppose you have this: create table foo (a int, b int); create unique index foo1 on foo (a); create unique index foo2 on foo (a); create unique index foo3 on foo (a) where b = 1; Anything that conflicts in foo1 or foo2 is going to conflict in the other one, and with foo3. Anything that conflicts in foo3 is perhaps also going to conflict in foo1 and foo2, or perhaps not. Yet, if the statement simply says ON DUPLICATE (a) UPDATE, it's entirely clear what to do: when we hit a conflict in any of those three indexes, update the row. No matter which index has the conflict, updating is the right answer, and solves the conflict in all three indexes. I dunno what you'd expect someone to write in your syntax to solve this problem - presumably either (1) they can list any of the indexes that might conflict, (2) they must list all of the indexes that might conflict, or (3) it just doesn't work. Whatever the actual behavior of your patch is today, it seems to me that the conflict is, fundamentally, over a set of column values, NOT over a particular index. The index is simply a mechanism for noticing that conflict. If you want to allow this to work with expression indexes, that's not really a problem; you can let the user write INSERT .. ON CONFLICT (upper(foo)) UPDATE ... and match that to the index. 
It wouldn't bother me if the first version of this feature couldn't target expression indexes anyway, but if you want to include that, having the user mention the actual expression rather than the index name shouldn't make much difference. Also, how about making the SET clause optional, with the semantics that we just update all of the fields for which a value is explicitly specified: INSERT INTO overwrite_with_abandon (key, value) VALUES (42, 'meaning of life') ON DUPLICATE (key) UPDATE; While the ability to specify a SET clause there explicitly is useful, I bet a lot of key-value store users would love the heck out of a syntax that let them omit it when they want to overwrite. While I initially liked that idea, now I'm not so sure about it [2]. MySQL doesn't allow this (with ON DUPLICATE KEY UPDATE). Their REPLACE command allows it, since it is defined as a DELETE followed by an INSERT (or something like that), but I think that in general that feature has been a disaster for them. Your syntax allows the exact same thing; it simply requires the user to be more verbose in order to get that behavior. If we think that people wanting that behavior will be rare, then it's fine to require them to spell it out when they want it. If we think it will be the overwhelmingly common application of this feature, and I do, then making people spell it out when we could just as well infer it is pointless. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
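[Editor's note] The "insert, or update on a duplicate key" semantics debated above can be exercised concretely. The sketch below uses SQLite's ON CONFLICT ... DO UPDATE clause via Python's sqlite3 module (assuming a bundled SQLite of 3.24 or newer); the syntax is not what PostgreSQL would necessarily adopt, and the table name is taken from Robert's hypothetical example, but the row-routing behavior is the one under discussion:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE overwrite_with_abandon (key INTEGER PRIMARY KEY, value TEXT)"
)

# SQLite's upsert form: the conflict target names a unique column, and the
# SET clause is spelled out explicitly (Robert's proposal is to infer it).
upsert = (
    "INSERT INTO overwrite_with_abandon (key, value) VALUES (?, ?) "
    "ON CONFLICT (key) DO UPDATE SET value = excluded.value"
)

# First execution: no conflict, so this is a plain INSERT.
conn.execute(upsert, (42, "meaning of life"))
# Second execution: key 42 already exists, so the UPDATE arm fires
# instead of raising a unique-violation error.
conn.execute(upsert, (42, "overwritten"))

rows = conn.execute("SELECT key, value FROM overwrite_with_abandon").fetchall()
print(rows)  # [(42, 'overwritten')] -- exactly one row survives
```

Note that the SET clause here is explicit; the question in the thread is whether omitting it should implicitly mean "SET every column that the INSERT supplied."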
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
Peter Geoghegan p...@heroku.com wrote: On Fri, Oct 10, 2014 at 11:44 AM, Kevin Grittner kgri...@ymail.com wrote: [discussion on committers rejecting the notion of a syntax involving specification of an index name] as I've said many times, there are problems with naming columns directly that need to be addressed. There are corner-cases. Are you going to complain about those once I go implement something? If I don't like what I see, yes. I think it will be entirely possible for you to write something along those lines that I won't complain about. I think we can just throw an error in these edge-cases, but let's decide if we're comfortable with that. If the list of columns does not allow a choice of a suitable full-table unique index on only non-null columns, an error seems appropriate. What other problem cases do you see? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 10, 2014 at 11:55 AM, Kevin Grittner kgri...@ymail.com wrote: I haven't read the patch, but the discussion has made that clear, yes. Try to take agreement on a point gracefully, will ya? ;-) Heh, sorry. I did literally mean what I said - it wasn't 100% clear to me that you got that. It's safest to not make any assumptions like that. AFAICT the only confusion that my patch has is with statement-level triggers, as discussed on the Wiki page. I think that the row-level trigger behavior is fine; a lot more thought has gone into it. IMV it is clear that since the standard says that update or insert operations which affect zero rows fire the corresponding trigger, the statement level triggers for both should fire for an UPSERT, even if the set of affected rows for one or the other (or both!) is zero. If we ever get transition relations, it will be easy to check the count of affected rows or otherwise behave appropriately based on what was done. I think that you're probably right about that. Note that UPDATE rules will not be applied to the UPDATE portion of the statement. Not sure what to do about that, but I'm pretty sure that it's inevitable, because the UPDATE is effectively the same top-level statement for the purposes of rules. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 10, 2014 at 12:04 PM, Kevin Grittner kgri...@ymail.com wrote: Peter Geoghegan p...@heroku.com wrote: There is no danger of UPDATE before row-level triggers firing without then updating (unless the xact aborts, but you know what I mean). Well, let's make sure I do know what you mean. If a BEFORE UPDATE ... FOR EACH ROW trigger returns NULL, the UPDATE will be skipped without error, right? The BEFORE UPDATE triggers will run before the UPDATE if a duplicate is found, and the return value will be treated in the usual manner, replacing values and potentially skipping the update? That's exactly right, yes (in particular, you can return NULL from before row update triggers and have that cancel the update in the usual manner). And potentially, the final value could be affected by both the before row insert and before row update triggers (by way of CONFLICTING()). That's the most surprising aspect of what I've done (although I would argue it's the least surprising behavior possible given my constraints). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 10, 2014 at 12:09 PM, Robert Haas robertmh...@gmail.com wrote: I think what's realistic here is that the patch isn't going to get committed the way you have it today, because nobody else likes that design. That may be harsh, but I think it's accurate. I do think that's harsh, because it's unnecessary: I am looking for the best design. If you want to propose alternatives, great, but there is a reason why I've done things that way, and that should be acknowledged. I too think naming the unique index is ugly as sin, and have said as much multiple times. We're almost on the same page here. But on the substance of the issue, I'm totally unconvinced by your argument in the linked email. Let's suppose you have this: create table foo (a int, b int); create unique index foo1 on foo (a); create unique index foo2 on foo (a); create unique index foo3 on foo (a) where b = 1; Anything that conflicts in foo1 or foo2 is going to conflict in the other one, and with foo3. Anything that conflicts in foo3 is perhaps also going to conflict in foo1 and foo2, or perhaps not. Yet, if the statement simply says ON DUPLICATE (a) UPDATE, it's entirely clear what to do: when we hit a conflict in any of those three indexes, update the row. No matter which index has the conflict, updating is the right answer, and solves the conflict in all three indexes. I dunno what you'd expect someone to write in your syntax to solve this problem - presumably either (1) they can list any of the indexes that might conflict, (2) they must list all of the indexes that might conflict, or (3) it just doesn't work. I expect them to name exactly one unique index. They should either do that explicitly, or have one in mind and make the behavior implicit at the risk of miscalculating (and having a surprising outcome). It doesn't matter if it's foo1 or foo2 in this example (but foo3 is different, obviously). 
Whatever the actual behavior of your patch is today, it seems to me that the conflict is, fundamentally, over a set of column values, NOT over a particular index. The index is simply a mechanism for noticing that conflict. I think that this is the kernel of our disagreement. The index is not simply a mechanism for noticing that conflict. The user had better have one unique index in mind to provoke taking the alternative path in the event of a would-be unique violation. Clearly it doesn't matter much in this particular example. But what if there were two partial unique indexes, that were actually distinct, but only in terms of the constant appearing in their predicate? And what about having that changed by a before insert row-level trigger? Are we to defer deciding which unique index to use at the last possible minute? As always with this stuff, the devil is in the details. If we work out the details, then I can come up with something that's acceptable to everyone. Would you be okay with this never working with partial unique indexes? That gives me something to work with. If you want to allow this to work with expression indexes, that's not really a problem; you can let the user write INSERT .. ON CONFLICT (upper(foo)) UPDATE ... and match that to the index. It wouldn't bother me if the first version of this feature couldn't target expression indexes anyway, but if you want to include that, having the user mention the actual expression rather than the index name shouldn't make much difference. I'm not that worried about expression indexes, actually. I'm mostly worried about partial unique indexes, particularly when before insert row-level triggers are in play (making *that* play nice with expression indexes is harder still, but expression indexes on their own are probably not that much of a problem). 
Also, how about making the SET clause optional, with the semantics that we just update all of the fields for which a value is explicitly specified: INSERT INTO overwrite_with_abandon (key, value) VALUES (42, 'meaning of life') ON DUPLICATE (key) UPDATE; Your syntax allows the exact same thing; it simply require the user to be more verbose in order to get that behavior. If we think that people wanting that behavior will be rare, then it's fine to require them to spell it out when they want it. If we think it will be the overwhelming common application of this feature, and I do, then making people spell it out when we could just as well infer it is pointless. Did you consider my example? I think that people will like this idea, too - that clearly isn't the only consideration, though. As you say, it would be very easy to implement this. However, IMV, we shouldn't, because it is hazardous. MySQL doesn't allow this, and they tend to find expedients like this useful. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Column Redaction
Rod, * Rod Taylor (rod.tay...@gmail.com) wrote: For fun I gave the search a try. Neat! On my laptop I can pull all 10,000 card numbers in less than 1 second. For a text based item I don't imagine it would be much different. Numbers are pretty easy to work with though. I had been planning to give something like this a shot once I got back from various meetings today- so thanks! Being able to use the CC # *as* the target for the binary search is definitely an issue, though looking back on the overall problem space, CC's are less than 54 bits, and it's actually a smaller space than that if you know how they're put together. My thought on an attack was more along these lines: select * from cards join (SELECT CAST(random() * AS bigint) a from generate_series(1,100)) as foo on (cards.cc = foo.a); Which could pretty quickly find ~500 CC #s in a second or so (with a 'cards' table of about 1M entries) based on my testing. That's clearly sufficient to make it a viable attack also. The next question I have is- do(es) the other vendor(s) provide a way to address this or is it simply known that this doesn't offer any protection at all from adhoc queries and it's strictly for formatting? I can certainly imagine it actually being a way to simply avoid *inadvertent* exposure rather than providing any security from the individual running the commands. I'm not sure that would make it genuinely different enough from simply maintaining a view which does that filtering to make it useful on its own as a feature though, but I'm not particularly strongly against it either. Thanks! Stephen signature.asc Description: Digital signature
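[Editor's note] The binary-search attack Rod demonstrated needs nothing beyond the ability to use the redacted column in a comparison predicate. A minimal sketch of why a ~54-bit card number falls in at most 54 probes: the hypothetical `oracle()` function below stands in for a SQL round trip such as `SELECT count(*) FROM cards WHERE cc <= x`, which redaction of the *displayed* value does nothing to prevent:

```python
import random

# Hypothetical secret: a 16-digit card number the attacker cannot SELECT
# directly, but can still reference in WHERE clauses.
SECRET_CC = random.randrange(4_000_000_000_000_000, 5_000_000_000_000_000)

def oracle(x):
    # Stands in for one round trip of: does any row satisfy cc <= x?
    return SECRET_CC <= x

# Standard binary search over the ~2^54 space of card numbers: the secret
# falls out in at most 54 oracle queries, i.e. well under a second of
# SQL round trips.
lo, hi = 0, 1 << 54
queries = 0
while lo < hi:
    mid = (lo + hi) // 2
    queries += 1
    if oracle(mid):
        hi = mid  # secret is at or below mid
    else:
        lo = mid + 1  # secret is above mid

print(lo == SECRET_CC, queries)  # True, with queries <= 54
```

The random-equality join quoted above is just a probabilistic variant of the same principle: any predicate over the true value leaks it, regardless of how the value is formatted on output.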
Re: [HACKERS] Column Redaction
On 10 October 2014 19:26, Bruce Momjian br...@momjian.us wrote: On Fri, Oct 10, 2014 at 02:05:05PM +0100, Thom Brown wrote: On 10 October 2014 13:43, Simon Riggs si...@2ndquadrant.com wrote: On 10 October 2014 11:45, Thom Brown t...@linux.com wrote: To be honest, this all sounds rather flaky. My other concern is you must have realized these issues in five seconds too, so why didn't you mention them? Because the problems that you come up with in 5 seconds aren't necessarily problems. You just think they are, given 5 seconds thought. I think my first impression of the concept was poor also though it would be wonderful if I had remembered all of my initial objections. I didn't have any problem with Thom's first post, which was helpful in allowing me to explain the context and details. As I said in reply at that point, this is not in itself a barrier; other measures are necessary. The rest of the thread has descended into a massive misunderstanding of the purpose and role of redaction. When any of us move too quickly to a value judgement about a new concept then we're probably missing the point. All of us will be asked at some time in the next few years why Postgres doesn't have redaction. When you get it, post back here please. Or if you win the argument on it not being useful in any circumstance, post that here also. I'm not in a rush. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On 10/9/14, 6:59 PM, Gavin Flower wrote: On 10/10/14 12:38, Jim Nasby wrote: On 10/8/14, 5:51 PM, Peter Geoghegan wrote: On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner kgri...@ymail.com wrote: Although the last go-around does suggest that there is at least one point of difference on the semantics. You seem to want to fire the BEFORE INSERT triggers before determining whether this will be an INSERT or an UPDATE. That seems like a bad idea to me, but if the consensus is that we want to do that, it does argue for your plan of making UPSERT a variant of the INSERT command. Well, it isn't that I'm doing it because I think that it is a great idea, with everything to recommend it. It's more like I don't see any practical alternative. We need the before row insert triggers to fire to figure out what to insert (or if we should update instead). No way around that. At the same time, those triggers are at liberty to do almost anything, and so in general we have no way of totally nullifying their effects (or side effects). Surely you see the dilemma. FWIW, if each row was handled in a subtransaction then an insert that turned out to be an update could be rolled back... but the performance impact of going that route might be pretty horrid. :( There's also the potential to get stuck in a loop where a BEFORE INSERT trigger turns the tuple into an UPDATE and a BEFORE UPDATE trigger turns it into an INSERT. Perhaps you need an UPSERT trigger? I would think that a BEFORE UPSERT trigger would very likely want to know whether we were inserting or updating, which basically puts us back where we started. That said, since the use case for UPSERT is different than both INSERT and UPDATE maybe it would be a good idea to have a separate trigger for them anyway. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! 
http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
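[Editor's note] The per-row subtransaction approach Jim mentions (and its cost) can be sketched mechanically. The example below uses SQLite savepoints via Python's sqlite3 module purely to illustrate the control flow — attempt the insert in a subtransaction, roll it back on a unique violation, then retry as an update. It is not concurrency-safe (a competing session can still insert between the rollback and the update), which is part of why the thread treats this as an unattractive fallback:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions manually
conn.execute("CREATE TABLE t (key INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'old')")

def upsert(key, value):
    conn.execute("SAVEPOINT row_attempt")  # per-row "subtransaction"
    try:
        conn.execute("INSERT INTO t VALUES (?, ?)", (key, value))
        conn.execute("RELEASE row_attempt")
    except sqlite3.IntegrityError:
        # The insert turned out to be an update: undo anything done inside
        # the subtransaction, then retry the row as an UPDATE.
        conn.execute("ROLLBACK TO row_attempt")
        conn.execute("RELEASE row_attempt")
        conn.execute("UPDATE t SET value = ? WHERE key = ?", (value, key))

upsert(1, "new")    # conflicts, becomes an UPDATE
upsert(2, "fresh")  # no conflict, plain INSERT
rows = conn.execute("SELECT key, value FROM t ORDER BY key").fetchall()
print(rows)  # [(1, 'new'), (2, 'fresh')]
```

Even setting aside the race, each conflicting row pays for a failed insert plus a subtransaction rollback, which is the "pretty horrid" overhead being alluded to.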
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
Peter Geoghegan p...@heroku.com wrote: On Fri, Oct 10, 2014 at 12:09 PM, Robert Haas robertmh...@gmail.com wrote: I think what's realistic here is that the patch isn't going to get committed the way you have it today, because nobody else likes that design. That may be harsh, but I think it's accurate. I do think that's harsh, because it's unnecessary: I am looking for the best design. If you want to propose alternatives, great, but there is a reason why I've done things that way, and that should be acknowledged. I too think naming the unique index is ugly as sin, and have said as much multiple times. We're almost on the same page here. I hope you can adjust to the feedback, because it would be disappointing to have this feature slip from the release, which is what will happen if the index name remains part of the syntax. Would you be okay with this never working with partial unique indexes? That gives me something to work with. That seems like the only sensible course, to me. If you want to allow this to work with expression indexes, that's not really a problem; you can let the user write INSERT .. ON CONFLICT (upper(foo)) UPDATE ... and match that to the index. It wouldn't bother me if the first version of this feature couldn't target expression indexes anyway, but if you want to include that, having the user mention the actual expression rather than the index name shouldn't make much difference. I'm not that worried about expression indexes, actually. I'm mostly worried about partial unique indexes, particularly when before insert row-level triggers are in play (making *that* play nice with expression indexes is harder still, but expression indexes on their own are probably not that much of a problem). There is a problem if any column of the index allows nulls, since that would allow multiple rows in the target table to match an argument. Proving that an expression cannot produce nulls could be tricky. 
I suggest that, at least for a first cut, we restrict this to matching an index on NOT NULL columns. Also, how about making the SET clause optional, with the semantics that we just update all of the fields for which a value is explicitly specified: INSERT INTO overwrite_with_abandon (key, value) VALUES (42, 'meaning of life') ON DUPLICATE (key) UPDATE; Your syntax allows the exact same thing; it simply require the user to be more verbose in order to get that behavior. If we think that people wanting that behavior will be rare, then it's fine to require them to spell it out when they want it. If we think it will be the overwhelming common application of this feature, and I do, then making people spell it out when we could just as well infer it is pointless. +1 Did you consider my example? I think that people will like this idea, too - that clearly isn't the only consideration, though. As you say, it would be very easy to implement this. However, IMV, we shouldn't, because it is hazardous. To quote the cited case: 1. Developer writes the query, and it works fine. 2. Some time later, the DBA adds an inserted_at column (those are common). The DBA is not aware of the existence of this particular query. The new column has a default value of now(), say. Should we UPDATE the inserted_at column to be NULL? Or (more plausibly) the default value filled in by the INSERT? Or leave it be? I think there is a case to be made for all of these behaviors, and that tension makes me prefer to not do this at all. It's like encouraging SELECT * queries in production, only worse. That does point out the problem, in general, with carrying values from a BEFORE INSERT trigger into an UPDATE. Perhaps if the INSERT fails the UPDATE phase should start with the values specified in the first place, and not try to use anything returned by the BEFORE INSERT triggers? 
-- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 10, 2014 at 2:17 PM, Jim Nasby jim.na...@bluetreble.com wrote: That said, since the use case for UPSERT is different than both INSERT and UPDATE maybe it would be a good idea to have a separate trigger for them anyway. -1. I think that having a separate UPSERT trigger is a bad idea. I'll need to change the statement-level behavior to match what Kevin described (fires twice, always, when INSERT ON CONFLICT UPDATE is used). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 10, 2014 at 2:16 PM, Kevin Grittner kgri...@ymail.com wrote: Would you be okay with this never working with partial unique indexes? That gives me something to work with. That seems like the only sensible course, to me. Okay, great. If that's the consensus, then that's all I need to know. I'm not that worried about expression indexes, actually. I'm mostly worried about partial unique indexes, particularly when before insert row-level triggers are in play There is a problem if any column of the index allows nulls, since that would allow multiple rows in the target table to match an argument. Proving that an expression does not could be tricky. I suggest that, at least for a first cut, we restrict this to matching an index on NOT NULL columns. Hmm. That could definitely be a problem. But the problem is clearly the user's fault. It could allow multiple rows to match, as you say - but you're talking about a different concept of matching. Unlike with certain other systems, that isn't a would-be unique violation, and so it isn't a match in the sense I intend, and so you insert. You don't match more than one row, because (in the sense peculiar to this feature) there are no matches, and insertion really is the correct outcome. As a user, you have no more right to be surprised by this than by the fact that you wouldn't see a dup violation with a similar vanilla INSERT - some people are surprised by that, with regularity. Maybe I need to emphasize the distinction in the documentation between the two slightly different ideas of matching that are in tension here. I think it would be a mistake to forcibly normalize things to remove that distinction (by adding an artificial restriction), at any rate. Did you consider my example? I think that people will like this idea, too - that clearly isn't the only consideration, though. As you say, it would be very easy to implement this. However, IMV, we shouldn't, because it is hazardous. To quote the cited case: 1. 
Developer writes the query, and it works fine. 2. Some time later, the DBA adds an inserted_at column (those are common). The DBA is not aware of the existence of this particular query. The new column has a default value of now(), say. Should we UPDATE the inserted_at column to be NULL? Or (more plausibly) the default value filled in by the INSERT? Or leave it be? I think there is a case to be made for all of these behaviors, and that tension makes me prefer to not do this at all. It's like encouraging SELECT * queries in production, only worse. That does point out the problem, in general, with carrying values from a BEFORE INSERT trigger into an UPDATE. Perhaps if the INSERT fails the UPDATE phase should start with the values specified in the first place, and not try to use anything returned by the BEFORE INSERT triggers? I think that that behavior is far more problematic for numerous other reasons, though. Potentially, you're not seeing what *really* caused the conflict at all. I just don't think it's worth it to save a bit of typing. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Materialized views don't show up in information_schema
Hi, I've been testing out some of the new materialized view functionality in 9.4 and noticed that they don't show up in the information_schema data dictionary views, specifically information_schema.tables or information_schema.views (sample output is below and it's the same for 9.3 as well). Is this on purpose or just an oversight? I think they should at least be in the information_schema.views view, though the case could be made that they should be in information_schema.tables as well (as they can have additional indexes). If they're added to the tables view then they should have a separate table_type (say, MATERIALIZED VIEW), similar to how foreign tables show up. Looking at the view definitions, it looks like it's just a matter of adding 'm' values to the IN clauses that filter pg_class.relkind. It doesn't look like they're used internally by anything other than tests. Only client apps that query the data dictionary views (ex: GUI clients or dynamic SQL generators) would be impacted. I'd argue it's for the better as they can now see those objects, even if they think they're regular tables or views. If this sounds fine I can put together a patch for this. Regards, -- Sehrope Sarkuni

postgres@vagrant-ubuntu-trusty-64:~$ psql test
psql (9.4beta3)
Type "help" for help.
test=# CREATE TABLE some_table (x text);
CREATE TABLE
test=# CREATE MATERIALIZED VIEW some_mview AS SELECT * FROM some_table;
SELECT 0
test=# CREATE VIEW some_view AS SELECT * FROM some_table;
CREATE VIEW
test=# \d some_mview
Materialized view "public.some_mview"
 Column | Type | Modifiers
--------+------+-----------
 x      | text |

test=# SELECT table_name, table_type FROM information_schema.tables WHERE table_schema = 'public';
 table_name | table_type
------------+------------
 some_table | BASE TABLE
 some_view  | VIEW
(2 rows)

test=# SELECT table_name, view_definition FROM information_schema.views WHERE table_schema = 'public';
 table_name |   view_definition
------------+----------------------
 some_view  |  SELECT some_table.x+
            |    FROM some_table;
(1 row)

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views don't show up in information_schema
Sehrope Sarkuni sehr...@jackdb.com writes: I've been testing out some of the new materialized view functionality in 9.4 and noticed that they don't show up in the information_schema data dictionary views, specifically information_schema.tables or information_schema.views (sample output is below and it's the same for 9.3 as well). Is this on purpose or just an oversight? It's on purpose. The information_schema can only show objects that exist in the SQL standard. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views don't show up in information_schema
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Sehrope Sarkuni sehr...@jackdb.com writes:
> > I've been testing out some of the new materialized view functionality in 9.4 and noticed that they don't show up in the information_schema data dictionary views, specifically information_schema.tables or information_schema.views (sample output is below and it's the same for 9.3 as well). Is this on purpose or just an oversight?
>
> It's on purpose. The information_schema can only show objects that exist in the SQL standard.

I'm not particularly thrilled with this answer. I'd argue that the 'materialized' part of mat views isn't relevant to the standard, which does not concern itself with such performance-oriented considerations, and therefore, to the standard's view (pun rather intended), they're views.

Thanks,

Stephen
Re: [HACKERS] orangutan seizes up during isolation-check
On Mon, Sep 15, 2014 at 12:51:14AM -0400, Noah Misch wrote:
> 1. Fork, call setlocale(LC_x, "") in the child, pass back the effective locale name through a pipe, and pass that name to setlocale() in the original process. The short-lived child will get the extra threads, and the postmaster will remain clean.

Here's an implementation thereof covering both backend and frontend use of setlocale(). A setlocale() wrapper, pg_setlocale(), injects use of the child process where necessary. I placed the new function in src/common/exec.c, because every executable already needed that file's code and translatable strings for set_pglocale_pgservice() and callees. (Some executables do miss exec.c translations; I did not update them.) src/port/chklocale.c was closer in subject matter, but libpq imports it; this function does not belong in libpq. Also, this function would benefit from a frontend ereport() implementation, which is likely to land in libpgcommon if it lands at all.

The runtime changes are conditional on __darwin__ but not on --enable-nls. NLS builds are the production norm; I'd like non-NLS builds to consistently exercise this code rather than shave its bytes and cycles.

pg_setlocale() relies on main() setting all six LC_category environment variables. The second attached patch seals the cracks in main()'s longstanding attempt to do so. This improves preexisting user-visible behavior in weird cases: LANG=pt_BR.utf8 LC_ALL=invalid postgres -D nosuch will now print untranslated text, not pt_BR text.

Documentation for each of lc_monetary, lc_numeric and lc_time says "If this variable is set to the empty string (which is the default) then the value is inherited from the execution environment of the server in a system-dependent way." Not so; by design, setlocale(LC_x, "") just re-selects the last value L passed to pg_perm_setlocale(LC_x, L). I have not touched this; I mention it for the sake of the archives.
commit cd27ff7 (HEAD)
Author: Noah Misch n...@leadboat.com
AuthorDate: Fri Oct 10 04:04:03 2014 -0400
Commit: Noah Misch n...@leadboat.com
CommitDate: Fri Oct 10 04:04:03 2014 -0400

When setlocale(LC_x, "") will start a thread, run it in a child process.

Only Darwin, --enable-nls builds use a setlocale() that can start a thread. Buildfarm member orangutan experienced BackendList corruption on account of different postmaster threads executing signal handlers simultaneously. Furthermore, a multithreaded postmaster risks undefined behavior from sigprocmask() and fork(). Introduce pg_setlocale(), a wrapper around setlocale(). When the corresponding raw setlocale() call could start a thread, it forks, retrieves the setlocale() return value from the child, and uses that value to make an equivalent setlocale() call in the original process. The short-lived child does become multithreaded, but it uses no feature for which that poses a problem. Use of pg_setlocale() in the frontend protects forking executables, such as pg_dump, from undefined behavior.

As the usage guideline comment implies, whether to call setlocale() or pg_setlocale() is a mere style question for most code sites. Opt for pg_setlocale() at indifferent code sites, leaving raw setlocale() calls in src/interfaces, in pg_setlocale() itself, and in callees of pg_setlocale().

Back-patch to 9.0 (all supported versions).
diff --git a/configure b/configure
index f0580ce..ee08963 100755
--- a/configure
+++ b/configure
@@ -11298,7 +11298,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-for ac_func in cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat readlink setproctitle setsid shm_open sigprocmask symlink sync_file_range towlower utime utimes wcstombs wcstombs_l
+for ac_func in cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_np readlink setproctitle setsid shm_open sigprocmask symlink sync_file_range towlower utime utimes wcstombs wcstombs_l
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
   ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 527b076..2f8adfc 100644
--- a/configure.in
+++ b/configure.in
@@ -1257,7 +1257,7 @@ PGAC_FUNC_GETTIMEOFDAY_1ARG
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-AC_CHECK_FUNCS([cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat readlink setproctitle setsid shm_open sigprocmask symlink sync_file_range towlower utime utimes wcstombs wcstombs_l])
+AC_CHECK_FUNCS([cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_np readlink setproctitle setsid shm_open sigprocmask symlink sync_file_range towlower utime utimes wcstombs wcstombs_l])
[HACKERS] B-Tree index builds, CLUSTER, and sortsupport
Both Robert and Heikki expressed dissatisfaction with the fact that B-Tree index builds don't use sortsupport. Because B-Tree index builds cannot really use the onlyKey optimization, the historic oversight of not supporting B-Tree builds (and CLUSTER) might have been at least a little understandable [1]. But with the recent work on sortsupport for text, it's likely that B-Tree index builds will be missing out on rather a lot by not taking advantage of sortsupport.

Attached patch modifies tuplesort to correct this oversight. What's really nice about it is that there is actually a net negative code footprint:

 src/backend/access/nbtree/nbtsort.c  |  63 +++---
 src/backend/utils/sort/sortsupport.c |  77 ++--
 src/backend/utils/sort/tuplesort.c   | 274 +++
 src/include/utils/sortsupport.h      |   3 +
 4 files changed, 203 insertions(+), 214 deletions(-)

I've been able to throw out a lot of code, including two virtually identical inlinable comparators (that have logic for NULL-wise comparisons). This patch pretty much justifies itself as a refactoring exercise, without performance improvements entering into it, which is nice. I haven't benchmarked it yet, but it seems reasonable to assume that much the same benefits will be seen as were seen for the MinimalTuple case (for multiple-attribute sorts, that similarly cannot use the onlyKey optimization).

With this patch, all callers of _bt_mkscankey_nodata() now use sortsupport. This raises the question: Why not just have _bt_mkscankey_nodata() do it all for us? I think that continuing to have B-Tree provide a scanKey, and working off of that, is a better abstraction. It would save a few cycles to have the index_getprocinfo() call currently within _bt_mkscankey_nodata() look for BTSORTSUPPORT_PROC rather than BTORDER_PROC and be done with it, but I'd call that a modularity violation.
Tuplesort decides a strategy (BTGreaterStrategyNumber or BTLessStrategyNumber) based on the scanKey B-Tree private flag SK_BT_DESC, and it's appropriate for that to live in tuplesort's head if possible, because tuplesort/sortsupport tracks sort direction in a fairly intimate way. Besides, I think that sortsupport already has enough clients for what it is. I imagine it makes sense to directly access a sortsupport-installed comparator through a scanKey, for actual index scans (think abbreviated keys in internal B-Tree pages), but I still want uniformity with the other cases within tuplesort (i.e. the MinimalTuple and Datum cases), so I'm not about to change scanKeys to have a comparator that doesn't need a fmgr trampoline for the benefit of this patch. Note that I don't even store a scanKey within Tuplesortstate anymore. That uniformity is what allowed me to throw out the per-tuplesort-case reversedirection() logic, and use generic reversedirection() logic instead (as anticipated by current comments within Tuplesortstate). Thoughts?
[1] http://www.postgresql.org/message-id/cam3swzqlg8nx2yeb+67xx0gig+-folfbtqjkg+jl15_zhi1...@mail.gmail.com
--
Peter Geoghegan

diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
new file mode 100644
index e8a89d2..a6c44a7
*** a/src/backend/access/nbtree/nbtsort.c
--- b/src/backend/access/nbtree/nbtsort.c
*** _bt_load(BTWriteState *wstate, BTSpool *
*** 684,689 ****
--- 684,690 ----
  	int			i,
  				keysz = RelationGetNumberOfAttributes(wstate->index);
  	ScanKey		indexScanKey = NULL;
+ 	SortSupport sortKeys;
  
  	if (merge)
  	{
*** _bt_load(BTWriteState *wstate, BTSpool *
*** 699,704 ****
--- 700,730 ----
  							true, &should_free2);
  		indexScanKey = _bt_mkscankey_nodata(wstate->index);
  
+ 		/* Prepare SortSupport data for each column */
+ 		sortKeys = (SortSupport) palloc0(keysz * sizeof(SortSupportData));
+ 
+ 		for (i = 0; i < keysz; i++)
+ 		{
+ 			SortSupport sortKey = sortKeys + i;
+ 			ScanKey		scanKey = indexScanKey + i;
+ 			int16		strategy;
+ 
+ 			sortKey->ssup_cxt = CurrentMemoryContext;
+ 			sortKey->ssup_collation = scanKey->sk_collation;
+ 			sortKey->ssup_nulls_first =
+ 				(scanKey->sk_flags & SK_BT_NULLS_FIRST) != 0;
+ 			sortKey->ssup_attno = scanKey->sk_attno;
+ 
+ 			AssertState(sortKey->ssup_attno != 0);
+ 
+ 			strategy = (scanKey->sk_flags & SK_BT_DESC) != 0 ?
+ 				BTGreaterStrategyNumber : BTLessStrategyNumber;
+ 
+ 			PrepareSortSupportFromIndexRel(wstate->index, strategy, sortKey);
+ 		}
+ 
+ 		_bt_freeskey(indexScanKey);
+ 
  	for (;;)
  	{
  		load1 = true;		/* load BTSpool next ? */
*** _bt_load(BTWriteState *wstate, BTSpool *
*** 711,753 ****
  		{
  			for (i = 1; i <= keysz; i++)
  			{
! 				ScanKey		entry;
  				Datum		attrDatum1,
  							attrDatum2;
  				bool		isNull1,
  							isNull2;
  				int32		compare;
  
! 				entry = indexScanKey + i - 1;
  				attrDatum1 = index_getattr(itup, i, tupdes, &isNull1);
  				attrDatum2 = index_getattr(itup2, i, tupdes, &isNull2);
- 				if (isNull1)
- 				{
Re: [HACKERS] Materialized views don't show up in information_schema
On 10/10/14 6:53 PM, Stephen Frost wrote:
> I'm not particularly thrilled with this answer. I'd argue that the 'materialized' part of mat views isn't relevant to the standard, which does not concern itself with such performance-oriented considerations, and therefore, to the standard's view (pun rather intended), they're views.

For example, you can't drop a materialized view with DROP VIEW. So any tool that offers a list of views to manipulate based on the information schema would be confused. This is different from temporary views, for example.
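[Editor's note] The DROP VIEW point can be demonstrated in the same session shown earlier in the thread. The error and hint text below is a sketch of what current PostgreSQL releases report; exact wording may vary by version.

```sql
-- A tool that saw some_mview in information_schema.views and
-- generated DROP VIEW for it would fail:
DROP VIEW some_mview;
-- ERROR:  "some_mview" is not a view
-- HINT:  Use DROP MATERIALIZED VIEW to remove a materialized view.

DROP MATERIALIZED VIEW some_mview;  -- the form that succeeds
```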
Re: [HACKERS] Materialized views don't show up in information_schema
* Peter Eisentraut (pete...@gmx.net) wrote:
> On 10/10/14 6:53 PM, Stephen Frost wrote:
> > I'm not particularly thrilled with this answer. I'd argue that the 'materialized' part of mat views isn't relevant to the standard, which does not concern itself with such performance-oriented considerations, and therefore, to the standard's view (pun rather intended), they're views.
>
> For example, you can't drop a materialized view with DROP VIEW. So any tool that offers a list of views to manipulate based on the information schema would be confused. This is different from temporary views, for example.

And users will be confused when using a tool which doesn't see mat views, which is what started this thread. Overall, I'm inclined to view materialized views as a subset of views, which would mean that we'd perhaps add the ability to drop them with 'drop view'.

As a comparison, what about unlogged tables? They're not normal tables and they aren't defined by the SQL standard either.

Thanks,

Stephen
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Fri, Oct 10, 2014 at 8:11 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-10 17:18:46 +0530, Amit Kapila wrote: On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund and...@2ndquadrant.com wrote:

Observations
--
a. The patch performs really well (increase up to ~40%) in case all the data fits in shared buffers (scale factor -100).
b. In case data doesn't fit in shared buffers, but fits in RAM (scale factor -3000), there is a performance increase up to 16 client count, however after that it starts dipping (in above config up to ~4.4%).

Hm. Interesting. I don't see that dip on x86.

Is it possible that the implementation of some atomic operation is costlier for a particular architecture?

Yes, sure. And IIRC POWER improved atomics performance considerably for POWER8...

I have tried again for scale factor 3000 and could see the dip, and this time I have even tried with 175 client count; the dip is approximately 5%, which is slightly more than at 160 client count.

FWIW, the profile always looks like:

For my tests on Power8, the profile looks somewhat similar to the below profile mentioned by you, please see this mail:
http://www.postgresql.org/message-id/caa4ek1je9zblhsfiavhd18gdwxux21zfqpjgq_dz_zoa35n...@mail.gmail.com
However on Power7, the profile looks different, which I have posted in the above thread.

BTW, that profile *clearly* indicates we should make StrategyGetBuffer() smarter.

Yeah, even the bgreclaimer patch is able to achieve the same, however after that the contention moves somewhere else, as you can see in the above link.

Here it goes..

Lwlock_contention patches - client_count=128
--
+ 7.95% postgres postgres [.] GetSnapshotData
+ 3.58% postgres postgres [.] AllocSetAlloc
+ 2.51% postgres postgres [.] _bt_compare
+ 2.44% postgres postgres [.] hash_search_with_hash_value
+ 2.33% postgres [kernel.kallsyms] [k] .__copy_tofrom_user
+ 2.24% postgres postgres [.] AllocSetFreeIndex
+ 1.75% postgres postgres [.] pg_atomic_fetch_add_u32_impl

Uh. Huh? Normally that'll be inline.
That's compiled with gcc? What were the compiler settings you used?

Nothing specific, for performance tests where I have to take profiles I use below:
./configure --prefix=installation_path CFLAGS=-fno-omit-frame-pointer
make

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-11 06:18:11 +0530, Amit Kapila wrote: On Fri, Oct 10, 2014 at 8:11 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-10 17:18:46 +0530, Amit Kapila wrote: On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund and...@2ndquadrant.com wrote:

Observations
--
a. The patch performs really well (increase up to ~40%) in case all the data fits in shared buffers (scale factor -100).
b. In case data doesn't fit in shared buffers, but fits in RAM (scale factor -3000), there is a performance increase up to 16 client count, however after that it starts dipping (in above config up to ~4.4%).

Hm. Interesting. I don't see that dip on x86.

Is it possible that the implementation of some atomic operation is costlier for a particular architecture?

Yes, sure. And IIRC POWER improved atomics performance considerably for POWER8...

I have tried again for scale factor 3000 and could see the dip, and this time I have even tried with 175 client count; the dip is approximately 5%, which is slightly more than at 160 client count.
I've run some short tests on hydra:

scale 1000:

base:
4GB:
tps = 296273.004800 (including connections establishing)
tps = 296373.978100 (excluding connections establishing)
8GB:
tps = 338001.455970 (including connections establishing)
tps = 338177.439106 (excluding connections establishing)

base + freelist:
4GB:
tps = 297057.523528 (including connections establishing)
tps = 297156.987418 (excluding connections establishing)
8GB:
tps = 335123.867097 (including connections establishing)
tps = 335239.122472 (excluding connections establishing)

base + LW_SHARED:
4GB:
tps = 296262.164455 (including connections establishing)
tps = 296357.524819 (excluding connections establishing)
8GB:
tps = 336988.744742 (including connections establishing)
tps = 337097.836395 (excluding connections establishing)

base + LW_SHARED + freelist:
4GB:
tps = 296887.981743 (including connections establishing)
tps = 296980.231853 (excluding connections establishing)
8GB:
tps = 345049.062898 (including connections establishing)
tps = 345161.947055 (excluding connections establishing)

I've also run some preliminary tests using scale=3000 - and I couldn't see a performance difference either. Note that all these are noticeably faster than your results.

Lwlock_contention patches - client_count=128
--
+ 7.95% postgres postgres [.] GetSnapshotData
+ 3.58% postgres postgres [.] AllocSetAlloc
+ 2.51% postgres postgres [.] _bt_compare
+ 2.44% postgres postgres [.] hash_search_with_hash_value
+ 2.33% postgres [kernel.kallsyms] [k] .__copy_tofrom_user
+ 2.24% postgres postgres [.] AllocSetFreeIndex
+ 1.75% postgres postgres [.] pg_atomic_fetch_add_u32_impl

Uh. Huh? Normally that'll be inline.

That's compiled with gcc? What were the compiler settings you used?

Nothing specific, for performance tests where I have to take profiles I use below:
./configure --prefix=installation_path CFLAGS=-fno-omit-frame-pointer
make

Hah. Doing so overwrites the CFLAGS configure normally sets.
Check:

# CFLAGS are selected so:
# If the user specifies something in the environment, that is used.
# else: If the template file set something, that is used.
# else: If coverage was enabled, don't set anything.
# else: If the compiler is GCC, then we use -O2.
# else: If the compiler is something else, then we use -O, unless debugging.

so, if you do like above, you're compiling without optimizations... So, include at least -O2 as well.

Greetings,

Andres Freund

--
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
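[Editor's note] Per the selection rules quoted above, overriding CFLAGS suppresses configure's default -O2, so an optimized profiling build needs it stated explicitly. installation_path is a placeholder, as in the quoted commands.

```shell
# Keep -O2 when adding extra flags for profiling, otherwise the build
# is unoptimized and profiles/benchmarks are misleading.
./configure --prefix=installation_path CFLAGS="-O2 -fno-omit-frame-pointer"
make
```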
Re: [HACKERS] orangutan seizes up during isolation-check
On 10/10/14 8:24 PM, Noah Misch wrote:
> Here's an implementation thereof covering both backend and frontend use of setlocale(). A setlocale() wrapper, pg_setlocale(), injects use of the child process where necessary.

Would such a change not be better in gnulib itself?
Re: [HACKERS] Materialized views don't show up in information_schema
Another example is an ETL tool or ORM that queries the data dictionary to generate SQL or object models. If mviews show up in the regular information_schema views (specifically the .columns view) then it'll just work as if it's a regular table or view (ex: like foreign tables). If not, it's possible to use the internal pg_ catalog tables to get the column details, but it'd require PG-specific coding. In our case I already plan on using the internal catalog tables to display mview metadata. I brought this up thinking about existing apps and tools being able to work with mviews without additional code changes.

Regards,
-- Sehrope Sarkuni
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Sat, Oct 11, 2014 at 6:29 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-11 06:18:11 +0530, Amit Kapila wrote: I've run some short tests on hydra: scale 1000: base: 4GB: tps = 296273.004800 (including connections establishing) tps = 296373.978100 (excluding connections establishing) 8GB: tps = 338001.455970 (including connections establishing) tps = 338177.439106 (excluding connections establishing) base + freelist: 4GB: tps = 297057.523528 (including connections establishing) tps = 297156.987418 (excluding connections establishing) 8GB: tps = 335123.867097 (including connections establishing) tps = 335239.122472 (excluding connections establishing) base + LW_SHARED: 4GB: tps = 296262.164455 (including connections establishing) tps = 296357.524819 (excluding connections establishing) 8GB: tps = 336988.744742 (including connections establishing) tps = 337097.836395 (excluding connections establishing) base + LW_SHARED + freelist: 4GB: tps = 296887.981743 (including connections establishing) tps = 296980.231853 (excluding connections establishing) 8GB: tps = 345049.062898 (including connections establishing) tps = 345161.947055 (excluding connections establishing) I've also run some preliminary tests using scale=3000 - and I couldn't see a performance difference either. Note that all these are noticeably faster than your results. What is the client count? Could you please post numbers you are getting for 3000 scale factor for client count 128 and 175? Nothing specific, for performance tests where I have to take profiles I use below: ./configure --prefix=installation_path CFLAGS=-fno-omit-frame-pointer make Hah. Doing so overwrites the CFLAGS configure normally sets. Check # CFLAGS are selected so: # If the user specifies something in the environment, that is used. # else: If the template file set something, that is used. # else: If coverage was enabled, don't set anything. # else: If the compiler is GCC, then we use -O2. 
# else: If the compiler is something else, then we use -O, unless debugging. so, if you do like above, you're compiling without optimizations... So, include at least -O2 as well. Hmm. okay, but is this required when we do actual performance tests, because for that currently I don't use CFLAGS. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Materialized views don't show up in information_schema
On 10/10/14 8:44 PM, Stephen Frost wrote:
> As a comparison, what about unlogged tables? They're not normal tables and they aren't defined by the SQL standard either.

They are normal tables when considered within the scope of the SQL standard. The only difference to normal tables is their crash recovery behavior, which is outside the scope of the SQL standard.
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On 2014-10-11 06:49:54 +0530, Amit Kapila wrote: On Sat, Oct 11, 2014 at 6:29 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-11 06:18:11 +0530, Amit Kapila wrote: I've run some short tests on hydra: scale 1000: base: 4GB: tps = 296273.004800 (including connections establishing) tps = 296373.978100 (excluding connections establishing) 8GB: tps = 338001.455970 (including connections establishing) tps = 338177.439106 (excluding connections establishing) base + freelist: 4GB: tps = 297057.523528 (including connections establishing) tps = 297156.987418 (excluding connections establishing) 8GB: tps = 335123.867097 (including connections establishing) tps = 335239.122472 (excluding connections establishing) base + LW_SHARED: 4GB: tps = 296262.164455 (including connections establishing) tps = 296357.524819 (excluding connections establishing) 8GB: tps = 336988.744742 (including connections establishing) tps = 337097.836395 (excluding connections establishing) base + LW_SHARED + freelist: 4GB: tps = 296887.981743 (including connections establishing) tps = 296980.231853 (excluding connections establishing) 8GB: tps = 345049.062898 (including connections establishing) tps = 345161.947055 (excluding connections establishing) I've also run some preliminary tests using scale=3000 - and I couldn't see a performance difference either. Note that all these are noticeably faster than your results. What is the client count? 160, because that was the one you reported the biggest regression. Could you please post numbers you are getting for 3000 scale factor for client count 128 and 175? Yes, although not tonight Also from hydra? Nothing specific, for performance tests where I have to take profiles I use below: ./configure --prefix=installation_path CFLAGS=-fno-omit-frame-pointer make Hah. Doing so overwrites the CFLAGS configure normally sets. Check # CFLAGS are selected so: # If the user specifies something in the environment, that is used. 
# else: If the template file set something, that is used. # else: If coverage was enabled, don't set anything. # else: If the compiler is GCC, then we use -O2. # else: If the compiler is something else, then we use -O, unless debugging. so, if you do like above, you're compiling without optimizations... So, include at least -O2 as well. Hmm. okay, but is this required when we do actual performance tests, because for that currently I don't use CFLAGS. I'm not sure what you mean? You need to include -O2 in CFLAGS whenever you override it. Your profile was clearly without inlining... And since your general performance numbers are a fair bit lower than what I see with, hopefully, the same code on the same machine... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9
On Sat, Oct 11, 2014 at 7:00 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-11 06:49:54 +0530, Amit Kapila wrote: On Sat, Oct 11, 2014 at 6:29 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-10-11 06:18:11 +0530, Amit Kapila wrote: I've run some short tests on hydra:

Could you please post numbers you are getting for 3000 scale factor for client count 128 and 175?

Yes, although not tonight.

No issues, whenever you get it.

Also from hydra?

Yes. One more thing I would like to share with you is that while doing tests, there are some other settings changed in postgresql.conf:
maintenance_work_mem = 1GB
synchronous_commit = off
wal_writer_delay = 20ms
checkpoint_segments = 256
checkpoint_timeout = 35min
I don't think these parameters matter for the tests we are doing, but still I thought it is good to share, because I forgot to send some of these non-default settings in the previous mail.

Nothing specific, for performance tests where I have to take profiles I use below:
./configure --prefix=installation_path CFLAGS=-fno-omit-frame-pointer
make

Hah. Doing so overwrites the CFLAGS configure normally sets.

Check:
# CFLAGS are selected so:
# If the user specifies something in the environment, that is used.
# else: If the template file set something, that is used.
# else: If coverage was enabled, don't set anything.
# else: If the compiler is GCC, then we use -O2.
# else: If the compiler is something else, then we use -O, unless debugging.

so, if you do like above, you're compiling without optimizations... So, include at least -O2 as well.

Hmm. okay, but is this required when we do actual performance tests, because for that currently I don't use CFLAGS.

I'm not sure what you mean? You need to include -O2 in CFLAGS whenever you override it.

okay, that's what I wanted to ask you, so that we should not see different numbers due to the way the code is built.
When I do performance tests where I don't want to see a profile, I use the below statement:
./configure --prefix=installation_path

And since your general performance numbers are a fair bit lower than what I see with, hopefully, the same code on the same machine...

You have reported numbers at 1000 scale factor and mine were at 3000 scale factor, so I think the difference is expected.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] orangutan seizes up during isolation-check
On Fri, Oct 10, 2014 at 09:14:38PM -0400, Peter Eisentraut wrote:
> On 10/10/14 8:24 PM, Noah Misch wrote:
> > Here's an implementation thereof covering both backend and frontend use of setlocale(). A setlocale() wrapper, pg_setlocale(), injects use of the child process where necessary.
>
> Would such a change not be better in gnulib itself?

Good question. It would be nice to make the change there, for the benefit of other consumers. The patch's setlocale_native_forked() assumes it never runs in a multithreaded process, but libintl_setlocale() must not assume that. I see a few ways libintl/gnulib might proceed:

1) exec() a helper program to run CFLocaleCopyCurrent(). Distributing that executable alongside libintl and locating it at runtime is daunting.

2) Audit the CFLocaleCopyCurrent() code to see if it will, in practice, run reliably in a fork of a multithreaded process. (That conclusion could change without notice.) Use the setlocale_native_forked() technique.

3) Don't change libintl_setlocale(), but add a libintl_setlocale_nothread() for single-threaded consumers to adopt. Each consumer will need to modify code. libintl has a lean API; I expect adding this would be controversial.

Among those, I would probably adopt (2). We know much about the process conditions in which pg_setlocale() will run, so placing the workaround in PostgreSQL erases the problem of (2). I'm inclined to stick with placing the workaround in PostgreSQL, but there are fair arguments on both sides.