Re: [HACKERS] orangutan seizes up during isolation-check

2014-10-10 Thread Noah Misch
On Fri, Oct 10, 2014 at 09:14:38PM -0400, Peter Eisentraut wrote:
> On 10/10/14 8:24 PM, Noah Misch wrote:
> > Here's an implementation thereof covering both backend and frontend use of
> > setlocale().  A setlocale() wrapper, pg_setlocale(), injects use of the child
> > process where necessary.
> 
> Would such a change not be better in gnulib itself?

Good question.  It would be nice to make the change there, for the benefit of
other consumers.  The patch's setlocale_native_forked() assumes it never runs
in a multithreaded process, but libintl_setlocale() must not assume that.  I
see a few ways libintl/gnulib might proceed:

1) exec() a helper program to run CFLocaleCopyCurrent().  Distributing that
   executable alongside libintl and locating it at runtime is daunting.

2) Audit the CFLocaleCopyCurrent() code to see if it will, in practice, run
   reliably in a fork of a multithreaded process.  (That conclusion could
   change without notice.)  Use the setlocale_native_forked() technique.

3) Don't change libintl_setlocale(), but add a libintl_setlocale_nothread()
   for single-threaded consumers to adopt.  Each consumer will need to modify
   code.  libintl has a lean API; I expect adding this would be controversial.

Among those, I would probably adopt (2).  We know much about the process
conditions in which pg_setlocale() will run, so placing the workaround in
PostgreSQL sidesteps the caveat noted in (2).  I'm inclined to stick with
placing the workaround in PostgreSQL, but there are fair arguments on both
sides.




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-10 Thread Amit Kapila
On Sat, Oct 11, 2014 at 7:00 AM, Andres Freund wrote:
> On 2014-10-11 06:49:54 +0530, Amit Kapila wrote:
> > On Sat, Oct 11, 2014 at 6:29 AM, Andres Freund wrote:
> > >
> > > On 2014-10-11 06:18:11 +0530, Amit Kapila wrote:
> > > I've run some short tests on hydra:
> > >
>
> > Could you please post numbers you are getting for 3000 scale
> > factor for client count 128 and 175?
>
> Yes, although not tonight.

No problem; whenever you get to it.

> Also from hydra?

Yes.  One more thing I would like to share: while doing these tests,
some other settings were changed in postgresql.conf:

maintenance_work_mem = 1GB
synchronous_commit = off
wal_writer_delay = 20ms
checkpoint_segments = 256
checkpoint_timeout = 35min

I don't think these parameters matter for the tests we are doing, but
I thought it good to share them, because I forgot to mention some of
these non-default settings in my previous mail.

> > > > Nothing specific, for performance tests where I have to take profiles
> > > > I use the following:
> > > > ./configure --prefix= CFLAGS="-fno-omit-frame-pointer"
> > > > make
> > >
> > > Hah. Doing so overwrites the CFLAGS configure normally sets. Check
> > > # CFLAGS are selected so:
> > > # If the user specifies something in the environment, that is used.
> > > # else:  If the template file set something, that is used.
> > > # else:  If coverage was enabled, don't set anything.
> > > # else:  If the compiler is GCC, then we use -O2.
> > > # else:  If the compiler is something else, then we use -O, unless debugging.
> > >
> > > so, if you do like above, you're compiling without optimizations... So,
> > > include at least -O2 as well.
> >
> > Hmm. Okay, but is this required when we do actual performance
> > tests? Currently I don't use CFLAGS for those.
>
> I'm not sure what you mean? You need to include -O2 in CFLAGS whenever
> you override it.

Okay, that's what I wanted to ask, so that we don't see different
numbers due to the way the code is built.

When I do performance tests where I don't need to take a profile,
I use the statement below:
./configure --prefix=

> And since
> your general performance numbers are a fair bit lower than what I see
> with, hopefully, the same code on the same machine...

You have reported numbers at 1000 scale factor and mine were
at 3000 scale factor, so I think the difference is expected.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-10 Thread Andres Freund
On 2014-10-11 06:49:54 +0530, Amit Kapila wrote:
> On Sat, Oct 11, 2014 at 6:29 AM, Andres Freund wrote:
> >
> > On 2014-10-11 06:18:11 +0530, Amit Kapila wrote:
> > I've run some short tests on hydra:
> >
> > scale 1000:
> >
> > base:
> > 4GB:
> > tps = 296273.004800 (including connections establishing)
> > tps = 296373.978100 (excluding connections establishing)
> >
> > 8GB:
> > tps = 338001.455970 (including connections establishing)
> > tps = 338177.439106 (excluding connections establishing)
> >
> > base + freelist:
> > 4GB:
> > tps = 297057.523528 (including connections establishing)
> > tps = 297156.987418 (excluding connections establishing)
> >
> > 8GB:
> > tps = 335123.867097 (including connections establishing)
> > tps = 335239.122472 (excluding connections establishing)
> >
> > base + LW_SHARED:
> > 4GB:
> > tps = 296262.164455 (including connections establishing)
> > tps = 296357.524819 (excluding connections establishing)
> > 8GB:
> > tps = 336988.744742 (including connections establishing)
> > tps = 337097.836395 (excluding connections establishing)
> >
> > base + LW_SHARED + freelist:
> > 4GB:
> > tps = 296887.981743 (including connections establishing)
> > tps = 296980.231853 (excluding connections establishing)
> >
> > 8GB:
> > tps = 345049.062898 (including connections establishing)
> > tps = 345161.947055 (excluding connections establishing)
> >
> > I've also run some preliminary tests using scale=3000 - and I couldn't
> > see a performance difference either.
> >
> > Note that all these are noticeably faster than your results.
> 
> What is the client count?

160, because that was the client count at which you reported the biggest regression.

> Could you please post numbers you are getting for 3000 scale
> factor for client count 128 and 175?

Yes, although not tonight. Also from hydra?

> > > Nothing specific, for performance tests where I have to take profiles
> > > I use the following:
> > > ./configure --prefix= CFLAGS="-fno-omit-frame-pointer"
> > > make
> >
> > Hah. Doing so overwrites the CFLAGS configure normally sets. Check
> > # CFLAGS are selected so:
> > # If the user specifies something in the environment, that is used.
> > # else:  If the template file set something, that is used.
> > # else:  If coverage was enabled, don't set anything.
> > # else:  If the compiler is GCC, then we use -O2.
> > # else:  If the compiler is something else, then we use -O, unless debugging.
> >
> > so, if you do like above, you're compiling without optimizations... So,
> > include at least -O2 as well.
> 
> Hmm. Okay, but is this required when we do actual performance
> tests? Currently I don't use CFLAGS for those.

I'm not sure what you mean? You need to include -O2 in CFLAGS whenever
you override it. Your profile was clearly without inlining... And since
your general performance numbers are a fair bit lower than what I see
with, hopefully, the same code on the same machine...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Materialized views don't show up in information_schema

2014-10-10 Thread Peter Eisentraut
On 10/10/14 8:44 PM, Stephen Frost wrote:
> As a comparison, what about unlogged tables?  They're not normal tables
> and they aren't defined by the SQL standard either.

They are normal tables when considered within the scope of the SQL
standard.  The only difference from normal tables is their crash-recovery
behavior, which is outside the scope of the SQL standard.
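
For example (an illustrative session, output abridged), an unlogged table
is reported like any other base table:

CREATE UNLOGGED TABLE some_unlogged (x text);
SELECT table_name, table_type
FROM information_schema.tables
WHERE table_name = 'some_unlogged';
-- table_name    | table_type
-- some_unlogged | BASE TABLE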




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-10 Thread Amit Kapila
On Sat, Oct 11, 2014 at 6:29 AM, Andres Freund wrote:
>
> On 2014-10-11 06:18:11 +0530, Amit Kapila wrote:
> I've run some short tests on hydra:
>
> scale 1000:
>
> base:
> 4GB:
> tps = 296273.004800 (including connections establishing)
> tps = 296373.978100 (excluding connections establishing)
>
> 8GB:
> tps = 338001.455970 (including connections establishing)
> tps = 338177.439106 (excluding connections establishing)
>
> base + freelist:
> 4GB:
> tps = 297057.523528 (including connections establishing)
> tps = 297156.987418 (excluding connections establishing)
>
> 8GB:
> tps = 335123.867097 (including connections establishing)
> tps = 335239.122472 (excluding connections establishing)
>
> base + LW_SHARED:
> 4GB:
> tps = 296262.164455 (including connections establishing)
> tps = 296357.524819 (excluding connections establishing)
> 8GB:
> tps = 336988.744742 (including connections establishing)
> tps = 337097.836395 (excluding connections establishing)
>
> base + LW_SHARED + freelist:
> 4GB:
> tps = 296887.981743 (including connections establishing)
> tps = 296980.231853 (excluding connections establishing)
>
> 8GB:
> tps = 345049.062898 (including connections establishing)
> tps = 345161.947055 (excluding connections establishing)
>
> I've also run some preliminary tests using scale=3000 - and I couldn't
> see a performance difference either.
>
> Note that all these are noticeably faster than your results.

What is the client count?
Could you please post numbers you are getting for 3000 scale
factor for client count 128 and 175?

> > Nothing specific, for performance tests where I have to take profiles
> > I use the following:
> > ./configure --prefix= CFLAGS="-fno-omit-frame-pointer"
> > make
>
> Hah. Doing so overwrites the CFLAGS configure normally sets. Check
> # CFLAGS are selected so:
> # If the user specifies something in the environment, that is used.
> # else:  If the template file set something, that is used.
> # else:  If coverage was enabled, don't set anything.
> # else:  If the compiler is GCC, then we use -O2.
> # else:  If the compiler is something else, then we use -O, unless debugging.
>
> so, if you do like above, you're compiling without optimizations... So,
> include at least -O2 as well.

Hmm. Okay, but is this required when we do actual performance
tests? Currently I don't use CFLAGS for those.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Materialized views don't show up in information_schema

2014-10-10 Thread Sehrope Sarkuni
Another example is an ETL tool or ORM that queries the data dictionary to 
generate SQL or object models.

If mviews show up in the regular information_schema views (specifically the
.columns view), then they'll just work as if they were regular tables or views
(e.g. like foreign tables).

If not, it's possible to use the internal pg_ catalog tables to get the column
details, but it'd require PG-specific coding.
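
Roughly, such a PG-specific query might look like this (an illustrative
sketch; the schema and relation names are assumptions):

SELECT a.attname, format_type(a.atttypid, a.atttypmod) AS data_type
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
JOIN pg_catalog.pg_attribute a ON a.attrelid = c.oid
WHERE c.relkind = 'm'            -- materialized views only
  AND n.nspname = 'public'
  AND c.relname = 'some_mview'
  AND a.attnum > 0
  AND NOT a.attisdropped
ORDER BY a.attnum;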

In our case I already plan on using the internal catalog tables to display 
mview metadata. I brought this up thinking about existing apps and tools being 
able to work with mviews without additional code changes.

Regards,
-- Sehrope Sarkuni



Re: [HACKERS] orangutan seizes up during isolation-check

2014-10-10 Thread Peter Eisentraut
On 10/10/14 8:24 PM, Noah Misch wrote:
> Here's an implementation thereof covering both backend and frontend use of
> setlocale().  A setlocale() wrapper, pg_setlocale(), injects use of the child
> process where necessary.

Would such a change not be better in gnulib itself?





Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-10 Thread Andres Freund
On 2014-10-11 06:18:11 +0530, Amit Kapila wrote:
> On Fri, Oct 10, 2014 at 8:11 PM, Andres Freund wrote:
> > On 2014-10-10 17:18:46 +0530, Amit Kapila wrote:
> > > On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund wrote:
> > > > > Observations
> > > > > --
> > > > > a. The patch performs really well (increase up to ~40%) in case all the
> > > > > data fits in shared buffers (scale factor -100).
> > > > > b. In case data doesn't fit in shared buffers, but fits in RAM
> > > > > (scale factor -3000), there is a performance increase up to 16 client
> > > > > count; however, after that it starts dipping (in the above config, up to ~4.4%).
> > > >
> > > > Hm. Interesting. I don't see that dip on x86.
> > >
> > > Is it possible that implementation of some atomic operation is costlier
> > > for a particular architecture?
> >
> > Yes, sure. And IIRC POWER improved atomics performance considerably for
> > POWER8...
> >
> > > I have tried again for scale factor 3000 and could see the dip, and this
> > > time I have even tried with 175 client count; the dip is approximately
> > > 5%, which is slightly more than at 160 client count.

I've run some short tests on hydra:

scale 1000:

base:
4GB:
tps = 296273.004800 (including connections establishing)
tps = 296373.978100 (excluding connections establishing)

8GB:
tps = 338001.455970 (including connections establishing)
tps = 338177.439106 (excluding connections establishing)

base + freelist:
4GB:
tps = 297057.523528 (including connections establishing)
tps = 297156.987418 (excluding connections establishing)

8GB:
tps = 335123.867097 (including connections establishing)
tps = 335239.122472 (excluding connections establishing)

base + LW_SHARED:
4GB:
tps = 296262.164455 (including connections establishing)
tps = 296357.524819 (excluding connections establishing)
8GB:
tps = 336988.744742 (including connections establishing)
tps = 337097.836395 (excluding connections establishing)

base + LW_SHARED + freelist:
4GB:
tps = 296887.981743 (including connections establishing)
tps = 296980.231853 (excluding connections establishing)

8GB:
tps = 345049.062898 (including connections establishing)
tps = 345161.947055 (excluding connections establishing)

I've also run some preliminary tests using scale=3000 - and I couldn't
see a performance difference either.

Note that all these are noticeably faster than your results.

> > >
> > > Lwlock_contention patches - client_count=128
> > > --
> > >
> > > +   7.95%  postgres  postgres   [.] GetSnapshotData
> > > +   3.58%  postgres  postgres   [.] AllocSetAlloc
> > > +   2.51%  postgres  postgres   [.] _bt_compare
> > > +   2.44%  postgres  postgres   [.] hash_search_with_hash_value
> > > +   2.33%  postgres  [kernel.kallsyms]  [k] .__copy_tofrom_user
> > > +   2.24%  postgres  postgres   [.] AllocSetFreeIndex
> > > +   1.75%  postgres  postgres   [.] pg_atomic_fetch_add_u32_impl
> >
> > Uh. Huh? Normally that'll be inline. That's compiled with gcc? What were
> > the compiler settings you used?
> 
> Nothing specific, for performance tests where I have to take profiles
> I use the following:
> ./configure --prefix= CFLAGS="-fno-omit-frame-pointer"
> make

Hah. Doing so overwrites the CFLAGS configure normally sets. Check
# CFLAGS are selected so:
# If the user specifies something in the environment, that is used.
# else:  If the template file set something, that is used.
# else:  If coverage was enabled, don't set anything.
# else:  If the compiler is GCC, then we use -O2.
# else:  If the compiler is something else, then we use -O, unless debugging.

so, if you do like above, you're compiling without optimizations... So,
include at least -O2 as well.
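
For example, to keep frame pointers for profiling without losing
optimization (the install prefix below is a placeholder):

./configure --prefix=/your/install/dir CFLAGS="-O2 -fno-omit-frame-pointer"
make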

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-10 Thread Amit Kapila
On Fri, Oct 10, 2014 at 8:11 PM, Andres Freund wrote:
> On 2014-10-10 17:18:46 +0530, Amit Kapila wrote:
> > On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund wrote:
> > > > Observations
> > > > --
> > > > a. The patch performs really well (increase up to ~40%) in case all the
> > > > data fits in shared buffers (scale factor -100).
> > > > b. In case data doesn't fit in shared buffers, but fits in RAM
> > > > (scale factor -3000), there is a performance increase up to 16 client
> > > > count; however, after that it starts dipping (in the above config, up to ~4.4%).
> > >
> > > Hm. Interesting. I don't see that dip on x86.
> >
> > Is it possible that implementation of some atomic operation is costlier
> > for a particular architecture?
>
> Yes, sure. And IIRC POWER improved atomics performance considerably for
> POWER8...
>
> > I have tried again for scale factor 3000 and could see the dip, and this
> > time I have even tried with 175 client count; the dip is approximately
> > 5%, which is slightly more than at 160 client count.
>
> FWIW, the profile always looks like:

For my tests on Power8, the profile looks somewhat similar to the
profile you mentioned; please see this mail:
http://www.postgresql.org/message-id/caa4ek1je9zblhsfiavhd18gdwxux21zfqpjgq_dz_zoa35n...@mail.gmail.com

However, on Power7 the profile looks different, as I have posted
earlier in this thread.

>
> BTW, that profile *clearly* indicates we should make StrategyGetBuffer()
> smarter.

Yeah, even the bgreclaimer patch is able to achieve the same; however,
after that the contention moves elsewhere, as you can see at the
above link.

> >
> > Here it goes..
> >
> > Lwlock_contention patches - client_count=128
> > --
> >
> > +   7.95%  postgres  postgres   [.] GetSnapshotData
> > +   3.58%  postgres  postgres   [.] AllocSetAlloc
> > +   2.51%  postgres  postgres   [.] _bt_compare
> > +   2.44%  postgres  postgres   [.] hash_search_with_hash_value
> > +   2.33%  postgres  [kernel.kallsyms]  [k] .__copy_tofrom_user
> > +   2.24%  postgres  postgres   [.] AllocSetFreeIndex
> > +   1.75%  postgres  postgres   [.] pg_atomic_fetch_add_u32_impl
>
> Uh. Huh? Normally that'll be inline. That's compiled with gcc? What were
> the compiler settings you used?

Nothing specific, for performance tests where I have to take profiles
I use the following:
./configure --prefix= CFLAGS="-fno-omit-frame-pointer"
make


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Materialized views don't show up in information_schema

2014-10-10 Thread Stephen Frost
* Peter Eisentraut (pete...@gmx.net) wrote:
> On 10/10/14 6:53 PM, Stephen Frost wrote:
> > I'm not particularly thrilled with this answer.  I'd argue that the
> > 'materialized' part of mat views isn't relevant to the standard, which
> > does not concern itself with such performance-oriented considerations,
> > and therefore, to the standard's view (pun rather intended), they're
> > views.
> 
> For example, you can't drop a materialized view with DROP VIEW.  So any
> tool that offers a list of views to manipulate based on the information
> schema would be confused.  This is different from temporary views, for
> example.

And users will be confused when using a tool which doesn't see mat
views, which is what started this thread.  Overall, I'm inclined to view
materialized views as a subset of views, which would mean that we'd
perhaps add the ability to drop them with 'drop view'.

As a comparison, what about unlogged tables?  They're not normal tables
and they aren't defined by the SQL standard either.

Thanks,

Stephen




Re: [HACKERS] Materialized views don't show up in information_schema

2014-10-10 Thread Peter Eisentraut
On 10/10/14 6:53 PM, Stephen Frost wrote:
> I'm not particularly thrilled with this answer.  I'd argue that the
> 'materialized' part of mat views isn't relevant to the standard, which
> does not concern itself with such performance-oriented considerations,
> and therefore, to the standard's view (pun rather intended), they're
> views.

For example, you can't drop a materialized view with DROP VIEW.  So any
tool that offers a list of views to manipulate based on the information
schema would be confused.  This is different from temporary views, for
example.




[HACKERS] B-Tree index builds, CLUSTER, and sortsupport

2014-10-10 Thread Peter Geoghegan
Both Robert and Heikki expressed dissatisfaction with the fact that
B-Tree index builds don't use sortsupport. Because B-Tree index builds
cannot really use the "onlyKey" optimization, the historic oversight
of not supporting B-Tree builds (and CLUSTER) might have been at least
a little understandable [1]. But with the recent work on sortsupport
for text, it's likely that B-Tree index builds will be missing out on
rather a lot by not taking advantage of sortsupport.

Attached patch modifies tuplesort to correct this oversight. What's
really nice about it is that there is actually a net negative code
footprint:

 src/backend/access/nbtree/nbtsort.c  |  63 +++---
 src/backend/utils/sort/sortsupport.c |  77 ++--
 src/backend/utils/sort/tuplesort.c   | 274 +++
 src/include/utils/sortsupport.h  |   3 +
 4 files changed, 203 insertions(+), 214 deletions(-)

I've been able to throw out a lot of code, including two virtually
identical inlinable comparators (that have logic for NULL-wise
comparisons). This patch pretty much justifies itself as a refactoring
exercise, without performance improvements entering into it, which is
nice. I haven't benchmarked it yet, but it seems reasonable to assume
that much the same benefits will be seen as were seen for the
MinimalTuple case (for multiple-attribute sorts, that similarly cannot
use the "onlyKey" optimization).

With this patch, all callers of _bt_mkscankey_nodata() now use
sortsupport. This raises the question: Why not just have
_bt_mkscankey_nodata() do it all for us? I think that continuing to
have B-Tree provide a scanKey, and working off of that is a better
abstraction. It would save a few cycles to have the
index_getprocinfo() call currently within _bt_mkscankey_nodata() look
for BTSORTSUPPORT_PROC rather than BTORDER_PROC and be done with it,
but I'd call that a modularity violation. Tuplesort decides a strategy
(BTGreaterStrategyNumber or BTLessStrategyNumber) based on the scanKey
B-Tree private flag SK_BT_DESC, and it's appropriate for that to live
in tuplesort's head if possible, because tuplesort/sortsupport tracks
sort "direction" in a fairly intimate way. Besides, I think that
sortsupport already has enough clients for what it is.

I imagine it makes sense to directly access a sortsupport-installed
comparator through a scanKey, for actual index scans (think
abbreviated keys in internal B-Tree pages), but I still want
uniformity with the other cases within tuplesort (i.e. the
MinimalTuple and Datum cases), so I'm not about to change scanKeys to
have a comparator that doesn't need a fmgr trampoline for the benefit
of this patch. Note that I don't even store a scanKey within
Tuplesortstate anymore. That uniformity is what allowed me to throw
out the per-tuplesort-case reversedirection() logic, and use generic
reversedirection() logic instead (as anticipated by current comments
within Tuplesortstate).

Thoughts?

[1] 
http://www.postgresql.org/message-id/cam3swzqlg8nx2yeb+67xx0gig+-folfbtqjkg+jl15_zhi1...@mail.gmail.com
-- 
Peter Geoghegan
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
new file mode 100644
index e8a89d2..a6c44a7
*** a/src/backend/access/nbtree/nbtsort.c
--- b/src/backend/access/nbtree/nbtsort.c
*** _bt_load(BTWriteState *wstate, BTSpool *
*** 684,689 
--- 684,690 
  	int			i,
  keysz = RelationGetNumberOfAttributes(wstate->index);
  	ScanKey		indexScanKey = NULL;
+ 	SortSupport sortKeys;
  
  	if (merge)
  	{
*** _bt_load(BTWriteState *wstate, BTSpool *
*** 699,704 
--- 700,730 
  		true, &should_free2);
  		indexScanKey = _bt_mkscankey_nodata(wstate->index);
  
+ 		/* Prepare SortSupport data for each column */
+ 		sortKeys = (SortSupport) palloc0(keysz * sizeof(SortSupportData));
+ 
+ 		for (i = 0; i < keysz; i++)
+ 		{
+ 			SortSupport sortKey = sortKeys + i;
+ 			ScanKey		scanKey = indexScanKey + i;
+ 			int16		strategy;
+ 
+ 			sortKey->ssup_cxt = CurrentMemoryContext;
+ 			sortKey->ssup_collation = scanKey->sk_collation;
+ 			sortKey->ssup_nulls_first =
+ (scanKey->sk_flags & SK_BT_NULLS_FIRST) != 0;
+ 			sortKey->ssup_attno = scanKey->sk_attno;
+ 
+ 			AssertState(sortKey->ssup_attno != 0);
+ 
+ 			strategy  = (scanKey->sk_flags & SK_BT_DESC) != 0 ?
+ BTGreaterStrategyNumber : BTLessStrategyNumber;
+ 
+ 			PrepareSortSupportFromIndexRel(wstate->index, strategy, sortKey);
+ 		}
+ 
+ 		_bt_freeskey(indexScanKey);
+ 
  		for (;;)
  		{
  			load1 = true;		/* load BTSpool next ? */
*** _bt_load(BTWriteState *wstate, BTSpool *
*** 711,753 
  			{
  for (i = 1; i <= keysz; i++)
  {
! 	ScanKey		entry;
  	Datum		attrDatum1,
  attrDatum2;
  	bool		isNull1,
  isNull2;
  	int32		compare;
  
! 	entry = indexScanKey + i - 1;
  	attrDatum1 = index_getattr(itup, i, tupdes, &isNull1);
  	attrDatum2 = index_getattr(itup2, i, tupdes, &isNull2);
- 	if (is

Re: [HACKERS] orangutan seizes up during isolation-check

2014-10-10 Thread Noah Misch
On Mon, Sep 15, 2014 at 12:51:14AM -0400, Noah Misch wrote:
> 1. Fork, call setlocale(LC_x, "") in the child, pass back the effective locale
>name through a pipe, and pass that name to setlocale() in the original
>process.  The short-lived child will get the extra threads, and the
>postmaster will remain clean.

Here's an implementation thereof covering both backend and frontend use of
setlocale().  A setlocale() wrapper, pg_setlocale(), injects use of the child
process where necessary.  I placed the new function in src/common/exec.c,
because every executable already needed that file's code and translatable
strings for set_pglocale_pgservice() and callees.  (Some executables do miss
exec.c translations; I did not update them.)  src/port/chklocale.c was closer
in subject matter, but libpq imports it; this function does not belong in
libpq.  Also, this function would benefit from a frontend ereport()
implementation, which is likely to land in libpgcommon if it lands at all.
The runtime changes are conditional on __darwin__ but not on --enable-nls.
NLS builds are the production norm; I'd like non-NLS builds to consistently
exercise this code rather than shave its bytes and cycles.
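
In outline, the forking technique looks like this (an illustrative sketch
only, not the patch itself; error handling and the Darwin-only guard are
elided):

#include <locale.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run setlocale(category, "") in a child; apply the result here. */
static char *
setlocale_forked_sketch(int category)
{
	static char locbuf[256];
	int			pipefd[2];
	pid_t		child;
	ssize_t		nread;

	if (pipe(pipefd) < 0)
		return NULL;
	child = fork();
	if (child < 0)
		return NULL;
	if (child == 0)
	{
		/* Child: may sprout threads here, but it exits immediately. */
		char	   *loc = setlocale(category, "");

		if (loc != NULL)
			write(pipefd[1], loc, strlen(loc));
		_exit(0);
	}
	close(pipefd[1]);
	nread = read(pipefd[0], locbuf, sizeof(locbuf) - 1);
	close(pipefd[0]);
	waitpid(child, NULL, 0);
	if (nread <= 0)
		return NULL;
	locbuf[nread] = '\0';
	/* Parent: a non-"" locale name avoids the thread-spawning path. */
	return setlocale(category, locbuf);
}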

pg_setlocale() relies on main() setting all six LC_ environment
variables.  The second attached patch seals the cracks in main()'s
longstanding attempt to do so.  This improves preexisting user-visible
behavior in weird cases: "LANG=pt_BR.utf8 LC_ALL=invalid postgres -D nosuch"
will now print untranslated text, not pt_BR text.

Documentation for each of lc_monetary, lc_numeric and lc_time says "If this
variable is set to the empty string (which is the default) then the value is
inherited from the execution environment of the server in a system-dependent
way."  Not so; by design, setlocale(LC_x, "") just re-selects the last value L
passed to pg_perm_setlocale(LC_x, L).  I have not touched this; I mention it
for the sake of the archives.
commit cd27ff7 (HEAD)
Author: Noah Misch 
AuthorDate: Fri Oct 10 04:04:03 2014 -0400
Commit: Noah Misch 
CommitDate: Fri Oct 10 04:04:03 2014 -0400

When setlocale(LC_x, "") will start a thread, run it in a child process.

Only Darwin, --enable-nls builds use a setlocale() that can start a
thread.  Buildfarm member orangutan experienced BackendList corruption
on account of different postmaster threads executing signal handlers
simultaneously.  Furthermore, a multithreaded postmaster risks undefined
behavior from sigprocmask() and fork().

Introduce pg_setlocale(), a wrapper around setlocale().  When the
corresponding raw setlocale() call could start a thread, it forks,
retrieves the setlocale() return value from the child, and uses that
non-"" locale value to make an equivalent setlocale() call in the
original process.  The short-lived child does become multithreaded, but
it uses no feature for which that poses a problem.  Use of
pg_setlocale() in the frontend protects forking executables, such as
pg_dump, from undefined behavior.  As the usage guideline comment
implies, whether to call setlocale() or pg_setlocale() is a mere style
question for most code sites.  Opt for pg_setlocale() at indifferent
code sites, leaving raw setlocale() calls in src/interfaces, in
pg_setlocale() itself, and in callees of pg_setlocale().  Back-patch to
9.0 (all supported versions).

diff --git a/configure b/configure
index f0580ce..ee08963 100755
--- a/configure
+++ b/configure
@@ -11298,7 +11298,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-for ac_func in cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat readlink setproctitle setsid shm_open sigprocmask symlink sync_file_range towlower utime utimes wcstombs wcstombs_l
+for ac_func in cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_np readlink setproctitle setsid shm_open sigprocmask symlink sync_file_range towlower utime utimes wcstombs wcstombs_l
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index 527b076..2f8adfc 100644
--- a/configure.in
+++ b/configure.in
@@ -1257,7 +1257,7 @@ PGAC_FUNC_GETTIMEOFDAY_1ARG
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-AC_CHECK_FUNCS([cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat readlink setproctitle setsid shm_open sigprocmask symlink sync_file_range towlower utime utimes wcstombs wcstombs_l])
+AC_CHECK_FUNCS([cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_np readlink setproctitle setsid shm_open sigprocmask symlink sync_file_range towlower utime utimes wcstombs wcstombs_l])
 
 AC_REPLACE_FUNCS(fseeko)
 c

Re: [HACKERS] Materialized views don't show up in information_schema

2014-10-10 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Sehrope Sarkuni  writes:
> > I've been testing out some of the new materialized view functionality
> > in 9.4 and noticed that they don't show up in the information_schema
> > data dictionary views, specifically information_schema.tables or
> > information_schema.views (sample output is below and it's the same for
> > 9.3 as well).
> 
> > Is this on purpose or just an oversight?
> 
> It's on purpose.  The information_schema can only show objects that
> exist in the SQL standard.

I'm not particularly thrilled with this answer.  I'd argue that the
'materialized' part of mat views isn't relevant to the standard, which
does not concern itself with such performance-oriented considerations,
and therefore, to the standard's view (pun rather intended), they're
views.

Thanks,

Stephen




Re: [HACKERS] Materialized views don't show up in information_schema

2014-10-10 Thread Tom Lane
Sehrope Sarkuni  writes:
> I've been testing out some of the new materialized view functionality
> in 9.4 and noticed that they don't show up in the information_schema
> data dictionary views, specifically information_schema.tables or
> information_schema.views (sample output is below and it's the same for
> 9.3 as well).

> Is this on purpose or just an oversight?

It's on purpose.  The information_schema can only show objects that
exist in the SQL standard.

regards, tom lane




[HACKERS] Materialized views don't show up in information_schema

2014-10-10 Thread Sehrope Sarkuni
Hi,

I've been testing out some of the new materialized view functionality
in 9.4 and noticed that they don't show up in the information_schema
data dictionary views, specifically information_schema.tables or
information_schema.views (sample output is below and it's the same for
9.3 as well).

Is this on purpose or just an oversight?

I think they should at least be in the information_schema.views view,
though the case could be made that they should be in
information_schema.tables as well (as they can have additional
indexes). If they're added to the tables view then they should have a
separate table_type (say, "MATERIALIZED VIEW"), similar to how foreign
tables show up.

Looking at the view definitions, it looks like it's just a matter of
adding 'm' values to the IN clauses that filter pg_class.relkind. It
doesn't look like they're used internally by anything other than
tests. Only client apps that query the data dictionary views (ex: GUI
clients or dynamic SQL generators) would be impacted. I'd argue it's
for the better, as they can now see those objects, even if
they think they're regular tables or views.
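
For instance, the relevant filters look roughly like this (an illustrative
sketch, not the actual catalog definitions):

-- current: ordinary tables, views, foreign tables
WHERE c.relkind IN ('r', 'v', 'f')
-- proposed: materialized views as well
WHERE c.relkind IN ('r', 'v', 'f', 'm')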

If this sounds fine, I can put together a patch for this.

Regards,
-- Sehrope Sarkuni


postgres@vagrant-ubuntu-trusty-64:~$ psql test
psql (9.4beta3)
Type "help" for help.

test=# CREATE TABLE some_table (x text);
CREATE TABLE

test=# CREATE MATERIALIZED VIEW some_mview AS SELECT * FROM some_table;
SELECT 0

test=# CREATE VIEW some_view AS SELECT * FROM some_table;
CREATE VIEW

test=# \d some_mview
Materialized view "public.some_mview"
 Column | Type | Modifiers
+--+---
 x  | text |

test=# SELECT table_name, table_type FROM information_schema.tables
WHERE table_schema = 'public';
 table_name | table_type
+
 some_table | BASE TABLE
 some_view  | VIEW
(2 rows)

test=# SELECT table_name, view_definition FROM
information_schema.views WHERE table_schema = 'public';
 table_name |   view_definition
+--
 some_view  |  SELECT some_table.x+
|FROM some_table;
(1 row)




Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Peter Geoghegan
On Fri, Oct 10, 2014 at 2:16 PM, Kevin Grittner  wrote:
>> Would you be okay with this never working with partial unique
>> indexes? That gives me something to work with.
>
> That seems like the only sensible course, to me.

Okay, great. If that's the consensus, then that's all I need to know.

>> I'm not that worried about expression indexes, actually. I'm mostly
>> worried about partial unique indexes, particularly when before insert
>> row-level triggers are in play

> There is a problem if any column of the index allows nulls, since
> that would allow multiple rows in the target table to match an
> argument. Proving that an expression does not allow nulls could be tricky.  I
> suggest that, at least for a first cut, we restrict this to
> matching an index on NOT NULL columns.

Hmm. That could definitely be a problem. But the problem is clearly
the user's fault. It could allow multiple rows to "match", as you say
- but you're talking about a different concept of matching. Unlike
with certain other systems, that isn't a would-be unique violation,
and so it isn't a "match" in the sense I intend, and so you insert.
You don't match more than one row, because (in the sense peculiar to
this feature) there are no matches, and insertion really is the
correct outcome.

As a user, you have no more right to be surprised by this than by the
fact that you wouldn't see a dup violation with a similar vanilla
INSERT - some people are surprised by that, with regularity.

Maybe I need to emphasize the distinction in the documentation between
the two slightly different ideas of "matching" that are in tension
here. I think it would be a mistake to forcibly normalize things to
remove that distinction (by adding an artificial restriction), at any
rate.

>> Did you consider my example? I think that people will like this idea,
>> too - that clearly isn't the only consideration, though. As you say,
>> it would be very easy to implement this. However, IMV, we shouldn't,
>> because it is hazardous.
>
> To quote the cited case:
>
>> 1. Developer writes the query, and it works fine.
>>
>> 2. Some time later, the DBA adds an inserted_at column (those are
>> common). The DBA is not aware of the existence of this particular
>> query. The new column has a default value of now(), say.
>>
>> Should we UPDATE the inserted_at column to be NULL? Or (more
>> plausibly) the default value filled in by the INSERT? Or leave it be?
>> I think there is a case to be made for all of these behaviors, and
>> that tension makes me prefer to not do this at all. It's like
>> encouraging "SELECT *" queries in production, only worse.
>
> That does point out the problem, in general, with carrying values
> from a BEFORE INSERT trigger into an UPDATE.  Perhaps if the INSERT
> fails the UPDATE phase should start with the values specified in
> the first place, and not try to use anything returned by the BEFORE
> INSERT triggers?

I think that that behavior is far more problematic for numerous other
reasons, though. Potentially, you're not seeing what *really* caused
the conflict at all. I just don't think it's worth it to save a bit of
typing.

-- 
Peter Geoghegan




Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Peter Geoghegan
On Fri, Oct 10, 2014 at 2:17 PM, Jim Nasby  wrote:
> That said, since the use case for UPSERT is different than both INSERT and
> UPDATE maybe it would be a good idea to have a separate trigger for them
> anyway.

-1. I think that having a separate UPSERT trigger is a bad idea. I'll
need to change the statement-level behavior to match what Kevin
described (fires twice, always, when INSERT ON CONFLICT UPDATE is
used).

-- 
Peter Geoghegan




Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Kevin Grittner
Peter Geoghegan  wrote:
> On Fri, Oct 10, 2014 at 12:09 PM, Robert Haas  wrote:
>> I think what's realistic here is that the patch isn't going to get
>> committed the way you have it today, because nobody else likes that
>> design.  That may be harsh, but I think it's accurate.
>
> I do think that's harsh, because it's unnecessary: I am looking for
> the best design. If you want to propose alternatives, great, but there
> is a reason why I've done things that way, and that should be
> acknowledged. I too think naming the unique index is ugly as sin, and
> have said as much multiple times. We're almost on the same page here.

I hope you can adjust to the feedback, because it would be
disappointing to have this feature slip from the release, which is
what will happen if the index name remains part of the syntax.

> Would you be okay with this never working with partial unique
> indexes? That gives me something to work with.

That seems like the only sensible course, to me.

>> If you want to allow this to work with expression indexes, that's not
>> really a problem; you can let the user write INSERT .. ON CONFLICT
>> (upper(foo)) UPDATE ... and match that to the index.  It wouldn't
>> bother me if the first version of this feature couldn't target
>> expression indexes anyway, but if you want to include that, having the
>> user mention the actual expression rather than the index name
>> shouldn't make much difference.
>
> I'm not that worried about expression indexes, actually. I'm mostly
> worried about partial unique indexes, particularly when before insert
> row-level triggers are in play (making *that* play nice with
> expression indexes is harder still, but expression indexes on their
> own are probably not that much of a problem).

There is a problem if any column of the index allows nulls, since
that would allow multiple rows in the target table to match an
argument. Proving that an expression does not allow nulls could be tricky.  I
suggest that, at least for a first cut, we restrict this to
matching an index on NOT NULL columns.
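
To make the hazard concrete (my example, not from the patch):

create table t (a int, b int);
create unique index t_a_b on t (a, b);
insert into t values (1, NULL);
insert into t values (1, NULL);  -- allowed; NULLs never compare equal
-- An upsert targeting (a, b) with the values (1, NULL) now has two
-- candidate rows, so "update the matching row" is ambiguous.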

>> Also, how about making the SET clause optional, with the semantics
>> that we just update all of the fields for which a value is explicitly
>> specified:
>>
>> INSERT INTO overwrite_with_abandon (key, value)
>>   VALUES (42, 'meaning of life')
>>   ON DUPLICATE (key) UPDATE;
>
>> Your syntax allows the exact same thing; it simply require the user to
>> be more verbose in order to get that behavior.  If we think that
>> people wanting that behavior will be rare, then it's fine to require
>> them to spell it out when they want it.  If we think it will be the
>> overwhelming common application of this feature, and I do, then making
>> people spell it out when we could just as well infer it is pointless.

+1

> Did you consider my example? I think that people will like this idea,
> too - that clearly isn't the only consideration, though. As you say,
> it would be very easy to implement this. However, IMV, we shouldn't,
> because it is hazardous.

To quote the cited case:

> 1. Developer writes the query, and it works fine.
>
> 2. Some time later, the DBA adds an inserted_at column (those are
> common). The DBA is not aware of the existence of this particular
> query. The new column has a default value of now(), say.
>
> Should we UPDATE the inserted_at column to be NULL? Or (more
> plausibly) the default value filled in by the INSERT? Or leave it be?
> I think there is a case to be made for all of these behaviors, and
> that tension makes me prefer to not do this at all. It's like
> encouraging "SELECT *" queries in production, only worse.

That does point out the problem, in general, with carrying values
from a BEFORE INSERT trigger into an UPDATE.  Perhaps if the INSERT
fails the UPDATE phase should start with the values specified in
the first place, and not try to use anything returned by the BEFORE
INSERT triggers?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Jim Nasby

On 10/9/14, 6:59 PM, Gavin Flower wrote:

On 10/10/14 12:38, Jim Nasby wrote:

On 10/8/14, 5:51 PM, Peter Geoghegan wrote:

On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner wrote:

>Although the last go-around does suggest that there is at least one
>point of difference on the semantics.  You seem to want to fire the
>BEFORE INSERT triggers before determining whether this will be an
>INSERT or an UPDATE.  That seems like a bad idea to me, but if the
>consensus is that we want to do that, it does argue for your plan
>of making UPSERT a variant of the INSERT command.

Well, it isn't that I'm doing it because I think that it is a great
idea, with everything to recommend it. It's more like I don't see any
practical alternative. We need the before row insert triggers to fire
to figure out what to insert (or if we should update instead). No way
around that. At the same time, those triggers are at liberty to do
almost anything, and so in general we have no way of totally
nullifying their effects (or side effects). Surely you see the
dilemma.


FWIW, if each row was handled in a subtransaction then an insert that turned 
out to be an update could be rolled back... but the performance impact of going 
that route might be pretty horrid. :( There's also the potential to get stuck 
in a loop where a BEFORE INSERT trigger turns the tuple into an UPDATE and a 
BEFORE UPDATE trigger turns it into an INSERT.

Perhaps you need an UPSERT trigger?


I would think that a BEFORE UPSERT trigger would very likely want to know 
whether we were inserting or updating, which basically puts us back where we 
started.

That said, since the use case for UPSERT is different than both INSERT and 
UPDATE maybe it would be a good idea to have a separate trigger for them anyway.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Column Redaction

2014-10-10 Thread Simon Riggs
On 10 October 2014 19:26, Bruce Momjian  wrote:
> On Fri, Oct 10, 2014 at 02:05:05PM +0100, Thom Brown wrote:
>> On 10 October 2014 13:43, Simon Riggs  wrote:
>> > On 10 October 2014 11:45, Thom Brown  wrote:
>> >
>> >> To be honest, this all sounds rather flaky.

> My other concern is you must have realized these issues in five seconds
> too, so why didn't you mention them?

Because the problems that you come up with in 5 seconds aren't
necessarily problems. You just think they are, given 5 seconds'
thought. I think my first impression of the concept was poor as well,
though it would be wonderful if I could remember all of my initial
objections.

I didn't have any problem with Thom's first post, which was helpful in
allowing me to explain the context and details. As I said in reply at
that point, this is not in itself a barrier; other measures are
necessary. The rest of the thread has descended into a massive
misunderstanding of the purpose and role of redaction.

When any of us move too quickly to a value judgement about a new
concept then we're probably missing the point.

All of us will be asked at some time in the next few years why Postgres
doesn't have redaction. When you get it, post back here please. Or if
you win the argument on it not being useful in any circumstance, post
that here also. I'm not in a rush.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Column Redaction

2014-10-10 Thread Stephen Frost
Rod,

* Rod Taylor (rod.tay...@gmail.com) wrote:
> For fun I gave the search a try.

Neat!

> On my laptop I can pull all 10,000 card numbers in less than 1 second. For
> a text based item I don't imagine it would be much different. Numbers are
> pretty easy to work with though.

I had been planning to give something like this a shot once I got back
from various meetings today, so thanks!  Being able to use the CC # *as*
the target for the binary search is definitely an issue, though looking
back on the overall problem space, CCs are less than 54 bits, and it's
actually a smaller space than that if you know how they're put
together.

My thought on an attack was more along these lines:

select * from cards join (SELECT CAST(random() *  AS
bigint) a from generate_series(1,100)) as foo on (cards.cc = foo.a);

Which could pretty quickly find ~500 CC #s in a second or so (with a
'cards' table of about 1M entries) based on my testing.  That's clearly
sufficient to make it a viable attack as well.
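
Rod's binary-search approach can be sketched like so (illustrative only;
it assumes a cards(id, cc) table and that comparisons still see the true
value):

do $$
declare
  lo bigint := 0;
  hi bigint := 10000000000000000;  -- just above the 16-digit space
  m  bigint;
begin
  while lo < hi loop
    m := (lo + hi) / 2;
    perform 1 from cards where id = 1 and cc > m;
    if found then
      lo := m + 1;
    else
      hi := m;
    end if;
  end loop;
  raise notice 'recovered card number: %', lo;
end $$;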

The next question I have is: do(es) the other vendor(s) provide a way to
address this, or is it simply known that this doesn't offer any
protection at all from ad hoc queries and is strictly for formatting?
I can certainly imagine it actually being a way to simply avoid
*inadvertent* exposure rather than providing any security from the
individual running the commands.  I'm not sure that would make it
genuinely different enough from simply maintaining a view which does
that filtering to make it useful on its own as a feature though, but I'm
not particularly strongly against it either.

Thanks!

Stephen




Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Peter Geoghegan
On Fri, Oct 10, 2014 at 12:09 PM, Robert Haas  wrote:
> I think what's realistic here is that the patch isn't going to get
> committed the way you have it today, because nobody else likes that
> design.  That may be harsh, but I think it's accurate.

I do think that's harsh, because it's unnecessary: I am looking for
the best design. If you want to propose alternatives, great, but there
is a reason why I've done things that way, and that should be
acknowledged. I too think naming the unique index is ugly as sin, and
have said as much multiple times. We're almost on the same page here.

> But on the substance of the issue, I'm totally unconvinced by your
> argument in the linked email.  Let's suppose you have this:
>
> create table foo (a int, b int);
> create unique index foo1 on foo (a);
> create unique index foo2 on foo (a);
> create unique index foo3 on foo (a) where b = 1;
>
> Anything that conflicts in foo1 or foo2 is going to conflict in the
> other one, and with foo3.  Anything that conflicts in foo3 is perhaps
> also going to conflict in foo1 and foo2, or perhaps not.  Yet, if the
> statement simply says ON DUPLICATE (a) UPDATE, it's entirely clear
> what to do: when we hit a conflict in any of those three indexes,
> update the row.  No matter which index has the conflict, updating is
> the right answer, and solves the conflict in all three indexes.  I
> dunno what you'd expect someone to write in your syntax to solve this
> problem - presumably either (1) they can list any of the indexes that
> might conflict, (2) they must list all of the indexes that might
> conflict, or (3) it just doesn't work.

I expect them to name exactly one unique index. They should either do
that explicitly, or have one in mind and make the behavior implicit at
the risk of miscalculating (and having a surprising outcome). It
doesn't matter if it's foo1 or foo2 in this example (but foo3 is
different, obviously).

> Whatever the actual behavior
> of your patch is today, it seems to me that the conflict is,
> fundamentally, over a set of column values, NOT over a particular
> index.  The index is simply a mechanism for noticing that conflict.

I think that this is the kernel of our disagreement. The index is not
simply a mechanism for noticing that conflict. The user had better
have one unique index in mind to provoke taking the alternative path
in the event of a would-be unique violation. Clearly it doesn't matter
much in this particular example. But what if there were two partial
unique indexes, that were actually distinct, but only in terms of the
constant appearing in their predicate? And what about having that
changed by a before insert row-level trigger? Are we to defer deciding
which unique index to use at the last possible minute?
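
Concretely, the situation I have in mind looks like this (my
illustration, not from the patch):

create table bar (a int, b int);
create unique index bar_b1 on bar (a) where b = 1;
create unique index bar_b2 on bar (a) where b = 2;
-- If a before-insert row trigger rewrites b from 1 to 2, which of
-- these was the user's intended arbiter index?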

As always with this stuff, the devil is in the details. If we work out
the details, then I can come up with something that's acceptable to
everyone. Would you be okay with this never working with partial
unique indexes? That gives me something to work with.

> If you want to allow this to work with expression indexes, that's not
> really a problem; you can let the user write INSERT .. ON CONFLICT
> (upper(foo)) UPDATE ... and match that to the index.  It wouldn't
> bother me if the first version of this feature couldn't target
> expression indexes anyway, but if you want to include that, having the
> user mention the actual expression rather than the index name
> shouldn't make much difference.

I'm not that worried about expression indexes, actually. I'm mostly
worried about partial unique indexes, particularly when before insert
row-level triggers are in play (making *that* play nice with
expression indexes is harder still, but expression indexes on their
own are probably not that much of a problem).

>>> Also, how about making the SET clause optional, with the semantics
>>> that we just update all of the fields for which a value is explicitly
>>> specified:
>>>
>>> INSERT INTO overwrite_with_abandon (key, value)
>>> VALUES (42, 'meaning of life')
>>> ON DUPLICATE (key) UPDATE;

> Your syntax allows the exact same thing; it simply require the user to
> be more verbose in order to get that behavior.  If we think that
> people wanting that behavior will be rare, then it's fine to require
> them to spell it out when they want it.  If we think it will be the
> overwhelming common application of this feature, and I do, then making
> people spell it out when we could just as well infer it is pointless.

Did you consider my example? I think that people will like this idea,
too - that clearly isn't the only consideration, though. As you say,
it would be very easy to implement this. However, IMV, we shouldn't,
because it is hazardous. MySQL doesn't allow this, and they tend to
find expedients like this useful.

-- 
Peter Geoghegan



Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Peter Geoghegan
On Fri, Oct 10, 2014 at 12:04 PM, Kevin Grittner  wrote:
> Peter Geoghegan  wrote:
>
>> There is no danger of UPDATE before row-level triggers firing
>> without then updating (unless the xact aborts, but you know what
>> I mean).
>
> Well, let's make sure I do know what you mean.  If a BEFORE UPDATE
> ... FOR EACH ROW trigger returns NULL, the UPDATE will be skipped
> without error, right?  The BEFORE UPDATE triggers will run before
> the UPDATE if a duplicate is found, and the return value will be
> treated in the usual manner, replacing values and potentially
> skipping the update?

That's exactly right, yes (in particular, you can return NULL from
before row update triggers and have that cancel the update in the
usual manner). And potentially, the final value could be affected by
both the before row insert and before row update triggers (by way of
CONFLICTING()). That's the most surprising aspect of what I've done
(although I would argue it's the least surprising behavior possible
given my constraints).
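
For instance, canceling from a before-row-update trigger works just as it
does today (an illustrative sketch; the function and trigger names are
made up):

create function skip_noop_updates() returns trigger as $$
begin
  if new is not distinct from old then
    return null;  -- cancels the UPDATE for this row, without error
  end if;
  return new;
end;
$$ language plpgsql;

create trigger skip_noop before update on overwrite_with_abandon
  for each row execute procedure skip_noop_updates();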

-- 
Peter Geoghegan




Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Peter Geoghegan
On Fri, Oct 10, 2014 at 11:55 AM, Kevin Grittner  wrote:
> I haven't read the patch, but the discussion has made that clear,
> yes.  Try to take agreement on a point gracefully, will ya?  ;-)

Heh, sorry. I did literally mean what I said - it wasn't 100% clear to
me that you got that. It's safest to not make any assumptions like
that.

>> AFAICT the only confusion that my patch has is with statement-level
>> triggers, as discussed on the Wiki page. I think that the row-level
>> trigger behavior is fine; a lot more thought has gone into it.
>
> IMV it is clear that since the standard says that update or insert
> operations which affect zero rows fire the corresponding trigger,
> the statement level triggers for both should fire for an UPSERT,
> even if the set of affected rows for one or the other (or both!) is
> zero.  If we ever get transition relations, it will be easy to
> check the count of affected rows or otherwise behave appropriately
> based on what was done.

I think that you're probably right about that. Note that UPDATE rules
will not be applied to the "UPDATE portion" of the statement. Not sure
what to do about that, but I'm pretty sure that it's inevitable,
because the UPDATE is effectively the same top-level statement for the
purposes of rules.

-- 
Peter Geoghegan




Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Kevin Grittner
Peter Geoghegan  wrote:
> On Fri, Oct 10, 2014 at 11:44 AM, Kevin Grittner  wrote:

>> [discussion on committers rejecting the notion of a syntax 
>>  involving specification of an index name]

> as I've said many times, there are problems with naming
> columns directly that need to be addressed. There are corner-cases.
> Are you going to complain about those once I go implement something?

If I don't like what I see, yes.  I think it will be entirely
possible for you to write something along those lines that I won't
complain about.

> I think we can just throw an error in these edge-cases, but let's
> decide if we're comfortable with that.

If the list of columns does not allow a choice of a suitable
full-table unique index on only non-null columns, an error seems
appropriate.  What other problem cases do you see?
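
For instance (using the proposed column-list syntax against a
hypothetical table):

CREATE TABLE t (a int, b int);
CREATE UNIQUE INDEX t_a_part ON t (a) WHERE b = 1;  -- partial, so not full-table

-- There is no full-table unique index on (a), so something like
--   INSERT INTO t VALUES (1, 2) ON DUPLICATE (a) UPDATE ...
-- has no suitable arbiter; raising an error seems right.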

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Robert Haas
On Fri, Oct 10, 2014 at 2:29 PM, Peter Geoghegan  wrote:
> People keep remarking that they don't like that you can (optionally)
> name a unique index explicitly, and I keep telling them why I've done
> it that way [1]. There is a trade-off here. I am willing to go another
> way in that trade-off, but let's have a realistic conversation about
> it.

I think what's realistic here is that the patch isn't going to get
committed the way you have it today, because nobody else likes that
design.  That may be harsh, but I think it's accurate.

But on the substance of the issue, I'm totally unconvinced by your
argument in the linked email.  Let's suppose you have this:

create table foo (a int, b int);
create unique index foo1 on foo (a);
create unique index foo2 on foo (a);
create unique index foo3 on foo (a) where b = 1;

Anything that conflicts in foo1 or foo2 is going to conflict in the
other one, and with foo3.  Anything that conflicts in foo3 is perhaps
also going to conflict in foo1 and foo2, or perhaps not.  Yet, if the
statement simply says ON DUPLICATE (a) UPDATE, it's entirely clear
what to do: when we hit a conflict in any of those three indexes,
update the row.  No matter which index has the conflict, updating is
the right answer, and solves the conflict in all three indexes.  I
dunno what you'd expect someone to write in your syntax to solve this
problem - presumably either (1) they can list any of the indexes that
might conflict, (2) they must list all of the indexes that might
conflict, or (3) it just doesn't work.  Whatever the actual behavior
of your patch is today, it seems to me that the conflict is,
fundamentally, over a set of column values, NOT over a particular
index.  The index is simply a mechanism for noticing that conflict.

If you want to allow this to work with expression indexes, that's not
really a problem; you can let the user write INSERT .. ON CONFLICT
(upper(foo)) UPDATE ... and match that to the index.  It wouldn't
bother me if the first version of this feature couldn't target
expression indexes anyway, but if you want to include that, having the
user mention the actual expression rather than the index name
shouldn't make much difference.
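
To be concrete, that might look something like this (a sketch of the
syntax being discussed, not anything implemented today):

CREATE TABLE users (foo text);
CREATE UNIQUE INDEX users_foo_upper ON users (upper(foo));

INSERT INTO users (foo) VALUES ('alice')
ON CONFLICT (upper(foo)) UPDATE SET foo = 'alice';

The system would match upper(foo) to the index by its expression, the
same way plain column lists would be matched to plain unique indexes.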

>> Also, how about making the SET clause optional, with the semantics
>> that we just update all of the fields for which a value is explicitly
>> specified:
>>
>> INSERT INTO overwrite_with_abandon (key, value)
>> VALUES (42, 'meaning of life')
>> ON DUPLICATE (key) UPDATE;
>>
>> While the ability to specify a SET clause there explicitly is useful,
>> I bet a lot of key-value store users would love the heck out of a
>> syntax that let them omit it when they want to overwrite.
>
> While I initially liked that idea, now I'm not so sure about it [2].
> MySQL doesn't allow this (with ON DUPLICATE KEY UPDATE). Their REPLACE
> command allows it, since it is defined as a DELETE followed by an
> INSERT (or something like that), but I think that in general that
> feature has been a disaster for them.

Your syntax allows the exact same thing; it simply requires the user to
be more verbose in order to get that behavior.  If we think that
people wanting that behavior will be rare, then it's fine to require
them to spell it out when they want it.  If we think it will be the
overwhelmingly common application of this feature, and I do, then making
people spell it out when we could just as well infer it is pointless.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Kevin Grittner
Peter Geoghegan  wrote:

> There is no danger of UPDATE before row-level triggers firing
> without then updating (unless the xact aborts, but you know what
> I mean).

Well, let's make sure I do know what you mean.  If a BEFORE UPDATE
... FOR EACH ROW trigger returns NULL, the UPDATE will be skipped
without error, right?  The BEFORE UPDATE triggers will run before
the UPDATE if a duplicate is found, and the return value will be
treated in the usual manner, replacing values and potentially
skipping the update?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Kevin Grittner
Peter Geoghegan  wrote:
> On Fri, Oct 10, 2014 at 11:30 AM, Kevin Grittner  wrote:

>> That seems a lot cleaner than the proposal on the Wiki page.  If we
>> go that route, it makes sense to fire the BEFORE INSERT triggers
>> before attempting the insert and then fire BEFORE UPDATE triggers
>> before attempting the UPDATE.  If either succeeds, I think we
>> should fire the corresponding AFTER triggers.  We already allow a
>> BEFORE triggers to run and then omit the triggering operation
>> without an error, so I don't see that as a problem.  This makes a
>> lot more sense to me than attempting to add a new UPSERT trigger
>> type.
>
> You realize that that's exactly what my patch does, right?

I haven't read the patch, but the discussion has made that clear,
yes.  Try to take agreement on a point gracefully, will ya?  ;-)

> AFAICT the only confusion that my patch has is with statement-level
> triggers, as discussed on the Wiki page. I think that the row-level
> trigger behavior is fine; a lot more thought has gone into it.

IMV it is clear that since the standard says that update or insert
operations which affect zero rows fire the corresponding trigger,
the statement level triggers for both should fire for an UPSERT,
even if the set of affected rows for one or the other (or both!) is
zero.  If we ever get transition relations, it will be easy to
check the count of affected rows or otherwise behave appropriately
based on what was done.
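
That part is easy to verify today with a plain UPDATE affecting zero
rows (minimal sketch, hypothetical table):

CREATE TABLE t (a int);
CREATE FUNCTION note_stmt() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  RAISE NOTICE 'statement-level UPDATE trigger fired';
  RETURN NULL;
END $$;
CREATE TRIGGER t_upd AFTER UPDATE ON t
  FOR EACH STATEMENT EXECUTE PROCEDURE note_stmt();

UPDATE t SET a = a;  -- affects zero rows, yet the NOTICE is raised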

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Peter Geoghegan
On Fri, Oct 10, 2014 at 11:44 AM, Kevin Grittner  wrote:
> We've all read that, and your repeated arguments for that point of
> view.  We disagree and have said why.  What in that is not a
> realistic conversation?

Because, as I've said many times, there are problems with naming
columns directly that need to be addressed. There are corner-cases.
Are you going to complain about those once I go implement something?
Let's talk about that.

I think we can just throw an error in these edge-cases, but let's
decide if we're comfortable with that.

-- 
Peter Geoghegan




Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Peter Geoghegan
On Fri, Oct 10, 2014 at 11:38 AM, Peter Geoghegan  wrote:
>> That seems a lot cleaner than the proposal on the Wiki page.  If we
>> go that route, it makes sense to fire the BEFORE INSERT triggers
>> before attempting the insert and then fire BEFORE UPDATE triggers
>> before attempting the UPDATE.

By the way, there is no problem with failing to UPDATE, because we
lock rows ahead of the UPDATE. Once a row is locked, the UPDATE cannot
conflict. There is no danger of UPDATE before row-level triggers
firing without then updating (unless the xact aborts, but you know
what I mean). In general, there is no danger of triggers firing more
often than you might consider that they should, with the sole
exception of the fact that we always fire before insert row-level
triggers, even when an UPDATE is the ultimate outcome.

Handling conflicts (e.g. concurrent insertions with promise tuples)
necessitates a restart, but only to the point after before row-level
insert triggers fire, and from a point before any other triggers have
the opportunity to fire.


-- 
Peter Geoghegan




Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Kevin Grittner
Peter Geoghegan  wrote:
> On Fri, Oct 10, 2014 at 11:05 AM, Robert Haas  wrote:

>> I actually like this syntax reasonably well in some ways, but I don't
>> like that we're mentioning the index name, and the CONFLICTING()
>> notation is decidedly odd.
>
> People keep remarking that they don't like that you can (optionally)
> name a unique index explicitly, and I keep telling them why I've done
> it that way [1]. There is a trade-off here. I am willing to go another
> way in that trade-off, but let's have a realistic conversation about
> it.

We've all read that, and your repeated arguments for that point of
view.  We disagree and have said why.  What in that is not a
realistic conversation?

To restate: to do so is conflating the logical definition of the 
database with a particular implementation detail.  As just one 
reason that is a bad idea: we can look up unique indexes on the 
specified columns, but if we implement other storage techniques 
where there is no such thing as a unique index on the columns, yet 
manage to duplicate the semantics (yes, stranger things have 
happened), people can't migrate to the new structure without 
rewriting their queries.  If the syntax references logical details 
(like column names) there is no need to rewrite.  We don't want to 
be painted into a corner.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Peter Geoghegan
On Fri, Oct 10, 2014 at 11:30 AM, Kevin Grittner  wrote:
> That seems a lot cleaner than the proposal on the Wiki page.  If we
> go that route, it makes sense to fire the BEFORE INSERT triggers
> before attempting the insert and then fire BEFORE UPDATE triggers
> before attempting the UPDATE.  If either succeeds, I think we
> should fire the corresponding AFTER triggers.  We already allow a
> BEFORE triggers to run and then omit the triggering operation
> without an error, so I don't see that as a problem.  This makes a
> lot more sense to me than attempting to add a new UPSERT trigger
> type.

You realize that that's exactly what my patch does, right?

AFAICT the only confusion that my patch has is with statement-level
triggers, as discussed on the Wiki page. I think that the row-level
trigger behavior is fine; a lot more thought has gone into it.

-- 
Peter Geoghegan




Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Kevin Grittner
Robert Haas  wrote:

> Anything we do about triggers will by definition be novel.  Right
> now, we have INSERT, UPDATE, and DELETE.  If we add a new
> operation, whether it's called UPSERT or MERGE or FROB, [ ... ]

If we call it MERGE, then we had better follow the rules the
standard lays out for triggers on a MERGE statement -- which is
that UPDATE triggers fire for rows updated and that INSERT triggers
fire for rows inserted.  That kinda makes sense, since the MERGE
command is almost like a row-by-row CASE statement allowing one or
the other.

If we make up our own command verb, we are free to do as we like.

> Maybe something like this:
>
> INSERT INTO targettable(key, quantity, inserted_at)
>   VALUES(123, quantity, now())
>   ON DUPLICATE (key)
> UPDATE SET quantity = quantity + OLD.quantity, updated_at = now();

That seems a lot cleaner than the proposal on the Wiki page.  If we
go that route, it makes sense to fire the BEFORE INSERT triggers
before attempting the insert and then fire BEFORE UPDATE triggers
before attempting the UPDATE.  If either succeeds, I think we
should fire the corresponding AFTER triggers.  We already allow a
BEFORE triggers to run and then omit the triggering operation
without an error, so I don't see that as a problem.  This makes a
lot more sense to me than attempting to add a new UPSERT trigger
type.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Peter Geoghegan
On Fri, Oct 10, 2014 at 11:05 AM, Robert Haas  wrote:
> Anything we do about triggers will by definition be novel.  Right now,
> we have INSERT, UPDATE, and DELETE.  If we add a new operation,
> whether it's called UPSERT or MERGE or FROB, or if we add a flag to
> insert that makes it possibly do something other than inserting (e.g.
> INSERT OR UPDATE), or if we add a flag to update that makes it do
> something other than updating (e.g. UPDATE or INSERT), then some
> people's triggers are going to get broken by that change no matter how
> we do it.  When the new operation is invoked, we can fire the insert
> triggers, the update triggers, some new kind of trigger, or no trigger
> at all - and any decision we make there will not in all cases be
> backward-compatible.  We can try to minimize the damage (and we
> probably should) but we can't make it go to zero.

+1. It's inevitable that someone isn't going to be entirely happy with
whatever we do. Let's be realistic about that, while minimizing the
risk.

> I actually like this syntax reasonably well in some ways, but I don't
> like that we're mentioning the index name, and the CONFLICTING()
> notation is decidedly odd.  Maybe something like this:

People keep remarking that they don't like that you can (optionally)
name a unique index explicitly, and I keep telling them why I've done
it that way [1]. There is a trade-off here. I am willing to go another
way in that trade-off, but let's have a realistic conversation about
it.

> Also, how about making the SET clause optional, with the semantics
> that we just update all of the fields for which a value is explicitly
> specified:
>
> INSERT INTO overwrite_with_abandon (key, value)
> VALUES (42, 'meaning of life')
> ON DUPLICATE (key) UPDATE;
>
> While the ability to specify a SET clause there explicitly is useful,
> I bet a lot of key-value store users would love the heck out of a
> syntax that let them omit it when they want to overwrite.

While I initially liked that idea, now I'm not so sure about it [2].
MySQL doesn't allow this (with ON DUPLICATE KEY UPDATE). Their REPLACE
command allows it, since it is defined as a DELETE followed by an
INSERT (or something like that), but I think that in general that
feature has been a disaster for them.

[1] 
http://www.postgresql.org/message-id/CAM3SWZQpGf6_ME6D1vWqdCFadD7Nprdx2JrE=Hcf=93kxeh...@mail.gmail.com

[2] 
http://www.postgresql.org/message-id/CAM3SWZT=vxbj7qkaidamybu40ap10udsqooqhvix3ykj7wb...@mail.gmail.com
-- 
Peter Geoghegan




Re: [HACKERS] Column Redaction

2014-10-10 Thread Bruce Momjian
On Fri, Oct 10, 2014 at 02:05:05PM +0100, Thom Brown wrote:
> On 10 October 2014 13:43, Simon Riggs  wrote:
> > On 10 October 2014 11:45, Thom Brown  wrote:
> >
> >> To be honest, this all sounds rather flaky.
> >
> > To be honest, suggesting anything at all is rather difficult and I
> > recommend people try it.
> 
> I have, and most ideas I've had have been justifiably shot down or
> picked apart (scheduled background tasks, offloading stats collection
> to standby, index maintenance in DML query plans, expression
> statistics... to name but a few).
> 
> > Everything sounds crap when you didn't think of it and you've given it
> > an hour's thought.
> 
> I'm not sure that means my concerns aren't valid.  I don't think it
> sounds crap, but I also can't see any use-case for it where we don't
> already have things covered, or where it's going to offer any useful
> level of security.  Like with RLS, it may be that I'm just looking at
> things from the wrong perspective.

Agreed.  The problem isn't giving it only an hour's thought --- it is
that we can come up with serious problems in five _seconds_ of thought.
Unless you can come up with a solution to those issues, I am not sure
why we are even talking about it.

My other concern is you must have realized these issues in five seconds
too, so why didn't you mention them?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins

2014-10-10 Thread Tomas Vondra
Hi!

On 9.10.2014 22:28, Kevin Grittner wrote:
> Tomas Vondra  wrote:
>>
>> The only case I've been able to come up with is when the hash table
>> fits into work_mem only thanks to not counting the buckets. The new
>> code will start batching in this case.
> 
> Hmm. If you look at the timings in my posting from 2014-10-01 I had
> cases where the patched version started with one batch and went to
> two, while the unpatched used just one batch, and the patched version
> was still more than twice as fast. I'm sure the "on disk" batch was
> fully cached, however; it might work out differently if disk speed
> actually came into the picture.

I think the case with a large outer table and memory pressure, forcing
the batches to be actually written to disk (instead of just sitting in
the page cache), is the main concern here.


>> That is mostly luck, however, because it depends on the cardinality
>> estimates, and the best way to fix it is increasing work_mem (which is
>> safer thanks to reducing the overhead).
>>
>> Also, Robert proposed a way to mitigate this, if we realize we'd
>> have to do batching during the initial sizing, we can peek whether
>> reducing the number of buckets (to 1/2 or maybe 1/4) would help. I
>> believe this is a good approach, and will look into that after
>> pgconf.eu (i.e. early November), unless someone else is interested.
> 
> Sure, but it would be good to confirm that it's actually needed first.

Yeah. But with a cleverly crafted test case it's possible to confirm
almost everything ;-)

>>> When given a generous work_mem setting the patched version often uses
>>> more of what it is allowed than the unpatched version (which is
>>> probably one of the reasons it tends to do better). If someone has
>>> set a high work_mem and has gotten by only because the configured
>>> amount is not all being used when it would benefit performance, they
>>> may need to adjust work_mem down to avoid memory problems. That
>>> doesn't seem to me to be a reason to reject the patch.
>>
>> I'm not entirely sure I understand this paragraph. What do you mean by
>> "configured amount is not all being used when it would benefit
>> performance"? Can you give an example?
> 
> Again, the earlier post showed that while the unpatched used 3516kB
> whether it had work_mem set to 4MB or 1GB, the patched version used
> 3073kB when work_mem was set to 4MB and 4540kB when work_mem was
> set to 1GB.  The extra memory allowed the patched version to stay
> at 1 batch, improving performance over the other setting.

OK, so with 4MB the new version was batching, while the original code
keeps a single batch (likely due to not counting buckets into work_mem).
I believe that's expected behavior.

Also, in the e-mail from Oct 2 that got lost [1], I pointed out that by
testing only with small hash tables the results are likely influenced by
a high CPU cache hit ratio. I think benchmarking with a larger inner
relation (thus increasing the memory occupied by the hash table) might
be interesting.

[1]
http://www.postgresql.org/message-id/c84680e818f6a0f4a26223cd750ff252.squir...@2.emaily.eu
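
Something along these lines would do, sized so the inner side is much
larger than the CPU caches (a rough sketch; scale to taste):

CREATE TABLE inner_t AS
  SELECT i AS id, md5(i::text) AS pad
    FROM generate_series(1, 5000000) i;
CREATE TABLE outer_t AS
  SELECT (random() * 5000000)::int AS id
    FROM generate_series(1, 20000000);
ANALYZE inner_t; ANALYZE outer_t;

SET work_mem = '64MB';
EXPLAIN (ANALYZE, BUFFERS)
  SELECT count(*) FROM outer_t o JOIN inner_t i USING (id);
-- the Hash node's "Batches:" line shows whether the join spilled to disk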

> 
>> The only thing I can think of is the batching behavior described above.
>>
>>> This is in "Waiting on Author" status only because I never got an
>>> answer about why the debug code used printf() rather the elog() at
>>> a DEBUG level.  Other than that, I would say this patch is Ready
>>> for Committer.  Tomas?  You there?
>>
>> I think I responded to that on October 2, quoting:
...
> Ah, I never received that email.  That tends to happen every now
> and then.  :-(
> 
>> I believe it's safe to switch the logging to elog(). IMHO the printf
>> logging is there from some very early version of the code, before elog
>> was introduced. Or something like that.
> 
> I guess it might be best to remain consistent in this patch and
> change that in a separate patch.  I just wanted to make sure you
> didn't see any reason not to do so.

OK, I'll put that on my TODO list for the next CF.

> With that addressed I will move this to Ready for Committer.  Since
> both Heikki and Robert spent time on this patch earlier, I'll give
> either of them a shot at committing it if they want; otherwise I'll
> do it.

Great, thanks!
Tomas





Re: [HACKERS] Column Redaction

2014-10-10 Thread Kevin Grittner
Thom Brown  wrote:
> On 10 October 2014 15:56, Stephen Frost  wrote:
>> Thom Brown (t...@linux.com) wrote:
>>> Data such as plain credit card numbers stored in a
>>> column, even with all its data masked, would be easy to determine.
>>
>> I'm not as convinced of that as you are..  Though I'll point out that in
>> the use-cases which I've been talking to users about, it isn't credit
>> cards under discussion.
>
> I think credit card numbers are a good example.

I'm not so sure.  Aren't credit card numbers generally required by
law to be stored in an encrypted form?

> If we're talking
> about format functions here, there has to be something in addition to
> that which determines permitted comparison operations.  If not, and we
> were going to remove all but = operations, we'd effectively cripple
> the functionality of anything that's been formatted that wasn't
> intended as a security measure.  It almost sounds like an extension to
> domains rather than column-level functionality.

I have to say that my first thought was that format functions
associated with types with domain override would be a very nice
capability.  But I don't see where that has much to do with
security.  I have seen many places where redaction is necessary
(and in fact done), but I don't see how that could be addressed by
what Simon is proposing.  Perhaps I'm missing something; if so, a
more concrete exposition of a use case might allow things to
"click".

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Robert Haas
On Wed, Oct 8, 2014 at 5:01 PM, Kevin Grittner  wrote:
> Peter Geoghegan  wrote:
>> On Mon, Oct 6, 2014 at 8:35 AM, Robert Haas  wrote:
>>> I think the problem is that it's not possible to respect the "usual
>>> guarantees" even at READ COMMITTED level when performing an INSERT OR
>>> UPDATE operation (however spelled).  You may find that there's a tuple
>>> with the same PK which is committed but not visible to the snapshot
>>> you took at the beginning of the statement.
>>
>> Can you please comment on this, Kevin? It would be nice to converge on
>> an agreement on syntax here
>
> Robert said "however spelled" -- which I take to mean that he at
> least sees that the MERGE-like UPSERT syntax can be turned into the
> desired semantics.  I have no idea why anyone would think otherwise.
>
> Although the last go-around does suggest that there is at least one
> point of difference on the semantics.  You seem to want to fire the
> BEFORE INSERT triggers before determining whether this will be an
> INSERT or an UPDATE.

OK, I have a comment on this.

Anything we do about triggers will by definition be novel.  Right now,
we have INSERT, UPDATE, and DELETE.  If we add a new operation,
whether it's called UPSERT or MERGE or FROB, or if we add a flag to
insert that makes it possibly do something other than inserting (e.g.
INSERT OR UPDATE), or if we add a flag to update that makes it do
something other than updating (e.g. UPDATE or INSERT), then some
people's triggers are going to get broken by that change no matter how
we do it.  When the new operation is invoked, we can fire the insert
triggers, the update triggers, some new kind of trigger, or no trigger
at all - and any decision we make there will not in all cases be
backward-compatible.  We can try to minimize the damage (and we
probably should) but we can't make it go to zero.

> INSERT INTO targettable(key, quantity, inserted_at)
>   VALUES(123, quantity, now())
>   ON CONFLICT WITHIN targettable_pkey
> UPDATE SET quantity = quantity + CONFLICTING(quantity), updated_at = 
> now();

I actually like this syntax reasonably well in some ways, but I don't
like that we're mentioning the index name, and the CONFLICTING()
notation is decidedly odd.  Maybe something like this:

INSERT INTO targettable(key, quantity, inserted_at)
   VALUES(123, quantity, now())
   ON DUPLICATE (key)
 UPDATE SET quantity = quantity + OLD.quantity, updated_at = now();

(Perhaps OLD should reference the tuple already in the table, and NEW
the value from the VALUES clause.  That would be snazzy indeed.)

Also, how about making the SET clause optional, with the semantics
that we just update all of the fields for which a value is explicitly
specified:

INSERT INTO overwrite_with_abandon (key, value)
VALUES (42, 'meaning of life')
ON DUPLICATE (key) UPDATE;

While the ability to specify a SET clause there explicitly is useful,
I bet a lot of key-value store users would love the heck out of a
syntax that let them omit it when they want to overwrite.
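
In other words, the short form would just be sugar for spelling the
assignments out, roughly (reusing the NEW notation floated above, and
presumably skipping the conflicting key columns):

INSERT INTO overwrite_with_abandon (key, value)
VALUES (42, 'meaning of life')
ON DUPLICATE (key) UPDATE SET value = NEW.value;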

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [GENERAL] Understanding and implementing a GiST Index

2014-10-10 Thread Connor Wolf
I'd be ok with either SP-GiST or GiST, though there are tree structures 
(BK tree) that are particularly suited to this application that I 
/think/ map to GiST better than SP-GiST.


Re: hamming in the RD-tree implementation: Where? I cloned the smlar git 
repo, and poked around, and it looks like it can only operate on set 
intersections, which is a non-starter for what I need to do.  I spent a 
while looking through the source, and I didn't see anything that looked 
like hamming distance calculation (though I probably missed some stuff; 
the indirection through `FCall2()` makes things very hard to follow).


Anyways, right now, smlar is not useful to me, because it completely 
ignores the structure of incoming arrays:


postgres=# SELECT smlar(
postgres(# '{1,0,0,0, 1,1,1,1, 1,1,1,0, 0,1,0,0}'::int[],
postgres(# '{1,0,0,1, 0,1,0,0, 0,1,0,0, 0,1,1,0}'
postgres(# );
 smlar
---
 1
(1 row)


For two radically different hashes (shortened for example purposes) 
which compare as identical.


I spent some time trying to see if I could easily disable the array 
uniquefication: by disabling the calls to `qsort_arg` and modifying 
`numOfIntersect` so it just counts the number of positions where the 
array elements do not match, I'm getting the behaviour I want out of 
`smlar()` at this point:


postgres=# SELECT smlar(
postgres(# '{1,0,0,0, 1,1,1,1, 1,1,1,0, 0,1,0,0}'::int[],
postgres(# '{1,0,0,1, 0,1,0,0, 0,1,0,0, 0,1,1,0}'
postgres(# );
 smlar
--------
 0.4375
(1 row)

postgres=# SELECT smlar(
'{0,1,1,0}'::int[],
'{1,0,1,0}'
);
 smlar
---
   0.5
(1 row)


But the index does not seem to work after the changes I made, and the 
debugging print statements (well, `elog()` statements) I inserted into 
`cmpArrayElem()` and `numOfIntersect()` are not being hit, so I don't 
understand how the index is even being built.
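
For what it's worth, a plain-SQL hamming distance between two 64-bit
integers is easy to sketch (assuming the phashes are stored as bigint;
no extension needed):

CREATE FUNCTION hamming_distance(a bigint, b bigint) RETURNS int AS $$
  -- popcount of the XOR: strip the '0' bits and count what's left
  SELECT length(translate((a # b)::bit(64)::text, '0', ''));
$$ LANGUAGE sql IMMUTABLE;

SELECT hamming_distance(x'cafebabe'::bigint, x'cafebabf'::bigint);  -- 1

That doesn't index, of course, which is the whole point of the BK/GiST
exercise, but it's handy for checking results.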


Anyways, I'm rambling a bit. Any input would be great.
Thanks,
Connor

On 10/9/2014 8:11 AM, Oleg Bartunov wrote:

Just a quick recommendation: you need to ask the -hackers mailing list
for that. Also, do you want GiST or SP-GiST?
We already use hamming distance in the rd-tree, which is implemented in
GiST; see our presentation
http://www.sai.msu.su/~megera/postgres/talks/pgcon-2012.pdf, for
example.


Oleg

On Thu, Oct 9, 2014 at 11:09 AM, Connor Wolf
<w...@imaginaryindustries.com> wrote:


Hi there!

I'm trying to implement a custom indexing system using the GiST
index framework, and I have a few questions.
Basically, I'm trying to implement a system that allows me to
search across a set of 64 bit integers by hamming distance. This
is intended to be used in searching for similar images, where the
64 bit integer is actually a phash

of an image, and similarity between two images is reflected in the
hamming distance between two integers.

Anyways, the appropriate approach here is to use something called a BK
tree, for which I've written some test code, and I think I have a
decent grip on it (my test code seems to work, in any event).

That said:

  * Is there a /simple/ piece of example-code for implementing a
GiST index for a single index? I've been looking through the
/contrib/ directory, particularly the /contrib/btree_gist/
files, but it looks like 90% of the complexity there is
related to supporting all the column types Postgres has,
rather than actually tied to producing a functional index.
  * Once I have something compiling, how can I check to be sure
that I'm actually using the indexing module I created? I can
enable my compiled extension using `CREATE EXTENSION`, but is
there an option for `CREATE INDEX test_index ON testing USING
gist (val);` that lets me specify *which* GiST index is
actually employed? Is this even a valid question?
This is probably something that's available in one of the
system tables somewhere, but I haven't had much luck with
google finding out where.
  * Testing: What's the appropriate way to examine the generated
tree structure of the index? I certainly went through a few
bugs with my test tree system (and that was in python!). Is
there any way to examine the index structure for debugging
purposes?
  * Also, it looks like `ereport()` is the proper way to emit
debugging information. Is this correct?
  * In that vein, is there any way to have information that is on
the operations of an entire query? Looking at the number of
tree nodes touched for a scan would be nice (and I would not
be surprised if there is already a facility for it).

Re: [HACKERS] schema-only -n option in pg_restore fails

2014-10-10 Thread FabrĂ­zio de Royes Mello
On Thu, Oct 9, 2014 at 6:19 PM, Josh Berkus  wrote:
>
> On 10/09/2014 12:36 PM, Josh Berkus wrote:
> > Summary: pg_restore -n attempts to restore objects to pg_catalog schema
> > Versions Tested: 9.3.5, 9.3.0, 9.2.4
>
> Explored this some with Andrew offlist.  Turns out this is going to be a
> PITA to fix, so it should go on the big pile of TODOs for when we
> overhaul search_path.
>
> Here's what's happening under the hood, pg_restore generates this SQL
text:
>
> SET search_path = schem_a, pg_catalog;
> CREATE TABLE tab_a (
>  test text
> );
>
> Since schem_a doesn't exist, it's skipped over and pg_restore attempts
> to create the objects in pg_catalog.  So this is Yet Another Issue
> caused by the ten meter tall tar baby which is search_path.
>
> So, my proposal for a resolution:
>
> 1) In current versions, patch the docs to explicitly say that -n does
> not create the schema, and that if the user doesn't create the schema
> pg_restore will fail.
>
> 2) Patch 9.5's pg_restore to do "CREATE SCHEMA IF NOT EXISTS" when -n is
> used.  This will be 100% backwards-compatible with current behavior.
>

I agree with this solution. Whenever I restore a schema from a dump I
need to create the schemas beforehand, and that sucks.
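
Concretely, with the IF NOT EXISTS form from item 2, the emitted SQL
for the example above would presumably become something like:

CREATE SCHEMA IF NOT EXISTS schem_a;
SET search_path = schem_a, pg_catalog;
CREATE TABLE tab_a (
 test text
);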

I'm working on the 2nd item [1] together with another friend (Sebastian,
in cc) to introduce him to the PostgreSQL development process.

We'll register soon to the next commitfest.

Regards,

[1]
https://github.com/fabriziomello/postgres/tree/create_schema_during_pg_restore_schema_only

--
FabrĂ­zio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-10 Thread Andres Freund
On 2014-10-10 16:41:39 +0200, Andres Freund wrote:
> FWIW, the profile always looks like
> -  48.61%  postgres  postgres  [.] s_lock
>- s_lock
>   + 96.67% StrategyGetBuffer
>   + 1.19% UnpinBuffer
>   + 0.90% PinBuffer
>   + 0.70% hash_search_with_hash_value
> +   3.11%  postgres  postgres  [.] GetSnapshotData
> +   2.47%  postgres  postgres  [.] StrategyGetBuffer
> +   1.93%  postgres  [kernel.kallsyms] [k] copy_user_generic_string
> +   1.28%  postgres  postgres  [.] hash_search_with_hash_value
> -   1.27%  postgres  postgres  [.] LWLockAttemptLock
>- LWLockAttemptLock
>   - 97.78% LWLockAcquire
>  + 38.76% ReadBuffer_common
>  + 28.62% _bt_getbuf
>  + 8.59% _bt_relandgetbuf
>  + 6.25% GetSnapshotData
>  + 5.93% VirtualXactLockTableInsert
>  + 3.95% VirtualXactLockTableCleanup
>  + 2.35% index_fetch_heap
>  + 1.66% StartBufferIO
>  + 1.56% LockReleaseAll
>  + 1.55% _bt_next
>  + 0.78% LockAcquireExtended
>   + 1.47% _bt_next
>   + 0.75% _bt_relandgetbuf
> 
> to me. Now that's with the client count 496, but it's similar with lower
> counts.
> 
> BTW, that profile *clearly* indicates we should make StrategyGetBuffer()
> smarter.

Which is nearly trivial now that atomics are in. Check out the attached
WIP patch which eliminates the spinlock from StrategyGetBuffer() unless
there's buffers on the freelist.

Test:
pgbench  -M prepared -P 5 -S -c 496 -j 496 -T 5000
on a scale=1000 database, with 4GB of shared buffers.

Before:
progress: 40.0 s, 136252.3 tps, lat 3.628 ms stddev 4.547
progress: 45.0 s, 135049.0 tps, lat 3.660 ms stddev 4.515
progress: 50.0 s, 135788.9 tps, lat 3.640 ms stddev 4.398
progress: 55.0 s, 135268.4 tps, lat 3.654 ms stddev 4.469
progress: 60.0 s, 134991.6 tps, lat 3.661 ms stddev 4.739

after:
progress: 40.0 s, 207701.1 tps, lat 2.382 ms stddev 3.018
progress: 45.0 s, 208022.4 tps, lat 2.377 ms stddev 2.902
progress: 50.0 s, 209187.1 tps, lat 2.364 ms stddev 2.970
progress: 55.0 s, 206462.7 tps, lat 2.396 ms stddev 2.871
progress: 60.0 s, 210263.8 tps, lat 2.351 ms stddev 2.914

Yes, no kidding.

The results are similar, but less extreme, for smaller client counts
like 80 or 160.

Amit, since your test seems to be currently completely bottlenecked
within StrategyGetBuffer(), could you compare with that patch applied to
HEAD and the LW_SHARED patch for one client count? That may allow us
to see a more meaningful profile...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 6b486e5b467e94ab9297d7656a5b39b816c5c55a Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 10 Oct 2014 17:36:46 +0200
Subject: [PATCH] WIP: lockless StrategyGetBuffer hotpath

---
 src/backend/storage/buffer/freelist.c | 154 --
 1 file changed, 90 insertions(+), 64 deletions(-)

diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 5966beb..0c634a0 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -18,6 +18,12 @@
 #include "storage/buf_internals.h"
 #include "storage/bufmgr.h"
 
+#include "port/atomics.h"
+
+
+#define LATCHPTR_ACCESS_ONCE(var)	((Latch *)(*((volatile Latch **)&(var))))
+#define INT_ACCESS_ONCE(var)	((int)(*((volatile int *)&(var))))
+
 
 /*
  * The shared freelist control information.
@@ -27,8 +33,12 @@ typedef struct
 	/* Spinlock: protects the values below */
 	slock_t		buffer_strategy_lock;
 
-	/* Clock sweep hand: index of next buffer to consider grabbing */
-	int			nextVictimBuffer;
+	/*
+	 * Clock sweep hand: index of next buffer to consider grabbing. Note that
+	 * this isn't a concrete buffer - we only ever increase the value. So, to
+	 * get an actual buffer, it needs to be used modulo NBuffers.
+	 */
+	pg_atomic_uint32 nextVictimBuffer;
 
 	int			firstFreeBuffer;	/* Head of list of unused buffers */
 	int			lastFreeBuffer; /* Tail of list of unused buffers */
@@ -42,8 +52,8 @@ typedef struct
 	 * Statistics.  These counters should be wide enough that they can't
 	 * overflow during a single bgwriter cycle.
 	 */
-	uint32		completePasses; /* Complete cycles of the clock sweep */
-	uint32		numBufferAllocs;	/* Buffers allocated since last reset */
+	pg_atomic_uint32 completePasses; /* Complete cycles of the clock sweep */
+	pg_atomic_uint32 numBufferAllocs;	/* Buffers allocated since last reset */
 
 	/*
 	 * Notification latch, or NULL if none.  See StrategyNotifyBgWriter.
@@ -124,87 +134,107 @@ StrategyGetBuffer(BufferAccessStrategy strategy)
 			return buf;
 	}
 
-	/* Nope, so lock the freelist */
-	SpinLockAcquire(&StrategyControl->buffer_strategy_lock);
-
 	/*
 	 * We count buffer allocation requests so that the bgwriter can estimate
 	 * the r

Re: [HACKERS] Column Redaction

2014-10-10 Thread Rod Taylor
On Fri, Oct 10, 2014 at 10:56 AM, Stephen Frost  wrote:

> * Thom Brown (t...@linux.com) wrote:
> > On 10 October 2014 12:45, Stephen Frost  wrote:
> > >> There's a difference between intending that there shouldn't be a way
> > >> past security and just making access a matter of walking a longer
> > >> route.
> > >
> > > Throwing random 16-digit numbers and associated information at a credit
> > > card processor could be viewed as "walking a longer route" too.  The
> > > same goes for random key searches or password guesses.
> >
> > But those would need to be exhaustive, and in nearly all cases,
> > impractical.
>
> That would be exactly the idea with this- we make it impractical to get
> at the unredacted information.
>

For fun I gave the search a try.


create table cards (id serial, cc bigint);
insert into cards (cc)
  SELECT CAST(random() * 9999999999999999 AS bigint)  -- random 16-digit numbers
    FROM generate_series(1,10000);

\timing on
WITH RECURSIVE t(id, range_min, range_max) AS (
  SELECT id, 1::bigint, 9999999999999999 FROM cards
  UNION ALL
  SELECT id
       , CASE WHEN cc >= range_avg THEN range_avg ELSE range_min END
       , CASE WHEN cc <= range_avg THEN range_avg ELSE range_max END
    FROM (SELECT id, (range_min + range_max) / 2 AS range_avg, range_min,
                 range_max
            FROM t
         ) AS t_avg
    JOIN cards USING (id)
   WHERE range_min != range_max
)
SELECT id, range_min AS cc FROM t WHERE range_min = range_max;


On my laptop I can pull all 10,000 card numbers in less than 1 second. For
a text based item I don't imagine it would be much different. Numbers are
pretty easy to work with though.


Re: [HACKERS] Column Redaction

2014-10-10 Thread Claudio Freire
On Fri, Oct 10, 2014 at 11:58 AM, Stephen Frost  wrote:
>> You are obviously wearing your rose-colored glasses this morning.  I
>> predict a competent SQL programmer could write an SQL function, or
>> client-side code, to pump the data out of the database using binary
>> search in milliseconds per row.
>
> Clearly, if we're unable to prevent that, then this feature wouldn't be
> useful.  What would be helpful is to consider what we could provide
> along these lines without allowing the data to be trivially recovered.

Joins are way too powerful to allow arbitrary joins to untrusted users.

The only somewhat secure way is to allow administrators to define which
joins are possible, and let untrusted users use those.

You get that with views. I'm not sure you can allow more than that,
and not have lots of leaks.

Is there a use case where redaction is the only solution really?

None of the use cases mentioned so far really require it:

* Transaction logs and blocked card lists can be joined against users
and a view can be provided that includes the user, and not the credit
card. So you can join freely between the views just fine, by user, and
do all the analysis you need without exposing credit card numbers in
any way, not even redacted.

* If not users, you can join against a random but unique per-card
value generated when the card is first inserted in the records, and
you get a random token for the card. Still works, can be done with
triggers (see the sketch after this list), and is far less leaky than
the proposed redaction.

* Credit card number verification is a leak on its own, but if you
really want it, you can provide a function that does it. And I think
it's perfectly reasonable that defining leaking functions has to be an
admin thing.

* Views can expose the redacted value just fine for direct use. A
generically usable user-id or random-token to redacted number mapping
view would provide all the freedom you could want.

* Functions defined as leakproof (even when they're not, which is an
admin decision to throw data safety out the window, but it's a
possible decision), which allows fetching redacted columns that way
from security barrier views.
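
A rough sketch of the per-card token idea, with the mapping maintained
by a trigger (hypothetical names; gen_random_uuid() assumes the
pgcrypto extension):

CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE card_tokens (
    cc    text PRIMARY KEY,
    token uuid NOT NULL DEFAULT gen_random_uuid()
);

CREATE FUNCTION remember_card() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM card_tokens WHERE cc = NEW.cc) THEN
        BEGIN
            INSERT INTO card_tokens (cc) VALUES (NEW.cc);
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- token already assigned by a concurrent session
        END;
    END IF;
    RETURN NEW;
END $$;

CREATE TRIGGER transactions_tokenize BEFORE INSERT ON transactions
    FOR EACH ROW EXECUTE PROCEDURE remember_card();

Analysis views can then join on card_tokens.token instead of the raw
number.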

Is there anything not covered by the above that can be done by
built-in redacting?

If the answer is yes, then maybe the feature has value.

If the feature's value is ease of use, I'd weigh that against the
security loss. A false sense of security is a net security loss in most
(if not all) cases. Having to flesh out the logic through security
barrier views, leakproof redacting functions and triggers can have the
good side-effect of making all the possible leaks obvious to the
admin.


On Fri, Oct 10, 2014 at 12:27 PM, Thom Brown  wrote:
> Also, joining to foreign tables could be an issue, copying data to
> temporary tables could possibly remove any redaction, materialised
> views would need to support it somehow.  Although just because I can't
> picture how that would work, it's no indication that it couldn't.

Well, that's why encryption is usually required by regulation for credit
card data. Way too many ways to leak, and way too valuable to expect a
lack of knowledgeable and motivated people trying to get at it.


On Fri, Oct 10, 2014 at 12:27 PM, Thom Brown  wrote:
>>> Salted and hashed passwords, even with complete visibility of the
>>> value, isn't vulnerable to scrutiny of particular character values.
>>> If it were, no-one would use it.
>>
>> I wasn't suggesting otherwise, but I don't see it as particularly
>> relevant to the discussion regardless.
>
> I guess I was trying to illustrate that the security in a hashed
> password is acceptable because it requires exhaustive searching to
> break.  If comparison operators worked on it, it would be broken out
> of the box.

Lately, the security of password-based authentication is being put
into question very often. So I wouldn't hold credit card numbers or
any other sensitive information to the password standard.

But let's use the password example: it's widely accepted that holding
onto cleartext passwords, or even transmitting them or their plain
hashes over any channel, is extremely bad practice. So redaction isn't
good enough for passwords, nor is salted hashing either. The only
generally accepted approach in the security community is a password
proof in the context of a zero-knowledge password proof protocol [0].
You'd want something like that for any bit of info you need to "join"
or "compare" without accepting that it leaks.

[0] http://en.wikipedia.org/wiki/Zero-knowledge_password_proof




Re: [HACKERS] Column Redaction

2014-10-10 Thread Thom Brown
On 10 October 2014 15:56, Stephen Frost  wrote:
> * Thom Brown (t...@linux.com) wrote:
>> Data such as plain credit card numbers stored in a
>> column, even with all its data masked, would be easy to determine.
>
> I'm not as convinced of that as you are..  Though I'll point out that in
> the use-cases which I've been talking to users about, it isn't credit
> cards under discussion.

I think credit card numbers are a good example.  If we're talking
about format functions here, there has to be something in addition to
that which determines permitted comparison operations.  If not, and we
were going to remove all but = operations, we'd effectively cripple
the functionality of anything that's been formatted that wasn't
intended as a security measure.  It almost sounds like an extension to
domains rather than column-level functionality.

But then if operators such as <, > and ~~ aren't hindered, it sounds
like no protection at all.

Also, joining to foreign tables could be an issue, copying data to
temporary tables could possibly remove any redaction, materialised
views would need to support it somehow.  Although just because I can't
picture how that would work, it's no indication that it couldn't.

>> Salted and hashed passwords, even with complete visibility of the
>> value, isn't vulnerable to scrutiny of particular character values.
>> If it were, no-one would use it.
>
> I wasn't suggesting otherwise, but I don't see it as particularly
> relevant to the discussion regardless.

I guess I was trying to illustrate that the security in a hashed
password is acceptable because it requires exhaustive searching to
break.  If comparison operators worked on it, it would be broken out
of the box.

-- 
Thom




Re: [HACKERS] Column Redaction

2014-10-10 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Fri, Oct 10, 2014 at 7:00 AM, Stephen Frost  wrote:
> > The discussion about looking up specific card numbers in the original
> > email from Simon was actually an allowed use-case, as I understood it,
> > not a risk concern.  Indeed, if you know a valid credit card number
> > already, as in this example, then why are you bothering with the search?
> > Perhaps it would provide confirmation, but it's not the database's
> > responsibility to make you forget the number you already have.  Doing a
> > random walk through a keyspace of 10^16 and extracting a significant
> > enough number of results to be useful should be difficult.  I agree that
> > if we're completely unable to make it difficult then this is less
> > useful, but I feel it's a bit early to jump to that conclusion.

Thanks much for the laugh. :)

> You are obviously wearing your rose-colored glasses this morning.  I
> predict a competent SQL programmer could write an SQL function, or
> client-side code, to pump the data out of the database using binary
> search in milliseconds per row.

Clearly, if we're unable to prevent that, then this feature wouldn't be
useful.  What would be helpful is to consider what we could provide
along these lines without allowing the data to be trivially recovered.

Thanks!

Stephen




Re: [HACKERS] Column Redaction

2014-10-10 Thread Stephen Frost
* Thom Brown (t...@linux.com) wrote:
> On 10 October 2014 12:45, Stephen Frost  wrote:
> >> There's a difference between intending that there shouldn't be a way
> >> past security and just making access a matter of walking a longer
> >> route.
> >
> > Throwing random 16-digit numbers and associated information at a credit
> > card processor could be viewed as "walking a longer route" too.  The
> > same goes for random key searches or password guesses.
> 
> But those would need to be exhaustive, and in nearly all cases,
> impractical.

That would be exactly the idea with this- we make it impractical to get
at the unredacted information.

> Data such as plain credit card numbers stored in a
> column, even with all its data masked, would be easy to determine.

I'm not as convinced of that as you are..  Though I'll point out that in
the use-cases which I've been talking to users about, it isn't credit
cards under discussion.

> Salted and hashed passwords, even with complete visibility of the
> value, isn't vulnerable to scrutiny of particular character values.
> If it were, no-one would use it.

I wasn't suggesting otherwise, but I don't see it as particularly
relevant to the discussion regardless.

Thanks,

Stephen




Re: [HACKERS] Column Redaction

2014-10-10 Thread Stephen Frost
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
> You said above that it's OK to pass the card numbers to leakproof
> functions. But if you allow that, you can write a function that
> takes as argument a redacted card number, and unredacts it (using
> the < and = operators in a binary search). And then you can just do
> "SELECT unredact(card_number) from redacted_table".

Not sure I'm following what you mean by 'redacted'.  The original
proposal provided 'xxxx xxxx xxxx 1234' as the 'redacted' number, and
I'm not seeing how you can get the rest of the number trivially with
just equality and binary search.

If you start with a complete number then you can get the system to tell
you if it exists or not with a binary search or even just doing an
equality check.

> You seem to have something stronger in mind: only allow the equality
> operator on the redacted column, and nothing else.

That wasn't my suggestion - I was merely pointing out that if you have a
complete number (perhaps by pulling out a random number, with a filter
against the last four digits, reducing the search space to 10^12) which
you want to check for existence, you can do that directly.  No need for
a binary search at all.

> That might be
> better, although I'm not really convinced. There are just too many
> ways you could still leak the datum. Just a random example, inspired
> by the recent CRIME attack on SSL: build a row with the redacted
> datum, and another "guess" datum, and store it along with 1k of
> other data in a temporary table. The row gets toasted. Observe how
> much it compressed; if the guess datum is close to the original
> datum, it compresses well. Now, you can probably stop that
> particular attack with more restrictions on what you can do with the
> datum, but that just shows that pretty much any computation you
> allow with the datum can be used to reveal its value.

One concept I've been thinking about is a notion of 'trusted' data
sources to allow comparison against.  Perhaps individual values are
allowed from the user also, but my thought is that you have:

master_table
trusted_table

Such that you can't view the sensitive column in either the master or
the trusted table, but you can join between the two on the sensitive
column and view other, non-sensitive, attributes of the two tables.  You
might even allow other transformations on the sensitive column, provided
it always results in a boolean comparison to another sensitive column.
Not sure if that really solves Simon's use-case exactly, but it might
tease out other thoughts.
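
As a rough sketch of that with today's tools (hypothetical tables and
role; the real feature would presumably enforce this natively):

CREATE TABLE master_table  (id int, card_no text, amount numeric);
CREATE TABLE trusted_table (card_no text, status text);

REVOKE ALL ON master_table, trusted_table FROM analyst;

-- expose only the non-sensitive attributes of the join
CREATE VIEW card_status WITH (security_barrier) AS
  SELECT m.id, m.amount, t.status
    FROM master_table m
    JOIN trusted_table t USING (card_no);

GRANT SELECT ON card_status TO analyst;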

Thanks!

Stephen




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-10 Thread Andres Freund
On 2014-10-10 17:18:46 +0530, Amit Kapila wrote:
> On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund 
> wrote:
> > > Observations
> > > --
> > > a. The patch performs really well (increase upto ~40%) incase all the
> > > data fits in shared buffers (scale factor -100).
> > > b. Incase data doesn't fit in shared buffers, but fits in RAM
> > > (scale factor -3000), there is performance increase upto 16 client
> count,
> > > however after that it starts dipping (in above config unto ~4.4%).
> >
> > Hm. Interesting. I don't see that dip on x86.
>
> Is it possible that implementation of some atomic operation is costlier
> for particular architecture?

Yes, sure. And IIRC POWER improved atomics performance considerably for
POWER8...

> I have tried again for scale factor 3000 and could see the dip and this
> time I have even tried with 175 client count and the dip is approximately
> 5% which is slightly more than 160 client count.

FWIW, the profile always looks like
-  48.61%  postgres  postgres  [.] s_lock
   - s_lock
  + 96.67% StrategyGetBuffer
  + 1.19% UnpinBuffer
  + 0.90% PinBuffer
  + 0.70% hash_search_with_hash_value
+   3.11%  postgres  postgres  [.] GetSnapshotData
+   2.47%  postgres  postgres  [.] StrategyGetBuffer
+   1.93%  postgres  [kernel.kallsyms] [k] copy_user_generic_string
+   1.28%  postgres  postgres  [.] hash_search_with_hash_value
-   1.27%  postgres  postgres  [.] LWLockAttemptLock
   - LWLockAttemptLock
  - 97.78% LWLockAcquire
 + 38.76% ReadBuffer_common
 + 28.62% _bt_getbuf
 + 8.59% _bt_relandgetbuf
 + 6.25% GetSnapshotData
 + 5.93% VirtualXactLockTableInsert
 + 3.95% VirtualXactLockTableCleanup
 + 2.35% index_fetch_heap
 + 1.66% StartBufferIO
 + 1.56% LockReleaseAll
 + 1.55% _bt_next
 + 0.78% LockAcquireExtended
  + 1.47% _bt_next
  + 0.75% _bt_relandgetbuf

to me. Now that's with the client count 496, but it's similar with lower
counts.

BTW, that profile *clearly* indicates we should make StrategyGetBuffer()
smarter.

>   Patch_ver/Client_count      175
>   HEAD                     248374
>   PATCH                    235669
> > > Now probably these shouldn't matter much in case backend needs to
> > > wait for other Exclusive locker, but I am not sure what else could be
> > > the reason for dip in case we need to have Exclusive LWLocks.
> >
> > Any chance to get a profile?
>
> Here it goes..
>
> Lwlock_contention patches - client_count=128
> --
>
> +   7.95%  postgres  postgres   [.] GetSnapshotData
> +   3.58%  postgres  postgres   [.] AllocSetAlloc
> +   2.51%  postgres  postgres   [.] _bt_compare
> +   2.44%  postgres  postgres   [.]
> hash_search_with_hash_value
> +   2.33%  postgres  [kernel.kallsyms]  [k] .__copy_tofrom_user
> +   2.24%  postgres  postgres   [.] AllocSetFreeIndex
> +   1.75%  postgres  postgres   [.]
> pg_atomic_fetch_add_u32_impl

Uh. Huh? Normally that'll be inline. That's compiled with gcc? What were
the compiler settings you used?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [9.4 bug] The database server hangs with write-heavy workload on Windows

2014-10-10 Thread Andres Freund
On 2014-10-10 23:08:34 +0900, MauMau wrote:
> From: "Craig Ringer" 
> >It sounds like they've produced a test case, so they should be able to
> >with a bit of luck.
> >
> >Or even better, send you the test case.
> 
> I asked the user about this.  It sounds like the relevant test case consists
> of many scripts.  He explained to me that the simplified test steps are:
> 
> 1. initdb
> 2. pg_ctl start
> 3. Create 16 tables.  Each of those tables consist of around 10 columns.
> 4. Insert 1000 rows into each of those 16 tables.
> 5. Launch 16 psql sessions concurrently.  Each session updates all 1000 rows
> of one table, e.g., session 1 updates table 1, session 2 updates table 2,
> and so on.
> 6. Repeat step 5 50 times.
> 
> This sounds a bit complicated, but I understood that the core part is 16
> concurrent updates, which should lead to contention on xlog insert slots
> and/or spinlocks.

Hm. I've run similar loads on linux for long enough that I'm relatively
sure I'd have seen this.

Could you get them to print out the contents of the lwlock all these
processes are waiting for?

> >Your next step here really needs to be to make this reproducible against
> >a debug build. Then see if reverting the xlog scalability work actually
> >changes the behaviour, given that you hypothesised that it could be
> >involved.

I don't think you can trivially revert the xlog scalability stuff.

> Thank you, but that may be labor-intensive and time-consuming.  In addition,
> the user uses a machine with multiple CPU cores, while I only have a desktop
> PC with two CPU cores.  So I doubt I can reproduce the problem on my PC.

Well, it'll also be labor-intensive for the community to debug.

> I asked the user to change S_UNLOCK to something like the following and run
> the test during this weekend (the next Monday is a national holiday in
> Japan).
> 
> #define S_UNLOCK(lock)  InterlockedExchange(lock, 0)

That shouldn't be required. For one, on 9.4 (not 9.5!) spinlock releases
only need to prevent reordering at the CPU level. As x86 is a TSO
(total store order) architecture, that doesn't require doing anything
special. And even if more were required, on MSVC volatile reads/stores
act as acquire/release fences unless you monkey with the compiler settings.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [9.4 bug] The database server hangs with write-heavy workload on Windows

2014-10-10 Thread MauMau

From: "Craig Ringer" 

It sounds like they've produced a test case, so they should be able to
with a bit of luck.

Or even better, send you the test case.


I asked the user about this.  It sounds like the relevant test case consists 
of many scripts.  He explained to me that the simplified test steps are:


1. initdb
2. pg_ctl start
3. Create 16 tables.  Each of those tables consists of around 10 columns.
4. Insert 1000 rows into each of those 16 tables.
5. Launch 16 psql sessions concurrently.  Each session updates all 1000 rows 
of one table, e.g., session 1 updates table 1, session 2 updates table 2, 
and so on.

6. Repeat step 5 50 times.

This sounds a bit complicated, but I understood that the core part is 16 
concurrent updates, which should lead to contention on xlog insert slots 
and/or spinlocks.
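
In other words, something like the following (the table layout and exact
statements are my reconstruction, not the user's actual scripts):

-- one of the 16 tables, roughly 10 columns
CREATE TABLE t1 (id int PRIMARY KEY,
                 c1 int, c2 int, c3 int, c4 int,
                 c5 int, c6 int, c7 int, c8 int, c9 int);
INSERT INTO t1
SELECT g, g, g, g, g, g, g, g, g, g
FROM generate_series(1, 1000) g;

-- what each of the 16 concurrent sessions runs, repeated 50 times:
UPDATE t1 SET c1 = c1 + 1;   -- rewrites all 1000 rows, stressing xlog insertion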




Your next step here really needs to be to make this reproducible against
a debug build. Then see if reverting the xlog scalability work actually
changes the behaviour, given that you hypothesised that it could be
involved.


Thank you, but that may be labor-intensive and time-consuming.  In addition, 
the user uses a machine with multiple CPU cores, while I only have a desktop 
PC with two CPU cores.  So I doubt I can reproduce the problem on my PC.


I asked the user to change S_UNLOCK to something like the following and run 
the test during this weekend (the next Monday is a national holiday in 
Japan).


#define S_UNLOCK(lock)  InterlockedExchange(lock, 0)

FYI, the user reported today that the problem didn't occur when he ran the 
same test for 24 hours on 9.3.5.  Do you see something relevant in 9.4?


Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Inconsistencies in documentation of row-level locking

2014-10-10 Thread Michael Paquier
Hi all,

Currently, all the row-level lock modes are described in the reference
page for the SELECT query:
http://www.postgresql.org/docs/devel/static/explicit-locking.html#LOCKING-ROWS
However, after browsing the documentation, I noticed that the page
describing all the explicit locks of the system has a portion dedicated
to row-level locks, and this section does not mention FOR KEY SHARE and
FOR NO KEY UPDATE at all. This seems rather misleading for the user:
http://www.postgresql.org/docs/devel/static/explicit-locking.html#LOCKING-ROWS

Attached is a patch that refactors the whole section and improves the documentation:
- Addition of a table showing the conflicts between each lock
- Moved description of each row-level lock mode to the explicit locking page
- Addition of a link in SELECT portion to redirect the user to the
explicit locking page
Regards,
-- 
Michael
From 974b360cc8bfdade607154c94d0a78f5cf466e0a Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 10 Oct 2014 22:10:10 +0900
Subject: [PATCH] Refactor documentation of row-level locking

All the details about each row-level lock mode were described in
the page dedicated to the SELECT query, which seems rather
incorrect because there is a section in the MVCC chapter describing
in detail all the locks that can be used in the system. Note that this
portion was not updated for the implementation of FOR KEY SHARE
and FOR NO KEY UPDATE when it should have been, making the
information provided to the user inconsistent and misleading. This patch
refactors the whole section, adding a link in SELECT to redirect the user
to the page describing all the explicit locks of the system.
---
 doc/src/sgml/mvcc.sgml   | 172 +--
 doc/src/sgml/ref/select.sgml |  60 +--
 2 files changed, 152 insertions(+), 80 deletions(-)

diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index cd55be8..478279d 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -1106,30 +1106,107 @@ ERROR:  could not serialize access due to read/write dependencies among transact
 
 
  In addition to table-level locks, there are row-level locks, which
- can be exclusive or shared locks.  An exclusive row-level lock on a
- specific row is automatically acquired when the row is updated or
- deleted.  The lock is held until the transaction commits or rolls
- back, just like table-level locks.  Row-level locks do
- not affect data querying; they block only writers to the same
- row.
+ are listed below, along with the contexts in which they are used
+ automatically by PostgreSQL.  The main
+ difference between one lock mode and another is the set of lock modes
+ with which each conflicts (see ).
+ Two transactions cannot hold locks of conflicting modes on the same rows
+ of the same table at the same time.  Also, a transaction never conflicts
+ with itself.  Row-level locks do not affect data querying; they block only
+ writers to the same row.
 
 
-
- To acquire an exclusive row-level lock on a row without actually
- modifying the row, select the row with SELECT FOR
- UPDATE.  Note that once the row-level lock is acquired,
- the transaction can update the row multiple times without
- fear of conflicts.
-
+ 
+  Row-level Lock Modes
+  
+   
+FOR UPDATE
+   
+   
+
+ FOR UPDATE causes the rows retrieved by the
+ SELECT statement to be locked as though for
+ update.  This prevents them from being modified or deleted by
+ other transactions until the current transaction ends.  That is,
+ other transactions that attempt UPDATE,
+ DELETE,
+ SELECT FOR UPDATE,
+ SELECT FOR NO KEY UPDATE,
+ SELECT FOR SHARE or
+ SELECT FOR KEY SHARE
+ of these rows will be blocked until the current transaction ends.
+ The FOR UPDATE lock mode
+ is also acquired by any DELETE on a row, and also by an
+ UPDATE that modifies the values of certain columns.  Currently,
+ the set of columns considered for the UPDATE case are those that
+ have a unique index on them that can be used in a foreign key (so partial
+ indexes and expressional indexes are not considered), but this may change
+ in the future.
+ Also, if an UPDATE, DELETE,
+ or SELECT FOR UPDATE from another transaction
+ has already locked a selected row or rows, SELECT FOR
+ UPDATE will wait for the other transaction to complete,
+ and will then lock and return the updated row (or no row, if the
+ row was deleted).  Within a REPEATABLE READ or
+ SERIALIZABLE transaction,
+ however, an error will be thrown if a row to be locked has changed
+ since the transaction started.  For further discussion see
+ .
+
+   
+  

Re: [HACKERS] Column Redaction

2014-10-10 Thread Claudio Freire
On Fri, Oct 10, 2014 at 5:57 AM, Simon Riggs  wrote:
>
> 1. If we want to confirm a credit card number, we can issue SELECT 1
> FROM customer WHERE stored_card_number = '1234 5678 5344 7733'
> ...
> 3. We want to block the direct retrieval of card numbers for
> additional security.
> In some cases, we might want to return an answer like '**** **** **** 7733'


I wouldn't want to allow that:

select ref.ref, customer.name from (select generate_series as ref from
generate_series(0, )) ref, customer
where ref.ref = customer.stored_card_number

May take a long while. Just disable everything except nestloop and
suck up the data as it comes. Can be optimized. Not sure how you'd
avoid this; not trivial at all. Not possible at all, I'd venture.

But if you really really want to allow this, encrypt the column, and
provide a C function that can decrypt it. You can join encrypted
columns, and you can even include the last 4 digits unencrypted if you
want (I wouldn't want).

Has to be a C function to be able to avoid leaking the key, btw.
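
To sketch that idea with pgcrypto (the table names and the inline key are
purely illustrative -- per the above, a real key would have to stay inside
a C function):

CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- raw AES here is deterministic (fixed key, no random IV): equal
-- plaintexts produce equal ciphertexts, so the column stays joinable --
-- which deliberately leaks equality, since that's the whole point
ALTER TABLE customer ADD COLUMN card_enc bytea;
UPDATE customer
   SET card_enc = encrypt(convert_to(stored_card_number::text, 'UTF8'),
                          '0123456789abcdef'::bytea, 'aes');

-- fraud lookups can then join on the ciphertext without ever decrypting:
SELECT t.*
  FROM transactions t
  JOIN blocked_cards b ON t.card_enc = b.card_enc;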

> 2. If we want to look for card fraud, we need to be able to use the
> full card number to join to transaction data and look up blocked card
> lists etc..

A view works for this pretty well.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-10 Thread Thom Brown
On 10 October 2014 13:43, Simon Riggs  wrote:
> On 10 October 2014 11:45, Thom Brown  wrote:
>
>> To be honest, this all sounds rather flaky.
>
> To be honest, suggesting anything at all is rather difficult and I
> recommend people try it.

I have, and most ideas I've had have been justifiably shot down or
picked apart (scheduled background tasks, offloading stats collection
to standby, index maintenance in DML query plans, expression
statistics... to name but a few).

> Everything sounds crap when you didn't think of it and you've given it
> an hour's thought.

I'm not sure that means my concerns aren't valid.  I don't think it
sounds crap, but I also can't see any use-case for it where we don't
already have things covered, or where it's going to offer any useful
level of security.  Like with RLS, it may be that I'm just looking at
things from the wrong perspective.
-- 
Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-10 Thread Robert Haas
On Fri, Oct 10, 2014 at 7:00 AM, Stephen Frost  wrote:
> * Thom Brown (t...@linux.com) wrote:
>> To be honest, this all sounds rather flaky.  Even if you do rate-limit
>> their queries, they can use methods that avoid rate-limiting, such as
>> recursive queries.  And if you're only after one credit card number
>> (to use the original example), you'd get it in a relatively short
>> amount of time, despite some rate-limiting system.
>
> The discussion about looking up specific card numbers in the original
> email from Simon was actually an allowed use-case, as I understood it,
> not a risk concern.  Indeed, if you know a valid credit card number
> already, as in this example, then why are you bothering with the search?
> Perhaps it would provide confirmation, but it's not the database's
> responsibility to make you forget the number you already have.  Doing a
> random walk through a keyspace of 10^16 and extracting a significant
> enough number of results to be useful should be difficult.  I agree that
> if we're completely unable to make it difficult then this is less
> useful, but I feel it's a bit early to jump to that conclusion.

You are obviously wearing your rose-colored glasses this morning.  I
predict a competent SQL programmer could write an SQL function, or
client-side code, to pump the data out of the database using binary
search in milliseconds per row.  And I think it's more likely than not
that there are other techniques that are much faster.  The idea that
you're going to be able to let people query the data but not actually
retrieve it should be viewed with great skepticism.  This is the
equivalent of telling a child that she can't open her Christmas
presents until Christmas, but she can shake them, hold them up to a
bright light, and/or X-ray the packages.  If she doesn't know what's
in there by the time she opens it, it's just for lack of effort.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-10 Thread Simon Riggs
On 10 October 2014 12:01, Heikki Linnakangas  wrote:

> Really, I don't see how this can possibly be made to work. You can't allow
> ad hoc processing of data, and still avoid revealing it to the user.

Anyone with unmonitored access and sufficient time can break through security.

I think that is true of any kind of security, and so it is true here also.

Auditing and controls are required also, that's why I suggested those
first. This proposal was looking beyond that to what we might need
next.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-10 Thread Simon Riggs
On 10 October 2014 11:45, Thom Brown  wrote:

> To be honest, this all sounds rather flaky.

To be honest, suggesting anything at all is rather difficult and I
recommend people try it.

Everything sounds crap when you didn't think of it and you've given it
an hour's thought.

I'm not blind to the difficulties raised and I thank you for your
input, but I think it's too early to make sweeping generalisations.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-10 Thread Thom Brown
On 10 October 2014 12:45, Stephen Frost  wrote:
>> >> This gives the vague impression of security, but it really seems just
>> >> the placing of a few obstacles in the way.
>> >
>> > One might consider that all security is just placing obstacles in the
>> > way.
>>
>> There's a difference between intending that there shouldn't be a way
>> past security and just making access a matter of walking a longer
>> route.
>
> Throwing random 16-digit numbers and associated information at a credit
> card processor could be viewed as "walking a longer route" too.  The
> same goes for random key searches or password guesses.

But those would need to be exhaustive, and in nearly all cases,
impractical.  Data such as plain credit card numbers stored in a
column, even with all of it masked, would be easy to determine.
Salted and hashed passwords, even with complete visibility of the
value, aren't vulnerable to scrutiny of particular character values.
If they were, no-one would use them.
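
(For comparison, the usual pgcrypto idiom -- table and column names
invented here:

SELECT stored_hash = crypt('some guess', stored_hash) AS match
  FROM users;

only ever confirms or denies a complete guess; no LIKE-style probing of
stored_hash reveals anything about individual characters of the password.)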

Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-10 Thread Heikki Linnakangas

On 10/10/2014 02:27 PM, Stephen Frost wrote:

* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:

On 10/10/2014 02:05 PM, Stephen Frost wrote:

* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:

On 10/10/2014 01:35 PM, Stephen Frost wrote:

Regarding functions, 'leakproof' functions should be alright to allow,
though Heikki brings up a good point regarding binary search being
possible in a plpgsql function (or even directly by a client).  Of
course, that approach also requires that you have a specific item in
mind.


It doesn't require that you have a specific item in mind. Binary
search is cheap, O(log n). It's easy to write a function to do a
binary search on a single item, passed as argument, and then apply
that to all rows:

SELECT binary_search_reveal(cardnumber) FROM redacted_table;


Note that your binary_search_reveal wouldn't be marked as leakproof and
therefore this wouldn't be allowed.  If this was allowed, you'd simply
do "raise notice" inside the function and call it a day.


*shrug*, just do the same with a more complicated query, then. Even
if you can't create a function that does that, you can still execute
the same logic without a function.


Not sure I see what you're getting at here..?  My point was that you'd
need a target number and the system would only provide confirmation that
the number exists, or does not.  Your argument was that the table
itself would provide the target number, which was flawed.  I don't see
how "just do the same with a more complicated query" removes the need to
have a target number for the binary search.


You said above that it's OK to pass the card numbers to leakproof 
functions. But if you allow that, you can write a function that takes as 
argument a redacted card number, and unredacts it (using the < and = 
operators in a binary search). And then you can just do "SELECT 
unredact(card_number) from redacted_table".


You seem to have something stronger in mind: only allow the equality 
operator on the redacted column, and nothing else. That might be better, 
although I'm not really convinced. There are just too many ways you 
could still leak the datum. Just a random example, inspired by the 
recent CRIME attack on SSL: build a row with the redacted datum, and 
another "guess" datum, and store it along with 1k of other data in a 
temporary table. The row gets toasted. Observe how much it compressed; 
if the guess datum is close to the original datum, it compresses well. 
Now, you can probably stop that particular attack with more restrictions 
on what you can do with the datum, but that just shows that pretty much 
any computation you allow with the datum can be used to reveal its value.
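
A rough sketch of that probe, assuming the padding is large enough that
compression kicks in at all (names and sizes invented; whether the
difference is observable depends on TOAST thresholds and pglz details):

CREATE TEMP TABLE probe (guess text, blob text);
INSERT INTO probe
SELECT g.guess, repeat('x', 4000) || c.stored_card_number::text || g.guess
  FROM customer c,
       (VALUES ('1234567890123456'), ('9999999999999999')) g(guess)
 WHERE c.id = 1;
-- pg_column_size() reports the stored (possibly compressed) size; the row
-- whose guess repeats the real number compresses a few bytes better,
-- because pglz encodes the second copy as a back-reference:
SELECT guess, pg_column_size(blob) AS stored_bytes FROM probe;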


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-10 Thread Amit Kapila
On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund 
wrote:
> On 2014-10-10 10:13:03 +0530, Amit Kapila wrote:
> > I have done few performance tests for above patches and results of
> > same is as below:
>
> Cool, thanks.
>
> > Performance Data
> > --
> > IBM POWER-7 16 cores, 64 hardware threads
> > RAM = 64GB
> > max_connections =210
> > Database Locale =C
> > checkpoint_segments=256
> > checkpoint_timeout=35min
> > shared_buffers=8GB
> > Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8)
> > Duration of each individual run = 5mins
> > Test type - read only pgbench with -M prepared
> > Other Related information about test
> > a. This is the data for the median of 3 runs; the detailed data of each
> > individual run is attached with the mail.
> > b. I have applied both the patches to take performance data.
> >
> >
> > Observations
> > --
> > a. The patch performs really well (an increase of up to ~40%) in case all the
> > data fits in shared buffers (scale factor 100).
> > b. In case the data doesn't fit in shared buffers, but fits in RAM
> > (scale factor 3000), there is a performance increase up to 16 client
> > count, however after that it starts dipping (in the above config, up to ~4.4%).
>
> Hm. Interesting. I don't see that dip on x86.

Is it possible that the implementation of some atomic operation is
costlier on a particular architecture?

I have tried again for scale factor 3000 and could see the dip, and this
time I have even tried with a 175 client count; the dip is approximately
5%, which is slightly more than at the 160 client count.


  Patch_ver/Client_count     175
  HEAD                    248374
  PATCH                   235669
> > Now probably these shouldn't matter much in case the backend needs to
> > wait for another Exclusive locker, but I am not sure what else could be
> > the reason for the dip in case we need to acquire Exclusive LWLocks.
>
> Any chance to get a profile?

Here it goes..

HEAD - client_count=128
-

+   7.53%  postgres  postgres           [.] GetSnapshotData
+   3.41%  postgres  postgres           [.] AllocSetAlloc
+   2.61%  postgres  postgres           [.] AllocSetFreeIndex
+   2.49%  postgres  postgres           [.] _bt_compare
+   2.43%  postgres  [kernel.kallsyms]  [k] .__copy_tofrom_user
+   2.40%  postgres  postgres           [.] hash_search_with_hash_value
+   1.83%  postgres  postgres           [.] tas
+   1.29%  postgres  postgres           [.] pg_encoding_mbcliplen
+   1.27%  postgres  postgres           [.] MemoryContextCreate
+   1.22%  postgres  postgres           [.] MemoryContextAllocZeroAligned
+   1.17%  postgres  postgres           [.] hash_seq_search
+   0.97%  postgres  postgres           [.] LWLockRelease
+   0.96%  postgres  postgres           [.] MemoryContextAllocZero
+   0.91%  postgres  postgres           [.] GetPrivateRefCountEntry
+   0.82%  postgres  postgres           [.] AllocSetFree
+   0.79%  postgres  postgres           [.] LWLockAcquireCommon
+   0.78%  postgres  postgres           [.] pfree



Detailed Data
-
-   7.53% postgres  postgres   [.] GetSnapshotData
   - GetSnapshotData
  - 7.46% GetSnapshotData
 - 7.46% GetTransactionSnapshot
- 3.74% exec_bind_message
 PostgresMain
 BackendRun
 BackendStartup
 ServerLoop
 PostmasterMain
 main
 generic_start_main.isra.0
 __libc_start_main
 0
- 3.72% PortalStart
 exec_bind_message
 PostgresMain
 BackendRun
 BackendStartup
 ServerLoop
 PostmasterMain
 main
 generic_start_main.isra.0
 __libc_start_main
 0
-   3.41% postgres  postgres   [.] AllocSetAlloc
   - AllocSetAlloc
  - 2.01% AllocSetAlloc
   0.81% palloc
   0.63% MemoryContextAlloc
-   2.61% postgres  postgres   [.] AllocSetFreeIndex
   - AllocSetFreeIndex
1.59% AllocSetAlloc
0.79% AllocSetFree
-   2.49% postgres  postgres   [.] _bt_compare
   - _bt_compare
  - 1.80% _bt_binsrch
 - 1.80% _bt_binsrch
- 1.21% _bt_search
 _bt_first

Lwlock_contention patches - client_count=128
--

+   7.95%  postgres  postgres           [.] GetSnapshotData
+   3.58%  postgres  postgres           [.] AllocSetAlloc
+   2.51%  postgres  postgres           [.] _bt_compare
+   2.44%  postgres  postgres           [.] hash_search_with_hash_value
+   2.33%  postgres  [kernel.kallsyms]  [k] .__copy_tofrom_user
+   2.24%  postgres  postgres           [.] AllocSetFreeIndex
+   1.75%  postgres  postgres           [.] pg_atomic_fetch_add_u32_impl

Re: [HACKERS] Column Redaction

2014-10-10 Thread Stephen Frost
* Thom Brown (t...@linux.com) wrote:
> On 10 October 2014 12:00, Stephen Frost  wrote:
> > The discussion about looking up specific card numbers in the original
> > email from Simon was actually an allowed use-case, as I understood it,
> > not a risk concern.  Indeed, if you know a valid credit card number
> > already, as in this example, then why are you bothering with the search?
> 
> The topic being "column redaction" rather than "column formatting"
> leads me to believe that the main use-case of the feature would be to
> prevent the user from discovering the full value of the column.

I believe the idea is to limit the chances that a user with limited
pre-existing knowledge would be able to determine the full value of
items in the column, especially in bulk.

> It's
> not so much point 1 I was responding to, rather point 3, where you
> don't know the card number, but you get information about it in the
> results.

We'd certainly want to prevent that to the extent possible.  Do you have
a specific thought about how they'd be able to find a full number beyond
a random search..?

> The purpose of this feature would be to prevent the user
> from seeing all that data, which is a security feature, but at the
> moment it just seems to be a way of making it a little less easy to
> get at that data.

I certainly appreciate the thought challenges and critique and I'm
hopeful we could make it more than "a little less easy" to get at the
information.  If we aren't able to do that, then the feature isn't
useful, certainly.

> >> This gives the vague impression of security, but it really seems just
> >> the placing of a few obstacles in the way.
> >
> > One might consider that all security is just placing obstacles in the
> > way.
> 
> There's a difference between intending that there shouldn't be a way
> past security and just making access a matter of walking a longer
> route.

Throwing random 16-digit numbers and associated information at a credit
card processor could be viewed as "walking a longer route" too.  The
same goes for random key searches or password guesses.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Column Redaction

2014-10-10 Thread Stephen Frost
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
> On 10/10/2014 02:05 PM, Stephen Frost wrote:
> >* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
> >>On 10/10/2014 01:35 PM, Stephen Frost wrote:
> >>>Regarding functions, 'leakproof' functions should be alright to allow,
> >>>though Heikki brings up a good point regarding binary search being
> >>>possible in a plpgsql function (or even directly by a client).  Of
> >>>course, that approach also requires that you have a specific item in
> >>>mind.
> >>
> >>It doesn't require that you have a specific item in mind. Binary
> >>search is cheap, O(log n). It's easy to write a function to do a
> >>binary search on a single item, passed as argument, and then apply
> >>that to all rows:
> >>
> >>SELECT binary_search_reveal(cardnumber) FROM redacted_table;
> >
> >Note that your binary_search_reveal wouldn't be marked as leakproof and
> >therefore this wouldn't be allowed.  If this was allowed, you'd simply
> >do "raise notice" inside the function and call it a day.
> 
> *shrug*, just do the same with a more complicated query, then. Even
> if you can't create a function that does that, you can still execute
> the same logic without a function.

Not sure I see what you're getting at here..?  My point was that you'd
need a target number and the system would only provide confirmation that
the number exists, or does not.  Your argument was that the table
itself would provide the target number, which was flawed.  I don't see
how "just do the same with a more complicated query" removes the need to
have a target number for the binary search.

A better argument would be the equality case than the binary search if
you're simply looking for confirmation of existence.  If the user can
define a table of targets, or uses a VALUES construct, and then join to
it then we might build a hash table and provide those results faster
than a binary search, though this again means that the user is
providing the list of keys to check.
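
For instance (card values invented, and assuming the column compares as
text):

SELECT c.id
  FROM customer c
  JOIN (VALUES ('1234 5678 5344 7733'),
               ('1234 5678 5344 7734')) t(card)
    ON c.stored_card_number::text = t.card;

Either way, the attacker has to supply every candidate key himself.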

As mentioned elsewhere on the thread, I agree that this capability
wouldn't be useful if a random search (which is providing the 'targets')
through a 10^16 keyspace generated a significant number of results (I'd
also throw in there "in a reasonable amount of time"- clearly it'd be
possible to extract all keys given sufficient time, even with a random
search).  The sketch that Simon outlined won't obviously provide that
guarantee, but I'm not prepared to say we couldn't provide it at all
while meeting the goal he outlined.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Column Redaction

2014-10-10 Thread Thom Brown
On 10 October 2014 12:00, Stephen Frost  wrote:
> * Thom Brown (t...@linux.com) wrote:
>> To be honest, this all sounds rather flaky.  Even if you do rate-limit
>> their queries, they can use methods that avoid rate-limiting, such as
>> recursive queries.  And if you're only after one credit card number
>> (to use the original example), you'd get it in a relatively short
>> amount of time, despite some rate-limiting system.
>
> The discussion about looking up specific card numbers in the original
> email from Simon was actually an allowed use-case, as I understood it,
> not a risk concern.  Indeed, if you know a valid credit card number
> already, as in this example, then why are you bothering with the search?

The topic being "column redaction" rather than "column formatting"
leads me to believe that the main use-case of the feature would be to
prevent the user from discovering the full value of the column.  It's
not so much point 1 I was responding to, rather point 3, where you
don't know the card number, but you get information about it in the
results.  The purpose of this feature would be to prevent the user
from seeing all that data, which is a security feature, but at the
moment it just seems to be a way of making it a little less easy to
get at that data.

>> This gives the vague impression of security, but it really seems just
>> the placing of a few obstacles in the way.
>
> One might consider that all security is just placing obstacles in the
> way.

There's a difference between intending that there shouldn't be a way
past security and just making access a matter of walking a longer
route.

I wouldn't be against formatting per se, but for the purposes of that,
I would say that views can already serve that purpose.

-- 
Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-10 Thread Heikki Linnakangas

On 10/10/2014 02:05 PM, Stephen Frost wrote:

* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:

On 10/10/2014 01:35 PM, Stephen Frost wrote:

Regarding functions, 'leakproof' functions should be alright to allow,
though Heikki brings up a good point regarding binary search being
possible in a plpgsql function (or even directly by a client).  Of
course, that approach also requires that you have a specific item in
mind.


It doesn't require that you have a specific item in mind. Binary
search is cheap, O(log n). It's easy to write a function to do a
binary search on a single item, passed as argument, and then apply
that to all rows:

SELECT binary_search_reveal(cardnumber) FROM redacted_table;


Note that your binary_search_reveal wouldn't be marked as leakproof and
therefore this wouldn't be allowed.  If this was allowed, you'd simply
do "raise notice" inside the function and call it a day.


*shrug*, just do the same with a more complicated query, then. Even if 
you can't create a function that does that, you can still execute the 
same logic without a function.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-10 Thread Hannu Krosing
On 10/10/2014 11:38 AM, Simon Riggs wrote:
> On 10 October 2014 10:29, Heikki Linnakangas  wrote:
>> On 10/10/2014 11:57 AM, Simon Riggs wrote:
>>> Postgres currently supports column level SELECT privileges.
>>>
>>> 1. If we want to confirm a credit card number, we can issue SELECT 1
>>> FROM customer WHERE stored_card_number = '1234 5678 5344 7733'
>>>
>>> 2. If we want to look for card fraud, we need to be able to use the
>>> full card number to join to transaction data and look up blocked card
>>> lists etc..
>>>
>>> 3. We want to block the direct retrieval of card numbers for
>>> additional security.
>>> In some cases, we might want to return an answer like '**** **** **** 7733'
>>>
>>> We can't do all of the above with current facilities inside the database.
>>
>> Deny access to the underlying tables. Write SQL functions to do 1. and 2.,
>> and grant privileges to the functions, instead. For 3. create views that do
>> the redaction.
> If everything were easy to lock down the approach you suggest is of
> course the best way.
>
> The problem there is that the SQL for (2) changes frequently, so we
> want to give people SQL access.
1. Give people access to development system with "safe" data where they
write their functions

2. once function is working, pass it to auditors

3. deploy and use the function.
> Just not the ability to retrieve data in a usable form.
For an attacker, any access is "in a usable form"; for honest people you
can just provide a view or set-returning function.
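
Something as simple as (column names invented):

CREATE VIEW customer_for_analysts AS
SELECT id,
       '**** **** **** ' || right(stored_card_number::text, 4) AS card_masked
  FROM customer;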

btw, one way to do the "redaction" you suggested above is to write a
special type, which redacts data on output.

You can even make the type output function dependent on the backup role.

Just make sure that users are aware that it is not really a security
feature which protects against attackers.

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OĂœ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-10 Thread Andres Freund
Hi Robert,

On 2014-10-08 16:01:53 -0400, Robert Haas wrote:
> [ comment fixes ]

Thanks, I've incorporated these + a bit more.

Could you otherwise make sense of the explanation and the algorithm?

> +/* yipeyyahee */
> 
> Although this will be clear to individuals with a good command of
> English, I suggest avoiding such usages.

I've removed them with a heavy heart. These are heartfelt emotions from
getting the algorithm to work (:P)

I've attached these fixes + the removal of spinlocks around releaseOK as
follow-up patches. Obviously they'll be merged into the other patch, but
it seems useful to be able to see them separately.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
>From 6885a15cc6f2e193ff575a4463d90ad252d74f5e Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 7 Oct 2014 15:32:34 +0200
Subject: [PATCH 1/4] Convert the PGPROC->lwWaitLink list into a dlist instead
 of open coding it.

Besides being shorter and much easier to read it:

* changes the logic in LWLockRelease() to release all shared lockers
  when waking up any. This can yield some significant performance
  improvements - and the fairness isn't really much worse than before,
  as we always allowed new shared lockers to jump the queue.

* adds a pg_write_barrier() memory barrier in the wakeup paths between
  dequeuing and unsetting ->lwWaiting. That was always required on
  weakly ordered machines, but f4077cda2 made it more urgent.

Author: Andres Freund
---
 src/backend/access/transam/twophase.c |   1 -
 src/backend/storage/lmgr/lwlock.c | 151 +-
 src/backend/storage/lmgr/proc.c   |   2 -
 src/include/storage/lwlock.h  |   5 +-
 src/include/storage/proc.h|   3 +-
 5 files changed, 60 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index d5409a6..6401943 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -389,7 +389,6 @@ MarkAsPreparing(TransactionId xid, const char *gid,
 	proc->roleId = owner;
 	proc->lwWaiting = false;
 	proc->lwWaitMode = 0;
-	proc->lwWaitLink = NULL;
 	proc->waitLock = NULL;
 	proc->waitProcLock = NULL;
 	for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 9fe6855..e6f9158 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -35,6 +35,7 @@
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "replication/slot.h"
+#include "storage/barrier.h"
 #include "storage/ipc.h"
 #include "storage/predicate.h"
 #include "storage/proc.h"
@@ -115,9 +116,9 @@ inline static void
 PRINT_LWDEBUG(const char *where, const LWLock *lock)
 {
 	if (Trace_lwlocks)
-		elog(LOG, "%s(%s %d): excl %d shared %d head %p rOK %d",
+		elog(LOG, "%s(%s %d): excl %d shared %d rOK %d",
 			 where, T_NAME(lock), T_ID(lock),
-			 (int) lock->exclusive, lock->shared, lock->head,
+			 (int) lock->exclusive, lock->shared,
 			 (int) lock->releaseOK);
 }
 
@@ -475,8 +476,7 @@ LWLockInitialize(LWLock *lock, int tranche_id)
 	lock->exclusive = 0;
 	lock->shared = 0;
 	lock->tranche = tranche_id;
-	lock->head = NULL;
-	lock->tail = NULL;
+	dlist_init(&lock->waiters);
 }
 
 
@@ -615,12 +615,7 @@ LWLockAcquireCommon(LWLock *lock, LWLockMode mode, uint64 *valptr, uint64 val)
 
 		proc->lwWaiting = true;
 		proc->lwWaitMode = mode;
-		proc->lwWaitLink = NULL;
-		if (lock->head == NULL)
-			lock->head = proc;
-		else
-			lock->tail->lwWaitLink = proc;
-		lock->tail = proc;
+		dlist_push_tail(&lock->waiters, &proc->lwWaitLink);
 
 		/* Can release the mutex now */
 		SpinLockRelease(&lock->mutex);
@@ -836,12 +831,7 @@ LWLockAcquireOrWait(LWLock *lock, LWLockMode mode)
 
 		proc->lwWaiting = true;
 		proc->lwWaitMode = LW_WAIT_UNTIL_FREE;
-		proc->lwWaitLink = NULL;
-		if (lock->head == NULL)
-			lock->head = proc;
-		else
-			lock->tail->lwWaitLink = proc;
-		lock->tail = proc;
+		dlist_push_head(&lock->waiters, &proc->lwWaitLink);
 
 		/* Can release the mutex now */
 		SpinLockRelease(&lock->mutex);
@@ -997,13 +987,8 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
 		 */
 		proc->lwWaiting = true;
 		proc->lwWaitMode = LW_WAIT_UNTIL_FREE;
-		proc->lwWaitLink = NULL;
-
 		/* waiters are added to the front of the queue */
-		proc->lwWaitLink = lock->head;
-		if (lock->head == NULL)
-			lock->tail = proc;
-		lock->head = proc;
+		dlist_push_head(&lock->waiters, &proc->lwWaitLink);
 
 		/* Can release the mutex now */
 		SpinLockRelease(&lock->mutex);
@@ -1079,9 +1064,10 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
 void
 LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
 {
-	PGPROC	   *head;
-	PGPROC	   *proc;
-	PGPROC	   *next;
+	dlist_head	wakeup;
+	dlist_mutable_iter iter;
+

Re: [HACKERS] Column Redaction

2014-10-10 Thread Stephen Frost
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
> On 10/10/2014 01:35 PM, Stephen Frost wrote:
> >Regarding functions, 'leakproof' functions should be alright to allow,
> >though Heikki brings up a good point regarding binary search being
> >possible in a plpgsql function (or even directly by a client).  Of
> >course, that approach also requires that you have a specific item in
> >mind.
> 
> It doesn't require that you have a specific item in mind. Binary
> search is cheap, O(log n). It's easy to write a function to do a
> binary search on a single item, passed as argument, and then apply
> that to all rows:
> 
> SELECT binary_search_reveal(cardnumber) FROM redacted_table;

Note that your binary_search_reveal wouldn't be marked as leakproof and
therefore this wouldn't be allowed.  If this was allowed, you'd simply
do "raise notice" inside the function and call it a day.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Column Redaction

2014-10-10 Thread Heikki Linnakangas

On 10/10/2014 01:35 PM, Stephen Frost wrote:

Regarding functions, 'leakproof' functions should be alright to allow,
though Heikki brings up a good point regarding binary search being
possible in a plpgsql function (or even directly by a client).  Of
course, that approach also requires that you have a specific item in
mind.


It doesn't require that you have a specific item in mind. Binary search 
is cheap, O(log n). It's easy to write a function to do a binary search 
on a single item, passed as argument, and then apply that to all rows:


SELECT binary_search_reveal(cardnumber) FROM redacted_table;
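
A sketch of such a function, assuming only that < and = still work on the
redacted value (bounds chosen for a 16-digit keyspace):

CREATE FUNCTION binary_search_reveal(card numeric) RETURNS numeric AS $$
DECLARE
    lo  numeric := 0;
    hi  numeric := 9999999999999999;
    mid numeric;
BEGIN
    -- never outputs "card" itself: it reconstructs the value purely from
    -- the results of ~54 comparisons, which output-only redaction can't block
    WHILE lo < hi LOOP
        mid := trunc((lo + hi) / 2);
        IF card <= mid THEN
            hi := mid;
        ELSE
            lo := mid + 1;
        END IF;
    END LOOP;
    RETURN lo;
END;
$$ LANGUAGE plpgsql;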

Really, I don't see how this can possibly be made to work. You can't
allow ad hoc processing of data, and still avoid revealing it to the user.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-10 Thread Stephen Frost
* Thom Brown (t...@linux.com) wrote:
> To be honest, this all sounds rather flaky.  Even if you do rate-limit
> their queries, they can use methods that avoid rate-limiting, such as
> recursive queries.  And if you're only after one credit card number
> (to use the original example), you'd get it in a relatively short
> amount of time, despite some rate-limiting system.

The discussion about looking up specific card numbers in the original
email from Simon was actually an allowed use-case, as I understood it,
not a risk concern.  Indeed, if you know a valid credit card number
already, as in this example, then why are you bothering with the search?
Perhaps it would provide confirmation, but it's not the database's
responsibility to make you forget the number you already have.  Doing a
random walk through a keyspace of 10^16 and extracting a significant
enough number of results to be useful should be difficult.  I agree that
if we're completely unable to make it difficult then this is less
useful, but I feel it's a bit early to jump to that conclusion.

> This gives the vague impression of security, but it really seems just
> the placing of a few obstacles in the way.

One might consider that all security is just placing obstacles in the
way.

> And "auditing" sounds like a euphemism for "pass the problem of
> security on elsewhere anyway".

Auditing is a known requirement for good security.  There are certainly
different levels of it, but if you aren't at least auditing your
security configuration for the attack vectors you're concerned about,
then you're unlikely to have any real security.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Column Redaction

2014-10-10 Thread Stephen Frost
* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
> On 10/10/2014 01:21 PM, Simon Riggs wrote:
> >Redaction is now a feature available in other databases. I guess it's
> >possible it's all smoke and mirrors, but that's why we discuss stuff
> >before we build it.
> 
> I googled for Oracle Data redaction, and found "General Usage guidelines":
> 
> >General Usage Guidelines
> >
> >* Oracle Data Redaction is not intended to protect against attacks by
> >privileged database users who run ad hoc queries directly against the
> >database.
> >
> >* Oracle Data Redaction is not intended to protect against users who
> >run exhaustive SQL queries that attempt to determine the actual
> >values by inference.
> 
> So it's not actually suitable for the example you gave. I don't
> think we want this feature...

Or, we need to consider how Oracle addresses these risks and consider if
we can provide a similar capability.  Those capabilities may include
specific configuration and could be a prerequisite for this feature, but
I don't think it's sensible to say we don't want this feature simply
because it can't stand alone as a perfect answer to these risks.

As has been discussed before, we are likely in a better position to
identify the concerns and problem areas, come up with recommendations
for configuration and/or develop new capabilities to mitigate those
risks, than the every-day user or DBA.  If we provide it and address
these issues in a central location which is generally available, then
problems can be identified and fixed centrally, rather than every
database deployment faced with these concerns having to address
them independently with, most likely, poorer-quality solutions.

While we don't want every feature of every database, this deserves more
consideration.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-10-10 Thread Andres Freund
On 2014-10-08 20:07:35 -0400, Robert Haas wrote:
> On Wed, Oct 8, 2014 at 2:04 PM, Andres Freund  wrote:
> > So, what makes it work for me (among other unrelated stuff) seems to be
> > the following in .gdbinit, defineing away some things that gdb doesn't
> > handle:
> > macro define __builtin_offsetof(T, F) ((int) &(((T *) 0)->F))
> > macro define __extension__
> > macro define AssertVariableIsOfTypeMacro(x, y) ((void)0)
> >
> > Additionally I have "-ggdb -g3" in CFLAGS. That way gdb knows about
> > postgres' macros. At least if you're in the right scope.
> >
> > As an example, the following works:
> > (gdb) p dlist_is_empty(&BackendList) ? NULL : dlist_head_element(Backend, elem, &BackendList)
> 
> Ah, cool.  I'll try that.

If that works for you, should we put it somewhere in the docs? If so,
where?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-10 Thread Thom Brown
On 10 October 2014 11:35, Stephen Frost  wrote:
> Simon,
>
> * Simon Riggs (si...@2ndquadrant.com) wrote:
>> The requirement for redaction cannot be provided by a view.
>>
>> A view provides a single value for each column, no matter whether it
>> is used in SELECT or WHERE clause.
>>
>> Redaction requires output formatting only, but unchanged for other purposes.
>>
>> Redaction is now a feature available in other databases. I guess it's
>> possible it's all smoke and mirrors, but that's why we discuss stuff
>> before we build it.
>
> In general, I'm on-board with the idea and similar requests have come
> from users I've talked with.
>
> Is there any additional information available on how these other
> databases deal with the questions and concerns which have been raised?
>
> Regarding functions, 'leakproof' functions should be alright to allow,
> though Heikki brings up a good point regarding binary search being
> possible in a plpgsql function (or even directly by a client).  Of
> course, that approach also requires that you have a specific item in
> mind.  Methods to mitigate would include not allowing regular users to
> create functions or run DO blocks and rate-limiting their queries, along
> with appropriate auditing.

To be honest, this all sounds rather flaky.  Even if you do rate-limit
their queries, they can use methods that avoid rate-limiting, such as
recursive queries.  And if you're only after one credit card number
(to use the original example), you'd get it in a relatively short
amount of time, despite some rate-limiting system.

This gives the vague impression of security, but it really seems just
the placing of a few obstacles in the way.

And "auditing" sounds like a euphemism for "pass the problem of
security on elsewhere anyway".

Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-10 Thread Pavel Stehule
Hi

2014-10-10 10:57 GMT+02:00 Simon Riggs :

> Postgres currently supports column level SELECT privileges.
>
> 1. If we want to confirm a credit card number, we can issue SELECT 1
> FROM customer WHERE stored_card_number = '1234 5678 5344 7733'
>
> 2. If we want to look for card fraud, we need to be able to use the
> full card number to join to transaction data and look up blocked card
> lists etc..
>
> 3. We want to block the direct retrieval of card numbers for
> additional security.
> In some cases, we might want to return an answer like '**** **** **** 7733'
>
> We can't do all of the above with current facilities inside the database.
>
> The ability to mask output for data in certain cases, for the purpose
> of security, is known lately as data redaction, or column-level data
> redaction.
>
> The best way to support this requirement would be to allow columns to
> have an additional "output formatting function". This would be
> executed only when data is about to be returned by a query. All other
> uses of the column would see the data unrestricted.
>
> This would have other uses as well, such as default report formats, so
> we can store financial amounts as NUMERIC, but format them on
> retrieval as $12,345.78 etc..
>
> Suggested user interface would be...
>FORMAT functionname(parameters, if any)
>
> e.g.
> CREATE TABLE customer
> ( id ...
> ...
> , stored_card_number  NUMERIC FORMAT pci_card_number_redaction()
> ...
> );
>
> We'd need to implement something to allow pg_dump to ignore format
> functions. I suggest the best way to do that is by providing a BACKUP
> role that can be delegated to other users. We would then allow a
> parameter for SET output_formatting = on | off, which can only be set
> by superuser and BACKUP role, then have pg_dump issue SET
> output_formatting = off explicitly when it runs.
>
>
I see a benefit of this feature as an alternative output function .. I
remember a talk about the output format of boolean functions. But how can
this feature help with security?

You would have to disallow any expression over a column marked this way,
or you would have to enforce the alternative output function early.

When you require an alternative output format function (which should be
implemented in C), it is not much less work than implementing a new
type. So it would probably be much more practical if any expression could
be used, like

 stored_card_number NUMERIC FORMAT (right(stored_card_number::text, 4))
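
(Today you can of course get such formatting per query, e.g.

 SELECT to_char(amount, 'FM"$"99,999,990.00') FROM orders;  -- '$12,345.78'

with invented names; the FORMAT clause would just attach an expression
like that to the column once.)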

Regards

Pavel


> Do we want redaction in PostgreSQL?
> Do we want it generalised into output format functions?
>
> --
>  Simon Riggs   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] Column Redaction

2014-10-10 Thread Stephen Frost
Simon,

* Simon Riggs (si...@2ndquadrant.com) wrote:
> The requirement for redaction cannot be provided by a view.
> 
> A view provides a single value for each column, no matter whether it
> is used in SELECT or WHERE clause.
> 
> Redaction requires output formatting only, but unchanged for other purposes.
> 
> Redaction is now a feature available in other databases. I guess it's
> possible it's all smoke and mirrors, but that's why we discuss stuff
> before we build it.

In general, I'm on-board with the idea and similar requests have come
from users I've talked with.

Is there any additional information available on how these other
databases deal with the questions and concerns which have been raised?

Regarding functions, 'leakproof' functions should be alright to allow,
though Heikki brings up a good point regarding binary search being
possible in a plpgsql function (or even directly by a client).  Of
course, that approach also requires that you have a specific item in
mind.  Methods to mitigate would include not allowing regular users to
create functions or run DO blocks and rate-limiting their queries, along
with appropriate auditing. 

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Column Redaction

2014-10-10 Thread Heikki Linnakangas

On 10/10/2014 01:21 PM, Simon Riggs wrote:

Redaction is now a feature available in other databases. I guess it's
possible it's all smoke and mirrors, but that's why we discuss stuff
before we build it.


I googled for Oracle Data redaction, and found "General Usage guidelines":


General Usage Guidelines

* Oracle Data Redaction is not intended to protect against attacks by
privileged database users who run ad hoc queries directly against the
database.

* Oracle Data Redaction is not intended to protect against users who
run exhaustive SQL queries that attempt to determine the actual
values by inference.


So it's not actually suitable for the example you gave. I don't think we 
want this feature...


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-10 Thread Simon Riggs
On 10 October 2014 11:08, Damian Wolgast  wrote:
>
>> The problem there is that the SQL for (2) changes frequently, so we
>> want to give people SQL access.
>
> So you want to give people access to your SQL database and worry that they 
> could see specific information (credit card numbers) in plain and therefore 
> you want to format it, so that people cannot see the real data. Is that 
> correct?
>
> I'd either do that by only letting them access a view or be reconsidering if 
> it is really a good idea to give them SQL access to the server as they could 
> do other things which e.g. could slow down the server enormously.
> Never trust the user. So I see what you want to achieve but I am not sure if 
> the reason to do that is good. Can you explain please?
> Maybe you should provide them an interface (e.g. web app) that restricts 
> access to certain functions and cares about formatting.

The requirement for redaction cannot be provided by a view.

A view provides a single value for each column, no matter whether it
is used in SELECT or WHERE clause.

Redaction requires output formatting only, but unchanged for other purposes.

Redaction is now a feature available in other databases. I guess it's
possible it's all smoke and mirrors, but that's why we discuss stuff
before we build it.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling shared buffer eviction

2014-10-10 Thread Andres Freund
On 2014-10-10 12:28:13 +0530, Amit Kapila wrote:
> On Fri, Oct 10, 2014 at 1:08 AM, Andres Freund 
> wrote:
> > On 2014-10-09 16:01:55 +0200, Andres Freund wrote:
> > >
> > > I don't think OLTP really is the best test case for this. Especially not
> > > pgbench with relatively small rows *and* a uniform distribution of
> > > access.
> > >
> > > Try parallel COPY TO. Batch write loads are where I've seen this hurt
> > > badly.
> >
> >
> > just by switching shared_buffers from 1 to 8GB. I haven't tried, but I
> > hope that with an approach like yours this might become better.
> >
> > psql -f /tmp/prepare.sql
> > pgbench -P5 -n -f /tmp/copy.sql -c 8 -j 8 -T 100
> 
> Thanks for providing the scripts.  You haven't specified how much data
> is present in the 'large' file used in the tests.

I don't think it matters much. It should be small enough that you get a
couple TPS over all backends.
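
For reference, the scripts boil down to roughly this (my reconstruction;
the actual files weren't reposted here):

-- prepare.sql
DROP TABLE IF EXISTS copytest;
CREATE TABLE copytest(data text);

-- copy.sql, run by each pgbench client in a loop
COPY copytest FROM '/tmp/large';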

> I have tried with different sets of
> rows, but I could not see the dip that is present in your data when you
> increased shared_buffers from 1GB to 8GB; also, I don't see any difference
> with the patch.

Interesting. I wonder whether that's because the concurrency wasn't high
enough for that machine to show the problem. I ran the test on my
workstation which has 8 actual cores.

> BTW, why do you think this patch can be helpful for such workloads?
> According to my understanding, it can be helpful mainly for
> read-mostly workloads where all the data doesn't fit in shared buffers.

The performance dip comes from all the backends performing the clock
sweep. As the access is pretty uniform, all the buffers start with some
usage count (IIRC 3 in this example; much worse if 5). Due to the
uniform usage count, the clock sweep frequently has to go several times
through *all* the buffers. That leads to quite horrible performance in
some cases.
I had hoped that bgreclaimer could make that workload less horrible by
funneling most of the accesses through the freelist.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-10 Thread Damian Wolgast

> The problem there is that the SQL for (2) changes frequently, so we
> want to give people SQL access.

So you want to give people access to your SQL database and worry that they
could see specific information (credit card numbers) in plain text, and therefore
you want to format it so that people cannot see the real data. Is that correct?

I'd either do that by only letting them access a view, or reconsider whether it
is really a good idea to give them SQL access to the server, as they could do
other things which could, e.g., slow down the server enormously.
Never trust the user. So I see what you want to achieve but I am not sure if 
the reason to do that is good. Can you explain please?
Maybe you should provide them an interface (e.g. web app) that restricts access 
to certain functions and cares about formatting.

Regards
Damian Wolgast (irc:asymetrixs)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-10 Thread Simon Riggs
On 10 October 2014 10:15, Thom Brown  wrote:
> On 10 October 2014 09:57, Simon Riggs  wrote:
>> Postgres currently supports column level SELECT privileges.
>>
>> 1. If we want to confirm a credit card number, we can issue SELECT 1
>> FROM customer WHERE stored_card_number = '1234 5678 5344 7733'
>>
>> 2. If we want to look for card fraud, we need to be able to use the
>> full card number to join to transaction data and look up blocked card
>> lists etc..
>>
>> 3. We want to block the direct retrieval of card numbers for
>> additional security.
>> In some cases, we might want to return an answer like '**** **** **** 7733'
>
> One question that immediately springs to mind is: would the format
> apply when passing columns to other functions?  If not, wouldn't
> something like
>
> SELECT upper(redacted_column::text) ...
>
> just bypass the formatting?

Yes, it would. As would SELECT redacted_column || ' '

I'm not sure how to block such usage, other than to apply the formatting
prior to the final evaluation of functions.

i.e. we apply it in the SELECT list, but not in the other clauses:
FROM, ON, WHERE, GROUP BY, ORDER BY, HAVING, etc.
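
Under that rule the same column would behave differently by clause,
along these lines (hypothetical behaviour; nothing like this exists
today):

  SELECT stored_card_number    -- redacted: '**** **** **** 7733'
  FROM customer
  WHERE stored_card_number = '1234 5678 5344 7733';
                               -- raw value still usable for matching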



> Also, how would casting be handled?  Would it be forbidden for such cases?
>
>
> And couldn't the card number be worked out using:
>
> SELECT 1 FROM customer WHERE stored_card_number LIKE '%1 7733';
>  ?column?
> --
> (0 rows)
>
> SELECT 1 FROM customer WHERE stored_card_number LIKE '%2 7733';
>  ?column?
> --
> 1
> (1 row)
>
> SELECT 1 FROM customer WHERE stored_card_number LIKE '%12 7733';
>  ?column?
> --
> (0 rows)
>
> .. and so on, which could be scripted in a DO statement?
>
>
> Not so much a challenge to the idea, but just wishing to understand
> how it would work.

Yes, covert channels would always exist. It would really be down to
auditing to control such exploits.

Redaction is aimed at minimising access in normal usage.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-10 Thread Simon Riggs
On 10 October 2014 10:29, Heikki Linnakangas  wrote:
> On 10/10/2014 11:57 AM, Simon Riggs wrote:
>>
>> Postgres currently supports column level SELECT privileges.
>>
>> 1. If we want to confirm a credit card number, we can issue SELECT 1
>> FROM customer WHERE stored_card_number = '1234 5678 5344 7733'
>>
>> 2. If we want to look for card fraud, we need to be able to use the
>> full card number to join to transaction data and look up blocked card
>> lists etc..
>>
>> 3. We want to block the direct retrieval of card numbers for
>> additional security.
>> In some cases, we might want to return an answer like
>> '**** **** **** 7733'
>>
>> We can't do all of the above with current facilities inside the database.
>
>
> Deny access to the underlying tables. Write SQL functions to do 1. and 2.,
> and grant privileges on the functions instead. For 3., create views that do
> the redaction.

If everything were easy to lock down, the approach you suggest would of
course be the best way.

The problem there is that the SQL for (2) changes frequently, so we
want to give people SQL access.

Just not the ability to retrieve data in a usable form.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-10 Thread Damian Wolgast


>
>This would have other uses as well, such as default report formats, so
>we can store financial amounts as NUMERIC, but format them on
>retrieval as $12,345.78 etc..

Nice idea, but what if you need to do further calculations?
If you output the value of credit card transactions it works fine, but in
case you want to SUM up the values, you need to cast the value back from
text(?) to numeric, calculate, and cast it to text(?) again.
And if you do - for any reason - need the credit card number in your
application (for example, sending it to the credit card company to deduct
money), how can you retrieve its original value?

Moreover, what if you SELECT from a sub-SELECT which already has the
formatted information and not the plain data?

Maybe you should restrict access to tables for a certain user and only
allow the user to use a view which formats the output.

Modern applications do have a presentation layer which should take care of
data formatting. I am not sure if it is a good idea to mix data storage
and data presentation in the database.


Regards,
Damian Wolgast (irc:asymetrixs)



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] alter user set local_preload_libraries.

2014-10-10 Thread Fujii Masao
On Fri, Oct 10, 2014 at 5:36 PM, Kyotaro HORIGUCHI
 wrote:
> Hello, I overlooked this thread.
>
>> On Mon, Sep 15, 2014 at 1:33 AM, Tom Lane  wrote:
>> > Peter Eisentraut  writes:
>> >> On 9/1/14 7:51 AM, Kyotaro HORIGUCHI wrote:
>> >>> The attached patch simply changes the context for local_... to
>> >>> PGC_USERSET and edits the doc.
>> >
>> >> I had this ready to commit, but then
>> >
>> >> Invent PGC_SU_BACKEND and mark log_connections/log_disconnections
>> >> that way.
>> >
>> >> was committed in the meantime.
>> >
>> >> Does this affect what we should do with this change?
>> >
>> >> I guess one thing to look into would be whether we could leave
>> >> local_preload_libraries as PGC_BACKEND and change
>> >> session_preload_libraries to PGC_SU_BACKEND, and then investigate
>> >> whether we could allow settings made with ALTER ROLE / SET to change
>> >> PGC_BACKEND settings.
>> >
>> > Yeah, I was wondering about that while I was making the other commit.
>> > I did not touch those variables at the time, but it would make sense
>> > to restrict them as you suggest.
>>
>> +1
>>
>> Also I think that it's useful to allow ALTER ROLE/DATABASE SET to
>> set PGC_BACKEND and PGC_SU_BACKEND parameters. So, what
>> about applying the attached patch? This patch allows that and
>> changes the context of session_preload_libraries to PGC_SU_BACKEND.
>
> It's not my business to decide whether to apply it, but I don't see any
> obvious problem in it so far.
>
> By the way, I became unable to log in at all after wrongly setting
> *_preload_libraries for all available users. Are there any relief
> measures for the situation? I think it's okay even if there's no
> way to log in again, but I want to know if there are any.

Yep, that's a problem. Even in that case you can still log in to the
server by, for example, executing the following commands:

$ export PGOPTIONS="-c *_preload_libraries="
$ psql

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-10 Thread Heikki Linnakangas

On 10/10/2014 11:57 AM, Simon Riggs wrote:

Postgres currently supports column level SELECT privileges.

1. If we want to confirm a credit card number, we can issue SELECT 1
FROM customer WHERE stored_card_number = '1234 5678 5344 7733'

2. If we want to look for card fraud, we need to be able to use the
full card number to join to transaction data and look up blocked card
lists etc..

3. We want to block the direct retrieval of card numbers for
additional security.
In some cases, we might want to return an answer like '**** **** **** 7733'

We can't do all of the above with current facilities inside the database.


Deny access to the underlying tables. Write SQL functions to do 1. and
2., and grant privileges on the functions instead. For 3., create views
that do the redaction.
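
A minimal sketch of that approach (the role and function names here are
invented for illustration):

  REVOKE ALL ON customer FROM analyst;

  -- Use case (1): confirm a card number without exposing any others.
  CREATE FUNCTION card_exists(card text) RETURNS boolean AS $$
      SELECT EXISTS (SELECT 1 FROM customer
                     WHERE stored_card_number = card);
  $$ LANGUAGE sql SECURITY DEFINER;  -- runs with the owner's privileges

  GRANT EXECUTE ON FUNCTION card_exists(text) TO analyst;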



The ability to mask output for data in certain cases, for the purpose
of security, is known lately as data redaction, or column-level data
redaction.

The best way to support this requirement would be to allow columns to
have an additional "output formatting function". This would be
executed only when data is about to be returned by a query. All other
uses of the column would not be restricted.


I don't see how that could work. Once you have access to the datum, you 
can find its value in many indirect ways, without invoking the output 
function. For example, write a PL/pgSQL function that takes the card 
number as argument. Use < and > to binary search its value. If you block 
< and >, I'm sure there are countless other ways.
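
For instance, even with only comparison operators available, something
along these lines would recover the smallest stored card number in
roughly 50 probes (illustrative only):

  CREATE FUNCTION leak_min_card() RETURNS numeric AS $$
  DECLARE
      lo  numeric := 0;
      hi  numeric := 9999999999999999;  -- 16 digits
      mid numeric;
  BEGIN
      WHILE lo < hi LOOP
          mid := trunc((lo + hi) / 2);
          IF EXISTS (SELECT 1 FROM customer
                     WHERE stored_card_number <= mid) THEN
              hi := mid;        -- some card is <= mid
          ELSE
              lo := mid + 1;
          END IF;
      END LOOP;
      RETURN lo;  -- the smallest card number, never output directly
  END;
  $$ LANGUAGE plpgsql;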


And messing with output functions seems pretty, well, messy, in general.

I think the only solution that's going to work in practice is to 
implement the redaction at a higher level. Don't allow direct access to 
the tables with card numbers. Create functions that do whatever joins, 
etc. you need to do with them, and grant privileges only on the functions.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-10 Thread Thom Brown
On 10 October 2014 09:57, Simon Riggs  wrote:
> Postgres currently supports column level SELECT privileges.
>
> 1. If we want to confirm a credit card number, we can issue SELECT 1
> FROM customer WHERE stored_card_number = '1234 5678 5344 7733'
>
> 2. If we want to look for card fraud, we need to be able to use the
> full card number to join to transaction data and look up blocked card
> lists etc..
>
> 3. We want to block the direct retrieval of card numbers for
> additional security.
> In some cases, we might want to return an answer like '**** **** **** 7733'

One question that immediately springs to mind is: would the format
apply when passing columns to other functions?  If not, wouldn't
something like

SELECT upper(redacted_column::text) ...

just bypass the formatting?


Also, how would casting be handled?  Would it be forbidden for such cases?


And couldn't the card number be worked out using:

SELECT 1 FROM customer WHERE stored_card_number LIKE '%1 7733';
 ?column?
--
(0 rows)

SELECT 1 FROM customer WHERE stored_card_number LIKE '%2 7733';
 ?column?
--
1
(1 row)

SELECT 1 FROM customer WHERE stored_card_number LIKE '%12 7733';
 ?column?
--
(0 rows)

.. and so on, which could be scripted in a DO statement?
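
(For completeness, such a probe might look roughly like this, assuming
the redacted column still participates in LIKE; purely illustrative:)

  DO $$
  DECLARE
      known text := '7733';   -- the visible suffix
      c     text;
      hit   boolean;
  BEGIN
      LOOP
          hit := false;
          FOREACH c IN ARRAY
              ARRAY['0','1','2','3','4','5','6','7','8','9',' ']
          LOOP
              IF EXISTS (SELECT 1 FROM customer
                         WHERE stored_card_number LIKE '%' || c || known)
              THEN
                  known := c || known;
                  hit := true;
                  EXIT;
              END IF;
          END LOOP;
          EXIT WHEN NOT hit;  -- no character extends the match: done
      END LOOP;
      RAISE NOTICE 'recovered: %', known;
  END $$;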


Not so much a challenge to the idea, but just wishing to understand
how it would work.

-- 
Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-10 Thread Dave Page
On Fri, Oct 10, 2014 at 9:57 AM, Simon Riggs  wrote:
> Postgres currently supports column level SELECT privileges.
>
> 1. If we want to confirm a credit card number, we can issue SELECT 1
> FROM customer WHERE stored_card_number = '1234 5678 5344 7733'
>
> 2. If we want to look for card fraud, we need to be able to use the
> full card number to join to transaction data and look up blocked card
> lists etc..
>
> 3. We want to block the direct retrieval of card numbers for
> additional security.
> In some cases, we might want to return an answer like '**** **** **** 7733'
>
> We can't do all of the above with current facilities inside the database.
>
> The ability to mask output for data in certain cases, for the purpose
> of security, is known lately as data redaction, or column-level data
> redaction.
>
> The best way to support this requirement would be to allow columns to
> have an additional "output formatting function". This would be
> executed only when data is about to be returned by a query. All other
> uses of the column would not be restricted.
>
> This would have other uses as well, such as default report formats, so
> we can store financial amounts as NUMERIC, but format them on
> retrieval as $12,345.78 etc..
>
> Suggested user interface would be...
>FORMAT functionname(parameters, if any)
>
> e.g.
> CREATE TABLE customer
> ( id ...
> ...
> , stored_card_number  NUMERIC FORMAT pci_card_number_redaction()
> ...
> );

I like that idea a lot - could be very useful (it reminds me of my Pick days).

> We'd need to implement something to allow pg_dump to ignore format
> functions. I suggest the best way to do that is by providing a BACKUP
> role that can be delegated to other users. We would then allow a
> parameter for SET output_formatting = on | off, which can only be set
> by superuser and BACKUP role, then have pg_dump issue SET
> output_formatting = off explicitly when it runs.

That seems like a reasonable approach. I can imagine other uses for a
BACKUP role in the future.

> Do we want redaction in PostgreSQL?

+1

> Do we want it generalised into output format functions?

+1

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Windows,PATCH] Use faster, higher precision timer API

2014-10-10 Thread Craig Ringer
On 09/17/2014 08:27 PM, Craig Ringer wrote:
> Hi all
> 
> Attached is a patch to switch 9.5 over to using the
> GetSystemTimeAsFileTime call instead of separate GetSystemTime and
> SystemTimeToFileTime calls.

Following on from my prior patch that switches to using
GetSystemTimeAsFileTime, I now attach a two-patch series that also adds
support for GetSystemTimePreciseAsFileTime where it is available.

Details in the patch commit messages.

These should apply fine with "git am", or at worst "git am --3way".
Otherwise you can just grab them from the working tree:

https://github.com/ringerc/postgres/tree/win32_use_filetime

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
>From 7f4891ee288a6b36828ccf23f63396d35925311e Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Thu, 18 Sep 2014 23:02:14 +0800
Subject: [PATCH 2/2] Use GetSystemTimePreciseAsFileTime when available

This will cause PostgreSQL on Windows 8 or Windows Server 2012 to
obtain high-resolution timestamps while allowing the same binaries
to run without problems on older releases.
---
 src/backend/main/main.c |  6 ++
 src/include/port.h  |  2 ++
 src/port/gettimeofday.c | 54 +++--
 3 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index c1116e0..6ddf147 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -259,6 +259,12 @@ startup_hacks(const char *progname)
 
 		/* In case of general protection fault, don't show GUI popup box */
 		SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);
+
+#ifndef HAVE_GETTIMEOFDAY
+		/* Figure out which syscall to use to capture timestamp information */
+		init_win32_gettimeofday();
+#endif
+
 	}
 #endif   /* WIN32 */
 
diff --git a/src/include/port.h b/src/include/port.h
index 9f8465e..4f8af0a 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -328,6 +328,8 @@ extern FILE *pgwin32_popen(const char *command, const char *type);
 #ifndef HAVE_GETTIMEOFDAY
 /* Last parameter not used */
 extern int	gettimeofday(struct timeval * tp, struct timezone * tzp);
+/* On Windows we need to do some setup at process start for accurate timing */
+extern void init_win32_gettimeofday(void);
 #endif
 #else			/* !WIN32 */
 
diff --git a/src/port/gettimeofday.c b/src/port/gettimeofday.c
index 73ec406..63c3a79 100644
--- a/src/port/gettimeofday.c
+++ b/src/port/gettimeofday.c
@@ -30,14 +30,64 @@
 
 #include 
 
+#ifndef FRONTEND
+#include 
+#endif
+
 
 /* FILETIME of Jan 1 1970 00:00:00. */
static const unsigned __int64 epoch = UINT64CONST(116444736000000000);
 
 /*
+ * Both GetSystemTimeAsFileTime and GetSystemTimePreciseAsFileTime share a
+ * signature, so we can just store a pointer to whichever we find. This
+ * is the pointer's type.
+ */
+typedef VOID (WINAPI *PgGetSystemTimeFn)(LPFILETIME);
+/* Storage for the function we pick at runtime */
+static PgGetSystemTimeFn pg_get_system_time = NULL;
+
+/*
+ * During backend startup, determine if GetSystemTimePreciseAsFileTime is
+ * available and use it; if not, fall back to GetSystemTimeAsFileTime.
+ */
+void
+init_win32_gettimeofday(void)
+{
+	/*
+	 * Because it's guaranteed that kernel32.dll will be linked into our
+	 * address space already, we don't need to LoadLibrary it and worry about
+	 * closing it afterwards, so we're not using Pg's dlopen/dlsym() wrapper.
+	 *
+	 * We'll just look up the address of GetSystemTimePreciseAsFileTime if
+	 * present.
+	 *
+	 * While we could look up the Windows version and skip this on Windows
+	 * versions below Windows 8 / Windows Server 2012, there isn't much
+	 * point, and determining the Windows version is itself somewhat
+	 * Windows-version and development-SDK specific...
+	 */
+	pg_get_system_time = (PgGetSystemTimeFn) GetProcAddress(
+			GetModuleHandle(TEXT("kernel32.dll")),
+			"GetSystemTimePreciseAsFileTime");
+	if (pg_get_system_time == NULL)
+	{
+		DWORD errcode = GetLastError();
+#ifndef FRONTEND
+		if (errcode != ERROR_PROC_NOT_FOUND)
+		{
+			elog(DEBUG1, "GetProcAddress(\"GetSystemTimePreciseAsFileTime\") on kernel32.dll failed with error code %lu, not the expected ERROR_PROC_NOT_FOUND (127)", (unsigned long) errcode);
+		}
+#endif
+		pg_get_system_time = &GetSystemTimeAsFileTime;
+	}
+
+}
+
+/*
  * timezone information is stored outside the kernel so tzp isn't used anymore.
  *
- * Note: this function is not for Win32 high precision timing purpose. See
+ * Note: this function is not for Win32 high precision timing purposes. See
  * elapsed_time().
  */
 int
@@ -46,7 +96,7 @@ gettimeofday(struct timeval * tp, struct timezone * tzp)
 	FILETIME	file_time;
 	ULARGE_INTEGER ularge;
 
-	GetSystemTimeAsFileTime(&file_time);
+	(*pg_get_system_time)(&file_time);
 	ularge.LowPart = file_time.dwLowDateTime;
 	ularge.HighPart = file_time.dwHighDateTime;
 
-- 
1.9.3

>From 337f1e4894bf7071c6d0c24a77c433a0b9e0cc81 Mon Sep 17 00:00:00 20

[HACKERS] Column Redaction

2014-10-10 Thread Simon Riggs
Postgres currently supports column level SELECT privileges.

1. If we want to confirm a credit card number, we can issue SELECT 1
FROM customer WHERE stored_card_number = '1234 5678 5344 7733'

2. If we want to look for card fraud, we need to be able to use the
full card number to join to transaction data and look up blocked card
lists etc..

3. We want to block the direct retrieval of card numbers for
additional security.
In some cases, we might want to return an answer like '**** **** **** 7733'

We can't do all of the above with current facilities inside the database.

The ability to mask output for data in certain cases, for the purpose
of security, is known lately as data redaction, or column-level data
redaction.

The best way to support this requirement would be to allow columns to
have an additional "output formatting function". This would be
executed only when data is about to be returned by a query. All other
uses of the column would not be restricted.

This would have other uses as well, such as default report formats, so
we can store financial amounts as NUMERIC, but format them on
retrieval as $12,345.78 etc..

Suggested user interface would be...
   FORMAT functionname(parameters, if any)

e.g.
CREATE TABLE customer
( id ...
...
, stored_card_number  NUMERIC FORMAT pci_card_number_redaction()
...
);
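
where the redaction function itself might be as simple as the following
(illustrative only; the FORMAT clause above is proposed syntax, and
exactly how the column value would be passed to the function is not yet
specified):

  CREATE FUNCTION pci_card_number_redaction(card numeric)
  RETURNS text AS $$
      SELECT '**** **** **** ' || right(card::text, 4);
  $$ LANGUAGE sql IMMUTABLE;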

We'd need to implement something to allow pg_dump to ignore format
functions. I suggest the best way to do that is by providing a BACKUP
role that can be delegated to other users. We would then allow a
parameter for SET output_formatting = on | off, which can only be set
by superuser and BACKUP role, then have pg_dump issue SET
output_formatting = off explicitly when it runs.

Do we want redaction in PostgreSQL?
Do we want it generalised into output format functions?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] alter user set local_preload_libraries.

2014-10-10 Thread Kyotaro HORIGUCHI
Hello, I overlooked this thread.

> On Mon, Sep 15, 2014 at 1:33 AM, Tom Lane  wrote:
> > Peter Eisentraut  writes:
> >> On 9/1/14 7:51 AM, Kyotaro HORIGUCHI wrote:
> >>> The attached patch simply changes the context for local_... to
> >>> PGC_USERSET and edits the doc.
> >
> >> I had this ready to commit, but then
> >
> >> Invent PGC_SU_BACKEND and mark log_connections/log_disconnections
> >> that way.
> >
> >> was committed in the meantime.
> >
> >> Does this affect what we should do with this change?
> >
> >> I guess one thing to look into would be whether we could leave
> >> local_preload_libraries as PGC_BACKEND and change
> >> session_preload_libraries to PGC_SU_BACKEND, and then investigate
> >> whether we could allow settings made with ALTER ROLE / SET to change
> >> PGC_BACKEND settings.
> >
> > Yeah, I was wondering about that while I was making the other commit.
> > I did not touch those variables at the time, but it would make sense
> > to restrict them as you suggest.
> 
> +1
> 
> Also I think that it's useful to allow ALTER ROLE/DATABASE SET to
> set PGC_BACKEND and PGC_SU_BACKEND parameters. So, what
> about applying the attached patch? This patch allows that and
> changes the context of session_preload_libraries to PGC_SU_BACKEND.

It's not my business to decide whether to apply it, but I don't see any
obvious problem in it so far.

By the way, I became unable to log in at all after wrongly setting
*_preload_libraries for all available users. Are there any relief
measures for the situation? I think it's okay even if there's no
way to log in again, but I want to know if there are any.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog --status-interval add fsync feedback

2014-10-10 Thread Fujii Masao
On Thu, Oct 9, 2014 at 6:42 PM,   wrote:
>> >>> If we remove --fsync-interval, response time to the user will not
>> >>> be delayed, although fsync will be executed multiple times in a
>> >>> short period. And if there is no way to solve that problem without
>> >>> --fsync-interval, what should we do about it?
>> >>
>> >> I'm sorry, I didn't understand that.
>> >
>> > Here is an example.
>> > When WAL is sent at 100ms intervals, fsync() is executed 10 times per
>> > second.
>> > If --fsync-interval is set to 1 second, we have to wait up to 1 second
>> > for the SQL response (i.e. for making a WAL record), though fsync()
>> > won't be executed several times within 1 second.
>> > I think --fsync-interval meets the demands of people who want to
>> > prevent fsync() from happening several times in a short period, but
>> > what do you think?
>> > Is it OK to delete --fsync-interval?
>>
>> I still don't see the problem.
>>
>> In synchronous mode, pg_receivexlog should have logic similar to what
>> walreceiver does. It should read as much WAL from the socket as it can
>> without blocking, and fsync() and send a reply after that. It should
>> also fsync whenever switching to a new segment. If the master sends WAL
>> every 100ms, then pg_receivexlog will indeed fsync() 10 times per
>> second. There's nothing wrong with that.
>>
>> In asynchronous mode, only fsync whenever switching to new segment.
>>
>> >> Yeah. Or rather, add a new message type, to indicate the
>> >> synchronous/asynchronous status.
>> >
>> > What kind of 'message type' we have to add ?
>> >
>> > Do we need to separate 'w' into two types ? synchronous and
>> asynchronous ?
>> >
>> > OR
>> >
>> > Add a new message type, kind of 'notify synchronous', and inform
>> > pg_receivexlog of synchronous status when it connect to the server.
>>
>> Better to add a new "notify" message type. And pg_receivexlog should be
>> prepared to receive it at any time. The status might change on the fly,
>> if the server's configuration is reloaded.
>
> Thanks for the reply.
>
>> In synchronous mode, pg_receivexlog should have logic similar to what
>> walreceiver does.
>
> OK. I understand that removing --fsync-interval is not a problem.

+1 for adding something like a --synchronous option. To me, it sounds
more like a walreceiver-compatible mode than a synchronous one.

>> Better to add a new "notify" message type. And pg_receivexlog should be
>> prepared to receive it at any time. The status might change on the fly, if 
>> the server's configuration is reloaded.
>
> OK. I'll consider it.

I don't think that's a good idea because it prevents us from using
pg_receivexlog as an async walreceiver (i.e., received WAL data is
fsynced and feedback is sent back to the server immediately,
but transaction commit in the server doesn't wait for the feedback).
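
(For reference, the synchronous-mode loop described upthread would be
shaped roughly like this; the helper names and the WalBuf type are
invented for the sketch, not pg_receivexlog's real API:)

  /* Sketch only: these helpers are invented names, not the real API. */
  extern bool read_wal_nonblocking(PGconn *conn, WalBuf *buf);
  extern void write_wal_buffer(const WalBuf *buf);
  extern void flush_walfile(void);          /* fsync() the .partial file */
  extern void send_feedback(PGconn *conn);  /* report the flush position */
  extern void wait_for_socket_readable(PGconn *conn);

  static void
  sync_receive_loop(PGconn *conn)
  {
      WalBuf buf;

      for (;;)
      {
          bool received = false;

          /* Drain everything already sent, without blocking. */
          while (read_wal_nonblocking(conn, &buf))
          {
              write_wal_buffer(&buf);
              received = true;
          }

          if (received)
          {
              flush_walfile();      /* one fsync() per batch, not per message */
              send_feedback(conn);  /* reply only after the flush */
          }

          wait_for_socket_readable(conn);
      }
  }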

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] alter user/role CURRENT_USER

2014-10-10 Thread Kyotaro HORIGUCHI
Hello, while considering ALTER ROLE SET ..., I found that
ALTER ROLE/USER cannot take CURRENT_USER as the role name.

In addition, the documentation for ALTER ROLE and ALTER USER is
inconsistent between the two, even though ALTER USER is said to be
an alias for ALTER ROLE. Plus, ALTER USER cannot take ALL as a user
name although ALTER ROLE can.

This patch does the following things:

 - ALTER USER/ROLE now takes CURRENT_USER as user name.

 - Rewrite the synopses in the documents for ALTER USER and ALTER
   ROLE so that they have exactly the same syntax.

 - Modify the syntax of ALTER USER so that it is an alias of ALTER ROLE.

   - Added CURRENT_USER/CURRENT_ROLE syntax to both.
   - Added ALL as a user name to ALTER USER.
   - Added IN DATABASE syntax to ALTER USER.

   x Integrating the ALTER ROLE/USER syntax could not be done because of
     a shift/reduce conflict in bison.

 x This patch contains no additional regression tests. Are they needed?

SESSION_USER/USER could also be made usable for this command, but
this patch doesn't do that (yet).
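
For illustration, the patch would make commands like these work
(a hypothetical session; the database and setting names are invented):

  ALTER ROLE CURRENT_USER SET work_mem = '256MB';
  ALTER USER CURRENT_USER IN DATABASE mydb SET search_path = app, public;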

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From d12f479de845f55f77096e79fea69930bd665416 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 9 Sep 2014 19:26:33 +0900
Subject: [PATCH 2/2] ALTER ROLE CURRENT_USER document

---
 doc/src/sgml/ref/alter_role.sgml |   15 ---
 doc/src/sgml/ref/alter_user.sgml |   13 +++--
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/ref/alter_role.sgml b/doc/src/sgml/ref/alter_role.sgml
index 0471daa..e6f8093 100644
--- a/doc/src/sgml/ref/alter_role.sgml
+++ b/doc/src/sgml/ref/alter_role.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-ALTER ROLE name [ [ WITH ] option [ ... ] ]
+ALTER ROLE { name | CURRENT_USER } [ [ WITH ] option [ ... ] ]
 
 where option can be:
 
@@ -37,12 +37,12 @@ ALTER ROLE name [ [ WITH ] password'
 | VALID UNTIL 'timestamp'
 
-ALTER ROLE name RENAME TO new_name
+ALTER ROLE name RENAME TO new_name
 
-ALTER ROLE name [ IN DATABASE database_name ] SET configuration_parameter { TO | = } { value | DEFAULT }
-ALTER ROLE { name | ALL } [ IN DATABASE database_name ] SET configuration_parameter FROM CURRENT
-ALTER ROLE { name | ALL } [ IN DATABASE database_name ] RESET configuration_parameter
-ALTER ROLE { name | ALL } [ IN DATABASE database_name ] RESET ALL
+ALTER ROLE { name | CURRENT_USER | ALL } [ IN DATABASE database_name ] SET configuration_parameter { TO | = } { value | DEFAULT }
+ALTER ROLE { name | CURRENT_USER | ALL } [ IN DATABASE database_name ] SET configuration_parameter FROM CURRENT
+ALTER ROLE { name | CURRENT_USER | ALL } [ IN DATABASE database_name ] RESET configuration_parameter
+ALTER ROLE { name | CURRENT_USER | ALL } [ IN DATABASE database_name ] RESET ALL
 
  
 
@@ -123,7 +123,8 @@ ALTER ROLE { name | ALL } [ IN DATA
   name
   

-The name of the role whose attributes are to be altered.
+The name of the role whose attributes are to be
+altered. CURRENT_USER matches the name of the current user.

   
  
diff --git a/doc/src/sgml/ref/alter_user.sgml b/doc/src/sgml/ref/alter_user.sgml
index 58ae1da..feb1197 100644
--- a/doc/src/sgml/ref/alter_user.sgml
+++ b/doc/src/sgml/ref/alter_user.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-ALTER USER name [ [ WITH ] option [ ... ] ]
+ALTER USER { name | CURRENT_USER } [ [ WITH ] option [ ... ] ]
 
 where option can be:
 
@@ -32,16 +32,17 @@ ALTER USER name [ [ WITH ] connlimit
 | [ ENCRYPTED | UNENCRYPTED ] PASSWORD 'password'
 | VALID UNTIL 'timestamp'
 
-ALTER USER name RENAME TO new_name
+ALTER USER name RENAME TO new_name
 
-ALTER USER name SET configuration_parameter { TO | = } { value | DEFAULT }
-ALTER USER name SET configuration_parameter FROM CURRENT
-ALTER USER name RESET configuration_parameter
-ALTER USER name RESET ALL
+ALTER USER { name | CURRENT_USER | ALL } [ IN DATABASE database_name ] SET configuration_parameter { TO | = } { value | DEFAULT }
+ALTER USER { name | CURRENT_USER | ALL } [ IN DATABASE database_name ] SET configuration_parameter FROM CURRENT
+ALTER USER { name | CURRENT_USER | ALL } [ IN DATABASE database_name ] RESET configuration_parameter
+ALTER USER { name | CURRENT_USER | ALL } [ IN DATABASE database_name ] RESET ALL
 
  
 
-- 
1.7.1

>From 9be0ca6f7961ccadf665867c52233079a1024737 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 9 Sep 2014 19:26:24 +0900
Subject: [PATCH 1/2] ALTER ROLE CURRENT_USER

---
 src/backend/commands/user.c |   48 --
 src/backend/parser/gram.y   |   27 +--
 2 files changed, 56 insertions(+), 19 deletions(-)

diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index 1a73fd8..8630323 100644
--- a/src/backend/commands/user.c
+++ b/src/backend/commands/user.c
@@ -649,13 +649,25 @@ AlterRole(AlterRoleStmt *stmt)
 	pg_authid_rel = heap_open(AuthIdRelationId, RowExclusiveLock);
 	pg_authi

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-10 Thread Andres Freund
Hi,

On 2014-10-10 10:13:03 +0530, Amit Kapila wrote:
> I have done a few performance tests for the above patches, and the
> results are below:

Cool, thanks.

> Performance Data
> --
> IBM POWER-7 16 cores, 64 hardware threads
> RAM = 64GB
> max_connections =210
> Database Locale =C
> checkpoint_segments=256
> checkpoint_timeout=35min
> shared_buffers=8GB
> Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8)
> Duration of each individual run = 5mins
> Test type - read only pgbench with -M prepared
> Other Related information about test
> a. This is the median of 3 runs; the detailed data of each individual
> run is attached to this mail.
> b. I have applied both patches to take the performance data.
> 
> Scale Factor - 100
> 
> Patch_ver/Client_count      1      8     16     32     64    128
> HEAD                    13344 106921 196629 295123 377846 333928
> PATCH                   13662 106179 203960 298955 452638 465671
> 
> Scale Factor - 3000
> 
> Patch_ver/Client_count      8     16     32     64    128    160
> HEAD                    86920 152417 231668 280827 257093 255122
> PATCH                   87552 160313 230677 276186 248609 244372
> 
> 
> Observations
> --
> a. The patch performs really well (an increase of up to ~40%) in case
> all the data fits in shared buffers (scale factor 100).
> b. In case the data doesn't fit in shared buffers but fits in RAM
> (scale factor 3000), there is a performance increase up to a client
> count of 16; however, after that it starts dipping (in the above
> config, up to ~4.4%).

Hm. Interesting. I don't see that dip on x86.

> The above data shows that the patch improves performance in cases
> where there is shared LWLock contention; however, there is a slight
> performance dip in the case of exclusive LWLocks (at scale factor 3000,
> it needs exclusive LWLocks for the buffer mapping tables).  Now I am not
> sure if this is the worst-case dip, or whether under similar
> configurations the dip can be higher, because the trend shows that the
> dip increases with higher client counts.
> 
> Brief Analysis of code w.r.t performance dip
> -
> Extra instructions w.r.t. HEAD in the exclusive-lock acquire path:
> a. Attempt the lock twice
> b. atomic operations for nwaiters in LWLockQueueSelf() and
> LWLockAcquireCommon()
> c. Now we need to take the spinlock twice, once for self-queuing and
> then again for setting releaseOK.
> d. a few function calls and some extra checks

Hm. I can't really see the number of atomics itself mattering - a
spinning lock will do many more atomic ops than this. But I wonder
whether we could get rid of the releaseOK lock. Should be quite possible.

> Now these probably shouldn't matter much in case the backend needs to
> wait for another exclusive locker, but I am not sure what else could be
> the reason for the dip in case we need exclusive LWLocks.

Any chance to get a profile?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Obsolete reference to _bt_tuplecompare() within tuplesort.c

2014-10-10 Thread Peter Geoghegan
On Thu, Oct 9, 2014 at 11:58 PM, Heikki Linnakangas
 wrote:
> Yeah. Want to write that into a patch?

Attached.


-- 
Peter Geoghegan
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
new file mode
index 8e57505..2d15da0
*** a/src/backend/utils/sort/tuplesort.c
--- b/src/backend/utils/sort/tuplesort.c
*** comparetup_index_btree(const SortTuple *
*** 3176,3184 ****
  				  Tuplesortstate *state)
  {
  	/*
! 	 * This is similar to _bt_tuplecompare(), but we have already done the
! 	 * index_getattr calls for the first column, and we need to keep track of
! 	 * whether any null fields are present.  Also see the special treatment
  	 * for equal keys at the end.
  	 */
  	ScanKey		scanKey = state->indexScanKey;
--- 3176,3183 ----
  				  Tuplesortstate *state)
  {
  	/*
! 	 * This is similar to comparetup_heap(), but expects index tuples.  There
! 	 * is also special handling for enforcing uniqueness, and special treatment
  	 * for equal keys at the end.
  	 */
  	ScanKey		scanKey = state->indexScanKey;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Obsolete reference to _bt_tuplecompare() within tuplesort.c

2014-10-10 Thread Heikki Linnakangas

On 10/10/2014 02:04 AM, Peter Geoghegan wrote:

I found a reference made obsolete by commit 9e85183b, which is from
way back in 2000.

comparetup_index_btree() says:

/*
* This is similar to _bt_tuplecompare(), but we have already done the
* index_getattr calls for the first column, and we need to keep track of
* whether any null fields are present.  Also see the special treatment
* for equal keys at the end.
*/

I think that this comment should simply indicate that the routine is
similar to comparetup_heap(), except that it takes care of the special
tie-break stuff for B-Tree sorts, as well as enforcing uniqueness
during unique index builds. It should not reference _bt_compare() at
all (which is arguably the successor to _bt_tuplecompare()), since
_bt_compare() is concerned with a bunch of stuff highly specific to
the B-Tree implementation (e.g. having a hard-wired return value for
comparisons involving the first data item on an internal page).


Yeah. Want to write that into a patch?

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers