Re: [HACKERS] [PATCH] PostgreSQL 9.4 mmap(2) performance regression on FreeBSD...

2014-10-12 Thread Andres Freund
On 2014-10-11 20:33:57 -0400, Bruce Momjian wrote:
 On Tue, Aug 12, 2014 at 07:08:06PM -0400, Robert Haas wrote:
  On Tue, Aug 12, 2014 at 12:59 PM, Andres Freund and...@2ndquadrant.com 
  wrote:
   On 2014-08-12 09:42:30 -0700, Sean Chittenden wrote:
   One of the patches that I've been sitting on and am derelict in punting
   upstream is the attached mmap(2) flags patch for the BSDs. Is there any
   chance this can be squeezed in to the PostreSQL 9.4 release?
  
   The patch is trivial in size and is used to add one flag to mmap(2) 
   calls in
   dsm_impl.c.  Alan Cox (FreeBSD alc, not Linux) and I went back and forth
   regarding PostgreSQL's use of mmap(2) and determined that the following 
   is
   correct and will prevent a likely performance regression in PostgreSQL 
   9.4.
   In PostgreSQL 9.3, all mmap(2) calls were called with the flags MAP_ANON 
   |
   MAP_SHARED, whereas in PostgreSQL 9.4 this is not the case.
  
   The performancewise important call to mmap will still use that set of
   flags, no? That's the one backing shared_buffers.
  
   The mmap backend for *dynamic* shared memory (aka dsm) is *NOT* supposed
   to be used on common platforms. Both posix and sysv shared memory will
   be used before falling back to the mmap() backend.
  
  Hmm, yeah.  This might still be a good thing to do (because what do we
  lose?) but it shouldn't really be an issue in practice.
 
 Is there a reason this was not applied?

IIRC, as pointed out above, it's primarily based on a misunderstanding
about when mmap is used for in dsm. I.e. that it's essentially just a
fallback/toy implementation and that posix or sysv should rather be
used.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hide 'Execution time' in EXPLAIN (COSTS OFF)

2014-10-12 Thread David Rowley
On Wed, Sep 24, 2014 at 8:02 AM, Christoph Berg c...@df7cb.de wrote:

 Re: Tom Lane 2014-09-23 15155.1411493...@sss.pgh.pa.us
  Robert Haas robertmh...@gmail.com writes:
   On Sat, Sep 20, 2014 at 4:13 PM, Christoph Berg c...@df7cb.de wrote:
   Can we have EXPLAIN (timing off) in 9.4+ hide the Planning time
   line? That would even be backwards compatible with 9.x where it would
   be a no-op.
 
   I don't think that'll work becuase:
 
   /* check that timing is used with EXPLAIN ANALYZE */
   if (es.timing  !es.analyze)
   ereport(ERROR,
  
  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg(EXPLAIN option TIMING
   requires ANALYZE)));
 
  It looks to me like that would complain about EXPLAIN (TIMING ON),
  not the case Christoph is suggesting.  What he proposes seems a bit
  odd and non-orthogonal, but we could make the code do it if we wanted.

 I don't think this warrants a new flag, and TIMING OFF seems to be the
 right naming for it. (In fact it was the first I tried, and I was
 cursing quite a bit over the lack of configurability until I realized
 that COSTS OFF disabled the planning time display as well.) It might
 be a bit odd, but it's easy to remember.



I'm pretty interested in seeing something change around here.
The patch I'm working on at the moment (INNER JOIN removals) implements
skipping of joins at execution time rather than planning time. Currently
I'm working on the regression test for this and it's not all that easy due
to the execution time appearing in the results.

An explain analyze output from master with the patch can look something
like:

explain (analyze, costs off, timing off)
select a.* from a inner join b on a.b_id = b.id inner join c on b.c_id =
c.id;
QUERY PLAN
---
 Hash Join (actual rows=1 loops=1)
   Hash Cond: (b.c_id = c.id)
   -  Hash Join (actual rows=1 loops=1)
 Hash Cond: (a.b_id = b.id)
 -  Seq Scan on a (actual rows=1 loops=1)
 -  Hash (never executed)
   -  Seq Scan on b (never executed)
   -  Hash (never executed)
 -  Seq Scan on c (never executed)
 Execution time: 0.092 ms
(10 rows)

From this I can see easily that the joins to b and c were skipped, however
the output the way it is at the moment is quite useless for regression
testing with.

Regards

David Rowley


Re: [HACKERS] Hide 'Execution time' in EXPLAIN (COSTS OFF)

2014-10-12 Thread Andres Freund
On 2014-10-12 23:13:27 +1300, David Rowley wrote:
 On Wed, Sep 24, 2014 at 8:02 AM, Christoph Berg c...@df7cb.de wrote:
 
  Re: Tom Lane 2014-09-23 15155.1411493...@sss.pgh.pa.us
   Robert Haas robertmh...@gmail.com writes:
On Sat, Sep 20, 2014 at 4:13 PM, Christoph Berg c...@df7cb.de wrote:
Can we have EXPLAIN (timing off) in 9.4+ hide the Planning time
line? That would even be backwards compatible with 9.x where it would
be a no-op.
  
I don't think that'll work becuase:
  
/* check that timing is used with EXPLAIN ANALYZE */
if (es.timing  !es.analyze)
ereport(ERROR,
   
   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 errmsg(EXPLAIN option TIMING
requires ANALYZE)));
  
   It looks to me like that would complain about EXPLAIN (TIMING ON),
   not the case Christoph is suggesting.  What he proposes seems a bit
   odd and non-orthogonal, but we could make the code do it if we wanted.
 
  I don't think this warrants a new flag, and TIMING OFF seems to be the
  right naming for it. (In fact it was the first I tried, and I was
  cursing quite a bit over the lack of configurability until I realized
  that COSTS OFF disabled the planning time display as well.) It might
  be a bit odd, but it's easy to remember.
 
 
 
 I'm pretty interested in seeing something change around here.
 The patch I'm working on at the moment (INNER JOIN removals) implements
 skipping of joins at execution time rather than planning time. Currently
 I'm working on the regression test for this and it's not all that easy due
 to the execution time appearing in the results.
 
 An explain analyze output from master with the patch can look something
 like:
 
 explain (analyze, costs off, timing off)
 select a.* from a inner join b on a.b_id = b.id inner join c on b.c_id =
 c.id;
 QUERY PLAN
 ---
  Hash Join (actual rows=1 loops=1)
Hash Cond: (b.c_id = c.id)
-  Hash Join (actual rows=1 loops=1)
  Hash Cond: (a.b_id = b.id)
  -  Seq Scan on a (actual rows=1 loops=1)
  -  Hash (never executed)
-  Seq Scan on b (never executed)
-  Hash (never executed)
  -  Seq Scan on c (never executed)
  Execution time: 0.092 ms
 (10 rows)

So you're now the third person reporting problems here. Let's remove
'execution time' for COSTS off.

I personally would even say that we should backpatch that to make
backpatches involving regression tests less painful.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Function array_agg(array)

2014-10-12 Thread Ali Akbar
2014-10-11 22:28 GMT+07:00 Tom Lane t...@sss.pgh.pa.us:

 Seems dangerous as heck; certainly it would have side-effects far more
 wide-ranging than just making this particular function work.

 A safer answer is to split array_agg into two functions,
 array_agg(anynonarray) - anyarray
 array_agg(anyarray) - anyarray

 I rather imagine you should do that anyway, because I really doubt
 that this hack is operating quite as intended.  I suspect you are
 producing arrays containing arrays as elements, not true 2-D arrays.
 That's not a direction we want to go in I think; certainly there are
 no other operations that produce such things.


Thanks for the review. Yes, it looks like the patch produced array as the
elements. So, all array operations behaves wierdly.

In this quick  dirty patch, I am trying to implement the
array_agg(anyarray), introducing two new functions:
- array_agg_anyarray_transfn
- array_agg_anyarray_finalfn

At first, i want to use accumArrayResult and makeMdArrayResult, but it's
complicated to work with multi-dimensional arrays with those two functions.
So i combined array_cat with those function.

Currently, it cannot handle NULL arrays:
backend select array_agg(a) from (values(null::int[])) a(a);
 1: array_agg(typeid = 1007, len = -1, typmod = -1, byval = f)

ERROR:  cannot aggregate null arrays

Regards,
-- 
Ali Akbar
*** a/src/backend/utils/adt/array_userfuncs.c
--- b/src/backend/utils/adt/array_userfuncs.c
***
*** 16,21 
--- 16,51 
  #include utils/builtins.h
  #include utils/lsyscache.h
  
+ #include utils/memutils.h
+ 
+ /*-
+  * ArrayAggAnyArrayState:
+  *		aggregate state for array_agg(anyarray)
+  *-
+  */
+ typedef struct
+ {
+ 	MemoryContext mcontext;		/* where all the temp stuff is kept */
+ 	char	   *data;			/* array of accumulated data */
+ 	bits8	   *nullbitmap;		/* bitmap of is-null flags for data */
+ 
+ 	int			abytes;			/* allocated length of above arrays */
+ 	int			aitems;			/* allocated length of above arrays */
+ 	int			nbytes;			/* number of used bytes in above arrays */
+ 	int			nitems;			/* number of elements in above arrays */
+ 	int			narray;			/* number of array accumulated */
+ 	Oid			element_type;	/* data type of the Datums */
+ 	int16		typlen;			/* needed info about datatype */
+ 	bool		typbyval;
+ 	char		typalign;
+ 
+ 	int			ndims;			/* element dimensions */
+ 	int		   *dims;
+ 	int		   *lbs;
+ 
+ 	bool		hasnull;		/* any element has null */
+ } ArrayAggAnyArrayState;
+ 
  
  /*-
   * array_push :
***
*** 544,546  array_agg_finalfn(PG_FUNCTION_ARGS)
--- 574,814 
  
  	PG_RETURN_DATUM(result);
  }
+ 
+ /*
+  * ARRAY_AGG(anyarray) aggregate function
+  */
+ Datum
+ array_agg_anyarray_transfn(PG_FUNCTION_ARGS)
+ {
+ 	MemoryContext aggcontext,
+ arr_context,
+ oldcontext;
+ 	ArrayAggAnyArrayState *astate;
+ 
+ 	Oid			arg_typeid = get_fn_expr_argtype(fcinfo-flinfo, 1);
+ 	Oid			arg_elemtype = get_element_type(arg_typeid);
+ 	ArrayType  *arg;
+ 	int		   *dims,
+ 			   *lbs,
+ ndims,
+ nitems,
+ ndatabytes;
+ 	char	   *data;
+ 
+ 	int			i;
+ 
+ 	if (arg_elemtype == InvalidOid)
+ 		ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+  errmsg(could not determine input data type)));
+ 
+ 	if (!AggCheckCallContext(fcinfo, aggcontext))
+ 		elog(ERROR, array_agg_anyarray_transfn called in non-aggregate context);
+ 
+ 	if (PG_ARGISNULL(1))
+ 		elog(ERROR, cannot aggregate null arrays);
+ 
+ 	astate = PG_ARGISNULL(0) ? NULL : (ArrayAggAnyArrayState *) PG_GETARG_POINTER(0);
+ 
+ 	if (astate == NULL)
+ 	{
+ 		arr_context = AllocSetContextCreate(aggcontext,
+ 			array_agg_anyarray_transfn,
+ 			ALLOCSET_DEFAULT_MINSIZE,
+ 			ALLOCSET_DEFAULT_INITSIZE,
+ 			ALLOCSET_DEFAULT_MAXSIZE);
+ 		oldcontext = MemoryContextSwitchTo(arr_context);
+ 		astate = (ArrayAggAnyArrayState *) palloc(sizeof(ArrayAggAnyArrayState));
+ 
+ 		astate-mcontext = arr_context;
+ 		astate-abytes = 0;
+ 		astate-aitems = 0;
+ 		astate-data = NULL;
+ 		astate-nullbitmap = NULL;
+ 		astate-nitems = 0;
+ 		astate-narray = 0;
+ 		astate-element_type = arg_elemtype;
+ 		get_typlenbyvalalign(arg_elemtype,
+ 			 astate-typlen,
+ 			 astate-typbyval,
+ 			 astate-typalign);
+ 	}
+ 	else
+ 	{
+ 		oldcontext = MemoryContextSwitchTo(astate-mcontext);
+ 		Assert(astate-element_type == arg_elemtype);
+ 	}
+ 
+ 	arg = PG_GETARG_ARRAYTYPE_P(1);
+ 
+ 	ndims = ARR_NDIM(arg);
+ 	dims = ARR_DIMS(arg);
+ 	lbs = ARR_LBOUND(arg);
+ 	data = ARR_DATA_PTR(arg);
+ 	nitems = ArrayGetNItems(ndims, dims);
+ 
+ 	ndatabytes = ARR_SIZE(arg) - ARR_DATA_OFFSET(arg);
+ 
+ 	if (astate-data == NULL)
+ 	{
+ 		if (ndims + 1  MAXDIM)
+ 			ereport(ERROR,
+ 

Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-12 Thread Matthew Woodcraft
On 2014-10-10 19:44, Kevin Grittner wrote:
 Peter Geoghegan p...@heroku.com wrote:
 People keep remarking that they don't like that you can (optionally)
 name a unique index explicitly, 

[...]

 To restate: to do so is conflating the logical definition of the 
 database with a particular implementation detail.  As just one 
 reason that is a bad idea: we can look up unique indexes on the 
 specified columns, but if we implement a other storage techniques 
 where there is no such thing as a unique index on the columns, yet 
 manage to duplicate the semantics (yes, stranger things have 
 happened), people can't migrate to the new structure without 
 rewriting their queries

Wouldn't it be good enough to define the 'WITHIN' as expecting a
unique-constraint name rather than an index name (even though those
happen to be the same strings)?

I think constraints are part of the logical definition of the database,
and a new storage technique which doesn't use indexes should still have
names for its unique constraints.

-M-




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-12 Thread Marko Tiikkaja

On 10/12/14, 2:36 PM, Matthew Woodcraft wrote:

On 2014-10-10 19:44, Kevin Grittner wrote:

To restate: to do so is conflating the logical definition of the
database with a particular implementation detail.  As just one
reason that is a bad idea: we can look up unique indexes on the
specified columns, but if we implement a other storage techniques
where there is no such thing as a unique index on the columns, yet
manage to duplicate the semantics (yes, stranger things have
happened), people can't migrate to the new structure without
rewriting their queries


Wouldn't it be good enough to define the 'WITHIN' as expecting a
unique-constraint name rather than an index name (even though those
happen to be the same strings)?

I think constraints are part of the logical definition of the database,
and a new storage technique which doesn't use indexes should still have
names for its unique constraints.


What about partial indexes?  Indexes on expressions or functions calls?


.marko


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Proposal : REINDEX SCHEMA

2014-10-12 Thread Sawada Masahiko
Hi all,

Attached WIP patch adds new syntax REINEX SCHEMA which does reindexing
all table of specified schema.
There are syntax dose reindexing specified index, per table and per database,
but we can not do reindexing per schema for now.
So we must use reindexdb command if we want to do.
This new syntax supports it as SQL command.
This use similar logic as REINDEX DATABASE, but we can use it in
transaction block.
Here is some example,

-- Table information
[postgres][5432](1)=# \d n1.hoge
   Table n1.hoge
 Column |  Type   | Modifiers
+-+---
 col| integer | not null
Indexes:
hoge_pkey PRIMARY KEY, btree (col)

[postgres][5432](1)=# \d n2.hoge
   Table n2.hoge
 Column |  Type   | Modifiers
+-+---
 col| integer |

[postgres][5432](1)=# \d n3.hoge
Did not find any relation named n3.hoge.

-- Do reindexing
[postgres][5432](1)=# reindex schema n1;
NOTICE:  table n1.hoge was reindexed
REINDEX
[postgres][5432](1)=# reindex schema n2;
REINDEX
[postgres][5432](1)=# reindex schema n3;
NOTICE:  scheman3 does not hava any table
REINDEX

Please review and comment.

Regards,

---
Sawada Masahiko


reindex_schema_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hide 'Execution time' in EXPLAIN (COSTS OFF)

2014-10-12 Thread Ronan Dunklau
Le dimanche 12 octobre 2014 13:17:00 Andres Freund a écrit :
 On 2014-10-12 23:13:27 +1300, David Rowley wrote:
  On Wed, Sep 24, 2014 at 8:02 AM, Christoph Berg c...@df7cb.de wrote:
   Re: Tom Lane 2014-09-23 15155.1411493...@sss.pgh.pa.us
   
Robert Haas robertmh...@gmail.com writes:
 On Sat, Sep 20, 2014 at 4:13 PM, Christoph Berg c...@df7cb.de wrote:
 Can we have EXPLAIN (timing off) in 9.4+ hide the Planning time
 line? That would even be backwards compatible with 9.x where it
 would
 be a no-op.
 
 I don't think that'll work becuase:
 /* check that timing is used with EXPLAIN ANALYZE */
 if (es.timing  !es.analyze)
 
 ereport(ERROR,

(errcode(ERRCODE_INVALID_PARAMETER_VALUE),

  errmsg(EXPLAIN option TIMING
 
 requires ANALYZE)));

It looks to me like that would complain about EXPLAIN (TIMING ON),
not the case Christoph is suggesting.  What he proposes seems a bit
odd and non-orthogonal, but we could make the code do it if we wanted.
   
   I don't think this warrants a new flag, and TIMING OFF seems to be the
   right naming for it. (In fact it was the first I tried, and I was
   cursing quite a bit over the lack of configurability until I realized
   that COSTS OFF disabled the planning time display as well.) It might
   be a bit odd, but it's easy to remember.
  
  I'm pretty interested in seeing something change around here.
  The patch I'm working on at the moment (INNER JOIN removals) implements
  skipping of joins at execution time rather than planning time. Currently
  I'm working on the regression test for this and it's not all that easy due
  to the execution time appearing in the results.
  
  An explain analyze output from master with the patch can look something
  like:
  
  explain (analyze, costs off, timing off)
  select a.* from a inner join b on a.b_id = b.id inner join c on b.c_id =
  c.id;
  
  QUERY PLAN
  
  ---
  
   Hash Join (actual rows=1 loops=1)
   
 Hash Cond: (b.c_id = c.id)
 -  Hash Join (actual rows=1 loops=1)
 
   Hash Cond: (a.b_id = b.id)
   -  Seq Scan on a (actual rows=1 loops=1)
   -  Hash (never executed)
   
 -  Seq Scan on b (never executed)
 
 -  Hash (never executed)
 
   -  Seq Scan on c (never executed)
   
   Execution time: 0.092 ms
  
  (10 rows)
 
 So you're now the third person reporting problems here. Let's remove
 'execution time' for COSTS off.
 
 I personally would even say that we should backpatch that to make
 backpatches involving regression tests less painful.

That wouldn't solve the first problem mentioned, which is that for some 
regression tests one may want to test the costs themselves, which is now 
impossible with the new planning time feature.

What would IMO make both cases suitable would be to eliminate ALL timing from 
TIMING OFF, not only the timing on the individual nodes. As was mentioned 
before, it is a bit counter intuitive to have COSTS OFF disable the planning 
time, and not TIMING OFF.

 
 Greetings,
 
 Andres Freund

-- 
Ronan Dunklau

signature.asc
Description: This is a digitally signed message part.


Re: [HACKERS] Hide 'Execution time' in EXPLAIN (COSTS OFF)

2014-10-12 Thread Tom Lane
Ronan Dunklau ro...@dunklau.fr writes:
 That wouldn't solve the first problem mentioned, which is that for some 
 regression tests one may want to test the costs themselves, which is now 
 impossible with the new planning time feature.

That's a bogus argument, because it was impossible before too.  We have
no such tests now, and it's unlikely we will ever add any, because costs
inherently are platform-dependent.  The reason we invented COSTS OFF in
the first place was to make it possible to do EXPLAIN in regression tests
without getting platform-dependent output.

I have no great objection to making both COSTS OFF and TIMING OFF suppress
the planning time output, if that's the consensus.  I would object to
taking away that behavior of COSTS OFF, because of the implications for
back-patching EXPLAIN queries in regression tests.

Another possibility, which would introduce less non-orthogonality into
the switch design, is to remove the connection to COSTS OFF but say that
planning time is only printed when execution time is also printed (ie,
only in EXPLAIN ANALYZE).  This seems to me that it would not be removing
much functionality, because if you just did a plain EXPLAIN then you can
take the client-side runtime (psql \timing) as a close-enough estimate
of planning time.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal : REINDEX SCHEMA

2014-10-12 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 Sawada Masahiko wrote:
  Attached WIP patch adds new syntax REINEX SCHEMA which does reindexing
  all table of specified schema.
  There are syntax dose reindexing specified index, per table and per 
  database,
  but we can not do reindexing per schema for now.
 
 It seems doubtful that there really is much use for this feature, but if
 there is, I think a better syntax precedent is the new ALTER TABLE ALL
 IN TABLESPACE thingy, rather than your proposed REINDEX SCHEMA.
 Something like REINDEX TABLE ALL IN SCHEMA perhaps.

Yeah, I tend to agree that we should be looking at the 'ALL IN
TABLESPACE' and 'ALL IN SCHEMA' type of commands to keep things
consistent.  This might be an alternative for the vacuum / analyze /
reindex database commands also..

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Column Redaction

2014-10-12 Thread Gavin Flower

On 10/10/14 21:57, Simon Riggs wrote:

Postgres currently supports column level SELECT privileges.

1. If we want to confirm a credit card number, we can issue SELECT 1
FROM customer WHERE stored_card_number = '1234 5678 5344 7733'

2. If we want to look for card fraud, we need to be able to use the
full card number to join to transaction data and look up blocked card
lists etc..

3. We want to block the direct retrieval of card numbers for
additional security.
In some cases, we might want to return an answer like ' *  7733'

We can't do all of the above with current facilities inside the database.

The ability to mask output for data in certain cases, for the purpose
of security, is known lately as data redaction, or column-level data
redaction.

The best way to support this requirement would be to allow columns to
have an additional output formatting function. This would be
executed only when data is about to be returned by a query. All other
uses of that would not restrict the data.

This would have other uses as well, such as default report formats, so
we can store financial amounts as NUMERIC, but format them on
retrieval as $12,345.78 etc..

Suggested user interface would be...
FORMAT functionname(parameters, if any)

e.g.
CREATE TABLE customer
( id ...
...
, stored_card_number  NUMERIC FORMAT pci_card_number_redaction()
...
);

We'd need to implement something to allow pg_dump to ignore format
functions. I suggest the best way to do that is by providing a BACKUP
role that can be delegated to other users. We would then allow a
parameter for SET output_formatting = on | off, which can only be set
by superuser and BACKUP role, then have pg_dump issue SET
output_formatting = off explicitly when it runs.

Do we want redaction in PostgreSQL?
Do we want it generalised into output format functions?



I think having a FORMAT option would be good, but I strongly feel that 
end users should NEVER EVER have direct access to any database with 
sensitive information!  And if the full details are stored, then 
obviously, at some time people will have a legitimate need to access all 
the digits - so it does not make sense to prevent this .


Also I think it would be useful to store formats, especially complicated 
ones, so they can be defined once and reused as many times as  required 
- helps for standardisation.


How about something like:

CREATE FORMAT /format-name/ [WITH] /format-spec/ [DENY | ALLOW role-1, ...];

Where the /format-spec/ is either a function, or something similar to a 
COBOL picture spec., I suspect that the implied security control with 
the ALLOW  DENY options might prove too weak for anyone determined, 
though it might be good enough in some common contexts.



CREATE FORMAT card_format_redacted WITH '   ' ALLOW ALL;
CREATE FORMAT card_format_full '   ' ALLOW admin_1;
CREATE FORMAT card_format_special special_card_formatter(); ALLOW 
admin_42, mariadba;


-- specify default FORMAT
CREATE TABLE customer
(
...
stored_card_number  NUMERIC FORMAT card_format_redacted,
...
)


-- unformatted, fails if role is neither admin-1 or a role that inherits 
from it

SELECT
stored_card_number
WHERE
...;


-- using card_format_redacted
SELECT
stored_card_number FORMAT DEFAULT
WHERE
...;


-- using card_format_full, fails if role is neither admin-1 or a role 
that inherits from it

SELECT
stored_card_number FORMAT card_format_full
WHERE
...;



Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal : REINDEX SCHEMA

2014-10-12 Thread Fabrízio de Royes Mello
On Sun, Oct 12, 2014 at 2:27 PM, Stephen Frost sfr...@snowman.net wrote:

 * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
  Sawada Masahiko wrote:
   Attached WIP patch adds new syntax REINEX SCHEMA which does reindexing
   all table of specified schema.
   There are syntax dose reindexing specified index, per table and per
database,
   but we can not do reindexing per schema for now.
 
  It seems doubtful that there really is much use for this feature, but if
  there is, I think a better syntax precedent is the new ALTER TABLE ALL
  IN TABLESPACE thingy, rather than your proposed REINDEX SCHEMA.
  Something like REINDEX TABLE ALL IN SCHEMA perhaps.

 Yeah, I tend to agree that we should be looking at the 'ALL IN
 TABLESPACE' and 'ALL IN SCHEMA' type of commands to keep things
 consistent.  This might be an alternative for the vacuum / analyze /
 reindex database commands also..


Some review:

1) +1 to REINDEX ALL IN SCHEMA name


2) IMHO the logic should be exactly the same as REINDEX DATABASE, including
the transaction control. Imagine a schema with a lot of tables, you can
lead to a deadlock using just one transaction block.


3) The patch was applied to master and compile without warnings


4) Typo (... does not have any table)

+   if (!reindex_schema(heapOid))
+   ereport(NOTICE,
+   (errmsg(schema\%s\ does not hava any table,
+   schema-relname)));


5) Missing of regression tests, please add it to
src/test/regress/sql/create_index.sql

6) You need to add psql complete tabs

7) I think we can add -S / --schema option do reindexdb in this patch
too. What do you think?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello