Re: [HACKERS] TABLESAMPLE patch
On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek wrote:
> I didn't add the whole page visibility caching, as the tuple ids we get from
> sampling methods don't map well to the visibility info we get from
> heapgetpage (it maps to the values in the rs_vistuples array, not to its
> indexes). Commented about it in code also.

I think we should set pagemode for system sampling as it can have a dual
benefit: one is that it will allow us to cache tuples, and the other is that
it can allow pruning of the page, which is done in heapgetpage(). Do you see
any downside to it?

A few other comments:

1. The current patch fails to apply; a minor change is required:
patching file `src/backend/utils/misc/Makefile'
Hunk #1 FAILED at 15.
1 out of 1 hunk FAILED -- saving rejects to src/backend/utils/misc/Makefile.rej

2. A few warnings in code (compiled on Windows (msvc)):
1>src\backend\utils\tablesample\bernoulli.c(217): warning C4305: '=' : truncation from 'double' to 'float4'
1>src\backend\utils\tablesample\system.c(177): warning C4305: '=' : truncation from 'double' to 'float4'

3.
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags,
+                 TableSampleClause *tablesample)
{
..
+   /*
+    * Page at a time mode is useless for us as we need to check visibility
+    * of tuples individually because tuple offsets returned by sampling
+    * methods map to rs_vistuples values and not to its indexes.
+    */
+   node->ss.ss_currentScanDesc->rs_pageatatime = false;
..
}

Modifying the scan descriptor in nodeSamplescan.c looks slightly odd. Do we
modify it this way at any other place? I think it is better either to teach
the heap_beginscan_* APIs about it or to expose it via a new API in heapam.c.

4.
+Datum
+tsm_system_cost(PG_FUNCTION_ARGS)
{
..
+   *tuples = path->rows * samplesize;
}

The above calculation assumes all rows in the table are of equal size, and
hence multiplying directly by samplesize gives the estimated number of rows
for the sample method; however, if row size varies then this calculation
might not give the right results. I think if possible we should consider the
possibility of rows with varying sizes; otherwise we can at least write a
comment to indicate the possibility of improvement in the future.

5. gram.y

-   /*
    * Given "UPDATE foo set set ...", we have to decide without looking any

An unrelated line was removed.

6.
@@ -13577,6 +13608,7 @@ reserved_keyword:
            | PLACING
            | PRIMARY
            | REFERENCES
+           | REPEATABLE

Have you tried to investigate the reason why it gives a shift/reduce error
for the unreserved category? If we are not able to resolve that, then at
least we can try to put it in some less restrictive category. I tried (on
Windows) by putting it in type_func_name_keyword and it seems to be working.
To investigate, you can try the information at the link below:
http://www.gnu.org/software/bison/manual/html_node/Understanding.html

Basically, I think we should try some more before concluding that we have to
change the category of REPEATABLE, especially if we want to make it a
RESERVED keyword.

On a related note, it seems you agreed upthread with Kyotaro-san on the
change below, but I don't see it in the latest patch:

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4578b5e..8cf09d5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -10590,7 +10590,7 @@ tablesample_clause:
        ;

 opt_repeatable_clause:
-           REPEATABLE '(' AexprConst ')'   { $$ = (Node *) $3; }
+           REPEATABLE '(' a_expr ')'       { $$ = (Node *) $3; }
            | /*EMPTY*/                     { $$ = NULL; }

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
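For readers following the grammar discussion: the practical difference
between AexprConst and a_expr is what REPEATABLE will accept as a seed. A
hypothetical illustration, assuming the TABLESAMPLE syntax from this patch
(the table name is invented):

SELECT * FROM test_tbl TABLESAMPLE SYSTEM (10) REPEATABLE (42);      -- a literal constant parses either way
SELECT * FROM test_tbl TABLESAMPLE SYSTEM (10) REPEATABLE (21 * 2);  -- an expression seed requires a_expr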
Re: [HACKERS] WALWriter active during recovery
On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao wrote:
> On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs wrote:
>> Currently, WALReceiver writes and fsyncs data it receives. Clearly,
>> while we are waiting for an fsync we aren't doing any other useful
>> work.
>>
>> Following patch starts WALWriter during recovery and makes it
>> responsible for fsyncing data, allowing WALReceiver to progress other
>> useful actions.

With the patch, replication didn't work fine on my machine. I started the
standby server after removing all the WAL files from the standby. ISTM that
the patch doesn't handle that case. That is, in that case, the standby tries
to start up walreceiver and replication to retrieve the REDO-starting
checkpoint record *before* starting up walwriter (IOW, before reaching the
consistent point). Then, since walreceiver works without walwriter, no
received WAL data can be fsync'd in the standby, so replication cannot
advance any further. I think that walwriter needs to start before walreceiver
starts.

I just marked this patch as Waiting on Author.

Regards,

--
Fujii Masao
Re: [HACKERS] Table-level log_autovacuum_min_duration
On Thu, Feb 19, 2015 at 3:32 PM, Michael Paquier wrote:
> On Thu, Feb 19, 2015 at 2:13 PM, Naoya Anzai wrote:
>> As a result, I think you should not delete VACOPT_VERBOSE.
>
> In v8 it is not deleted. It is still declared, and its use is isolated
> in gram.y, similarly to VACOPT_FREEZE.
>
>> According to the last mail I have posted, the difference between the
>> manual-vacuum log and the auto-vacuum log clearly exists.
>
> Did you test the latest patch, v8? I have added checks in it to see if
> a process is an autovacuum worker to control elevel, and the extra logs
> of v7 do not show up.
>
>> So, at least you should not touch the mechanism of VACOPT_VERBOSE
>> until both vacuum log formats are unified to the same format.
>
> If you mean that we should have the same kind of log output for
> autovacuum and manual vacuum, I think that this is not going to
> happen. Autovacuum entries are kept less verbose on purpose, a contract
> that v7 clearly broke.
>
>> If you agree with my thinking, please undo your removal of VACOPT_VERBOSE.
>
> Well, I don't agree :) And I am guessing that you did not look at v8
> either. Centralizing the control of logs using log_min_duration is
> more extensible than simply having VACOPT_VERBOSE.

With the patch, VACUUM ANALYZE VERBOSE doesn't emit any verbose message.
Why did you remove that functionality?

Regards,

--
Fujii Masao
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
On 2015/03/04 17:07, Etsuro Fujita wrote:
> On 2015/03/04 16:58, Albe Laurenz wrote:
>> Etsuro Fujita wrote:
>>> While updating the patch, I noticed that in the previous patch, there
>>> is a bug in pushing down parameterized UPDATE/DELETE queries; generic
>>> plans for such queries fail with a can't-happen error. I fixed the bug
>>> and tried to add regression tests that execute the generic plans, but
>>> I couldn't, because I can't figure out how to force generic plans. Is
>>> there any way to do that?
>>
>> I don't know about a way to force it, but if you run the statement six
>> times, it will probably switch to a generic plan.
>
> Yeah, I was just thinking of running the statement six times in the
> regression tests ...

Here is an updated version. In this version, the bug has been fixed, but
regression tests for it haven't been added, because I'm not sure that the
above way is a good idea and I don't have any other ideas. The EXPLAIN output
has also been improved as discussed in [1].

On top of this, I'll try to extend the join push-down patch to support a
pushed-down update on a join, though I'm still digesting the series of
patches.

Comments welcome.

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/31942.1410534...@sss.pgh.pa.us

*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***************
*** 189,198 ****
  is_foreign_expr(PlannerInfo *root,
  	if (!foreign_expr_walker((Node *) expr, &glob_cxt, &loc_cxt))
  		return false;

- 	/* Expressions examined here should be boolean, ie noncollatable */
- 	Assert(loc_cxt.collation == InvalidOid);
- 	Assert(loc_cxt.state == FDW_COLLATE_NONE);
-
  	/*
  	 * An expression which includes any mutable functions can't be sent over
  	 * because its result is not stable.  For example, sending now() remote
--- 189,194 ----
***************
*** 788,794 ****
  deparseTargetList(StringInfo buf,
   *
   * If params is not NULL, it receives a list of Params and other-relation Vars
   * used in the clauses; these values must be transmitted to the remote server
!  * as parameter values.
   *
   * If params is NULL, we're generating the query for EXPLAIN purposes,
   * so Params and other-relation Vars should be replaced by dummy values.
--- 784,791 ----
   *
   * If params is not NULL, it receives a list of Params and other-relation Vars
   * used in the clauses; these values must be transmitted to the remote server
!  * as parameter values.  Caller is responsible for initializing the result list
!  * to empty if necessary.
   *
   * If params is NULL, we're generating the query for EXPLAIN purposes,
   * so Params and other-relation Vars should be replaced by dummy values.
***************
*** 805,813 ****
  appendWhereClause(StringInfo buf,
  	int			nestlevel;
  	ListCell   *lc;

- 	if (params)
- 		*params = NIL;			/* initialize result list to empty */
-
  	/* Set up context struct for recursion */
  	context.root = root;
  	context.foreignrel = baserel;
--- 802,807 ----
***************
*** 940,945 **** deparseUpdateSql(StringInfo buf, PlannerInfo *root,
--- 934,996 ----
  }

  /*
+  * deparse remote UPDATE statement
+  *
+  * The statement text is appended to buf, and we also create an integer List
+  * of the columns being retrieved by RETURNING (if any), which is returned
+  * to *retrieved_attrs.
+  */
+ void
+ deparsePushedDownUpdateSql(StringInfo buf, PlannerInfo *root,
+ 						   Index rtindex, Relation rel,
+ 						   List *targetlist,
+ 						   List *targetAttrs,
+ 						   List *remote_conds,
+ 						   List **params_list,
+ 						   List *returningList,
+ 						   List **retrieved_attrs)
+ {
+ 	RelOptInfo *baserel = root->simple_rel_array[rtindex];
+ 	deparse_expr_cxt context;
+ 	bool		first;
+ 	ListCell   *lc;
+
+ 	if (params_list)
+ 		*params_list = NIL;		/* initialize result list to empty */
+
+ 	/* Set up context struct for recursion */
+ 	context.root = root;
+ 	context.foreignrel = baserel;
+ 	context.buf = buf;
+ 	context.params_list = params_list;
+
+ 	appendStringInfoString(buf, "UPDATE ");
+ 	deparseRelation(buf, rel);
+ 	appendStringInfoString(buf, " SET ");
+
+ 	first = true;
+ 	foreach(lc, targetAttrs)
+ 	{
+ 		int			attnum = lfirst_int(lc);
+ 		TargetEntry *tle = get_tle_by_resno(targetlist, attnum);
+
+ 		if (!first)
+ 			appendStringInfoString(buf, ", ");
+ 		first = false;
+
+ 		deparseColumnRef(buf, rtindex, attnum, root);
+ 		appendStringInfo(buf, " = ");
+ 		deparseExpr((Expr *) tle->expr, &context);
+ 	}
+ 	if (remote_conds)
+ 		appendWhereClause(buf, root, baserel, remote_conds,
+ 						  true, params_list);
+
+ 	deparseReturningList(buf, root, rtindex, rel, false,
+ 						 returningList, retrieved_attrs);
+ }
+
+ /*
   * deparse remote DELETE statement
   *
   * The statement text is appended to buf, and we also create an integer List
***************
*** 962,967 **** deparseDeleteSql(StringInfo buf, PlannerInfo *root,
--- 1013,1048 ----
  }

  /*
+  * deparse remote DELETE statement
+  *
+  * The s
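On the open question of forcing a generic plan in the regression tests: there
is no switch for it in this era, but the plan cache ordinarily moves to a
generic plan once five custom plans have been built and the cost comparison
favors it. A sketch of what such a test could do (the foreign table,
statement, and column names here are invented):

PREPARE upd(int) AS UPDATE ft2 SET c2 = c2 + 1 WHERE c1 = $1;
EXECUTE upd(1);
EXECUTE upd(1);
EXECUTE upd(1);
EXECUTE upd(1);
EXECUTE upd(1);
-- the sixth execution should normally use a generic (parameterized) plan,
-- exercising the code path the bug fix covers:
EXPLAIN (VERBOSE, COSTS OFF) EXECUTE upd(1);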
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello,

Please find attached a patch. As discussed, the flag to denote compression
and the presence of a hole in the block image has been added in
XLogRecordImageHeader rather than in the block header.

Following are WAL numbers based on the attached test script posted by Michael
earlier in the thread:

                        WAL generated
FPW compression on      122.032 MB
FPW compression off     155.223 MB
HEAD                    155.236 MB

Compression: 21%
Number of block images generated in WAL: 63637

Thank you,
Rahila Syed

Attachments: Support-compression-of-full-page-writes-in-WAL_v23.patch, compress_run.bash
[HACKERS] contraints_exclusion fails to refute simple condition
PostGIS installs standard constraints of this kind:

  CHECK (geometrytype(g) = 'POINT'::text OR g IS NULL)

The constraint is used by constraint_exclusion if using this condition:

  WHERE g IS NOT NULL AND geometrytype(g) = 'LINESTRING'

But it is _NOT_ used if the NOT NULL condition is removed:

  WHERE geometrytype(g) = 'LINESTRING'

As "geometrytype" is defined as STRICT and IMMUTABLE, there's no way for
geometrytype(g) = 'LINESTRING' to hold true, so why is the "IS NOT NULL"
condition also needed by the planner?

Andres Freund on IRC suggested that predicate_refuted_by_simple_clause()
looks like it is trying to handle such cases, but if that's the case it seems
to fail here.

--strk;
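A self-contained way to reproduce this without PostGIS, using a stand-in for
geometrytype (the function, table, and values are invented; per the report
above, the second EXPLAIN still shows a scan rather than an excluded
relation):

CREATE FUNCTION gtype(t text) RETURNS text
    AS $$ BEGIN RETURN upper(t); END $$ LANGUAGE plpgsql IMMUTABLE STRICT;

CREATE TABLE geoms (g text, CHECK (gtype(g) = 'POINT' OR g IS NULL));

SET constraint_exclusion = on;

-- excluded: the planner proves the predicate contradicts the CHECK
EXPLAIN SELECT * FROM geoms WHERE g IS NOT NULL AND gtype(g) = 'LINESTRING';

-- reportedly not excluded, even though the STRICT function already
-- implies g IS NOT NULL
EXPLAIN SELECT * FROM geoms WHERE gtype(g) = 'LINESTRING';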
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
2015-03-04 22:41 GMT+01:00 Peter Eisentraut:
> On 3/3/15 7:17 PM, Jim Nasby wrote:
>> I think we're screwed in that regard anyway, because of the special
>> constructs. You'd need different logic to handle things like +role and
>> sameuser. We might even end up painted in a corner where we can't change
>> it in the future because it'll break everyone's scripts.
>
> Yeah, I'm getting worried about this. I think most people agree that
> getting a peek at pg_hba.conf from within the server is useful, but
> everyone seems to have quite different uses for it. Greg wants to join
> against other catalog tables, Jim wants to reassemble a valid and
> accurate pg_hba.conf, Josh wants to write an editing tool. Personally,
> I'd like to see something as close to the actual file as possible.
>
> If there were an obviously correct way to map the various special
> constructs to the available SQL data types, then fine. But if there
> isn't, then we shouldn't give a false overinterpretation. So I'd render
> everything that's disputed as a plain text field. (Not even an array of
> text.)

I disagree with the last note - arrays are valid where they are expected. I
don't see any reason why anybody should have to parse information out of any
table. The face of pg_hba.conf in SQL space can be different from the
original file, but all the data should be clean (without the need for
additional parsing).

Regards

Pavel
Re: [HACKERS] Join push-down support for foreign tables
Hi Ashutosh, thanks for the review.

2015-03-04 19:17 GMT+09:00 Ashutosh Bapat:
> In create_foreignscan_path() we have lines like -
> 1587     pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
> 1588                                                        required_outer);
> Now that the same function is being used for creating foreign scan paths
> for joins, we should be calling get_joinrel_parampathinfo() on a join rel
> and get_baserel_parampathinfo() on a base rel.

Got it. Please let me check the difference.

> The patch seems to handle all the restriction clauses in the same way.
> There are two kinds of restriction clauses - a. join quals (specified
> using an ON clause; the optimizer might move them to the other class if
> that doesn't affect correctness) and b. quals on the join relation
> (specified in the WHERE clause; the optimizer might move them to the
> other class if that doesn't affect correctness). The quals in "a" are
> applied while the join is being computed, whereas those in "b" are
> applied after the join is computed. For example,
>
> postgres=# select * from lt;
>  val | val2
> -----+------
>    1 |    2
>    1 |    3
> (2 rows)
>
> postgres=# select * from lt2;
>  val | val2
> -----+------
>    1 |    2
> (1 row)
>
> postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2);
>  val | val2 | val | val2
> -----+------+-----+------
>    1 |    2 |   1 |    2
>    1 |    3 |     |
> (2 rows)
>
> postgres=# select * from lt left join lt2 on (true) where (lt.val2 = lt2.val2);
>  val | val2 | val | val2
> -----+------+-----+------
>    1 |    2 |   1 |    2
> (1 row)
>
> The difference between these two kinds is evident in the case of outer
> joins; for an inner join the optimizer puts all of them in class "b".
> The remote query sent to the foreign server has all of those in the ON
> clause. Consider foreign tables ft1 and ft2 pointing to local tables on
> the same server.
>
> postgres=# \d ft1
>          Foreign table "public.ft1"
>  Column |  Type   | Modifiers | FDW Options
> --------+---------+-----------+-------------
>  val    | integer |           |
>  val2   | integer |           |
> Server: loopback
> FDW Options: (table_name 'lt')
>
> postgres=# \d ft2
>          Foreign table "public.ft2"
>  Column |  Type   | Modifiers | FDW Options
> --------+---------+-----------+-------------
>  val    | integer |           |
>  val2   | integer |           |
> Server: loopback
> FDW Options: (table_name 'lt2')
>
> postgres=# explain verbose select * from ft1 left join ft2 on (ft1.val2 =
> ft2.val2) where ft1.val + ft2.val > ft1.val2 or ft2.val is null;
>                           QUERY PLAN
> ---------------------------------------------------------------------
>  Foreign Scan  (cost=100.00..125.60 rows=2560 width=16)
>    Output: val, val2, val, val2
>    Remote SQL: SELECT r.a_0, r.a_1, l.a_0, l.a_1 FROM (SELECT val, val2
> FROM public.lt2) l (a_0, a_1) RIGHT JOIN (SELECT val, val2 FROM
> public.lt) r (a_0, a_1) ON ((((r.a_0 + l.a_0) > r.a_1) OR (l.a_0 IS
> NULL)) AND ((r.a_1 = l.a_1)))
> (3 rows)
>
> The result is then wrong:
>
> postgres=# select * from ft1 left join ft2 on (ft1.val2 = ft2.val2) where
> ft1.val + ft2.val > ft1.val2 or ft2.val is null;
>  val | val2 | val | val2
> -----+------+-----+------
>    1 |    2 |     |
>    1 |    3 |     |
> (2 rows)
>
> which should match the result obtained by substituting local tables for
> the foreign ones:
>
> postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2) where
> lt.val + lt2.val > lt.val2 or lt2.val is null;
>  val | val2 | val | val2
> -----+------+-----+------
>    1 |    3 |     |
> (1 row)
>
> Once we start distinguishing the two kinds of quals, there is some
> optimization possible. For pushing down a join it's essential that all
> the quals in "a" are safe to be pushed down. But a join can be pushed
> down even if the quals in "b" are not safe to be pushed down. Still, the
> more clauses one pushes down to the foreign server, the fewer rows are
> fetched from the foreign server. In postgresGetForeignJoinPath, instead
> of checking all the restriction clauses to be safe to be pushed down, we
> need to check only those which are join quals (class "a").

The argument restrictlist of GetForeignJoinPaths contains both kinds of
conditions mixed together, so I added extract_actual_join_clauses() to
separate it into two lists, join_quals and other clauses. This is similar to
what create_nestloop_plan and siblings do.

> The following EXPLAIN output seems to be confusing.
> ft1 and ft2 are both pointing to the same lt on a foreign server.
>
> postgres=# explain verbose select ft1.val + ft1.val2 from ft1, ft2 where
> ft1.val + ft1.val2 = ft2.val;
>                           QUERY PLAN
> ---------------------------------------------------------------------
>  Foreign Scan  (cost=100.00..132.00 rows=256
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Mar 5, 2015 at 9:14 PM, Syed, Rahila wrote:
> Please find attached a patch. As discussed, the flag to denote compression
> and the presence of a hole in the block image has been added in
> XLogRecordImageHeader rather than the block header.
>
> Following are WAL numbers based on the attached test script posted by
> Michael earlier in the thread.
>
>                         WAL generated
> FPW compression on      122.032 MB
> FPW compression off     155.223 MB
> HEAD                    155.236 MB
>
> Compression: 21%
> Number of block images generated in WAL: 63637

ISTM that we are getting a nice thing here. I tested the patch and WAL replay
is working correctly. Some nitpicky comments...

+ * bkp_info stores flags for information about the backup block image
+ * BKPIMAGE_IS_COMPRESSED is used to identify if a given block image is compressed.
+ * BKPIMAGE_WITH_HOLE is used to identify the presence of a hole in a block image.
+ * If the block image has no hole, it is ensured that the raw size of a compressed
+ * block image is equal to BLCKSZ, hence the contents of
+ * XLogRecordBlockImageCompressionInfo are not necessary.

Take care of the limit of 80 characters per line. (Perhaps you could run
pgindent on your code before sending a patch?) The first line of this
paragraph is a sentence in itself, no?

In xlogreader.c, blk->with_hole is a boolean, so you could remove the == 0
and == 1 it is compared with.

+   /*
+    * Length of a block image must be less than BLCKSZ
+    * if the block has hole
+    */

"if the block has a hole." (End of the sentence needs a dot.)

+   /*
+    * Length of a block image must be equal to BLCKSZ
+    * if the block does not have hole
+    */

"if the block does not have a hole."

Regards,
--
Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2015-03-05 12:14:04 +0000, Syed, Rahila wrote:
> Please find attached a patch. As discussed, the flag to denote
> compression and the presence of a hole in the block image has been added
> in XLogRecordImageHeader rather than the block header.

FWIW, I personally won't commit it with things done that way. I think it's
going the wrong way, leading to a harder-to-interpret and less flexible
format.

I'm not going to further protest if Fujii or Heikki commit it this way
though.

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] anyarray
On 03/04/2015 10:28 PM, Tom Lane wrote:
> Peter Eisentraut writes:
>> On 2/13/15 10:20 AM, Teodor Sigaev wrote:
>>> Some users of the intarray contrib module wish to use its features
>>> with other kinds of arrays, not only for the int4 type. The suggested
>>> module generalizes intarray over other (not all) types of pgsql.
>>
>> I think this module should be merged with the intarray module. Having
>> two modules with very similar functionality would be confusing.
>
> Perhaps. I think it would be hard to remove intarray without breaking
> things for existing users of it, even if the functionality remains under
> another name. And surely we don't want to generalize intarray while
> keeping that same name. So it might be hard to get to a clean solution.
>
> Speaking of names, I can't avoid the feeling that it is a seriously bad
> idea to name an extension the same thing as an existing core type.

+1. We have far too much experience already of this type of naming confusion.

cheers

andrew
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Wed, Mar 4, 2015 at 9:41 PM, Peter Eisentraut wrote:
> everyone seems to have quite different uses for it. Greg wants to join
> against other catalog tables, Jim wants to reassemble a valid and
> accurate pg_hba.conf, Josh wants to write an editing tool. Personally,
> I'd like to see something as close to the actual file as possible.

Well, if you want to read the file as-is you can do so using the file reading
functions, which afaik were specifically intended for the purpose of writing
config editing tools.

Just to repeat: even if you're reassembling a valid and accurate pg_hba.conf
or writing an editing tool, you can do that with what we have today, as long
as there are no databases named "all", "sameuser", or "samerole" and no roles
named "all". That's something an editing tool could check and warn the user
about, refusing to write a config file if it's the case, and I doubt there
are any such users out there anyway.

Having five separate columns would work, but I just think it's way more
complicated than necessary and burdens everyone who wants to use the view
less formally.

> If there were an obviously correct way to map the various special
> constructs to the available SQL data types, then fine. But if there
> isn't, then we shouldn't give a false overinterpretation. So I'd render
> everything that's disputed as a plain text field. (Not even an array of
> text.)

Joining against other catalog tables may be kind of exaggerated, but if I
have data in an SQL view I certainly expect to be able to write WHERE clauses
to find the rows I want without having to do string parsing. If it were a
comma-delimited string I would have to do something like WHERE users LIKE
'%,username,%', but then that's not good enough since it may be the first or
last entry, and the username itself may contain whitespace or a comma, etc.

I think knowing about the + prefix and the risk of tokens like "all" and
"sameuser" are pretty small compromises to make. But having to know the
quoting rules and write a tokenizer goes too far.

--
greg
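To make the contrast concrete, a hypothetical sketch; the view and column
names here are invented, since no such view existed yet:

-- with a text[] column, matching a role is a plain array membership test:
SELECT * FROM pg_hba_settings WHERE 'alice' = ANY (user_names);

-- with a single flattened text column, one is reduced to fragile
-- substring matching that breaks on quoting, ordering, and whitespace:
SELECT * FROM pg_hba_settings WHERE user_names LIKE '%alice%';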
Re: [HACKERS] Strange assertion using VACOPT_FREEZE in vacuum.c
Robert Haas wrote:
> On Sun, Mar 1, 2015 at 6:58 AM, Michael Paquier wrote:
>> Hm, why not. That would remove all inconsistencies between the parser
>> and the autovacuum code path. Perhaps something like the attached
>> makes sense then?
>
> I really don't see this patch, or any of the previous ones, as solving
> any actual problem. There's no bug here, and no reason to suspect
> that future code changes would be particularly likely to introduce one.
> Assertions are a great way to help developers catch coding mistakes,
> but it's a real stretch to think that a developer is going to add a
> new syntax for ANALYZE that involves setting options proper to VACUUM
> and not notice it.

That was my opinion of the previous patches in this thread. But I think this
last one makes a lot more sense: why is the parser concerned with figuring
out the right defaults given FREEZE/not-FREEZE? I think there is a real gain
in clarity here by deferring those decisions until vacuum time. The parser's
job should be to pass the FREEZE flag down only, which is what this patch
does, and it consequently results in a (small) net reduction of LOC in
gram.y.

Here's a simple idea to improve the patch: make VacuumParams include
VacuumStmt and the for_wraparound and do_toast flags. ISTM that reduces the
number of arguments to be passed down in a couple of places. In particular:

vacuum(VacuumParams *params, BufferAccessStrategy bstrategy, bool isTopLevel)

vacuum_rel(VacuumParams *params)

Does that sound more attractive?

--
Álvaro Herrera                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Strange assertion using VACOPT_FREEZE in vacuum.c
Alvaro,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> Robert Haas wrote:
>> On Sun, Mar 1, 2015 at 6:58 AM, Michael Paquier wrote:
>>> Hm, why not. That would remove all inconsistencies between the parser
>>> and the autovacuum code path. Perhaps something like the attached
>>> makes sense then?
>>
>> I really don't see this patch, or any of the previous ones, as solving
>> any actual problem. There's no bug here, and no reason to suspect
>> that future code changes would be particularly likely to introduce one.
>> Assertions are a great way to help developers catch coding mistakes,
>> but it's a real stretch to think that a developer is going to add a
>> new syntax for ANALYZE that involves setting options proper to VACUUM
>> and not notice it.
>
> That was my opinion of the previous patches in this thread. But I think
> this last one makes a lot more sense: why is the parser concerned with
> figuring out the right defaults given FREEZE/not-FREEZE? I think there
> is a real gain in clarity here by deferring those decisions until vacuum
> time. The parser's job should be to pass the FREEZE flag down only,
> which is what this patch does, and consequently results in a (small) net
> reduction of LOC in gram.y.

Yeah, that was my thinking also in my earlier review.

> Here's a simple idea to improve the patch: make VacuumParams include
> VacuumStmt and the for_wraparound and do_toast flags. ISTM that reduces
> the number of arguments to be passed down in a couple of places. In
> particular:
>
> vacuum(VacuumParams *params, BufferAccessStrategy bstrategy, bool isTopLevel)
>
> vacuum_rel(VacuumParams *params)
>
> Does that sound more attractive?

I had been hoping we'd be able to provide an API which didn't require
autovacuum to build up a VacuumStmt, but that's not a big deal and it's doing
it currently anyway. Just mentioning it, as that was one of the other things
I had been hoping to get out of this.

Thanks!

Stephen
Re: [HACKERS] Join push-down support for foreign tables
Here is the v5 patch of join push-down support for foreign tables.

Changes since v4:
- Separate remote conditions into ON and WHERE, per Ashutosh.
- Add regression test cases for foreign join.
- Don't skip reversed relation combinations in OUTER join cases.

I'm now working on two issues from KaiGai-san and Ashutosh: whole-row
reference handling and use of get_joinrel_parampathinfo().

2015-03-05 22:00 GMT+09:00 Shigeru Hanada:
> Hi Ashutosh, thanks for the review.
>
> 2015-03-04 19:17 GMT+09:00 Ashutosh Bapat:
>> In create_foreignscan_path() we have lines like -
>> 1587     pathnode->path.param_info = get_baserel_parampathinfo(root, rel,
>> 1588                                                        required_outer);
>> Now that the same function is being used for creating foreign scan paths
>> for joins, we should be calling get_joinrel_parampathinfo() on a join
>> rel and get_baserel_parampathinfo() on a base rel.
>
> Got it. Please let me check the difference.
>
>> The patch seems to handle all the restriction clauses in the same way.
>> There are two kinds of restriction clauses - a. join quals (specified
>> using an ON clause; the optimizer might move them to the other class if
>> that doesn't affect correctness) and b. quals on the join relation
>> (specified in the WHERE clause; the optimizer might move them to the
>> other class if that doesn't affect correctness). The quals in "a" are
>> applied while the join is being computed, whereas those in "b" are
>> applied after the join is computed. For example,
>>
>> postgres=# select * from lt;
>>  val | val2
>> -----+------
>>    1 |    2
>>    1 |    3
>> (2 rows)
>>
>> postgres=# select * from lt2;
>>  val | val2
>> -----+------
>>    1 |    2
>> (1 row)
>>
>> postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2);
>>  val | val2 | val | val2
>> -----+------+-----+------
>>    1 |    2 |   1 |    2
>>    1 |    3 |     |
>> (2 rows)
>>
>> postgres=# select * from lt left join lt2 on (true) where (lt.val2 = lt2.val2);
>>  val | val2 | val | val2
>> -----+------+-----+------
>>    1 |    2 |   1 |    2
>> (1 row)
>>
>> The difference between these two kinds is evident in the case of outer
>> joins; for an inner join the optimizer puts all of them in class "b".
>> The remote query sent to the foreign server has all of those in the ON
>> clause. Consider foreign tables ft1 and ft2 pointing to local tables on
>> the same server.
>>
>> postgres=# \d ft1
>>          Foreign table "public.ft1"
>>  Column |  Type   | Modifiers | FDW Options
>> --------+---------+-----------+-------------
>>  val    | integer |           |
>>  val2   | integer |           |
>> Server: loopback
>> FDW Options: (table_name 'lt')
>>
>> postgres=# \d ft2
>>          Foreign table "public.ft2"
>>  Column |  Type   | Modifiers | FDW Options
>> --------+---------+-----------+-------------
>>  val    | integer |           |
>>  val2   | integer |           |
>> Server: loopback
>> FDW Options: (table_name 'lt2')
>>
>> postgres=# explain verbose select * from ft1 left join ft2 on (ft1.val2 =
>> ft2.val2) where ft1.val + ft2.val > ft1.val2 or ft2.val is null;
>>                           QUERY PLAN
>> ---------------------------------------------------------------------
>>  Foreign Scan  (cost=100.00..125.60 rows=2560 width=16)
>>    Output: val, val2, val, val2
>>    Remote SQL: SELECT r.a_0, r.a_1, l.a_0, l.a_1 FROM (SELECT val, val2
>> FROM public.lt2) l (a_0, a_1) RIGHT JOIN (SELECT val, val2 FROM
>> public.lt) r (a_0, a_1) ON ((((r.a_0 + l.a_0) > r.a_1) OR (l.a_0 IS
>> NULL)) AND ((r.a_1 = l.a_1)))
>> (3 rows)
>>
>> The result is then wrong:
>>
>> postgres=# select * from ft1 left join ft2 on (ft1.val2 = ft2.val2) where
>> ft1.val + ft2.val > ft1.val2 or ft2.val is null;
>>  val | val2 | val | val2
>> -----+------+-----+------
>>    1 |    2 |     |
>>    1 |    3 |     |
>> (2 rows)
>>
>> which should match the result obtained by substituting local tables for
>> the foreign ones:
>>
>> postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2) where
>> lt.val + lt2.val > lt.val2 or lt2.val is null;
>>  val | val2 | val | val2
>> -----+------+-----+------
>>    1 |    3 |     |
>> (1 row)
>>
>> Once we start distinguishing the two kinds of quals, there is some
>> optimization possible. For pushing down a join it's essential that all
>> the quals in "a" are safe to be pushed down. But a join can be pushed
>> down even if the quals in "b" are not safe to be pushed down. Still,
>> the more clauses one pushes down to the foreign server, the fewer rows
>> are fetched from the foreign server. In postgresGetForeignJoinPath,
>> instead of checking all the restriction clauses to be safe to be pushed
>> down, we need to check only those which are join quals (class "a").
>
> The argument restrictlist of GetForeignJoinPaths contains both
> conditions mixed, so I added extract_actual_join_clauses() to separate
> it into two lists, join_qua
Re: [HACKERS] Strange assertion using VACOPT_FREEZE in vacuum.c
On Thu, Mar 5, 2015 at 9:58 AM, Alvaro Herrera wrote:
> Robert Haas wrote:
>> On Sun, Mar 1, 2015 at 6:58 AM, Michael Paquier wrote:
>>> Hm, why not. That would remove all inconsistencies between the parser
>>> and the autovacuum code path. Perhaps something like the attached
>>> makes sense then?
>>
>> I really don't see this patch, or any of the previous ones, as solving
>> any actual problem. There's no bug here, and no reason to suspect
>> that future code changes would be particularly likely to introduce one.
>> Assertions are a great way to help developers catch coding mistakes,
>> but it's a real stretch to think that a developer is going to add a
>> new syntax for ANALYZE that involves setting options proper to VACUUM
>> and not notice it.
>
> That was my opinion of previous patches in this thread. But I think
> this last one makes a lot more sense: why is the parser concerned with
> figuring out the right defaults given FREEZE/not-FREEZE? I think there
> is a real gain in clarity here by deferring those decisions until vacuum
> time. The parser's job should be to pass the FREEZE flag down only,
> which is what this patch does, and consequently results in a (small) net
> reduction of LOC in gram.y.

Sure, I'll buy that. If you want to refactor things that way, I have no
issue with it - I just want to point out that it seems to have zip to do
with what started this thread, which is why I wondered if we were just
talking for the sake of talking.

> Here's a simple idea to improve the patch: make VacuumParams include
> VacuumStmt and the for_wraparound and do_toast flags. ISTM that reduces
> the number of arguments to be passed down in a couple of places. In
> particular:
>
> vacuum(VacuumParams *params, BufferAccessStrategy bstrategy, bool isTopLevel)
>
> vacuum_rel(VacuumParams *params)
>
> Does that sound more attractive?

I dislike passing parser nodes straight down into utility commands. It tends
to make those functions hard for in-core users to call, and also to lead to
security vulnerabilities where we look up the same names more than once and
just hope that we get the same OID every time. Stuffing the VacuumStmt
pointer inside the VacuumParams object doesn't, for me, help anything. It'd
be a lot more interesting if we could get rid of it altogether.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] object description for FDW user mappings
When commit cae565e503 introduced FDW user mappings, it used this in
getObjectDescription for them:

	appendStringInfo(&buffer, _("user mapping for %s"), usename);

This was later mostly copied (by yours truly) as object identity by commit
f8348ea32e, wherein I used this:

	appendStringInfo(&buffer, "%s", usename);

As it turns out, this is wrong, because the pg_user_mapping catalog has a
two-column "primary key", which is the user OID and server OID. Therefore it
seems to me that the correct object description and identity must include
both the user name and the server name. I propose we change the above to
this:

	appendStringInfo(&buffer, _("user mapping for %s in server %s"), usename,
					 srv->servername);

I propose to apply the (appropriately rebased) attached patch to all back
branches. Note in particular the wording changes in some error messages in
the foreign_data test.

--
Álvaro Herrera                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

>From 82e8814c9f4b89d31d51b2fa370add1c04ae0f12 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera
Date: Thu, 5 Mar 2015 12:30:53 -0300
Subject: [PATCH] fix user mapping object description/identity

---
 src/backend/catalog/objectaddress.c        | 30 ++++++++++++++++----------
 src/test/regress/expected/foreign_data.out | 24 ++++++++++-----------
 2 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 70043fc..0740b4f 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -2510,14 +2510,17 @@ getObjectDescription(const ObjectAddress *object)
 				HeapTuple	tup;
 				Oid			useid;
 				char	   *usename;
+				Form_pg_user_mapping umform;
+				ForeignServer *srv;
 
 				tup = SearchSysCache1(USERMAPPINGOID,
 									  ObjectIdGetDatum(object->objectId));
 				if (!HeapTupleIsValid(tup))
 					elog(ERROR, "cache lookup failed for user mapping %u",
 						 object->objectId);
-
-				useid = ((Form_pg_user_mapping) GETSTRUCT(tup))->umuser;
+				umform = (Form_pg_user_mapping) GETSTRUCT(tup);
+				useid = umform->umuser;
+				srv = GetForeignServer(umform->umserver);
 
 				ReleaseSysCache(tup);
@@ -2526,7 +2529,8 @@ getObjectDescription(const ObjectAddress *object)
 				else
 					usename = "public";
 
-				appendStringInfo(&buffer, _("user mapping for %s"), usename);
+				appendStringInfo(&buffer, _("user mapping for %s in server %s"), usename,
+								 srv->servername);
 				break;
 			}
@@ -3906,19 +3910,18 @@ getObjectIdentityParts(const ObjectAddress *object,
 			{
 				HeapTuple	tup;
 				Oid			useid;
+				Form_pg_user_mapping umform;
+				ForeignServer *srv;
 				const char *usename;
 
-				/* no objname support */
-				if (objname)
-					*objname = NIL;
-
 				tup = SearchSysCache1(USERMAPPINGOID,
 									  ObjectIdGetDatum(object->objectId));
 				if (!HeapTupleIsValid(tup))
 					elog(ERROR, "cache lookup failed for user mapping %u",
 						 object->objectId);
-
-				useid = ((Form_pg_user_mapping) GETSTRUCT(tup))->umuser;
+				umform = (Form_pg_user_mapping) GETSTRUCT(tup);
+				useid = umform->umuser;
+				srv = GetForeignServer(umform->umserver);
 
 				ReleaseSysCache(tup);
@@ -3927,7 +3930,14 @@ getObjectIdentityParts(const ObjectAddress *object,
 				else
 					usename = "public";
 
-				appendStringInfoString(&buffer, usename);
+				if (objname)
+				{
+					*objname = list_make1(pstrdup(usename));
+					*objargs = list_make1(pstrdup(srv->servername));
+				}
+
+				appendStringInfo(&buffer, "%s in server %s", usename,
+								 srv->servername);
 				break;
 			}
diff --git a/src/test/regress/expected/foreign_data.out b/src/test/regress/expected/foreign_data.out
index 632b7e5..4d0de1f 100644
--- a/src/test/regress/expected/foreign_data.out
+++ b/src/test/regress/expected/foreign_data.out
@@ -247,7 +247,7 @@ CREATE USER MAPPING FOR current_user SERVER s1;
 DROP FOREIGN DATA WRAPPER foo;                          -- ERROR
 ERROR:  cannot drop foreign-data wrapper foo because other objects depend on it
 DETAIL:  server s1 depends on foreign-data wrapper foo
-user mapping for foreign_data_user depends on server s1
+user mapping for foreign_data_user in server s1 depends on server s1
 HINT:  Use DROP ... CASCADE to drop the dependent objects too.
 SET ROLE regress_test_role;
 DROP FOREIGN DATA WRAPPER foo CASCADE;                  -- ERROR
@@ -256,7 +256,7 @@ RESET ROLE;
 DROP FOREIGN DATA WRAPPER foo CASCADE;
 NOTICE:  drop cascades to 2 other objects
 DETAIL:  drop cascades to server s1
-drop cascades to user mapping for foreign_data_user
+drop cascades to user mapping for foreign_data_user in server s1
 \dew+
                           List of foreign-data wrappers
  Name | Owner | Handler | Validator | Access privileges | FDW Options | Description
@@ -526
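To see at the SQL level why the user name alone is ambiguous, consider that
the same role may have one mapping per server (the server names here are
invented):

CREATE EXTENSION postgres_fdw;
CREATE SERVER s1 FOREIGN DATA WRAPPER postgres_fdw;
CREATE SERVER s2 FOREIGN DATA WRAPPER postgres_fdw;
-- one mapping per (user, server) pair; "user mapping for bob" alone
-- cannot identify which of these is meant:
CREATE USER MAPPING FOR current_user SERVER s1;
CREATE USER MAPPING FOR current_user SERVER s2;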
Re: [HACKERS] Combining Aggregates
On Wed, Mar 4, 2015 at 11:00 PM, Kouhei Kaigai wrote:
> Anyway, I could find out, at least, these complicated issues around
> two-phase aggregate integration with the planner. Can someone suggest
> a minimally invasive way to do this integration?

I don't think I have the brain space to think about that until we get the
basic parallelism stuff in.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] File based Incremental backup v8
On Wed, Mar 4, 2015 at 11:42 PM, Bruce Momjian wrote:
> On Thu, Mar 5, 2015 at 01:25:13PM +0900, Fujii Masao wrote:
>>>> Yeah, it might make the situation better than today. But I'm afraid
>>>> that many users might get disappointed about that behavior of an
>>>> incremental backup after the release...
>>>
>>> I don't get what you mean here. Can you elaborate this point?
>>
>> The proposed version of LSN-based incremental backup has some
>> limitations (e.g., every database file needs to be read even when there
>> is no modification in the database since the last backup, which may make
>> the backup time longer than users expect) which may disappoint users. So
>> I'm afraid that the users who can benefit from the feature might be very
>> limited. IOW, I'm just sticking to the idea of the timestamp-based one
>> :) But I should drop it if the majority on the list prefers the
>> LSN-based one even with such limitations.
>
> We need numbers on how effective each level of tracking will be. Until
> then, the patch can't move forward.

The point is that this is a stepping stone toward what will ultimately be a
better solution. You can use timestamps today if (a) whole-file granularity
is good enough for you and (b) you trust your system clock to never go
backwards. In fact, if you use pg_start_backup() and pg_stop_backup(), you
don't even need a server patch; you can just go right ahead and implement
whatever you like. A server patch would be needed to make pg_basebackup do a
file-time-based incremental backup, but I'm not excited about that because I
think the approach is a dead end. If you want block-level granularity, and
you should, an approach based on file times is never going to get you there.
An approach based on LSNs can.

If the first version of the patch requires reading the whole database, fine;
it's not going to perform all that terribly well. But we can optimize that
later by keeping track of which blocks have been modified since a given LSN.
If we do that, we can get better reliability than the timestamp approach can
ever offer, plus excellent transfer and storage characteristics.

What I'm unhappy with about this patch is that it insists on sending the
whole file if a single block in that file has changed. That is lame. To get
something useful out of this, we should be looking to send only those blocks
whose LSNs have actually changed. That would reduce I/O (in the worst case,
the current patch reads each file in its entirety twice) and transfer
bandwidth as compared to the proposed patch. We'd still have to read the
whole database, so it might very well do more I/O than the file-timestamp
approach, but it would beat the file-timestamp approach on transfer bandwidth
and on the amount of storage required to store the incremental. In many
workloads, I expect those savings would be quite significant.

If we then went back in a later release and implemented one of the various
proposals to avoid needing to read every block, we'd then have a very robust
and complete solution. But I agree with Fujii to the extent that I see little
value in committing this patch in the form proposed. Being smart enough to
use the LSN to identify changed blocks, but then sending the entirety of
every file anyway because you don't want to go to the trouble of figuring out
how to revise the wire protocol to identify the individual blocks being sent,
and of writing the tools to reconstruct a full backup based on that data,
does not seem like enough of a win. As Fujii says, if we ship this patch as
written, people will just keep using the timestamp-based approach anyway.
Let's wait until we have something that is, at least in some circumstances, a
material improvement over the status quo before committing anything.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
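For context on the no-server-patch route Robert mentions: a file-level
incremental backup driven purely by timestamps can be scripted today around
the existing backup functions. A rough sketch (the label is invented, and the
actual file copying happens outside SQL with whatever tool compares mtimes):

SELECT pg_start_backup('incr-2015-03-05', true);
-- ... externally, copy only the data files whose mtime is newer than the
-- start time of the previous backup ...
SELECT pg_stop_backup();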
Re: [HACKERS] MD5 authentication needs help
* Josh Berkus (j...@agliodbs.com) wrote:
> Catching up here ...
>
> On 03/03/2015 06:01 PM, Bruce Momjian wrote:
>> It feels like MD5 has accumulated enough problems that we need to start
>> looking for another way to store and pass passwords. The MD5 problems
>> are:
>>
>> 1) MD5 makes users feel uneasy (though our usage is mostly safe)
>>
>> 2) The per-session salt sent to the client is only 32 bits, meaning
>> that it is possible to replay an observed MD5 hash in ~16k connection
>> attempts.
>
> Seems like we could pretty easily increase the size of the salt. Of
> course, that just increases the required number of connection attempts,
> without really fixing the problem.

Please do not propose hacks on top of the existing "md5" authentication
method which break the wireline protocol. There is absolutely nothing useful
to be gained from that. We need to introduce a new auth method with whatever
protocol changes that requires, without breaking the existing auth method.
Discussing trade-offs of changing the existing md5 mechanism *without*
breaking the wireline protocol may be worthwhile, as we might be able to
improve things a bit there, but nothing there is a proper solution for
security-conscious individuals.

>> 3) Using the user name for the MD5 storage salt allows the MD5 stored
>> hash to be used on a different cluster if the user used the same
>> password.
>
> This is a feature as well as a bug. For example, pgBouncer relies on
> this aspect of md5 auth.

It's not a feature, and pgBouncer could be made to not rely on this.

>> 4) Using the user name for the MD5 storage salt causes the renaming of
>> a user to break the stored password.
>
> Weirdly, in 17 years of Postgres, I've never encountered this issue.

I agree, that's kind of odd. :)

> So, are we more worried about attackers getting a copy of pg_authid, or
> sniffing the hash on the wire?

They are both attack vectors to consider, but most applications address the
network-based risk through TLS and have a hash-based approach to address the
pg_authid-based risk. Administrators are used to that and are quite surprised
to discover that PG doesn't work that way. As I mentioned upthread, it's
certainly very rare for anyone concerned about a network-based risk to *not*
use TLS, but they tend to still use passwords for the actual authentication
mechanism, and the md5 auth method makes the wrong trade-off for them. The
password auth method isn't ideal either, since the server ends up seeing the
PW. The change I'm suggesting addresses both the pg_authid-based risk and the
"server seeing the PW" risk without changing the wireline protocol.

Thanks,

Stephen
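Points 3 and 4 in Bruce's list both follow from how the stored hash is
salted: rolpassword is simply 'md5' followed by md5(password || rolname), so
two clusters sharing a role name and password store identical hashes, and
renaming the role orphans the stored value. A quick way to see it (the role
name and password here are invented):

SELECT rolname, rolpassword FROM pg_authid WHERE rolname = 'alice';
SELECT 'md5' || md5('secret' || 'alice');  -- matches rolpassword if alice's password is 'secret'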
Re: [HACKERS] MD5 authentication needs help
* Bruce Momjian (br...@momjian.us) wrote:
> On Wed, Mar 4, 2015 at 05:56:25PM -0800, Josh Berkus wrote:
>> So, are we more worried about attackers getting a copy of pg_authid, or
>> sniffing the hash on the wire?
>
> Both. Stephen is more worried about pg_authid, but I am more worried
> about sniffing.

I'm also worried about both, but if the admin is worried about sniffing in
their environment, they're much more likely to use TLS than to set up
client-side certificates, Kerberos, or some other strong auth mechanism,
simply because TLS is pretty darn easy to get working and distros set it up
for you by default.

Thanks,

Stephen
Re: [HACKERS] MD5 authentication needs help
* Bruce Momjian (br...@momjian.us) wrote:
> One way to fix #2 would be to use a per-user or per-cluster counter for
> the session salt, rather than a random number --- that would change
> replays from ~16k to 4 billion, with no wire protocol change needed.

I'm not against doing that if we decide to ignore the pg_authid-based vector
(which we could certainly do), but given the relatively poor hashing
algorithm we use and the small salt, along with the commonly used practice of
using TLS to address network-based attacks, I'm not sure it's really worth
it. Note that changing the algorithm or the salt would require a wireline
protocol change and therefore isn't interesting to consider; if we're going
to do that, we should be moving to a vetted solution instead.

Thanks!

Stephen
Re: [HACKERS] MD5 authentication needs help
* Bruce Momjian (br...@momjian.us) wrote:
> On Wed, Mar 4, 2015 at 04:19:00PM -0500, Stephen Frost wrote:
>>> Hm, well, "don't change the wireline protocol" could be another
>>> wanna-have ... but if we want to stop using MD5, that's not really a
>>> realistic goal, is it?
>>
>> I'm trying to address both sides of the issue- improve the current
>> situation without breaking existing clients AND provide a new auth
>> method and encourage everyone to move to it as soon as they're able. We
>> can't simply deprecate md5 as that would break pg_upgrade for any users
>> who are currently using it.
>
> Actually, pg_upgrade uses 'trust' and a private socket file, at least on
> Unix. Of course, post-upgrade, users would have trouble logging in.

The post-upgrade trouble logging in is what I was getting at. Of course, that
issue exists with a pg_dump-based approach also, so my reference to
pg_upgrade above wasn't accurate; it should have been "would break upgrades
for any users who are currently using it."

>> This half of the discussion has been all about improving md5 without
>> breaking the wireline protocol or existing users.
>>
>> The other half of the discussion is about implementing a new
>> password-based authentication method based on one of the vetted
>> authentication protocols already in use today (SCRAM or SRP, for
>> example). Using those new authentication protocols would include a move
>> off of the MD5 hashing function, of course. This would also mean
>> breaking the on-disk hash, but that's necessary anyway because what we
>> do there today isn't secure either and no amount of futzing is going to
>> change that.
>
> While I don't like the requirement to use TLS to improve the MD5 fix, I
> also don't like the idea of having users go through updating all these
> passwords only to have us implement the _right_ solution in the next
> release. I don't see why it is useful to be patching up MD5 with a TLS
> requirement when we know they should be moving to SCRAM or SRP. If the
> MD5 change were transparent to users/admins, that would be different.

So, TLS *isn't* an actual requirement with the approach that I've outlined;
it's just that if you're not using it then you have a somewhat larger risk of
successful network-based attacks, but then, if you're not using TLS, you're
unlikely to be worried about that vector.

>> I've got nearly zero interest in trying to go half-way on this by
>> designing something that we think is secure which has had no external
>> review or anyone else who uses it. Further, going that route makes me
>> very nervous that we'd decide on certain compromises in order to make
>> things easier for users without actually realising the problems with
>> such an approach (eg: "well, if we use hack X we wouldn't have to
>> change what is stored on disk and therefore we wouldn't break
>
> I am not happy to blindly accept a new security setup without
> understanding exactly what it is trying to fix, which is why I am asking
> all these questions.

I certainly didn't intend to suggest that anyone blindly accept a new
security setup. I don't mind the questions, but I do get a bit frustrated at
the suggestions that we can "fix" the md5 auth mechanism by simply increasing
the salt or putting in a better hashing algorithm. Those are not solutions to
this authentication challenge; for that, we need to look at SCRAM or SRP,
where RFCs have been reviewed and published which outline exactly how they
work and why they work the way they do.

I'd certainly encourage everyone interested in this to go look at those RFCs.

SCRAM - https://tools.ietf.org/html/rfc5802
SRP - https://www.ietf.org/rfc/rfc2945.txt

>> of a transition would be a lot easier on our users and I'd definitely
>> recommend that we add that with this new authentication mechanism, to
>> address this kind of issue in the future (not to mention that it's often
>> asked for..). Password complexity would be another thing we should
>> really add, and it is also often requested.
>
> I agree our password management could use improvement.

I'm *very* glad to hear that. When I brought it up 10-or-so years ago, there
was very little interest in these kinds of improvements.

>> Frankly, in the end, I don't see us being able to produce something in
>> time for this release unless someone can be dedicated to the effort over
>> the next couple of months, and therefore I'd prefer to improve the
>> current issues with md5 without breaking the wireline protocol than
>> simply do nothing (again).
>
> I am not sure why we have to shove something into 9.5 --- as you said,
> this issue has been known about for 10+ years.

It'd be nice to show that we're being responsive when issues are brought up,
even if they've existed for a long time (because, in the minds of some, the
fact that we've done nothing in 10 years doesn't exactly reflect very well on
us :( ).

> I think we should do what we can to improve MD5 in cases where the
Re: [HACKERS] object description for FDW user mappings
Alvaro Herrera writes:
> When commit cae565e503 introduced FDW user mappings, it used this in
> getObjectDescription for them:
>
> 	appendStringInfo(&buffer, _("user mapping for %s"), usename);
>
> This was later mostly copied (by yours truly) as object identity by
> commit f8348ea32e, wherein I used this:
>
> 	appendStringInfo(&buffer, "%s", usename);
>
> As it turns out, this is wrong, because the pg_user_mapping catalog has
> a two-column "primary key" which is user OID and server OID. Therefore
> it seems to me that the correct object description and identity must
> include both username and server name. I propose we change the above to
> this:
>
> 	appendStringInfo(&buffer, _("user mapping for %s in server %s"), usename,
> 					 srv->servername);

+1 for the concept, but to be nitpicky, "in" doesn't seem like the right word
here. "on server" would read better to me; or perhaps "at server".

			regards, tom lane
Re: [HACKERS] Proposal: knowing detail of config files via SQL
On Sat, Feb 28, 2015 at 2:27 PM, Stephen Frost wrote: > Sawada, > > * Sawada Masahiko (sawada.m...@gmail.com) wrote: >> Attached patch is latest version including following changes. >> - This view is available to super use only >> - Add sourceline coulmn > > Alright, first off, to Josh's point- I'm definitely interested in a > capability to show where the heck a given config value is coming from. > I'd be even happier if there was a way to *force* where config values > have to come from, but that ship sailed a year ago or so. Having this > be for the files, specifically, seems perfectly reasonable to me. I > could see having another function which will report where a currently > set GUC variable came from (eg: ALTER ROLE or ALTER DATABASE), but > having a function for "which config file is this GUC set in" seems > perfectly reasonable to me. > > To that point, here's a review of this patch. > >> diff --git a/src/backend/catalog/system_views.sql >> b/src/backend/catalog/system_views.sql >> index 6df6bf2..f628cb0 100644 >> --- a/src/backend/catalog/system_views.sql >> +++ b/src/backend/catalog/system_views.sql >> @@ -412,6 +412,11 @@ CREATE RULE pg_settings_n AS >> >> GRANT SELECT, UPDATE ON pg_settings TO PUBLIC; >> >> +CREATE VIEW pg_file_settings AS >> + SELECT * FROM pg_show_all_file_settings() AS A; >> + >> +REVOKE ALL on pg_file_settings FROM public; > > While this generally "works", the usual expectation is that functions > that should be superuser-only have a check in the function rather than > depending on the execute privilege. I'm certainly happy to debate the > merits of that approach, but for the purposes of this patch, I'd suggest > you stick an if (!superuser()) ereport("must be superuser") into the > function itself and not work about setting the correct permissions on > it. > >> + ConfigFileVariable *guc; > > As this ends up being an array, I'd call it "guc_array" or something > along those lines. GUC in PG-land has a pretty specific single-entity > meaning. > >> @@ -344,6 +346,21 @@ ProcessConfigFile(GucContext context) >> PGC_BACKEND, >> PGC_S_DYNAMIC_DEFAULT); >> } >> >> + guc_file_variables = (ConfigFileVariable *) >> + guc_malloc(FATAL, num_guc_file_variables * >> sizeof(struct ConfigFileVariable)); > > Uh, perhaps I missed it, but what happens on a reload? Aren't we going > to realloc this every time? Seems like we should be doing a > guc_malloc() the first time through but doing guc_realloc() after, or > we'll leak memory on every reload. > >> + /* >> + * Apply guc config parameters to guc_file_variable >> + */ >> + guc = guc_file_variables; >> + for (item = head; item; item = item->next, guc++) >> + { >> + guc->name = guc_strdup(FATAL, item->name); >> + guc->value = guc_strdup(FATAL, item->value); >> + guc->filename = guc_strdup(FATAL, item->filename); >> + guc->sourceline = item->sourceline; >> + } > > Uh, ditto and double-down here. I don't see a great solution other than > looping through the previous array and free'ing each of these, since we > can't depend on the memory context machinery being up and ready at this > point, as I recall. > >> diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c >> index 931993c..c69ce66 100644 >> --- a/src/backend/utils/misc/guc.c >> +++ b/src/backend/utils/misc/guc.c >> @@ -3561,9 +3561,17 @@ static const char *const map_old_guc_names[] = { >> */ >> static struct config_generic **guc_variables; >> >> +/* >> + * lookup of variables for pg_file_settings view. 
>> + */ >> +static struct ConfigFileVariable *guc_file_variables; >> + >> /* Current number of variables contained in the vector */ >> static int num_guc_variables; >> >> +/* Number of file variables */ >> +static int num_guc_file_variables; >> + > > I'd put these together, and add a comment explaining that > guc_file_variables is an array of length num_guc_file_variables.. > >> /* >> + * Return the total number of GUC File variables >> + */ >> +int >> +GetNumConfigFileOptions(void) >> +{ >> + return num_guc_file_variables; >> +} > > I don't see the point of this function.. > >> +Datum >> +show_all_file_settings(PG_FUNCTION_ARGS) >> +{ >> + FuncCallContext *funcctx; >> + TupleDesc tupdesc; >> + int call_cntr; >> + int max_calls; >> + AttInMetadata *attinmeta; >> + MemoryContext oldcontext; >> + >> + if (SRF_IS_FIRSTCALL()) >> + { >> + funcctx = SRF_FIRSTCALL_INIT(); >> + >> + oldcontext = >> MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); >> + >> + /* >> + * need a tuple descriptor representing NUM_PG_SETTINGS_ATTS >> columns >> + * of the appropriate types >> + */ >> + >> + tupdesc = CreateTemplateTupleDesc(NUM_PG_FILE_SETTINGS_ATTS,
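For illustration, here is roughly how the proposed view would be queried once installed; the column names are assumed from the ConfigFileVariable fields (name, value, filename, sourceline) rather than taken verbatim from the patch:

    -- hypothetical usage of the proposed superuser-only view
    SELECT name, value, filename, sourceline
    FROM pg_file_settings
    WHERE name = 'shared_buffers';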
Re: [HACKERS] Proposal: knowing detail of config files via SQL
* Peter Eisentraut (pete...@gmx.net) wrote: > On 3/3/15 5:29 PM, Stephen Frost wrote: > > For my part, I understood that we specifically didn't want to allow that > > for the same reason that we didn't want to simply depend on the GRANT > > system for the above functions, but everyone else on these discussions > > so far is advocating for using the GRANT system. My memory might be > > wrong, but I could have sworn that I had brought up exactly that > > suggestion in years past only to have it shot down. > > I think a lot of this access control work is done based on some > undocumented understandings, when in fact there is no consensus on > anything. I think we need more clarity. When there hasn't been discussion on a particular topic across years of ongoing development, all of which uses one approach, I tend to figure that makes it an unspoken consensus. I'm not saying we shouldn't question that when it makes sense to, we certainly should, but I'm not sure it makes sense to ask "why didn't you attempt to get consensus on this thing we've all been doing for years?" Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Additional role attributes && superuser review
* Peter Eisentraut (pete...@gmx.net) wrote: > On 2/28/15 10:10 PM, Stephen Frost wrote: > > * Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote: > >> I have attached and updated patch for review. > > > > Thanks! I've gone over this and made quite a few documentation and > > comment updates, but not too much else, so I'm pretty happy with how > > this is coming along. As mentioned elsewhere, this conflicts with the > > GetUserId() to has_privs_of_role() cleanup, but as I anticipate handling > > both this patch and that one, I'll find some way to manage. :) > > > > Updated patch attached. Barring objections, I'll be moving forward with > > this soonish. Would certainly appreciate any additional testing or > > review that you (or anyone!) has time to provide. > > Let's move this discussion to the right thread. Agreed. > Why are we not using roles and function execute privileges for this? There isn't a particular reason not to, except that the existing checks are in C code and those would need to be removed and the permission changes done at initdb time to revoke EXECUTE from PUBLIC for these functions. Further, as you pointed out, we'd need to dump out the permissions for the catalog tables and functions with this approach. I don't expect that to be too difficult to do though. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] constraint_exclusion fails to refute simple condition
> "Sandro" == Sandro Santilli writes: Sandro> PostGIS installs standard constraints of this kind: Sandro> CHECK (geometrytype(g) = 'POINT'::text OR g IS NULL) If geometrytype() is strict, the IS NULL condition is superfluous. CHECK(x) _passes_ when x is null, rather than rejecting the row. Sandro> But it is _NOT_ used if the NOT NULL condition is removed: Sandro> WHERE geometrytype(g) = 'LINESTRING' Sandro> As the "geometrytype" is defined as STRICT and IMMUTABLE, Sandro> there's no way for geometrytype(g) = 'LINESTRING' to hold true, Sandro> so why is the "IS NOT NULL" condition also needed by the Sandro> planner? Sandro> Andres Freund on IRC suggested that Sandro> predicate_refuted_by_simple_clause() looks like trying to Sandro> handle such cases, but if that's the case it seems to fail Sandro> here. It looks like it searches only one level deep into the clause, so it knows that if OP or FUNC is strict that (g OP const) or (FUNC(g)) will refute (g IS NULL), but not that (FUNC(g) OP const) will do so, because the reference to g is nested too deep. In this case the obvious thing to do is remove the redundant clause. Whether it would be worth teaching the predicate prover to recurse in such cases would be a performance tradeoff - likely a poor one, since it would have to recurse into arbitrarily complex expressions. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
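A minimal sketch of the limitation, using upper() as a stand-in for any STRICT, IMMUTABLE function (the table and names are invented for illustration; PostGIS is not required):

    SET constraint_exclusion = on;
    CREATE TABLE shapes (g text CHECK (upper(g) = 'POINT' OR g IS NULL));
    -- Excluded: the planner refutes both arms of the CHECK, since
    -- g IS NOT NULL directly refutes the IS NULL arm.
    EXPLAIN SELECT * FROM shapes WHERE upper(g) = 'LINESTRING' AND g IS NOT NULL;
    -- Not excluded: deducing g IS NOT NULL from upper(g) = 'LINESTRING'
    -- would require recursing into the function argument, which the
    -- prover does not do.
    EXPLAIN SELECT * FROM shapes WHERE upper(g) = 'LINESTRING';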
Re: [HACKERS] CATUPDATE confusion?
* Tom Lane (t...@sss.pgh.pa.us) wrote: > Peter Eisentraut writes: > > Any other opinions? > > The options are: > > 0) Leave as is. > > 1) Remove catupdate and replace with superuser checks. > > 2) Remove catupdate and rely on regular table permissions on catalogs. > > I think I vote for (1). I do not like (2) because of the argument I made > upthread that write access on system catalogs is far more dangerous than > a naive DBA might think. (0) has some attraction but really CATUPDATE > is one more concept than we need. I agree with #1 on this. > On the other hand, if Stephen is going to push forward with his plan to > subdivide superuserness, we might have the equivalent of CATUPDATE right > back again. (But at least it would be properly documented and > supported...) The subdivision of superuserness is intended only for operations which can't be used to directly give the user superuser access back and therefore I don't think we'd ever put back CATUPDATE or an equivalent. I'd much rather reduce the need for direct catalog modifications by adding additional syntax for those operations which can't be done without modifying the catalog directly and, where it makes sense to, add a way to control access to those operations. For example, changing a database to not accept connections seems like something the database owner should be allowed to do. Perhaps it'd be interesting to allow users other than the owner to do that, perhaps not, but that would be an independent question to address. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Proposal: knowing detail of config files via SQL
* Peter Eisentraut (pete...@gmx.net) wrote: > On 3/3/15 5:58 PM, Tom Lane wrote: > > One aspect of this that merits some thought is that in some cases > > access to some set of functions is best granted as a unit. That's > > easy with role properties but much less so with plain GRANT. > > Do we have enough such cases to make it an issue? > > You could have built-in roles, such as "backup" and ship the system with > the "backup" role having permissions on some functions. And then users > are granted those roles. Similar to how some Linux systems ship with > groups such as "adm". One thought I had for this was a contrib module which added an extension to create and grant those roles. That approach would mean that we don't need to worry about upgrade-path problems which we could get into if we declared new roles like 'backup' which users might already have. An alternative approach which might be better, now that I think about it, would be to declare that the 'pg_' prefix applies to roles too and then have a 'pg_backup' role which is granted the correct permissions. Personally, I like that idea a lot.. We could then have pg_upgrade throw an error and pg_dump a warning (or something along those lines) if they find any existing roles with that prefix. Thanks! Stephen signature.asc Description: Digital signature
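A sketch of what the pg_-prefixed built-in role idea could look like; the role name pg_backup and the particular grants are illustrative only, not from any patch:

    CREATE ROLE pg_backup NOLOGIN;
    -- (a real setup would first REVOKE EXECUTE on these from PUBLIC)
    GRANT EXECUTE ON FUNCTION pg_start_backup(text, boolean) TO pg_backup;
    GRANT EXECUTE ON FUNCTION pg_stop_backup() TO pg_backup;
    -- delegate to an ordinary user
    GRANT pg_backup TO backup_operator;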
Re: [HACKERS] deparsing utility commands
Alvaro, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > Thanks for the review. I have pushed these patches, squashed in one > commit, with the exception of the one that changed ALTER TABLE. You had > enough comments about that one that I decided to put it aside for now, > and get the other ones done. I will post a new series shortly. I look forward to seeing the new series posted soon. > I fixed (most of?) the issues you pointed out; some additional comments: These all looked good to me. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] MD5 authentication needs help
On 3/4/15 2:56 PM, Stephen Frost wrote: 2) The per-session salt sent to the client is only 32-bits, meaning >that it is possible to replay an observed MD5 hash in ~16k connection >attempts. Yes, and we have no (PG-based) mechanism to prevent those connection attempts, which is a pretty horrible situation to be in. Is there some reason we don't just fix that? I'm thinking that this is a special case where we could just modify the pg_auth tuple in-place without bloating the catalog (we already do that somewhere else). Is there something else that makes this difficult? Are we afraid of an extra GUC to control it? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MD5 authentication needs help
* Jim Nasby (jim.na...@bluetreble.com) wrote: > On 3/4/15 2:56 PM, Stephen Frost wrote: > >>2) The per-session salt sent to the client is only 32-bits, meaning > >>>that it is possible to replay an observed MD5 hash in ~16k connection > >>>attempts. > >Yes, and we have no (PG-based) mechanism to prevent those connection > >attempts, which is a pretty horrible situation to be in. > > Is there some reason we don't just fix that? I'm thinking that this > is a special case where we could just modify the pg_auth tuple > in-place without bloating the catalog (we already do that somewhere > else). Is there something else that makes this difficult? Are we > afraid of an extra GUC to control it? I'm all for it, though I would ask that we provide a way for superusers to delegate the ability to reset locked accounts to non-superusers. I'd want to think about it a bit more before settling on using pg_authid to track the data. In any case, I do think we need a way to disable this ability for certain roles and, further, that we not track failed logins in cases where it's disabled (which might well be the default - I don't think we want to add this overhead for systems which have lots of recurring logins; think application users where they aren't doing pooling). Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] a2e35b53 added unused variable to ConversionCreate()
Peter Geoghegan wrote: > I'm seeing this on the master branch tip when building at -O2: > > pg_conversion.c: In function ‘ConversionCreate’: > pg_conversion.c:53:6: error: variable ‘oid’ set but not used > [-Werror=unused-but-set-variable] > Oid oid; > ^ > cc1: all warnings being treated as errors > > I think that commit a2e35b53 did this. Obviously the commit did not introduce an unused variable, but instead made another variable take its place as the function's return value. In an assert-enabled build it was used by an assert, which is why I didn't notice the problem. I removed the assert and the variable so this should be fixed now. Thanks for reporting. -- Álvaro Herrera    http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MD5 authentication needs help
On 3/5/15 2:17 PM, Stephen Frost wrote: * Jim Nasby (jim.na...@bluetreble.com) wrote: On 3/4/15 2:56 PM, Stephen Frost wrote: 2) The per-session salt sent to the client is only 32-bits, meaning that it is possible to replay an observed MD5 hash in ~16k connection attempts. Yes, and we have no (PG-based) mechanism to prevent those connection attempts, which is a pretty horrible situation to be in. Is there some reason we don't just fix that? I'm thinking that this is a special case where we could just modify the pg_auth tuple in-place without bloating the catalog (we already do that somewhere else). Is there something else that makes this difficult? Are we afraid of an extra GUC to control it? I'm all for it, though I would ask that we provide a way for superusers to delegate the ability to reset locked accounts to non-superusers. I'd want to think about it a bit more before settling on using pg_authid I guess it's a question of how durable we want it to be. We could conceivably keep it in shared memory and let it wipe on a crash. But we already have code that ignores MVCC on a catalog table (IIRC for updating pg_class stats after vacuum) so the pattern is there. I don't see that we need more sophistication than that... to track the data. In any case, I do think we need a way to disable this ability for certain roles In the interest of something for this release... do we really need that? My thought is we just special-case the postgres user and be done with it. Though, if there's some other way to reset an account from the shell, no need to even special case postgres. Though, I guess if we just follow the normal GUC behavior of allowing per-database and -user overrides it wouldn't be that hard. and, further, that we not track failed logins in cases where it's disabled (which might well be the default - I don't think we want to add this overhead for systems which have lots of recurring logins; think application users where they aren't doing pooling). Yeah, presumably if allowed_authentication_failures < 0 then we don't bother with the check at all. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MD5 authentication needs help
* Jim Nasby (jim.na...@bluetreble.com) wrote: > On 3/5/15 2:17 PM, Stephen Frost wrote: > >* Jim Nasby (jim.na...@bluetreble.com) wrote: > >I'm all for it, though I would ask that we provide a way for superusers > >to delegate the ability to reset locked accounts to non-superusers. > > > >I'd want to think about it a bit more before settling on using pg_authid > > I guess it's a question of how durable we want it to be. We could > conceivable keep it in shared memory and let it wipe on a crash. > > But we already have code that ignores MVCC on a catalog table (IIRC > for updating pg_class stats after vacuum) so the pattern is there. I > don't see that we need more sophistication than that... I'm not sure we should jump to that immediately.. > >to track the data. In any case, I do think we need a way to disable > >this ability for certain roles > > In the interest of something for this release... do we really need > that? My thought is we just special-case the postgres user and be > done with it. Though, if there's some other way to reset an account > from the shell, no need to even special case postgres. I don't think this is going to happen for 9.5 unless someone shows up with code in the *very* short term.. Further, realistically, we would want to design this properly and not just hack something together. > Though, I guess if we just follow the normal GUC behavior of > allowing per-database and -user overrides it wouldn't be that hard. Yes, using the GUC-based approach would allow users to be excluded from an overall (or per-database) policy. Thanks, Stephen signature.asc Description: Digital signature
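A sketch of how the GUC-based policy discussed above might look; allowed_authentication_failures is a hypothetical name borrowed from this thread, not an actual setting:

    -- cluster-wide default (hypothetical GUC)
    ALTER SYSTEM SET allowed_authentication_failures = 10;
    -- opt a pooled application role out of failed-login tracking
    ALTER ROLE app_pool SET allowed_authentication_failures = -1;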
Re: [HACKERS] Idea: closing the loop for "pg_ctl reload"
On 3/4/15 7:13 PM, Jan de Visser wrote: On March 4, 2015 11:08:09 PM Andres Freund wrote: Let's get the basic feature (notification of failed reloads) done first. That will be required with or without including the error message. Then we can get more fancy later, if somebody really wants to invest the time. Except for also fixing pg_reload_conf() to pay attention to what happens with postmaster.pid, the patch I submitted does exactly what you want, I think. And I don't mind spending time on stuff like this. I'm not smart enough to deal with actual, you know, database technology. Yeah, let's at least get this wrapped and we can see about improving it. I like the idea of doing a here-doc or similar in the .pid, though I think it'd be sufficient to just prefix all the continuation lines with a tab. An uglier option would be just stripping the newlines out. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question about lazy_space_alloc() / linux over-commit
On 3/4/15 9:10 AM, Robert Haas wrote: On Wed, Feb 25, 2015 at 5:06 PM, Jim Nasby wrote: Could the large allocation[2] for the dead tuple array in lazy_space_alloc cause problems with linux OOM? [1] and some other things I've read indicate that a large mmap will count towards total system memory, including producing a failure if overcommit is disabled. I believe that this is possible. Would it be worth avoiding the full size allocation when we can? Maybe. I'm not aware of any evidence that this is an actual problem as opposed to a theoretical one. vacrelstats->dead_tuples is limited to a 1GB allocation, which is not a trivial amount of memory, but it's not huge, either. But we could consider changing the representation from a single flat array to a list of chunks, with each chunk capped at say 64MB. That would not only reduce the amount of memory that we I was thinking the simpler route of just repalloc'ing... the memcpy would suck, but much less so than the extra index pass. 64M gets us 11M tuples, which probably isn't very common. needlessly allocate, but would allow autovacuum to make use of more than 1GB of maintenance_work_mem, which it looks like it currently can't. I'm not sure that's a huge problem right now either, because I'm confused... how is autovacuum special in this regard? Each worker can use up to 1G, just like a regular vacuum, right? Or are you just saying getting rid of the 1G limit would be good? it's probably rare to vacuum a table with more than 1GB / 6 = 178,956,970 dead tuples in it, but it would certainly suck if you did and if the current 1GB limit forced you to do multiple vacuum passes. Yeah, with 100+ GB machines not that uncommon today perhaps it's worth significantly upping this. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in pg_dump
On Wed, Mar 4, 2015 at 2:03 PM, Michael Paquier wrote: > On Wed, Mar 4, 2015 at 6:48 AM, Peter Eisentraut wrote: >> - set up basic scaffolding for TAP tests in src/bin/pg_dump > > Agreed. > >> - write a Perl function that can create an extension on the fly, given >> name, C code, SQL code > > I am perplexed about that. Where would the SQL code or C code be stored? > In the pl script itself? I cannot really see the advantage of automatically > generating the skeleton of an extension from some C or SQL > code in comparison to storing the extension statically as-is. Adding > those extensions to src/test/modules is out of scope, to avoid bloating it, > so we could for example add such test extensions in t/extensions/ or > similar, and have prove_check scan this folder, then install those > extensions in the temporary installation. > >> - add to the proposed t/001_dump_test.pl code to write the extension >> - add that test to the pg_dump test suite >> Eventually, the dump-and-restore routine could also be refactored, but >> as long as we only have one test case, that can wait. > > Agreed on all those points. Please note that I have created a new thread especially for this purpose here: http://www.postgresql.org/message-id/CAB7nPqRx=zmbfjyjrwhguhhqk__8m+wd+p95cenjtmhxxr7...@mail.gmail.com Perhaps we should move this discussion there, as it is rather independent of the problem that has been reported. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0
On Wed, Mar 4, 2015 at 5:18 PM, Peter Geoghegan wrote: > Attached patch series forms what I'm calling V3.0 of the INSERT ... ON > CONFLICT IGNORE/UPDATE feature. (Sorry about all the threads. I feel > this development justifies a new thread, though.) Bruce Momjian kindly made available a server for stress-testing [1]. I'm using jjanes_upsert for this task (I stopped doing this for a little while, and was in need of a new server). At the very highest client count I'm testing (128), I see unprincipled deadlocks for the exclusion constraint case very infrequently: " 2015-03-05 14:09:36 EST [ 64987901 ]: ERROR: deadlock detected 2015-03-05 14:09:36 EST [ 64987901 ]: DETAIL: Process 7044 waits for ShareLock on promise tuple with token 1 of transaction 64987589; blocked by process 7200. Process 7200 waits for ShareLock on transaction 64987901; blocked by process 7044. Process 7044: insert into upsert_race_test (index, count) values ('541','-1') on conflict update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning count Process 7200: insert into upsert_race_test (index, count) values ('541','-1') on conflict update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning count 2015-03-05 14:09:36 EST [ 64987901 ]: HINT: See server log for query details. 2015-03-05 14:09:36 EST [ 64987901 ]: STATEMENT: insert into upsert_race_test (index, count) values ('541','-1') on conflict update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning count " (Reminder: Exclusion constraints doing UPSERTs, and not just IGNOREs, are artificial; this has been enabled within the parser solely for the benefit of this stress-testing. Also, the B-Tree AM neither has nor requires "livelock insurance"). This only happens after just over 30 minutes, while consistently doing 128 client exclusion constraint runs. This is pretty close to the most stressful thing that you could throw at the implementation, so that's really not too bad. I believe that this regression occurred as a direct result of adding "livelock insurance". Basically, we cannot be 100% sure that the interleaving of WAL-logged things within and across transactions won't be such that the "only", "oldest" session that gets to wait in the second stage (the second possible call to check_exclusion_or_unique_constraint(), from ExecInsertIndexTuples()) will really be the "oldest" XID. Another *older* xact could just get in ahead of us, waiting on our promise tuple as we wait on its xact end (maybe it updates some third tuple that we didn't see that has already committed... not 100% sure). Obviously XID assignment order does not guarantee that things like heap insertion and index tuple insertion occur in serial XID order, especially with confounding factors like super deletion. And so, every once in a long while we deadlock. Now, the very fact that this happens at all actually demonstrates the need for "livelock insurance", IMV. The fact that we reliably terminate is *reassuring*, because livelocks are in general a terrifying possibility. We cannot *truly* solve the unprincipled deadlock problem without adding livelocks, I think. But what we have here is very close to eliminating unprincipled deadlocks, while not also adding any livelocks, AFAICT. I'd argue that that's good enough. 
Of course, when I remove "livelock insurance", the problem ostensibly goes away (I've waited on such a stress test session for a couple of hours now, so this conclusion seems very likely to be correct). I think that we should do nothing about this, other than document it as possible in our comments on "livelock insurance". Again, it's very unlikely to be a problem in the real world, especially since B-Trees are unaffected. Also, I should point out that one of those waiters doesn't look like an insertion-related wait at all: "7200 waits for ShareLock on transaction 64987901; blocked by process 7044". Looks like row locking is an essential part of this deadlock, and ordinarily that isn't even possible for exclusion constraints (unless the patch is hacked to make the parser accept exclusion constraints for an UPSERT, as it has been here). Not quite sure what exact XactLockTableWait() call did this, but don't think it was any of them within check_exclusion_or_unique_constraint(), based on the lack of details (TID and so on are not shown). [1] http://momjian.us/main/blogs/pgblog/2012.html#January_20_2012 -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
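For reference, the statement the stress test loops on, reformatted from the deadlock log above (this is the patch's pre-commit grammar; the ON CONFLICT spelling and the TARGET/EXCLUDED aliases were still under discussion at this point):

    INSERT INTO upsert_race_test (index, count)
    VALUES ('541', '-1')
    ON CONFLICT UPDATE
        SET count = TARGET.count + EXCLUDED.count
        WHERE TARGET.index = EXCLUDED.index
    RETURNING count;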
Re: [HACKERS] object description for FDW user mappings
On 06-03-2015 AM 01:32, Tom Lane wrote: > Alvaro Herrera writes: >> appendStringInfo(&buffer, _("user mapping for %s in server %s"), >> usename, >> srv->servername); > > +1 for the concept, but to be nitpicky, "in" doesn't seem like the right > word here. "on server" would read better to me; or perhaps "at server". > One more option may be "for server" (reading the doc for CREATE USER MAPPING) Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] object description for FDW user mappings
On 06-03-2015 AM 09:18, Amit Langote wrote: > On 06-03-2015 AM 01:32, Tom Lane wrote: >> +1 for the concept, but to be nitpicky, "in" doesn't seem like the right >> word here. "on server" would read better to me; or perhaps "at server". >> > > One more option may be "for server" (reading the doc for CREATE USER MAPPING) Oh, I see it's been done that way already. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Rethinking pg_dump's function sorting code
In bug #12832 Marko Tiikkaja points out that commit 7b583b20b1c95acb621c71251150beef958bb603 created a rather unnecessary dump failure hazard, since it applies pg_get_function_identity_arguments() to every function in the database, even those that won't get dumped. I think we should fix this by getting rid of pg_dump's use of that function altogether. A low-tech way to sort functions of identical names would be to compare argument type OIDs, as in the attached simple patch. If people feel it's important to avoid depending on numerical OID order, we could instead look up type names locally and compare them as in the attached less-simple patch. (Both patches neglect reverting the data collection aspects of the prior commit, since that's mechanical; the only interesting part is what we'll do to sort.) Neither patch will exactly preserve the sort behavior of the current code, but I don't think that's important. Comments? regards, tom lane diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c index 4b9bba0..816c9f0 100644 *** a/src/bin/pg_dump/pg_dump_sort.c --- b/src/bin/pg_dump/pg_dump_sort.c *** DOTypeNameCompare(const void *p1, const *** 291,303 { FuncInfo *fobj1 = *(FuncInfo *const *) p1; FuncInfo *fobj2 = *(FuncInfo *const *) p2; cmpval = fobj1->nargs - fobj2->nargs; if (cmpval != 0) return cmpval; ! cmpval = strcmp(fobj1->proiargs, fobj2->proiargs); ! if (cmpval != 0) ! return cmpval; } else if (obj1->objType == DO_OPERATOR) { --- 291,307 { FuncInfo *fobj1 = *(FuncInfo *const *) p1; FuncInfo *fobj2 = *(FuncInfo *const *) p2; + int i; cmpval = fobj1->nargs - fobj2->nargs; if (cmpval != 0) return cmpval; ! for (i = 0; i < fobj1->nargs; i++) ! { ! cmpval = oidcmp(fobj1->argtypes[i], fobj2->argtypes[i]); ! if (cmpval != 0) ! return cmpval; ! } } else if (obj1->objType == DO_OPERATOR) { diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c index 4b9bba0..de9407c 100644 *** a/src/bin/pg_dump/pg_dump_sort.c --- b/src/bin/pg_dump/pg_dump_sort.c *** DOTypeNameCompare(const void *p1, const *** 291,303 { FuncInfo *fobj1 = *(FuncInfo *const *) p1; FuncInfo *fobj2 = *(FuncInfo *const *) p2; cmpval = fobj1->nargs - fobj2->nargs; if (cmpval != 0) return cmpval; ! cmpval = strcmp(fobj1->proiargs, fobj2->proiargs); ! if (cmpval != 0) ! return cmpval; } else if (obj1->objType == DO_OPERATOR) { --- 291,313 { FuncInfo *fobj1 = *(FuncInfo *const *) p1; FuncInfo *fobj2 = *(FuncInfo *const *) p2; + int i; cmpval = fobj1->nargs - fobj2->nargs; if (cmpval != 0) return cmpval; ! for (i = 0; i < fobj1->nargs; i++) ! { ! TypeInfo *argtype1 = findTypeByOid(fobj1->argtypes[i]); ! TypeInfo *argtype2 = findTypeByOid(fobj2->argtypes[i]); ! ! if (argtype1 && argtype2) ! { ! cmpval = strcmp(argtype1->dobj.name, argtype2->dobj.name); ! if (cmpval != 0) ! return cmpval; ! } ! } } else if (obj1->objType == DO_OPERATOR) { -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
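For context, this is the sort of per-function catalog call that the prior commit made pg_dump issue for every function, shown here only to illustrate what pg_get_function_identity_arguments() returns; the sample function name is arbitrary:

    SELECT p.oid::regprocedure,
           pg_get_function_identity_arguments(p.oid)
    FROM pg_proc p
    WHERE p.proname = 'substring';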
Re: [HACKERS] object description for FDW user mappings
Amit Langote writes: > On 06-03-2015 AM 01:32, Tom Lane wrote: >> +1 for the concept, but to be nitpicky, "in" doesn't seem like the right >> word here. "on server" would read better to me; or perhaps "at server". > One more option may be "for server" (reading the doc for CREATE USER MAPPING) Hm, but then you'd have "user mapping for foo for server bar", which doesn't read so nicely either. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] object description for FDW user mappings
On 06-03-2015 AM 09:30, Tom Lane wrote: > Amit Langote writes: > >> One more option may be "for server" (reading the doc for CREATE USER MAPPING) > > Hm, but then you'd have "user mapping for foo for server bar", which > doesn't read so nicely either. > Yeah, I had totally missed the "for foo" part; I was thinking of it as "of foo". By the way, in this case, is "foo" the name/id of a local user or does it really refer to some "foo on the remote server"? Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] object description for FDW user mappings
Amit Langote writes: > By the way, in this case, is "foo" the name/id of a local user or does it > really refer to some "foo on the remote server"? It's the name of a local user. I see your point that somebody might misread this as suggesting that it's a remote username, but not sure that there's anything great we can do here to disambiguate that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Join push-down support for foreign tables
> Actually val and val2 come from public.lt in "r" side, but as you say > it's too difficult to know that from EXPLAIN output. Do you have any > idea to make the "Output" item more readable? > A fundamental reason why we need to have symbolic aliases here is that postgres_fdw has remote query in cstring form. It makes implementation complicated to deconstruct/construct a query that is once constructed on the underlying foreign-path level. If ForeignScan keeps items to construct remote query in expression node form (and construction of remote query is delayed to beginning of the executor, probably), we will be able to construct more human readable remote query. However, I don't recommend to work on this great refactoring stuff within the scope of join push-down support project. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei > -Original Message- > From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Shigeru Hanada > Sent: Thursday, March 05, 2015 10:00 PM > To: Ashutosh Bapat > Cc: Kaigai Kouhei(海外 浩平); Robert Haas; PostgreSQL-development > Subject: Re: [HACKERS] Join push-down support for foreign tables > > Hi Ashutosh, thanks for the review. > > 2015-03-04 19:17 GMT+09:00 Ashutosh Bapat : > > In create_foreignscan_path() we have lines like - > > 1587 pathnode->path.param_info = get_baserel_parampathinfo(root, rel, > > 1588 > > required_outer); > > Now, that the same function is being used for creating foreign scan paths > > for joins, we should be calling get_joinrel_parampathinfo() on a join rel > > and get_baserel_parampathinfo() on base rel. > > Got it. Please let me check the difference. > > > > > The patch seems to handle all the restriction clauses in the same way. There > > are two kinds of restriction clauses - a. join quals (specified using ON > > clause; optimizer might move them to the other class if that doesn't affect > > correctness) and b. quals on join relation (specified in the WHERE clause, > > optimizer might move them to the other class if that doesn't affect > > correctness). The quals in "a" are applied while the join is being computed > > whereas those in "b" are applied after the join is computed. For example, > > postgres=# select * from lt; > > val | val2 > > -+-- > >1 |2 > >1 |3 > > (2 rows) > > > > postgres=# select * from lt2; > > val | val2 > > -+-- > >1 |2 > > (1 row) > > > > postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2); > > val | val2 | val | val2 > > -+--+-+-- > >1 |2 | 1 |2 > >1 |3 | | > > (2 rows) > > > > postgres=# select * from lt left join lt2 on (true) where (lt.val2 = > > lt2.val2); > > val | val2 | val | val2 > > -+--+-+-- > >1 |2 | 1 |2 > > (1 row) > > > > The difference between these two kinds is evident in case of outer joins, > > for inner join optimizer puts all of them in class "b". The remote query > > sent to the foreign server has all those in ON clause. Consider foreign > > tables ft1 and ft2 pointing to local tables on the same server. 
> > postgres=# \d ft1 > > Foreign table "public.ft1" > > Column | Type | Modifiers | FDW Options > > +-+---+- > > val| integer | | > > val2 | integer | | > > Server: loopback > > FDW Options: (table_name 'lt') > > > > postgres=# \d ft2 > > Foreign table "public.ft2" > > Column | Type | Modifiers | FDW Options > > +-+---+- > > val| integer | | > > val2 | integer | | > > Server: loopback > > FDW Options: (table_name 'lt2') > > > > postgres=# explain verbose select * from ft1 left join ft2 on (ft1.val2 = > > ft2.val2) where ft1.val + ft2.val > ft1.val2 or ft2.val is null; > > > > QUERY PLAN > > > > > > --- > > > > > > Foreign Scan (cost=100.00..125.60 rows=2560 width=16) > >Output: val, val2, val, val2 > >Remote SQL: SELECT r.a_0, r.a_1, l.a_0, l.a_1 FROM (SELECT val, val2 FROM > > public.lt2) l (a_0, a_1) RIGHT JOIN (SELECT val, val2 FROM public.lt) r (a > > _0, a_1) ON r.a_0 + l.a_0) > r.a_1) OR (l.a_0 IS NULL))) AND ((r.a_1 = > > l.a_1)) > > (3 rows) > > > > The result is then wrong > > postgres=# select * from ft1 left join ft2 on (ft1.val2 = ft2.val2) where > > ft1.val + ft2.val > ft1.val2 or ft2.val is null; > > val | val2 | val | val2 > > -+--+-+-- > >1 |2 | | > >1 |3 | | > > (2 rows) > > > > which should match the result obtained by substituting local tables for > > foreign ones > > postgres=# select * from lt left join lt2 on (lt.val2 = lt2.val2) where > > l
Re: [HACKERS] object description for FDW user mappings
Tom Lane-2 wrote > Amit Langote < > Langote_Amit_f8@.co > > writes: >> By the way, in this case, is "foo" the name/id of a local user or does it >> really refer to some "foo on the remote server"? > > It's the name of a local user. I see your point that somebody might > misread this as suggesting that it's a remote username, but not sure > that there's anything great we can do here to disambiguate that. Yeah, clarifying that point seems to add verbosity for little gain. On the message aspect is it against style to use composite notation in a case like this? "user mapping (%s, %s)" It is basically a single, if compound, noun so adding in/at/to doesn't feel right... David J. -- View this message in context: http://postgresql.nabble.com/object-description-for-FDW-user-mappings-tp5840655p5840729.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Weirdly pessimistic estimates in optimizer
On 2/28/15 12:01 PM, David Kubečka wrote: With 'random_fk_dupl': -> Index Scan using facts_fk_idx on facts (cost=0.42..5.75 rows=100 width=15) (actual time=0.009..0.117 rows=98 loops=100) With 'random_fk_uniq': -> Index Scan using facts_fk_idx on facts (cost=0.42..214.26 rows=100 width=15) (actual time=0.007..0.109 rows=98 loops=100) I have read the optimizer README file and also looked briefly at the code, but this seems to be something not related to the particular algorithm implementation (e.g. nested loop). Perhaps it's the way cost estimates are propagated down (or sideways? that would be weird...) the query tree. But I am really not sure, since this is my first time looking at the optimizer code base. I should also add that I have reproduced this behaviour for all versions of Pg from 9.2 up to current devel. This got answered on one of the other lists, right? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : REINDEX xxx VERBOSE
On 3/2/15 10:58 AM, Sawada Masahiko wrote: On Wed, Feb 25, 2015 at 4:58 PM, Jim Nasby wrote: On 2/24/15 8:28 AM, Sawada Masahiko wrote: According to the above discussion, VACUUM and REINDEX should have trailing options. Tom seems (to me) to be suggesting SQL-style (bare word preceded by WITH) options, and Jim '()' style options? (Anyway VACUUM gets the *third additional* option style, but that is a different discussion from this) Well, almost everything does a trailing WITH. We need to either stick with that for consistency, or add leading () as an option to those WITH commands. Does anyone know why those are WITH? Is it ANSI? As a refresher, current commands are: VACUUM (ANALYZE, VERBOSE) table1 (col1); REINDEX INDEX index1 FORCE; COPY table1 FROM 'file.txt' WITH (FORMAT csv); CREATE MATERIALIZED VIEW mv1 WITH (storageparam, ...) AS qry WITH DATA; CREATE EXTENSION ext1 WITH SCHEMA s1 VERSION v1 FROM over; CREATE ROLE role WITH LOGIN; GRANT WITH GRANT OPTION; CREATE VIEW v1 AS qry WITH CASCADED CHECK OPTION; ALTER DATABASE db1 WITH CONNECTION LIMIT 50; DECLARE c1 INSENSITIVE SCROLL CURSOR WITH HOLD; BTW, I'm fine with Tom's bare-word with WITH idea. That seems to be the most consistent with everything else. Is there a problem with doing that? I know getting syntax is one of the hard parts of new features, but it seems like we reached consensus here... We have discussed this option, including the FORCE option, but I think there are no users who want to use both the FORCE and VERBOSE options at the same time. I find that very hard to believe... I would expect a primary use case for VERBOSE to be "I ran REINDEX, but it doesn't seem to have done anything... what's going on?" and that's exactly when you might want to use FORCE. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
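The two option styles under discussion, sketched side by side; neither form was committed syntax at this point:

    REINDEX INDEX index1 WITH VERBOSE;   -- trailing bare-word WITH style
    REINDEX (VERBOSE) INDEX index1;      -- leading parenthesized style, as in VACUUM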
Re: [HACKERS] Configurable location for extension .control files
On 3/4/15 1:41 AM, Oskari Saarenmaa wrote: >I'm interested in this because it could potentially make it possible to >install SQL extensions without OS access. (My understanding is there's >some issue with writing the extension files even if LIBDIR is writable >by the OS account). I'm not sure this patch makes extensions without OS access any easier to implement; you still need to write the files somewhere, and someone needs to set up the cluster with a nonstandard extension path. Right, I wasn't expecting it to work out of the box. Admittedly I'm foggy on what the exact problem with pushing a file into that directory via SQL is, so maybe it still won't help. >>I don't think anyone should be packaging and shipping PG with >>extension_directory set, but we really should allow it for superusers >>and postgresql.conf so people can test extensions without hacks like >>this:https://github.com/ohmu/pgmemcache/blob/master/localtests.sh#L23 > >FWIW what I'd like to see is breaking the cluster setup/teardown capability >in pg_regress into its own tool. It wouldn't solve the extension test >problem, but it would eliminate the need for most of what that script is >doing, and it would do it more robustly. It would make it very easy to >unit test things with frameworks other than pg_regress. This would certainly be useful. I can try to do something about it if we get configurable extension path supported. The patch is now in https://commitfest.postgresql.org/5/170/ Awesome! Let me know if I can help. Or I may start on it if I can get some other stuff off my plate. By the way, after thinking about this a little more, I think a utility for standing up "temporary" Postgres clusters would be very welcome by power users. It's a very common need for QA environments. Originally I was thinking about it in terms of things like extension testing, but now I realize there's a much larger audience than that. So I definitely think this should be a standalone shell command (pg_temp_cluster?), and either have pg_regress call it or (probably more logical) add it to the make files as a dependency for make check (make temp-cluster/remove-temp-cluster or similar). -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
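A sketch of the proposed knob in use; extension_directory is the setting proposed in this thread, not an existing GUC:

    -- superuser-only, per the proposal
    SET extension_directory = '/home/dev/test-extensions';
    CREATE EXTENSION pgmemcache;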
Re: [HACKERS] anyarray
On 3/4/15 10:28 PM, Tom Lane wrote: > Peter Eisentraut writes: >> On 2/13/15 10:20 AM, Teodor Sigaev wrote: >>> Some users of the intarray contrib module wish to use its features with >>> another kind of arrays, not only for the int4 type. Suggested module >>> generalizes intarray over other (not all) types of pgsql. > >> I think this module should be merged with the intarray module. Having >> two modules with very similar functionality would be confusing. > > Perhaps. I think it would be hard to remove intarray without breaking > things for existing users of it; even if the functionality remains under > another name. And surely we don't want to generalize intarray while > keeping that same name. So it might be hard to get to a clean solution. Maybe we could have the anyarray module (name TBD) install two extensions: one of its own, and one that is named intarray that just pulls anyarray in as a dependency. There are more fundamental questions to be answered about the functionality of this proposal first, however. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Weirdly pessimistic estimates in optimizer
On 3/5/15 7:58 PM, Jim Nasby wrote: This got answered on one of the other lists, right? That was supposed to be off-list. I'll answer my own question: yes. Sorry for the noise. :( -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
On Thu, Mar 5, 2015 at 10:55:28AM +0300, Vladimir Borodin wrote: > You are correct that a pg_controldata file is copied over that has > wal_level=minimal, but that should not be a problem. > > > I suppose, this is the root cause of why replica does not start as hot > standby. It is enough to start it as warm standby, but not hot standby. > See CheckRequiredParameterValues function in xlog.c which is called inside of > StartupXLOG function. Yes, you are correct. I spent all day building a test harness so I could automate this setup and test various failures. I was able to reproduce your failure, and you are correct that the proper fix is to set wal_level=hot_standby on the new master, and then start and stop the new cluster just before rsync. The root cause is that pg_upgrade calls pg_resetxlog -o on the new cluster _after_ the new cluster stopped for the final time, so rsync is copying the incorrect pg_controldata wal_level value. Also, even if pg_resetxlog preserved wal_level in the control file, there is no guarantee that the user configured the new cluster's wal_level for hot_standby anyway. What I have done is to update the pg_upgrade instructions to add this required step. Updated doc patch attached. (I also added the --delete flag to rsync.) Thanks so much for your detailed report. > But it could not be done with --size-only key, because control-file is > of fixed > size and rsync would skip it. Or maybe everything should be copied > with > --size-only and control-file should be copied without this option. > > > Well, what happens is that there is no _new_ standby pg_controldata > file, so it is copied fully from the new master. Are you running initdb > to create the new standby --- you shouldn't be doing that as the rsync > will do that for you. > > > No, I don’t. The scenario of the problem with copying control-file was in case > when I: > 1. ran pg_upgrade on master and got control-file with "wal_level = minimal", > 2. did rsync --size-only to replica (and it got this control-file with > "wal_level = minimal"), > 3. started and stopped postgres on master to get «good» control-file with > "wal_level = hot_standby», > 4. did rsync --size-only to replica one more time. And this time control-file > is not copied because of the same size of control-file. > > Actually, if you don’t do step 2, everything works as expected. Sorry for > bothering you. Ah, yes, I think doing rsync twice is never a good suggestion. It can lead to too many failures. Doing the start/stop before rsync seems like the best solution. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml new file mode 100644 index 07ca0dc..e25e0d0 *** a/doc/src/sgml/backup.sgml --- b/doc/src/sgml/backup.sgml *** tar -cf backup.tar /usr/local/pgsql/data *** 438,445 Another option is to use rsync to perform a file system backup. This is done by first running rsync while the database server is running, then shutting down the database !server just long enough to do a second rsync. The !second rsync will be much quicker than the first, because it has relatively little data to transfer, and the end result will be consistent because the server was down. This method allows a file system backup to be performed with minimal downtime. --- 438,447 Another option is to use rsync to perform a file system backup. 
This is done by first running rsync while the database server is running, then shutting down the database !server long enough to do an rsync --checksum. !(--checksum is necessary because rsync only !has file modification-time granularity of one second.) The !second rsync will be quicker than the first, because it has relatively little data to transfer, and the end result will be consistent because the server was down. This method allows a file system backup to be performed with minimal downtime. diff --git a/doc/src/sgml/pgupgrade.sgml b/doc/src/sgml/pgupgrade.sgml new file mode 100644 index e1cd260..56cb45d *** a/doc/src/sgml/pgupgrade.sgml --- b/doc/src/sgml/pgupgrade.sgml *** NET STOP postgresql-8.4 *** 315,320 --- 315,324 NET STOP postgresql-9.0 + + + Log-shipping standby servers can remain running until a later step. + *** pg_upgrade.exe *** 399,404 --- 403,535 + Upgrade any Log-Shipping Standby Servers + + + If you have Log-Shipping Standby Servers (), follow these steps to upgrade them (before + starting any servers): + + + + + + Install the new PostgreSQL binaries on standby servers + + +Make sure the new binaries and support fi
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On 3/5/15 9:42 AM, Greg Stark wrote: > Well if you want to read the file as is you can do so using the file > reading functions which afaik were specifically intended for the > purpose of writing config editing tools. Sure, but those things are almost never installed by default, and I don't want to install them if they provide easy write access, and they don't help you find the actual file. This proposal is much nicer. > Joining against other catalog tables may be kind of exaggerated but if > I have data in an SQL view I certainly expect to be able to write > WHERE clauses to find the rows I want without having to do string > parsing. If it were a comma-delimited string I have to do something > like WHERE users LIKE '%,username,%' but then that's not good enough > since it may be the first or last and the username itself may contain > white-space or a comma etc etc. The point is, it should be one or the other (or both), not something in the middle. It's either a textual representation of the file or a semantic one. If it's the latter, then all user names, group names, and special key words need to be resolved in some way that they cannot be confused with unfortunately named user names. And there needs to be a way that we can add more in the future. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
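To illustrate the difference from the LIKE-based string matching quoted above, a semantic representation could expose array-valued columns that are queried directly; pg_hba_settings and its column names here are hypothetical:

    SELECT line_number, type, database, user_name, address, auth_method
    FROM pg_hba_settings
    WHERE 'alice' = ANY (user_name);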
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Thu, Mar 5, 2015 at 10:28 PM, Andres Freund wrote: > On 2015-03-05 12:14:04 +, Syed, Rahila wrote: >> Please find attached a patch. As discussed, flag to denote >> compression and presence of hole in block image has been added in >> XLogRecordImageHeader rather than block header. > > FWIW, I personally won't commit it with things done that way. I think > it's going the wrong way, leading to a harder to interpret and less > flexible format. I'm not going to further protest if Fujii or Heikki > commit it this way though. I'm pretty sure that we can discuss the *better* WAL format even after committing this patch. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Strange assertion using VACOPT_FREEZE in vacuum.c
On Fri, Mar 6, 2015 at 12:42 AM, Robert Haas wrote: > On Thu, Mar 5, 2015 at 9:58 AM, Alvaro Herrera wrote: >> Here's a simple idea to improve the patch: make VacuumParams include >> VacuumStmt and the for_wraparound and do_toast flags. ISTM that reduces >> the number of arguments to be passed down in a couple of places. In >> particular: >> >> vacuum(VacuumParams *params, BufferAccessStrategy bstrategy, bool isTopLevel) >> >> vacuum_rel(VacuumParams *params) >> >> Does that sound more attractive? > > I dislike passing down parser nodes straight into utility commands. > It tends to make those functions hard for in-core users to call, > and also to lead to security vulnerabilities where we look up the same > names more than once and just hope that we get the same OID every > time. Stuffing the VacuumStmt pointer inside the VacuumParams object > doesn't, for me, help anything. It'd be a lot more interesting if we > could get rid of that altogether. Do you mean totally removing VacuumStmt from the stack? We would then need to add relation and va_cols as additional arguments of things like vacuum_rel, analyze_rel, do_analyze_rel or similar. FWIW, adding do_toast and for_wraparound into VacuumParams makes sense to me, but not VacuumStmt. It has little meaning as VacuumParams should be used for parameters. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Table-level log_autovacuum_min_duration
On Thu, Mar 5, 2015 at 9:49 PM, Michael Paquier wrote: > On Thu, Mar 5, 2015 at 7:10 PM, Fujii Masao wrote: >> With the patch, VACUUM ANALYZE VERBOSE doesn't emit any verbose message. >> Why did you remove that functionality? > > Oops. Sorry about that. In gram.y, the combination of VacuumStmt with > AnalyzeStmt overwrote the value of log_min_duration incorrectly. I > also found another bug related to logging of ANALYZE not working > correctly because of the use of IsAutoVacuumWorkerProcess() instead of > VACOPT_VERBOSE (this is reducing the diffs of the patch btw). All > those things are fixed in the attached. Thanks for updating the patch! Why does log_min_duration need to be set even when a manual VACUUM command is executed? Per the latest version of the patch, log_min_duration is checked only when the process is an autovacuum worker. So ISTM that log_min_duration doesn't need to be set in gram.y. It's even confusing to me. Or if you're going to implement something like "VACUUM VERBOSE DURATION n" (i.e., the verbose message is output if n seconds have elapsed), that might be necessary, though... Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
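For reference, the table-level form under review would be set as a storage parameter, roughly like this (a sketch; option spelling per the thread title, table names invented):

    ALTER TABLE busy_table SET (log_autovacuum_min_duration = 0);    -- log every autovacuum
    ALTER TABLE noisy_table SET (log_autovacuum_min_duration = -1);  -- never log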
Re: [HACKERS] Weirdly pessimistic estimates in optimizer
David Kubečka writes: > There is division by loop_count because of predicted effect of caching and > it is exactly this division which makes the run_cost for single index > lookup so low compared with the query version with random_fk_uniq. So the > main problem is that the planner calls cost_index with loop_count equal to > number of rows of inner table, *but it ignores semi-join semantics*, i.e. > it doesn't account for the number of *unique* rows in the inner table which > will actually be the number of loops. Good point. > Since this is my first time looking into the optimizer (and in fact any > postgres) code I am not yet able to locate the exact place where this > should be repaired, but I hope that in few days I will have a patch :-) Hm, this might not be the best problem to tackle for your first Postgres patch :-(. The information about the estimated number of unique rows isn't readily available at the point where we're creating parameterized paths. When we do compute such an estimate, it's done like this: pathnode->path.rows = estimate_num_groups(root, uniq_exprs, rel->rows); where uniq_exprs is a list of right-hand-side expressions we've determined belong to the semijoin, and rel->rows represents the raw size of the semijoin's RHS relation (which might itself be a join). Neither of those things are available to create_index_path. I chewed on this for awhile and decided that there'd be no real harm in taking identification of the unique expressions out of create_unique_path() and doing it earlier, in initsplan.c; we'd need a couple more fields in SpecialJoinInfo but that doesn't seem like a problem. However, rel->rows is a *big* problem; we simply have not made any join size estimates yet, and can't, because these things are done bottom up. However ... estimate_num_groups's dependency on its rowcount input is not large (it's basically using it as a clamp). So conceivably we could have get_loop_count just multiply together the sizes of the base relations included in the semijoin's RHS to get a preliminary estimate of that number. This would be the right thing anyway for a single relation in the RHS, which is the most common case. It would usually be an overestimate for join RHS, but we could hope that the output of estimate_num_groups wouldn't be affected too badly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
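A sketch of the query shape from the original report (table and column names assumed from the EXPLAIN fragments upthread): the semijoin's inner-side duplicates inflate the loop_count handed to cost_index, and with it the assumed cache benefit of the parameterized index scan:

    EXPLAIN ANALYZE
    SELECT f.fk, sum(f.val)
    FROM facts f
    WHERE f.fk IN (SELECT fk FROM random_fk_dupl)
    GROUP BY f.fk;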
Re: [HACKERS] Combining Aggregates
On Thu, Mar 5, 2015 at 9:30 AM, Kouhei Kaigai wrote:
> > On Wed, Mar 4, 2015 at 4:41 AM, David Rowley wrote:
> > >> This thread mentions "parallel queries" as a use case, but that means
> > >> passing data between processes, and that requires being able to
> > >> serialize and deserialize the aggregate state somehow. For actual data
> > >> types that's not overly difficult I guess (we can use in/out
> > >> functions), but what about aggregates using 'internal' state? I.e.
> > >> aggregates passing pointers that we can't serialize?
> > >
> > > This is a good question. I really don't know the answer to it as I've
> > > not looked at Robert's API for passing data between backends yet.
> > >
> > > Maybe Robert or Amit can answer this?
> >
> > I think parallel aggregation will probably require both the
> > infrastructure discussed here, namely the ability to combine two
> > transition states into a single transition state, and also the ability
> > to serialize and de-serialize transition states, which has previously
> > been discussed in the context of letting hash aggregates spill to
> > disk. My current thinking is that the parallel plan will look
> > something like this:
> >
> > HashAggregateFinish
> > -> Funnel
> >    -> HashAggregatePartial
> >       -> PartialHeapScan
> >
> > So the workers will all pull from a partial heap scan and each worker
> > will aggregate its own portion of the data. Then it'll need to pass
> > the results of that step back to the master, which will aggregate the
> > partial results and produce the final results. I'm guessing that if
> > we're grouping on, say, column a, the HashAggregatePartial nodes will
> > kick out 2-column tuples of the form (<value of a>, <transition state
> > for the group with that value of a>).
>
> It may not be an urgent topic to be discussed towards v9.5, however,
> I'd like to raise a topic about the planner and aggregates.
>
> Once we get two-phase aggregation, the planner needs to pay attention
> to the possibility of partial aggregation during plan construction,
> even though our current implementation attaches the Agg node after the
> join/scan plan construction.
>
> Probably, a straightforward design is to add a FunnelPath with some
> child nodes in set_rel_pathlist() or add_paths_to_joinrel(). Its child
> node may be a PartialAggregate node (or some other parallel-safe node,
> of course). In this case, it must inform the planner that this node
> (more precisely, the targetlist of the node) returns a partial
> aggregate, instead of values row-by-row.
> The planner then needs to track which type of data will be returned to
> the upper node. The Path node may have a flag to show it, and we may
> also need to inject a dummy PartialAggregate if we try to join a
> relation that returns values row-by-row with another one that returns
> a partial aggregate.
> This leads to another problem. The RelOptInfo->targetlist will depend
> on the Path node chosen. Even if a float datum is expected as the
> argument of AVG(), the state to be combined is float[3]. So we will
> need to adjust the targetlist of the RelOptInfo once a Path is chosen.
>
> Anyway, I could identify at least these complicated issues around
> integrating two-phase aggregation with the planner. Can someone
> suggest a minimally invasive way to do this integration?
>
Postgres-XC solved this question by creating a plan with two Agg/Group nodes, one for combining transitioned results and one for creating the distributed transition results (one per distributed run per group).
So, the Agg/Group for combining results had as many Agg/Group nodes under it as there were distributed/parallel runs. But XC chose this way to reduce the code footprint. In Postgres, we can have different nodes for combining and transitioning, as you have specified above. Aggregation is not pathified in the current planner, hence XC took the approach of pushing the Agg nodes down the plan tree when distributed/parallel execution was possible. If we can get aggregation pathified, we can take a path-based approach, which might give a better judgement of whether or not to distribute the aggregates itself.

Looking at Postgres-XC might be useful to get ideas. I can help you there.

> Thanks,
> --
> NEC OSS Promotion Center / PG-Strom Project
> KaiGai Kohei

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
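To make the "combining" step concrete, a hedged sketch using the float8 AVG() example mentioned above: the working state is a 3-element array {N, sum(X), sum(X*X)}, so merging two partial states is just element-wise addition. A real combine function would be a proper aggregate support function; this shows only the core arithmetic.

/*
 * Merge partial state2 into state1 for a float8 avg/variance-style
 * aggregate whose transition state is {N, sum(X), sum(X*X)}.
 */
static void
combine_float8_accum(double *state1, const double *state2)
{
    state1[0] += state2[0];     /* N:        row counts add    */
    state1[1] += state2[1];     /* sum(X):   sums add          */
    state1[2] += state2[2];     /* sum(X*X): squared sums add  */
}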
Re: [HACKERS] Combining Aggregates
On 6 March 2015 at 19:01, Ashutosh Bapat wrote:
> Postgres-XC solved this question by creating a plan with two Agg/Group
> nodes, one for combining transitioned results and one for creating the
> distributed transition results (one per distributed run per group).
>
> So, the Agg/Group for combining results had as many Agg/Group nodes
> under it as there were distributed/parallel runs.

This sounds quite like the planner must force the executor to execute the plan on a fixed number of worker processes.

I had really hoped that we could, one day, have a load monitor process that decides the best number of workers to execute a parallel plan on. Otherwise, how would we decide how many worker processes to allocate to a plan? Surely there must be times when utilising only half of the processors for a query would be better than trying to use all processors and incurring many more context switches.

Probably the harder part about dynamically deciding the number of workers would be the costing: maybe the plan would execute fastest with 32 workers, but if it were only given 2 workers it might execute better as a non-parallel plan.

> But XC chose this way to reduce the code footprint. In Postgres, we can
> have different nodes for combining and transitioning, as you have
> specified above. Aggregation is not pathified in the current planner,
> hence XC took the approach of pushing the Agg nodes down the plan tree
> when distributed/parallel execution was possible. If we can get
> aggregation pathified, we can take a path-based approach, which might
> give a better judgement of whether or not to distribute the aggregates
> itself.
>
> Looking at Postgres-XC might be useful to get ideas. I can help you
> there.

Regards

David Rowley
Re: [HACKERS] pg_upgrade and rsync
> On 6 March 2015, at 6:11, Bruce Momjian wrote:
>
> On Thu, Mar 5, 2015 at 10:55:28AM +0300, Vladimir Borodin wrote:
>> You are correct that a pg_controldata file is copied over that has
>> wal_level=minimal, but that should not be a problem.
>>
>> I suppose this is the root cause of why the replica does not start as a
>> hot standby. It is enough to start it as a warm standby, but not a hot
>> standby. See the CheckRequiredParameterValues function in xlog.c, which
>> is called inside StartupXLOG.
>
> Yes, you are correct. I spent all day building a test harness so I
> could automate this setup and test various failures. I was able to
> reproduce your failure, and you are correct that the proper fix is to
> set wal_level=hot_standby on the new master, and then start and stop
> the new cluster just before the rsync.
>
> The root cause is that pg_upgrade calls pg_resetxlog -o on the new
> cluster _after_ the new cluster has been stopped for the final time, so
> rsync copies the incorrect pg_controldata wal_level value. Also, even
> if pg_resetxlog preserved wal_level in the control file, there is no
> guarantee that the user configured the new cluster's wal_level for
> hot_standby anyway.
>
> What I have done is to update the pg_upgrade instructions to add this
> required step. Updated doc patch attached. (I also added the --delete
> flag to rsync.)

Thanks so much for your detailed report. It seems to work fine now. The only thing that would be nice to change is to explicitly state that these instructions are correct for hot standby installations too:

+
+   If you have Log-Shipping Standby Servers (), follow
+   these steps to upgrade them (before starting any servers):
+

Actually, I entered this thread because that is not obvious from the paragraph above or anywhere else.

>> But it could not be done with the --size-only option, because the
>> control file is of fixed size and rsync would skip it. Or maybe
>> everything should be copied with --size-only and the control file
>> should be copied without this option.
>>
>> Well, what happens is that there is no _new_ standby pg_controldata
>> file, so it is copied fully from the new master. Are you running initdb
>> to create the new standby --- you shouldn't be doing that, as the rsync
>> will do that for you.
>>
>> No, I don't. The scenario of the problem with copying the control file
>> was when I:
>> 1. ran pg_upgrade on the master and got a control file with
>>    "wal_level = minimal",
>> 2. did rsync --size-only to the replica (and it got this control file
>>    with "wal_level = minimal"),
>> 3. started and stopped postgres on the master to get a "good" control
>>    file with "wal_level = hot_standby",
>> 4. did rsync --size-only to the replica one more time. And this time
>>    the control file was not copied because its size was unchanged.
>>
>> Actually, if you don't do step 2, everything works as expected. Sorry
>> for bothering you.
>
> Ah, yes, I think doing rsync twice is never a good suggestion. It can
> lead to too many failures. Doing the start/stop before the rsync seems
> like the best solution.
>
> --
> Bruce Momjian http://momjian.us
> EnterpriseDB http://enterprisedb.com
>
> + Everyone has their own god. +

--
May the force be with you…
https://simply.name
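For clarity, a hedged sketch of the ordering being agreed on here. The paths and standby host are hypothetical, and only the --size-only and --delete rsync flags come from this thread; the authoritative steps are in the doc patch.

# pg_upgrade leaves a pg_control with wal_level=minimal, so start and
# stop the new cluster once to record wal_level = hot_standby ...
pg_ctl -D /path/to/new_data -w start
pg_ctl -D /path/to/new_data -m fast stop

# ... and only then do a single rsync to the standby
rsync --archive --delete --size-only /path/to/new_data/ \
      standby.example.com:/path/to/new_data/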
Re: [HACKERS] Combining Aggregates
On Fri, Mar 6, 2015 at 12:41 PM, David Rowley wrote:
> On 6 March 2015 at 19:01, Ashutosh Bapat wrote:
>
>> Postgres-XC solved this question by creating a plan with two Agg/Group
>> nodes, one for combining transitioned results and one for creating the
>> distributed transition results (one per distributed run per group).
>>
>> So, the Agg/Group for combining results had as many Agg/Group nodes
>> under it as there were distributed/parallel runs.
>
> This sounds quite like the planner must force the executor to execute
> the plan on a fixed number of worker processes.
>
> I had really hoped that we could, one day, have a load monitor process
> that decides the best number of workers to execute a parallel plan on.
> Otherwise, how would we decide how many worker processes to allocate to
> a plan? Surely there must be times when utilising only half of the
> processors for a query would be better than trying to use all
> processors and incurring many more context switches.
>
> Probably the harder part about dynamically deciding the number of
> workers would be the costing: maybe the plan would execute fastest with
> 32 workers, but if it were only given 2 workers it might execute better
> as a non-parallel plan.

XC does that because it knew, at plan creation time, how many nodes the aggregation had to be distributed across. To keep that dynamic, we could add a place-holder planner node for producing transitioned results for a given distributed run. At execution time, that node would create one executor node (corresponding to the place-holder node) per parallel run. I haven't seen a precedent in PG code for creating more than one executor node for a given planner node, but is that a rule?

>> But XC chose this way to reduce the code footprint. In Postgres, we
>> can have different nodes for combining and transitioning, as you have
>> specified above. Aggregation is not pathified in the current planner,
>> hence XC took the approach of pushing the Agg nodes down the plan tree
>> when distributed/parallel execution was possible. If we can get
>> aggregation pathified, we can take a path-based approach, which might
>> give a better judgement of whether or not to distribute the aggregates
>> itself.
>>
>> Looking at Postgres-XC might be useful to get ideas. I can help you
>> there.
>
> Regards
>
> David Rowley

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company