Re: [HACKERS] inherit support for foreign tables
Hi Hanada-san,

While still reviewing this patch, I feel this patch has given enough consideration to interactions with other commands, but I found the following seemingly incorrect behavior:

postgres=# CREATE TABLE product (id INTEGER, description TEXT);
CREATE TABLE
postgres=# CREATE FOREIGN TABLE product1 () INHERITS (product) SERVER fs OPTIONS (filename '/home/foo/product1.csv', format 'csv');
CREATE FOREIGN TABLE
postgres=# ALTER TABLE product ALTER COLUMN description SET STORAGE EXTERNAL;
ERROR: product1 is not a table or materialized view

ISTM the ALTER TABLE simple recursion mechanism (i.e., ATSimpleRecursion()) should be modified for the ALTER COLUMN SET STORAGE case. I just wanted to tell you this quickly so you can take time to consider it.

Thanks,

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET
On 25 January 2014 23:08, Simon Riggs si...@2ndquadrant.com wrote:
On 25 January 2014 22:33, Stephen Frost sfr...@snowman.net wrote:
* Tom Lane (t...@sss.pgh.pa.us) wrote: AFAICT, there was no consensus in this thread on what to do, which probably has something to do with the lack of concrete performance tests presented to back up any particular proposal.

This I entirely agree with: more testing and more information on how such a change impacts other workloads would be great. Unfortunately, while I've provided a couple of test cases and seen similar situations on IRC, this is very data-dependent, which makes it difficult to have concrete answers for every workload. Still, I'll try to spend some time with pgbench's schema definition and write up some larger queries to run through it (AIUI, the default set of queries won't typically result in a hashjoin) and see what happens there.

That action of some kind was needed was clear, for me. Hopefully some small improvement can be found from that investigation, even if the greatest gain is in some way under dispute.

I don't see anything for 9.4 in here now.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
Hackers,

Is somebody available to volunteer to review the custom-scan patch? Even though Hanada-san acknowledged it before, it seems to me this patch has potentially arguable implementations. Even if you don't have enough time to review the whole of the code, it helps me if you can comment on the following topics.

(1) Interface to add alternative paths instead of built-in join paths

This patch adds add_join_path_hook in add_paths_to_joinrel to allow extensions to provide alternative scan paths in addition to the built-in join paths. The custom-scan path being added is assumed to perform a scan on a (virtual) relation that is the result set of the joined relations. My concern is which arguments should be passed to the hook. This hook is declared as follows:

/* Hook for plugins to add custom join path, in addition to default ones */
typedef void (*add_join_path_hook_type)(PlannerInfo *root,
                                        RelOptInfo *joinrel,
                                        RelOptInfo *outerrel,
                                        RelOptInfo *innerrel,
                                        JoinType jointype,
                                        SpecialJoinInfo *sjinfo,
                                        List *restrictlist,
                                        List *mergeclause_list,
                                        SemiAntiJoinFactors *semifactors,
                                        Relids param_source_rels,
                                        Relids extra_lateral_rels);
extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;

The arguments above restrictlist should probably be passed to extensions, because they are also arguments of add_paths_to_joinrel(). However, I'm not 100% certain how the other arguments should be handled. It probably makes sense to pass param_source_rels and extra_lateral_rels, to check whether the path is sensible for parameterized paths. On the other hand, I doubt whether mergeclause_list is useful to pass. (It may make sense if someone tries to implement their own merge-join implementation??) I'd like to see ideas for improving the current interface specification.
(2) CUSTOM_VAR for special Var reference

@@ -134,6 +134,7 @@ typedef struct Expr
 #define    INNER_VAR    65000    /* reference to inner subplan */
 #define    OUTER_VAR    65001    /* reference to outer subplan */
 #define    INDEX_VAR    65002    /* reference to index column */
+#define    CUSTOM_VAR   65003    /* reference to custom column */

I newly added CUSTOM_VAR to handle the case when a custom scan overrides join relations. Var nodes within a join plan are adjusted to refer to either ecxt_innertuple or ecxt_outertuple of ExprContext. That causes trouble if a custom scan runs instead of the built-in joins, because the tuples it fetches are usually stored in ecxt_scantuple, so Var nodes also need a varno that is neither inner nor outer. The SetPlanRefCustomScan callback, invoked from set_plan_refs, allows extensions to rewrite Var nodes within the custom-scan node to indicate ecxt_scantuple using CUSTOM_VAR, instead of inner or outer. For example, a Var node with varno=CUSTOM_VAR and varattno=3 references the third attribute of the tuple in ecxt_scantuple. I think this is a reasonable solution; however, I'm not 100% certain whether people have a more graceful idea for implementing it.

If you have comments on the above two topics, or others, please give your ideas.

Thanks,

-----Original Message-----
From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kohei KaiGai
Sent: Tuesday, January 14, 2014 11:20 PM
To: Shigeru Hanada
Cc: Kaigai, Kouhei(海外, 浩平); Jim Mlodgenski; Robert Haas; Tom Lane; PgHacker; Peter Eisentraut
Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

Hello,

The attached patches are the ones rebased to the latest git tree, with no functional changes from the previous revision on the commit-fest:Nov. Hanada-san volunteered to review the series of patches, including the portion for postgres_fdw, then marked it as ready for committer on the last commit fest. So, I hope one of the committers will also volunteer to review the patches for final checking.
* Part-1 - CustomScan APIs

This patch provides a set of interfaces for extensions to interact with the query optimizer and executor. The new add_scan_path_hook and add_join_path_hook allow extensions to offer alternative ways to scan a particular relation or to join particular relations. Once an alternative way is chosen by the optimizer, the associated callbacks are invoked from the executor. In this case, the extension is responsible for returning a slot that holds a tuple (or an empty slot at end of scan) scanned from the underlying relation.

* Part-2 - contrib/ctidscan

This patch provides a simple example implementation of the CustomScan API. It makes it possible to skip pages when inequality operators are given on the ctid system column
Re: [HACKERS] GIN improvements part2: fast scan
On Sun, Jan 26, 2014 at 8:14 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
On 01/26/2014 08:24 AM, Tomas Vondra wrote:
Hi!

On 25.1.2014 22:21, Heikki Linnakangas wrote:
Attached is a new version of the patch set, with those bugs fixed.

I've done a bunch of tests with all the 4 patches applied, and it seems to work now. I've done tests with various conditions (AND/OR, number of words, number of conditions) and so far I did not get any crashes, infinite loops or anything like that. I've also compared the results to 9.3 - by dumping the database and running the same set of queries on both machines - and indeed I got a 100% match.

I also did some performance tests, and that's when I started to worry. For example, I generated and ran 1000 queries that look like this:

SELECT id FROM messages
 WHERE body_tsvector @@ to_tsquery('english','(header & 53 & 32 & useful & dropped)')
 ORDER BY ts_rank(body_tsvector, to_tsquery('english','(header & 53 & 32 & useful & dropped)')) DESC;

i.e. in this case the query always was 5 words connected by AND. This query is a pretty common pattern for fulltext search - search for a list of words and give me the best ranked results.

On 9.3, the script was running for ~23 seconds; on patched HEAD it was ~40. It's perfectly reproducible; I've repeated the test several times with exactly the same results. The test is CPU bound, there's no I/O activity at all. I got the same results with more queries (~100k).

Attached is a simple chart with the x-axis used for durations measured on 9.3.2 and the y-axis for durations measured on patched HEAD. It's obvious a vast majority of queries is up to 2x slower - that's pretty clear from the chart. Only about 50 queries are faster on HEAD, and 700 queries are more than 50% slower on HEAD (i.e. if the query took 100ms on 9.3, it takes 150ms on HEAD).
Typically, the EXPLAIN ANALYZE looks something like this (on 9.3): http://explain.depesz.com/s/5tv and on HEAD (same query): http://explain.depesz.com/s/1lI

Clearly the main difference is in the Bitmap Index Scan, which takes 60ms on 9.3 and 120ms on HEAD. On 9.3 the perf top looks like this:

34.79% postgres [.] gingetbitmap
28.96% postgres [.] ginCompareItemPointers
 9.36% postgres [.] TS_execute
 5.36% postgres [.] check_stack_depth
 3.57% postgres [.] FunctionCall8Coll

while on 9.4 it looks like this:

28.20% postgres [.] gingetbitmap
21.17% postgres [.] TS_execute
 8.08% postgres [.] check_stack_depth
 7.11% postgres [.] FunctionCall8Coll
 4.34% postgres [.] shimTriConsistentFn

Not sure how to interpret that, though. For example, where did ginCompareItemPointers go? I suspect it's thanks to inlining, and that it might be related to the performance decrease. Or maybe not.

Yeah, inlining makes it disappear from the profile, and spreads that time to the functions calling it.

The profile tells us that the consistent function is called a lot more than before. That is expected - with the fast scan feature, we're calling consistent not only for potential matches, but also to refute TIDs based on just a few entries matching. If that's effective, it allows us to skip many TIDs and avoid consistent calls, which compensates, but if it's not effective, it's just overhead. I would actually expect it to be fairly effective for that query, so that's a bit surprising. I added counters to see where the calls are coming from, and it seems that about 80% of the calls are actually coming from this little feature I explained earlier:

In addition to that, I'm using the ternary consistent function to check if minItem is a match, even if we haven't loaded all the entries yet. That's less important, but I think for something like rare1 | (rare2 & frequent) it might be useful. It would allow us to skip fetching 'frequent', when we already know that 'rare1' matches for the current item.
I'm not sure if that's worth the cycles, but it seemed like an obvious thing to do, now that we have the ternary consistent function.

So, that clearly isn't worth the cycles :-). At least not with an expensive consistent function; it might be worthwhile if we pre-build the truth-table, or cache the results of the consistent function. Attached is a quick patch to remove that, on top of all the other patches, if you want to test the effect.

Every single change you did in fast scan seems to be reasonable, but testing shows that something went wrong. Simple test with 3 words of different selectivities. After applying your patches:

# select count(*) from fts_test where fti @@ plainto_tsquery('english', 'gin index select');
 count
───────
   627
(1 row)

Time: 21,252 ms

In original fast-scan: # select
Re: [HACKERS] plpgsql.warn_shadow
On Sun, Jan 26, 2014 at 11:19 AM, Simon Riggs si...@2ndquadrant.com wrote:
For 9.4, we should cut down the patch so it has
plpgsql.warnings = none (default) | all | [individual item list]
plpgsql.warnings_as_errors = off (default) | on

I hope I'm not late for the bikeshedding :) Why not have 2 similar options:
plpgsql.warnings = none (default) | all | [individual item list]
plpgsql.errors = none (default) | all | [individual item list]

That would be cleaner, more flexible, and one less option to set if you just want errors and no warnings.

Regards,
Marti
Re: [HACKERS] plpgsql.warn_shadow
2014-01-27 Marti Raudsepp ma...@juffo.org:
On Sun, Jan 26, 2014 at 11:19 AM, Simon Riggs si...@2ndquadrant.com wrote:
For 9.4, we should cut down the patch so it has
plpgsql.warnings = none (default) | all | [individual item list]
plpgsql.warnings_as_errors = off (default) | on

I hope I'm not late for the bikeshedding :) Why not have 2 similar options:
plpgsql.warnings = none (default) | all | [individual item list]
plpgsql.errors = none (default) | all | [individual item list]

That would be cleaner, more flexible, and one less option to set if you just want errors and no warnings.

What would plpgsql.errors = none mean??? I strongly disagree that this design is clean.

Regards
Pavel
Re: [HACKERS] plpgsql.warn_shadow
On 27 January 2014 10:40, Marti Raudsepp ma...@juffo.org wrote:
On Sun, Jan 26, 2014 at 11:19 AM, Simon Riggs si...@2ndquadrant.com wrote:
For 9.4, we should cut down the patch so it has
plpgsql.warnings = none (default) | all | [individual item list]
plpgsql.warnings_as_errors = off (default) | on

I hope I'm not late for the bikeshedding :) Why not have 2 similar options:
plpgsql.warnings = none (default) | all | [individual item list]
plpgsql.errors = none (default) | all | [individual item list]

That would be cleaner, more flexible, and one less option to set if you just want errors and no warnings.

That would allow you to mis-set the parameters and then cause a runtime error for something that was only a warning at compile time. Florian's point was well made, and we must come up with something that allows warnings/errors at compile time and, once accepted, nothing at run time.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
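For comparison, the two competing proposals might look like this in postgresql.conf. This is hypothetical syntax sketched from the thread, not committed behavior, and the item name 'shadow' is just an example:

```
# Simon's proposal: one warning list, escalated to errors wholesale
plpgsql.warnings = 'shadow'          # none (default) | all | item list
plpgsql.warnings_as_errors = off     # off (default) | on

# Marti's proposal: two independent lists per severity
plpgsql.warnings = 'shadow'          # none (default) | all | item list
plpgsql.errors = 'none'              # none (default) | all | item list
```

Simon's objection is that the second form lets a setting that was only a warning at compile time be re-classified into a runtime error later.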
Re: [HACKERS] Patch: Show process IDs of processes holding a lock; show relation and tuple infos of a lock to acquire
On 23/01/14, Christian Kruse wrote:
Well, is it context or detail? Those fields have reasonably well defined meanings IMO.

I find the distinction somewhat blurry and think both would be appropriate. But since I wasn't sure, I changed it to detail.

If we need errcontext_plural, let's add it, not adopt inferior solutions just because that isn't there for lack of previous need.

I would have added it had I been sure. But having said that, I think this is indeed detail, not context.

(I kinda wonder whether some of the stuff that's now in the primary message shouldn't be pushed to errdetail as well. It looks like some previous patches in this area have been lazy.)

I agree, the primary message is not very well worded. On the other hand, finding an appropriate alternative seems hard to me.

While I'm griping, this message isn't even trying to follow the project's message style guidelines. Detail or context messages are supposed to be complete sentence(s), with capitalization and punctuation to match.

Hm, I hope I fixed it in this version of the patch.

Lastly, is this information that we want to be shipping to clients? Perhaps from a security standpoint that's not such a wise idea, and errdetail_log() is what should be used.

Fixed. I added an errdetail_log_plural() for this, too.

I have checked the revised patch. It looks fine to me except one minor code formatting issue: in elog.c, two tabs are missing in the definition of the function errdetail_log_plural. Please run the pgindent tool to check the same.

Also I would like to highlight one behavior here: the process ID of the process trying to acquire the lock is also listed in the request queue. E.g.

session 1 with process id X:
BEGIN;
LOCK TABLE foo IN SHARE MODE;

session 2 with process id Y:
BEGIN;
LOCK TABLE foo IN EXCLUSIVE MODE;

On execution of LOCK in session 2, the log will include:
DETAIL: Process holding the lock: X. Request queue: Y.
Where Y is the process ID of the same process that was trying to acquire the lock. To me, this seems to be correct behavior, as the meaning of the message is the list of all processes already holding the lock plus the processes waiting in the queue, including the position of the current process in the wait list. In the above example, it indicates that process Y is at the top of the wait list.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
(2014/01/23 23:18), Andrew Dunstan wrote:
What is more, if the square root calculation is affecting your benchmarks, I suspect you are benchmarking the wrong thing.

I ran another test with two pgbench clients at the same time: one running a select-only query, and the other executing 'SELECT * FROM pg_stat_statements' once per second. I used the v6 patch in this test.

* Benchmark Commands
$bin/pgbench -h xxx.xxx.xxx.xxx mitsu-ko -c64 -j32 -S -T 180 -n
$bin/pgbench -h xxx.xxx.xxx.xxx mitsu-ko -T 180 -n -f file.sql

** file.sql
SELECT * FROM pg_stat_statements;
\sleep 1s

* Select-only-query Result (Test result is represented by tps.)
method            | try1   | try2   | try3
with pgss         | 125502 | 125818 | 125809
with patched pgss | 125909 | 125699 | 126040

This result shows my patch has almost the same performance as before.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
(2014/01/26 17:43), Mitsumasa KONDO wrote:
2014-01-26 Simon Riggs si...@2ndquadrant.com:
On 21 January 2014 19:48, Simon Riggs si...@2ndquadrant.com wrote:
On 21 January 2014 12:54, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote:
Rebased patch is attached.

Does this fix the Windows bug reported by Kumar on 20/11/2013?

Sorry, I was misunderstanding. The first name of Mr. Rajeev Rastogi is Kumar! I searched only the e-mail address and title by his name... I don't have a Windows compiler environment, but the attached patch might fix it. Could I ask Mr. Rajeev Rastogi to test my patch again?

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

*** a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
*** 19,24 **** CREATE FUNCTION pg_stat_statements(
--- 19,27 ----
  OUT query text,
  OUT calls int8,
  OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT stdev_time float8,
  OUT rows int8,
  OUT shared_blks_hit int8,
  OUT shared_blks_read int8,
*** 41,43 **** CREATE VIEW pg_stat_statements AS
--- 44,52 ----
  SELECT * FROM pg_stat_statements();
  GRANT SELECT ON pg_stat_statements TO PUBLIC;
+
+ /* New Function */
+ CREATE FUNCTION pg_stat_statements_reset_time()
+ RETURNS void
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;

*** a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
*** 9,14 **** RETURNS void
--- 9,19 ----
  AS 'MODULE_PATHNAME'
  LANGUAGE C;
+ CREATE FUNCTION pg_stat_statements_reset_time()
+ RETURNS void
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+
  CREATE FUNCTION pg_stat_statements(
  OUT userid oid,
  OUT dbid oid,
*** 16,21 **** CREATE FUNCTION pg_stat_statements(
--- 21,29 ----
  OUT query text,
  OUT calls int8,
  OUT total_time float8,
+ OUT min_time float8,
+ OUT max_time float8,
+ OUT stdev_time float8,
  OUT rows int8,
  OUT shared_blks_hit int8,
  OUT shared_blks_read int8,
*** 42,44 **** GRANT SELECT ON pg_stat_statements TO PUBLIC;
--- 50,53 ----
  -- Don't want this to be available to non-superusers.
  REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC;
+ REVOKE ALL ON FUNCTION pg_stat_statements_reset_time() FROM PUBLIC;

*** a/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--unpackaged--1.0.sql
*** 4,8 ****
--- 4,9 ----
  \echo Use CREATE EXTENSION pg_stat_statements to load this file. \quit
  ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset();
+ ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements_reset_time();
  ALTER EXTENSION pg_stat_statements ADD function pg_stat_statements();
  ALTER EXTENSION pg_stat_statements ADD view pg_stat_statements;

*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
*** 78,84 ****
  static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
  #define USAGE_DECREASE_FACTOR (0.99)  /* decreased every entry_dealloc */
  #define STICKY_DECREASE_FACTOR (0.50) /* factor for sticky entries */
  #define USAGE_DEALLOC_PERCENT 5       /* free this % of entries at once */
! #define JUMBLE_SIZE 1024              /* query serialization buffer size */
  /*
--- 78,85 ----
  #define USAGE_DECREASE_FACTOR (0.99)  /* decreased every entry_dealloc */
  #define STICKY_DECREASE_FACTOR (0.50) /* factor for sticky entries */
  #define USAGE_DEALLOC_PERCENT 5       /* free this % of entries at once */
! #define EXEC_TIME_INIT_MIN DBL_MAX    /* initial execution min time */
! #define EXEC_TIME_INIT_MAX -DBL_MAX   /* initial execution max time */
  #define JUMBLE_SIZE 1024              /* query serialization buffer size */
  /*
*** 114,119 **** typedef struct Counters
--- 115,123 ----
  {
  int64 calls;             /* # of times executed */
  double total_time;       /* total execution time, in msec */
+ double total_sqtime;     /* cumulated square execution time, in msec */
+ double min_time;         /* minimum execution time, in msec */
+ double max_time;         /* maximum execution time, in msec */
  int64 rows;              /* total # of retrieved or affected rows */
  int64 shared_blks_hit;   /* # of shared buffer hits */
  int64 shared_blks_read;  /* # of shared disk blocks read */
*** 237,245 ****
  void _PG_init(void);
--- 241,251 ----
  void _PG_fini(void);
  Datum pg_stat_statements_reset(PG_FUNCTION_ARGS);
+ Datum pg_stat_statements_reset_time(PG_FUNCTION_ARGS);
  Datum pg_stat_statements(PG_FUNCTION_ARGS);
  PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
+
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/27/2014 07:09 AM, KONDO Mitsumasa wrote:
(2014/01/23 23:18), Andrew Dunstan wrote:
What is more, if the square root calculation is affecting your benchmarks, I suspect you are benchmarking the wrong thing.

I ran another test with two pgbench clients at the same time: one running a select-only query and the other executing 'SELECT * FROM pg_stat_statements' once per second. I used the v6 patch in this test.

The issue of concern is not the performance of pg_stat_statements, AIUI. The issue is whether this patch affects performance generally, i.e. is there a significant cost in collecting these extra stats. To test this you would compare two general pgbench runs, one with the patch applied and one without. I personally don't give a tinker's cuss about whether the patch slows down pg_stat_statements a bit.

cheers

andrew
Re: [HACKERS] [Review] inherit support for foreign tables
2014-01-27 Etsuro Fujita fujita.ets...@lab.ntt.co.jp:
(2014/01/25 11:27), Shigeru Hanada wrote:
Yeah, the consistency is essential for its ease of use. But I'm not sure that inherited stats ignoring foreign tables are actually useful for query optimization.

What I think about the consistency is a) the ANALYZE command with no table names skips ANALYZing inheritance trees that include at least one foreign table as a child, but b) the ANALYZE command with a table name indicating an inheritance tree that includes any foreign tables does compute the inherited stats in the same way as for an inheritance tree consisting of only ordinary tables, by acquiring the sample rows from each foreign table on the far side.

b) sounds a little complex to understand or explain. Is it too big a change to make the ANALYZE command handle foreign tables too, even when no table name is specified? IIRC, the performance issue was the reason to exclude foreign tables from auto-analyze processing, but ANALYZing a large database containing huge local data also takes a long time. One idea to avoid unexpectedly long processing is to add an option to foreign tables to mark them as not auto-analyzable. Thoughts?

--
Shigeru HANADA
Re: [HACKERS] Changeset Extraction v7.1
On Sat, Jan 25, 2014 at 5:25 PM, Andres Freund and...@2ndquadrant.com wrote:
Looking over patch 0002, I see that there's code to allow a walsender to create or drop a physical replication slot. Also, if we've acquired a replication slot, there's code to update it, and code to make sure we disconnect from it, but there's no code to acquire it. I think maybe the hunk in StartReplication() is supposed to be calling ReplicationSlotAcquire() instead of ReplicationSlotRelease(),

Uh. You had me worried here for a minute or two, a hunk or two earlier than the ReplicationSlotRelease() you mention. What probably confused you is that StartReplication only returns once all streaming is finished. Not my idea...

No, what confuses me is that there's no call to ReplicationSlotAcquire() in patch 0001 or patch 0002; the function is added but not called.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] inherit support for foreign tables
2014-01-27 Etsuro Fujita fujita.ets...@lab.ntt.co.jp:
While still reviewing this patch, I feel this patch has given enough consideration to interactions with other commands, but found the following seemingly incorrect behavior:

postgres=# CREATE TABLE product (id INTEGER, description TEXT);
CREATE TABLE
postgres=# CREATE FOREIGN TABLE product1 () INHERITS (product) SERVER fs OPTIONS (filename '/home/foo/product1.csv', format 'csv');
CREATE FOREIGN TABLE
postgres=# ALTER TABLE product ALTER COLUMN description SET STORAGE EXTERNAL;
ERROR: product1 is not a table or materialized view

ISTM the ALTER TABLE simple recursion mechanism (i.e., ATSimpleRecursion()) should be modified for the ALTER COLUMN SET STORAGE case. I just wanted to quickly tell you this for you to take time to consider.

Thanks for the review. It must be an oversight, so I'll fix it up soon.

--
Shigeru HANADA
Re: [HACKERS] Changeset Extraction v7.1
On Sat, Jan 25, 2014 at 5:25 PM, Andres Freund and...@2ndquadrant.com wrote:
I'm also wondering about whether we've got the right naming here. AFAICT, it's not the case that we're going to use the catalog xmin for catalogs and the data xmin for non-catalogs. Rather, the catalog xmin is going to always be included in globalxmin calculations, so IOW it should just be called xmin.

Well, not really. That's true for GetSnapshotData(), but not for GetOldestXmin(), where we calculate an xmin *not* including the catalog xmin. And the data_xmin is always used, regardless of catalog/non-catalog; we peg the xmin further for catalog tables, based on that value. The reason for doing things this way is that it makes all current usages of RecentGlobalXmin safe, since that is the more conservative value. Only in inspected locations can we use RecentGlobalDataXmin, which *does* include data_xmin but *not* catalog_xmin.

Well, OK, so I guess I'm turned around. But I guess my point is - if one of data_xmin and catalog_xmin is really just xmin, then I think it would be more clear to call that one xmin.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
2014-01-27 Andrew Dunstan and...@dunslane.net:
On 01/27/2014 07:09 AM, KONDO Mitsumasa wrote:
(2014/01/23 23:18), Andrew Dunstan wrote:
What is more, if the square root calculation is affecting your benchmarks, I suspect you are benchmarking the wrong thing.

I ran another test with two pgbench clients at the same time: one running a select-only query and the other executing 'SELECT * FROM pg_stat_statements' once per second. I used the v6 patch in this test.

The issue of concern is not the performance of pg_stat_statements, AIUI. The issue is whether this patch affects performance generally, i.e. is there a significant cost in collecting these extra stats. To test this you would compare two general pgbench runs, one with the patch applied and one without.

The other day I showed the first test result, which compared runs without pg_stat_statements and without the patch. They ran on the same server with the same benchmark settings (clients and scale factor) as today's result. If you merge and compare the results, you can confirm that my patch does not affect performance.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
Re: [HACKERS] What is happening on buildfarm member crake?
On Sat, Jan 25, 2014 at 5:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Yeah. If Robert's diagnosis is correct, and it sounds pretty plausible, then this is really just one instance of a bug that's probably pretty widespread in our signal handlers. Somebody needs to go through 'em all and look for touches of shared memory.

I haven't made a comprehensive study of every signal handler we have, but looking at procsignal_sigusr1_handler, the list of functions that can get called from here is quite short: CheckProcSignal(), RecoveryConflictInterrupt(), SetLatch(), and latch_sigusr1_handler(). Taking those in reverse order:

- latch_sigusr1_handler() is fine. Nothing down this path touches shared memory; moreover, if we've already disowned our latch, the waiting flag won't be set and this will do nothing at all.
- The call to SetLatch() is problematic as we already know. This is new code in 9.4.
- RecoveryConflictInterrupt() does nothing if proc_exit_inprogress is set. So it's fine.
- CheckProcSignal() also appears problematic. If we've already detached shared memory, MyProcSignalSlot will be pointing to garbage, but we'll try to dereference it anyway.

I think maybe the best fix is to *clear* MyProc in ProcKill/AuxiliaryProcKill and MyProcSignalSlot in CleanupProcSignalState, as shown in the attached patch. Most places that dereference those pointers already check that they aren't null, and we can easily add a NULL guard to the SetLatch() call in procsignal_sigusr1_handler, which the attached patch also does. This might not be a complete fix to every problem of this type that exists anywhere in our code, but I think it's enough to make the world safe for procsignal_sigusr1_handler. We also have a *large* number of signal handlers that do little more than this:

if (MyProc)
    SetLatch(&MyProc->procLatch);

...and this change should make all of those safe as well. So I think this is a pretty good start.
Assuming nobody objects too much to this basic approach, should I back-patch the parts of this that apply pre-9.4? The problem with CleanupProcSignalState, at least, goes all the way back to 9.0, when the signal-multiplexing infrastructure was introduced. But the probability of an actual crash must be pretty low, or I imagine we would have noticed this sooner. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c index 6ebabce..1372a7e 100644 --- a/src/backend/storage/ipc/procsignal.c +++ b/src/backend/storage/ipc/procsignal.c @@ -149,6 +149,8 @@ CleanupProcSignalState(int status, Datum arg) slot = ProcSignalSlots[pss_idx - 1]; Assert(slot == MyProcSignalSlot); + MyProcSignalSlot = NULL; + /* sanity check */ if (slot-pss_pid != MyProcPid) { diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c index ee6c24c..f64e1c4 100644 --- a/src/backend/storage/lmgr/proc.c +++ b/src/backend/storage/lmgr/proc.c @@ -772,6 +772,7 @@ ProcKill(int code, Datum arg) { /* use volatile pointer to prevent code rearrangement */ volatile PROC_HDR *procglobal = ProcGlobal; + PGPROC *proc; Assert(MyProc != NULL); @@ -796,31 +797,34 @@ ProcKill(int code, Datum arg) */ LWLockReleaseAll(); - /* Release ownership of the process's latch, too */ - DisownLatch(MyProc-procLatch); + /* + * Clear MyProc first; then disown the process latch. This is so that + * signal handlers won't try to clear the process latch after it's no + * longer ours. 
+	 */
+	proc = MyProc;
+	MyProc = NULL;
+	DisownLatch(&proc->procLatch);
 
 	SpinLockAcquire(ProcStructLock);
 
 	/* Return PGPROC structure (and semaphore) to appropriate freelist */
 	if (IsAnyAutoVacuumProcess())
 	{
-		MyProc->links.next = (SHM_QUEUE *) procglobal->autovacFreeProcs;
-		procglobal->autovacFreeProcs = MyProc;
+		proc->links.next = (SHM_QUEUE *) procglobal->autovacFreeProcs;
+		procglobal->autovacFreeProcs = proc;
 	}
 	else if (IsBackgroundWorker)
 	{
-		MyProc->links.next = (SHM_QUEUE *) procglobal->bgworkerFreeProcs;
-		procglobal->bgworkerFreeProcs = MyProc;
+		proc->links.next = (SHM_QUEUE *) procglobal->bgworkerFreeProcs;
+		procglobal->bgworkerFreeProcs = proc;
 	}
 	else
 	{
-		MyProc->links.next = (SHM_QUEUE *) procglobal->freeProcs;
-		procglobal->freeProcs = MyProc;
+		proc->links.next = (SHM_QUEUE *) procglobal->freeProcs;
+		procglobal->freeProcs = proc;
 	}
 
-	/* PGPROC struct isn't mine anymore */
-	MyProc = NULL;
-
 	/* Update shared estimate of spins_per_delay */
 	procglobal->spins_per_delay = update_spins_per_delay(procglobal->spins_per_delay);
@@ -849,6 +853,7 @@ AuxiliaryProcKill(int code, Datum arg)
 {
 	int			proctype = DatumGetInt32(arg);
 	PGPROC	   *auxproc PG_USED_FOR_ASSERTS_ONLY;
+	PGPROC	   *proc;
 
 	Assert(proctype >= 0 && proctype < NUM_AUXILIARY_PROCS);
@@ -859,16 +864,19 @@ AuxiliaryProcKill(int code, Datum arg)
 	/* Release any LW locks I am holding (see notes above) */
Re: [HACKERS] inherit support for foreign tables
Sent from my iPad

On 27-Jan-2014, at 21:03, David Fetter da...@fetter.org wrote: On Mon, Jan 27, 2014 at 05:06:19PM +0900, Etsuro Fujita wrote: Hi Hanada-san, While still reviewing this patch, I feel this patch has given enough consideration to interactions with other commands, but found the following incorrect? behavior: postgres=# CREATE TABLE product (id INTEGER, description TEXT); CREATE TABLE postgres=# CREATE FOREIGN TABLE product1 () INHERITS (product) SERVER fs OPTIONS (filename '/home/foo/product1.csv', format 'csv'); CREATE FOREIGN TABLE postgres=# ALTER TABLE product ALTER COLUMN description SET STORAGE EXTERNAL; ERROR: product1 is not a table or materialized view ISTM the ALTER TABLE simple recursion mechanism (ie ATSimpleRecursion()) should be modified for the ALTER COLUMN SET STORAGE case. This points to a larger discussion about what precisely foreign tables can and cannot inherit from local ones. I don't think that a generic solution will be satisfactory, as the PostgreSQL FDW could, at least in principle, support many more of these operations than the CSV FDW, as shown above. In my estimation, the outcome of the discussion above is not a blocker for this

I wonder what the behavior should be when the foreign table is on a server which does not support *all* SQL features. Does an FDW need to have the possible inherit options mentioned in its documentation for this patch?

Regards, Atri -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime
On Mon, Jan 27, 2014 at 11:18 PM, Amit Langote amitlangot...@gmail.com wrote: On Mon, Jan 13, 2014 at 2:50 PM, Amit Kapila amit.kapil...@gmail.com wrote: I have extended test (contrib) module dsm_demo such that now user can specify during dsm_demo_create the lifespan of segment. The values it can accept are 0 or 1. Default value is 0. 0 -- means segment will be accessible for session life time 1 -- means segment will be accessible for postmaster life time The behaviour is as below: Test -1 (Session life time) Session - 1 -- here it will create segment for session lifetime select dsm_demo_create('this message is from session-1', 0); dsm_demo_create - 82712 Session - 2 - select dsm_demo_read(82712); dsm_demo_read this message is from session-1 (1 row) Session-1 \q Session-2 postgres=# select dsm_demo_read(82712); dsm_demo_read --- (1 row) Conclusion of Test-1 : As soon as session which has created segment finished, the segment becomes non-accessible. Test -2 (Postmaster life time) Session - 1 -- here it will create segment for postmaster lifetime select dsm_demo_create('this message is from session-1', 1); dsm_demo_create - 82712 Session - 2 - select dsm_demo_read(82712); dsm_demo_read this message is from session-1 (1 row) Session-1 \q Session-2 postgres=# select dsm_demo_read(82712); dsm_demo_read --- this message is from session-1 (1 row) Conclusion of Test-2 : a. Segment is accessible for postmaster lifetime. b. if user restart server, segment is not accessible. Applied dsm_keep_segment_v1.patch and dsm_demo_v1.patch. Got the following warning when I tried above example: postgres=# select dsm_demo_create('this message is from session-new', 1); WARNING: dynamic shared memory leak: segment 1402373971 still referenced WARNING: dynamic shared memory leak: segment 1402373971 still referenced dsm_demo_create - 1402373971 (1 row) I see that PrintDSMLeakWarning() which emits this warning is for debugging. 
-- Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
Amit Kapila amit.kapil...@gmail.com writes: On Fri, Jan 24, 2014 at 9:10 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Fri, Jan 24, 2014 at 2:38 AM, Robert Haas robertmh...@gmail.com wrote: I mean, if there's a GUC that controls the event source name, then it can be changed between restarts, regardless of what you call it. Yes, but not default values (when the user doesn't provide any value for event_source). Here the question is about the default value of event_source. To proceed with the review of this patch, I need to know whether appending the version number or any other constant to the default Event Source name is acceptable or not; else for now we can remove this part of the code from the patch and handle the non-default case, where the change will be that pg_ctl will enquire the non-default event_source value from the server. Could you please let me know your views about the same? Unless I'm missing something, this entire thread is a tempest in a teapot, because the default event_source value does not matter, because *by default we don't log to eventlog*. The presumption is that if the user turns on logging to eventlog, it's his responsibility to first make sure that event_source is set to something appropriate. And who's to say that plain PostgreSQL isn't what he wanted, anyway? Even if he's got multiple servers on one machine, maybe directing all their logs to the same place is okay by him. Also, those who don't run multiple servers are probably not going to thank us for moving their logs around unnecessarily. In short, I think we should just reject this idea as introducing more problems than it solves, and not fully solving even the problem it purports to solve. Possibly there's room for a documentation patch reminding users to make sure that event_source is set appropriately before they turn on eventlog. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Race condition in b-tree page deletion
On 12/17/2013 04:55 AM, Jim Nasby wrote: On 11/9/13, 10:02 AM, Heikki Linnakangas wrote: 3. Another approach would be to get rid of the can't delete rightmost child limitation. We currently have that limitation because it ensures that we never need to change the high-key of a page. If we delete a page that is the rightmost child of its parent, we transfer the deleted keyspace from the parent page to its right sibling. To do that, we need to update the high key of the parent, as well as the downlink of the right sibling at the grandparent level. That's a bit complicated, because updating the high key might require splitting the page. Is the rightmost child issue likely to affect indexes on increasing values, like a queue table? No. In a FIFO queue, you keep adding new items to the right-end of the index, so old pages become non-rightmost fairly quickly. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standalone synchronous master
On Sun, Jan 26, 2014 at 10:56 PM, Rajeev rastogi rajeev.rast...@huawei.com wrote: On 01/25/2014, Josh Berkus wrote: ISTM the consensus is that we need better monitoring/administration interfaces so that people can script the behavior they want in external tools. Also, a new synchronous apply replication mode would be handy, but that'd be a whole different patch. We don't have a patch on the table that we could consider committing any time soon, so I'm going to mark this as rejected in the commitfest app. I don't feel that we'll never do auto-degrade is determinative; several hackers were for auto-degrade, and they have a good use-case argument. However, we do have consensus that we need more scaffolding than this patch supplies in order to make auto-degrade *safe*. I encourage the submitter to resubmit an improved version of this patch (one with more monitorability) for 9.5 CF1. That'll give us a whole dev cycle to argue about it. I shall rework to improve this patch. Below is a summary of all the discussions, which will be used as input for improving the patch:

1. Method of degrading the synchronous mode:
   a. Expose the configuration variable to a new SQL-callable function.
   b. Using ALTER SYSTEM SET.
   c. Auto-degrade using some sort of configuration parameter, as done in the current patch.
   d. Or maybe a combination of the above, which the DBA can use depending on their use-cases.
   We can discuss further to decide on one of these approaches.
2. Synchronous mode should be upgraded/restored after at least one synchronous standby comes up and has caught up with the master.
3. A better monitoring/administration interface, which can be even better if it is made a generic trap system. I shall propose a better approach for this.
4. Send committing clients a WARNING if they have committed a synchronous transaction and we are in degraded mode.
5. Please add more if I am missing something.
All of those things have been mentioned, but I'm not sure we have consensus on which of them we actually want to do, or how. Figuring that out seems like the next step. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
Simon Riggs si...@2ndquadrant.com writes: Am I right in thinking that we have this fully working now? I will look at this at some point during the CF, but have not yet, and probably won't as long as it's not marked ready for committer. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 27 January 2014 15:04, Dean Rasheed dean.a.rash...@gmail.com wrote: So for example, when planning the query to update an inheritance child, the rtable will contain an RTE for the parent, but it will not be referenced in the parse tree, and so it will not be expanded while planning the child update. Am I right in thinking that we have this fully working now? If we commit this aspect soon, we stand a chance of also touching upon RLS. AFAICS the only area of objection is the handling of inherited relations, which occurs within the planner in the current patch. I can see that would be a cause for concern since the planner is pluggable and it would then be possible to bypass security checks. Obviously installing a new planner isn't trivial, but doing so shouldn't cause collateral damage. We have long had restrictions around updateable views. My suggestion from here is that we accept the restriction that we cannot yet have the 3-way combination of updateable views, security views and views on inherited tables. Most people aren't using inherited tables and people that are have special measures in place for their apps. We won't lose much by accepting that restriction for 9.4 and re-addressing the issue in a later release. We need not adopt an all or nothing approach. Perhaps we might yet find a solution for 9.4, but again, that need not delay the rest of the patch. From a review perspective, I'd want to see some greatly expanded README comments, but given the Wiki entry, I think we can do that quickly. Other than that, the code seems clear, modular and well tested, so is something I could see me committing the uncontentious parts of. Thoughts? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic shared memory and locks
On Thu, Jan 23, 2014 at 11:10 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Jan 22, 2014 at 12:42 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-01-22 12:40:34 -0500, Robert Haas wrote: On Wed, Jan 22, 2014 at 12:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: Shouldn't we introduce a typedef LWLock* LWLockid; or something to avoid breaking external code using lwlocks? +1, in fact there's probably no reason to touch most *internal* code using that type name either. I thought about this but figured it was too much of a misnomer to refer to a pointer as an ID. But, if we're sure we want to go that route, I can go revise the patch along those lines. I personally don't care either way for internal code as long as external code continues to work. There's the argument of making the commit better readable by having less noise and less divergence in the branches and there's your argument of that being less clear. OK, well then, if no one objects violently, I'll stick my current approach of getting rid of all core mentions of LWLockId in favor of LWLock *, but also add typedef LWLock *LWLockId with a comment that this is to minimize breakage of third-party code. Hearing no objections, violent or otherwise, I've done that, made the other adjustments suggested by Andres and KaiGai, and committed this. Let's see what the buildfarm thinks... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
KaiGai Kohei, * Kouhei Kaigai (kai...@ak.jp.nec.com) wrote: Is somebody available to volunteer to review the custom-scan patch? I looked through it a bit and my first take away from it was that the patches to actually use the new hooks were also making more changes to the backend code, leaving me with the impression that the proposed interface isn't terribly stable. Perhaps those changes should have just been in the first patch, but they weren't and that certainly gave me pause. I'm also not entirely convinced that this is the direction to go in when it comes to pushing down joins to FDWs. While that's certainly a goal that I think we all share, this seems to be intending to add a completely different feature which happens to be able to be used for that. For FDWs, wouldn't we only present the FDW with the paths where the foreign tables for that FDW, or perhaps just a given foreign server, are being joined? Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime
On Mon, Jan 13, 2014 at 2:50 PM, Amit Kapila amit.kapil...@gmail.com wrote: I have extended test (contrib) module dsm_demo such that now user can specify during dsm_demo_create the lifespan of segment. The values it can accept are 0 or 1. Default value is 0. 0 -- means segment will be accessible for session life time 1 -- means segment will be accessible for postmaster life time The behaviour is as below: Test -1 (Session life time) Session - 1 -- here it will create segment for session lifetime select dsm_demo_create('this message is from session-1', 0); dsm_demo_create - 82712 Session - 2 - select dsm_demo_read(82712); dsm_demo_read this message is from session-1 (1 row) Session-1 \q Session-2 postgres=# select dsm_demo_read(82712); dsm_demo_read --- (1 row) Conclusion of Test-1 : As soon as session which has created segment finished, the segment becomes non-accessible. Test -2 (Postmaster life time) Session - 1 -- here it will create segment for postmaster lifetime select dsm_demo_create('this message is from session-1', 1); dsm_demo_create - 82712 Session - 2 - select dsm_demo_read(82712); dsm_demo_read this message is from session-1 (1 row) Session-1 \q Session-2 postgres=# select dsm_demo_read(82712); dsm_demo_read --- this message is from session-1 (1 row) Conclusion of Test-2 : a. Segment is accessible for postmaster lifetime. b. if user restart server, segment is not accessible. Applied dsm_keep_segment_v1.patch and dsm_demo_v1.patch. Got the following warning when I tried above example: postgres=# select dsm_demo_create('this message is from session-new', 1); WARNING: dynamic shared memory leak: segment 1402373971 still referenced WARNING: dynamic shared memory leak: segment 1402373971 still referenced dsm_demo_create - 1402373971 (1 row) -- Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] shouldn't we log permission errors when accessing the configured trigger file?
On Mon, Jan 27, 2014 at 3:43 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Jan 26, 2014 at 1:03 PM, Andres Freund and...@2ndquadrant.com wrote: For some reason CheckForStandbyTrigger() doesn't report permission errors when stat()ing the trigger file. Shouldn't we fix that?

static bool
CheckForStandbyTrigger(void)
{
	...
	if (stat(TriggerFile, &stat_buf) == 0)
	{
		ereport(LOG,
				(errmsg("trigger file found: %s", TriggerFile)));
		unlink(TriggerFile);
		triggered = true;
		fast_promote = true;
		return true;
	}

Imo the stat() should warn about all errors but ENOENT? Seems reasonable. It could lead to quite a bit of log spam, I suppose, but the way things are now could be pretty mystifying if you've located your trigger file somewhere outside $PGDATA, and a parent directory is lacking permissions. +1. Since it actually indicates something that's quite broken (since with that you can never make the trigger work until you fix it), the log spam seems like it would be appropriate. (Logspam is never nice, but a single log line is also very easy to miss - this should log enough that you wouldn't) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/27/2014 08:48 AM, Mitsumasa KONDO wrote: The issue of concern is not the performance of pg_stat_statements, AIUI. The issue is whether this patch affects performance generally, i.e. is there a significant cost in collecting these extra stats. To test this you would compare two general pgbench runs, one with the patch applied and one without. I showed a first test result, compared against runs without pg_stat_statements and without the patch, the other day. They ran on the same server with the same benchmark settings (clients and scale factor) as today's result. When you merge and see the results, you can confirm that my patch does not affect performance. Yeah, sorry, I misread your message. I think this is good to go from a performance point of view, although I'm still a bit worried about the validity of the method (accumulating a sum of squares). OTOH, Welford's method probably requires slightly more per-statement overhead, and certainly requires one more accumulator per statement. I guess if we find ill conditioned results it wouldn't be terribly hard to change. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] shouldn't we log permission errors when accessing the configured trigger file?
On Sun, Jan 26, 2014 at 1:03 PM, Andres Freund and...@2ndquadrant.com wrote: For some reason CheckForStandbyTrigger() doesn't report permission errors when stat()ing the trigger file. Shouldn't we fix that?

static bool
CheckForStandbyTrigger(void)
{
	...
	if (stat(TriggerFile, &stat_buf) == 0)
	{
		ereport(LOG,
				(errmsg("trigger file found: %s", TriggerFile)));
		unlink(TriggerFile);
		triggered = true;
		fast_promote = true;
		return true;
	}

Imo the stat() should warn about all errors but ENOENT? Seems reasonable. It could lead to quite a bit of log spam, I suppose, but the way things are now could be pretty mystifying if you've located your trigger file somewhere outside $PGDATA, and a parent directory is lacking permissions. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 27 January 2014 07:54, Kouhei Kaigai kai...@ak.jp.nec.com wrote: Hello, I checked the latest updatable security barrier view patch. Even though I couldn't find a major design problem in this revision, here are two minor comments below. I think it needs to be reviewed by a committer to settle the direction for implementing this feature. Of course, even though I know Tom argued against the current design of this feature up-thread, it does not seem to me that Dean's design is unreasonable.

Thanks for looking at this.

Below are my minor comments:

@@ -932,9 +938,32 @@ inheritance_planner(PlannerInfo *root)
 		if (final_rtable == NIL)
 			final_rtable = subroot.parse->rtable;
 		else
-			final_rtable = list_concat(final_rtable,
+		{
+			List	   *tmp_rtable = NIL;
+			ListCell   *cell1, *cell2;
+
+			/*
+			 * Planning this new child may have turned some of the original
+			 * RTEs into subqueries (if they had security barrier quals).  If
+			 * so, we want to use these in the final rtable.
+			 */
+			forboth(cell1, final_rtable, cell2, subroot.parse->rtable)
+			{
+				RangeTblEntry *rte1 = (RangeTblEntry *) lfirst(cell1);
+				RangeTblEntry *rte2 = (RangeTblEntry *) lfirst(cell2);
+
+				if (rte1->rtekind == RTE_RELATION &&
+					rte1->securityQuals != NIL &&
+					rte2->rtekind == RTE_SUBQUERY)
+					tmp_rtable = lappend(tmp_rtable, rte2);
+				else
+					tmp_rtable = lappend(tmp_rtable, rte1);
+			}

Do we have a case where rte1 is a regular relation with securityQuals but rte2 is not a sub-query? If so, rte2->rtekind == RTE_SUBQUERY should be a condition in an Assert, rather than the third condition in the if-block.

Yes, it is possible for rte1 to be an RTE_RELATION with securityQuals and rte2 to be something other than an RTE_SUBQUERY, because the subquery expansion code in expand_security_quals() only expands RTEs that are actually used in the query. So for example, when planning the query to update an inheritance child, the rtable will contain an RTE for the parent, but it will not be referenced in the parse tree, and so it will not be expanded while planning the child update.
When a sub-query is simple enough (no qualifier and no projection on top of the underlying scan), it is pulled up even if it has the security-barrier attribute, isn't it? See the example below. The view v2 is defined as follows.

postgres=# CREATE VIEW v2 WITH (security_barrier) AS SELECT * FROM t2 WHERE x % 10 = 5;
CREATE VIEW
postgres=# EXPLAIN SELECT * FROM v2 WHERE f_leak(z);
                         QUERY PLAN
------------------------------------------------------------
 Subquery Scan on v2  (cost=0.00..3.76 rows=1 width=41)
   Filter: f_leak(v2.z)
   ->  Seq Scan on t2  (cost=0.00..3.50 rows=1 width=41)
         Filter: ((x % 10) = 5)
(4 rows)

postgres=# EXPLAIN SELECT * FROM v2;
                      QUERY PLAN
------------------------------------------------------
 Seq Scan on t2  (cost=0.00..3.50 rows=1 width=41)
   Filter: ((x % 10) = 5)
(2 rows)

The second explain result shows that the underlying sub-query is pulled up even though it has the security-barrier attribute. (IIRC, it was a new feature in v9.3.)

Actually what happens is that it is planned as a subquery scan, then at the very end of the planning process, it detects that the subquery scan is trivial (has no quals, and has a no-op targetlist) and it removes that plan node --- see set_subqueryscan_references() and trivial_subqueryscan(). That subquery scan removal code requires the targetlist to have exactly the same number of attributes, in exactly the same order as the underlying relation. As soon as you add anything non-trivial to the select list in the above queries, or even just change the order of its attributes, the subquery scan node is no longer removed.

On the other hand, this kind of optimization was not applied to a sub-query being extracted from a relation with securityQuals:

postgres=# EXPLAIN UPDATE v2 SET z = z;
                            QUERY PLAN
------------------------------------------------------------------
 Update on t2 t2_1  (cost=0.00..3.51 rows=1 width=47)
   ->  Subquery Scan on t2  (cost=0.00..3.51 rows=1 width=47)
         ->  Seq Scan on t2 t2_2  (cost=0.00..3.50 rows=1 width=47)
               Filter: ((x % 10) = 5)
(4 rows)

If it has no security_barrier option, the view reference is expanded in the rewriter stage, and it is pulled up as we expected.
postgres=# ALTER VIEW v2 RESET (security_barrier); ALTER VIEW postgres=# EXPLAIN UPDATE t2 SET z = z; QUERY PLAN --- Update on t2 (cost=0.00..3.00 rows=100 width=47) - Seq Scan on t2 (cost=0.00..3.00 rows=100 width=47) (2 rows) Probably, it
Re: [HACKERS] inherit support for foreign tables
On Mon, Jan 27, 2014 at 05:06:19PM +0900, Etsuro Fujita wrote: Hi Hanada-san, While still reviewing this patch, I feel this patch has given enough consideration to interactions with other commands, but found the following incorrect? behavior: postgres=# CREATE TABLE product (id INTEGER, description TEXT); CREATE TABLE postgres=# CREATE FOREIGN TABLE product1 () INHERITS (product) SERVER fs OPTIONS (filename '/home/foo/product1.csv', format 'csv'); CREATE FOREIGN TABLE postgres=# ALTER TABLE product ALTER COLUMN description SET STORAGE EXTERNAL; ERROR: product1 is not a table or materialized view ISTM the ALTER TABLE simple recursion mechanism (ie ATSimpleRecursion()) should be modified for the ALTER COLUMN SET STORAGE case. This points to a larger discussion about what precisely foreign tables can and cannot inherit from local ones. I don't think that a generic solution will be satisfactory, as the PostgreSQL FDW could, at least in principle, support many more of these operations than the CSV FDW, as shown above. In my estimation, the outcome of the discussion above is not a blocker for this patch. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
On Fri, Jan 24, 2014 at 9:10 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Fri, Jan 24, 2014 at 2:38 AM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jan 23, 2014 at 9:23 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Jan 23, 2014 at 10:26 AM, Tom Lane t...@sss.pgh.pa.us wrote: I think what we might want to do is redefine the server's behavior as creating an event named after the concatenation of event_source and port number, or maybe even get rid of event_source entirely and just say it's PostgreSQL followed by the port number. To accomplish this behaviour, each time the server starts and stops, we need to register and unregister the event log using the mechanism described at the link below, to ensure that there is no mismatch between what the server uses and what the OS knows. http://www.postgresql.org/docs/devel/static/event-log-registration.html Why wouldn't that be necessary with your approach, too? Because in my approach we are using a compile-time constant: + #define DEFAULT_EVENT_SOURCE "PostgreSQL " PG_MAJORVERSION I mean, if there's a GUC that controls the event source name, then it can be changed between restarts, regardless of what you call it. Yes, but not default values (when the user doesn't provide any value for event_source). Here the question is about the default value of event_source. To proceed with the review of this patch, I need to know whether appending the version number or any other constant to the default Event Source name is acceptable or not, else for now we can remove this part of the code from the patch and handle the non-default case, where the change will be that pg_ctl will enquire the non-default event_source value from the server. Could you please let me know your views about the same? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part2: fast scan
On Sun, Jan 26, 2014 at 8:14 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: In addition to that, I'm using the ternary consistent function to check if minItem is a match, even if we haven't loaded all the entries yet. That's less important, but I think for something like rare1 | (rare2 & frequent) it might be useful. It would allow us to skip fetching 'frequent', when we already know that 'rare1' matches for the current item. I'm not sure if that's worth the cycles, but it seemed like an obvious thing to do, now that we have the ternary consistent function. So, that clearly isn't worth the cycles :-). At least not with an expensive consistent function; it might be worthwhile if we pre-build the truth-table, or cache the results of the consistent function. I believe caching consistent-function results is essentially the same thing as a lazily-built truth table. I think it's a good option to use with the two-state consistent function. However, I don't think it's a reason to reject the three-state consistent function, because the number of entries could be large. -- With best regards, Alexander Korotkov.
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 27 January 2014 16:11, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: Am I right in thinking that we have this fully working now? I will look at this at some point during the CF, but have not yet, and probably won't as long as it's not marked ready for committer. I've marked it Ready for Committer, to indicate my personal opinion. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Fwd: Request for error explaination || Adding a new integer in indextupleData Structure
Hi All, I was trying to modify the IndexTupleData structure by adding an integer variable, but I faced an error message: psql: FATAL: could not find tuple for opclass 10032. Could anyone please help me in resolving this issue? Regards, Rohit Goyal -- Regards, Rohit Goyal
Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET
* Simon Riggs (si...@2ndquadrant.com) wrote: I don't see anything for 9.4 in here now. Attached is what I was toying with (thought I had attached it previously somewhere.. perhaps not), but in re-testing, it doesn't appear to do enough to move things in the right direction in all cases. I did play with this a fair bit yesterday and while it improved some cases by 20% (eg: a simple join between pgbench_accounts and pgbench_history), when we decide to *still* hash the larger side (as in my 'test_case2.sql'), it can cause a similarly-sized decrease in performance. Of course, if we can push that case to hash the smaller side (which I did by hand with cpu_tuple_cost), then it goes back to being a win to use a larger number of buckets. I definitely feel that there's room for improvement here but it's not an easily done thing, unfortunately. To be honest, I was pretty surprised when I saw that the larger number of buckets performed worse, even if it was when we picked the wrong side to hash, and I plan to look into that more closely to try and understand what's happening. My first guess would be what Tom had mentioned over the summer- if the size of the bucket array ends up being larger than the CPU cache, we can end up paying a great deal more to build the hash table than it costs to scan through the deeper buckets that we end up with as a result (particularly when we're scanning a smaller table). Of course, choosing to hash the larger table makes that more likely.. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET
* Stephen Frost (sfr...@snowman.net) wrote: * Simon Riggs (si...@2ndquadrant.com) wrote: I don't see anything for 9.4 in here now. Attached [...] I'm apparently bad at this 'attaching' thing, particularly on this subject. Here it is. Thanks, Stephen

colordiff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
new file mode 100644
index 6a2f236..a8a8168
*** a/src/backend/executor/nodeHash.c
--- b/src/backend/executor/nodeHash.c
*** ExecChooseHashTableSize(double ntuples,
*** 489,496
  /* We expect the hashtable to fit in memory */
  double dbuckets;
! dbuckets = ceil(ntuples / NTUP_PER_BUCKET);
! dbuckets = Min(dbuckets, max_pointers);
  nbuckets = (int) dbuckets;
  nbatch = 1;
--- 489,495
  /* We expect the hashtable to fit in memory */
  double dbuckets;
! dbuckets = Min(ntuples, max_pointers);
  nbuckets = (int) dbuckets;
  nbatch = 1;

signature.asc Description: Digital signature
Re: [HACKERS] Fwd: Request for error explaination || Adding a new integer in indextupleData Structure
Rohit Goyal rhtgyl...@gmail.com writes: Hi All, I was trying to modify indextupledata structure by adding an integer variable. ButI faced an error message psql: FATAL: could not find tuple for opclass 10032. Could anyone please help me in resolving this issue. You broke a system catalog index. Without seeing what you changed and where, it's impossible to say just how, but that's the bottom line. In recent versions of PG, opclass 10032 is btree name_ops (unless you've also added/removed system catalog entries), which is a pretty plausible thing to be one of the first indexscanned fetches during relcache.c initialization, so I don't think there's any great significance in this particular error message. It's likely that you broke *all* indexscans not just one specific one. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] pg_ctl stop times out when it should respond quickly
On Tuesday 7 January 2014 17:05:03, Michael Paquier wrote: On Sun, Jan 5, 2014 at 3:49 PM, MauMau maumau...@gmail.com wrote: Could you confirm again and tell me what problem is happening? FWIW, I just quickly tested those two patches independently and got them correctly applied with patch -p1 $PATCH on master at edc4345. They compiled and passed make check as well. Regards, Both patches apply cleanly, I'll focus on the pg_stop_fail_v3 patch. Tests are running correctly. The bug this patch is supposed to fix has been reproduced on current HEAD, and cannot be reproduced using the patch. Previous concerns about using both get_pgpid and postmaster_is_alive are addressed. There are no regression tests covering this bugfix, although I don't know if it would be practical to implement them. -- Ronan Dunklau http://dalibo.com - http://dalibo.org signature.asc Description: This is a digitally signed message part.
Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET
2014-01-27 Stephen Frost sfr...@snowman.net * Simon Riggs (si...@2ndquadrant.com) wrote: I don't see anything for 9.4 in here now. Attached is what I was toying with (thought I had attached it previously somewhere.. perhaps not), but in re-testing, it doesn't appear to do enough to move things in the right direction in all cases. I did play with this a fair bit yesterday and while it improved some cases by 20% (eg: a simple join between pgbench_accounts and pgbench_history), when we decide to *still* hash the larger side (as in my 'test_case2.sql'), it can cause a similarly-sized decrease in performance. Of course, if we can push that case to hash the smaller side (which I did by hand with cpu_tuple_cost), then it goes back to being a win to use a larger number of buckets. I definitely feel that there's room for improvement here but it's not an easily done thing, unfortunately. To be honest, I was pretty surprised when I saw that the larger number of buckets performed worse, even if it was when we picked the wrong side to hash, and I plan to look into that more closely to try and understand what's happening. My first guess would be what Tom had mentioned over the summer: if the size of the bucket array ends up being larger than the CPU cache, we can end up paying a great deal more to build the hash table than it costs to scan through the deeper buckets that we end up with as a result (particularly when we're scanning a smaller table). Of course, choosing to hash the larger table makes that more likely.. This topic is interesting - we found very bad performance with hashing large tables with high work_mem. MergeJoin with quicksort was significantly faster. I didn't do deeper research - there is a possibility of virtualization overhead. Regards Pavel Thanks, Stephen
Re: [HACKERS] new json funcs
On Fri, Jan 24, 2014 at 3:26 PM, Josh Berkus j...@agliodbs.com wrote: On 01/24/2014 12:59 PM, Andrew Dunstan wrote: On 01/24/2014 03:40 PM, Laurence Rowe wrote: For consistency with the existing json functions (json_each, json_each_text, etc.) it might be better to add separate json_to_record_text and json_to_recordset_text functions in place of the nested_as_text parameter to json_to_record and json_to_recordset. It wouldn't be consistent with json_populate_record() and json_populate_recordset(), the two closest relatives, however. And yes, I appreciate that we have not been 100% consistent. Community design can be a bit messy that way. FWIW, I prefer the parameter to having differently named functions. +1. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add %z support to elog/ereport?
I wrote: the idea that we might get many dozen such warnings on more-current compilers is scarier, as that might well interfere with people's ability to do development on, say, Windows. Could somebody check whether MSVC for instance complains about format strings using z? Or shall I just commit this and we'll see what the buildfarm says? Given the lack of response, I've pushed the patch, and will canvass the buildfarm results later. Just to follow up on that, I couldn't find any related warnings in the buildfarm this morning (although there is one laggard machine, anole, which uses an unusual compiler and still hasn't reported in). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new json funcs
On 01/27/2014 12:43 PM, Merlin Moncure wrote: On Fri, Jan 24, 2014 at 3:26 PM, Josh Berkus j...@agliodbs.com wrote: On 01/24/2014 12:59 PM, Andrew Dunstan wrote: On 01/24/2014 03:40 PM, Laurence Rowe wrote: For consistency with the existing json functions (json_each, json_each_text, etc.) it might be better to add separate json_to_record_text and json_to_recordset_text functions in place of the nested_as_text parameter to json_to_record and json_to_recordset. It wouldn't be consistent with json_populate_record() and json_populate_recordset(), the two closest relatives, however. And yes, I appreciate that we have not been 100% consistent. Community design can be a bit messy that way. FWIW, I prefer the parameter to having differently named functions. +1. Note that we can only do this when the result type stays the same. It does not for json_each/json_each_text or json_extract_path/json_extract_path_text, which is why we have different functions for those cases. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
On 24 January 2014 08:33, Simon Riggs si...@2ndquadrant.com wrote: On 24 January 2014 07:08, Peter Geoghegan p...@heroku.com wrote: On Wed, Jan 15, 2014 at 6:57 AM, Simon Riggs si...@2ndquadrant.com wrote: v15 to fix the above problem. v16 attached v17 attached This version adds a GUC called ddl_exclusive_locks which allows us to keep the 9.3 behaviour if we wish it. Some people may be surprised that their programs don't wait in the same places they used to. We hope that is a positive and useful behaviour, but it may not always be so. I'll commit this on Thurs 30 Jan unless I hear objections. Thanks -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planning time in explain/explain analyze
On 01/13/2014 09:48 PM, Robert Haas wrote: What I'm saying is that if EXPLAIN reports something that's labelled Planning Time, it should *be* the planning time, and not anything else. When we retrieve a plan from cache, it would be sensible not to report the planning time at all, and IMHO it would also be sensible to report the time it actually took to plan whenever we originally did it. But reporting a value that is not the planning time and calling it the planning time does not seem like a good idea to me. Here is a patch which only prints the Planning time when a prepared statement actually planned a query. I do not really like how I check whether it was replanned, but I tried to avoid making changes in plancache.c. Does this idea look ok? -- Andreas Karlsson diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml new file mode 100644 index 2af1738..482490b *** a/doc/src/sgml/perform.sgml --- b/doc/src/sgml/perform.sgml *** EXPLAIN SELECT * FROM tenk1; *** 89,94 --- 89,95 QUERY PLAN - Seq Scan on tenk1 (cost=0.00..458.00 rows=1 width=244) + Planning time: 0.113 ms /screen /para *** EXPLAIN SELECT * FROM tenk1; *** 162,167 --- 163,174 /para para + The literalPlanning time/literal shown is the time it took to generate + the query plan from the parsed query and optimize it. It does not include + rewriting and parsing. 
+/para + +para Returning to our example: screen *** EXPLAIN SELECT * FROM tenk1; *** 170,175 --- 177,183 QUERY PLAN - Seq Scan on tenk1 (cost=0.00..458.00 rows=1 width=244) + Planning time: 0.113 ms /screen /para *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 198,203 --- 206,212 Seq Scan on tenk1 (cost=0.00..483.00 rows=7001 width=244) Filter: (unique1 lt; 7000) + Planning time: 0.104 ms /screen Notice that the commandEXPLAIN/ output shows the literalWHERE/ *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 234,239 --- 243,249 Recheck Cond: (unique1 lt; 100) -gt; Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) Index Cond: (unique1 lt; 100) + Planning time: 0.093 ms /screen Here the planner has decided to use a two-step plan: the child plan *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 262,267 --- 272,278 Filter: (stringu1 = 'xxx'::name) -gt; Bitmap Index Scan on tenk1_unique1 (cost=0.00..5.04 rows=101 width=0) Index Cond: (unique1 lt; 100) + Planning time: 0.089 ms /screen The added condition literalstringu1 = 'xxx'/literal reduces the *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 283,288 --- 294,300 - Index Scan using tenk1_unique1 on tenk1 (cost=0.29..8.30 rows=1 width=244) Index Cond: (unique1 = 42) + Planning time: 0.076 ms /screen In this type of plan the table rows are fetched in index order, which *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 311,316 --- 323,329 Index Cond: (unique1 lt; 100) -gt; Bitmap Index Scan on tenk1_unique2 (cost=0.00..19.78 rows=999 width=0) Index Cond: (unique2 gt; 9000) + Planning time: 0.094 ms /screen But this requires visiting both indexes, so it's not necessarily a win *** EXPLAIN SELECT * FROM tenk1 WHERE unique *** 331,336 --- 344,350 -gt; Index Scan using tenk1_unique2 on tenk1 (cost=0.29..71.27 rows=10 width=244) Index Cond: (unique2 gt; 9000) Filter: (unique1 lt; 100) + Planning time: 0.087 ms /screen /para *** WHERE t1.unique1 lt; 10 AND t1.unique2 *** 364,369 --- 378,384 Index Cond: 
(unique1 lt; 10) -gt; Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..7.91 rows=1 width=244) Index Cond: (unique2 = t1.unique2) + Planning time: 0.117 ms /screen /para *** WHERE t1.unique1 lt; 10 AND t2.unique2 *** 415,420 --- 430,436 -gt; Materialize (cost=0.29..8.51 rows=10 width=244) -gt; Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.29..8.46 rows=10 width=244) Index Cond: (unique2 lt; 10) + Planning time: 0.119 ms /screen The condition literalt1.hundred lt; t2.hundred/literal can't be *** WHERE t1.unique1 lt; 100 AND t1.unique2 *** 462,467 --- 478,484 Recheck Cond: (unique1 lt; 100) -gt; Bitmap Index Scan on
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
On 27 January 2014 17:58, Simon Riggs si...@2ndquadrant.com wrote: On 24 January 2014 08:33, Simon Riggs si...@2ndquadrant.com wrote: On 24 January 2014 07:08, Peter Geoghegan p...@heroku.com wrote: On Wed, Jan 15, 2014 at 6:57 AM, Simon Riggs si...@2ndquadrant.com wrote: v15 to fix the above problem. v16 attached v17 attached Frostbite... -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services reduce_lock_levels.v17.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET
On 27 January 2014 17:44, Pavel Stehule pavel.steh...@gmail.com wrote: This topic is interesting - we found very bad performance with hashing large tables with high work_mem. MergeJoin with quicksort was significantly faster. I've seen this also. I didn't deeper research - there is a possibility of virtualization overhead. I took measurements and the effect was repeatable and happened for all sizes of work_mem, but nothing more to add. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun
On 01/23/2014 11:36 PM, Peter Geoghegan wrote: The first thing I noticed about this patchset is that it completely expunges btree_xlog_startup(), btree_xlog_cleanup() and btree_safe_restartpoint(). The post-recovery cleanup that previously occurred to address both sets of problems (the problem addressed by this patch, incomplete page splits, and the problem addressed by the dependency patch, incomplete page deletions) now both occur opportunistically/lazily. Notably, this means that there are now exactly zero entries in the resource manager list with a 'restartpoint' callback. I guess we should consider removing it entirely for that reason. As you said in the commit message where gin_safe_restartpoint was similarly retired (commit 631118fe): The post-recovery cleanup mechanism was never totally reliable, as insertion to the parent could fail e.g because of running out of memory or disk space, leaving the tree in an inconsistent state. I'm of the general impression that post-recovery cleanup is questionable. I'm surprised that you didn't mention this commit previously. You just mentioned the original analogous work on GiST, but this certainly seems to be something you've been thinking about *systematically* eliminating for a while. Yes, that's correct, these b-tree patches are part of a grand plan to eliminate post-recovery cleanup actions altogether. So while post-recovery callbacks no longer exist for any rmgr-managed-resource, 100% of remaining startup and cleanup callbacks concern the simple management of memory of AM-specific recovery contexts (for GiST, GiN and SP-GiST). I have to wonder if there isn't a better abstraction than that, such as a generic recovery memory context, allowing you to retire all 3 callbacks. I mean, StartupXLOG() just calls those callbacks for each resource at exactly the same time anyway, just as it initializes resource managers in precisely the same manner earlier on. 
Plus if you look at what those AM-local memory management routines do, it all seems very simple. I think you should bump XLOG_PAGE_MAGIC. Perhaps you should increase the elevel here: + elog(DEBUG1, finishing incomplete split of %u/%u, +BufferGetBlockNumber(lbuf), BufferGetBlockNumber(rbuf)); Since this is a very rare occurrence that involves some subtle interactions, I can't think why you wouldn't want to LOG this for forensic purposes. Hmm. I'm not sure I agree with that line of thinking in general - what is an admin supposed to do about the message? But there's a precedent in vacuumlazy.c, which logs a message when it finds all-zero pages in the heap. So I guess that should be a LOG. Why did you remove the local linkage of _bt_walk_left(), given that it is called in exactly the same way as before? I guess that that is just a vestige of some earlier revision. Yeah, reverted. I think I see some bugs in _bt_moveright(). If you examine _bt_finish_split() in detail, you'll see that it doesn't just drop the write buffer lock that the caller will have provided (per its comments) - it also drops the buffer pin. It calls _bt_insert_parent() last, which was previously only called last thing by _bt_insertonpg() (some of the time), and _bt_insertonpg() is indeed documented to drop both the lock and the pin. And if you look at _bt_moveright() again, you'll see that there is a tacit assumption that the buffer pin isn't dropped, or at least that opaque (the BTPageOpaque BT special page area pointer) continues to point to the same page held in the same buffer, even though the code doesn't set the opaque' pointer again and it may not point to a pinned buffer or even the appropriate buffer. Ditto page. So opaque may point to the wrong thing on subsequent iterations - you don't do anything with the return value of _bt_getbuf(), unlike all of the existing call sites. 
I believe you should store its return value, and get the held page and the special pointer on the page, and assign that to the opaque pointer before the next iteration (an iteration in which you very probably really do make progress not limited to completing a page split, the occurrence of which the _bt_moveright() loop gets stuck on). You know, do what is done in the non-split-handling case. It may not always be the same buffer as before, obviously. Yep, fixed. Do you suppose that there are similar problems in other direct callers of _bt_finish_split()? I'm thinking in particular of _bt_findinsertloc(). Hmm, no, the other callers look OK to me. I'm also not sure about the lock escalation that may occur within _bt_moveright() for callers with BT_READ access - there is nothing to prevent a caller from ending up being left with a write lock where before they only had a read lock if they find that !P_INCOMPLETE_SPLIT() with the write lock (after lock promotion) and conclude that there is no split finishing to be done after all. It looks like all callers currently won't care about that, but it seems like that
Re: [HACKERS] alternative back-end block formats
Hi Craig, On Sun, Jan 26, 2014 at 5:47 AM, Craig Ringer cr...@2ndquadrant.com wrote: On 01/21/2014 07:43 PM, Christian Convey wrote: Hi all, I'm playing around with Postgres, and I thought it might be fun to experiment with alternative formats for relation blocks, to see if I can get smaller files and/or faster server performance. It's not clear how you'd do this without massively rewriting the guts of Pg. Per the docs on internal structure, Pg has a block header, then tuples within the blocks, each with a tuple header and list of Datum values for the tuple. Each Datum has a generic Datum header (handling varlena vs fixed length values etc) then a type-specific on-disk representation controlled by the type output function for that type. I'm still in the process of getting familiar with the pg backend code, so I don't have a concrete plan yet. However, I'm working on the assumption that some set of macros and functions encapsulates the page layout. If/when I tackle this, I expect to add a layer of indirection somewhere around that boundary, so that some non-catalog tables, whose schemas meet certain simplifying assumptions, are read and modified using specialized code. I don't want to get into the specific optimizations I'd like to try, only because I haven't fully studied the code yet, so I don't want to put my foot in my mouth. What concrete problem do you mean to tackle? What idea do you want to explore or implement? My real motivation is that I'd like to get more familiar with the pg backend codebase, and tilting at this windmill seemed like an interesting way to accomplish that. If I was focused on really solving a real-world problem, I'd say that this lays the groundwork for table-schema-specific storage optimizations and optimized record-filtering code. But I'd only make that argument if I planned to (a) perform a careful study with statistically significant benchmarks, and/or (b) produce a merge-worthy patch. At this point I have no intentions of doing so. 
My main goal really is just to have fun with the code. Does anyone know if this has been done before with Postgres? I would have assumed yes, but I'm not finding anything in Google about people having done this. AFAIK (and I don't know much in this area) the storage manager isn't very pluggable compared to the rest of Pg. Thanks for the warning. Duly noted. Kind regards, Christian
Re: [HACKERS] [PATCH] Support for pg_stat_archiver view
On Sat, Jan 25, 2014 at 3:19 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Sat, Jan 25, 2014 at 5:41 AM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Jan 23, 2014 at 4:10 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Thu, Jan 9, 2014 at 6:36 AM, Gabriele Bartolini gabriele.bartol...@2ndquadrant.it wrote: Il 08/01/14 18:42, Simon Riggs ha scritto: Not sure I see why it needs to be an SRF. It only returns one row. Attached is version 4, which removes management of SRF stages. I have been looking at v4 a bit more, and found a couple of small things: - a warning in pgstatfuncs.c - some typos and format fixing in the sgml docs - some corrections in a couple of comments - Fixed an error message related to pg_stat_reset_shared referring only to bgwriter and not the new option archiver. Here is how the new message looks: =# select pg_stat_reset_shared('popo'); ERROR: 22023: unrecognized reset target: popo HINT: Target must be bgwriter or archiver A new version is attached containing those fixes. I played also with the patch and pgbench, simulating some archive failures and successes while running pgbench and the reports given by pg_stat_archiver were correct, so I am marking this patch as Ready for committer. I refactored the patch further. * Remove ArchiverStats global variable to simplify pgarch.c. * Remove stats_timestamp field from PgStat_ArchiverStats struct because it's not required. Thanks, pgstat_send_archiver is cleaner now. I have some review comments: +s.archived_wals, +s.last_archived_wal, +s.last_archived_wal_time, +s.failed_attempts, +s.last_failed_wal, +s.last_failed_wal_time, The column names of pg_stat_archiver look not consistent at least to me. What about the followings? archive_count last_archived_wal last_archived_time fail_count last_failed_wal last_failed_time And what about archived_count and failed_count instead of respectively archive_count and failed_count? The rest of the names are better now indeed. 
I think that it's time to rename all the variables related to pg_stat_bgwriter. For example, it's better to change PgStat_GlobalStats to PgStat_Bgwriter. I think that it's okay to make this change as separate patch, though. Yep, this is definitely a different patch. +char last_archived_wal[MAXFNAMELEN];/* last WAL file archived */ +TimestampTz last_archived_wal_timestamp;/* last archival success */ +PgStat_Counter failed_attempts; +char last_failed_wal[MAXFNAMELEN];/* last WAL file involved in failure */ Some hackers don't like the increase of the size of the statsfile. In order to reduce the size as much as possible, we should use the exact size of WAL file here instead of MAXFNAMELEN? The first versions of the patch used a more limited size length more inline with what you say: +#define MAX_XFN_CHARS40 But this is inconsistent with xlog_internal.h. Using MAX_XFN_CHARS instead of MAXFNAMELEN has a clear advantage to reduce the size of the statistics file. Though I'm not sure how much this change improve the performance of the statistics collector, basically I'd like to use MAX_XFN_CHARS here. +(errmsg(corrupted statistics file (global) \%s\, statfile))); I think that it's better to use the view name pg_stat_bgwriter instead of internal name global here. The view name is easier-to-understand to a user. +(errmsg(corrupted statistics file (archiver) \%s\, statfile))); Same as above. What using pg_stat_archiver instead of just archive? +if (fread(myArchiverStats, 1, sizeof(myArchiverStats), + fpin) != sizeof(myArchiverStats)) +{ +ereport(pgStatRunningInCollector ? LOG : WARNING, +(errmsg(corrupted statistics file \%s\, statfile))); Same as above. Isn't it better to add something like pg_stat_archiver even here? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standalone synchronous master
On 01/26/2014 07:56 PM, Rajeev rastogi wrote: I shall rework to improve this patch. Below are the summarization of all discussions, which will be used as input for improving the patch: 1. Method of degrading the synchronous mode: a. Expose the configuration variable to a new SQL-callable functions. b. Using ALTER SYSTEM SET. c. Auto-degrade using some sort of configuration parameter as done in current patch. d. Or may be combination of above, which DBA can use depending on their use-cases. We can discuss further to decide on one of the approach. 2. Synchronous mode should upgraded/restored after at-least one synchronous standby comes up and has caught up with the master. 3. A better monitoring/administration interfaces, which can be even better if it is made as a generic trap system. I shall propose a better approach for this. 4. Send committing clients, a WARNING if they have committed a synchronous transaction and we are in degraded mode. 5. Please add more if I am missing something. I think we actually need two degrade modes: A. degrade once: if the sync standby connection is ever lost, degrade and do not resync. B. reconnect: if the sync standby catches up again, return it to sync status. The reason you'd want degrade once is to avoid the flaky network issue where you're constantly degrading then reattaching the sync standby, resulting in horrible performance. If we did offer degrade once though, we'd need some easy way to determine that the master was in a state of permanent degrade, and a command to make it resync. Discuss? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun
On Mon, Jan 27, 2014 at 10:27 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I think I see some bugs in _bt_moveright(). If you examine _bt_finish_split() in detail, you'll see that it doesn't just drop the write buffer lock that the caller will have provided (per its comments) - it also drops the buffer pin. It calls _bt_insert_parent() last, which was previously only called last thing by _bt_insertonpg() (some of the time), and _bt_insertonpg() is indeed documented to drop both the lock and the pin. And if you look at _bt_moveright() again, you'll see that there is a tacit assumption that the buffer pin isn't dropped, or at least that opaque (the BTPageOpaque BT special page area pointer) continues to point to the same page held in the same buffer, even though the code doesn't set the opaque' pointer again and it may not point to a pinned buffer or even the appropriate buffer. Ditto page. So opaque may point to the wrong thing on subsequent iterations - you don't do anything with the return value of _bt_getbuf(), unlike all of the existing call sites. I believe you should store its return value, and get the held page and the special pointer on the page, and assign that to the opaque pointer before the next iteration (an iteration in which you very probably really do make progress not limited to completing a page split, the occurrence of which the _bt_moveright() loop gets stuck on). You know, do what is done in the non-split-handling case. It may not always be the same buffer as before, obviously. Yep, fixed. Can you explain what the fix was, please? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
Heikki Linnakangas wrote: I spent some time whacking this around, new patch version attached. I moved the mmap() code into a new function, that leaves the PGSharedMemoryCreate more readable. Did this patch go anywhere? Someone just pinged me about a kernel scalability problem in Linux with huge pages; if someone did performance measurements with this patch, perhaps it'd be good to measure again with the kernel patch in place. https://lkml.org/lkml/2014/1/26/227 -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PoC: Partial sort
Hi! On Tue, Jan 21, 2014 at 3:24 AM, Marti Raudsepp ma...@juffo.org wrote: On Tue, Jan 14, 2014 at 5:49 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Tue, Jan 14, 2014 at 12:54 AM, Marti Raudsepp ma...@juffo.org wrote: I've been trying it out in a few situations. I implemented a new enable_partialsort GUC to make it easier to turn on/off, this way it's a lot easier to test. The attached patch applies on top of partial-sort-5.patch I thought about such an option. Generally not because of testing convenience, but because of the overhead of planning. The way you implemented it is quite naive :) I don't understand. I had another look at this and cost_sort still seems like the best place to implement this, since that's where the patch decides how many pre-sorted columns to use. Both mergejoin and simple order-by plans call into it. If enable_partialsort=false then I skip all pre-sorted options except full sort, making cost_sort behave pretty much like it did before the patch. I could change pathkeys_common to return 0, but that seems like a generic function that shouldn't be tied to partialsort. The old code paths called pathkeys_contained_in anyway, which has similar complexity. (Apart from initial_cost_mergejoin, but that doesn't seem special enough to make an exception for). Or should I use?: enable_partialsort ? pathkeys_common(...) : 0 For instance, merge join relies on partial sort which will be replaced with simple sort. Are you saying that enable_partialsort=off should keep partialsort-based mergejoins enabled? Or are you saying that merge joins shouldn't use simple sort at all? But merge join was already able to use full Sort nodes before your patch. Sorry that I didn't explain it. In particular I mean the following: 1) With enable_partialsort = off all mergejoin logic should behave as without the partial sort patch. 2) With the partial sort patch the get_cheapest_fractional_path_for_pathkeys function is much more expensive to execute. 
With enable_partialsort = off it should be as cheap as without the partial sort patch. I'll try to implement this option this week. For now, I have attempted to fix the extra-columns-in-mergejoin problem. It would be nice if you could test it. -- With best regards, Alexander Korotkov. partial-sort-7.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun
On Mon, Jan 27, 2014 at 10:58 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Okay, promise not to laugh. I did write a bunch of hacks to generate graphviz .dot files from the btree pages, and render them into pictures. It consists of multiple parts, all in the attached tarball. It's funny that you should say that, because I was thinking about building something similar over the past few days. I find it very useful to build ad-hoc tools like that, and I thought that it would be particularly useful to have something visual for testing btree code. Sure, you can come up with a test and keep the details straight in your head while coordinating everything, but that won't scale as new testing scenarios inevitably occur to you. I've done plenty of things with contrib/pageinspect along these lines in the past, but this is better. Anything that reduces the friction involved in building an accurate mental model of things is a very good thing. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
On Mon, Jan 27, 2014 at 12:58 PM, Simon Riggs si...@2ndquadrant.com wrote: On 24 January 2014 08:33, Simon Riggs si...@2ndquadrant.com wrote: On 24 January 2014 07:08, Peter Geoghegan p...@heroku.com wrote: On Wed, Jan 15, 2014 at 6:57 AM, Simon Riggs si...@2ndquadrant.com wrote: v15 to fix the above problem. v16 attached v17 attached This version adds a GUC called ddl_exclusive_locks which allows us to keep the 9.3 behaviour if we wish it. Some people may be surprised that their programs don't wait in the same places they used to. We hope that is a positive and useful behaviour, but it may not always be so. I'll commit this on Thurs 30 Jan unless I hear objections. I haven't reviewed the patch, but -1 for adding a GUC. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
On Mon, Jan 27, 2014 at 12:25 PM, Robert Haas robertmh...@gmail.com wrote: I haven't reviewed the patch, but -1 for adding a GUC. I'm pretty surprised that it's been suggested that some people might prefer AccessExclusiveLocks. Why would anyone prefer that? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
Peter Geoghegan p...@heroku.com writes: On Sun, Jan 26, 2014 at 10:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: Meh. This line of argument seems to reduce to we don't need to worry about performance of this code path because it won't be reached often. I think I may have over-elaborated, giving you the false impression that this was something I felt strongly about. I'm glad that the overhead has been shown to be quite low, and I think that lexing without the lock held will be fine. OK. Committed after a couple of small further revisions. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extension_control_path
Hi, Sergey Muraviov sergey.k.murav...@gmail.com writes: Now patch applies cleanly and works. :-) Cool ;-) But I have some notes: 1. There is an odd underscore character in functions find_in_extension_control_path and list_extension_control_paths: \extension_control__path\ Fixed in the new version of the patch, attached. 2. If we have several versions of one extension in different directories (which are listed in the extension_control_path parameter) then we get strange output from the pg_available_extensions and pg_available_extension_versions views (information about an extension whose path is at the beginning of the list is duplicated), and only one version of the extension can be created. Fixed. 3. It would be fine to see an extension control path in the pg_available_extensions and pg_available_extension_versions views (in a separate column or within the extension name). I think the on-disk location is an implementation detail and decided in the attached version not to change those system view definitions. 4. Perhaps the CREATE EXTENSION command should be improved to allow creation of the required version of the extension. So we can use different versions of extensions in different databases. Fixed in the attached. I also fixed ALTER EXTENSION UPDATE to search for update scripts in the same directory where the main control file is found, but I suspect this part requires more thinking. When we ALTER EXTENSION UPDATE we might now have several places where we find extname.control files, with possibly different default_version properties. In the attached, we select the directory containing the control file whose default_version matches the already installed extension version. That matches a model where the new version of the extension changes the default_version in an auxiliary file. We might instead want to match the default_version in the control file against the new version we are asked to upgrade to. 
Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *** *** 5773,5778 SET XML OPTION { DOCUMENT | CONTENT }; --- 5773,5827 variablelist + varlistentry id=guc-extension-control-path xreflabel=extension_control_path + termvarnameextension_control_path/varname (typestring/type)/term + indexterm +primaryvarnameextension_control_path/ configuration parameter/primary + /indexterm + indextermprimaryextension packaging// + listitem +para + The command commandCREATE EXTENSION/ searches for the extension + control file in order to install it. The value + for varnameextension_control_path/varname is used to search for + the literalname.control/literal files. +/para + +para + The value for varnameextension_control_path/varname must be a list + of absolute directory paths separated by colons (or semi-colons on + Windows). If a list element starts with the special + string literal$extdir/literal, the + compiled-in productnamePostgreSQL/productname package extension + directory is substituted for literal$extdir/literal; this is where + the extensions provided by the standard + productnamePostgreSQL/productname distribution are installed. + (Use literalpg_config --extdir/literal to find out the name of + this directory.) For example: + programlisting + extension_control_path = '/usr/local/postgresql/extension:/home/my_project:$extdir' + /programlisting + or, in a Windows environment: + programlisting + extension_control_path = 'C:\tools\postgresql;H:\my_project\lib;$extdir' + /programlisting +/para + +para + The default value for this parameter is literal'$extdir'/literal. +/para + +para + This parameter can be changed at run time by superusers, but a + setting done that way will only persist until the end of the + client connection, so this method should be reserved for + development purposes. 
The recommended way to set this parameter + is in the filenamepostgresql.conf/filename configuration + file. +/para + /listitem + /varlistentry + varlistentry id=guc-dynamic-library-path xreflabel=dynamic_library_path termvarnamedynamic_library_path/varname (typestring/type)/term indexterm *** a/src/backend/commands/extension.c --- b/src/backend/commands/extension.c *** *** 25,30 --- 25,31 #include dirent.h #include limits.h + #include sys/stat.h #include unistd.h #include access/htup_details.h *** *** 60,71 --- 61,76 bool creating_extension = false; Oid CurrentExtensionObject = InvalidOid; + /* GUC extension_control_path */ + char *Extension_control_path; + /* *
Re: [HACKERS] Storing pg_stat_statements query texts externally, pg_stat_statements in core
On Mon, Jan 27, 2014 at 12:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: OK. Committed after a couple of small further revisions. Great. Thank you. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
Peter Geoghegan p...@heroku.com writes: On Mon, Jan 27, 2014 at 12:25 PM, Robert Haas robertmh...@gmail.com wrote: I haven't reviewed the patch, but -1 for adding a GUC. I'm pretty surprised that it's been suggested that some people might prefer AccessExclusiveLocks. Why would anyone prefer that? For one thing, so they can back this out if it proves to be broken, as the last committed version was. Given that this patch was marked (by its author) as Ready for Committer without any review in the current CF, I can't say that I have any faith in it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new json funcs
Andrew Dunstan escribió: Note that we can only do this when the result type stays the same. It does not for json_each/json_each_text or json_extract_path/json_extract_path_text, which is why we have different functions for those cases. In C code, if I extract a value using json_object_field or json_array_element, is there a way to turn it into the dequoted version, that is, the value that I would have gotten had I called json_object_field_text or json_array_element_text instead? I wrote a quick and dirty hack in the event triggers patch that just removes the outermost quotes and turns any \" into ", but that's probably incomplete. Does jsonfuncs.c offer any way to do this? That might be useful for the crowd that cares about the detail being discussed in this subthread. Thanks, -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
On 27 January 2014 20:35, Peter Geoghegan p...@heroku.com wrote: On Mon, Jan 27, 2014 at 12:25 PM, Robert Haas robertmh...@gmail.com wrote: I haven't reviewed the patch, but -1 for adding a GUC. I'm pretty surprised that it's been suggested that some people might prefer AccessExclusiveLocks. Why would anyone prefer that? Nobody has said "prefer". I said: "Some people may be surprised that their programs don't wait in the same places they used to. We hope that is a positive and useful behaviour, but it may not always be so." We get the new behaviour by default and I expect we'll be very happy with it. A second thought is that if we have problems of some kind in the field as a result of the new lock modes then we will be able to turn them off. I'm happy to fix any problems that occur, but that doesn't mean there won't be any. If everybody is confident that we've foreseen every bug, then no problem, let's remove it. I recall being asked to add hot_standby = on | off for similar reasons. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new json funcs
On 01/27/2014 03:53 PM, Alvaro Herrera wrote: Andrew Dunstan escribió: Note that we can only do this when the result type stays the same. It does not for json_each/json_each_text or json_extract_path/json_extract_path_text, which is why we have different functions for those cases. In C code, if I extract a value using json_object_field or json_array_element, is there a way to turn it into the dequoted version, that is, the value that I would have gotten had I called json_object_field_text or json_array_element_text instead? I wrote a quick and dirty hack in the event triggers patch that just removes the outermost and turns any \ into , but that's probably incomplete. Does jsonfuncs.c offer any way to do this? That might be useful for the crowd that cares about the detail being discussed in this subthread. I'm not sure I understand the need. This is the difference between the _text variants and their parents. Why would you call json_object_field when you want the dequoted text? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new json funcs
Andrew Dunstan escribió: I'm not sure I understand the need. This is the difference between the _text variants and their parents. Why would you call json_object_field when you want the dequoted text? Because I first need to know its type. Sometimes it's an array, or an object, or a boolean, and for those I won't call the _text version afterwards but just use the original. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
On 27 January 2014 20:47, Tom Lane t...@sss.pgh.pa.us wrote: Peter Geoghegan p...@heroku.com writes: On Mon, Jan 27, 2014 at 12:25 PM, Robert Haas robertmh...@gmail.com wrote: I haven't reviewed the patch, but -1 for adding a GUC. I'm pretty surprised that it's been suggested that some people might prefer AccessExclusiveLocks. Why would anyone prefer that? For one thing, so they can back this out if it proves to be broken, as the last committed version was. Agreed Given that this patch was marked (by its author) as Ready for Committer without any review in the current CF True. The main review happened in a previous commitfest and there was a small addition for this CF. It was my understanding that you wanted us to indicate that to allow you to review. I am happy to set status differently, as you wish, presumably back to needs review. I can't say that I have any faith in it. That's a shame. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Freezing without write I/O
On 26 January 2014 12:58, Andres Freund and...@2ndquadrant.com wrote: On 2014-01-25 20:26:16 -0800, Peter Geoghegan wrote: Shouldn't this patch be in the January commitfest? I think we previously concluded that there wasn't much chance to get this into 9.4 and there's significant work to be done on the patch before new reviews are required, so not submitting it imo makes sense. I think we should make this a priority feature for 9.5 -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Mon, Jan 27, 2014 at 4:45 AM, Andrew Dunstan and...@dunslane.net wrote: I personally don't give a tinker's cuss about whether the patch slows down pg_stat_statements a bit. Why not? The assurance that the overhead is generally very low is what makes it possible to install it widely, usually without a second thought. That's hugely valuable. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INTERVAL overflow detection is terribly broken
On Sun, Jan 26, 2014 at 02:13:03PM +0100, Florian Pflug wrote: On Jan26, 2014, at 03:50 , Bruce Momjian br...@momjian.us wrote: Patch attached. + if ((float) tm->tm_year * MONTHS_PER_YEAR + tm->tm_mon > INT_MAX) + return -1; Is this bullet-proof? If float and int are both 32-bit, the float's mantissa will be less than 32-bit (24 or so, I think), and thus won't be able to represent INT_MAX accurately. This means there's a value of tm_year * MONTHS_PER_YEAR which is less than INT_MAX, yet for which tm_year * MONTHS_PER_YEAR + tm_mon will return tm_year * MONTHS_PER_YEAR unmodified if tm_mon is small enough (e.g., 1). But if tm_year * MONTHS_PER_YEAR was close enough to INT_MAX, the actual value of tm_year * MONTHS_PER_YEAR + tm_mon might still be larger than INT_MAX; the floating-point computation just won't detect it. In that case, the check would succeed, yet the actual integer computation would still overflow. It should be safe with double instead of float. You are correct. I have adjusted the code to use double. + if (SAMESIGN(span1->month, span2->month) && + !SAMESIGN(result->month, span1->month)) This assumes wrapping semantics for signed integral types, which isn't guaranteed by the C standard - it says overflows of signed integral types produce undefined results. We currently depend on wrapping semantics for these types in more places, and therefore need GCC's -fwrapv anyway, but I still wonder if adding more of these kinds of checks is a good idea. Well, I copied this from int.c, so what I did was to mention the other function I got this code from. If we ever change int.c we would need to change this too. The updated attached patch has overflow detection for interval subtraction, multiplication, and negation. There are also some comparison cleanups. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. 
+ diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml new file mode 100644 index 6bf4cf6..b7d7d80 *** a/doc/src/sgml/datatype.sgml --- b/doc/src/sgml/datatype.sgml *** SELECT E'\\xDEADBEEF'; *** 1587,1593 </row> <row> <entry><type>interval [ <replaceable>fields</replaceable> ] [ (<replaceable>p</replaceable>) ]</type></entry> ! <entry>12 bytes</entry> <entry>time interval</entry> <entry>-17800 years</entry> <entry>17800 years</entry> --- 1587,1593 </row> <row> <entry><type>interval [ <replaceable>fields</replaceable> ] [ (<replaceable>p</replaceable>) ]</type></entry> ! <entry>16 bytes</entry> <entry>time interval</entry> <entry>-17800 years</entry> <entry>17800 years</entry> diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c new file mode 100644 index a61b40e..70284e9 *** a/src/backend/utils/adt/datetime.c --- b/src/backend/utils/adt/datetime.c *** DecodeInterval(char **field, int *ftype, *** 2976,2981 --- 2976,2983 type = DTK_MONTH; if (*field[i] == '-') val2 = -val2; + if (((double) val * MONTHS_PER_YEAR + val2) > INT_MAX) + return DTERR_FIELD_OVERFLOW; val = val * MONTHS_PER_YEAR + val2; fval = 0; } diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c new file mode 100644 index 4581862..3b8e8e8 *** a/src/backend/utils/adt/timestamp.c --- b/src/backend/utils/adt/timestamp.c *** *** 41,46 --- 41,47 #error -ffast-math is known to break this code #endif + #define SAMESIGN(a,b) (((a) < 0) == ((b) < 0)) /* Set at postmaster start */ TimestampTz PgStartTime; *** interval2tm(Interval span, struct pg_tm *** 1694,1700 #ifdef HAVE_INT64_TIMESTAMP tfrac = time / USECS_PER_HOUR; time -= tfrac * USECS_PER_HOUR; ! tm->tm_hour = tfrac; /* could overflow ... */ tfrac = time / USECS_PER_MINUTE; time -= tfrac * USECS_PER_MINUTE; tm->tm_min = tfrac; --- 1695,1705 #ifdef HAVE_INT64_TIMESTAMP tfrac = time / USECS_PER_HOUR; time -= tfrac * USECS_PER_HOUR; ! tm->tm_hour = tfrac; ! if (!SAMESIGN(tm->tm_hour, tfrac)) ! ereport(ERROR, ! (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE), ! 
errmsg("interval out of range"))); tfrac = time / USECS_PER_MINUTE; time -= tfrac * USECS_PER_MINUTE; tm->tm_min = tfrac; *** recalc: *** 1725,1731 int tm2interval(struct pg_tm * tm, fsec_t fsec, Interval *span) { ! span->month = tm->tm_year * MONTHS_PER_YEAR + tm->tm_mon; span->day = tm->tm_mday; #ifdef HAVE_INT64_TIMESTAMP span->time = (tm->tm_hour * INT64CONST(60)) + --- 1730,1740 int tm2interval(struct pg_tm * tm, fsec_t fsec, Interval *span) { ! double total_months = (double) tm->tm_year * MONTHS_PER_YEAR + tm->tm_mon; ! ! if (total_months > INT_MAX || total_months < INT_MIN) ! return -1; ! span->month
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 01/27/2014 04:37 PM, Peter Geoghegan wrote: On Mon, Jan 27, 2014 at 4:45 AM, Andrew Dunstan and...@dunslane.net wrote: I personally don't give a tinker's cuss about whether the patch slows down pg_stat_statements a bit. Why not? The assurance that the overhead is generally very low is what makes it possible to install it widely usually without a second thought. That's hugely valuable. I care very much what the module does to the performance of all statements. But I don't care much if selecting from pg_stat_statements itself is a bit slowed. Perhaps I didn't express myself as clearly as I could have. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INTERVAL overflow detection is terribly broken
On Mon, Jan 27, 2014 at 04:47:22PM -0500, Bruce Momjian wrote: The updated attached patch has overflow detection for interval subtraction, multiply, and negate. There are also some comparison cleanups. Oh, one odd thing about this patch. I found I needed to use INT64_MAX, but I don't see it used anywhere else in our codebase. Is this OK? Is there a better way? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On 25 January 2014 02:21, Florian Pflug f...@phlo.org wrote: I've now split this up into invtrans_base: Basic machinery plus tests with SQL-language aggregates invtrans_arith: COUNT(),SUM(),AVG(),STDDEV() and the like invtrans_minmax: MIN(),MAX(),BOOL_AND(),BOOL_OR() invtrans_collecting: ARRAY_AGG(), STRING_AGG() invtrans_docs: Documentation Thanks. That makes it a bit easier to review, and hopefully easier to commit. The thing that I'm currently trying to get my head round is the null/not null, strict/not strict handling in this patch, which I find a bit odd, particularly when transfn and inv_transfn have different strictness settings: strict transfn vs non-strict inv_transfn This case is explicitly forbidden, both in CREATE AGGREGATE and in the executor. To me, that seems overly restrictive --- if transfn is strict, then you know for sure that no NULL values are added to the aggregate state, so surely it doesn't matter whether or not inv_transfn is strict. It will never be asked to remove NULL values. I think there are definite use-cases where a user might want to use a pre-existing function as the inverse transition function, so it seems harsh to force them to wrap it in a strict function in this case. AFAICS, the code in advance/retreat_windowaggregate should just work if those checks are removed. non-strict transfn vs strict inv_transfn At first this seems as though it must be an error --- the forwards transition function allows NULL values into the aggregate state, and the inverse transition function is strict, so it cannot remove them. But actually what the patch is doing in this case is treating the forwards transition function as partially strict --- it won't be passed NULL values (at least in a window context), but it may be passed a NULL state in order to build the initial state when the first non-NULL value arrives. 
It looks like this behaviour is intended to support aggregates that ignore NULL values, but cannot actually be made to have a strict transition function. I think typically this is because the aggregate's initial state is NULL and its state type differs from the type of the values being aggregated, and so can't be automatically created from the first non-NULL value. That all seems quite ugly though, because now you have a transition function that is not strict, which is passed NULL values when used in a normal aggregate context, and not passed NULL values when used in a window context (whether sliding or not), except for the NULL state for the first non-NULL value. I'm not sure if there is a better way to do it though. If we disallow this case, these aggregates would have to use non-strict functions for both the forward and inverse transition functions, which means they would have to implement their own non-NULL value counting. Alternatively, allowing strict transition functions for these aggregates would require that we provide some other way to initialise the state from the first non-NULL input, such as a new initfn. Thoughts? Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A minor correction in comment in heaptuple.c
Bruce Momjian br...@momjian.us wrote: On Tue, Jun 18, 2013 at 11:04:25AM -0700, Kevin Grittner wrote: D'Arcy J.M. Cain da...@druid.net Although, the more I think about it, the more I think that the comment is both confusing and superfluous. The code itself is much clearer. Seriously, if there is any comment there at all, it should be a succinct explanation for why we didn't do this Is everyone OK with me applying this patch from Kevin, attached? I guess my intent was misunderstood -- what I was trying to get across was that the comment added exactly nothing to what you could get more quickly by reading the code below it. I felt the comment should either be removed entirely, or a concise explanation of why it was the right thing to do should be there rather than just echoing the code in English. I wasn't suggesting applying the mini-patch, but suggesting that *if* we have a comment there at all, it should make clear why such a patch would be wrong; i.e., why it is OK to have attnum > tupdesc->natts here. How would we get to such a state, and why is NULL the right thing for it? Such a comment would actually help someone trying to understand the code, rather than wasting their time. After all, in the same file we have this: /* Check for caller error */ if (attnum <= 0 || attnum > slot->tts_tupleDescriptor->natts) elog(ERROR, "invalid attribute number %d", attnum); An explanation of why it is caller error one place and not another isn't a waste of space. (which passes `make check-world`) And I was disappointed that our regression tests don't actually exercise that code path, which would be another good way to make the point. So anyway, *I* would object to applying that; it was meant to illustrate what the comment, if any, should cover; not to be an actual code change. I don't think the change that was pushed helps that comment carry its own weight, either. If we can't do better than that, we should just drop it. 
I guess I won't try to illustrate a point *that* particular way again -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Mon, Jan 27, 2014 at 2:01 PM, Andrew Dunstan and...@dunslane.net wrote: I care very much what the module does to the performance of all statements. But I don't care much if selecting from pg_stat_statements itself is a bit slowed. Perhaps I didn't express myself as clearly as I could have. Oh, I see. Of course the overhead of calling the pg_stat_statements() function is much less important. Actually, I think that calling that function is going to add a lot of noise to any benchmark aimed at measuring added overhead as the counters struct is expanded. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INTERVAL overflow detection is terribly broken
Bruce Momjian br...@momjian.us writes: Oh, one odd thing about this patch. I found I needed to use INT64_MAX, but I don't see it used anywhere else in our codebase. Is this OK? Is there a better way? Most of the overflow tests in int.c and int8.c are coded to avoid relying on the MIN or MAX constants; which seemed like better style at the time. I'm not sure whether relying on INT64_MAX to exist is portable. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
Hi Stephen, Thanks for your comments. * Kouhei Kaigai (kai...@ak.jp.nec.com) wrote: Is somebody available to volunteer to review the custom-scan patch? I looked through it a bit and my first take away from it was that the patches to actually use the new hooks were also making more changes to the backend code, leaving me with the impression that the proposed interface isn't terribly stable. Perhaps those changes should have just been in the first patch, but they weren't and that certainly gave me pause. Yes, the part-1 patch provides the interface portion that mediates between the backend code and extension code. The rest, parts 2 and 3, are contrib modules that implement their features on top of the custom-scan API. I'm also not entirely convinced that this is the direction to go in when it comes to pushing down joins to FDWs. While that's certainly a goal that I think we all share, this seems to be intending to add a completely different feature which happens to be able to be used for that. For FDWs, wouldn't we only present the FDW with the paths where the foreign tables for that FDW, or perhaps just a given foreign server, are being joined? FDW join push-down is one of the valuable use-cases of this interface, but not the only one. As you might know, my motivation is to implement a GPU acceleration feature on top of this interface, which offers an alternative way to scan or join relations, or potentially to sort or aggregate. It would probably be too much of a stretch to implement radix sort on top of FDW. I'd like you to understand that the part-3 patch (FDW join push-down) is a demonstration of the custom-scan interface, not something designed for one special purpose. Right now, I put all the logic for the interaction between CSI and the FDW driver on the postgres_fdw side; it might be an idea to have common code (like the logic to check whether both relations being joined belong to the same foreign server) on the backend side as something like a gateway between them. 
As an aside, what should be the scope of the FDW interface? In my understanding, it allows an extension to implement something on behalf of a particular data structure declared with CREATE FOREIGN TABLE. In other words, the extension's responsibility is to present a view of something according to PostgreSQL's internal data structures, rather than the object itself. On the other hand, the custom-scan interface allows extensions to implement alternative methods to scan or join particular relations, but not to act as a target referenced in queries. In other words, it provides methods to access objects. It is natural that both features look similar, because both of them let extensions hook the planner and executor; however, their purposes are different. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Race condition in b-tree page deletion
On Sat, Jan 18, 2014 at 11:45 AM, Kevin Grittner kgri...@ymail.com wrote: Heikki Linnakangas hlinnakan...@vmware.com wrote: Here's a patch implementing this scheme. I thought I'd weigh in here too, since this is closely tied to the page split patch, which is dependent on this patch. The basic approach is sound. The papers on which we based our btree implementation did not discuss entry deletion, and our current approach introduces new possible states into the on-disk index structure which are not covered by the papers and are not always handled correctly by the current code. Lehman and Yao don't discuss deletion. But V. Lanin and D. Shasha do, in the paper "A Symmetric Concurrent B-Tree Algorithm" [1]; as the README notes, we currently follow a simplified version of that. Apparently a number of people proposed solutions to address the omission in L&Y, as one survey of those approaches notes [2]. One of the approaches appearing in that survey is L&S. The classic L&Y algorithm only has right-links, and not left-links. The same applies to L&S. But I'd describe this patch as bringing what we do closer to L&S, which is more or less a refinement of B-Link trees (that is, L&Y B-Trees). L&S do have a concept of an outlink, crucial to their symmetric deletion approach, which is something that we don't need, because we already have left-links (we have them primarily to support backwards index scans). L&S say: "In general, the link technique calls for processes that change sections of a data structure in a way that would cause other processes to fail to provide a link to another section of the data structure where recovery is possible." So, loosely speaking, we're doing a conceptual _bt_moveleft() for deletion as the inverse of a literal _bt_moveright() for insertion. L&S say a deletion needs no more than two write locks at a time during its ascent (the descent is just a garden-variety _bt_search() insertion scankey descent).
However, we currently may obtain as many as 4 write locks for page deletion. As the README notes: "To delete an empty page, we acquire write lock on its left sibling (if any), the target page itself, the right sibling (there must be one), and the parent page, in that order." We obtain 2 write locks in the first phase (on target and parent). In the second phase, we do what is described above: lock 4 pages in that order. However, the reason for this difference from the L&S paper is made clear in section 2.5, "Freeing Empty Nodes", which kind of cops out of addressing how to *unlink* empty/half-dead pages, with a couple of brief descriptions of approaches that others have come up with that are not at all applicable to us. For that reason, might I suggest that you change the new "Page Deletion" heading in the README to read "Page Deletion and Unlinking", to emphasize the distinction. The general feeling is that this patch is not suitable for back-patching. I agree with this, but this is a bug which could lead to index corruption and which exists in all supported branches. If nobody can suggest an alternative which we can back-patch, we might want to reconsider this after this has been in production releases for several months. That was what I thought. Let's revisit the question of back-patching this when 9.4 has been released for 6 months or so. Other than that, I have not found any problems. Incidentally, during my research of these techniques, I discovered that Johnson and Shasha argue that for most concurrent B-tree applications, merge-at-empty produces significantly lower restructuring rates, and only slightly lower space utilization, than merge-at-half. This is covered in the paper "B-trees with inserts, deletes, and modifies: Why Free-at-empty is Better than Merge-at-half" [3]. I was pleased to learn that there was a sound, well-regarded theoretical basis for that behavior, even if the worst-case bloat can be unpleasant.
Though having said that, that paper doesn't weigh what we do in the event of non-HOT updates, and that could change the equation. That isn't an argument against merge-at-empty; it's an argument against the current handling of non-HOT updates. Currently, the comments above _bt_doinsert() say at one point: "* This routine is called by the public interface routines, btbuild * and btinsert." This is obsolete; since commit 89bda95d, only the insert public interface routine calls _bt_doinsert(). Perhaps fix this in passing. [1] L&S: https://archive.org/stream/symmetricconcurr00lani/symmetricconcurr00lani_djvu.txt [2] http://www.dtic.mil/get-tr-doc/pdf?AD=ADA232287&Location=U2&doc=GetTRDoc.pdf (Chapter 3, The Coherent Shared Memory Algorithm) [3] http://cs.nyu.edu/shasha/papers/bjournal.ps -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] git-review: linking commits to review discussion in git
Hello All, We have linked peer review discussions on 'pgsql-hackers' to their respective commits within the main postgresql.git repository. You can view the linked reviews from 2012 until present in the GitHub repo at https://github.com/mmukadam/postgres/tree/review If you want to work with these reviews locally, you can use our git-review tool. It allows you to create reviews and attach them to commits in git. We didn't modify git; instead, we added some scripts that use standard git commands. git-review is beta, but since it only adds a detached 'review' branch and modifies the contents of this branch, it has minimal impact and can easily be removed by deleting the 'review' branch and scripts. The online man-page is here: http://users.encs.concordia.ca/~m_mukada/git-review.html In order to install git-review, you need to clone the repository: https://github.com/mmukadam/git-review.git The online tutorial is available here: http://users.encs.concordia.ca/~m_mukada/git-review-tutorial.html The clone of postgresql.git with linked review discussion is here (new review discussions are linked nightly) https://github.com/mmukadam/postgres This work is part of my Master's thesis. If you'd like us to change the tool to better suit your review process, have another git repo you'd like us to link commits with review discussions, or have other feedback, please let us know. Cheers, Murtuza -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On 11/30/13, 6:59 AM, Haribabu kommi wrote: To detect provided data and xlog directories are same or not, I reused the Existing make_absolute_path() code as follows. I note that initdb does not detect whether the data and xlog directories are the same. I think there is no point in addressing this only in pg_basebackup. If we want to forbid it, it should be done in initdb foremost. I'm not sure it's worth the trouble, but if I were to do it, I'd just stat() the two directories and compare their inodes. That seems much easier and more robust than comparing path strings. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
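The stat()-and-compare-inodes idea suggested above can be sketched as follows. This is a hypothetical helper, not code from initdb or pg_basebackup, and `same_directory` is an illustrative name:

```c
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Return true if the two paths refer to the same file system object.
 * On POSIX systems two paths name the same object iff their
 * (device, inode) pairs match, which is robust against symlinks,
 * "..", and trailing slashes -- unlike comparing path strings.
 */
bool
same_directory(const char *path1, const char *path2)
{
	struct stat st1,
				st2;

	if (stat(path1, &st1) != 0 || stat(path2, &st2) != 0)
		return false;			/* a nonexistent path can't collide */

	return st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino;
}
```

With something like this, initdb (or pg_basebackup) could reject an xlog directory that resolves to the data directory itself, no matter how the two paths are spelled.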
Re: [HACKERS] Freezing without write I/O
On Mon, Jan 27, 2014 at 4:16 PM, Simon Riggs si...@2ndquadrant.com wrote: On 26 January 2014 12:58, Andres Freund and...@2ndquadrant.com wrote: On 2014-01-25 20:26:16 -0800, Peter Geoghegan wrote: Shouldn't this patch be in the January commitfest? I think we previously concluded that there wasn't much chance to get this into 9.4 and there's significant work to be done on the patch before new reviews are required, so not submitting it imo makes sense. I think we should make this a priority feature for 9.5 +1. I can't think of many things we might do that would be more important. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add force option to dropdb
On January 17, 2014 at 0:56, salah jubeh s_ju...@yahoo.com wrote: If the user owns objects, that will prevent this from working also. I have the feeling that adding DROP OWNED BY and/or REASSIGN OWNED BY calls to this utility would be a bit excessive, but who knows. Please find attached the first attempt to drop the client connections. I have added an option -k, --kill instead of force, since killing client connections does not guarantee a forced drop. Regards On Tuesday, January 14, 2014 8:06 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: salah jubeh wrote: For the sake of completeness: 1. I think I also need to temporarily disallow connecting to the database, is that right? 2. Are there other factors that can hinder dropping a database? If the user owns objects, that will prevent this from working also. I have the feeling that adding DROP OWNED BY and/or REASSIGN OWNED BY calls to this utility would be a bit excessive, but who knows. 3. Should I write two patches, one for pg_version >= 9.2 and one for pg_version < 9.2? No point -- nothing gets applied to branches older than current development anyway. Thank you for the patch. And sorry for the delay in reviewing. I started to look at this patch, so the following is my first review comment. - This patch does not apply to the master branch I tried to apply this patch file to the master branch, but I got the following error. $ cd postgresql $ patch -d. -p1 ../dropdb.patch can't find file to patch at input line 3 Perhaps you used the wrong -p or --strip option? The text leading up to this was: -- |--- dropdb_org.c 2014-01-16 |+++ dropdb.c 2014-01-16 -- There is no dropdb_org.c. I think you made a mistake when the patch was created. - This patch does not follow the coding conventions For example, line 71 of the patch: //new connections are not allowed It should be: /* new connections are not allowed */ (Comment blocks that need specific line breaks should be formatted as block comments, where the comment starts as /*--.) Please refer to the coding conventions.
http://wiki.postgresql.org/wiki/Developer_FAQ#What.27s_the_formatting_style_used_in_PostgreSQL_source_code.3F Regards, --- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Fix comment typo in /src/backend/command/cluster.c
Hi all, Attached patch fixes the typo which is in src/backend/command/cluster.c. Regards, --- Sawada Masahiko fix_typo-cluster.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix comment typo in /src/backend/command/cluster.c
On 01/27/2014 10:24 PM, Sawada Masahiko wrote: Hi all, Attached patch fixes the typo which is in src/backend/command/cluster.c. Are you sure that's a typo? "iff" is usually shorthand for "if and only if". cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 01/28/2014 12:09 AM, Simon Riggs wrote: On 27 January 2014 15:04, Dean Rasheed dean.a.rash...@gmail.com wrote: So for example, when planning the query to update an inheritance child, the rtable will contain an RTE for the parent, but it will not be referenced in the parse tree, and so it will not be expanded while planning the child update. Am I right in thinking that we have this fully working now? I haven't found any further problems, though I've been focusing more on reworking RLS on top of it. AFAICS the only area of objection is the handling of inherited relations, which occurs within the planner in the current patch. I can see that would be a cause for concern since the planner is pluggable and it would then be possible to bypass security checks. Obviously installing a new planner isn't trivial, but doing so shouldn't cause collateral damage. FWIW, I don't see any way _not_ to do that in RLS. If there are security quals on a child table, they must be added, and that can only happen once inheritance expansion happens. That's in the planner. I don't see it as acceptable to ignore security quals on child tables, and if we can't, we've got to do some work in the planner. (I'm starting to really loathe inheritance). We have long had restrictions around updateable views. My suggestion from here is that we accept the restriction that we cannot yet have the 3-way combination of updateable views, security views and views on inherited tables. That prevents the use of updatable security barrier views over partitioned tables, and therefore prevents row-security use on inherited tables. That seems like a very big thing to close off. I'm perfectly happy having that limitation for 9.4, we just need to make it possible to remove the limitation later. Most people aren't using inherited tables Again, because we (ab)use them for partitioning, I'm not sure they're as little-used as I'd like. and people that are have special measures in place for their apps.
We won't lose much by accepting that restriction for 9.4 and re-addressing the issue in a later release. Yep, I'd be happy with that. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
On Mon, Jan 27, 2014 at 9:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Amit Kapila amit.kapil...@gmail.com writes: On Fri, Jan 24, 2014 at 9:10 AM, Amit Kapila amit.kapil...@gmail.com wrote: To proceed with the review of this patch, I need to know whether appending a version number or any other constant to the default event source name is acceptable or not; otherwise, for now we can remove this part of the code from the patch and handle the non-default case, where the change will be that pg_ctl will inquire about the non-default event_source value from the server. Could you please let me know your views on the same? Unless I'm missing something, this entire thread is a tempest in a teapot, because the default event_source value does not matter, because *by default we don't log to eventlog*. The presumption is that if the user turns on logging to eventlog, it's his responsibility to first make sure that event_source is set to something appropriate. And who's to say that plain PostgreSQL isn't what he wanted, anyway? Even if he's got multiple servers on one machine, maybe directing all their logs to the same place is okay by him. I think it's a matter of user preference, how exactly he wants the setup, and as we currently don't have any strong reason to change the default, let's keep it intact. Also, those who don't run multiple servers are probably not going to thank us for moving their logs around unnecessarily. In short, I think we should just reject this idea as introducing more problems than it solves, and not fully solving even the problem it purports to solve. Possibly there's room for a documentation patch reminding users to make sure that event_source is set appropriately before they turn on eventlog. Okay, but in that case also, right now pg_ctl doesn't know the value of the event source, so I think that's a clear bug and we should go ahead and fix it. As you said, I think we can improve the documentation in this regard so that the user will be able to set up event logging without any such problems.
As part of this patch we can fix the issue (make pg_ctl aware of the event source name) and improve the documentation. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Mon, Jan 27, 2014 at 12:03 PM, Amit Kapila amit.kapil...@gmail.com wrote: I think that's a good thing to try. Can you code it up? I have tried to improve the algorithm in another way so that we can get the benefit of matching chunks during the match search (something similar to LZ). The main change is to consider chunks at a fixed boundary (4 bytes) and, after finding a match, try to find whether there is a longer match than the current chunk. While finding a longer match, it still takes care that the next bigger match should start at a chunk boundary. I am not completely sure about the chunk boundary; maybe 8 or 16 bytes can give better results. I think we can now run this patch once on the high-end machine. Here are the results I got on the community PPC64 box. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company pgrb-v5 test.ods Description: application/vnd.oasis.opendocument.spreadsheet -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
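As a rough illustration of the fixed-boundary matching described above (not the actual patch code; `find_chunk_match` and the 4-byte `CHUNK` constant are made up for the example): candidate match positions in the old tuple are restricted to chunk boundaries, and once a whole chunk matches, the match is extended greedily, which is what lets the encoder find matches longer than a single chunk.

```c
#include <stddef.h>

#define CHUNK 4					/* fixed chunk boundary, per the proposal */

/*
 * Find the longest match in the old tuple for the bytes starting at
 * newp[npos], considering only old-tuple offsets that lie on CHUNK-byte
 * boundaries.  A match must cover at least one whole chunk; after that
 * it is extended byte by byte.  Returns the match length (0 if none)
 * and stores the old-tuple offset in *old_off.  The real algorithm
 * hashes the chunks instead of scanning them all linearly.
 */
size_t
find_chunk_match(const char *oldp, size_t old_len,
				 const char *newp, size_t new_len, size_t npos,
				 size_t *old_off)
{
	size_t		best_len = 0;

	for (size_t o = 0; o + CHUNK <= old_len; o += CHUNK)
	{
		size_t		len = 0;

		while (o + len < old_len && npos + len < new_len &&
			   oldp[o + len] == newp[npos + len])
			len++;
		if (len >= CHUNK && len > best_len)
		{
			best_len = len;
			*old_off = o;
		}
	}
	return best_len;
}
```

The encoder would emit a copy instruction (offset, length) when this returns nonzero, and a literal byte otherwise, exactly as in LZ-style delta compression.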
Re: [HACKERS] PoC: Partial sort
On Mon, Jan 27, 2014 at 9:26 PM, Alexander Korotkov aekorot...@gmail.com wrote: For now, I have attempt to fix extra columns in mergejoin problem. It would be nice if you test it. Yes, it solves the test cases I was trying with, thanks. 1) With enable_partialsort = off all mergejoin logic should behave as without partial sort patch. 2) With partial sort patch get_cheapest_fractional_path_for_pathkeys function is much more expensive to execute. With enable_partialsort = off it should be as cheap as without partial sort patch. When it comes to planning time, I really don't think you should bother. The planner enable_* settings are meant for troubleshooting, debugging and learning about the planner. You should not expect people to disable them in a production setting. It's not worth complicating the code for that rare case. This is stated in the documentation (http://www.postgresql.org/docs/current/static/runtime-config-query.html) and repeatedly on the mailing lists. But some benchmarks of planning performance are certainly warranted. Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix comment typo in /src/backend/command/cluster.c
On 28/01/14 16:33, Andrew Dunstan wrote: On 01/27/2014 10:24 PM, Sawada Masahiko wrote: Hi all, Attached patch fixes the typo which is in src/backend/command/cluster.c. Are you sure that's a typo? "iff" is usually shorthand for "if and only if". cheers andrew Certainly, that is how I would interpret it. I came across that abbreviation in a first-year Maths course, Principles of Mathematics, in 1968 at the University of Auckland. Cheers, Gavin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INTERVAL overflow detection is terribly broken
On Mon, Jan 27, 2014 at 07:19:21PM -0500, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Oh, one odd thing about this patch. I found I needed to use INT64_MAX, but I don't see it used anywhere else in our codebase. Is this OK? Is there a better way? Most of the overflow tests in int.c and int8.c are coded to avoid relying on the MIN or MAX constants, which seemed like better style at the time. Yes, I looked at those but they seemed like overkill for interval. For a case where there was an int64 multiplied by a double, then cast back to an int64, I checked the double against INT64_MAX before casting to an int64. I'm not sure whether relying on INT64_MAX to exist is portable. The only use I found was in pgbench: #ifndef INT64_MAX #define INT64_MAX INT64CONST(0x7FFFFFFFFFFFFFFF) #endif so I have just added that to my patch, and INT64_MIN: #ifndef INT64_MIN #define INT64_MIN (-INT64CONST(0x7FFFFFFFFFFFFFFF) - 1) #endif This is only used for HAVE_INT64_TIMESTAMP. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
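The check described above — guarding a double result against the int64 range before casting back — could look like the following sketch. This is illustrative only, not the committed interval code; `mul_int64_double` is a hypothetical name:

```c
#include <stdbool.h>
#include <stdint.h>
#include <math.h>

/*
 * Multiply an int64 by a double, reporting overflow instead of
 * invoking undefined behavior on the cast back.  (double) INT64_MAX
 * rounds up to 2^63, which is not representable as int64, so the >=
 * comparison correctly rejects it; the <= on the negative side is
 * slightly conservative (it also rejects exactly INT64_MIN), which
 * errs in the safe direction.
 */
bool
mul_int64_double(int64_t a, double factor, int64_t *result)
{
	double		prod = (double) a * factor;

	if (isnan(prod) || prod >= (double) INT64_MAX || prod <= (double) INT64_MIN)
		return false;			/* would overflow */
	*result = (int64_t) prod;
	return true;
}
```

The caller would then report an "interval out of range" style error whenever this returns false, rather than silently wrapping.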
Re: [HACKERS] Fix comment typo in /src/backend/command/cluster.c
On 01/27/2014 10:24 PM, Sawada Masahiko wrote: Hi all, Attached patch fixes the typo which is in src/backend/command/cluster.c. Are you sure that's a typo? "iff" is usually shorthand for "if and only if". Oops, I made a mistake. Thanks! Regards, --- Sawada Masahiko
[HACKERS] pg_basebackup and pg_stat_tmp directory
Hi, The files in pg_stat_tmp directory don't need to be backed up because they are basically reset at the archive recovery. So I think it's worth changing pg_basebackup so that it skips any files in pg_stat_tmp directory. Thought? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] inherit support for foreign tables
(2014/01/28 0:55), Atri Sharma wrote: On 27-Jan-2014, at 21:03, David Fetter da...@fetter.org wrote: On Mon, Jan 27, 2014 at 05:06:19PM +0900, Etsuro Fujita wrote: Hi Hanada-san, While still reviewing this patch, I feel this patch has given enough consideration to interactions with other commands, but found the following incorrect? behavior: postgres=# CREATE TABLE product (id INTEGER, description TEXT); CREATE TABLE postgres=# CREATE FOREIGN TABLE product1 () INHERITS (product) SERVER fs OPTIONS (filename '/home/foo/product1.csv', format 'csv'); CREATE FOREIGN TABLE postgres=# ALTER TABLE product ALTER COLUMN description SET STORAGE EXTERNAL; ERROR: product1 is not a table or materialized view ISTM the ALTER TABLE simple recursion mechanism (i.e., ATSimpleRecursion()) should be modified for the ALTER COLUMN SET STORAGE case. This points to a larger discussion about what precisely foreign tables can and cannot inherit from local ones. I don't think that a generic solution will be satisfactory, as the PostgreSQL FDW could, at least in principle, support many more features than the CSV FDW, as shown above. In my estimation, the outcome of the discussion above is not a blocker for this I just thought that among the properties that local tables can alter, the ones that foreign tables can also alter via ALTER FOREIGN TABLE are inherited, and the others are not. So for the case shown above, I thought that we would silently skip executing the ALTER COLUMN SET STORAGE command for the foreign table. I wonder if that would be the first step. I wonder what the cases shall be when the foreign table is on a server which does not support *all* SQL features. Does an FDW need to have the possible inherit options mentioned in its documentation for this patch? The answer is no, in my understanding. The altering operation simply declares some changes for foreign tables in the inheritance tree and does nothing to the underlying storage.
Thanks, Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup and pg_stat_tmp directory
On Tue, Jan 28, 2014 at 12:56 PM, Fujii Masao masao.fu...@gmail.com wrote: Hi, The files in pg_stat_tmp directory don't need to be backed up because they are basically reset at the archive recovery. So I think it's worth changing pg_basebackup so that it skips any files in pg_stat_tmp directory. Thought? Skipping pgstat_temp_directory in basebackup.c would make more sense than directly touching pg_basebackup. My 2c. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
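A minimal sketch of what skipping the directory inside basebackup.c's tree walk could look like (illustrative only; `skip_in_basebackup` is a made-up helper, and a real version would consult the stats_temp_directory GUC rather than hard-coding the default location):

```c
#include <stdbool.h>
#include <string.h>

/*
 * Decide whether a top-level entry in the data directory should be
 * omitted from a base backup.  The statistics files under pg_stat_tmp
 * are rebuilt from scratch during archive recovery, so shipping them
 * only wastes bandwidth.
 */
bool
skip_in_basebackup(const char *entry_name)
{
	return strcmp(entry_name, "pg_stat_tmp") == 0;
}
```

The directory walk would then `continue` past any entry for which this returns true, in the same way it already special-cases files like postmaster.pid.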
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
AFAICS the only area of objection is the handling of inherited relations, which occurs within the planner in the current patch. I can see that would be a cause for concern since the planner is pluggable and it would then be possible to bypass security checks. Obviously installing a new planner isn't trivial, but doing so shouldn't cause collateral damage. FWIW, I don't see any way _not_ to do that in RLS. If there are security quals on a child table, they must be added, and that can only happen once inheritance expansion happens. That's in the planner. I don't see it as acceptable to ignore security quals on child tables, and if we can't, we've got to do some work in the planner. (I'm starting to really loathe inheritance). Let me ask an elemental question. What is the reason why the inheritance expansion logic should be located at the planner stage, rather than at the tail of the rewriter? A reference to a relation with children is very similar to a reference to multiple tables using UNION ALL. Is it a bad idea to move the logic into the rewriter stage (if we have no technical reason to keep it where it is)? Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] inherit support for foreign tables
I wonder what shall be the cases when the foreign table is on a server which does not support *all* SQL features. Does an FDW need to have the possible inherit options mentioned in its documentation for this patch? The answer is no, in my understanding. The altering operation simply declares some changes for foreign tables in the inheritance tree and does nothing to the underlying storage. It seems to me Atri was mentioning the idea of enforcing constraints when a partitioned foreign table is referenced... BTW, isn't it feasible to leave this decision to the FDW drivers? If a foreign table has a check constraint (X BETWEEN 101 AND 200), postgres_fdw would be capable of running this check on the remote server, but file_fdw would not. Handling qualifiers attached to a Path node is their usual job, however. The thing to do is probably simple: if the backend appends the configured check constraint as if it were a user-given qualifier, the FDW driver will handle it appropriately. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Retain dynamic shared memory segments for postmaster lifetime
On Mon, Jan 27, 2014 at 7:48 PM, Amit Langote amitlangot...@gmail.com wrote: On Mon, Jan 13, 2014 at 2:50 PM, Amit Kapila amit.kapil...@gmail.com wrote: I have extended the test (contrib) module dsm_demo such that the user can now specify, during dsm_demo_create, the lifespan of the segment. Applied dsm_keep_segment_v1.patch and dsm_demo_v1.patch. Got the following warning when I tried the above example: postgres=# select dsm_demo_create('this message is from session-new', 1); WARNING: dynamic shared memory leak: segment 1402373971 still referenced WARNING: dynamic shared memory leak: segment 1402373971 still referenced dsm_demo_create - 1402373971 (1 row) Thanks for the verification. The reason is that the resource owner holds a reference to the segment until commit time, which leads to this warning. The fix is to remove the reference from the resource owner when the user calls this new API dsm_keep_segment, similar to what dsm_keep_mapping currently does. The updated patch fixing this problem is attached to this mail. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com dsm_keep_segment_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup and pg_stat_tmp directory
On Tue, Jan 28, 2014 at 1:08 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Tue, Jan 28, 2014 at 12:56 PM, Fujii Masao masao.fu...@gmail.com wrote: Hi, The files in the pg_stat_tmp directory don't need to be backed up because they are basically reset during archive recovery. So I think it's worth changing pg_basebackup so that it skips any files in the pg_stat_tmp directory. Thoughts? Skipping pgstat_temp_directory in basebackup.c would make more sense than directly touching pg_basebackup. My 2c. Yeah, that's what I was thinking :) Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup and pg_stat_tmp directory
On Tue, Jan 28, 2014 at 9:26 AM, Fujii Masao masao.fu...@gmail.com wrote: Hi, The files in the pg_stat_tmp directory don't need to be backed up because they are basically reset during archive recovery. So I think it's worth changing pg_basebackup so that it skips any files in the pg_stat_tmp directory. Thoughts? I think this is a good idea, but can't it also skip PGSTAT_STAT_PERMANENT_TMPFILE along with the temp files in pg_stat_tmp? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix comment typo in /src/backend/command/cluster.c
On Tue, Jan 28, 2014 at 04:48:35PM +1300, Gavin Flower wrote: On 28/01/14 16:33, Andrew Dunstan wrote: On 01/27/2014 10:24 PM, Sawada Masahiko wrote: Hi all, Attached patch fixes the typo which is in src/backend/command/cluster.c. Are you sure that's a typo? iff is usually shorthand for if and only if. cheers andrew Certainly, that is how I would interpret it. I came across that abbreviation in a first-year Maths course, Principles of Mathematics, in 1968 at the University of Auckland. By my rough count (ack -l '\biff\b' |wc -l), it's used to mean equivalence in 81 files in the source tree. Should we have a glossary of such terms? Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers