Re: [HACKERS] gaussian distribution pgbench
Please find attached two patches, which are a split of the patch discussed in this thread, plus a very minor improvement: a code (variable name) simplification applied directly in patch A so as to avoid a change in patch B. The cumulated patch is the same as before.

(A) Add gaussian and exponential options to pgbench \setrandom.

The patch includes SQL test files. There is no change in the *code* from previous already-reviewed submissions, so I do not think it needs another review on that account. However, I have (yet again) reworked the *documentation* (for Andres Freund and Robert Haas); in particular, both descriptions now follow the same structure (introduction, formula, intuition, rule of thumb and constraint). I have differentiated the concept and the option by putting the latter in literal tags, and added a link to the corresponding Wikipedia pages. Please bear in mind that:

1. English is not my native language.
2. This is not easy reading... this is maths, to read slowly :-)
3. Wordsmithing contributions are welcome.

I assume somehow that a user interested in gaussian and exponential distributions must know a little bit about probabilities...

(B) Add pgbench test variants with gauss and expo.

I have reworked the patch so as to avoid copy-pasting the 3 test cases, as requested by Andres Freund, so this is new, although quite simple, code. I have also added explanations in the documentation about how to interpret the decile outputs, so as to hopefully address Robert Haas's comments.
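To make (A) concrete, a custom script using the new options might look like this (a sketch based on my reading of the patch's documentation; the option spellings are from the patch under discussion and may change before commit):

```
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts gaussian 5.0
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
```

With `gaussian 5.0`, accounts near the middle of the key range are drawn far more often than those near the edges; `exponential` instead skews the draws toward the low end of the range.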
-- Fabien.

diff --git a/contrib/pgbench/README b/contrib/pgbench/README
new file mode 100644
index 000..6881256
--- /dev/null
+++ b/contrib/pgbench/README
@@ -0,0 +1,5 @@
+# gaussian and exponential tests
+# with XXX as expo or gauss
+psql test < test-init.sql
+./pgbench -M prepared -f test-XXX-run.sql -t 100 -P 1 -n test
+psql test < test-XXX-check.sql
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 4aa8a50..379ef24 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -41,6 +41,7 @@
 #include <math.h>
 #include <signal.h>
 #include <sys/time.h>
+#include <assert.h>
 #ifdef HAVE_SYS_SELECT_H
 #include <sys/select.h>
 #endif
@@ -98,6 +99,8 @@ static int pthread_join(pthread_t th, void **thread_return);
 #define LOG_STEP_SECONDS 5		/* seconds between log messages */
 #define DEFAULT_NXACTS 10		/* default nxacts */
 
+#define MIN_GAUSSIAN_THRESHOLD 2.0	/* minimum threshold for gauss */
+
 int			nxacts = 0;			/* number of transactions per client */
 int			duration = 0;		/* duration in seconds */
@@ -471,6 +474,76 @@ getrand(TState *thread, int64 min, int64 max)
 	return min + (int64) ((max - min + 1) * pg_erand48(thread->random_state));
 }
 
+/*
+ * random number generator: exponential distribution from min to max inclusive.
+ * the threshold is so that the density of probability for the last cut-off max
+ * value is exp(-threshold).
+ */
+static int64
+getExponentialrand(TState *thread, int64 min, int64 max, double threshold)
+{
+	double		cut, uniform, rand;
+
+	assert(threshold > 0.0);
+	cut = exp(-threshold);
+	/* erand in [0, 1), uniform in (0, 1] */
+	uniform = 1.0 - pg_erand48(thread->random_state);
+	/*
+	 * inner expression in (cut, 1] (if threshold > 0),
+	 * rand in [0, 1)
+	 */
+	assert((1.0 - cut) != 0.0);
+	rand = -log(cut + (1.0 - cut) * uniform) / threshold;
+	/* return int64 random number within between min and max */
+	return min + (int64) ((max - min + 1) * rand);
+}
+
+/* random number generator: gaussian distribution from min to max inclusive */
+static int64
+getGaussianrand(TState *thread, int64 min, int64 max, double threshold)
+{
+	double		stdev;
+	double		rand;
+
+	/*
+	 * Get user specified random number from this loop, with
+	 * -threshold <= stdev < threshold
+	 *
+	 * This loop is executed until the number is in the expected range.
+	 *
+	 * As the minimum threshold is 2.0, the probability of looping is low:
+	 * sqrt(-2 ln(r)) >= 2 <=> r <= e^{-2} ~ 0.135, then when taking the
+	 * average sinus multiplier as 2/pi, we have a 8.6% looping probability
+	 * in the worst case. For a 5.0 threshold value, the looping probability
+	 * is about e^{-5} * 2 / pi ~ 0.43%.
+	 */
+	do
+	{
+		/*
+		 * pg_erand48 generates [0,1), but for the basic version of the
+		 * Box-Muller transform the two uniformly distributed random numbers
+		 * are expected in (0, 1] (see http://en.wikipedia.org/wiki/Box_muller)
+		 */
+		double		rand1 = 1.0 - pg_erand48(thread->random_state);
+		double		rand2 = 1.0 - pg_erand48(thread->random_state);
+
+		/* Box-Muller basic form transform */
+		double		var_sqrt = sqrt(-2.0 * log(rand1));
+
+		stdev = var_sqrt * sin(2.0 * M_PI * rand2);
+
+		/*
+		 * we may try with cos, but there may be a bias induced if the
+		 * previous value fails the test? To be on the safe side, let us
+		 * try over.
+		 */
+	}
+	while (stdev < -threshold || stdev >= threshold);
+
+	/* stdev is in [-threshold, threshold), normalization to [0,1)
Re: [HACKERS] [bug fix] Suppress autovacuum: found orphan temp table message
On 2014-07-22 10:09:04 +0900, MauMau wrote: From: Andres Freund and...@2ndquadrant.com On 2014-07-18 23:38:09 +0900, MauMau wrote: So, I propose a simple fix to change the LOG level to DEBUG1. I don't know which of DEBUG1-DEBUG5 is appropriate, and any level is OK. Could you include this in 9.2.9? Surely that's the wrong end to tackle this from. Hiding actual problems is a seriously bad idea. No, there is no serious problem in the user operation in this situation. Server crash cannot be avoided, and must be anticipated. The problem is that PostgreSQL makes users worried about lots of (probably) unnecessary messages. Is there any problem if we don't output the message? Yes. The user won't know that possibly gigabytes worth of diskspace aren't freed. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql.extra_warnings='num_into_expressions'
On 7/22/14, 7:06 AM, Pavel Stehule wrote: I looked at this patch and I think it is not a good idea. It introduces an early dependency between functions and pg_class-based objects. What dependency? The patch only looks at the raw parser output, so it won't e.g. know whether SELECT * INTO a, b FROM foo; is problematic or not. .marko
Re: [HACKERS] Index-only scans for multicolumn GIST
On 07/21/2014 10:47 PM, Anastasia Lubennikova wrote:

Hi, hackers! There are new results of my work on the GSoC project "Index-only scans for GIST". Previous post is here: http://postgresql.1045698.n5.nabble.com/Index-only-scans-for-GIST-td5804892.html Repository is https://github.com/lubennikovaav/postgres/tree/indexonlygist2 Patch is in attachments. It includes index-only scan for multicolumn GiST. It works correctly - results are the same as with other scan plans. The Fetch() method is implemented for the box and circle opclasses. The improvement for the circle opclass is not as distinct as for the box opclass, because of the recheck.

For a circle, the GiST index stores a bounding box of the circle. The new fetch function reverses that, calculating the radius and center of the circle from the bounding box. Those conversions lose some precision due to rounding. Are we okay with that? Floating point math is always subject to rounding errors, but there's a good argument to be made that it's not acceptable to get a different value back when the user didn't explicitly invoke any math functions. As an example:

create table circle_tbl (c circle);
create index circle_tbl_idx on circle_tbl using gist (c);
insert into circle_tbl values ('1.23456789012345,1.23456789012345,1e300');

postgres=# set enable_seqscan=off; set enable_bitmapscan=off; set enable_indexonlyscan=on;
SET
SET
SET
postgres=# select * from circle_tbl ;
       c
----------------
 <(0,0),1e+300>
(1 row)

postgres=# set enable_indexonlyscan=off;
SET
postgres=# select * from circle_tbl ;
                      c
----------------------------------------------
 <(1.23456789012345,1.23456789012345),1e+300>
(1 row)

- Heikki
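Heikki's example is exactly what one would predict from double-precision arithmetic: with a radius of 1e300, the center's contribution to the bounding-box corners is far below one ulp, so reconstructing the center as the box midpoint yields 0. A tiny C sketch of the effect (the helper names are mine, not the patch's):

```c
#include <assert.h>

/* A GiST leaf entry for a circle stores its bounding box: per axis,
 * [center - radius, center + radius].  A fetch function must invert that. */
static double center_from_bbox(double lo, double hi)
{
	return (lo + hi) / 2.0;
}

static double radius_from_bbox(double lo, double hi)
{
	return (hi - lo) / 2.0;
}
```

For Heikki's circle, c - r and c + r both round to exactly ±1e300, so the radius survives the round trip but the center collapses to (0,0).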
Re: [HACKERS] plpgsql.extra_warnings='num_into_expressions'
2014-07-22 8:52 GMT+02:00 Marko Tiikkaja ma...@joh.to: On 7/22/14, 7:06 AM, Pavel Stehule wrote: I looked at this patch and I think it is not a good idea. It introduces an early dependency between functions and pg_class-based objects. What dependency? The patch only looks at the raw parser output, so it won't e.g. know whether SELECT * INTO a, b FROM foo; is problematic or not. I am sorry, I was confused. There is a dependency on the variable type, but this dependency is not new. Regards Pavel .marko
Re: [HACKERS] Production block comparison facility
On Sun, Jul 20, 2014 at 5:31 PM, Simon Riggs si...@2ndquadrant.com wrote:

The block comparison facility presented earlier by Heikki would not be able to be used in production systems. ISTM that it would be desirable to have something that could be used in that way. ISTM easy to make these changes:

* optionally generate a FPW for every WAL record, not just the first change after a checkpoint: full_page_writes = 'always'
* when an FPW arrives, optionally run a check to see if it compares correctly against the page already there, when running streaming replication without a recovery target. We could skip reporting any problems until the database is consistent: full_page_write_check = on

The above changes seem easy to implement. With FPW compression, this would be a usable feature in production. Comments?

This is an interesting idea, and it would be easier to use than what has been submitted for CF1. However, full_page_writes set to always would generate a large amount of WAL even for small records, increasing I/O for the partition holding pg_xlog, and the frequency of checkpoints run on the system. Is this really something suitable for production? Then, looking at the code, we would need to tweak XLogInsert for the WAL record construction to always do a FPW and to update XLogCheckBufferNeedsBackup. Then for the redo part, we would need to do some extra operations in the area of RestoreBackupBlock/RestoreBackupBlockContents, including masking operations before comparing the content of the FPW and the current page. Does that sound right? -- Michael
Re: [HACKERS] [bug fix] Suppress autovacuum: found orphan temp table message
From: Andres Freund and...@2ndquadrant.com On 2014-07-22 10:09:04 +0900, MauMau wrote: Is there any problem if we don't output the message? Yes. The user won't know that possibly gigabytes worth of diskspace aren't freed.

RemovePgTempFiles() frees the disk space by removing temp relation files at server start. In addition, the temp relation files are not replicated to the standby server of streaming replication (this is the customer's case). So, those messages seem to be no more than noise. With this in mind, are those messages really necessary?

Regards MauMau
[HACKERS] parametric block size?
Hello devs, The default blocksize is currently 8k, which is not necessarily optimal for all setups, especially with SSDs, where the latency is much lower than with HDDs. There is a case for different values with a significant impact on performance (up to a not-to-be-sneezed-at 10% on a pgbench run on SSD, see http://www.cybertec.at/postgresql-block-sizes-getting-started/), and ISTM that the ability to align the PostgreSQL block size to the underlying FS/HW block size would be nice. This is currently possible, but it requires recompiling and maintaining distinct executables for various block sizes. This is annoying, so most admins will not bother. ISTM that a desirable and reasonably simple to implement feature would be the ability to set the blocksize at initdb time, so that postgres could use the value found in the database instead of a compile-time one. More advanced features, but with much more impact on the code, would be the ability to change the size at database/table level. Any thoughts? -- Fabien.
Re: [HACKERS] Use unique index for longer pathkeys.
Sorry, the previous version has bugs. It stamps over the stack and crashes X( The attached is the bug-fixed version, with no substantial changes.

On Tue, Jul 15, 2014 at 2:17 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hi, the attached is the revised version.

Thanks Horiguchi-San for the updated patch.

# By the way, this style of addressing a person is quite common
# among Japanese, since a first-name basis implies a very close
# relationship or can convey an offensive shade. But I'm not sure
# what I should call others who are not Japanese natives.

Anyway, today while looking into the updated patch, I was wondering why we can't eliminate useless keys in query_pathkeys when we actually build the same in the function standard_qp_callback(), basically somewhat similar to what we do in standard_qp_callback->make_pathkeys_for_sortclauses->pathkey_is_redundant().

I agree that standard_qp_callback is basically preferable. We already have index information related to base_rels before building the query pathkeys. I noticed that you mentioned the below in your original mail, which indicates that information related to inheritance tables is built only after set_base_rel_sizes():

These steps take place between set_base_rel_sizes() and set_base_rel_pathlists() in make_one_rel(). The reason for the position is that step 2 above needs all inheritance tables to be extended in PlannerInfo, and set_base_rel_sizes (currently) does that.

Sorry, my memory had been worn down. After some reconfirmation, this description turned out to be a bit (quite?) wrong. collect_common_primary_pathkeys needs root->eq_classes to be set up beforehand, not appendrels, because build_index_pathkeys doesn't work before every EC member for all relations involved is registered. standard_qp_callback registers EC members in the following path, but only for the primary(?) tlist of the subquery, so EC members for the child relations are not registered at that time.
...->make_pathkeys_for_sortclauses->make_pathkey_from_sortop
   ->make_pathkey_from_sortinfo->get_eclass_for_sort_expr

EC members for the child rels are registered in

set_base_rel_sizes->set_rel_size->set_append_rel_size
   ->add_child_rel_equivalences

So trim_query_pathkeys (collect_common...) doesn't work before set_base_rel_sizes(). These EC members need to be registered at least if root->query_pathkeys exists, so this seems to need to be done in standard_qp_callback, adding a separate inheritance tree walk.

# rel->has_eclass_joins seems not to be working, but that is not the topic here.

Could you please explain to me why the index information built in the above path is not sufficient, or is there any other case which I am missing? Is the description above enough as the explanation for the place? Sorry for the inaccurate description.

regards, -- Kyotaro Horiguchi NTT Open Source Software Center

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 9573a9b..546e112 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1789,9 +1789,11 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
 	/* indexprs is redundant since we print indextlist */
 	WRITE_NODE_FIELD(indpred);
 	WRITE_NODE_FIELD(indextlist);
+	/* cached pathkeys are omitted as indexkeys */
 	WRITE_BOOL_FIELD(predOK);
 	WRITE_BOOL_FIELD(unique);
 	WRITE_BOOL_FIELD(immediate);
+	WRITE_BOOL_FIELD(allnotnull);
 	WRITE_BOOL_FIELD(hypothetical);
 	/* we don't bother with fields copied from the pg_am entry */
 }
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index c81efe9..c12d0e7 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -58,6 +58,7 @@ int geqo_threshold;
 join_search_hook_type join_search_hook = NULL;
 
+static List *collect_common_primary_pathkeys(PlannerInfo *root);
 static void set_base_rel_sizes(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
 static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
@@ -66,6 +67,7 @@ static void set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte);
 static void set_plain_rel_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte);
+static void trim_query_pathkeys(PlannerInfo * root);
 static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte);
 static void set_foreign_size(PlannerInfo *root, RelOptInfo *rel,
@@ -112,6 +114,203 @@ static void recurse_push_qual(Node *setOp, Query *topquery, RangeTblEntry *rte, Index rti, Node *qual);
 static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
+/*
+ * collect_common_primary_pathkeys: Collects common unique and non-null index
+ * pathkeys from all base relations in current root.
+ */
+static List *
+collect_common_primary_pathkeys(PlannerInfo *root)
+{
+	List	   *result = NIL;
+	Index		rti;
+	int			nindex = 0;
+	List
Re: [HACKERS] [bug fix] Suppress autovacuum: found orphan temp table message
On 2014-07-22 17:05:22 +0900, MauMau wrote: From: Andres Freund and...@2ndquadrant.com On 2014-07-22 10:09:04 +0900, MauMau wrote: Is there any problem if we don't output the message? Yes. The user won't know that possibly gigabytes worth of diskspace aren't freed. RemovePgTempFiles() frees the disk space by removing temp relation files at server start. But it's not called during a crash restart. In addition, the temp relation files are not replicated to the standby server of streaming replication (this is the customer's case). So? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL replay bugs
Hello, Although I doubt the necessity of the flexibility given the current testing framework, I don't have a strong objection to it. Nevertheless, it would perhaps be appreciated if you put a notice in the README or somewhere. Hm, well... Fine, I added it in this updated series. Thank you for your patience :) After all, I have no more comments about this patch. I will mark this as 'Ready for committer' if no comments come from others for a few days. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] BUFFER_LOCK_EXCLUSIVE is used in ginbuildempty().
Hello, At Thu, 17 Jul 2014 15:54:31 -0400, Tom Lane t...@sss.pgh.pa.us wrote in 10710.1405626...@sss.pgh.pa.us:

Peter Geoghegan p...@heroku.com writes: On Thu, Jul 17, 2014 at 7:47 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: I don't understand the point of having these GIN_EXCLUSIVE / GIN_SHARED symbols. It's not like we could do anything different than BUFFER_LOCK_EXCLUSIVE etc instead. If there were a GinLockBuffer() it might make more sense to have specialized symbols, but as it is it seems pointless.

I agree with you. To my eyes, which are not specialized for each AM, the translation-only symbols didn't make me so happy.

It's a pattern common to the index AMs. I think it's kind of pointless myself, but as long as we're doing it we might as well be consistent. I think that to the extent that these symbols are used in APIs above the direct buffer-access layer, they are useful --- for example using BT_READ/BT_WRITE in _bt_search calls seems like a useful increment of readability. GIN seems to have less of that than some of the other AMs, but I do see GIN_SHARE being used that way in some calls. BTW, there's one direct usage of BUFFER_LOCK_EXCLUSIVE in the GIST code as well, which should probably be replaced with GIST_EXCLUSIVE if we're trying to be consistent.

Though I brought up this topic, this kind of consistency seems not to be needed so much. If so, it seems better to leave it as it is.

regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] [bug fix] Suppress autovacuum: found orphan temp table message
From: Andres Freund and...@2ndquadrant.com On 2014-07-22 17:05:22 +0900, MauMau wrote: RemovePgTempFiles() frees the disk space by removing temp relation files at server start. But it's not called during a crash restart.

Yes, the comment of the function says:

 * NOTE: we could, but don't, call this during a post-backend-crash restart
 * cycle. The argument for not doing it is that someone might want to examine
 * the temp files for debugging purposes. This does however mean that
 * OpenTemporaryFile had better allow for collision with an existing temp
 * file name.

But this is only true if restart_after_crash = on in postgresql.conf, because the crash restart only occurs in that case. However, in an HA cluster, whether it is shared-disk or replication, restart_after_crash is set to off, isn't it? Moreover, as the comment says, the behavior of keeping leftover temp files is for debugging by developers. It's not helpful for users, is it? I thought messages of DEBUG level would be more appropriate, because the behavior is for debugging purposes.

In addition, the temp relation files are not replicated to the standby server of streaming replication (this is the customer's case). So?

Yes, so nobody can convince serious customers that the current behavior makes real sense. Could you please reconsider this?

Regards MauMau
Re: [HACKERS] how to reach D5 in tuplesort.c 's polyphase merge algorithm?
hi, I got the same result after setting work_mem = 64, but I can get to D5 and D6 after using a bigger data sample (at least 10 records) as Tom said! 2014-07-19 6:35 GMT+08:00 土卜皿 pengcz.n...@gmail.com: 2014-07-19 6:26 GMT+08:00 Tom Lane t...@sss.pgh.pa.us: =?UTF-8?B?5Zyf5Y2c55q/?= pengcz.n...@gmail.com writes: for studying the polyphase merge algorithm of tuplesort.c, I use ddd and append a table, which has a schema as follows: ... and has 36684 records, and every record is like: id code article name department 31800266\NMachault77 and to get into the external sort, I type the following command: select * from towns order by name desc; but I found it did not reach D5 and D6 during sorting. That doesn't sound like enough data to force it to spill to disk at all; at least not unless you turn down work_mem to some very small value. hi, Tom, thanks a lot! The work_mem you mentioned reminds me of one more thing I did: I tried to change BLCKSZ = 8192/2, and it compiled successfully, but I got an error when executing initdb. Dillon
Re: [HACKERS] [bug fix] Suppress autovacuum: found orphan temp table message
On 2014-07-22 19:13:56 +0900, MauMau wrote: From: Andres Freund and...@2ndquadrant.com On 2014-07-22 17:05:22 +0900, MauMau wrote: RemovePgTempFiles() frees the disk space by removing temp relation files at server start. But it's not called during a crash restart. Yes, the comment of the function says: * NOTE: we could, but don't, call this during a post-backend-crash restart * cycle. The argument for not doing it is that someone might want to examine * the temp files for debugging purposes. This does however mean that * OpenTemporaryFile had better allow for collision with an existing temp * file name. But this is true if restart_after_crash = on in postgresql.conf, because the crash restart only occurs in that case. However, in HA cluster, whether it is shared-disk or replication, restart_after_crash is set to off, isn't it? In almost all setups I've seen it's set to on, even in HA scenarios. Moreover, as the comment says, the behavior of keeping leftover temp files is for debugging by developers. It's not helpful for users, isn't it? I thought messages of DEBUG level is more appropriate, because the behavior is for debugging purposes. GRR. That doesn't change the fact that there'll be files left over after a crash restart. Yes, so nobody can convince serious customers that the current behavior makes real sense. I think you're making lots of noise over a trivial log message. Could you please reconsider this? No. Just removing a warning isn't the way to solve this. If you want to improve things you'll actually need to improve things not just stick your head into the sand. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Production block comparison facility
On 22 July 2014 08:49, Michael Paquier michael.paqu...@gmail.com wrote: On Sun, Jul 20, 2014 at 5:31 PM, Simon Riggs si...@2ndquadrant.com wrote: The block comparison facility presented earlier by Heikki would not be able to be used in production systems. ISTM that it would be desirable to have something that could be used in that way. ISTM easy to make these changes * optionally generate a FPW for every WAL record, not just first change after checkpoint full_page_writes = 'always' * when an FPW arrives, optionally run a check to see if it compares correctly against the page already there, when running streaming replication without a recovery target. We could skip reporting any problems until the database is consistent full_page_write_check = on The above changes seem easy to implement. With FPW compression, this would be a usable feature in production. Comments? This is an interesting idea, and it would be easier to use than what has been submitted for CF1. However, full_page_writes set to always would generate a large amount of WAL even for small records, increasing I/O for the partition holding pg_xlog, and the frequency of checkpoints run on system. Is this really something suitable for production? For critical systems, yes, I think it is. It would be possible to make that user selectable for particular transactions or tables. Then, looking at the code, we would need to tweak XLogInsert for the WAL record construction to always do a FPW and to update XLogCheckBufferNeedsBackup. Then for the redo part, we would need to do some extra operations in the area of RestoreBackupBlock/RestoreBackupBlockContents, including masking operations before comparing the content of the FPW and the current page. Does that sound right? Yes, it doesn't look very much code because it fits well with existing approaches. 
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Production block comparison facility
If you're always going FPW then there's no point in the rest of the record.

The point here was to find problems so that users could run normally with confidence. The cases you might want to run in the mode you describe are the build farm or integration testing. When testing your application on the next release of postgres it would be nice to have tests for the replication in your workload, given the experience in 9.3.

Even without the constant full page writes, a live production system could do a FPW comparison after a FPW if it was in a consistent state. That would give standbys periodic verification at low cost.

-- greg

On 22 Jul 2014 12:28, Simon Riggs si...@2ndquadrant.com wrote: On 22 July 2014 08:49, Michael Paquier michael.paqu...@gmail.com wrote: On Sun, Jul 20, 2014 at 5:31 PM, Simon Riggs si...@2ndquadrant.com wrote: The block comparison facility presented earlier by Heikki would not be able to be used in production systems. ISTM that it would be desirable to have something that could be used in that way. ISTM easy to make these changes * optionally generate a FPW for every WAL record, not just first change after checkpoint full_page_writes = 'always' * when an FPW arrives, optionally run a check to see if it compares correctly against the page already there, when running streaming replication without a recovery target. We could skip reporting any problems until the database is consistent full_page_write_check = on The above changes seem easy to implement. With FPW compression, this would be a usable feature in production. Comments? This is an interesting idea, and it would be easier to use than what has been submitted for CF1. However, full_page_writes set to always would generate a large amount of WAL even for small records, increasing I/O for the partition holding pg_xlog, and the frequency of checkpoints run on system. Is this really something suitable for production? For critical systems, yes, I think it is.
It would be possible to make that user selectable for particular transactions or tables. Then, looking at the code, we would need to tweak XLogInsert for the WAL record construction to always do a FPW and to update XLogCheckBufferNeedsBackup. Then for the redo part, we would need to do some extra operations in the area of RestoreBackupBlock/RestoreBackupBlockContents, including masking operations before comparing the content of the FPW and the current page. Does that sound right? Yes, it doesn't look very much code because it fits well with existing approaches. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Production block comparison facility
On 22 July 2014 12:54, Greg Stark st...@mit.edu wrote:

If you're always going FPW then there's no point in the rest of the record.

I think it's a simple matter to mark them XLP_BKP_REMOVABLE and to skip any optimization of the remainder of WAL records.

The point here was to find problems so that users could run normally with confidence.

Yes, but a full overwrite mode would provide an even safer mode of operation.

The cases you might want to run in the mode you describe are the build farm or integration testing. When testing your application on the next release of postgres it would be nice to have tests for the replication in your workload, given the experience in 9.3. Even without the constant full page writes, a live production system could do a FPW comparison after a FPW if it was in a consistent state. That would give standbys periodic verification at low cost.

Yes, the two options I proposed are somewhat independent of each other.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
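The check Michael describes — comparing an incoming full-page image against the standby's current page after masking fields that may legitimately differ (LSN, hint bits, unused space) — could be structured roughly as below. This is a hedged sketch in plain C: the mask representation and all names are mine, not from any submitted patch, and a real implementation would mask per access method.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLCKSZ 8192

/* Compare an incoming FPW image against the local page, skipping the given
 * (sorted, non-overlapping) byte ranges, e.g. the page LSN or hint bits. */
static bool pages_equal_masked(const char *fpw, const char *local,
							   const size_t *skip_off, const size_t *skip_len,
							   int nskip)
{
	size_t		pos = 0;
	int			i;

	for (i = 0; i < nskip; i++)
	{
		/* compare the unmasked bytes before this masked range */
		if (memcmp(fpw + pos, local + pos, skip_off[i] - pos) != 0)
			return false;
		pos = skip_off[i] + skip_len[i];	/* jump over the masked range */
	}
	/* compare the tail after the last masked range */
	return memcmp(fpw + pos, local + pos, BLCKSZ - pos) == 0;
}
```

A mismatch outside the masked ranges would then be reported (or deferred until the standby reaches consistency, as Simon suggested).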
Re: [HACKERS] [bug fix] Suppress autovacuum: found orphan temp table message
From: Andres Freund and...@2ndquadrant.com On 2014-07-22 19:13:56 +0900, MauMau wrote: But this is true if restart_after_crash = on in postgresql.conf, because the crash restart only occurs in that case. However, in HA cluster, whether it is shared-disk or replication, restart_after_crash is set to off, isn't it? In almost all setups I've seen it's set to on, even in HA scenarios. I'm afraid that's because people don't notice the existence or purpose of this parameter. The 9.1 release note says: Add restart_after_crash setting which disables automatic server restart after a backend crash (Robert Haas) This allows external cluster management software to control whether the database server restarts or not. Reading this, I guess the parameter was introduced, and should be used, for HA environments controlled by the clusterware. Restarting the database server on the same machine may fail, or the restarted server may fail again, due to the broken hardware components, so I guess it was considered better to let the clusterware determine what to do. Moreover, as the comment says, the behavior of keeping leftover temp files is for debugging by developers. It's not helpful for users, isn't it? I thought messages of DEBUG level is more appropriate, because the behavior is for debugging purposes. GRR. That doesn't change the fact that there'll be files left over after a crash restart. Yes... that's a source of headache. But please understand that there's a problem -- trying to leave temp relations just for debugging is causing a flood of messages, which the customer is actually concerned about. I think you're making lots of noise over a trivial log message. Maybe so, and I hope so. I may be too nervous about what the customer will ask and/or request next. If they request something similar to what I proposed here, let me consult you again. Could you please reconsider this? No. Just removing a warning isn't the way to solve this. 
If you want to improve things you'll actually need to improve things not just stick your head into the sand. I have a few ideas below, but none of them seems better than the original proposal. What do you think?

1. The startup process deletes the catalog entries and data files of leftover temp relations at the end of recovery. This is probably difficult, impossible, or undesirable, because the startup process cannot access system catalogs. Even if it's possible, it is against the developers' desire to leave temp relation files for debugging.

2. The autovacuum launcher deletes the catalog entries and data files of leftover temp relations during its initialization. This may be possible, but it is against the developers' desire to leave temp relation files for debugging.

3. Emit the orphan temp relation message only when the associated data file actually exists. Autovacuum workers check with stat() whether the temp relation file is left over. If not, delete the catalog entry in pg_class silently. This sounds reasonable because the purpose of the message is to notify users of potential disk space shortage. In the streaming replication case, no data files should exist on the promoted new primary, so no messages should be emitted. However, in the shared-disk HA cluster case, the temp relation files are left over on the shared disk, so this fix doesn't improve anything.

4. Emit the orphan temp relation message only when restart_after_crash is on, i.e. ereport(restart_after_crash ? LOG : DEBUG1, ...

Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] Suppress autovacuum: found orphan temp table message
On Tue, Jul 22, 2014 at 6:23 AM, Andres Freund and...@2ndquadrant.com wrote: Yes, so nobody can convince serious customers that the current behavior makes real sense. I think you're making lots of noise over a trivial log message. Could you please reconsider this? No. Just removing a warning isn't the way to solve this. If you want to improve things you'll actually need to improve things not just stick your head into the sand. I've studied this area of the code before, and I would actually propose to fix this in the opposite way - namely, adopt the logic currently used for the wraparound case, which drops the temp table, even if wraparound is not an issue. The current code seems to be laboring under the impression that there's some benefit to keeping the temporary relation around, which, as far as I can see, there isn't. Now, you could perhaps argue that it's useful for forensics, but that presumes that the situation where a backend crashes without cleaning up its temporary relations is exciting enough to merit an investigation, which I believe to be false. RemoveTempRelationsCallback just does this:

    AbortOutOfAnyTransaction();
    StartTransactionCommand();
    RemoveTempRelations(myTempNamespace);
    CommitTransactionCommand();

and RemoveTempRelations uses:

    deleteWhatDependsOn(object, false);

These are pretty high-level operations, and there are all kinds of reasons they could fail. Many of those reasons do indeed involve the system being messed up in some way, but it's likely that the user will know about that for other reasons anyway - e.g. if the cleanup fails because the disk where the files are located has gone read-only at the operating system level, things are going to go downhill in a hurry. When the user restarts, they expect recovery - or other automatic cleanup mechanisms - to put things right.
And normally they do: the first backend to get the same backend ID as any orphaned temp schema will clear it out anyway on first use - completely silently - but if it so happens that a crashed backend had a very high backend ID and that temp table usage is relatively uncommon, then the user gets log spam from now and until eternity because of a problem that, in their mind, is already fixed. Even better, they will typically be completely unable to connect the log spam back to the event that triggered it, and will have no idea how to put it right. So while I disagree with MauMau's proposed fix, I agree with him that this error message is useless log spam. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index-only scans for multicolumn GIST
On Tue, Jul 22, 2014 at 2:55 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 07/21/2014 10:47 PM, Anastasia Lubennikova wrote: Hi, hackers! There are new results of my work on the GSoC project Index-only scans for GIST. The previous post is here: http://postgresql.1045698.n5.nabble.com/Index-only-scans-for-GIST-td5804892.html Repository is https://github.com/lubennikovaav/postgres/tree/indexonlygist2 Patch is in attachments. It includes indexonlyscan for multicolumn GiST. It works correctly - results are the same as with other scan plans. The Fetch() method is implemented for the box and circle opclasses. The improvement for the circle opclass is not as distinct as for the box opclass, because of the recheck. For a circle, the GiST index stores a bounding box of the circle. The new fetch function reverses that, calculating the radius and center of the circle from the bounding box. Those conversions lose some precision due to rounding. Are we okay with that? Floating point math is always subject to rounding errors, but there's a good argument to be made that it's not acceptable to get a different value back when the user didn't explicitly invoke any math functions. As an example:

create table circle_tbl (c circle);
create index circle_tbl_idx on circle_tbl using gist (c);
insert into circle_tbl values ('1.23456789012345,1.23456789012345,1e300');

postgres=# set enable_seqscan=off; set enable_bitmapscan=off; set enable_indexonlyscan=on;
SET
SET
SET
postgres=# select * from circle_tbl ;
      c
--------------
 (0,0),1e+300
(1 row)

postgres=# set enable_indexonlyscan=off;
SET
postgres=# select * from circle_tbl ;
                     c
--------------------------------------------
 (1.23456789012345,1.23456789012345),1e+300
(1 row)

I really don't think it's ever OK for a query to produce different answers depending on which plan is chosen.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
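The precision loss Heikki describes can be reproduced outside the index code. A minimal C sketch (the struct and function names here are illustrative stand-ins, not PostgreSQL's actual geo_decls.h definitions; it assumes IEEE-754 doubles): store a circle as its bounding box, then recover center and radius the way a fetch function must.

```c
#include <assert.h>

/* Illustrative structs; not PostgreSQL's CIRCLE/BOX definitions. */
typedef struct { double cx, cy, r; } Circle;
typedef struct { double xlo, ylo, xhi, yhi; } BBox;

/* What the GiST index stores: the circle's bounding box. */
static BBox
circle_to_bbox(Circle c)
{
    BBox b = { c.cx - c.r, c.cy - c.r, c.cx + c.r, c.cy + c.r };
    return b;
}

/* What a fetch function has to do: reverse the conversion. */
static Circle
bbox_to_circle(BBox b)
{
    Circle c = { (b.xlo + b.xhi) / 2.0,     /* center = box midpoint */
                 (b.ylo + b.yhi) / 2.0,
                 (b.xhi - b.xlo) / 2.0 };   /* radius = half the width */
    return c;
}
```

With center (1.23456789012345, 1.23456789012345) and radius 1e300, the round trip returns center (0,0): the tiny center coordinate is absorbed when added to the huge radius, which matches the (0,0),1e+300 output in the index-only-scan session above.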
Re: [HACKERS] Index-only scans for multicolumn GIST
Heikki Linnakangas hlinnakan...@vmware.com writes: For a circle, the GiST index stores a bounding box of the circle. The new fetch function reverses that, calculating the radius and center of the circle from the bounding box. Those conversions lose some precision due to rounding. Are we okay with that? That seems like a nonstarter :-(. Index-only scans don't have a license to return approximations. If we drop the behavior for circles, how much functionality do we have left? regards, tom lane
Re: [HACKERS] [bug fix] Suppress autovacuum: found orphan temp table message
On 2014-07-22 09:39:13 -0400, Robert Haas wrote: On Tue, Jul 22, 2014 at 6:23 AM, Andres Freund and...@2ndquadrant.com wrote: Yes, so nobody can convince serious customers that the current behavior makes real sense. I think you're making lots of noise over a trivial log message. Could you please reconsider this? No. Just removing a warning isn't the way to solve this. If you want to improve things you'll actually need to improve things not just stick your head into the sand. I've studied this area of the code before, and I would actually proposed to fix this in the opposite way - namely, adopt the logic currently used for the wraparound case, which drops the temp table, even if wraparound is not an issue. I'm absolutely not opposed to fixing this for real. I doubt it's backpatching material, but that's something to judge when we know what to do. The current code seems to be laboring under the impression that there's some benefit to keeping the temporary relation around, which, as far as I can see, there isn't. Now, you could perhaps argue that it's useful for forensics, but that presumes that the situation where a backend fails to crash without cleaning up its temporary relations is exciting enough to merit an investigation, which I believe to be false. I think the picture here changed with the introduction of the restart_after_crash GUC - before it it was pretty hard to investigate anything that involved crashes. So I'm ok with changing things around. But I think it's more complex than just doing the if (wraparound) in do_autovacuum(). a) There very well could be a backend reconnecting to that backendId. Then we potentially might try to remove the temp schema from two backends - I'm not sure that's always going to end up going well. There's already a race window, but it's pretty darn unlikely to hit it right now because the wraparound case pretty much implies that nothing has used that backendid slot for a while. 
I guess we could do something like:

    LockDatabaseObject(tempschema);
    if (SearchSysCacheExists1)
        /* bailout */
    performDeletion(...);

b) I think at the very least we also need to call RemovePgTempFiles() during crash restart. RemoveTempRelationsCallback just does this:

    AbortOutOfAnyTransaction();
    StartTransactionCommand();
    RemoveTempRelations(myTempNamespace);
    CommitTransactionCommand();

RemoveTempRelations uses:

    deleteWhatDependsOn(object, false);

These are pretty high-level operations, and there are all kinds of reasons they could fail. One could argue, without being very convincing from my pov, that that's a reason not to always do it from autovacuum because it'll prevent vacuum from progressing past the borked temporary relation. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [bug fix] Suppress autovacuum: found orphan temp table message
Robert Haas robertmh...@gmail.com writes: On Tue, Jul 22, 2014 at 6:23 AM, Andres Freund and...@2ndquadrant.com wrote: No. Just removing a warning isn't the way to solve this. If you want to improve things you'll actually need to improve things not just stick your head into the sand. I've studied this area of the code before, and I would actually propose to fix this in the opposite way - namely, adopt the logic currently used for the wraparound case, which drops the temp table, even if wraparound is not an issue. The current code seems to be laboring under the impression that there's some benefit to keeping the temporary relation around, which, as far as I can see, there isn't. FWIW, I agree with Andres on this. The right response is to drop the temp table, not complain about log spam. Or even more to the point, investigate why it's there in the first place; perhaps there's an actual fixable bug somewhere in there. But deciding that orphaned temp tables are normal is *not* an improvement IMO. regards, tom lane
Re: [HACKERS] [bug fix] Suppress autovacuum: found orphan temp table message
On 2014-07-22 10:17:15 -0400, Tom Lane wrote: Or even more to the point, investigate why it's there in the first place; perhaps there's an actual fixable bug somewhere in there. I think MauMau's scenario of a failover to another database explains their existence - there's no step that'd remove them after promoting a standby. So there indeed is a need to have a sensible mechanism for removing them at some point. But it should be about removing, not ignoring them. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [bug fix] Suppress autovacuum: found orphan temp table message
On 2014-07-22 22:18:03 +0900, MauMau wrote: From: Andres Freund and...@2ndquadrant.com On 2014-07-22 19:13:56 +0900, MauMau wrote: But this is true if restart_after_crash = on in postgresql.conf, because the crash restart only occurs in that case. However, in HA cluster, whether it is shared-disk or replication, restart_after_crash is set to off, isn't it? In almost all setups I've seen it's set to on, even in HA scenarios. I'm afraid that's because people don't notice the existence or purpose of this parameter. The 9.1 release note says: I think it's also because it's simply a good idea to keep it on in many production scenarios. Failing over 'just' because something caused a crash restart is often too expensive. No. Just removing a warning isn't the way to solve this. If you want to improve things you'll actually need to improve things not just stick your head into the sand. I have a few ideas below, but none of them seems better than the original proposal. What do you think? 1. startup process deletes the catalog entries and data files of leftover temp relations at the end of recovery. This is probably difficult, impossible or undesirable, because the startup process cannot access system catalogs. Even if it's possible, it is against the developers' desire to leave temp relation files for debugging. 2. autovacuum launcher deletes the catalog entries and data files of leftover temp relations during its initialization. This may be possible, but it is against the developers' desire to leave temp relation files for debugging. I think that desire is pretty much antiquated and doesn't really count for much. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [bug fix] Suppress autovacuum: found orphan temp table message
Andres Freund and...@2ndquadrant.com writes: On 2014-07-22 10:17:15 -0400, Tom Lane wrote: Or even more to the point, investigate why it's there in the first place; perhaps there's an actual fixable bug somewhere in there. I think MauMau's scenario of a failover to another database explains their existence - there's no step that'd remove them after promoting a standby. So there indeed is a need to have a sensible mechanism for removing them at some point. But it should be about removing, not ignoring them. Agreed. Note that RemovePgTempFiles, as such, only reclaims disk space. It does not clean out the pg_class entries, which means that just running that at standby promotion would do nothing to get rid of autovacuum's whining. regards, tom lane
Re: [HACKERS] Some bogus results from prairiedog
On 07/22/2014 12:24 AM, Tom Lane wrote: According to http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedogdt=2014-07-21%2022%3A36%3A55 prairiedog saw a crash in make check on the 9.4 branch earlier tonight; but there's not a lot of evidence as to why in the buildfarm report, because the postmaster log file is truncated well before where things got interesting. Fortunately, I was able to capture a copy of check.log before it got overwritten by the next run. I find that the place where the webserver report stops matches this section of check.log: [53cd99bb.134a:158] LOG: statement: create index test_range_gist_idx on test_range_gist using gist (ir); [53cd99bb.134a:159] LOG: statement: insert into test_range_gist select int4range(g, g+10) from generate_series(1,2000) g; ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^\ @^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@[53cd99ba.1344:329] LOG: statement: INSERT INTO num_exp_div VALUES (7,8,'-1108.80577182462841041118'); [53cd99ba.1344:330] LOG: statement: INSERT INTO num_exp_add VALUES (7,9,'-107955289.045047420'); [53cd99ba.1344:331] LOG: statement: INSERT INTO num_exp_sub VALUES (7,9,'-58101680.954952580'); The ^@'s represent nul bytes, which I find runs of elsewhere in the file as well. I think they are an artifact of OS X buffering policy caused by multiple processes writing into the same file without any interlocks. Perhaps we ought to consider making buildfarm runs use the logging collector by default? But in any case, it seems uncool that either the buildfarm log-upload process, or the buildfarm web server, is unable to cope with log files containing nul bytes. The data is there, just click on the check stage link at the top of the page to see it in raw form. 
cheers andrew
Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED
On Thu, Jul 17, 2014 at 8:02 PM, Andres Freund and...@2ndquadrant.com wrote: That means should I FlushRelationBuffers(rel) before changing the relpersistence? I don't think that'd help. I think what this means is that you simply cannot change the relpersistence of the old relation before the heap swap is successful. So I guess it has to be something like (pseudocode):

    OIDNewHeap = make_new_heap(..);
    newrel = heap_open(OIDNewHeap, AEL);

    /*
     * Change the temporary relation to be unlogged/logged. We have to do
     * that here so buffers for the new relfilenode will have the right
     * persistency set while the original filenode's buffers won't get read
     * in with the wrong (i.e. new) persistency setting. Otherwise a
     * rollback after the rewrite would possibly result with buffers for the
     * original filenode having the wrong persistency setting.
     *
     * NB: This relies on swap_relation_files() also swapping the
     * persistency. That wouldn't work for pg_class, but that can't be
     * unlogged anyway.
     */
    AlterTableChangeCatalogToLoggedOrUnlogged(newrel);
    FlushRelationBuffers(newrel);

    /* copy heap data into newrel */

    finish_heap_swap();

And then in swap_relation_files() also copy the persistency. That's the best I can come up with right now at least. Isn't it better if we can set the relpersistence as an argument to make_new_heap? I'm thinking to change make_new_heap:

From: make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp, LOCKMODE lockmode)
To:   make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence, LOCKMODE lockmode)

That way we can create the new heap with the appropriate relpersistence, so in swap_relation_files also copy the persistency, of course. And after copying the heap data to the new table (ATRewriteTable), change the relpersistence of the OldHeap's indexes, because in finish_heap_swap they'll be rebuilt. Thoughts?
Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
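The proposed signature change can be sketched as a toy model (the RelationStub type and both function names are hypothetical stand-ins for illustration; the real make_new_heap and swap_relation_files operate on catalog state, not C structs): the caller passes the target relpersistence explicitly, the new heap is created with it, and the swap step then only has to exchange the field.

```c
#include <assert.h>

#define RELPERSISTENCE_PERMANENT 'p'
#define RELPERSISTENCE_UNLOGGED  'u'

/* Toy stand-in for a pg_class row. */
typedef struct { char relpersistence; } RelationStub;

/* Sketch of the proposed signature: persistence is an explicit
 * argument instead of a "forcetemp" boolean. */
static RelationStub
make_new_heap_stub(RelationStub old_heap, char relpersistence)
{
    RelationStub new_heap = old_heap;
    new_heap.relpersistence = relpersistence;   /* created with target setting */
    return new_heap;
}

/* swap_relation_files would then also exchange the persistency. */
static void
swap_persistence(RelationStub *a, RelationStub *b)
{
    char tmp = a->relpersistence;
    a->relpersistence = b->relpersistence;
    b->relpersistence = tmp;
}
```

The design point is that the new relfilenode never exists with the wrong persistence setting, so no buffer can be read in under the old setting and survive a rollback.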
Re: [HACKERS] [bug fix] Suppress autovacuum: found orphan temp table message
On 2014-07-23 00:13:26 +0900, MauMau wrote: Hello, Robert-san, Andres-san, Tom-san, From: Andres Freund and...@2ndquadrant.com a) There very well could be a backend reconnecting to that backendId. Then we potentially might try to remove the temp schema from two backends - I'm not sure that's always going to end up going well. There's already a race window, but it's pretty darn unlikely to hit it right now because the wraparound case pretty much implies that nothing has used that backendid slot for a while. I guess we could do something like: LockDatabaseObject(tempschema); if (SearchSysCacheExists1) /* bailout */ performDeletion(...); b) I think at the very least we also need to call RemovePgTempFiles() during crash restart. Thank you for showing the direction. I'll investigate the code. But that will be tomorrow as it's already past midnight. Could it be included in 9.2.9 if I could submit the patch tomorrow? (I'm not confident I can finish it...) I'd really appreciate it if you could create the fix, if tomorrow will be late. 9.2.9 is already stamped, so no. But even without that, I don't think that this is a change that should be rushed into the backbranches. The risk/benefit ratio just isn't on the side of doing things hastily. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [bug fix] Suppress autovacuum: found orphan temp table message
Hello, Robert-san, Andres-san, Tom-san, From: Andres Freund and...@2ndquadrant.com a) There very well could be a backend reconnecting to that backendId. Then we potentially might try to remove the temp schema from two backends - I'm not sure that's always going to end up going well. There's already a race window, but it's pretty darn unlikely to hit it right now because the wraparound case pretty much implies that nothing has used that backendid slot for a while. I guess we could do something like: LockDatabaseObject(tempschema); if (SearchSysCacheExists1) /* bailout */ performDeletion(...); b) I think at the very least we also need to call RemovePgTempFiles() during crash restart. Thank you for showing the direction. I'll investigate the code. But that will be tomorrow as it's already past midnight. Could it be included in 9.2.9 if I could submit the patch tomorrow? (I'm not confident I can finish it...) I'd really appreciate it if you could create the fix, if tomorrow will be late. Regards MauMau
Re: [HACKERS] Some bogus results from prairiedog
On Tue, Jul 22, 2014 at 12:24 AM, Tom Lane t...@sss.pgh.pa.us wrote: According to http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedogdt=2014-07-21%2022%3A36%3A55 prairiedog saw a crash in make check on the 9.4 branch earlier tonight; but there's not a lot of evidence as to why in the buildfarm report, because the postmaster log file is truncated well before where things got interesting. Fortunately, I was able to capture a copy of check.log before it got overwritten by the next run. I find that the place where the webserver report stops matches this section of check.log: [53cd99bb.134a:158] LOG: statement: create index test_range_gist_idx on test_range_gist using gist (ir); [53cd99bb.134a:159] LOG: statement: insert into test_range_gist select int4range(g, g+10) from generate_series(1,2000) g; ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^\ @^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@[53cd99ba.1344:329] LOG: statement: INSERT INTO num_exp_div VALUES (7,8,'-1108.80577182462841041118'); [53cd99ba.1344:330] LOG: statement: INSERT INTO num_exp_add VALUES (7,9,'-107955289.045047420'); [53cd99ba.1344:331] LOG: statement: INSERT INTO num_exp_sub VALUES (7,9,'-58101680.954952580'); The ^@'s represent nul bytes, which I find runs of elsewhere in the file as well. I think they are an artifact of OS X buffering policy caused by multiple processes writing into the same file without any interlocks. Perhaps we ought to consider making buildfarm runs use the logging collector by default? But in any case, it seems uncool that either the buildfarm log-upload process, or the buildfarm web server, is unable to cope with log files containing nul bytes. 
Anyway, to cut to the chase, the crash seems to be from this:

TRAP: FailedAssertion(!(FastPathStrongRelationLocks->count[fasthashcode] > 0), File: lock.c, Line: 2957)

which the postmaster shortly later says this about:

[53cd99b6.130e:2] LOG: server process (PID 5230) was terminated by signal 6: Abort trap
[53cd99b6.130e:3] DETAIL: Failed process was running: ROLLBACK PREPARED 'foo1';
[53cd99b6.130e:4] LOG: terminating any other active server processes

So there is still something rotten in the fastpath lock logic. Gosh, that sucks. The inconstancy of this problem would seem to suggest some kind of locking bug rather than a flat-out concurrency issue, but it looks to me like everything relevant is marked volatile. But ... maybe the problem could be in s_lock.h? That machine is powerpc, and powerpc's definition of S_UNLOCK looks like this:

#ifdef USE_PPC_LWSYNC
#define S_UNLOCK(lock) \
do \
{ \
    __asm__ __volatile__ (" lwsync \n"); \
    *((volatile slock_t *) (lock)) = 0; \
} while (0)
#else
#define S_UNLOCK(lock) \
do \
{ \
    __asm__ __volatile__ (" sync \n"); \
    *((volatile slock_t *) (lock)) = 0; \
} while (0)
#endif   /* USE_PPC_LWSYNC */

I believe Andres recently reported that this isn't enough to prevent compiler reordering; for that, the __asm__ __volatile__ would need to be augmented with ::: "memory". If that's correct, then the compiler could decide to issue the volatile store before the lwsync or sync. Then, the CPU could decide to perform the volatile store to the spinlock before performing the update to FastPathStrongRelationLocks. That would explain this problem. The other possible explanation (at least, AFAICS) is that lock_twophase_postcommit() is trying to release a strong lock count that it doesn't actually hold. That would indicate a strong-lock-count handling bug upstream someplace - e.g. that the count got released when the transaction was prepared, or somesuch. I certainly can't rule that out, although I don't know exactly where to look.
We could add an assertion to AtPrepare_Locks(), just before setting locallock->holdsStrongLockCount = FALSE, that locallock->holdsStrongLockCount is true if and only if the lock tag and mode suggest that it ought to be. But that's really just guessing wildly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
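The missing-clobber hazard can be illustrated with a generic compiler barrier (a portability sketch: the PPC lwsync/sync sequences only assemble on powerpc, so an empty asm stands in for the fence instruction here, and S_UNLOCK_SKETCH is a made-up name; a real fix needs the hardware fence as well). Without the ::: "memory" clobber the compiler is free to sink a protected store past the asm; with it, the asm acts as a full compiler barrier.

```c
#include <assert.h>

typedef volatile int slock_t;

/*
 * Sketch of an unlock with a compiler barrier.  The empty asm string is
 * a stand-in for PPC's "lwsync"/"sync"; the ::: "memory" clobber is the
 * part reported as missing, and it forbids the compiler from moving
 * memory accesses across the asm statement.
 */
#define S_UNLOCK_SKETCH(lock) \
    do { \
        __asm__ __volatile__("" ::: "memory"); \
        *((volatile slock_t *) (lock)) = 0; \
    } while (0)

static slock_t lock = 1;
static int strong_count = 1;

static void
release(void)
{
    strong_count--;             /* protected update ...              */
    S_UNLOCK_SKETCH(&lock);     /* ... must not be reordered past it */
}
```

Without the clobber, the compiler could legally reorder the non-volatile strong_count store after the volatile store to the lock, which is exactly the window that would let another backend see a stale FastPathStrongRelationLocks count.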
Re: [HACKERS] [COMMITTERS] pgsql: Diagnose incompatible OpenLDAP versions during build and test.
Noah Misch n...@leadboat.com writes: Diagnose incompatible OpenLDAP versions during build and test. Hmm. I'm pretty sure it is not considered good style to drop AC_DEFUN blocks right into configure.in; at least, we have never done that before. PGAC_LDAP_SAFE should get defined somewhere in config/*.m4, instead. I am not real sure however which existing config/ file is appropriate, or whether we should create a new one. Peter, any opinion? regards, tom lane
[HACKERS] Behavior of OFFSET -1
Before 9.3, you got an error from this:

regression=# select * from tenk1 offset -1;
ERROR:  OFFSET must not be negative

But 9.3 and up ignore the negative OFFSET. This seems to be a thinko in my commit 1a1832eb. limit_needed() thinks it can discard the Limit plan node altogether, which of course prevents nodeLimit.c from complaining:

    /* Executor would treat less-than-zero same as zero */
    if (offset > 0)
        return true;    /* OFFSET with a positive value */

I don't recall the reasoning behind that comment for sure, but I imagine I examined the behavior of ExecLimit() and failed to notice that there was an error check in recompute_limits(). This seems to me to be a clear bug: we should reinstate the former behavior by tightening this check so it only discards OFFSET with a constant value of exactly 0. Anyone think differently? regards, tom lane
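Tom's proposed tightening amounts to the following (a toy model of the constant-OFFSET branch of limit_needed(), with a hypothetical function name, not the actual planner code): keep the Limit node for any constant offset other than exactly zero, so that a negative value still reaches the executor's error check in recompute_limits().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of limit_needed()'s constant-OFFSET branch.
 * Returns true when the Limit plan node must be kept.
 */
static bool
limit_needed_const_offset(int64_t offset)
{
    /*
     * The 9.3 check was "if (offset > 0) return true;", which also
     * discards negative offsets and thereby hides the executor's
     * "OFFSET must not be negative" error.  The tightened check only
     * discards the node for a constant 0.
     */
    if (offset != 0)
        return true;    /* keep the node; executor validates/applies it */
    return false;       /* OFFSET 0 is a no-op; node can be dropped */
}
```

With this check, OFFSET -1 keeps its Limit node and the pre-9.3 error reappears, while the OFFSET 0 optimization is preserved.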
Re: [HACKERS] Some bogus results from prairiedog
On 07/22/2014 10:55 AM, Andrew Dunstan wrote: On 07/22/2014 12:24 AM, Tom Lane wrote: According to http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedogdt=2014-07-21%2022%3A36%3A55 prairiedog saw a crash in make check on the 9.4 branch earlier tonight; but there's not a lot of evidence as to why in the buildfarm report, because the postmaster log file is truncated well before where things got interesting. Fortunately, I was able to capture a copy of check.log before it got overwritten by the next run. I find that the place where the webserver report stops matches this section of check.log: [53cd99bb.134a:158] LOG: statement: create index test_range_gist_idx on test_range_gist using gist (ir); [53cd99bb.134a:159] LOG: statement: insert into test_range_gist select int4range(g, g+10) from generate_series(1,2000) g; ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^\ @^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@[53cd99ba.1344:329] LOG: statement: INSERT INTO num_exp_div VALUES (7,8,'-1108.80577182462841041118'); [53cd99ba.1344:330] LOG: statement: INSERT INTO num_exp_add VALUES (7,9,'-107955289.045047420'); [53cd99ba.1344:331] LOG: statement: INSERT INTO num_exp_sub VALUES (7,9,'-58101680.954952580'); The ^@'s represent nul bytes, which I find runs of elsewhere in the file as well. I think they are an artifact of OS X buffering policy caused by multiple processes writing into the same file without any interlocks. Perhaps we ought to consider making buildfarm runs use the logging collector by default? But in any case, it seems uncool that either the buildfarm log-upload process, or the buildfarm web server, is unable to cope with log files containing nul bytes. The data is there, just click on the check stage link at the top of the page to see it in raw form. I have made a change in the upload receiver app to escape nul bytes in the main log field too. 
This will operate prospectively. Using the logging collector would be a larger change, but we can look at it if this isn't sufficient. cheers andrew
Re: [HACKERS] parametric block size?
Fabien wrote: ISTM that a desirable and reasonably simple to implement feature would be to be able to set the blocksize at initdb time, and postgres could use the value found in the database instead of a compile-time one. I think you will find it more difficult to implement than it seems at first. For one thing, there are several macros that depend on the block size and the algorithms involved cannot work with dynamic sizes; consider MaxIndexTuplesPerPage, which is used in PageIndexMultiDelete() for instance. That value is used to allocate an array on the stack, but that doesn't work if the array size is dynamic. (Actually it works almost everywhere, but that feature is not in C89 and thus it fails on Windows.) That shouldn't be a problem, you say, just palloc() the array -- except that that function is called within critical sections in some places (e.g. _bt_delitems_vacuum) and you cannot use palloc there. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
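Alvaro's stack-array point can be shown concretely (simplified macros for illustration; the real definitions in bufpage.h and itup.h involve page headers, item pointers, and alignment): MaxIndexTuplesPerPage must be a compile-time constant precisely because it sizes an automatic array, which C89 - and therefore the Windows build - does not allow to be dynamic.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the actual PostgreSQL formulas differ. */
#define BLCKSZ 8192
#define MIN_INDEX_TUPLE_SIZE 16
#define MaxIndexTuplesPerPage (BLCKSZ / MIN_INDEX_TUPLE_SIZE)

static size_t
count_slots(void)
{
    /*
     * Legal C89: the array bound is a constant expression.  With a
     * run-time block size this would become a variable-length array,
     * which MSVC rejects - and palloc() is off-limits as a fallback
     * inside critical sections such as _bt_delitems_vacuum.
     */
    int itemnos[MaxIndexTuplesPerPage];
    return sizeof(itemnos) / sizeof(itemnos[0]);
}
```

This is why a run-time-configurable block size ripples through far more code than the single BLCKSZ definition suggests.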
Re: [HACKERS] Portability issues in TAP tests
On Mon, Jul 21, 2014 at 11:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Dunstan and...@dunslane.net writes: On 07/21/2014 02:40 PM, Tom Lane wrote: I had the same feeling about the Perl on RHEL6 ;-). The TAP tests will need to be a great deal more portable than they've proven so far before they'll really be useful for anything much. I trust we're committed to making that happen. I'd be rather inclined to remove them from the check-world and installcheck-world targets until we have dealt with the issues people have identified. I think it would be reasonable to do that in the 9.4 branch, since it's a bit hard to argue that the TAP tests are production grade at the moment. We could leave them there in HEAD. That doesn't do developers who can't run the tests in their environment much good: most people develop against master. Maybe that's the point: to get these fixed. But it's pretty annoying that I can no longer do 'make world'. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Behavior of OFFSET -1
On Tue, Jul 22, 2014 at 12:49 PM, Tom Lane t...@sss.pgh.pa.us wrote: Before 9.3, you got an error from this: regression=# select * from tenk1 offset -1; ERROR: OFFSET must not be negative But 9.3 and up ignore the negative OFFSET. This seems to be a thinko in my commit 1a1832eb. limit_needed() thinks it can discard the Limit plan node altogether, which of course prevents nodeLimit.c from complaining: /* Executor would treat less-than-zero same as zero */ if (offset > 0) return true; /* OFFSET with a positive value */ I don't recall the reasoning behind that comment for sure, but I imagine I examined the behavior of ExecLimit() and failed to notice that there was an error check in recompute_limits(). This seems to me to be a clear bug: we should reinstate the former behavior by tightening this check so it only discards OFFSET with a constant value of exactly 0. Anyone think differently? Not I. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parametric block size?
Hello Alvaro, ISTM that a desirable and reasonably simple to implement feature would be to be able to set the blocksize at initdb time, and postgres could use the value found in the database instead of a compile-time one. I think you will find it more difficult to implement than it seems at first. For one thing, there are several macros that depend on the block size and the algorithms involved cannot work with dynamic sizes; consider MaxIndexTuplesPerPage, which is used in PageIndexMultiDelete() for instance. That value is used to allocate an array on the stack, but that doesn't work if the array size is dynamic. (Actually it works almost everywhere, but that feature is not in C89 and thus it fails on Windows.) That shouldn't be a problem, you say, just palloc() the array -- except that that function is called within critical sections in some places (e.g. _bt_delitems_vacuum) and you cannot use palloc there. Hmmm. Thanks for your point... indeed there may be implementation details... not a surprise:-) Note that I was more asking about the desirability of the feature; the implementation is another, although also relevant, issue. To me it is really desirable given the potential performance impact, but maybe we should not care about 10%? About your point: if we really have to do without dynamic stack allocation (C99 is only 15, not ripe for adult use yet, maybe when it turns 18 or 21, depending on the state:-), a possible workaround would be to allocate a larger area with some MAX_BLCKSZ, with an #ifdef for compilers that really would not support dynamic stack allocation. Moreover, it might be possible to hide it more or less cleanly in a macro. I had to put -pedantic -Werror to manage to get an error on dynamic stack allocation with gcc -std=c89. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Shared Data Structure b/w clients
Hi All, I am working on the PostgreSQL code and having some problems. :) I need to create a shared data structure, so that different clients and connections can update and share the state of those data structures in memory. I planned to use the top memory context, but that only gives me a structure shared within one session/terminal. Please tell me how PostgreSQL does this and how I can do it? Regards, Rohit Goyal
Re: [HACKERS] Behavior of OFFSET -1
On Tue, Jul 22, 2014 at 12:49:37PM -0400, Tom Lane wrote: Before 9.3, you got an error from this: regression=# select * from tenk1 offset -1; ERROR: OFFSET must not be negative That seems eminently sane, and should continue to error out, IMO. The only circumstance I can imagine where this could be argued not to be is just casuistry, namely LIMIT m OFFSET -n might be argued to mean LIMIT m-n. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED
On Tue, Jul 22, 2014 at 12:01 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Thu, Jul 17, 2014 at 8:02 PM, Andres Freund and...@2ndquadrant.com wrote: That means I should FlushRelationBuffers(rel) before changing the relpersistence? I don't think that'd help. I think what this means is that you simply cannot change the relpersistence of the old relation before the heap swap is successful. So I guess it has to be something like (pseudocode): OIDNewHeap = make_new_heap(..); newrel = heap_open(OIDNewHeap, AEL); /* * Change the temporary relation to be unlogged/logged. We have to do * that here so buffers for the new relfilenode will have the right * persistency set while the original filenode's buffers won't get read * in with the wrong (i.e. new) persistency setting. Otherwise a * rollback after the rewrite would possibly result with buffers for the * original filenode having the wrong persistency setting. * * NB: This relies on swap_relation_files() also swapping the * persistency. That wouldn't work for pg_class, but that can't be * unlogged anyway. */ AlterTableChangeCatalogToLoggedOrUnlogged(newrel); FlushRelationBuffers(newrel); /* copy heap data into newrel */ finish_heap_swap(); And then in swap_relation_files() also copy the persistency. That's the best I can come up with right now at least. Isn't it better if we can set the relpersistence as an argument to make_new_heap? I'm thinking to change make_new_heap: From: make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp, LOCKMODE lockmode) To: make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence, LOCKMODE lockmode) That way we can create the new heap with the appropriate relpersistence, so swap_relation_files() also copies the persistency, of course. And after copying the heap data to the new table (ATRewriteTable), change the relpersistence of the OldHeap's indexes, because in finish_heap_swap() they'll be rebuilt. Thoughts?
The attached patch implements my previous idea based on Andres' thoughts. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml index 69a1e14..2d131df 100644 --- a/doc/src/sgml/ref/alter_table.sgml +++ b/doc/src/sgml/ref/alter_table.sgml @@ -59,16 +59,17 @@ ALTER TABLE [ IF EXISTS ] replaceable class=PARAMETERname/replaceable ENABLE ALWAYS RULE replaceable class=PARAMETERrewrite_rule_name/replaceable CLUSTER ON replaceable class=PARAMETERindex_name/replaceable SET WITHOUT CLUSTER +SET {LOGGED | UNLOGGED} SET WITH OIDS SET WITHOUT OIDS SET ( replaceable class=PARAMETERstorage_parameter/replaceable = replaceable class=PARAMETERvalue/replaceable [, ... ] ) +SET TABLESPACE replaceable class=PARAMETERnew_tablespace/replaceable RESET ( replaceable class=PARAMETERstorage_parameter/replaceable [, ... ] ) INHERIT replaceable class=PARAMETERparent_table/replaceable NO INHERIT replaceable class=PARAMETERparent_table/replaceable OF replaceable class=PARAMETERtype_name/replaceable NOT OF OWNER TO replaceable class=PARAMETERnew_owner/replaceable -SET TABLESPACE replaceable class=PARAMETERnew_tablespace/replaceable REPLICA IDENTITY {DEFAULT | USING INDEX replaceable class=PARAMETERindex_name/replaceable | FULL | NOTHING} phraseand replaceable class=PARAMETERtable_constraint_using_index/replaceable is:/phrase @@ -447,6 +448,20 @@ ALTER TABLE [ IF EXISTS ] replaceable class=PARAMETERname/replaceable /varlistentry varlistentry +termliteralSET {LOGGED | UNLOGGED}/literal/term +listitem + para + This form changes the table persistence type from unlogged to permanent or + from permanent to unlogged (see xref linkend=SQL-CREATETABLE-UNLOGGED).
+ /para + para + Changing the table persistence type acquires an literalACCESS EXCLUSIVE/literal lock + and rewrites the table contents and associated indexes into new disk files. + /para +/listitem + /varlistentry + + varlistentry termliteralSET WITH OIDS/literal/term listitem para diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c index b1c411a..7f497c7 100644 --- a/src/backend/commands/cluster.c +++ b/src/backend/commands/cluster.c @@ -574,7 +574,8 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose) heap_close(OldHeap, NoLock); /* Create the transient table that will receive the re-ordered data */ - OIDNewHeap = make_new_heap(tableOid, tableSpace, false, + OIDNewHeap = make_new_heap(tableOid, tableSpace, + OldHeap-rd_rel-relpersistence,
Re: [HACKERS] Shared Data Structure b/w clients
On Tuesday, July 22, 2014, Rohit Goyal rhtgyl...@gmail.com wrote: Hi All, I am working on postgresql code and having some problem. :) I need to create shared data structure, so that different client and connection can update and share the state of those data structures in memory. I planned to use top memory context but it can give me shared structure within one session/terminal. Please tel me how postgresql do that and how i can do that? Regards, Rohit Goyal How about making it a part of shared mem, like shared buffers? Regards, Atri -- Regards, Atri *l'apprenant*
Re: [HACKERS] Shared Data Structure b/w clients
Hi Atri/All, I am very new to the PostgreSQL code. Can you please help in a bit more detail, or tell me how to create a structure in shared memory (shared buffers)? It would be really easy for me if you can give me a code snippet or any link to follow. Regards, Rohit Goyal On Tue, Jul 22, 2014 at 8:30 PM, Atri Sharma atri.j...@gmail.com wrote: On Tuesday, July 22, 2014, Rohit Goyal rhtgyl...@gmail.com wrote: Hi All, I am working on postgresql code and having some problem. :) I need to create shared data structure, so that different client and connection can update and share the state of those data structures in memory. I planned to use top memory context but it can give me shared structure within one session/terminal. Please tell me how postgresql do that and how i can do that? Regards, Rohit Goyal How about making it a part of shared mem, like shared buffers? Regards, Atri -- Regards, Atri *l'apprenant* -- Regards, Rohit Goyal
Re: [HACKERS] Shared Data Structure b/w clients
On Tue, Jul 22, 2014 at 1:33 PM, Rohit Goyal rhtgyl...@gmail.com wrote: Hi Atri/All, I am very new to the postgresql code. Can you please help in a bit more detail, or tell me how to create a structure in shared memory (shared buffers)? It would be really easy for me if you can give me a code snippet or any link to follow. You can look at contrib/pg_stat_statements. -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: 24x7 support and training Phone: +593 4 5107566 Cell: +593 987171157 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal (9.5) : psql unicode border line styles
On 28.6.2014 21:29, Pavel Stehule wrote: Hello rebase for 9.5 test: \pset linestyle unicode \pset border 2 \pset unicode_header_linestyle double \l Regards Pavel I did a quick review of the patch today: * it applies cleanly to current HEAD (no failures, small offsets) * compiles and generally seems to work just fine Two questions: (1) Shouldn't the new options be listed in '\?' (as possible names for pset)? I mean, here: \pset [NAME [VALUE]] set table output option (NAME := {format|border|expanded|fieldsep|fieldsep_zero|footer|null| numericlocale|recordsep|recordsep_zero|tuples_only|title|tableattr|pager}) (2) I noticed this piece of code: +typedef enum unicode_linestyle +{ + UNICODE_LINESTYLE_SINGLE = 0, /* to make sure someone initializes this */ + UNICODE_LINESTYLE_DOUBLE = 1 +} unicode_linestyle; Why are the values defined explicitly? These values are set by the compiler automatically, so why set them manually? Only a few of the other enums are defined explicitly, and most of them have to do that to define different values (e.g. 0x01, 0x02, 0x04, ...). I don't understand how the comment "to make sure someone initializes this" explains the purpose? regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Some bogus results from prairiedog
Robert Haas robertmh...@gmail.com writes: On Tue, Jul 22, 2014 at 12:24 AM, Tom Lane t...@sss.pgh.pa.us wrote: Anyway, to cut to the chase, the crash seems to be from this: TRAP: FailedAssertion(!(FastPathStrongRelationLocks->count[fasthashcode] > 0), File: lock.c, Line: 2957) So there is still something rotten in the fastpath lock logic. Gosh, that sucks. The inconstancy of this problem would seem to suggest some kind of locking bug rather than a flat-out concurrency issue, but it looks to me like everything relevant is marked volatile. I don't think that you need any big assumptions about machine-specific coding issues to spot the problem. The assert in question is here: /* * Decrement strong lock count. This logic is needed only for 2PC. */ if (decrement_strong_lock_count && ConflictsWithRelationFastPath(lock->tag, lockmode)) { uint32 fasthashcode = FastPathStrongLockHashPartition(hashcode); SpinLockAcquire(&FastPathStrongRelationLocks->mutex); Assert(FastPathStrongRelationLocks->count[fasthashcode] > 0); FastPathStrongRelationLocks->count[fasthashcode]--; SpinLockRelease(&FastPathStrongRelationLocks->mutex); } and it sure looks to me like that ConflictsWithRelationFastPath(lock->tag, ...) is looking at the tag of the shared-memory lock object you just released. If someone else had managed to recycle that locktable entry for some other purpose, the ConflictsWithRelationFastPath call might incorrectly return true. I think s/lock->tag/locktag/ would fix it, but maybe I'm missing something. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Diagnose incompatible OpenLDAP versions during build and test.
Noah Misch n...@leadboat.com writes: Diagnose incompatible OpenLDAP versions during build and test. Oh, one more thing: the Windows buildfarm members mostly don't like the dblink test case you added. Looks like the mechanism for finding the shared library doesn't work. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stating the significance of Lehman & Yao in the nbtree README
On Mon, Jul 21, 2014 at 9:55 PM, Amit Kapila amit.kapil...@gmail.com wrote: There is a mention about the race condition where it needs to move right in another caller (_bt_search) of _bt_moveright() as well. /* * Race -- the page we just grabbed may have split since we read its * pointer in the parent (or metapage). If it has, we may need to * move right to its new sibling. Do that. .. Do you think there is more to what is already mentioned on top of second caller which we should add or you think if it is true for both, then it should be on top of _bt_moveright()? Well, maybe it is justified to mention it multiple times, so as to not break the reader's train of thought. I'm not sure. In general, I agree with you that we should mention about any advantage of the algorithm we are using and especially if it is significant. I think it will be better if can also mention how that advantage or use is realized in our implementation as we are already doing in README of nbtree. Right. It seems like the nbtree README is very shy about telling us what the point of all this extra work is. IMV that should be stated very prominently to aid understanding. A lot of people believe that we have to do lock coupling/crabbing when descending the tree. We do not. The locks acquired when descending a B-Tree in Postgres are pretty granular. One read buffer lock is held at a time in the process of servicing index scans. There are times during the descent of the tree when no buffer locks are held whatsoever. Moreover, (with some caveats) it doesn't really matter if a stale view of the page is seen during a descent (as it happens I've been trying to think of ways to further take advantage of this). That's pretty cool. If you only know one thing about how the nbtree code works, that should probably be it. The above indicates 2 things: a. L Y doesn't need to hold read locks concurrently. b. Advantage of right-links at all levels and high-key. 
As per my understanding, we are not following point (a) in our code, so what is the benefit we get by having a reference of same in README? The major reason that we don't completely avoid read locks, is, I suppose, the need for self-consistent pages (but also because it would break page deletion - I'm pretty sure that LY don't consider page deletion, and the page deletion logic is entirely based on the Lanin and Shasha paper and original research, but I didn't check). I think that the sentence Lehman and Yao don't require read locks, but assume that in-memory copies of tree pages are unshared is intended to convey the point on the self-consistency of pages. Of course, Lehman and Yao must assume that the B-Tree is in some sense in shared memory. Otherwise, there wouldn't be much point to their elaborate locking protocol. :-) Isn't it better if we mention how the point (b) is used in our code and it's advantage together rather than keeping it at top of README? Maybe it deserves more prominent mention in the code too. Already README mentions in brief about right-link and how it is used as below: .. The scan must remember the page's right-link at the time it was scanned, since that is the page to move right to; if we move right to the current right-link then we'd re-scan any items moved by a page split. ... This is talking about how index scans interlock against VACUUM while going to the heap, by keeping a leaf page pinned (this prevents super exclusive lock acquisition). This is after the tree has been descended. This is really just a detail (albeit one that follows similar principles, since pages split right and it similarly exploits that fact), whereas the use of right links and high keys while descending the tree, and in particular the fact that the move right LY technique obviates the prior need for lock coupling is pretty much the whole point of LY. 
In more concrete terms, _bt_search() releases and only then acquires read locks during a descent of the tree (by calling _bt_relandgetbuf()), and, perhaps counterintuitively, that's just fine. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stating the significance of Lehman & Yao in the nbtree README
Peter Geoghegan p...@heroku.com writes: Right. It seems like the nbtree README is very shy about telling us what the point of all this extra work is. IIRC, the README was written on the assumption that you'd already read LY. If this patch is mostly about not assuming that, why not? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pgkill() is not POSIX-like for exiting processes
My new OpenLDAP test case has been breaking each MSVC buildfarm member. Most MinGW members are fine, though the 9.0 and 9.1 narwhal members broke. (Newer narwhal members have been broken long-term.) The MSVC build system has a mundane inability to handle a Makefile construct I used; the first attached patch fixes that. With that fixed, though, the test case reveals a departure from POSIX in pgkill(), our emulation of kill(2) for Windows. A pgkill() call falling very close in time to process exit can easily give ERROR_BROKEN_PIPE, and I at least once observed it give ERROR_BAD_PIPE. pgkill() translates those codes to EINVAL. Only a buggy POSIX program will ever see EINVAL from kill(2). This situation is just like signalling a zombie, and POSIX kill(2) returns zero for zombies. Let's do that, as attached. I'm inclined to back-patch this, although a quick scan didn't turn up any code having bugs today as a result. The change will affect occasional error messages. (An alternative is to change the test case code in the back branches to treat EINVAL like success.) On my systems, MSVC builds get ERROR_BROKEN_PIPE far more easily than MinGW builds. In MinGW, I ran make -C contrib/dblink check many times without ever hitting the error, but vcregress contribcheck hit it over 99% of the time. I don't have a theory to explain that. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl index 39698ee..a9e95cd 100644 --- a/src/tools/msvc/vcregress.pl +++ b/src/tools/msvc/vcregress.pl @@ -331,9 +331,13 @@ sub fetchRegressOpts if ($m =~ /^\s*REGRESS_OPTS\s*=(.*)/m) { - # ignore options that use makefile variables - can't handle those - # ignore anything that isn't an option staring with -- - @opts = grep { $_ !~ /\$\(/ $_ =~ /^--/ } split(/\s+/, $1); + # Substitute known Makefile variables, then ignore options that retain + # an unhandled variable reference. 
Ignore anything that isn't an + # option starting with --. + @opts = grep { + s/\Q$(top_builddir)\E/\"$topdir\"/; + $_ !~ /\$\(/ && $_ =~ /^--/ + } split(/\s+/, $1); } if ($m =~ /^\s*ENCODING\s*=\s*(\S+)/m) { diff --git a/src/port/kill.c b/src/port/kill.c index 5a7d483..b33283e 100644 --- a/src/port/kill.c +++ b/src/port/kill.c @@ -70,13 +70,28 @@ pgkill(int pid, int sig) return 0; } - if (GetLastError() == ERROR_FILE_NOT_FOUND) - errno = ESRCH; - else if (GetLastError() == ERROR_ACCESS_DENIED) - errno = EPERM; - else - errno = EINVAL; - return -1; + switch (GetLastError()) + { + case ERROR_BROKEN_PIPE: + case ERROR_BAD_PIPE: + + /* +* These arise transiently as a process is exiting. Treat them +* like POSIX treats a zombie process, reporting success. +*/ + return 0; + + case ERROR_FILE_NOT_FOUND: + /* pipe fully gone, so treat the process as gone */ + errno = ESRCH; + return -1; + case ERROR_ACCESS_DENIED: + errno = EPERM; + return -1; + default: + errno = EINVAL; /* unexpected */ + return -1; + } } #endif -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Shared Data Structure b/w clients
On 07/23/2014 02:33 AM, Rohit Goyal wrote: I am very new in postgresql code. Can you please help in a bit detail ortel me how to create structure in shared memory(shared buffer). It would be really easy for me if you can give me a code snippet or any link to follow. There's a lot of detail on how to do this in the BDR codebase, see contrib/bdr in http://git.postgresql.org/gitweb/?p=2ndquadrant_bdr.git;a=summary -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Shared Data Structure b/w clients
On 07/23/2014 09:46 AM, Craig Ringer wrote: There's a lot of detail on how to do this in the BDR codebase, see contrib/bdr in http://git.postgresql.org/gitweb/?p=2ndquadrant_bdr.git;a=summary Oh, sorry: in the bdr-next branch. Should've mentioned. http://git.postgresql.org/gitweb/?p=2ndquadrant_bdr.git;a=tree;f=contrib/bdr;h=fad1aa59a15724deb98f9b923d84f0ce818afc1f;hb=refs/heads/bdr-next -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] IS NOT DISTINCT FROM + Indexing
On Jul 22, 2014, at 12:40 AM, Tom Lane t...@sss.pgh.pa.us wrote: Jonathan S. Katz jonathan.k...@excoventures.com writes: On Jul 21, 2014, at 9:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: The short reason why not is that it's not an operator (where operator is defined as something with a pg_operator entry), and all our indexing infrastructure is built around the notion that indexable clauses are of the form indexed_column indexable_operator comparison_value. What got me thinking about this problem initially is that I know IS NULL is indexable, and I was unsure how adding IS NOT DISTINCT FROM would be too different from that - of course, this is from my perspective of primarily operating on the surface. It sounds like the IS NULL work is in the btree code? We hacked in IS [NOT] NULL as a potentially indexable construct, but the key thing that made that possible without major redesign is that IS [NOT] NULL is datatype independent, so there's no need to identify any particular underlying operator or opclass. I'm not sure what we'd do to handle IS [NOT] DISTINCT FROM, but that particular approach ain't gonna cut it. Another point is that people are unlikely to be satisfied with planner optimization for IS NOT DISTINCT FROM that doesn't support it as a join clause (i.e., tab1.col1 IS NOT DISTINCT FROM tab2.col2); which is an issue that doesn't arise for IS [NOT] NULL, as it has only one argument. So that brings you into not just indexability but hashing and merging support. I hasten to say that that doesn't necessarily have to happen in a version-zero patch; but trying to make IS NOT DISTINCT FROM into a first-class construct is a big project. Well, that definitely answers how hard it would be. Before embarking on something laborious (as even just indexing is nontrivial), I think it would be good to figure out how people are using IS [NOT] DISTINCT FROM and if there is interest in having it be indexable, let alone used in a JOIN optimization.
It could become a handy tool to simplify the SQL in queries that are returning a lot of NULL / NOT NULL data mixed together. Jonathan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stating the significance of Lehman & Yao in the nbtree README
On Wed, Jul 23, 2014 at 5:58 AM, Peter Geoghegan p...@heroku.com wrote: On Mon, Jul 21, 2014 at 9:55 PM, Amit Kapila amit.kapil...@gmail.com wrote: The above indicates 2 things: a. L&Y doesn't need to hold read locks concurrently. b. Advantage of right-links at all levels and high-key. As per my understanding, we are not following point (a) in our code, so what is the benefit we get by having a reference of same in README? The major reason that we don't completely avoid read locks, is, I suppose, the need for self-consistent pages (but also because it would break page deletion - I'm pretty sure that L&Y don't consider page deletion, and the page deletion logic is entirely based on the Lanin and Shasha paper and original research, but I didn't check). I think that the sentence "Lehman and Yao don't require read locks, but assume that in-memory copies of tree pages are unshared" is intended to convey the point on the self-consistency of pages. Of course, Lehman and Yao must assume that the B-Tree is in some sense in shared memory. Otherwise, there wouldn't be much point to their elaborate locking protocol. :-) Okay, but how does this justify adding the below new text in the README? + Even with these read locks, Lehman and Yao's approach obviates the + need of earlier schemes to hold multiple read locks concurrently when + descending the tree as part of servicing index scans (pessimistic lock + coupling). Actually, I think adding it can lead to an inconsistency in the README. Currently it indicates that our algorithm is different from L&Y w.r.t taking Read Locks and has given an explanation for same. Isn't it better if we mention how the point (b) is used in our code and its advantage together rather than keeping it at top of README? Maybe it deserves more prominent mention in the code too. Already README mentions in brief about right-link and how it is used as below: ..
The scan must remember the page's right-link at the time it was scanned, since that is the page to move right to; if we move right to the current right-link then we'd re-scan any items moved by a page split. ... This is talking about how index scans interlock against VACUUM while going to the heap, by keeping a leaf page pinned (this prevents super exclusive lock acquisition). This is after the tree has been descended. This is really just a detail (albeit one that follows similar principles, since pages split right and it similarly exploits that fact), whereas the use of right links and high keys while descending the tree, and in particular the fact that the move right LY technique obviates the prior need for lock coupling is pretty much the whole point of LY. In more concrete terms, _bt_search() releases and only then acquires read locks during a descent of the tree (by calling _bt_relandgetbuf()), and, perhaps counterintuitively, that's just fine. So don't you think that it needs bit more explanation than you have quoted in below text. + The addition of right-links at all levels, as well as the + addition of a page high key allows detection of, and dynamic + recovery from concurrent page splits (that is, splits between + unlocking an internal page, and subsequently locking its child page + during a descent). Basically I think it will be better if you can explain in bit more detail that how does right-links at all levels and high-key helps to detect and recover from concurrent page splits. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Stating the significance of Lehman & Yao in the nbtree README
On Tue, Jul 22, 2014 at 5:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: IIRC, the README was written on the assumption that you'd already read L&Y. If this patch is mostly about not assuming that, why not? L&Y made the same mistake that the authors of most influential papers make - they never get around to telling the reader why they should bother to read it. The paper is over 30 years old, and we now know that it's very influential, and the reasons why. I think that both the nbtree README and L&Y would be a lot more approachable with a high level introduction (arguably L&Y attempt this, but the way they go about it seems impenetrable, mostly consisting of esoteric references to other papers). Surely making that code more approachable is a worthy goal. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Inconsistencies of service failure handling on Windows
Hi all,

While playing on Windows with services, I noticed an inconsistent behavior in the way failures are handled when using a service for a Postgres instance. Let's assume that there is a service called postgres that has been registered:

$ psql -At -c 'select version()'
PostgreSQL 9.5devel, compiled by Visual C++ build 1600, 64-bit
$ tasklist.exe -svc -FI "SERVICES eq postgres"
Image Name                     PID Services
========================= ======== ============
pg_ctl.exe                     556 postgres

When pg_ctl is directly killed, the service manager is able to detect the failure correctly.

$ taskkill.exe -PID 556 -F
SUCCESS: The process with PID 556 has been terminated.
$ sc query postgres
SERVICE_NAME: postgres
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 1  STOPPED
        WIN32_EXIT_CODE    : 1067  (0x42b)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0

In this case 1067 means that the process left unexpectedly. Note that at this point the Postgres instance is still running, but we can use the failure callback to run a script that could do some cleanup before restarting the service properly.

However, when a backend process is directly killed, something different happens.

$ tasklist.exe -FI "IMAGENAME eq postgres.exe"
Image Name                     PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
postgres.exe                  2088 Services                   0     17,380 K
postgres.exe                  2132 Services                   0      4,400 K
postgres.exe                  2236 Services                   0      5,064 K
postgres.exe                  1524 Services                   0      6,304 K
postgres.exe                  2084 Services                   0      9,200 K
postgres.exe                  2384 Services                   0      5,968 K
postgres.exe                  2652 Services                   0      4,500 K
postgres.exe                  2116 Services                   0      4,384 K
$ taskkill.exe -PID 2084 -F
SUCCESS: The process with PID 2084 has been terminated.

After that some processes remain:

$ tasklist.exe -FI "IMAGENAME eq postgres.exe"
Image Name                     PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
postgres.exe                  2088 Services                   0      5,708 K
postgres.exe                  2132 Services                   0      4,400 K

These remaining processes are immediately taken down when attempting a connection to the server.
Note that before attempting any connections the service is considered as running normally:

$ sc query postgres
SERVICE_NAME: postgres
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 4  RUNNING
                                (STOPPABLE, PAUSABLE, ACCEPTS_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0
$ psql
psql: could not connect to server: Connection refused (0x274D/10061)
        Is the server running on host "localhost" (::1) and accepting
        TCP/IP connections on port 5432?
could not connect to server: Connection refused (0x274D/10061)
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5432?
$ tasklist.exe -FI "IMAGENAME eq postgres.exe"
INFO: No tasks are running which match the specified criteria.

But now the service has stopped, and it is not considered as having failed:

$ sc query postgres
SERVICE_NAME: postgres
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 1  STOPPED
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0

This seems like an inconsistent behavior in error detection. I am guessing that pg_ctl is not able to appropriately track failures that happen on the postgres side. But are there things we could do to improve the failure detection here?

Regards,
--
Michael
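For context, the failure callback mentioned above is configured through the Service Control Manager's recovery options, which can be set from the command line with sc.exe. A sketch (Windows-only; the service name postgres and the cleanup script path are assumptions for illustration):

```
rem Run a cleanup script on first failure, then restart after 5 seconds;
rem reset the failure counter after a day.
sc failure postgres reset= 86400 command= "C:\pg\cleanup.bat" actions= run/1000/restart/5000

rem Inspect the configured recovery actions.
sc qfailure postgres
```

The catch, as the transcripts above show, is that these actions only fire when the service process itself (pg_ctl.exe) dies with a failure exit status; a backend killed underneath it leads to a clean-looking STOPPED state, so no recovery action is triggered.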
Re: [HACKERS] proposal (9.5) : psql unicode border line styles
Hi Tomas,

2014-07-22 23:20 GMT+02:00 Tomas Vondra t...@fuzzy.cz:

On 28.6.2014 21:29, Pavel Stehule wrote:

Hello

rebase for 9.5

test:
\pset linestyle unicode
\pset border 2
\pset unicode_header_linestyle double
\l

Regards
Pavel

I did a quick review of the patch today:

* it applies cleanly to current HEAD (no failures, small offsets)
* compiles and generally seems to work just fine

Two questions:

(1) Shouldn't the new options be listed in '\?' (as possible names for pset)? I mean, here:

\pset [NAME [VALUE]]   set table output option
(NAME := {format|border|expanded|fieldsep|fieldsep_zero|footer|null|numericlocale|recordsep|recordsep_zero|tuples_only|title|tableattr|pager})

fixed

(2) I noticed this piece of code:

+typedef enum unicode_linestyle
+{
+	UNICODE_LINESTYLE_SINGLE = 0, /* to make sure someone initializes this */
+	UNICODE_LINESTYLE_DOUBLE = 1
+} unicode_linestyle;

Why are the values defined explicitly? These values are set by the compiler automatically, so why set them manually? Only a few of the other enums are defined explicitly, and most of those have to do that to define different values (e.g. 0x01, 0x02, 0x04, ...).

this is useless - I removed it.

I don't understand how the comment "to make sure someone initializes this" explains the purpose?
copy/paste error :( - removed

updated version is in attachment

Regards
Pavel

regards
Tomas

commit fb99f3b1e12d7dfb203b70187e63647e5d0a674d
Author: Pavel Stehule pavel.steh...@gooddata.com
Date:   Wed Jul 23 07:30:46 2014 +0200

    second version - minor changes: help, remove bogus comment and not
    necessary exact enum specification

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index fa0d6f2..fc8a503 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2294,6 +2294,42 @@ lo_import 152801
         </para>
         </listitem>
       </varlistentry>
+
+      <varlistentry>
+      <term><literal>unicode_border_style</literal></term>
+      <listitem>
+      <para>
+      Sets the border line drawing style to one
+      of <literal>single</literal> or <literal>double</literal>.
+      This option only affects the <literal>unicode</>
+      linestyle.
+      </para>
+      </listitem>
+      </varlistentry>
+
+      <varlistentry>
+      <term><literal>unicode_column_style</literal></term>
+      <listitem>
+      <para>
+      Sets the column line drawing style to one
+      of <literal>single</literal> or <literal>double</literal>.
+      This option only affects the <literal>unicode</>
+      linestyle.
+      </para>
+      </listitem>
+      </varlistentry>
+
+      <varlistentry>
+      <term><literal>unicode_header_style</literal></term>
+      <listitem>
+      <para>
+      Sets the header line drawing style to one
+      of <literal>single</literal> or <literal>double</literal>.
+      This option only affects the <literal>unicode</>
+      linestyle.
+      </para>
+      </listitem>
+      </varlistentry>
     </variablelist>
    </para>
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 161de75..c0a09b1 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1054,6 +1054,9 @@ exec_command(const char *cmd,
 		"footer", "format", "linestyle", "null",
 		"numericlocale", "pager", "recordsep",
 		"tableattr", "title", "tuples_only",
+		"unicode_border_linestyle",
+		"unicode_column_linestyle",
+		"unicode_header_linestyle",
 		NULL
 	};

@@ -2248,6 +2251,55 @@
_align2string(enum printFormat in)
 	return "unknown";
 }

+/*
+ * Parse entered unicode linestyle. Returns true, when entered string is
+ * known linestyle: single, double else returns false.
+ */
+static bool
+set_unicode_line_style(const char *value, size_t vallen, unicode_linestyle *linestyle)
+{
+	if (pg_strncasecmp("single", value, vallen) == 0)
+		*linestyle = UNICODE_LINESTYLE_SINGLE;
+	else if (pg_strncasecmp("double", value, vallen) == 0)
+		*linestyle = UNICODE_LINESTYLE_DOUBLE;
+	else
+		return false;
+
+	return true;
+}
+
+static const char *
+_unicode_linestyle2string(int linestyle)
+{
+	switch (linestyle)
+	{
+		case UNICODE_LINESTYLE_SINGLE:
+			return "single";
+			break;
+		case UNICODE_LINESTYLE_DOUBLE:
+			return "double";
+			break;
+	}
+	return "unknown";
+}
+
+static const char *
+_linestyle2string(linestyle_type line_style)
+{
+	switch (line_style)
+	{
+		case LINESTYLE_ASCII:
+			return "ascii";
+			break;
+		case LINESTYLE_OLD_ASCII:
+			return "old-ascii";
+			break;
+		case LINESTYLE_UNICODE:
+			return "unicode";
+			break;
+	}
+	return "unknown";
+}

 bool
 do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
@@ -2292,11 +2344,11 @@ do_pset(const char
Re: [HACKERS] Stating the significance of Lehman & Yao in the nbtree README
On Tue, Jul 22, 2014 at 8:59 PM, Amit Kapila amit.kapil...@gmail.com wrote:

Okay, but how does this justify adding the below new text to the README?

+   Even with these read locks, Lehman and Yao's approach obviates the
+   need of earlier schemes to hold multiple read locks concurrently when
+   descending the tree as part of servicing index scans (pessimistic lock
+   coupling).

Actually I think putting it can lead to inconsistency in the README. Currently it indicates that our algorithm is different from L&Y w.r.t. taking read locks and has given an explanation for same.

Not really. Firstly, that sentence acknowledges that there are read locks where L&Y assume there will not be. "Even with these read locks" references the first paragraph, where it is stated that Postgres B-Trees still acquire read locks while descending the tree.

Secondly, I'm pretty sure that even Lehman and Yao realized that their apparent assumption that real implementations would not require read locks isn't realistic. Their handling of deletion seems perfunctory to me. They say: "In situations where excessive deletions cause the storage utilization of tree nodes to be unacceptably low, a batch reorganization or an underflow operation which locks the entire tree can be performed". I'm pretty sure that that sounded almost as bad in 1980 as it does now.

We don't have a "not quite L&Y" implementation just because there are single read locks acquired while descending the tree. Prior schemes needed multiple *concurrent* exclusive locks. B-Trees were around for about 10 years before L&Y. There is reason to think that pretty much every practical implementation has used read locks for many years, because there is a well-received 2001 paper [1] that describes a scheme where L&Y style B-link trees can *actually* be made to not require read locks, which discusses things like caching effects on contemporary hardware - it involves playing tricks with detecting and recovering from page-level inconsistencies, IIRC.
Furthermore, it references a scheme from the late 90s involving local copies of B-link pages. I thought about pursuing something like that myself, but the cost of latching (buffer locking) B-Trees in PostgreSQL is likely to be reduced before too long anyway, which makes the general idea seem unenticing right now.

Basically I think it will be better if you can explain in a bit more detail how right-links at all levels and the high key help to detect and recover from concurrent page splits.

You might be right about that - perhaps I should go into more detail.

[1] http://www.vldb.org/conf/2001/P181.pdf

--
Peter Geoghegan