Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode wal_keep_segments

2015-02-12 Thread Sergey Konoplev
Hi,

On Mon, Jan 5, 2015 at 4:34 AM, Andres Freund and...@2ndquadrant.com wrote:
  pg_receivexlog: could not create archive status file
  mmm/archive_status/00010003.done: No such file or
  directory
 
  Dang. Stupid typo. And my tests didn't catch it, because I had
  archive_directory in the target directory :(

I started getting these errors after upgrading from 9.2.8 to 9.2.10.
Is it something critical that requires a version downgrade, or can I
just ignore these errors?

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (499) 346-7196, +7 (988) 888-1979
gray...@gmail.com




Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode wal_keep_segments

2015-02-12 Thread Sergey Konoplev
On Thu, Feb 12, 2015 at 11:13 AM, Andres Freund and...@2ndquadrant.com wrote:
I started getting these errors after upgrading from 9.2.8 to 9.2.10.
Is it something critical that requires a version downgrade, or can I
just ignore these errors?

 What errors are you getting in precisely which circumstances? You're using 
 pg_receivexlog?

Errors like this one

pg_receivexlog: could not create archive status file
/mnt/archive/wal/archive_status/000402AF00B7.done: No
such file or directory
pg_receivexlog: disconnected..

on

Linux xyz 3.2.0-76-generic #111-Ubuntu SMP
PostgreSQL 9.2.10

Yes, I use pg_receivexlog. I also use a wrapper/watchdog script around
pg_receivexlog which tracks failures and restarts the latter.

The WAL file times correlate with the pg_receivexlog failures.

postgres@xyz:~$ ls -ltr /mnt/archive/wal/ | tail
-rw--- 1 postgres postgres 16777216 Feb 12 10:58 000402B60011
-rw--- 1 postgres postgres 16777216 Feb 12 11:02 000402B60012
-rw--- 1 postgres postgres 16777216 Feb 12 11:06 000402B60013
-rw--- 1 postgres postgres 16777216 Feb 12 11:11 000402B60014
-rw--- 1 postgres postgres 16777216 Feb 12 11:15 000402B60015
-rw--- 1 postgres postgres 16777216 Feb 12 11:19 000402B60016
-rw--- 1 postgres postgres 16777216 Feb 12 11:23 000402B60017
-rw--- 1 postgres postgres 16777216 Feb 12 11:27 000402B60018
-rw--- 1 postgres postgres 16777216 Feb 12 11:30 000402B60019
-rw--- 1 postgres postgres 16777216 Feb 12 11:32
000402B6001A.partial

postgres@xyz:~$ cat /var/log/pgcookbook/manage_pitr-wal.log | tail
Thu Feb 12 11:15:18 PST 2015 ERROR manage_pitr.sh: Problem occured
during WAL archiving: pg_receivexlog: could not create archive status
file /mnt/archive/wal/archive_status/000402B60015.done:
No such file or directory
pg_receivexlog: disconnected..
Thu Feb 12 11:19:33 PST 2015 ERROR manage_pitr.sh: Problem occured
during WAL archiving: pg_receivexlog: could not create archive status
file /mnt/archive/wal/archive_status/000402B60016.done:
No such file or directory
pg_receivexlog: disconnected..
Thu Feb 12 11:23:38 PST 2015 ERROR manage_pitr.sh: Problem occured
during WAL archiving: pg_receivexlog: could not create archive status
file /mnt/archive/wal/archive_status/000402B60017.done:
No such file or directory
pg_receivexlog: disconnected..
Thu Feb 12 11:27:32 PST 2015 ERROR manage_pitr.sh: Problem occured
during WAL archiving: pg_receivexlog: could not create archive status
file /mnt/archive/wal/archive_status/000402B60018.done:
No such file or directory
pg_receivexlog: disconnected..
Thu Feb 12 11:30:34 PST 2015 ERROR manage_pitr.sh: Problem occured
during WAL archiving: pg_receivexlog: could not create archive status
file /mnt/archive/wal/archive_status/000402B60019.done:
No such file or directory
pg_receivexlog: disconnected..

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (499) 346-7196, +7 (988) 888-1979
gray...@gmail.com




Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode wal_keep_segments

2015-02-12 Thread Sergey Konoplev
On Thu, Feb 12, 2015 at 11:40 AM, Andres Freund and...@2ndquadrant.com wrote:
 This obviously should not be the case. I'll have a look in a couple of hours. 
 Until then you can likely just work around the problem by creating the 
 archive_status directory.

Thank you. Just let me know if you need some extra info or debugging.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (499) 346-7196, +7 (988) 888-1979
gray...@gmail.com




Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode wal_keep_segments

2015-02-12 Thread Andres Freund
On February 12, 2015 8:11:05 PM CET, Sergey Konoplev gray...@gmail.com wrote:
Hi,

On Mon, Jan 5, 2015 at 4:34 AM, Andres Freund and...@2ndquadrant.com
wrote:
  pg_receivexlog: could not create archive status file
  mmm/archive_status/00010003.done: No such file or
  directory
 
  Dang. Stupid typo. And my tests didn't catch it, because I had
  archive_directory in the target directory :(

I started getting these errors after upgrading from 9.2.8 to 9.2.10.
Is it something critical that requires a version downgrade, or can I
just ignore these errors?

What errors are you getting in precisely which circumstances? You're using 
pg_receivexlog? 

-- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode wal_keep_segments

2015-02-12 Thread Andres Freund
Hi.

This obviously should not be the case. I'll have a look in a couple of hours. 
Until then you can likely just work around the problem by creating the 
archive_status directory.
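
For anyone hitting this in the meantime, the workaround amounts to
something like the following (a minimal sketch; the path is the archive
directory from Sergey's logs above):

    $ mkdir -p /mnt/archive/wal/archive_status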
-- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: binworld and install-binworld targets - was Re: [HACKERS] Release note bloat is getting out of hand

2015-02-12 Thread Peter Eisentraut
On 2/4/15 8:20 PM, Andrew Dunstan wrote:
 
 On 02/04/2015 06:53 PM, Tom Lane wrote:
 Or maybe use a make variable, like NO_DOC.  I think that's preferable to
 adding more targets.
 Unless we can come up with a new target name that obviously means
 world minus docs, the make-variable idea may be the best.

 I'm not terribly keen on this. If you don't like binworld, how about
 world-no-docs?

I think using options of some kind instead of top-level targets is
preferable.

If we add world-no-docs, should we also add install-world-no-docs,
installdirs-world-no-docs, uninstall-world-no-docs, check-world-no-docs,
installcheck-world-no-docs, clean-no-docs, distclean-no-docs, etc.?
This would get out of hand.

Also, it's harder to port things like that to other build systems,
including the secondary ones we already have.

We already have configure options to decide that we don't want to deal
with part of the tree.  (There is no make world-no-python.)  We used to
have support in configure to not build part of the docs.  We could
resurrect that if that's what people want.  I'd actually prefer that
even over a make variable.
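
For illustration, the make-variable idea would amount to invocations like
these (hypothetical; NO_DOC is the name Tom suggested, not an existing
variable):

    make world NO_DOC=1
    make install-world NO_DOC=1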






Re: [HACKERS] What exactly is our CRC algorithm?

2015-02-12 Thread Heikki Linnakangas

On 02/11/2015 04:20 PM, Abhijit Menon-Sen wrote:

At 2015-02-11 13:20:29 +0200, hlinnakan...@vmware.com wrote:


I don't follow. I didn't change configure at all, compared to your
patch.


OK, I extrapolated a little too much. Your patch didn't actually include
crc_instructions.h;


Oh, I'm sorry. Here's the complete patch with crc_instructions.h
- Heikki

From bd4a90d339e21cd6ac517d077fe3a76abb5ef37d Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakangas@iki.fi
Date: Tue, 10 Feb 2015 14:26:24 +0200
Subject: [PATCH 1/1] Use Intel SSE4.2 CRC instructions where available.

On x86, perform a runtime check to see if we're running on a CPU that
supports SSE 4.2. If we are, we can use the special crc32b and crc32q
instructions for the CRC-32C calculations. That greatly speeds up CRC
calculation.

Abhijit Menon-Sen, reviewed by Andres Freund and me.
---
 configure   |   2 +-
 configure.in|   2 +-
 src/common/pg_crc.c | 109 +---
 src/include/common/pg_crc.h |  12 +++-
 src/include/pg_config.h.in  |   3 +
 src/include/port/crc_instructions.h | 121 
 6 files changed, 235 insertions(+), 14 deletions(-)
 create mode 100644 src/include/port/crc_instructions.h

diff --git a/configure b/configure
index fa271fe..c352128 100755
--- a/configure
+++ b/configure
@@ -9204,7 +9204,7 @@ fi
 done
 
 
-for ac_header in atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h
+for ac_header in atomic.h cpuid.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h
 do :
   as_ac_Header=`$as_echo ac_cv_header_$ac_header | $as_tr_sh`
 ac_fn_c_check_header_mongrel $LINENO $ac_header $as_ac_Header $ac_includes_default
diff --git a/configure.in b/configure.in
index e6a49d1..588d626 100644
--- a/configure.in
+++ b/configure.in
@@ -1032,7 +1032,7 @@ AC_SUBST(UUID_LIBS)
 ##
 
 dnl sys/socket.h is required by AC_FUNC_ACCEPT_ARGTYPES
-AC_CHECK_HEADERS([atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h])
+AC_CHECK_HEADERS([atomic.h cpuid.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h])
 
 # On BSD, test for net/if.h will fail unless sys/socket.h
 # is included first.
diff --git a/src/common/pg_crc.c b/src/common/pg_crc.c
index eba32d3..b6db749 100644
--- a/src/common/pg_crc.c
+++ b/src/common/pg_crc.c
@@ -21,25 +21,113 @@
 
 #include "common/pg_crc.h"
 
-/* Accumulate one input byte */
-#ifdef WORDS_BIGENDIAN
-#define CRC8(x) pg_crc32c_table[0][((crc >> 24) ^ (x)) & 0xFF] ^ (crc << 8)
+#ifdef PG_HAVE_CRC32C_INSTRUCTIONS
+static pg_crc32 pg_comp_crc32c_hw(pg_crc32 crc, const void *data, size_t len);
+#endif
+
+#if !defined(PG_HAVE_CRC32C_INSTRUCTIONS) || defined(PG_CRC32C_INSTRUCTIONS_NEED_RUNTIME_CHECK)
+static pg_crc32 pg_comp_crc32c_sb8(pg_crc32 crc, const void *data, size_t len);
+static const uint32 pg_crc32c_table[8][256];
+#endif
+
+#ifdef PG_CRC32C_INSTRUCTIONS_NEED_RUNTIME_CHECK
+/*
+ * When built with support for CRC instructions, but we need to perform a
+ * run-time check to determine whether we can actually use them,
+ * pg_comp_crc32c is a function pointer. It is initialized to
+ * pg_comp_crc32c_choose, which performs the runtime check, and changes the
+ * function pointer so that subsequent calls go directly to the hw-accelerated
+ * version, or the fallback slicing-by-8 version.
+ */
+static pg_crc32
+pg_comp_crc32c_choose(pg_crc32 crc, const void *data, size_t len)
+{
+	if (pg_crc32_instructions_runtime_check())
+		pg_comp_crc32c = pg_comp_crc32c_hw;
+	else
+		pg_comp_crc32c = pg_comp_crc32c_sb8;
+
+	return pg_comp_crc32c(crc, data, len);
+}
+
+pg_crc32 (*pg_comp_crc32c)(pg_crc32 crc, const void *data, size_t len) = pg_comp_crc32c_choose;
+
 #else
-#define CRC8(x) pg_crc32c_table[0][(crc ^ (x)) & 0xFF] ^ (crc >> 8)
+/*
+ * No need for a runtime check. Compile directly with the hw-accelerated
+ * or the slicing-by-8 version. (We trust that the compiler
+ * is smart enough to inline it here.)
+ */
+pg_crc32
+pg_comp_crc32c(pg_crc32 crc, const void *data, size_t len)
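
For reference, a minimal sketch of the runtime SSE 4.2 detection that the
new cpuid.h configure check supports (this uses GCC's __get_cpuid
interface and is illustrative, not the patch's exact code):

    #include <cpuid.h>
    #include <stdbool.h>

    /* Return true if the CPU advertises SSE 4.2 (and so the CRC32 instructions). */
    static bool
    have_sse42(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
            return false;
        return (ecx & bit_SSE4_2) != 0;
    }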

Re: [HACKERS] Index-only scans for GiST.

2015-02-12 Thread Anastasia Lubennikova
Thanks for the answer.
Now it seems to be applied correctly.

2015-02-12 3:12 GMT+04:00 Thom Brown t...@linux.com:

 On 11 February 2015 at 22:50, Anastasia Lubennikova 
 lubennikov...@gmail.com wrote:

 Finally there is a new version of patch (in attachments).
 It provides multicolumn index-only scan for GiST indexes.

 - Memory leak is fixed.
 - little code cleanup
 - example of performance test in attachments
 - function OIDs have debugging values (*) just to avoid merge
 conflicts while testing patch

 Wiki page of the project is

 https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014

 Waiting for feedback.


 Hi Anastasia.  Thanks for the updated patch.  I've just tried applying it
 to head and it doesn't appear to apply cleanly.

 $ patch -p1 < ~/Downloads/indexonlyscan_gist_2.0.patch
 (Stripping trailing CRs from patch; use --binary to disable.)
 patching file src/backend/access/gist/gist.c
 Hunk #1 succeeded at 1404 (offset 9 lines).
 Hunk #2 succeeded at 1434 (offset 9 lines).
 (Stripping trailing CRs from patch; use --binary to disable.)
 patching file src/backend/access/gist/gistget.c
 Hunk #1 succeeded at 227 (offset 1 line).
 Hunk #2 succeeded at 243 (offset 1 line).
 Hunk #3 succeeded at 293 (offset -4 lines).
 Hunk #4 succeeded at 330 (offset -4 lines).
 Hunk #5 succeeded at 365 (offset -5 lines).
 Hunk #6 succeeded at 444 (offset -27 lines).
 Hunk #7 succeeded at 474 (offset -27 lines).
 Hunk #8 FAILED at 518.
 Hunk #9 succeeded at 507 (offset -28 lines).
 Hunk #10 succeeded at 549 with fuzz 1 (offset -28 lines).
 Hunk #11 FAILED at 601.
 2 out of 11 hunks FAILED -- saving rejects to file
 src/backend/access/gist/gistget.c.rej
 ...

 --
 Thom




-- 
Best regards,
Lubennikova Anastasia
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index db2a452..53750da 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -1404,6 +1404,14 @@ initGISTstate(Relation index)
 		else
 			giststate->distanceFn[i].fn_oid = InvalidOid;
 
+		/* opclasses are not required to provide a Fetch method */
+		if (OidIsValid(index_getprocid(index, i + 1, GIST_FETCH_PROC)))
+			fmgr_info_copy(&(giststate->fetchFn[i]),
+						   index_getprocinfo(index, i + 1, GIST_FETCH_PROC),
+						   scanCxt);
+		else
+			giststate->fetchFn[i].fn_oid = InvalidOid;
+
 		/*
 		 * If the index column has a specified collation, we should honor that
 		 * while doing comparisons.  However, we may have a collatable storage
@@ -1426,6 +1434,22 @@ initGISTstate(Relation index)
 	return giststate;
 }
 
+/*
+ * Gistcanreturn is supposed to be true if ANY fetchFn method is defined.
+ * If a fetchFn exists, it will be used for index-only scans.
+ * Thus the responsibility rests with the opclass developer.
+ */
+
+Datum
+gistcanreturn(PG_FUNCTION_ARGS)
+{
+	Relation	index = (Relation) PG_GETARG_POINTER(0);
+	int			i = PG_GETARG_INT32(1);
+
+	if (OidIsValid(index_getprocid(index, i + 1, GIST_FETCH_PROC)))
+		PG_RETURN_BOOL(true);
+	else
+		PG_RETURN_BOOL(false);
+}
+
 void
 freeGISTstate(GISTSTATE *giststate)
 {
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 717cb85..0925e56 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -227,8 +227,10 @@ gistindex_keytest(IndexScanDesc scan,
  * If tbm/ntids aren't NULL, we are doing an amgetbitmap scan, and heap
  * tuples should be reported directly into the bitmap.  If they are NULL,
  * we're doing a plain or ordered indexscan.  For a plain indexscan, heap
- * tuple TIDs are returned into so->pageData[].  For an ordered indexscan,
+ * tuple TIDs are returned into so->pageData. For an ordered indexscan,
  * heap tuple TIDs are pushed into individual search queue items.
+ * If an index-only scan is possible, heap tuples themselves are returned
+ * into so->pageData or into the search queue.
  *
  * If we detect that the index page has split since we saw its downlink
  * in the parent, we push its new right sibling onto the queue so the
@@ -241,6 +243,10 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 	GISTScanOpaque so = (GISTScanOpaque) scan->opaque;
 	Buffer		buffer;
 	Page		page;
+	GISTSTATE *giststate = so->giststate;
+	Relation r = scan->indexRelation;
+	bool		isnull[INDEX_MAX_KEYS];
+	GISTSearchHeapItem *tmpListItem;
 	GISTPageOpaque opaque;
 	OffsetNumber maxoff;
 	OffsetNumber i;
@@ -287,8 +293,6 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 		MemoryContextSwitchTo(oldcxt);
 	}
 
-	so->nPageData = so->curPageData = 0;
-
 	/*
 	 * check all tuples on page
 	 */
@@ -326,11 +330,20 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 		else if (scan->numberOfOrderBys == 0 && GistPageIsLeaf(page))
 		{
 			/*
-			 * Non-ordered scan, so report heap tuples in so->pageData[]
+			 * Non-ordered scan, so report tuples in so->pageData
+			 */
+
+			/* form tmpListItem and 

Re: [HACKERS] libpq's multi-threaded SSL callback handling is busted

2015-02-12 Thread Jan Urbański

Andres Freund writes:

 On 2015-02-12 09:31:27 +0100, Jan Urbański wrote:
 That doesn't solve the problem of the Python deadlock, where you're not at
 leisure to call a C function at the beginning of your module.

 We could just never unload the hooks...

That's what we did before 4e816286533dd34c10b368487d4079595a3e1418 :) And it
got changed after 
http://www.postgresql.org/message-id/48620925.6070...@pws.com.au


   * If there are already callbacks set: Remember that fact and don't
overwrite. In the next major version: warn.

 So yeah, that was my initial approach - check if callbacks are set, don't do
 the dance if they are. It felt like a crutch, though, and racy at that. There's
 no atomic way to test-and-set those callbacks. The window for raciness is
 small, though.

 If you do that check during library initialization instead of every
 connection it shouldn't be racy - if that part is run in a multithreaded
 fashion you're doing something crazy.

Yes, that's true. The problem is that there's no real libpq initialisation
function. The docs say that:

If your application initializes libssl and/or libcrypto libraries and libpq is
built with SSL support, you should call PQinitOpenSSL

So most apps will just not bother. The moment you know you'll need SSL is only
when you get an 'S' message from the server...
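
For reference, a minimal sketch of an application that initializes
OpenSSL itself and tells libpq so (PQinitOpenSSL(0, 0) asks libpq to
leave both libssl and libcrypto state alone; the connection parameters
are illustrative):

    #include <libpq-fe.h>

    int
    main(void)
    {
        /* we set up libssl/libcrypto ourselves before any connection */
        PQinitOpenSSL(0, 0);

        PGconn *conn = PQconnectdb("host=localhost dbname=test sslmode=require");
        /* ... use the connection ... */
        PQfinish(conn);
        return 0;
    }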

Cheers,
Jan




Re: [HACKERS] Table-level log_autovacuum_min_duration

2015-02-12 Thread Michael Paquier
On Thu, Feb 12, 2015 at 5:44 PM, Naoya Anzai anzai-na...@mxu.nes.nec.co.jp
wrote:

 Hi, Michael-san

  An updated patch is attached,

 I'm sorry for confusing you.

 I think you don't have to implement this code to disable this
 feature using the value -2, because this use case is a rare one
 and there is a practical workaround using a huge value like 2e9.
 (You suggested 2e9 to me, didn't you? :) ) So, please remove this code.


I will clean up the code.


  Well, I see your point but this is not completely true: we could as
  well rely entirely on this parameter instead of VACOPT_VERBOSE to
  determine if autovacuum, a vacuum or an analyze are in verbose mode,
  and remove VACOPT_VERBOSE, but I can imagine people complaining if
  VACOPT_VERBOSE is removed. So let's set it up unconditionally to -1 in
  gram.y for now. However if people think that it is fine to remove
  VACOPT_VERBOSE, we could use exclusively this parameter in VacuumStmt.
  Or even add sanity checks at the top of vacuum() to ensure that
  VACOPT_VERBOSE is set only when log_min_duration is positive.
  Additional opinions on this matter are welcome.

 I understand your point at last. :)

 You mean that ...
 log_autovacuum_min_duration assumes the role of VACOPT_VERBOSE.
 Even if this parameter is currently never used for manual vacuum,
 log_autovacuum_min_duration should be set to zero (always output)
 when we execute VACUUM (or ANALYZE) VERBOSE.
 Is my understanding correct? If so, it sounds logical.


Yup, that's my opinion. Now I don't know if people would mind removing
VACOPT_VERBOSE and replacing the control it provides with log_min_duration
in VacuumStmt. At least the two things overlap, and log_min_duration
offers more options than the plain VACOPT_VERBOSE.


 If we can abolish the VERBOSE option,
 I think it's ideal that we prepare a new parameter like
 log_min_duration_vacuum (and log_min_duration_analyze) which
 integrates the VERBOSE feature and log_autovacuum_min_duration.


What I think you are proposing here is a VERBOSE option that hypothetically
gets activated if a manual VACUUM takes more than a certain amount of time
specified by those parameters. I doubt this would be useful. In any case
this is unrelated to this patch.
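
For illustration, usage of the per-table setting under discussion would
look something like this (a sketch; big_table is hypothetical, and 2e9 is
the workaround value mentioned above):

    ALTER TABLE big_table SET (log_autovacuum_min_duration = 0);
    -- log every autovacuum of this table

    ALTER TABLE big_table SET (log_autovacuum_min_duration = 2000000000);
    -- effectively never log it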
-- 
Michael


Re: [HACKERS] assessing parallel-safety

2015-02-12 Thread Amit Kapila
On Thu, Feb 12, 2015 at 1:51 AM, Robert Haas robertmh...@gmail.com wrote:


 I think we may want a dedicated parallel-safe property for functions
 rather than piggybacking on provolatile, but that will probably also
 be changeable via ALTER FUNCTION, and stored rules won't get
 miraculously updated.  So this definitely can't be something we figure
 out at parse-time ... it's got to be determined later.  But at the
 moment I see no way to do that without an extra pass over the whole
 rewritten query tree.  :-(


If we have to go this way, then isn't it better to evaluate the same
when we are trying to create parallel path (something like in the
parallel_seq scan patch - create_parallelscan_paths())?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Seq Scan

2015-02-12 Thread Amit Kapila
On Thu, Feb 12, 2015 at 2:19 AM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Feb 10, 2015 at 3:56 PM, Andres Freund and...@2ndquadrant.com
wrote:
  On 2015-02-10 09:23:02 -0500, Robert Haas wrote:
  On Tue, Feb 10, 2015 at 9:08 AM, Andres Freund and...@2ndquadrant.com
wrote:
 
  As pointed out above (moved there after reading the patch...) I don't
  think a chunk size of 1 or any other constant size can make sense. I
  don't even believe it'll necessarily be constant across an entire query
  execution (big initially, small at the end).  Now, we could move
  determining that before the query execution into executor
  initialization, but then we won't yet know how many workers we're going
  to get. We could add a function setting that at runtime, but that'd mix
  up responsibilities quite a bit.

 I still think this belongs in heapam.c somehow or other.  If the logic
 is all in the executor, then it becomes impossible for any code that
 doesn't use the executor to do a parallel heap scan, and that's
 probably bad.  It's not hard to imagine something like CLUSTER wanting
 to reuse that code, and that won't be possible if the logic is up in
 some higher layer.  If the logic we want is to start with a large
 chunk size and then switch to a small chunk size when there's not much
 of the relation left to scan, there's still no reason that can't be
 encapsulated in heapam.c.


It seems to me that we need to use both ways (make heap or other lower
layers aware of parallelism, and also handle it at the executor level
using callback_function and callback_state) for doing parallelism.  TBH,
I think for the matter of this patch we can go either way and then think
more on it as we move ahead to parallelize other operations.  So what I
can do is to try using Robert's patch to make the heap aware of
parallelism and then see how it comes up?

  Btw, using an atomic uint32 you'd end up without the spinlock and just
  about the same amount of code... Just do an atomic_fetch_add_until32(&var,
  1, InvalidBlockNumber)... ;)

 I thought of that, but I think there's an overflow hazard.
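
For illustration, a hypothetical sketch of the idea and of the hazard
(pg_atomic_fetch_add_u32() is the real primitive from
src/include/port/atomics.h; atomic_fetch_add_until32() and the shared
pscan struct are made up for this example):

    #include "port/atomics.h"
    #include "storage/block.h"

    /* shared scan state, assumed to live in shared memory */
    typedef struct ParallelScanShared
    {
        pg_atomic_uint32 next_block;    /* next block to hand out */
        BlockNumber      nblocks;       /* relation size at scan start */
    } ParallelScanShared;

    static BlockNumber
    next_chunk(ParallelScanShared *pscan)
    {
        BlockNumber blk = pg_atomic_fetch_add_u32(&pscan->next_block, 1);

        if (blk >= pscan->nblocks)
            return InvalidBlockNumber;  /* no more blocks to scan */

        /*
         * The hazard: next_block keeps incrementing past nblocks on every
         * call, so a sufficiently long-lived scan could wrap around 2^32;
         * hence the wish for a saturating "add until" primitive.
         */
        return blk;
    }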

  Where the 'coordinated seqscan' scans a relation so that each tuple
  eventually gets returned once across all nodes, but the nested loop (and
  through it the index scan) will just run normally, without any
  coordination and parallelism. But everything below --- would happen
  multiple nodes. If you agree, yes, then we're in violent agreement
  ;). The "single node that gets copied" bit above makes me a bit unsure
  whether we are though.

 Yeah, I think we're talking about the same thing.

  To me, given the existing executor code, it seems easiest to achieve
  that by having the ParallelismDrivingNode above having a dynamic number
  of nestloop children in different backends and point the coordinated
  seqscan to some shared state.  As you point out, the number of these
  children cannot be certainly known (just targeted for) at plan time;
  that puts a certain limit on how independent they are.  But since a
  large number of them can be independent between workers it seems awkward
  to generally treat them as being the same node across workers. But maybe
  that's just an issue with my mental model.

 I think it makes sense to think of a set of tasks in which workers can
 assist.  So you have a query tree, which is just one query tree, with no
 copies of the nodes, and then there are certain places in that query
 tree where a worker can jump in and assist that node.  To do that, it
 will have a copy of the node, but that doesn't mean that all of the
 stuff inside the node becomes shared data at the code level, because
 that would be stupid.


As per my understanding of the discussion related to this point, I think
there are 3 somewhat related ways to achieve this.

1. Both master and worker run the same node (ParallelSeqScan) where
the work done by the worker (scan chunks of the heap) for this node is
a subset of what is done by the master (coordinate the data returned by
workers + scan chunks of the heap).  It seems to me Robert is advocating
this approach.
2. Master and worker use different nodes to operate. Master runs a
parallelism-driving node (ParallelSeqScan - coordinate the data returned
by workers + scan chunks of the heap) and worker runs some form of
parallelism-driven node (PartialSeqScan - scan chunks of the heap).  It
seems to me Andres is proposing this approach.
3. Same as 2, but modify the existing SeqScan node to behave as
PartialSeqScan.  This is what I have done in the patch.

Correct me or add here if I have misunderstood any thing.

I think going forward (for cases like aggregation) the work done in
the Master and Worker nodes will have substantial differences, so it
is better to do the work as part of different nodes in master and
worker.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2015-02-12 Thread Syed, Rahila

Thank you for the comments. Please find attached the updated patch.

This patch fails to compile:
xlogreader.c:1049:46: error: extraneous ')' after condition, expected a 
statement
blk->with_hole && blk->hole_offset >= 0))
This has been rectified.

Note as well that at least clang does not like much how the sanity checks on 
with_hole are done. You should place parentheses around the '&&' expressions. 
Also, I would rather compare with_hole == 0 or with_hole == 1 explicitly in 
those checks.
The expressions are modified accordingly.
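
For illustration, the corrected sanity check would presumably take a
shape roughly like this (a sketch of the style Michael asks for, with
explicit comparisons and parenthesized '&&' terms, not the patch's exact
code):

    if ((blk->with_hole == 1 && blk->hole_offset <= 0) ||
        (blk->with_hole == 0 && blk->hole_offset != 0))
    {
        /* report the inconsistent hole information and fail the record */
    }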

There is a typo:
s/true,see/true, see/
[nitpicky]Be as well aware of the 80-character limit per line that is 
normally respected by comment blocks.[/]

Have corrected the typos and changed the comments as mentioned. Also, 
realigned certain lines to meet the 80-char limit.

Thank you,
Rahila Syed




Support-compression-for-full-page-writes-in-WAL_v18.patch
Description: Support-compression-for-full-page-writes-in-WAL_v18.patch



Re: [HACKERS] How about to have relnamespace and relrole?

2015-02-12 Thread Kyotaro HORIGUCHI
Hello, I changed the subject.

This mail is to address the point at hand, preparing for
registering this commitfest.

At Wed, 04 Feb 2015 17:29:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote in
20150204.172914.52110711.horiguchi.kyot...@lab.ntt.co.jp
 Tue, 03 Feb 2015 10:12:12 -0500, Tom Lane t...@sss.pgh.pa.us wrote in 
 2540.1422976...@sss.pgh.pa.us
  I'm not really excited about that.  That line of thought would imply
  that we should have reg* types for every system catalog, which is
  surely overkill.

 Mmm. I suppose for every OID usage, not every system catalog,
 but I agree on the whole. There's no agreeable-by-all
 boundary. Perhaps it depends on how often the average DBA looks
 at catalogs which have OIDs pointing to another catalog which they
 want to see in human-readable form, without joins if possible.

 I very roughly counted how often the OIDs of catalogs are referred
 to from other catalogs.

1. Expected frequency of use

I counted how often OIDs of system catalogs are referred to by
other system catalogs/views. Among them, the pg_stat_* views are
excluded since they have text representations for all OID
references.

The result is like this. The full result of the counting is in
an Excel file, but it's not at hand for now. I'll share it here
if anyone wants to see it.

pg_class.oid:     27
pg_authid.oid:    33
pg_namespace.oid: 20
pg_proc.oid:      13
pg_type.oid:      15
pg_database.oid:   5
pg_operator.oid:   5
pg_am.oid:         4


Among these, authid and namespace are apparently referred to frequently
(from more places than proc, type, operator, and am) but don't have
their own reg* types.

# By the way, I don't understand where the reg comes from,
# REGistered? Or other origin?

For that reason, although the current policy of deciding whether
to have reg* types seems to be whether they have schema-qualified
names, I think regrole and regnamespace are valuable to have.


2. Anticipated un-optimizability

Tom pointed out that these reg* types prevent the planner from
optimizing the query, so we should refrain from adding such
machinery. This should have been a long-standing issue, but reg*s
seem to be rather faster than joining the corresponding catalogs
for a moderate number of result rows. This raises another, more
annoying issue.


3. Potential breakage of MVCC

The other issue Tom pointed out is potential breakage of MVCC by
using these reg* types. The syscache is invalidated on updates, so
this doesn't seem to be a problem in READ COMMITTED mode, but it
breaks SERIALIZABLE mode. IMHO it is not so serious a problem
as long as such inconsistency occurs only on system catalogs and
it is explicitly documented together with the un-optimizability
issue. I'll add a patch for this later.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] enabling nestedloop and disabling hashjon

2015-02-12 Thread Tom Lane
Ravi Kiran ravi.kolanp...@gmail.com writes:
 I am sorry for the late reply. When I disabled the hash join with
 enable_hashjoin=off in the postgresql.conf file, it was not working. But
 when I used the command SET enable_hashjoin=off in the backend, it
 worked.
 I am not able to understand why it did not get disabled when I changed it
 in the postgresql.conf file.

The two plausible explanations for that are (1) you didn't do a reload
or (2) you forgot to remove the '#' comment marker in the file's entry.
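
For concreteness, a minimal sequence that rules out both causes (the data
directory path is illustrative):

    enable_hashjoin = off        # in postgresql.conf, with no leading '#'

    $ pg_ctl reload -D /path/to/data
    $ psql -c 'SHOW enable_hashjoin;'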

regards, tom lane




Re: [HACKERS] enabling nestedloop and disabling hashjon

2015-02-12 Thread David G Johnston
Jim Nasby wrote
 On 2/10/15 9:29 AM, Tom Lane wrote:
  Ravi Kiran ravi.kolanp...@gmail.com writes:
 yes sir, I did try the pg_ctl reload command, but it's still using the
 hash join algorithm and not the nested loop algorithm. I even restarted
 the server, and even then it's still using the hash join algorithm

 Does show enable_hashjoin say it's off?  If not, I think you must've
 fat-fingered the postgresql.conf change somehow.
 
 For future reference, posts like this belong on pgsql-performance.


 but postgres is still using the hash join algorithm even after modifying
 the postgresql code

To be fair, given the original post and some other recent posts by the same
author, the question is not "why is my query performing slowly" but rather
"I'm trying to change how PostgreSQL works and cannot figure out how to
properly enable and disable algorithms".  -hackers seems like the proper
forum, though the OP could give more context as to his overarching goals to
make that clearer to readers.

David J.








[HACKERS] gcc5: initdb produces gigabytes of _fsm files

2015-02-12 Thread Christoph Berg
Hi,

gcc5 is lurking in Debian experimental, and it's breaking initdb.
There are bug reports for 9.1 and 9.4:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=778070 (9.1)
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=778071 (9.4)
but I could reproduce it with 9.5devel (from last week) as well:

gcc (Debian 5-20150205-1) 5.0.0 20150205 (experimental) [trunk revision 220455]


make[2]: Leaving directory '/srv/projects/postgresql/pg/master/src/common'
../../../src/test/regress/pg_regress --inputdir=. --temp-install=./tmp_check 
--top-builddir=../../.. --dlpath=. --schedule=./parallel_schedule
== removing existing temp installation ==
== creating temporary installation     ==
== initializing database system        ==

pg_regress: initdb failed
Examine /srv/projects/postgresql/pg/master/src/test/regress/log/initdb.log for 
the reason.
Command was: 
/srv/projects/postgresql/pg/master/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/initdb
 -D /srv/projects/postgresql/pg/master/src/test/regress/./tmp_check/data -L 
/srv/projects/postgresql/pg/master/src/test/regress/./tmp_check/install//usr/local/pgsql/share
 --noclean --nosync > 
/srv/projects/postgresql/pg/master/src/test/regress/log/initdb.log 2>&1
GNUmakefile:138: recipe for target 'check' failed


src/test/regress $ cat log/initdb.log
Running in noclean mode.  Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user cbe.
This user must also own the server process.

The database cluster will be initialized with locales
  COLLATE:  de_DE.utf8
  CTYPE:de_DE.utf8
  MESSAGES: C
  MONETARY: de_DE.utf8
  NUMERIC:  de_DE.utf8
  TIME: de_DE.utf8
The default database encoding has accordingly been set to UTF8.
The default text search configuration will be set to german.

Data page checksums are disabled.

creating directory 
/srv/projects/postgresql/pg/master/src/test/regress/./tmp_check/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... sysv
creating configuration files ... ok
creating template1 database in 
/srv/projects/postgresql/pg/master/src/test/regress/./tmp_check/data/base/1 ... 
FATAL:  could not extend file base/1/2617_fsm: wrote only 4096 of 8192 bytes 
at block 46197
HINT:  Check free disk space.
PANIC:  cannot abort transaction 1, it was already committed
Aborted (core dumped)


src/test/regress $ ls -al tmp_check/data/base/1/
total 34156376
drwx-- 2 cbe cbe   4096 Feb 12 20:04 ./
drwx-- 3 cbe cbe   4096 Feb 12 19:55 ../
-rw--- 1 cbe cbe  40960 Feb 12 20:04 1247
-rw--- 1 cbe cbe 1073741824 Feb 12 20:03 1247_fsm
-rw--- 1 cbe cbe 1073741824 Feb 12 20:03 1247_fsm.1
-rw--- 1 cbe cbe 1073741824 Feb 12 20:03 1247_fsm.2
-rw--- 1 cbe cbe 1073741824 Feb 12 20:03 1247_fsm.3
-rw--- 1 cbe cbe 1073741824 Feb 12 20:03 1247_fsm.4
-rw--- 1 cbe cbe 1073741824 Feb 12 20:03 1247_fsm.5
-rw--- 1 cbe cbe 1073741824 Feb 12 20:03 1247_fsm.6
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1247_fsm.7
-rw--- 1 cbe cbe   59138048 Feb 12 20:04 1247_fsm.8
-rw--- 1 cbe cbe  49152 Feb 12 20:04 1249
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1249_fsm
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1249_fsm.1
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1249_fsm.2
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1249_fsm.3
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1249_fsm.4
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1249_fsm.5
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1249_fsm.6
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1249_fsm.7
-rw--- 1 cbe cbe   59138048 Feb 12 20:04 1249_fsm.8
-rw--- 1 cbe cbe 540672 Feb 12 20:03 1255
-rw--- 1 cbe cbe 1073741824 Feb 12 19:55 1255_fsm
-rw--- 1 cbe cbe 1073741824 Feb 12 19:55 1255_fsm.1
-rw--- 1 cbe cbe 1073741824 Feb 12 19:55 1255_fsm.2
-rw--- 1 cbe cbe 1073741824 Feb 12 19:55 1255_fsm.3
-rw--- 1 cbe cbe 1073741824 Feb 12 20:03 1255_fsm.4
-rw--- 1 cbe cbe 1073741824 Feb 12 20:03 1255_fsm.5
-rw--- 1 cbe cbe 1073741824 Feb 12 20:03 1255_fsm.6
-rw--- 1 cbe cbe 1073741824 Feb 12 20:03 1255_fsm.7
-rw--- 1 cbe cbe   59138048 Feb 12 20:03 1255_fsm.8
-rw--- 1 cbe cbe  16384 Feb 12 20:04 1259
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1259_fsm
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1259_fsm.1
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1259_fsm.2
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1259_fsm.3
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1259_fsm.4
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1259_fsm.5
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1259_fsm.6
-rw--- 1 cbe cbe 1073741824 Feb 12 20:04 1259_fsm.7
-rw--- 1 cbe cbe   59138048 Feb 12 20:04 1259_fsm.8
-rw--- 1 cbe cbe  0 Feb 12 20:04 2604
-rw--- 1 cbe cbe  

Re: [HACKERS] assessing parallel-safety

2015-02-12 Thread Robert Haas
On Thu, Feb 12, 2015 at 3:52 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 Probably not, because many queries will scan multiple relations, and
 we want to do all of this work just once per query.

 By this, do you mean to say that if there is any parallel-unsafe
 expression (function call) in query, then we won't parallelize any
 part of query, if so why is that mandatory?

Because of stuff like set_config() and txid_current(), which will fail
outright in parallel mode.  Also because the user may have defined a
function that updates some other table in the database, which will
also fail outright if called in parallel mode.  Instead of failing, we
want those kinds of things to fall back to a non-parallel plan.

 Can't we parallelize a scan on a particular relation if all the expressions
 in which that relation is involved are parallel-safe?

No, because the execution of that node can be interleaved in arbitrary
fashion with all the other nodes in the plan tree.  Once we've got
parallel mode active, all the related prohibitions apply to
*everything* we do thereafter, not just that one node.
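
For illustration, a hedged sketch of the kind of whole-tree pass being
discussed, modeled on the existing contain_volatile_functions() walker
(func_parallel_safe() is hypothetical; no such catalog property exists
yet):

    #include "nodes/nodeFuncs.h"

    static bool
    contain_parallel_unsafe_walker(Node *node, void *context)
    {
        if (node == NULL)
            return false;
        if (IsA(node, FuncExpr) &&
            !func_parallel_safe(((FuncExpr *) node)->funcid))
            return true;        /* found a parallel-unsafe function call */
        return expression_tree_walker(node, contain_parallel_unsafe_walker,
                                      context);
    }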

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] enabling nestedloop and disabling hashjon

2015-02-12 Thread Jim Nasby

On 2/12/15 3:34 PM, Ravi Kiran wrote:

sorry for any inconvenience caused, but as David G Johnston
said, I was trying to change how PostgreSQL works and was not able
to figure out how it should be done. I will make sure it is clear
next time. Thank you very much.


And we're glad for the input. But we get a tremendous amount of email, 
so it's best if posts go to the right place. Just for future reference.



@Tom Lane  Sir, I forgot to remove the '#' comment marker in the file's
entry. Thank you.


Glad you were able to fix the problem. :)
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] enabling nestedloop and disabling hashjon

2015-02-12 Thread Ravi Kiran
I am sorry for the late reply. When I disabled the hash join with
enable_hashjoin=off in the postgresql.conf file, it was not working. But
when I used the command SET enable_hashjoin=off in the backend, it
worked.
I am not able to understand why it did not get disabled when I changed it
in the postgresql.conf file.

On Fri, Feb 13, 2015 at 2:34 AM, Jim Nasby jim.na...@bluetreble.com wrote:

 On 2/10/15 9:29 AM, Tom Lane wrote:

 Ravi Kiran ravi.kolanp...@gmail.com writes:

 yes sir, I did try the pg_ctl reload command, but it's still using the
 hash join algorithm and not the nested loop algorithm. I even restarted
 the server, and even then it's still using the hash join algorithm


 Does show enable_hashjoin say it's off?  If not, I think you must've
 fat-fingered the postgresql.conf change somehow.


 For future reference, posts like this belong on pgsql-performance.

 The other possibility is that the query estimates are so high that the
 setting doesn't matter. When you set any of the enable_* settings to off,
 all that really happens is the planner adds a cost of 10M to those nodes
 when it's planning. Normally that's enough to toss those plans out, but in
 extreme cases the cost estimates will still come up with the un-desired
 plan.

 Can you post EXPLAIN ANALYZE output with the setting on and off? Or at
  least plain EXPLAIN output.
 --
 Jim Nasby, Data Architect, Blue Treble Consulting
 Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [HACKERS] enabling nestedloop and disabling hashjon

2015-02-12 Thread Jim Nasby

On 2/12/15 3:20 PM, David G Johnston wrote:

Does show enable_hashjoin say it's off?  If not, I think you must've
fat-fingered the postgresql.conf change somehow.


For future reference, posts like this belong on pgsql-performance.



but postgres is still using the hash join algorithm even after modifying
the postgresql code

To be fair, given the original post and some other recent posts by the same
author, the question is not "why is my query performing slowly" but rather
"I'm trying to change how PostgreSQL works and cannot figure out how to
properly enable and disable algorithms".  -hackers seems like the proper
forum, though the OP could give more context as to his overarching goals to
make that clearer to readers.


-hackers is for discussion directly related to developing Postgres 
itself. This email was a request for help, and had nothing to do with 
actual development.


I'm not trying to dismiss the importance of the request; it is 
important. But the proper forum for it was -general or -performance.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] enabling nestedloop and disabling hashjon

2015-02-12 Thread Rodrigo Gonzalez
On 12/2/15 18:29, Tom Lane wrote:
 Ravi Kiran ravi.kolanp...@gmail.com writes:
  I am sorry for the late reply. When I disabled the hash join with
  enable_hashjoin=off in the postgresql.conf file, it was not working. But
  when I used the command SET enable_hashjoin=off in the backend, it
  worked.
  I am not able to understand why it did not get disabled when I changed it
  in the postgresql.conf file.
 
 The two plausible explanations for that are (1) you didn't do a reload
 or (2) you forgot to remove the '#' comment marker in the file's entry.

Or you changed the wrong postgresql.conf file

Regards

Rodrigo Gonzalez





[HACKERS] Help me! Why did the slave stop suddenly?

2015-02-12 Thread hailong Li
Hi, dear pgsql-hackers


*1. environment*

*DB Master*

$ cat /etc/issue
CentOS release 6.5 (Final)
Kernel \r on an \m

$ uname -av
Linux l-x1.xx.cnx 3.14.29-3.centos6.x86_64 #1 SMP Tue Jan 20 17:48:32
CST 2015 x86_64 x86_64 x86_64 GNU/Linux

$ psql -U postgres
psql (9.3.5)
Type help for help.

postgres=# select version();

version
--
 PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-3), 64-bit
(1 row)

$ pg_config
BINDIR = /opt/pg93/bin
DOCDIR = /opt/pg93/share/doc/postgresql
HTMLDIR = /opt/pg93/share/doc/postgresql
INCLUDEDIR = /opt/pg93/include
PKGINCLUDEDIR = /opt/pg93/include/postgresql
INCLUDEDIR-SERVER = /opt/pg93/include/postgresql/server
LIBDIR = /opt/pg93/lib
PKGLIBDIR = /opt/pg93/lib/postgresql
LOCALEDIR = /opt/pg93/share/locale
MANDIR = /opt/pg93/share/man
SHAREDIR = /opt/pg93/share/postgresql
SYSCONFDIR = /opt/pg93/etc/postgresql
PGXS = /opt/pg93/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/opt/pg93' '--with-perl' '--with-libxml'
'--with-libxslt' '--with-ossp-uuid' 'CFLAGS= -march=core2 -O2 '
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -march=core2 -O2  -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv
CFLAGS_SL = -fpic
LDFLAGS = -L../../../src/common -Wl,--as-needed
-Wl,-rpath,'/opt/pg93/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgport -lpgcommon -lxslt -lxml2 -lz -lreadline -lcrypt -ldl -lm
VERSION = PostgreSQL 9.3.5





*DB Slave*

$ cat /etc/issue
CentOS release 6.5 (Final)
Kernel \r on an \m

$ uname -av
Linux l-2.xx.cnx 3.14.31-3.centos6.x86_64 #1 SMP Mon Feb 2 15:26:04 CST
2015 x86_64 x86_64 x86_64 GNU/Linux

$ psql -U postgres
psql (9.3.5)
Type help for help.

postgres=# select version();

version
--
 PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-3), 64-bit
(1 row)

postgres=# show log_line_prefix ;
   log_line_prefix
--
 [%u %d %a %h %m %p %c %l %x]
(1 row)


$ pg_config
BINDIR = /opt/pg93/bin
DOCDIR = /opt/pg93/share/doc/postgresql
HTMLDIR = /opt/pg93/share/doc/postgresql
INCLUDEDIR = /opt/pg93/include
PKGINCLUDEDIR = /opt/pg93/include/postgresql
INCLUDEDIR-SERVER = /opt/pg93/include/postgresql/server
LIBDIR = /opt/pg93/lib
PKGLIBDIR = /opt/pg93/lib/postgresql
LOCALEDIR = /opt/pg93/share/locale
MANDIR = /opt/pg93/share/man
SHAREDIR = /opt/pg93/share/postgresql
SYSCONFDIR = /opt/pg93/etc/postgresql
PGXS = /opt/pg93/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/opt/pg93' '--with-perl' '--with-libxml'
'--with-libxslt' '--with-ossp-uuid' 'CFLAGS= -march=core2 -O2 '
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -march=core2 -O2  -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv
CFLAGS_SL = -fpic
LDFLAGS = -L../../../src/common -Wl,--as-needed
-Wl,-rpath,'/opt/pg93/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgport -lpgcommon -lxslt -lxml2 -lz -lreadline -lcrypt -ldl -lm
VERSION = PostgreSQL 9.3.5



*2. the DB log in the Slave's log_directory*

[2015-02-05 15:38:51.406 CST 2328 54d08abc.918 6 0]WARNING:  will not
overwrite a used ItemId
[2015-02-05 15:38:51.406 CST 2328 54d08abc.918 7 0]CONTEXT:  xlog
redo insert: rel 38171461/16384/57220350; tid 1778398/9
[2015-02-05 15:38:51.406 CST 2328 54d08abc.918 8 0]PANIC:
heap_insert_redo: failed to add tuple
[2015-02-05 15:38:51.406 CST 2328 54d08abc.918 9 0]CONTEXT:  xlog
redo insert: rel 38171461/16384/57220350; tid 1778398/9
[2015-02-05 15:38:51.765 CST 2320 54d08abb.910 6 0]LOG:  startup
process (PID 2328) was terminated by signal 6: Aborted
[2015-02-05 15:38:51.765 CST 2320 54d08abb.910 7 0]LOG:
terminating any other active server processes
[DBusesr DBname [unknown] 192.168.xxx.x 2015-02-05 15:38:51.765 CST
61450 54d31d48.f00a 3 0]WARNING:  terminating connection because of
crash of another server process
[DBusesr DBname [unknown] 192.168.xxx.x 2015-02-05 15:38:51.765 CST
61450 54d31d48.f00a 4 0]DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
[DBusesr DBname [unknown] 192.168.xxx.x 2015-02-05 15:38:51.765 CST
61450 54d31d48.f00a 5 0]HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
[DBusesr DBname [unknown] 192.168.xxx.x 2015-02-05 15:38:51.765 CST
51208 54d315b6.c808 7 0]WARNING:  terminating connection because of
crash of another server process
[DBusesr 

Re: [HACKERS] gcc5: initdb produces gigabytes of _fsm files

2015-02-12 Thread Tom Lane
Christoph Berg c...@df7cb.de writes:
 gcc5 is lurking in Debian experimental, and it's breaking initdb.

Yeah, I just heard the same about Red Hat as well:

https://bugzilla.redhat.com/show_bug.cgi?id=1190978

Not clear if it's an outright compiler bug or they've just found some
creative new way to make an optimization assumption we're violating.
Either way it seems clear that the find-a-page-with-free-space code is
getting into an infinite loop whereby it keeps extending the FSM till
it runs out of disk space.

There's a more detailed stack trace in the Red Hat report.

regards, tom lane




Re: [HACKERS] enabling nestedloop and disabling hashjon

2015-02-12 Thread Jim Nasby

On 2/10/15 9:29 AM, Tom Lane wrote:

Ravi Kiran ravi.kolanp...@gmail.com writes:

yes sir, I did try the pg_ctl reload command, but it's still using the hash
join algorithm and not the nested loop algorithm. I even restarted the
server, and even then it's still using the hash join algorithm


Does show enable_hashjoin say it's off?  If not, I think you must've
fat-fingered the postgresql.conf change somehow.


For future reference, posts like this belong on pgsql-performance.

The other possibility is that the query estimates are so high that the 
setting doesn't matter. When you set any of the enable_* settings to 
off, all that really happens is the planner adds a cost of 10M to those 
nodes when it's planning. Normally that's enough to toss those plans 
out, but in extreme cases the cost estimates will still come up with the 
un-desired plan.


Can you post EXPLAIN ANALYZE output with the setting on and off? Or at 
least plain EXPLAIN output.
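
Something along these lines would do (the tables and the join are
illustrative):

    SET enable_hashjoin = off;
    EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 USING (id);
    SET enable_hashjoin = on;
    EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 USING (id);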

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Commit fest 2015-12 enters money time

2015-02-12 Thread Michael Paquier
On Thu, Jan 15, 2015 at 4:05 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 We are soon entering in the money time for this CF. The last month has
 been mainly a vacation period, the progress being fantomatic on many
 fronts, still there are a couple of patches that are marked as ready
 for committer:
 - Foreign table inheritance, whose first patch has been committed
 - speedup tidbitmap patch: cache page
 - pg_basebackup vs. Windows and tablespaces (Extend base backup to
 include symlink file used to restore symlinks)
 - If pg_hba.conf has changed since last reload, emit a HINT to the
 client and server log
 - Add restore_command_retry_interval option to control timeout of
 restore_command nonzero status code
 - documentation update for doc/src/sgml/func.sgml
 - Reducing lock strength of trigger and foreign key DDL

 I am going to make a first pass on all the patches to check their
 status, and please note that the patches waiting for some input from
 their authors will be marked as returned with feedback.

In order to move on to the next CF, I am going to go through all the
remaining patches and update their status accordingly. And sorry for
slacking a bit regarding that. If you wish to complain, of course feel
free to post messages on this thread or on the thread related to the
patch itself.
Regards,
-- 
Michael




Re: [HACKERS] How about to have relnamespace and relrole?

2015-02-12 Thread Jim Nasby

On 2/12/15 5:28 AM, Kyotaro HORIGUCHI wrote:

Hello, I changed the subject.

This mail is to address the point at hand, preparing for
registering this commitfest.

At Wed, 04 Feb 2015 17:29:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote in
20150204.172914.52110711.horiguchi.kyot...@lab.ntt.co.jp

Tue, 03 Feb 2015 10:12:12 -0500, Tom Lane t...@sss.pgh.pa.us wrote in 
2540.1422976...@sss.pgh.pa.us

I'm not really excited about that.  That line of thought would imply
that we should have reg* types for every system catalog, which is
surely overkill.


Mmm. I suppose for every OID usage, not every system catalog,
but I agree on the whole. There's no agreeable-by-all
boundary. Perhaps it depends on how often the average DBA looks
at catalogs which have OIDs pointing to another catalog which they
want to see in human-readable form, without joins if possible.

I very roughly counted how often the OIDs of catalogs are referred
to from other catalogs.


1. Expected frequency of use

...

For that reason, although the current policy of deciding whether
to have reg* types seems to be whether they have schema-qualified
names, I think regrole and regnamespace are valuable to have.


Perhaps schema qualification was the original intent, but I think at 
this point everyone uses them for convenience. Why would I want to type 
out JOIN pg_namespace n ON n.oid = blah.???namespace when I could simply 
do ???namespace::regnamespace?
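
For example (pg_class shown; regnamespace here is the proposed type, not
something that exists in core yet):

    -- today: an explicit join
    SELECT c.relname, n.nspname
    FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace;

    -- with the proposed cast
    SELECT relname, relnamespace::regnamespace FROM pg_class;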



2. Anticipated un-optimizability

Tom pointed out that these reg* types prevent the planner from
optimizing the query, so we should refrain from adding such
machinery. This should have been a long-standing issue, but reg*s
seem to be rather faster than joining the corresponding catalogs
for a moderate number of result rows. This raises another, more
annoying issue.


3. Potential breakage of MVCC

The other issue Tom pointed out is potential breakage of MVCC by
using these reg* types. The syscache is invalidated on updates, so
this doesn't seem to be a problem in READ COMMITTED mode, but it
breaks SERIALIZABLE mode. IMHO it is not so serious a problem
as long as such inconsistency occurs only on system catalogs and
it is explicitly documented together with the un-optimizability
issue. I'll add a patch for this later.


The way I look at it, all these arguments went out the window when 
regclass was introduced.


The reality is that reg* is *way* more convenient than manual joins. The 
arguments about optimization and MVCC presumably apply to all reg* 
casts, which means that ship has long since sailed.


If we offered views that made it easier to get at this data then maybe 
this wouldn't matter so much. But DBAs don't want to join 3 catalogs 
together to try and answer a simple question.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Table-level log_autovacuum_min_duration

2015-02-12 Thread Naoya Anzai
 You mean that log_autovacuum_min_duration assumes the role of
 VACOPT_VERBOSE. Even though this parameter is currently never used
 for manual vacuum, log_autovacuum_min_duration should be set to
 zero (always log) when we execute VACUUM (or ANALYZE) VERBOSE.
 Is my understanding correct? If so, it sounds logical.


Yup, that's my opinion. Now, I don't know if people would mind removing
VACOPT_VERBOSE and replacing the control it provides with log_min_duration
in VacuumStmt. At least the two overlap, and log_min_duration
offers more options than the plain VACOPT_VERBOSE.

OK. I agree with you.
Please re-implement it as you are thinking.

 If we can abolish the VERBOSE option,
 I think it would be ideal to prepare a new parameter like
 log_min_duration_vacuum (and log_min_duration_analyze)
 integrating the VERBOSE feature and log_autovacuum_min_duration.


What I think you are proposing here is a VERBOSE option that hypothetically
gets activated if a manual VACUUM takes more than a certain amount of time
specified by those parameters. I doubt this would be useful. In any case
this is unrelated to this patch.

I suspect manual vacuum is often executed as a semi-automatic vacuum 
by a job scheduler, cron, etc. in actual maintenance cases. 
Whether automatic or manual, I think it's important to output 
their logs through the same mechanism.

Sorry, I seem to have wandered from the subject.

Naoya


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gcc5: initdb produces gigabytes of _fsm files

2015-02-12 Thread Robert Haas
On Thu, Feb 12, 2015 at 6:16 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 Christoph Berg c...@df7cb.de writes:
 gcc5 is lurking in Debian experimental, and it's breaking initdb.

 Yeah, I just heard the same about Red Hat as well:
 https://bugzilla.redhat.com/show_bug.cgi?id=1190978
 Not clear if it's an outright compiler bug or they've just found some
 creative new way to make an optimization assumption we're violating.

 Apparently, it's the former.  See

 https://bugzilla.redhat.com/show_bug.cgi?id=1190978#c3

 I will be unamused if the gcc boys try to make an argument that they
 did some valid optimization here.

You're new here, aren't you?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Reducing lock strength of trigger and foreign key DDL

2015-02-12 Thread Michael Paquier
On Sun, Feb 8, 2015 at 10:05 AM, Andreas Karlsson andr...@proxel.se wrote:
 On 01/30/2015 07:48 AM, Michael Paquier wrote:

 Looking at the latest patch, it seems that in
 AlterTableGetLockLevel@tablecmds.c we ought to put AT_ReAddConstraint,
 AT_AddConstraintRecurse and AT_ProcessedConstraint under the same
 banner as AT_AddConstraint. Thoughts?


 A new version of the patch is attached which treats them as the same for
 locking. I think it is correct and improves readability to do so.

Well then, let's switch it to Ready for committer. I am moving as
well this entry to the next CF with the same status.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gcc5: initdb produces gigabytes of _fsm files

2015-02-12 Thread Tom Lane
I wrote:
 Christoph Berg c...@df7cb.de writes:
 gcc5 is lurking in Debian experimental, and it's breaking initdb.

 Yeah, I just heard the same about Red Hat as well:
 https://bugzilla.redhat.com/show_bug.cgi?id=1190978
 Not clear if it's an outright compiler bug or they've just found some
 creative new way to make an optimization assumption we're violating.

Apparently, it's the former.  See

https://bugzilla.redhat.com/show_bug.cgi?id=1190978#c3

I will be unamused if the gcc boys try to make an argument that they
did some valid optimization here.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode wal_keep_segments

2015-02-12 Thread Andres Freund
On 2015-02-12 11:44:05 -0800, Sergey Konoplev wrote:
 On Thu, Feb 12, 2015 at 11:40 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  This obviously should not be the case. I'll have a look in a couple of 
  hours. Until then you can likely  just work around the problem by creating 
  the archive_status directory.
 
 Thank you. Just let me know if you need some extra info or debugging.

No need for debugging. It's plain and simple: a (cherry-pick) conflict I
resolved wrongly during backpatching. 9.3, 9.4 and master do not have
that problem. That whole fix was quite painful because every single
release had significantly different code :(. pg_basebackup/ is pretty
messy.
I'm not sure why my testsuite didn't trigger that problem. Possibly
because a retry makes things work :(

Somewhat luckily it's 9.2 only (9.3, 9.4 and master look correct, earlier
releases don't have pg_receivexlog) and it can quite easily be worked
around by creating the archive_status directory.

If you want to fix it locally, you just need to replace
ReceiveXlogStream(conn, startpos, timeline, NULL, basedir,
                  stop_streaming, standby_message_timeout, false, true);
by
ReceiveXlogStream(conn, startpos, timeline, NULL, basedir,
                  stop_streaming, standby_message_timeout, false, false);

Yes, that and pretty much all other functions in that directory have too
many parameters.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq's multi-threaded SSL callback handling is busted

2015-02-12 Thread Jan Urbański

Andres Freund writes:

 On 2015-02-09 18:17:14 +0100, Jan Urbański wrote:
 First of all, the current behaviour is crazy. We're setting and unsetting the
 locking callback every time a connection is made/closed, which is not how
 OpenSSL is supposed to be used. The *application* using libpq should set a
 callback before it starts threads, it's no business of the library's.

 I don't buy this argument at all. Delivering a libpq that's not
 threadsafe in some circumstances is a really bad idea.

I knew this would be a hard sell :( What I know is that the current situation
is not good and leaving it as-is causes real grief.

 The old behaviour was slightly less insane (set callbacks first time we're
 engaging OpenSSL code, never touch them again). The real sane solution is to
 leave it up to the application.

 The real solution would be for openssl to do it itself.

I think that the OpenSSL API is the real culprit here: requiring threads to
register a callback is what's causing all the issues, but I don't think this
will ever get changed.

 Note that the PHP pgsql extension doesn't set the OpenSSL callbacks, because
 libpq was setting them on its own. If we remove the callback handling from
 libpq, PHP will need to add them. By the way, the MySQL extension for PHP 
 also
 does not set those callbacks.

 I think this shows pretty clearly how insane it would be to change this
 in a minor release. Do you really expect people to update libpq and php
 in unison. After a minor release?

Well, I haven't found reports of threaded PHP+MySQL+SSL programs crashing, and
the MySQL extension also doesn't care about the callbacks. I think it's either
because it's a rare combination, or because almost everyone loads the
cURL extension, which *does* set up callbacks. Like I said, it's not a good
situation.

 Note that we *already* provide applications with the ability to set the
 callbacks themselves and prevent us from doing so: PQinitSSL(false).

Ah, I only now saw that with PQinitOpenSSL(true, false) you can work around the
problem, if you set up your own callbacks first. That seems to at least provide
a way to make libpq not do the callbacks dance in released versions. Thank you.
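
For illustration, a minimal sketch of that application-side setup, assuming
OpenSSL before 1.1.0 (which still requires the locking callbacks) and
pthreads; the app_* names are illustrative, not part of libpq or OpenSSL:

#include <stdlib.h>
#include <pthread.h>
#include <openssl/crypto.h>
#include <libpq-fe.h>

static pthread_mutex_t *app_locks;

static void
app_locking_callback(int mode, int n, const char *file, int line)
{
	if (mode & CRYPTO_LOCK)
		pthread_mutex_lock(&app_locks[n]);
	else
		pthread_mutex_unlock(&app_locks[n]);
}

static unsigned long
app_threadid_callback(void)
{
	return (unsigned long) pthread_self();
}

/* Call once, before spawning threads and before the first PQconnectdb(). */
static void
app_init_openssl_threading(void)
{
	int		i;
	int		nlocks = CRYPTO_num_locks();

	app_locks = malloc(nlocks * sizeof(pthread_mutex_t));
	for (i = 0; i < nlocks; i++)
		pthread_mutex_init(&app_locks[i], NULL);

	CRYPTO_set_id_callback(app_threadid_callback);
	CRYPTO_set_locking_callback(app_locking_callback);

	/* Tell libpq the application owns the libcrypto state, callbacks included. */
	PQinitOpenSSL(1, 0);
}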

 Let me reiterate: I now believe the callbacks should be set by the 
 application,
 libraries should not touch them, since they don't know what will they be
 stomping on. If the application is run through a VM like Python or PHP, it's
 the VM that should make sure the callbacks are set.

 I fail to see why it's any better to do it in the VM. That relies on
 either always loading the VM's openssl module, even if you don't
 actually need it because the library you use deals with openssl. It
 prevents other implementations of openssl in the VM.

The callbacks are a thing that's owned by the application, so libraries have no
business in setting them up. The way OpenSSL's API is specified (very
unfortunately IMHO) is that before you use OpenSSL from threads, you need to
set the callbacks. If you don't control your application's startup (like when
you're a script executed through a VM), you assume the VM took care of it at
startup. If you're a library, you assume the user took care of it, as they
should.

Now I know that a lot of apps get this wrong. Python almost does the right
thing, but you're right, it only sets up callbacks if you load the 'ssl'
module. It still feels like setting them up in library code is stomping on
things not owned by the library - like libperl setting up signal handlers,
which caused problems in Postgres. They're resources not owned by the library!

 What I think we should do is the following:

 * Emphasize the fact that it's important to use PQinitSSL(false) if you
   did things yourself.

PQinitOpenSSL(true, false) if anything. The reason that function got introduced
is precisely because PQinitSSL(false) isn't exactly right, see
http://www.postgresql.org/message-id/49820d7d.7030...@esilo.com

That doesn't solve the problem of the Python deadlock, where you're not at
leisure to call a C function at the beginning of your module.

 * If there's already callbacks set: Remember that fact and don't
   overwrite. In the next major version: warn.

So yeah, that was my initial approach - check if callbacks are set, don't do
the dance if they are. It felt like a crutch, though, and racy at that. There's
no atomic way to test-and-set those callbacks. The window for raciness is
small, though.

 * Assert or something if the callback when unregistering isn't the same
   as it was when registering. That's pretty much guaranteed to cause
   issues.

So let me try another proposal and see if it doesn't set alarm bells off.

  * When opening a connection, if there's a callback set, don't overwrite it
(small race window).

  * When closing a connection, if the callback set is not
pq_lockingcallback/pq_threadidcallback, don't NULL it (small race window)
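
In code, that proposal might look roughly like this (a hedged sketch, not an
actual libpq patch; pq_lockingcallback is the callback name used in this
thread, and the two call sites are illustrative):

/* when opening a connection (libpq's SSL init path): */
if (CRYPTO_get_locking_callback() == NULL)
	CRYPTO_set_locking_callback(pq_lockingcallback);	/* small race window */

/* when closing a connection (the matching destroy path): */
if (CRYPTO_get_locking_callback() == pq_lockingcallback)
	CRYPTO_set_locking_callback(NULL);	/* only unset our own callback */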

Asserts in frontend code are impossible, right? There's no way to warn,
either.

That would be a ~8 line patch. Does it feel back-patcheable?

Re: [HACKERS] libpq's multi-threaded SSL callback handling is busted

2015-02-12 Thread Andres Freund
On 2015-02-12 09:31:27 +0100, Jan Urbański wrote:
 
 Andres Freund writes:
 
  On 2015-02-09 18:17:14 +0100, Jan Urbański wrote:
  First of all, the current behaviour is crazy. We're setting and unsetting 
  the
  locking callback every time a connection is made/closed, which is not how
  OpenSSL is supposed to be used. The *application* using libpq should set a
  callback before it starts threads, it's no business of the library's.
 
  I don't buy this argument at all. Delivering a libpq that's not
  threadsafe in some circumstances is a really bad idea.
 
 I knew this would be a hard sell :( What I know is that the current situation
 is not good and leaving it as-is causes real grief.

It certainly causes less grief than just breaking working
applications. While really shitty, the current situation works in a large
number of scenarios. It's not that common to have several users of
openssl in an application *and* unload libpq.

  I fail to see why it's any better to do it in the VM. That relies on
  either always loading the VM's openssl module, even if you don't
  actually need it because the library you use deals with openssl. It
  prevents other implementations of openssl in the VM.
 
 The callbacks are a thing that's owned by the application, so libraries have 
 no
 business in setting them up. The way OpenSSL's API is specified (very
 unfortunately IMHO) is that before you use OpenSSL from threads, you need to
 set the callbacks. If you don't control your application's startup (like when
 you're a script executed through a VM), you assume the VM took care of it at
 startup. If you're a library, you assume the user took care of it, as they
 should.

That's a bogus argument. The VM cannot set up every library in the
world in a correct way. It'd obviously be completely insane to load
the ssl module in every library just because some part of some random
application might need it. In fact, the ssl library for python does
pretty much the same thing as libpq does (although it's slightly more
careful). It's not the VM at all.

  What I think we should do is the following:
 
  * Emphasize the fact that it's important to use PQinitSSL(false) if you
did things yourself.
 
 PQinitOpenSSL(true, false) if anything. The reason that function got 
 introduced
 is precisely because PQinitSSL(false) isn't exactly right, see
 http://www.postgresql.org/message-id/49820d7d.7030...@esilo.com

Well, that really depends on what the application is actually using...

 That doesn't solve the problem of the Python deadlock, where you're not at
 leisure to call a C function at the beginning of your module.

We could just never unload the hooks...

  * If there's already callbacks set: Remember that fact and don't
overwrite. In the next major version: warn.
 
 So yeah, that was my initial approach - check if callbacks are set, don't do
 the dance if they are. It felt like a crutch, though, and racy at that. 
 There's
 no atomic way to test-and-set those callbacks. The window for racyness is
 small, though.

If you do that check during library initialization instead of every
connection it shouldn't be racy - if that part is run in a multithreaded
fashion you're doing something crazy.

  * Assert or something if the callback when unregistering isn't the same
as it was when registering. That's pretty much guaranteed to cause
issues.
 
 So let me try another proposal and see if it doesn't set alarm bells off.
 
   * When opening a connection, if there's a callback set, don't overwrite it
 (small race window).

   * When closing a connection, if the callback set is not
 pq_lockingcallback/pq_threadidcallback, don't NULL it (small race window)

If we do the tests once during initialization, there shouldn't be a
race.

 Asserts in frontend code are impossible, right? There's no way to warn, 
 either.

You can write to stderr...

 That would be a ~8 line patch. Does it feel back-patcheable?

I think we first need to find out what we all can agree on before
deciding about that.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Table-level log_autovacuum_min_duration

2015-02-12 Thread Naoya Anzai
Hi, Michael-san

 An updated patch is attached, 

I'm sorry for confusing you.

I think you don't have to implement this code to disable this
feature using the value -2, because this use case is rare,
and there is a practical workaround using a huge value like 2e9.
(You suggested 2e9 to me, didn't you? :) ) So, please remove this code.


 Well, I see your point but this is not completely true: we could as
 well rely entirely on this parameter instead of VACOPT_VERBOSE to
 determine if autovacuum, a vacuum or an analyze are in verbose mode,
 and remove VACOPT_VERBOSE, but I can imagine people complaining if
 VACOPT_VERBOSE is removed. So let's set it up unconditionally to -1 in
 gram.y for now. However if people think that it is fine to remove
 VACOPT_VERBOSE, we could use exclusively this parameter in VacuumStmt.
 Or even add sanity checks at the top of vacuum() to ensure that
 VACOPT_VERBOSE is set only when log_min_duration is positive.
 Additional opinions on this matter are welcome.

I understand your point at last. :)

You mean that log_autovacuum_min_duration assumes the role of
VACOPT_VERBOSE. Even though this parameter is currently never used
for manual vacuum, log_autovacuum_min_duration should be set to
zero (always log) when we execute VACUUM (or ANALYZE) VERBOSE.
Is my understanding correct? If so, it sounds logical.

If we can abolish the VERBOSE option,
I think it would be ideal to prepare a new parameter like 
log_min_duration_vacuum (and log_min_duration_analyze) 
integrating the VERBOSE feature and log_autovacuum_min_duration.

Regards,
---
Naoya Anzai


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Strange assertion using VACOPT_FREEZE in vacuum.c

2015-02-12 Thread Michael Paquier
Hi all,

When calling vacuum(), there is the following assertion using VACOPT_FREEZE:
Assert((vacstmt->options & VACOPT_VACUUM) ||
       !(vacstmt->options & (VACOPT_FULL | VACOPT_FREEZE)));
I think that this should be changed to sanity checks based on the
parameter values of the freeze_* fields in VacuumStmt, as we do not set
VACOPT_FREEZE when VACUUM is used without options in parentheses, for
something like this:
Assert((vacstmt->options & VACOPT_VACUUM) ||
-  !(vacstmt->options & (VACOPT_FULL | VACOPT_FREEZE)));
+  ((vacstmt->options & VACOPT_FULL) == 0 &&
+   vacstmt->freeze_min_age < 0 &&
+   vacstmt->freeze_table_age < 0 &&
+   vacstmt->multixact_freeze_min_age < 0 &&
+   vacstmt->multixact_freeze_table_age < 0));
This would also have the advantage to limit the use of VACOPT_FREEZE
in the query parser.
A patch is attached.
Thoughts?
-- 
Michael
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 2f3f79d..3ddc077 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -110,7 +110,11 @@ vacuum(VacuumStmt *vacstmt, Oid relid, bool do_toast,
 	/* sanity checks on options */
 	Assert(vacstmt->options & (VACOPT_VACUUM | VACOPT_ANALYZE));
 	Assert((vacstmt->options & VACOPT_VACUUM) ||
-		   !(vacstmt->options & (VACOPT_FULL | VACOPT_FREEZE)));
+		   ((vacstmt->options & VACOPT_FULL) == 0 &&
+			vacstmt->freeze_min_age < 0 &&
+			vacstmt->freeze_table_age < 0 &&
+			vacstmt->multixact_freeze_min_age < 0 &&
+			vacstmt->multixact_freeze_table_age < 0));
 	Assert((vacstmt->options & VACOPT_ANALYZE) || vacstmt->va_cols == NIL);
 
 	stmttype = (vacstmt->options & VACOPT_VACUUM) ? "VACUUM" : "ANALYZE";

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Odd behavior of updatable security barrier views on foreign tables

2015-02-12 Thread Etsuro Fujita

On 2015/02/11 4:06, Stephen Frost wrote:

* Etsuro Fujita (fujita.ets...@lab.ntt.co.jp) wrote:

On 2015/02/10 7:23, Dean Rasheed wrote:

Sorry, I didn't have time to look at this properly. My initial thought
is that expand_security_qual() needs to request a lock on rows coming

from the relation it pushes down into a subquery if that relation was

the result relation, because otherwise it won't have any locks, since
preprocess_rowmarks() only adds PlanRowMarks to non-target relations.


That seems close to what I had in mind; expand_security_qual() needs
to request a FOR UPDATE lock on rows coming from the relation it
pushes down into a subquery only when that relation is the result
relation and a *foreign table*.


I had been trying to work out an FDW-specific way to address this, but I
think Dean's right that this should be addressed in
expand_security_qual(), which means it'll apply to all cases and not
just these FDW calls.  I don't think that's actually an issue though and
it'll match up to how SELECT FOR UPDATE is handled today.


Sorry, my explanation was not accurate, but I also agree with Dean's 
idea.  In the above, I just wanted to make it clear that such a lock 
request done by expand_security_qual() should be limited to the case 
where the relation that is a former result relation is a foreign table.


If it's OK, I'll submit a patch for that, maybe early next week.

Thank you for working on this issue!

Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Table-level log_autovacuum_min_duration

2015-02-12 Thread Michael Paquier
On Fri, Feb 13, 2015 at 10:16 AM, Naoya Anzai
anzai-na...@mxu.nes.nec.co.jp wrote:
 You mean that log_autovacuum_min_duration assumes the role of
 VACOPT_VERBOSE. Even though this parameter is currently never used
 for manual vacuum, log_autovacuum_min_duration should be set to
 zero (always log) when we execute VACUUM (or ANALYZE) VERBOSE.
 Is my understanding correct? If so, it sounds logical.


Yup, that's my opinion. Now, I don't know if people would mind removing
VACOPT_VERBOSE and replacing the control it provides with log_min_duration
in VacuumStmt. At least the two overlap, and log_min_duration
offers more options than the plain VACOPT_VERBOSE.

 OK. I agree with you.
 Please re-implement it as you are thinking.

OK will do that today.

 If we can abolish the VERBOSE option,
 I think it would be ideal to prepare a new parameter like
 log_min_duration_vacuum (and log_min_duration_analyze)
 integrating the VERBOSE feature and log_autovacuum_min_duration.


What I think you are proposing here is a VERBOSE option that hypothetically
gets activated if a manual VACUUM takes more than a certain amount of time
specified by those parameters. I doubt this would be useful. In any case
this is unrelated to this patch.

 I suspect manual vacuum is often executed as a semi-automatic vacuum
 by a job scheduler, cron, etc. in actual maintenance cases.
 Whether automatic or manual, I think it's important to output
 their logs through the same mechanism.

 Sorry, I seem to have wandered from the subject.

No problem. That's a constructive discussion :)
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] assessing parallel-safety

2015-02-12 Thread Amit Kapila
On Thu, Feb 12, 2015 at 10:07 PM, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Feb 12, 2015 at 6:40 AM, Amit Kapila amit.kapil...@gmail.com
wrote:
  If we have to go this way, then isn't it better to evaluate the same
  when we are trying to create parallel path (something like in the
  parallel_seq scan patch - create_parallelscan_paths())?

 Probably not, because many queries will scan multiple relations, and
 we want to do all of this work just once per query.

By this, do you mean to say that if there is any parallel-unsafe
expression (function call) in the query, then we won't parallelize any
part of the query? If so, why is that mandatory?
Can't we parallelize a scan on a particular relation if all the expressions
in which that relation is involved are parallel-safe?

 Also, when
 somebody adds another parallel node (e.g. parallel hash join) that
 will need this same information.


I think we should be able to handle this even if we have per-relation
information (something like: don't parallelize a node/path if any lower
node is not parallel-safe).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] enabling nestedloop and disabling hashjon

2015-02-12 Thread Ravi Kiran
Sorry for the inconvenience, if caused to anyone. As David G. Johnston
said, I was trying to change how PostgreSQL works and was not able to
figure out how it should be done. I will make sure it is clear from
the next time. Thank you very much.

@Tom Lane: Sir, I forgot to remove the # comment marker from the entry
in the file. Thank you.

On Fri, Feb 13, 2015 at 2:50 AM, David G Johnston 
david.g.johns...@gmail.com wrote:

 Jim Nasby-5 wrote
  On 2/10/15 9:29 AM, Tom Lane wrote:
  Ravi Kiran lt;

  ravi.kolanpaka@

  gt; writes:
  yes sir, I did try the pg_ctl reload command, but it's still using the
  hash join algorithm and not the nested loop algorithm. I even restarted
  the server, even then it's still using the hash join algorithm
 
  Does show enable_hashjoin say it's off?  If not, I think you must've
  fat-fingered the postgresql.conf change somehow.
 
  For future reference, posts like this belong on pgsql-performance.


  but postgres is still using the hash join algorithm even after modifying
  the postgresql code

 To be fair, given the original post, and some other recent posts by the same
 author, the question is not "why is my query performing slowly" but rather
 "I'm trying to change how PostgreSQL works and cannot figure out how to
 properly enable and disable algorithms."  -hackers seems like the proper
 forum, though the OP could give more context as to his overarching goals to
 make that more clear to readers.

 David J.






 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] REVIEW: Track TRUNCATE via pgstat

2015-02-12 Thread Michael Paquier
On Sat, Jan 24, 2015 at 5:58 AM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:

 Here's v0.5. (Why did you use that weird decimal versioning scheme?  You
 could just say v4 and save a couple of keystrokes).  This patch makes
 perfect sense to me now.  I was ready to commit, but I checked the
 regression test you added and noticed that you're only reading results
 for the last set of operations because they all use the same table and
 so each new set clobbers the values for the previous one.  So I modified
 them to use one table for each set, and report the counters for all
 tables.  In doing this I noticed that the one for trunc_stats_test3 is
 at odds with what your comment in the .sql file says; would you review
 it please?  Thanks.

 (I didn't update the expected file.)

 BTW you forgot to update expected/prepared_xact_1.out, for the case when
 prep xacts are disabled.

 If some other committer decides to give this a go, please remember to
 bump catversion before pushing.


Alex, this patch seems nicely backed. Could you review Alvaro's changes?
This thread has been waiting for your input for 3 weeks.
-- 
Michael


Re: [HACKERS] Patch to support SEMI and ANTI join removal

2015-02-12 Thread Michael Paquier
On Sun, Nov 23, 2014 at 8:23 PM, David Rowley dgrowle...@gmail.com wrote:


 As the patch stands there's still a couple of FIXMEs in there, so there's
 still a bit of work to do yet.
 Comments are welcome


Hm, if there is still work to do, we may as well mark this patch as
rejected as-is, also because it has been in this state for a couple of months.
-- 
Michael


Re: [HACKERS] Patch to support SEMI and ANTI join removal

2015-02-12 Thread Marko Tiikkaja

On 2/13/15 8:52 AM, Michael Paquier wrote:

On Sun, Nov 23, 2014 at 8:23 PM, David Rowley dgrowle...@gmail.com wrote:

As the patch stands there's still a couple of FIXMEs in there, so there's
still a bit of work to do yet.
Comments are welcome



Hm, if there is still work to do, we may as well mark this patch as
rejected as-is, also because it has been in this state for a couple of months.


I didn't bring this up before, but I'm pretty sure this patch should be 
marked "returned with feedback".  From what I've understood, "rejected" 
means we don't want this thing, not in this form or any other.  That 
doesn't seem to be the case for this patch, nor for a few others marked 
"rejected" in the currently in-progress commit fest.




.m


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WALWriter active during recovery

2015-02-12 Thread Michael Paquier
On Thu, Dec 18, 2014 at 6:43 PM, Fujii Masao masao.fu...@gmail.com wrote:

 On Tue, Dec 16, 2014 at 3:51 AM, Simon Riggs si...@2ndquadrant.com
 wrote:
  Currently, WALReceiver writes and fsyncs data it receives. Clearly,
  while we are waiting for an fsync we aren't doing any other useful
  work.
 
  The following patch starts WALWriter during recovery and makes it
  responsible for fsyncing data, allowing WALReceiver to proceed with
  other useful actions.

 +1

  At present this is a WIP patch, for code comments only. Don't bother
  with anything other than code questions at this stage.
 
  Implementation questions are
 
  * How should we wake WALReceiver, since it waits on a poll(). Should
  we use SIGUSR1, which is already used for latch waits, or another
  signal?

 Probably we need to change libpqwalreceiver so that it uses the latch.
 This is useful even for the startup process to report the replay location
 to
 the walreceiver in real time.

  * Should we introduce some pacing delays if the WALreceiver gets too
  far ahead of apply?

 I don't think so for now. Instead, we can support synchronous_commit =
 replay,
 and the users can use that new mode if they are worried about the delay of
 WAL replay.

  * Other questions you may have?

 Who should wake the startup process so that it reads and replays the WAL
 data? Currently the walreceiver does. But if the walwriter is responsible
 for fsyncing WAL data, probably the walwriter should do that, because the
 startup process should not replay WAL data which has not been fsync'd yet.


Moved this patch to CF 2015-02 to not lose track of it and because it did
not get any reviews.
-- 
Michael


Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-02-12 Thread Michael Paquier
On Thu, Jan 15, 2015 at 8:02 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

  On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai kai...@ak.jp.nec.com
 wrote:
   When a custom-scan node replaces a join plan, it will have at least two
   child plan nodes. The callback handler of PlanCustomPath needs to be
   able to call create_plan_recurse() to transform the underlying
   path nodes to plan nodes, because this custom-scan node may take other
   built-in scan or sub-join nodes as its inner/outer input.
   In the case of an FDW, it will kick any underlying scan relations to the
   remote side, so we may not expect a ForeignScan to have underlying plans...
 
  Do you have an example of this?
 
 Yes, even though full code set is too large for patch submission...

 https://github.com/pg-strom/devel/blob/master/src/gpuhashjoin.c#L1880

 This create_gpuhashjoin_plan() is the PlanCustomPath callback of GpuHashJoin.
 It takes a GpuHashJoinPath, inherited from CustomPath, that has multiple
 underlying scan/join paths.
 Once it is called back from the backend, it also calls create_plan_recurse()
 to make inner/outer plan nodes according to the paths.

 As a result, we can see the following query execution plan, in which the
 CustomScan takes underlying scan plans.

 postgres=# EXPLAIN SELECT * FROM t0 NATURAL JOIN t1 NATURAL JOIN t2;
                                    QUERY PLAN
 ------------------------------------------------------------------------------
  Custom Scan (GpuHashJoin)  (cost=2968.00..140120.31 rows=3970922 width=143)
    Hash clause 1: (aid = aid)
    Hash clause 2: (bid = bid)
    Bulkload: On
    ->  Custom Scan (GpuScan) on t0  (cost=500.00..57643.00 rows=409 width=77)
    ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=4 width=37)
          hash keys: aid
          nBatches: 1  Buckets: 46000  Memory Usage: 99.99%
          ->  Seq Scan on t1  (cost=0.00..734.00 rows=4 width=37)
          ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=4 width=37)
                hash keys: bid
                nBatches: 1  Buckets: 46000  Memory Usage: 49.99%
                ->  Seq Scan on t2  (cost=0.00..734.00 rows=4 width=37)
 (13 rows)


Where are we on this? AFAIK, we now have a feature with no documentation
and no in-core example to test those custom routine APIs; hence, moved to
the next CF.
-- 
Michael


Re: ctidscan as an example of custom-scan (Re: [HACKERS] [v9.5] Custom Plan API)

2015-02-12 Thread Michael Paquier
On Tue, Jan 6, 2015 at 10:51 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

 Jim, Thanks for your reviewing the patch.

 The attached patch is revised one according to your suggestion,
 and also includes bug fix I could found.

 * Definitions of TIDOperator were moved to pg_operator.h,
   as is done for other operators.
 * Support the case of a ctid (operator) Param expression.
 * Add checks for the case where the commutator of TID was not found.
 * Qualifiers get extracted at plan stage, not path stage.
 * Adjust the cost estimation logic to fit the SeqScan manner.
 * Add some new test cases for the regression test.

  On 12/24/14, 7:54 PM, Kouhei Kaigai wrote:
   On Mon, Dec 15, 2014 at 4:55 PM, Kouhei Kaigai
   kai...@ak.jp.nec.com
   wrote:
   I'm not certain whether we should have this functionality in
   contrib from the perspective of the workloads it can help, but its
   major worth is as an example of the custom-scan interface.
   worker_spi is now in src/test/modules. We may add it there as well,
  no?
  
   Hmm, it makes sense to me. Does it also make sense to add a
   test case to the core regression test cases?
  
   The attached patch adds the ctidscan module at test/module instead of
 contrib.
   The basic portion is not changed from the previous post, but the file
   locations and the test cases in the regression test are changed.
 
  First, I'm glad for this. It will be VERY valuable for anyone trying to
  clean up the end of a majorly bloated table.
 
  Here's a partial review...
 
  +++ b/src/test/modules/ctidscan/ctidscan.c
 
  +/* missing declaration in pg_proc.h */
  +#ifndef TIDGreaterOperator
  +#define TIDGreaterOperator   2800
  ...
  If we're calling that out here, should we have a corresponding comment in
  pg_proc.h, in case these ever get renumbered?
 
 It was a quick hack when I moved this module out of the tree.
 Yep, I should revert it when I submit this patch again.

  +CTidQualFromExpr(Node *expr, int varno)
  ...
  + if (IsCTIDVar(arg1, varno))
  + other = arg2;
  + else if (IsCTIDVar(arg2, varno))
  + other = arg1;
  + else
  + return NULL;
  + if (exprType(other) != TIDOID)
  + return NULL;/* should not happen */
  + /* The other argument must be a pseudoconstant */
  + if (!is_pseudo_constant_clause(other))
  + return NULL;
 
  I think this needs some additional blank lines...
 
 OK. And I also noticed that the coding style around this function is
 different from other built-in plans, so I redefined the role of
 this function to just check whether the supplied RestrictInfo is an
 OpExpr that involves a TID inequality operator, or not.

 The expression node will be extracted when the Plan node is created
 from the Path node.

  + if (IsCTIDVar(arg1, varno))
  + other = arg2;
  + else if (IsCTIDVar(arg2, varno))
  + other = arg1;
  + else
  + return NULL;
  +
  + if (exprType(other) != TIDOID)
  + return NULL;/* should not happen */
  +
  + /* The other argument must be a pseudoconstant */
  + if (!is_pseudo_constant_clause(other))
  + return NULL;
 
  + * CTidEstimateCosts
  + *
  + * It estimates cost to scan the target relation according to the given
  + * restriction clauses. Its logic to scan relations are almost same as
  + * SeqScan doing, because it uses regular heap_getnext(), except for
  + * the number of tuples to be scanned if restriction clauses work well.
 
  Grammar: that should read "same as what SeqScan is doing"... however, what
  actual function are you talking about? I couldn't find SeqScanEstimateCosts
  (or anything ending in EstimateCosts).
 
 It is cost_seqscan(). But I don't put raw function names in the source
 code comments elsewhere, because nobody can guarantee they will stay up
 to date in the future...

  BTW, there are other grammar issues, but it'd be best to handle those all
  at once after all the code stuff is done.
 
 Yep. Help from a native English speaker is very helpful for us.

  + opno = get_commutator(op->opno);
  What happens if there's no commutator? Perhaps best to Assert(opno !=
  InvalidOid) at the end of that if block.
 
 Usually, the commutator operator of TID is defined at initdb time; however,
 nobody can guarantee that a mad superuser won't drop it.
 So, I added elog(ERROR, ...) here.


  Though, it seems things will fall apart anyway if ctid_quals isn't
 exactly
  what we're expecting; I don't know if that's OK or not.
 
 No worry, it was already checked at planning stage.

  + /* disk costs --- assume each tuple on a different page */
  + run_cost += spc_random_page_cost * ntuples;
  Isn't that extremely pessimistic?
 
 Probably. I followed the manner of SeqScan.

  I'm not familiar enough with the custom-scan stuff to really comment past
  

Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2015-02-12 Thread Michael Paquier
On Tue, Dec 23, 2014 at 9:43 AM, Peter Geoghegan p...@heroku.com wrote:

 On Mon, Dec 22, 2014 at 4:34 PM, Peter Geoghegan p...@heroku.com wrote:
  To put it another way, creating a separate object obfuscates
  scanRTEForColumn(), since it's the only client of
  updateFuzzyAttrMatchState().


 Excuse me. I mean *not* creating a separate object -- having a unified
 state representation for the entire range-table, rather than having
 one per RTE and merging them one by one into an overall/final range
 table object.


Patch moved to CF 2015-02.
-- 
Michael


Re: [HACKERS] Logical Replication Helpers WIP for discussion

2015-02-12 Thread Michael Paquier
On Mon, Dec 22, 2014 at 10:26 PM, Robert Haas robertmh...@gmail.com wrote:

 On Fri, Dec 19, 2014 at 8:40 AM, Petr Jelinek p...@2ndquadrant.com
 wrote:
  What I hope to get from this is agreement on the general approach and
  protocol so that we can have common base which will both make it easier
 to
  create external logical replication solutions and also eventually lead to
  full logical replication inside core PostgreSQL.

 The protocol is a really important topic which deserves its own
 discussion.  Andres has mentioned some of the ideas he has in mind -
 which I think are similar to what you did here - but there hasn't
 really been a thread devoted to discussing that topic specifically.  I
 think that would be a good idea: lay out what you have in mind, and
 why, and solicit feedback.


Looking at this patch, I don't see what we actually gain here except a
decoder plugin that speaks a special protocol for a special background
worker that has not been presented yet. What actually is the value of
having that defined as a contrib/ module in core? Note that we already
have test_decoding to test the basic logical decoding facility, used at
least at the SQL level to get logical changes decoded.

For those reasons I am planning to mark this as rejected (it has no
documentation as well). So please speak up if you think the contrary, but
it seems to me that this could live happily out of core.
-- 
Michael


Re: [HACKERS] PATCH: hashjoin - gracefully increasing NTUP_PER_BUCKET instead of batching

2015-02-12 Thread Michael Paquier
On Thu, Jan 15, 2015 at 5:02 AM, Tomas Vondra tomas.von...@2ndquadrant.com
wrote:

 Maybe we can try again later, but there's no point in keeping this in the
 current CF.

 Any objections?


None, marked as rejected.
-- 
Michael


Re: [HACKERS] documentation update for doc/src/sgml/func.sgml

2015-02-12 Thread Michael Paquier
On Mon, Feb 2, 2015 at 10:42 PM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Jan 20, 2015 at 4:01 AM, Fabien COELHO coe...@cri.ensmp.fr
 wrote:
  I had a look at this patch.  This patch adds some text below a table
  of functions.  Immediately above that table, there is this existing
  language:
 
   The functions working with <type>double precision</type> data are mostly
   implemented on top of the host system's C library; accuracy and behavior in
   boundary cases can therefore vary depending on the host system.
 
  This seems to me to substantially duplicate the information added by the
  patch.
 
  I would rather say that it makes the potential issues explicit. Taking that
  into account, maybe the part about floating point could be moved up after the
  above sentence, or the above sentence moved down as an introduction, with
  some pruning so that it fits in?

 Possibly.  If anyone still cares about this patch, then they should
 try revising it along those lines and submit an updated version.  If
 no one is excited enough about this to do that, we should just flag
 this as rejected and move on.  Since this patch has been kicking
 around since August, my reading is nobody's very excited about it, but
 maybe I'm misinterpreting the situation.


Let's move on then, marked as rejected.
-- 
Michael


Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode wal_keep_segments

2015-02-12 Thread Sergey Konoplev
On Thu, Feb 12, 2015 at 4:18 PM, Andres Freund and...@2ndquadrant.com wrote:
 No need for debugging. It's plain and simple: a (cherry-pick) conflict I
 resolved wrongly during backpatching. 9.3, 9.4 and master do not have
 that problem. That whole fix was quite painful because every single
 release had significantly different code :(. pg_basebackup/ is pretty
 messy.
 I'm not sure why my testsuite didn't trigger that problem. Possibly
 because a retry makes things work :(

 Somewhat luckily it's 9.2 only (9.3, 9.4 and master look correct, earlier
 releases don't have pg_receivexlog) and it can quite easily be worked
 around by creating the archive_status directory.

The workaround works perfectly for me in this case; I'm going to
upgrade to 9.4 soon anyway.

Thank you, Andres.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (499) 346-7196, +7 (988) 888-1979
gray...@gmail.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel mode and parallel contexts

2015-02-12 Thread Michael Paquier
On Thu, Feb 12, 2015 at 3:59 AM, Robert Haas robertmh...@gmail.com wrote:

 We're not seeing eye to eye here yet, since I don't accept your
 example scenario and you don't accept mine.  Let's keep discussing.

 Meanwhile, here's an updated patch.


A lot of cool activity is showing up here, so I moved the patch to CF
2015-02. Perhaps, Andres, you could add yourself as a reviewer?
-- 
Michael


Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-02-12 Thread Michael Paquier
On Wed, Dec 17, 2014 at 5:39 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 15 December 2014 at 20:26, Jeff Janes jeff.ja...@gmail.com wrote:

  I still get the compiler error in contrib:
 
  pgstattuple.c: In function 'pgstat_heap':
  pgstattuple.c:279: error: too few arguments to function
  'heap_beginscan_strat'
 
  Should it pass false for the always_prune?

 Yes.

 New version attached.


Moved patch to CF 2015-02 with the same status, "Needs review". It visibly
needs more work, and numbers showing an increase in performance, since so
far only cases showing a performance decrease have come up.
-- 
Michael


Re: [HACKERS] Logical Decoding follows timelines

2015-02-12 Thread Michael Paquier
On Wed, Dec 17, 2014 at 5:35 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 16 December 2014 at 21:17, Simon Riggs si...@2ndquadrant.com wrote:

  This patch is a WIP version of doing that, but only currently attempts

  With the patch, XLogSendLogical uses the same logic to calculate
 SendRqstPtr
  that XLogSendPhysical does. It would be good to refactor that into a
 common
  function, rather than copy-paste.
 
  Some of the logic is similar, but not all.
 
  SendRqstPtr isn't actually used for anything in XLogSendLogical.
 
  It exists to allow the call which resets TLI.
 
  I'll see if I can make it exactly identical; I didn't think so when I
  first looked, will look again.

 Yes, that works. New version attached


Moved patch to CF 2015-02 to not lose track of it, also because it does not
seem to have received a proper review.
-- 
Michael


Re: [HACKERS] Logical decoding document

2015-02-12 Thread Andres Freund
Hi,

On 2015-02-12 22:55:33 +0900, Tatsuo Ishii wrote:
 Hi, I need help.
 
 In 46.6.4.5 Change Callback
 
Note: Only changes in user defined tables that are not unlogged
(see UNLOGGED) and not temporary (see TEMPORARY or TEMP) can be
extracted using logical decoding.
 
 I cannot parse the sentence above. Maybe logical decoding does not
 decode a table if it is an unlogged table or a temporary table?

It basically is saying that you won't see changes made to temporary
and/or unlogged tables. But the begin/commit callbacks are still called
for the relevant transaction.
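
For concreteness, a minimal sketch of such a change callback, modeled
loosely on contrib/test_decoding and assuming the 9.4-era output plugin
API (the demo_ name is illustrative). The decoding machinery never invokes
this callback for changes to unlogged or temporary tables, even though the
begin/commit callbacks still fire for the surrounding transaction:

#include "postgres.h"
#include "lib/stringinfo.h"
#include "replication/logical.h"
#include "replication/output_plugin.h"
#include "utils/rel.h"

static void
demo_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
				   Relation relation, ReorderBufferChange *change)
{
	OutputPluginPrepareWrite(ctx, true);

	switch (change->action)
	{
		case REORDER_BUFFER_CHANGE_INSERT:
			appendStringInfoString(ctx->out, "INSERT into ");
			break;
		case REORDER_BUFFER_CHANGE_UPDATE:
			appendStringInfoString(ctx->out, "UPDATE of ");
			break;
		case REORDER_BUFFER_CHANGE_DELETE:
			appendStringInfoString(ctx->out, "DELETE from ");
			break;
		default:
			break;
	}
	appendStringInfoString(ctx->out, RelationGetRelationName(relation));

	OutputPluginWrite(ctx, true);
}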

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

2015-02-12 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Feb 10, 2015 at 3:00 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 BTW, I'm not all that thrilled with the deserialized object terminology.
 I found myself repeatedly tripping up on which form was serialized and
 which de-.  If anyone's got a better naming idea I'm willing to adopt it.

 My first thought is that we should form some kind of TOAST-like
 backronym, like Serialization Avoidance Loading and Access Device
 (SALAD) or Break-up, Read, Edit, Assemble, and Deposit (BREAD).  I
 don't think there is anything per se wrong with the terms
 serialization and deserialization; indeed, I used the same ones in the
 parallel-mode stuff.  But they are fairly general terms, so it might
 be nice to have something more specific that applies just to this
 particular usage.

Hm.  I'm not against the concept, but those particular suggestions don't
grab me.

 I found the notion of primary and secondary TOAST pointers to be
 quite confusing.  I *think* what you are doing is storing two pointers
 to the object in the object, and a pointer to the object is really a
 pointer to one of those two pointers to the object.  Depending on
 which one it is, you can write the object, or not.

There's more to it than that.  (Writing more docs is one of the to-do
items ;-).)  We could alternatively have done that with two different
va_tag values for read write and read only, which indeed was my
initial intention before I thought of this dodge.  However, then you
have to figure out where to store such pointers, which is problematic
both for plpgsql variable assignment and for ExecMakeSlotContentsReadOnly,
especially the latter which would have to put any freshly-made pointer
in a long-lived context resulting in query-lifespan memory leaks.
So I early decided that the read-write pointer should live right in the
object's own context where it need not be copied when swinging the
context ownership someplace else, and later realized that there should
also be a permanent read-only pointer in there for the use of
ExecMakeSlotContentsReadOnly, and then realized that they didn't need
to have different va_tag values if we implemented the "is read-write
pointer" test as it's done in the patch.  Having only one va_tag value
not two saves cycles, I think, because there are a lot of low-level
tests that don't need to distinguish, eg VARTAG_SIZE().  However it
does make it more expensive when you do need to distinguish, so I might
reconsider that decision later.  (Since these will never go to disk,
we can whack the representation around pretty freely if needed.)

Also, I have hopes of allowing deserialized-object pointers to be copied
into tuples as pointers rather than by reserialization, if we can
establish that the tuple is short-lived enough that the pointer will stay
good, which would be true in a lot of cases during execution of queries by
plpgsql.  With the patch's design, a pointer so copied will automatically
be considered read-only, which I *think* is the behavior we'd need.  If it
turns out that it's okay to propagate read-write-ness through such a copy
step then that would argue in favor of using two va_tag values.

It may be that this solution is overly cute and we should just use two
tag values.  But I wanted to be sure it was possible for copying of a
pointer to automatically lose read-write-ness, in case we need to have
such a guarantee.

 This is a clever
 representation, but it's hard to wrap your head around, and I'm not
 sure primary and secondary are the best names, although I don't
 have an idea as to what would be better.  I'm a bit confused, though:
 once you give out a secondary pointer, how is it safe to write the
 object through the primary pointer?

It's no different from allowing plpgsql to update the values of variables
of pass-by-reference types even though it has previously given out Datums
that are pointers to them: by the time we're ready to execute an
assignment, any query execution that had such a pointer is over and done
with.  (This implies that cursor parameters have to be physically copied
into the cursor's execution state, which is one of a depressingly large
number of reasons why datumCopy() has to physically copy a deserialized
value rather than just copying the pointer.  But otherwise it works.)

There is more work to do to figure out how we can safely give out a
read/write pointer for cases like
hstore_var := hstore_concat(hstore_var, ...);
Aside from the question of whether hstore_concat guarantees not to trash
the value on failure, we'd have to restrict this (I think) to expressions
in which there is only one reference to the target variable and it's an
argument of the topmost function/operator.  But that's something I've not
tried to implement yet.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding document

2015-02-12 Thread Tatsuo Ishii
 Hi, I need help.
 
 In 46.6.4.5 Change Callback
 
Note: Only changes in user defined tables that are not unlogged
(see UNLOGGED) and not temporary (see TEMPORARY or TEMP) can be
extracted using logical decoding.
 
 I cannot parse the sentence above. Maybe logical decoding does not
 decode a table if it is an unlogged table or a temporary table?
 
 It basically is saying that you won't see changes made to temporary
 and/or unlogged tables. But the begin/commit callbacks are still called
 for the relevant transaction.

Oh, thanks.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Simplify sleeping while reading/writing from client

2015-02-12 Thread Heikki Linnakangas

On 02/06/2015 11:50 AM, Andres Freund wrote:

On 2015-02-05 16:45:50 +0200, Heikki Linnakangas wrote:

I propose the attached, which pulls all the wait-retry logic up to
secure_read() and secure_write(). This makes the code a lot more
understandable.


Generally a good idea. Especially if we get more ssl implementations.

But I think it'll make the already broken renegotiation handling even
worse. Because now we're always entering do_handshake in nonblocking
mode (essentially).


Good point. In the other thread, we concluded that the 
SSL_do_handshake() call isn't really needed anyway, so attached are two 
patches: 1. remove the SSL_do_handshake() calls, and 2. the previous 
patch, to simplify the waiting logic.
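
To make the new division of labor concrete, the consuming side in
secure_read() then looks roughly like this. This is a simplified
paraphrase of the second patch, not its text; interrupt processing and
the non-SSL path are elided:

ssize_t
secure_read(Port *port, void *ptr, size_t len)
{
	ssize_t		n;
	int			waitfor;

retry:
	waitfor = 0;
	n = be_tls_read(port, ptr, len, &waitfor);

	/* In blocking mode, emulate blocking by sleeping and retrying. */
	if (n < 0 && errno == EWOULDBLOCK && !port->noblock)
	{
		Assert(waitfor != 0);
		WaitLatchOrSocket(MyLatch, WL_LATCH_SET | waitfor, port->sock, 0);
		goto retry;
	}

	return n;
}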


- Heikki

From 30e5930c9da553d504339e00a05a2a8892182f63 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakangas@iki.fi
Date: Thu, 12 Feb 2015 15:53:26 +0200
Subject: [PATCH 1/2] Simplify the way OpenSSL renegotiation is initiated in
 server.

At least in all modern versions of OpenSSL, it is enough to call
SSL_renegotiate() once, and then forget about it. Subsequent SSL_write()
and SSL_read() calls will finish the handshake.

The SSL_set_session_id_context() call is unnecessary too. We only have
one SSL context, and the SSL session was created with that to begin with.
---
 src/backend/libpq/be-secure-openssl.c | 23 ---
 1 file changed, 23 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index d5f9712..d13ce33 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -624,33 +624,10 @@ be_tls_write(Port *port, void *ptr, size_t len)
 		 */
 		SSL_clear_num_renegotiations(port->ssl);
 
-		SSL_set_session_id_context(port->ssl, (void *) &SSL_context,
-								   sizeof(SSL_context));
 		if (SSL_renegotiate(port->ssl) <= 0)
 			ereport(COMMERROR,
 					(errcode(ERRCODE_PROTOCOL_VIOLATION),
 					 errmsg("SSL failure during renegotiation start")));
-		else
-		{
-			int			retries;
-
-			/*
-			 * A handshake can fail, so be prepared to retry it, but only
-			 * a few times.
-			 */
-			for (retries = 0;; retries++)
-			{
-				if (SSL_do_handshake(port->ssl) > 0)
-					break;		/* done */
-				ereport(COMMERROR,
-						(errcode(ERRCODE_PROTOCOL_VIOLATION),
-						 errmsg("SSL handshake failure on renegotiation, retrying")));
-				if (retries >= 20)
-					ereport(FATAL,
-							(errcode(ERRCODE_PROTOCOL_VIOLATION),
-							 errmsg("could not complete SSL handshake on renegotiation, too many failures")));
-			}
-		}
 	}
 
 wloop:
-- 
2.1.4

From 6341bd76e8184c5c464c90a1ba3d99f02ef61bac Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakangas@iki.fi
Date: Thu, 5 Feb 2015 16:44:13 +0200
Subject: [PATCH 2/2] Simplify waiting logic in reading from / writing to
 client.

The client socket is always in non-blocking mode, and if we actually want
blocking behaviour, we emulate it by sleeping and retrying. But we retry
loops at different layers for reads and writes, which was confusing.
To simplify, remove all the sleeping and retrying code from the lower
levels, from be_tls_read and secure_raw_read and secure_raw_write, and put
all the logic in secure_read() and secure_write().
---
 src/backend/libpq/be-secure-openssl.c |  81 +--
 src/backend/libpq/be-secure.c | 143 ++
 src/backend/libpq/pqcomm.c|   3 +-
 src/include/libpq/libpq-be.h  |   4 +-
 4 files changed, 79 insertions(+), 152 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index d13ce33..37af6e4 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -511,14 +511,11 @@ be_tls_close(Port *port)
  *	Read data from a secure connection.
  */
 ssize_t
-be_tls_read(Port *port, void *ptr, size_t len)
+be_tls_read(Port *port, void *ptr, size_t len, int *waitfor)
 {
 	ssize_t		n;
 	int			err;
-	int			waitfor;
-	int			latchret;
 
-rloop:
 	errno = 0;
 	n = SSL_read(port-ssl, ptr, len);
 	err = SSL_get_error(port-ssl, n);
@@ -528,39 +525,15 @@ rloop:
 			port-count += n;
 			break;
 		case SSL_ERROR_WANT_READ:
+			*waitfor = WL_SOCKET_READABLE;
+			errno = EWOULDBLOCK;
+			n = -1;
+			break;
 		case SSL_ERROR_WANT_WRITE:
-			/* Don't retry if the socket is in nonblocking mode. */
-			if (port-noblock)
-			{
-errno = EWOULDBLOCK;
-n = -1;
-break;
-			}
-
-			waitfor = WL_LATCH_SET;
-
-			if (err == SSL_ERROR_WANT_READ)
-waitfor |= WL_SOCKET_READABLE;
-			else
-waitfor |= WL_SOCKET_WRITEABLE;
-
-			latchret = WaitLatchOrSocket(MyLatch, waitfor, port-sock, 0);
-
-			/*
-			 * We'll, among other situations, get here if the low level
-			 * routine doing the actual recv() via the socket got interrupted
-			 * by a signal. That's so we can handle interrupts once outside
-			 * openssl, so we don't jump out from underneath its covers. We
-			 * can check this 
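
For orientation, since the quoted patch breaks off above: a minimal sketch of
the blocking-read wrapper that, per the commit message, now lives in
secure_read(). This is a reconstruction under stated assumptions, not the
patch text:

/*
 * Sketch only: emulate a blocking read on top of the new nonblocking
 * be_tls_read() interface. The loop structure is assumed from the commit
 * message.
 */
static ssize_t
secure_read_sketch(Port *port, void *ptr, size_t len)
{
	ssize_t		n;
	int			waitfor;

	for (;;)
	{
		waitfor = 0;
		n = be_tls_read(port, ptr, len, &waitfor);
		if (n >= 0 || errno != EWOULDBLOCK || port->noblock)
			return n;			/* done, hard error, or caller retries */

		/* Sleep until the socket is ready, then retry the read. */
		WaitLatchOrSocket(MyLatch, WL_LATCH_SET | waitfor, port->sock, 0);
		/* interrupts are handled here, outside of OpenSSL */
	}
}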

Re: [HACKERS] pg_check_dir comments and implementation mismatch

2015-02-12 Thread Marco Nenciarini
On 02/02/15 21:48, Robert Haas wrote:
 On Sat, Jan 31, 2015 at 8:28 AM, Marco Nenciarini
 marco.nenciar...@2ndquadrant.it wrote:
 On 30/01/15 03:54, Michael Paquier wrote:
 On Fri, Jan 30, 2015 at 2:49 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 There is at least one other bug in that function now that I look at it:
 in event of a readdir() failure, it neglects to execute closedir().
 Perhaps not too significant since all existing callers will just exit()
 anyway after a failure, but still ...
 I would imagine that code scanners like coverity or similar would not
 be happy about that. ISTM that it is better to closedir()
 appropriately in all the code paths.


 I've attached a new version of the patch fixing the missing closedir on
 readdir error.
 
  If readdir() fails and closedir() succeeds, the return will be -1 but
 errno will be 0.
 

The attached patch should fix it.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
diff --git a/src/port/pgcheckdir.c b/src/port/pgcheckdir.c
index 7061893..7102f2c 100644
*** a/src/port/pgcheckdir.c
--- b/src/port/pgcheckdir.c
***************
*** 22,28 ****
   * Returns:
   *	0 if nonexistent
   *	1 if exists and empty
!  *	2 if exists and not empty
   *	-1 if trouble accessing directory (errno reflects the error)
   */
  int
--- 22,30 ----
   * Returns:
   *	0 if nonexistent
   *	1 if exists and empty
!  *	2 if exists and contains _only_ dot files
!  *	3 if exists and contains a mount point
!  *	4 if exists and not empty
   *	-1 if trouble accessing directory (errno reflects the error)
   */
  int
*************** pg_check_dir(const char *dir)
*** 32,37 ****
--- 34,41 ----
  	DIR		   *chkdir;
  	struct dirent *file;
  	bool		dot_found = false;
+ 	bool		mount_found = false;
+ 	int			readdir_errno;
  
chkdir = opendir(dir);
if (chkdir == NULL)
*************** pg_check_dir(const char *dir)
*** 51,60 ****
{
dot_found = true;
}
  		else if (strcmp("lost+found", file->d_name) == 0)
  		{
! 			result = 3;			/* not empty, mount point */
! 			break;
  		}
  #endif
  		else
--- 55,64 ----
  		{
  			dot_found = true;
  		}
+ 		/* lost+found directory */
+ 		else if (strcmp("lost+found", file->d_name) == 0)
  		{
! 			mount_found = true;
  		}
  #endif
else
*************** pg_check_dir(const char *dir)
*** 64,72 ****
}
}
  
! 	if (errno || closedir(chkdir))
  		result = -1;			/* some kind of I/O error? */
  
  	/* We report on dot-files if we _only_ find dot files */
  	if (result == 1 && dot_found)
  		result = 2;
--- 68,87 ----
}
}
  
! 	if (errno)
  		result = -1;			/* some kind of I/O error? */
  
+ 	/* Close chkdir and avoid overwriting the readdir errno on success */
+ 	readdir_errno = errno;
+ 	if (closedir(chkdir))
+ 		result = -1;			/* error executing closedir */
+ 	else
+ 		errno = readdir_errno;
+ 
+ 	/* We report on mount point if we find a lost+found directory */
+ 	if (result == 1 && mount_found)
+ 		result = 3;
+ 
  	/* We report on dot-files if we _only_ find dot files */
  	if (result == 1 && dot_found)
  		result = 2;
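
For reference, a minimal sketch of how a caller would act on the revised
return codes (a hypothetical initdb-style caller, not part of the patch):

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

extern int pg_check_dir(const char *dir);

static void
check_data_dir(const char *datadir)
{
	switch (pg_check_dir(datadir))
	{
		case 0:					/* nonexistent: we may create it */
		case 1:					/* exists and empty: fine */
			break;
		case 2:					/* contains only dot files */
		case 3:					/* contains lost+found: a mount point */
		case 4:					/* not empty */
			fprintf(stderr, "directory \"%s\" exists but is not empty\n",
					datadir);
			exit(1);
		default:				/* -1: I/O error, errno is meaningful */
			fprintf(stderr, "could not access directory \"%s\": %s\n",
					datadir, strerror(errno));
			exit(1);
	}
}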


signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

2015-02-12 Thread Robert Haas
On Tue, Feb 10, 2015 at 3:00 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I've now taken this idea as far as building the required infrastructure
 and revamping a couple of array operators to use it.  There's a lot yet
 to do, but I've done enough to get some preliminary ideas about
 performance (see below).

Very impressive.  This is something that's been mentioned before and
which I always thought would be great to have, but I didn't expect it
would be this easy to cobble together a working implementation.  Or
maybe easy isn't the right term, but this isn't a very big patch.

 BTW, I'm not all that thrilled with the "deserialized object" terminology.
 I found myself repeatedly tripping up on which form was serialized and
 which de-.  If anyone's got a better naming idea I'm willing to adopt it.

My first thought is that we should form some kind of TOAST-like
backronym, like Serialization Avoidance Loading and Access Device
(SALAD) or Break-up, Read, Edit, Assemble, and Deposit (BREAD).  I
don't think there is anything per se wrong with the terms
serialization and deserialization; indeed, I used the same ones in the
parallel-mode stuff.  But they are fairly general terms, so it might
be nice to have something more specific that applies just to this
particular usage.

I found the notion of "primary" and "secondary" TOAST pointers to be
quite confusing.  I *think* what you are doing is storing two pointers
to the object in the object, and a pointer to the object is really a
pointer to one of those two pointers to the object.  Depending on
which one it is, you can write the object, or not.  This is a clever
representation, but it's hard to wrap your head around, and I'm not
sure "primary" and "secondary" are the best names, although I don't
have an idea as to what would be better.  I'm a bit confused, though:
once you give out a secondary pointer, how is it safe to write the
object through the primary pointer?
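
A toy illustration of that reading, with invented names (the actual patch's
layout may differ):

/*
 * Toy model only. A Datum referencing such an object is either
 * &hdr->rw_self or &hdr->ro_self; dereferencing either field yields the
 * object itself, and which of the two the Datum points at determines
 * whether writing is permitted.
 */
typedef struct DeserializedHeader
{
	struct DeserializedHeader *rw_self; /* == the object; writable handle */
	struct DeserializedHeader *ro_self; /* == the object; read-only handle */
	/* ... the object's actual payload follows ... */
} DeserializedHeader;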

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Logical decoding document

2015-02-12 Thread Tatsuo Ishii
Hi, I need help.

In 46.6.4.5 Change Callback

   Note: Only changes in user defined tables that are not unlogged
   (see UNLOGGED) and not temporary (see TEMPORARY or TEMP) can be
   extracted using logical decoding.

I cannot parse the sentence above. Maybe logical decoding does not
decode a table if it is an unlogged table or a temporary table?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standby receiving part of missing WAL segment

2015-02-12 Thread Robert Haas
On Wed, Feb 11, 2015 at 12:55 PM, Thom Brown t...@linux.com wrote:
 Today I witnessed a situation which appears to have gone down like this:

 - The primary server starting streaming WAL data from segment 00A8 to the
 standby
 - The standby server started receiving that data
 - Before 00A8 is finished, the wal sender process dies on the primary, but
 the archiver process continues, and 00A8 ends up being archived as usual
 - The primary continues to generate WAL and cleans up old WAL files from
 pg_xlog until 00A8 is gone.
 - The primary is restarted and the wal sender process is back up and running
  - The standby says "waiting for 00A8", which it can no longer get from the
 primary
 - 00A8 is in the standby's archive directory, but the standby is waiting for
 the rest of the segment from the primary via streaming replication, so
 doesn't check the archive
 - The standby is restarted
 - The standby goes back into recovery and eventually replays 00A8 and
 continues as normal.

 Should the standby be able to get feedback from the primary that the
 requested segment is no longer available, and therefore know to check its
 archive?

Last time I played around with this, if the standby requested a
segment from the master that was no longer present there, the standby
would immediately get an ERROR, which it seems like would get you out
of trouble.  I wonder why that didn't happen in your case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq's multi-threaded SSL callback handling is busted

2015-02-12 Thread Jan Urbański

Jan Urbański writes:

 Andres Freund writes:

 On 2015-02-12 09:31:27 +0100, Jan Urbański wrote:
 That doesn't solve the problem of the Python deadlock, where you're not at
 leisure to call a C function at the beginning of your module.

 We could just never unload the hooks...

 That's what we did before 4e816286533dd34c10b368487d4079595a3e1418 :) And it
 got changed after 
 http://www.postgresql.org/message-id/48620925.6070...@pws.com.au


  * If there's already callbacks set: Remember that fact and don't
overwrite. In the next major version: warn.

 So yeah, that was my initial approach - check if callbacks are set, don't do
 the dance if they are. It felt like a crutch, though, and racy at that. There's
 no atomic way to test-and-set those callbacks. The window for raciness is
 small, though.

 If you do that check during library initialization instead of every
 connection it shouldn't be racy - if that part is run in a multithreaded
 fashion you're doing something crazy.

 Yes, that's true. The problem is that there's no real libpq initialisation
 function. The docs say that:

 If your application initializes libssl and/or libcrypto libraries and libpq is
 built with SSL support, you should call PQinitOpenSSL

 So most apps will just not bother. The moment you know you'll need SSL is only
 when you get an 'S' message from the server...
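
For completeness, a minimal sketch of what the docs ask for, assuming the
application registers its own callbacks in my_thread_setup() (a hypothetical
helper, not a libpq function):

#include <libpq-fe.h>

extern void my_thread_setup(void);	/* registers the app's own libcrypto
									 * locking/id callbacks */

int
main(void)
{
	my_thread_setup();
	PQinitOpenSSL(1, 0);		/* libpq may initialize libssl, but must
								 * leave libcrypto and its callbacks alone */

	PGconn	   *conn = PQconnectdb("dbname=test sslmode=require");

	/* ... */
	PQfinish(conn);
	return 0;
}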

For the sake of discussion, here's a patch to prevent stomping on
previously-set callbacks, racy as it looks.

FWIW, it does fix the Python deadlock and doesn't cause the PHP segfault...

J
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index a32af34..54312a5 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -710,6 +710,9 @@ verify_peer_name_matches_certificate(PGconn *conn)
  *	Callback functions for OpenSSL internal locking
  */
 
+#if OPENSSL_VERSION_NUMBER < 0x10000000L
+/* OpenSSL 1.0.0 deprecates the CRYPTO_set_id_callback function and provides a
+ * default implementation, so there's no need for our own. */
 static unsigned long
 pq_threadidcallback(void)
 {
@@ -720,6 +723,7 @@ pq_threadidcallback(void)
 	 */
 	return (unsigned long) pthread_self();
 }
+#endif   /* OPENSSL_VERSION_NUMBER < 0x10000000L */
 
 static pthread_mutex_t *pq_lockarray;
 
@@ -806,9 +810,14 @@ pgtls_init(PGconn *conn)
 
 		if (ssl_open_connections++ == 0)
 		{
-			/* These are only required for threaded libcrypto applications */
-			CRYPTO_set_id_callback(pq_threadidcallback);
-			CRYPTO_set_locking_callback(pq_lockingcallback);
+			/* These are only required for threaded libcrypto applications, but
+			 * make sure we don't stomp on them if they're already set. */
+#if OPENSSL_VERSION_NUMBER < 0x10000000L
+			if (CRYPTO_get_id_callback() == NULL)
+				CRYPTO_set_id_callback(pq_threadidcallback);
+#endif
+			if (CRYPTO_get_locking_callback() == NULL)
+				CRYPTO_set_locking_callback(pq_lockingcallback);
 		}
 	}
 #endif   /* ENABLE_THREAD_SAFETY */
@@ -885,9 +894,14 @@ destroy_ssl_system(void)
 
 	if (pq_init_crypto_lib  ssl_open_connections == 0)
 	{
-		/* No connections left, unregister libcrypto callbacks */
-		CRYPTO_set_locking_callback(NULL);
-		CRYPTO_set_id_callback(NULL);
+		/* No connections left, unregister libcrypto callbacks, if no one
+		 * registered different ones in the meantime. */
+#if OPENSSL_VERSION_NUMBER < 0x10000000L
+		if (CRYPTO_get_id_callback() == pq_threadidcallback)
+			CRYPTO_set_id_callback(NULL);
+#endif
+		if (CRYPTO_get_locking_callback() == pq_lockingcallback)
+			CRYPTO_set_locking_callback(NULL);
 
 		/*
 		 * We don't free the lock array or the SSL_context. If we get another

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSL renegotiation and other related woes

2015-02-12 Thread Heikki Linnakangas

On 02/12/2015 01:33 AM, Andres Freund wrote:

On 2015-02-11 14:54:03 +0200, Heikki Linnakangas wrote:

Thoughts? Can you reproduce any errors with this?


I'm on battery right now, so I can't really test much.

Did you test having an actual standby instead of pg_receivexlog? I saw
some slightly different errors there.

Does this patch alone work for you or did you test it together with the
earlier one to fix the renegotiation handling server side when
nonblocking? Because that frequently seemed to error out for me, at
least over actual network. A latch loop + checking for the states seemed
to fix that for me.


This patch alone worked for me.


+	 * NB: This relies on the calling code to call pqsecure_read(), completing
+	 * the renegotiation handshake, if pqsecure_write() returns 0. Otherwise
+	 * we'll never make progress.
+	 */


I think this should a) refer to the fact that pqSendSome does that in
blocking scenarios b) expand the comment around the pqReadData to
reference that fact.

How are we going to deal with callers using libpq in nonblocking mode? A
client doing a large COPY FROM STDIN to the server using a nonblocking
connection (not a stupid thing to do) could hit this IIUC.


Hmm, that's problematic even without SSL. If you do a large COPY FROM 
STDIN, and the server sends a lot of NOTICEs, the NOTICEs will be 
accumulated in the TCP send and receive buffers until they fill up. 
After that, the server will block on the send, and will stop reading the 
tuple data the client sends, and you get a deadlock. In blocking mode, 
pqSendSome calls pqReadData to avoid that.


I think pqSendSome should call pqReadData() after getting EWOULDBLOCK, 
even in non-blocking mode. It won't completely prevent the problem: it's 
possible that the receive buffer fills up, but the application doesn't 
call pqSendSome() until the socket becomes writeable, which won't happen 
until the client drains the incoming data and unblocks the 
server. But it greatly reduces the risk. In practice that will solve the 
problem for SSL renegotiations.


We should also document that the application should always wait for the 
socket readable condition, in addition to writeable, and call 
pqConsumeInput(). That fixes the issue for good.
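
To make that concrete, a sketch of such a client loop for nonblocking COPY 
FROM STDIN; it assumes conn is already in COPY IN state, and error handling 
is abbreviated:

#include <sys/select.h>
#include <libpq-fe.h>

/* Returns 1 on success, -1 on error. */
static int
send_copy_chunk(PGconn *conn, const char *buf, int len)
{
	int			rc;

	while ((rc = PQputCopyData(conn, buf, len)) == 0)	/* 0 = would block */
	{
		int			sock = PQsocket(conn);
		fd_set		rmask,
					wmask;

		FD_ZERO(&rmask);
		FD_ZERO(&wmask);
		FD_SET(sock, &rmask);	/* always watch for readable, too */
		FD_SET(sock, &wmask);
		if (select(sock + 1, &rmask, &wmask, NULL, NULL) < 0)
			return -1;

		/* Drain NOTICEs etc. so the server's send buffer can't fill up. */
		if (FD_ISSET(sock, &rmask) && !PQconsumeInput(conn))
			return -1;
		if (FD_ISSET(sock, &wmask) && PQflush(conn) < 0)
			return -1;
	}
	return rc;
}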



Could we perhaps call SSL_peek() when SSL_write() hits WANTS_READ? In
combination with your fix I think that should fix the possibility of
such a deadlock? Especially on the serverside where the socket most of
the time is is in, although emulated, blocking mode that seems easier -
we can just retry afterwards.


Hmm. Not sure what the semantics of SSL_peek() are. At least we'd need 
to call it with a large enough buffer that it would read all the 
incoming data from the OS buffer. I think it would be safer to just call 
SSL_read(). Both the server and libpq have an input buffer that you can 
easily read into at any time.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] File based Incremental backup v8

2015-02-12 Thread Marco Nenciarini
Hi,

I've attached an updated version of the patch. This fixes the issue on
checksum calculation for segments after the first one.

To solve it I've added an optional uint32 *segno argument to
parse_filename_for_nontemp_relation, so I can know the segment number
and calculate the block number correctly.
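
A sketch of the resulting calculation; the exact signature and the helper
incr_backup_check_block() are my assumptions, not the patch's code:

/*
 * Sketch only: segno is the new out-parameter; RELSEG_SIZE is the number
 * of blocks per 1GB segment, so block numbers continue across segments.
 */
static void
note_segment_blocks(const char *fname, BlockNumber blocks_in_file)
{
	int			oidchars;
	ForkNumber	fork;
	uint32		segno = 0;

	if (parse_filename_for_nontemp_relation(fname, &oidchars, &fork, &segno))
	{
		BlockNumber b;

		for (b = 0; b < blocks_in_file; b++)
		{
			BlockNumber absblkno = segno * RELSEG_SIZE + b;

			incr_backup_check_block(fname, fork, absblkno);
		}
	}
}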

On 29/01/15 18:57, Robert Haas wrote:
 On Thu, Jan 29, 2015 at 9:47 AM, Marco Nenciarini
 marco.nenciar...@2ndquadrant.it wrote:
 The current implementation of copydir function is incompatible with LSN
 based incremental backups. The problem is that new files are created,
 but their blocks still have the old LSN, so they will not be backed
 up because they look old enough.
 
 I think this is trying to pollute what's supposed to be a pure
 fs-level operation (copy a directory) into something that is aware
 of specific details like the PostgreSQL page format.  I really think
 that nothing in storage/file should know about the page format.  If we
 need a function that copies a file while replacing the LSNs, I think
 it should be a new function living somewhere else.
 

I've named it copydir_set_lsn and placed it as a static function in
dbcommands.c. This leaves the copydir and copy_file functions in copydir.c
untouched. The copydir function in copydir.c is now unused, while the copy_file
function is still used during unlogged table reinit.

 A bigger problem is that you are proposing to stamp those files with
 LSNs that are, for lack of a better word, fake.  I would expect that
 this would completely break if checksums are enabled.  Also, unlogged
 relations typically have an LSN of 0; this would change that in some
 cases, and I don't know whether that's OK.


I've investigated a bit and I have not been able to find any problem here.

 The issues here are similar to those in
 http://www.postgresql.org/message-id/20150120152819.gc24...@alap3.anarazel.de
 - basically, I think we need to make CREATE DATABASE and ALTER
 DATABASE .. SET TABLESPACE fully WAL-logged operations, or this is
 never going to work right.  If we're not going to allow that, we need
 to disallow hot backups while those operations are in progress.


As already said, the copydir-LSN patch should be treated as a temporary
measure until proper WAL logging of CREATE DATABASE and ALTER DATABASE SET
TABLESPACE is implemented. At that time we could probably get rid
of the whole copydir.[ch] file, moving the copy_file function inside reinit.c

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it


diff --git a/src/backend/storage/file/reinit.c b/src/backend/storage/file/reinit.c
index afd9255..02b5fee 100644
*** a/src/backend/storage/file/reinit.c
--- b/src/backend/storage/file/reinit.c
*************** static void ResetUnloggedRelationsInTabl
*** 28,35 ****
  									 int op);
  static void ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname,
  								   int op);
- static bool parse_filename_for_nontemp_relation(const char *name,
- 									int *oidchars, ForkNumber *fork);
  
  typedef struct
  {
--- 28,33 ----
*************** ResetUnloggedRelationsInDbspaceDir(const
*** 388,446 ****
  		fsync_fname((char *) dbspacedirname, true);
  	}
  }
- 
- /*
-  * Basic parsing of putative relation filenames.
-  *
-  * This function returns true if the file appears to be in the correct format
-  * for a non-temporary relation and false otherwise.
-  *
-  * NB: If this function returns true, the caller is entitled to assume that
-  * *oidchars has been set to a value no more than OIDCHARS, and thus
-  * that a buffer of OIDCHARS+1 characters is sufficient to hold the OID
-  * portion of the filename.  This is critical to protect against a possible
-  * buffer overrun.
-  */
- static bool
- parse_filename_for_nontemp_relation(const char *name, int *oidchars,
- 									ForkNumber *fork)
- {
- 	int			pos;
- 
- 	/* Look for a non-empty string of digits (that isn't too long). */
- 	for (pos = 0; isdigit((unsigned char) name[pos]); ++pos)
- 		;
- 	if (pos == 0 || pos > OIDCHARS)
- 		return false;
- 	*oidchars = pos;
- 
- 	/* Check for a fork name. */
- 	if (name[pos] != '_')
- 		*fork = MAIN_FORKNUM;
- 	else
- 	{
- 		int			forkchar;
- 
- 		forkchar = forkname_chars(&name[pos + 1], fork);
- 		if (forkchar <= 0)
- 			return false;
- 		pos += forkchar + 1;
- 	}
- 
- 	/* Check for a segment number. */
- 	if (name[pos] == '.')
- 	{
- 		int			segchar;
- 
-

Re: [HACKERS] assessing parallel-safety

2015-02-12 Thread Robert Haas
On Thu, Feb 12, 2015 at 12:16 AM, Noah Misch n...@leadboat.com wrote:
 That is a major mark against putting the check in simplify_function(), agreed.

I do see one way to rescue that idea, which is this: put two flags,
parallelModeOK and parallelModeRequired, into PlannerGlobal.  At the
beginning of planning, set parallelModeOK based on our best knowledge
at that time; as we preprocess expressions, update it to false if we
see something that's not parallel-safe.  Emit paths for parallel plans
only if the flag is currently true.  At the end of planning, when we
convert paths to plans, set parallelModeRequired if any parallel plan
elements are generated.  If we ended up with parallelModeOK = true or
with parallelModeRequired = false, then life is good.  In the
unfortunate event that we end up with parallelModeOK = false and
parallelModeRequired = true, replan, this time with parallelModeOK =
false from the beginning.

If there are no sub-queries involved, this will work out fine -
parallelModeOK will always be definitively set to the right value
before we rely on it for anything.  This includes cases where
subqueries are eliminated by pulling them up.  If there *are*
subqueries, we'll still be OK if we happen to hit the parallel-unsafe
construct before we hit the part - if any - that can benefit from
parallelism.

So this would mostly be pretty cheap, but if you do hit the case where
a re-plan is required it would be pretty painful.
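
In pseudo-C, the proposed flow would look something like this (do_planning()
is an invented stand-in for the existing planner guts):

static PlannedStmt *
plan_with_parallel_check(Query *parse, PlannerGlobal *glob)
{
	PlannedStmt *result;

	/* Initial best guess; expression preprocessing may flip this. */
	glob->parallelModeOK = true;
	glob->parallelModeRequired = false;

	result = do_planning(parse, glob);

	/*
	 * Unlucky case: some subquery turned out parallel-unsafe only after
	 * parallel plan elements had already been generated. Redo everything
	 * with parallelism disabled from the start.
	 */
	if (glob->parallelModeRequired && !glob->parallelModeOK)
	{
		glob->parallelModeOK = false;
		result = do_planning(parse, glob);
	}
	return result;
}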

  Unless we want to rejigger this so that we do a
  complete eval_const_expressions() pass over the entire query tree
  (including all subqueries) FIRST, and then only after that go back and
  plan all of those subqueries, I don't see how to make this work; and
  I'm guessing that there are good reasons not to do that.

 I expect that would work fine, but I do think it premature to venture that far
 out of your way to optimize this new tree examination.  The cost may just not
 matter.  Other parts of the planner use code like contain_volatile_functions()
 and contain_nonstrict_functions(), which have the same kind of inefficiency
 you're looking to avoid here.

They do, but those are run on much smaller pieces of the query tree,
never on the whole thing.  I really wish Tom Lane would weigh in here,
as an opinion from him could save me from spending time on a doomed
approach.  I expect criticism of this type:

http://www.postgresql.org/message-id/22266.1269641...@sss.pgh.pa.us

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC 2015 - mentors, students and admins.

2015-02-12 Thread Alexander Korotkov
Hi!

On Mon, Feb 9, 2015 at 11:52 PM, Thom Brown t...@linux.com wrote:

 Google Summer of Code 2015 is approaching.  I'm intending to register
 PostgreSQL again this year.

 Before I do that, I'd like to have an idea of how many people are
 interested in being either a student or a mentor.


I'm ready to participate as mentor again!

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Standby receiving part of missing WAL segment

2015-02-12 Thread Robert Haas
On Thu, Feb 12, 2015 at 9:56 AM, Thom Brown t...@linux.com wrote:
 On 12 February 2015 at 13:56, Robert Haas robertmh...@gmail.com wrote:

 On Wed, Feb 11, 2015 at 12:55 PM, Thom Brown t...@linux.com wrote:
  Today I witnessed a situation which appears to have gone down like this:
 
  - The primary server starting streaming WAL data from segment 00A8 to
  the
  standby
  - The standby server started receiving that data
  - Before 00A8 is finished, the wal sender process dies on the primary,
  but
  the archiver process continues, and 00A8 ends up being archived as usual
  - The primary continues to generate WAL and cleans up old WAL files from
  pg_xlog until 00A8 is gone.
  - The primary is restarted and the wal sender process is back up and
  running
  - The standby says "waiting for 00A8", which it can no longer get from
  the
  primary
  - 00A8 is in the standby's archive directory, but the standby is waiting
  for
  the rest of the segment from the primary via streaming replication, so
  doesn't check the archive
  - The standby is restarted
  - The standby goes back into recovery and eventually replays 00A8 and
  continues as normal.
 
  Should the standby be able to get feedback from the primary that the
  requested segment is no longer available, and therefore know to check
  its
  archive?

 Last time I played around with this, if the standby requested a
 segment from the master that was no longer present there, the standby
 would immediately get an ERROR, which it seems like would get you out
 of trouble.  I wonder why that didn't happen in your case.


 Yeah, I've tried recreating this like so:

 - Primary streams to standby like usual
 - Kill -9 primary then change its port and bring it back up
 - Create traffic on primary until it no longer has the WAL file the standby
 wants, but has archived it
 - Change the port of the primary back to what the standby is trying to talk
 to

 But before it gets to that 4th point, the standby has gone to the archive
 for the rest of it:

 cp: cannot stat ‘/tmp/walarch/0001000600C8’: No such file or
 directory
 2015-02-12 14:47:52 GMT [8280]: [1-1] user=,db=,client= FATAL:  could not
 connect to the primary server: could not connect to server: Connection
 refused
 Is the server running on host 127.0.0.1 and accepting
 TCP/IP connections on port 5488?

 cp: cannot stat ‘/tmp/walarch/0001000600C8’: No such file or
 directory
 2015-02-12 14:47:57 GMT [8283]: [1-1] user=,db=,client= FATAL:  could not
 connect to the primary server: could not connect to server: Connection
 refused
 Is the server running on host 127.0.0.1 and accepting
 TCP/IP connections on port 5488?

 2015-02-12 14:48:02 GMT [8202]: [6-1] user=,db=,client= LOG:  restored log
 file 0001000600C8 from archive

 I don't suppose this is something that was buggy in 9.3.1?

I don't know.  I don't remember a bug of that type, but I might have
missed it.  If you see the scenario again, it might help to pull a
stack trace from the master and the standby.  That might tell us
whether the master or the standby is to blame, and where the offender
is wedged.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Standby receiving part of missing WAL segment

2015-02-12 Thread Thom Brown
On 12 February 2015 at 13:56, Robert Haas robertmh...@gmail.com wrote:

 On Wed, Feb 11, 2015 at 12:55 PM, Thom Brown t...@linux.com wrote:
  Today I witnessed a situation which appears to have gone down like this:
 
  - The primary server starting streaming WAL data from segment 00A8 to the
  standby
  - The standby server started receiving that data
  - Before 00A8 is finished, the wal sender process dies on the primary,
 but
  the archiver process continues, and 00A8 ends up being archived as usual
  - The primary continues to generate WAL and cleans up old WAL files from
  pg_xlog until 00A8 is gone.
  - The primary is restarted and the wal sender process is back up and
 running
  - The standby says "waiting for 00A8", which it can no longer get from
 the
  primary
  - 00A8 is in the standby's archive directory, but the standby is waiting
 for
  the rest of the segment from the primary via streaming replication, so
  doesn't check the archive
  - The standby is restarted
  - The standby goes back into recovery and eventually replays 00A8 and
  continues as normal.
 
  Should the standby be able to get feedback from the primary that the
  requested segment is no longer available, and therefore know to check its
  archive?

 Last time I played around with this, if the standby requested a
 segment from the master that was no longer present there, the standby
 would immediately get an ERROR, which it seems like would get you out
 of trouble.  I wonder why that didn't happen in your case.


Yeah, I've tried recreating this like so:

- Primary streams to standby like usual
- Kill -9 primary then change its port and bring it back up
- Create traffic on primary until it no longer has the WAL file the standby
wants, but has archived it
- Change the port of the primary back to what the standby is trying to talk
to

But before it gets to that 4th point, the standby has gone to the archive
for the rest of it:

cp: cannot stat ‘/tmp/walarch/0001000600C8’: No such file or
directory
2015-02-12 14:47:52 GMT [8280]: [1-1] user=,db=,client= FATAL:  could not
connect to the primary server: could not connect to server: Connection
refused
Is the server running on host 127.0.0.1 and accepting
TCP/IP connections on port 5488?

cp: cannot stat ‘/tmp/walarch/0001000600C8’: No such file or
directory
2015-02-12 14:47:57 GMT [8283]: [1-1] user=,db=,client= FATAL:  could not
connect to the primary server: could not connect to server: Connection
refused
Is the server running on host 127.0.0.1 and accepting
TCP/IP connections on port 5488?

2015-02-12 14:48:02 GMT [8202]: [6-1] user=,db=,client= LOG:  restored log
file 0001000600C8 from archive

I don't suppose this is something that was buggy in 9.3.1?

Thom


Re: [HACKERS] Index-only scans for GiST.

2015-02-12 Thread Thom Brown
On 12 February 2015 at 10:40, Anastasia Lubennikova lubennikov...@gmail.com
 wrote:

 Thanks for answer.
 Now it seems to be applied correctly.


(please avoid top-posting)

Thanks for the updated patch.  I can confirm that it now cleanly applies
and compiles fine.  I've run the tests in the SQL file you provided, and
it's behaving as expected.

Thom


Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

2015-02-12 Thread Robert Haas
On Thu, Feb 12, 2015 at 9:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 My first thought is that we should form some kind of TOAST-like
 backronym, like Serialization Avoidance Loading and Access Device
 (SALAD) or Break-up, Read, Edit, Assemble, and Deposit (BREAD).  I
 don't think there is anything per se wrong with the terms
 serialization and deserialization; indeed, I used the same ones in the
 parallel-mode stuff.  But they are fairly general terms, so it might
 be nice to have something more specific that applies just to this
 particular usage.

 Hm.  I'm not against the concept, but those particular suggestions don't
 grab me.

Fair enough.  I guess the core of my point is just that I suggest we
invent a name for this thing.  "Serialize" and "deserialize" describe
what you are doing just fine, but the mechanism itself should be
called something, I think.  When you say "varlena header" or "TOAST
pointer", that is a name for a very particular thing, not just a
general category of things you might do.  If we replaced every
instance of "TOAST pointer" with "reference to where the full value is
stored", it would be much less clear, and naming all of the related

 This is a clever
 representation, but it's hard to wrap your head around, and I'm not
 sure "primary" and "secondary" are the best names, although I don't
 have an idea as to what would be better.  I'm a bit confused, though:
 once you give out a secondary pointer, how is it safe to write the
 object through the primary pointer?

 It's no different from allowing plpgsql to update the values of variables
 of pass-by-reference types even though it has previously given out Datums
 that are pointers to them: by the time we're ready to execute an
 assignment, any query execution that had such a pointer is over and done
 with.  (This implies that cursor parameters have to be physically copied
 into the cursor's execution state, which is one of a depressingly large
 number of reasons why datumCopy() has to physically copy a deserialized
 value rather than just copying the pointer.  But otherwise it works.)

OK, I see.  So giving out a secondary pointer doesn't necessarily
preclude further changes via the primary pointer, but you'd better be
sure that you don't try until such time as all of those secondary
references are gone.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index-only scans for GiST.

2015-02-12 Thread Thom Brown
On 12 February 2015 at 16:40, Heikki Linnakangas hlinnakan...@vmware.com
wrote:

 On 02/12/2015 12:40 PM, Anastasia Lubennikova wrote:

 Thanks for answer.
 Now it seems to be applied correctly.


 * Documentation is missing.


Anastasia provided a documentation patch in the first email.

Thom


Re: [HACKERS] Index-only scans for GiST.

2015-02-12 Thread Fabrízio de Royes Mello
On Thu, Feb 12, 2015 at 3:07 PM, Thom Brown t...@linux.com wrote:

 On 12 February 2015 at 16:40, Heikki Linnakangas hlinnakan...@vmware.com
wrote:

 On 02/12/2015 12:40 PM, Anastasia Lubennikova wrote:

 Thanks for answer.
 Now it seems to be applied correctly.


 * Documentation is missing.


 Anastasia provided a documentation patch in the first email.


Yeah, but it's good practice to send all the patches together. It will make
the reviewers' lives easier ;-)

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello


Re: [HACKERS] Index-only scans for GiST.

2015-02-12 Thread Heikki Linnakangas

On 02/12/2015 12:40 PM, Anastasia Lubennikova wrote:

Thanks for answer.
Now it seems to be applied correctly.


Thanks, it would be great to get this completed.

This still leaks memory with the same test query as earlier. The leak 
now seems to go into a different memory context; it used to go to the 
GiST scan context, but now it's the ExecutorState context. See the 
attached patch, which makes the leak evident.


Looking at my previous comments from August:

* What's the reason for turning GISTScanOpaqueData.pageData from an
array to a List?

* Documentation is missing.

- Heikki
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 0925e56..3768c9c 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -526,6 +526,12 @@ gistgettuple(PG_FUNCTION_ARGS)
 				 * It's always head of so->pageData
 				 */
 				so->pageData = list_delete_first(so->pageData);
+				{
+					static int	lastreport = 0;
+
+					if ((lastreport++) % 10000 == 0)
+						MemoryContextStats(CurrentMemoryContext);
+				}
+
 PG_RETURN_BOOL(TRUE);
 			}
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] assessing parallel-safety

2015-02-12 Thread Robert Haas
On Thu, Feb 12, 2015 at 6:40 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 If we have to go this way, then isn't it better to evaluate the same
  when we are trying to create the parallel path (something like in the
 parallel_seq scan patch - create_parallelscan_paths())?

Probably not, because many queries will scan multiple relations, and
we want to do all of this work just once per query.  Also, when
somebody adds another parallel node (e.g. parallel hash join) that
will need this same information.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] assessing parallel-safety

2015-02-12 Thread Robert Haas
On Wed, Feb 11, 2015 at 3:21 PM, Robert Haas robertmh...@gmail.com wrote:
 I think we may want a dedicated parallel-safe property for functions
 rather than piggybacking on provolatile ...

I went through the current contents of pg_proc and tried to assess how
much parallel-unsafe stuff we've got.  I think there are actually
three categories of things: (1) functions that can be called in
parallel mode either in the worker or in the leader (parallel safe),
(2) functions that can be called in parallel mode in the leader, but
not in a worker (parallel restricted), and (3) functions that
cannot be called in parallel mode at all (parallel unsafe).  On a
first read-through, the number of things that looked to be
anything other than parallel-safe was fairly small; many of
these could be made parallel-safe with more work, but it's unlikely to
be worth the effort.

current_query() - Restricted because debug_query_string is not copied.
lo_open(), lo_close(), loread(), lowrite(), and other large object
functions - Restricted because large object state is not shared.
age(xid) - Restricted because it uses a transaction-lifespan cache
which is not shared.
now() - Restricted because transaction start timestamp is not copied.
statement_timestamp() - Restricted because statement start timestamp
is not copied.
pg_conf_load_time() - Restricted because PgReloadTime is not copied.
nextval(), currval() - Restricted because sequence-related state is not shared.
setval() - Unsafe because no data can be written in parallel mode.
random(), setseed() - Restricted because random seed state is not
shared. (We could alternatively treat setseed() as unsafe and random()
to be restricted only in sessions where setseed() has never been
called, and otherwise safe.)
pg_stat_get_* - Restricted because there's no guarantee the value
would be the same in the parallel worker as in the leader.
pg_backend_pid() - Restricted because the worker has a different PID.
set_config() - Unsafe because GUC state must remain synchronized.
pg_my_temp_schema() - Restricted because temporary namespaces aren't
shared with parallel workers.
pg_export_snapshot() - Restricted because the worker will go away quickly.
pg_prepared_statement(), pg_cursor() - Restricted because the prepared
statements and cursors are not synchronized with the worker.
pg_listening_channels() - Restricted because listening channels are
not synchronized with the worker.
pg*advisory*lock*() - Restricted because advisory lock state is not
shared with workers - and even if it were, the semantics would be hard
to reason about.
txid_current() - Unsafe because it might attempt XID assignment.
pg_logical_slot*() - Unsafe because they do all kinds of crazy stuff.
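
In code terms, the three-way classification might eventually be represented
along these lines (invented names; no such marking exists in pg_proc today):

/* Sketch only: a possible three-way marking for functions. */
typedef enum ParallelSafety
{
	PROPARALLEL_SAFE,			/* callable in the leader or in a worker */
	PROPARALLEL_RESTRICTED,		/* callable in parallel mode, but only in
								 * the leader */
	PROPARALLEL_UNSAFE			/* not callable in parallel mode at all */
} ParallelSafety;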

That's not a lot, and very little of it is anything you'd care about
parallelizing anyway.  I expect that the incidence of user-written
parallel-unsafe functions will be considerably higher.  I'm not sure
if this impacts the decision about how to design the facility for
assessing parallel-safety or not, but I thought it was worth sharing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting rid of wal_level=archive and default to hot_standby + wal_senders

2015-02-12 Thread Peter Eisentraut
On 2/3/15 11:00 AM, Robert Haas wrote:
 Crazy ideas: Could we make wal_level something other than
 PGC_POSTMASTER?  PGC_SIGHUP would be nice...  Could we, maybe, even
 make it a derived value rather than one that is explicitly configured?
  Like, if you set max_wal_senders>0, you automatically get
 wal_level=hot_standby?  If you register a logical replication slot,
 you automatically get wal_level=logical?

We could probably make wal_level changeable at run-time if we somehow
recorded the point at which it was changed, as you describe later (or
even brute-force it by forcing a checkpoint every time it is changed,
which is not worse than what we require now; or even just write out a
warning that the setting is not effective until after a checkpoint).

But that still leaves max_wal_senders (and arguably
max_replication_slots) requiring a restart before replication can start.
 I don't see a great plan for those on the horizon.

To me, the restart requirement is the killer.

That there are so many interwoven settings isn't great either, but there
will always be more options, and all we can do is manage it.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers