Re: [HACKERS] Hash Indexes

2016-09-15 Thread Amit Kapila
On Thu, Sep 15, 2016 at 7:25 PM, Robert Haas  wrote:
> On Thu, Sep 15, 2016 at 2:13 AM, Amit Kapila  wrote:
>> One other point I would like to discuss is that we currently have a
>> mechanism for tracking active hash scans (hashscan.c), which I think
>> is mainly there to protect splits when the backend trying to split has
>> some scan open. You can read the "Other Notes" section of
>> access/hash/README for further details.  I think after this patch we
>> don't need that mechanism for splits, because we always retain a pin
>> on the bucket buffer till all the tuples are fetched or the scan is
>> finished, which will defend against a split by our own backend, which
>> tries to ensure a cleanup lock on the bucket.
>
> Hmm, yeah.  It seems like we can remove it.
>
>> However, we might need it for vacuum
>> (hashbulkdelete), if we want to get rid of the cleanup lock in vacuum,
>> once we have a page-at-a-time scan mode implemented for hash indexes.
>> If you agree with the above analysis, then we can remove the checks for
>> _hash_has_active_scan from both the vacuum and split paths, and also
>> remove the corresponding code from hashbegin/end scan, but retain
>> hashscan.c for future improvements.
>
> Do you have a plan for that?  I'd be inclined to just blow away
> hashscan.c if we don't need it any more, unless you're pretty sure
> it's going to get reused.  It's not like we can't pull it back out of
> git if we decide we want it back after all.
>

I do want to work on it, but it is always possible that this gets
delayed by other work.  There is also always a chance that, while doing
that work, we hit some problem that prevents us from using that
optimization.  So I will go with your suggestion of removing hashscan.c
and its usage for now; if required, we can pull it back later.  If
nobody thinks otherwise, I will update this in the next patch version.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-09-15 Thread Pavan Deolasee
On Fri, Sep 16, 2016 at 9:09 AM, Pavan Deolasee wrote:

>
> I also realised that we can compact the TID array in step (b) above
> because we only need to store 17 bits for block numbers (we already know
> which 1GB segment they belong to). Given that usable offsets are also just
> 13 bits, TID array needs only 4 bytes per TID instead of 6.
>
>
Actually, this seems like a clear savings of at least 30% for all use
cases, at the cost of allocating in smaller chunks and doing some
transformations.  But given the problem we are trying to solve, that
seems like a small price to pay.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Parallel sec scan in plpgsql

2016-09-15 Thread Ashutosh Bapat
On Thu, Sep 15, 2016 at 9:15 PM, Alex Ignatov  wrote:
> Hello!
> Does parallel seq scan work in plpgsql?
>

Parallel seq scan is a query optimization that works independently of
the source of the query, i.e. whether it comes directly from a client
or from a procedural language like plpgsql.  So the answer to your
question is yes.  If you are expecting something else, more context
would help.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-09-15 Thread Pavan Deolasee
On Fri, Sep 16, 2016 at 12:24 AM, Claudio Freire wrote:

> On Thu, Sep 15, 2016 at 3:48 PM, Tomas Vondra wrote:
> > For example, we always allocate the TID array as large as we can fit into
> > m_w_m, but maybe we don't need to wait with switching to the bitmap until
> > filling the whole array - we could wait as long as the bitmap fits into
> the
> > remaining part of the array, build it there and then copy it to the
> > beginning (and use the bitmap from that point).
>
> The bitmap can be created like that, but grow from the end of the
> segment backwards.
>
> So the scan can proceed until the bitmap fills the whole segment
> (filling backwards), no copy required.
>

I thought about those approaches when I suggested starting with half
m_w_m.  You fill in TIDs from one end and, upon reaching the halfway
point, convert that into a bitmap (assuming stats tell you there are
significant savings with bitmaps) but fill it from the other end.  Then
reset the TID array and start filling it up again.  That guarantees you
can always work within the available limit.

But I actually wonder if we are over-engineering things and
overestimating the cost of memmove etc.  How about this simpler approach:

1. Divide the table into fixed-size chunks, as Robert suggested.  Say 1GB.
2. Allocate a pointer array to store a pointer to each chunk.  For a 1TB
table, that's about 8192 bytes.
3. Allocate a bitmap which can hold MaxHeapTuplesPerPage * chunk size in
pages.  For an 8192-byte block size and a 1GB chunk, that's about 4.6MB.
*Note*: I'm suggesting a bitmap here because, provisioning for the worst
case, a fixed-size TID array would cost us 200MB+ whereas the bitmap is
just 4.6MB.
4. Collect dead tuples in that 1GB chunk.  Also collect stats so that we
know the most optimal representation.
5. At the end of the 1GB scan, if no dead tuples were found, set the chunk
pointer to NULL, move to the next chunk and restart step 4.  If dead
tuples were found, then check:
   a. whether the bitmap can be further compressed by using fewer bits
per page.  If so, allocate a new bitmap and compress the bitmap.
   b. whether a TID array would be a more compact representation.  If so,
allocate a TID array of the right size and convert the bitmap into an
array.
   c. Set the chunk pointer to whichever representation we choose (of
course, add headers etc. to interpret the representation).
6. Continue until we consume all of m_w_m or the end of the table is
reached.  If we consume all of m_w_m, do a round of index cleanup and
restart.

I also realised that we can compact the TID array in step (b) above
because we only need to store 17 bits for block numbers (we already know
which 1GB segment they belong to).  Given that usable offsets are also
just 13 bits, the TID array needs only 4 bytes per TID instead of 6.

Many people are working on implementing different ideas; I can volunteer
to write a patch along these lines unless someone else wants to do that.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] kqueue

2016-09-15 Thread Thomas Munro
On Thu, Sep 15, 2016 at 11:04 AM, Thomas Munro wrote:
> On Thu, Sep 15, 2016 at 10:48 AM, Keith Fiske  wrote:
>> Thomas Munro brought up in #postgresql on freenode needing someone to test a
>> patch on a larger FreeBSD server. I've got a pretty decent machine (3.1Ghz
>> Quad Core Xeon E3-1220V3, 16GB ECC RAM, ZFS mirror on WD Red HDD) so offered
>> to give it a try.
>>
>> Bench setup was:
>> pgbench -i -s 100 -d postgres
>>
>> I ran this against 9.6rc1 instead of HEAD like most of the others in this
>> thread seem to have done. Not sure if that makes a difference; I can re-run
>> if needed.
>> With higher concurrency, this seems to cause decreased performance. You can
>> tell which of the runs is the kqueue patch by looking at the path to
>> pgbench.
>
> Thanks Keith.  So to summarise, you saw no change with 1 client, but
> with 4 clients you saw a significant drop in performance (~93K TPS ->
> ~80K TPS), and a smaller drop for 64 clients (~72K TPS -> ~68K TPS).
> These results seem to be a nail in the coffin for this patch for now.
>
> Thanks to everyone who tested.  I might be back in a later commitfest
> if I can figure out why and how to fix it.

Ok, here's a version tweaked to use EVFILT_PROC for postmaster death
detection instead of the pipe, as Tom Lane suggested in another
thread[1].

The pipe still exists and is used for PostmasterIsAlive(), and also
for the race case where kevent discovers that the PID doesn't exist
when you try to add it (presumably it died already).  We want to defer
reporting that until you call EventSetWait, so in that case we stick
the traditional pipe into the kqueue set as before, so that it'll fire
a readable-because-EOF event then.

Still no change measurable on my laptop.  Keith, would you be able to
test this on your rig and see if it sucks any less than the last one?

[1] https://www.postgresql.org/message-id/13774.1473972000%40sss.pgh.pa.us

-- 
Thomas Munro
http://www.enterprisedb.com


kqueue-v6.patch
Description: Binary data



Re: [HACKERS] Stopping logical replication protocol

2016-09-15 Thread Craig Ringer
On 9 September 2016 at 12:03, Craig Ringer  wrote:

> Setting "waiting on author" in CF per discussion of the need for tests.

Have you had a chance to look at adding the tests we discussed?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WAL consistency check facility

2016-09-15 Thread Michael Paquier
On Thu, Sep 15, 2016 at 7:30 PM, Kuntal Ghosh wrote:
> Thoughts?

There are still a couple of things about this patch that make me
unhappy, particularly the handling of the GUC with the xlogreader
flags.  I am not sure I'll be able to look at it again within the next
couple of weeks, but please be sure that it is registered in the next
commit fest.  You could, for example, do that by changing the patch
from "Returned with Feedback" to "Moved to next CF" in the commit fest
app.  Be sure as well to spend a couple of cycles reviewing patches.
Usually, for each patch sent, one should review a patch of equal
difficulty, and there are many patches still waiting for feedback.
-- 
Michael




Re: [HACKERS] patch: function xmltable

2016-09-15 Thread Peter Eisentraut
On 9/12/16 1:14 AM, Craig Ringer wrote:
> I would've expected once per query. There's no way the expressions can
> reference the row data, so there's no reason to evaluate them each
> time.
> 
> The only use case I see for evaluating them each time is - maybe -
> DEFAULT. Where maybe there's a use for nextval() or other volatile
> functions. But honestly, I think that's better done explicitly in a
> post-pass, i.e.
> 
> select uuid_generate_v4(), x.*
> from (
>   xmltable(.) x
> );
> 
> in cases where that's what the user actually wants.
> 
> There's no other case I can think of where expressions as arguments to
> set-returning functions are evaluated once per output row.

The SQL standard appears to show what the behavior ought to be:

 is equivalent to

LATERAL ( XNDC
SELECT SLI1 AS CN1, SLI2 AS CN2, ..., SLINC AS CNNC FROM XMLITERATE (
XMLQUERY ( XTRP XQAL
 RETURNING SEQUENCE BY REF EMPTY ON EMPTY ) )
AS I ( V, N )
) AS CORR DCLP

and SLIj is

CASE WHEN XEj
THEN XMLCAST( XQCj AS DTj CPMj )
ELSE DEFj END

where DEFj is the default expression.

So simplified it is

LATERAL ( SELECT CASE WHEN ... ELSE DEFj END, ... FROM something )

which indicates that the default expression is evaluated for every row.

If we're not sure about all this, it might be worth restricting the
default expressions to stable or immutable expressions for the time being.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WIP: About CMake v2

2016-09-15 Thread Michael Paquier
On Fri, Sep 16, 2016 at 4:38 AM, Yury Zhuravlev wrote:
> Michael Paquier wrote:
>>
>> Could it be possible to get a refreshed patch on this thread, for at
>> least the sake of the archives? I'd really like to see something
>> happening here and make some progress in this CF.
>
>
> Sure, I will do it on Friday.
> Today I finished mingw+msys support. (mingw without msys has not passed some
> tests)

Okay, that sounds good to me.  I don't recall exactly what your patch
does, but could you still keep the vanilla Makefiles around?  This will
reduce the diff of the patch, and we'd need to keep the former Makefile
machinery around anyway until the buildfarm scripts are updated, the
animals make the switch, and then turn green.  So a period where both
live in parallel is unavoidable.

I have also heard that MySQL uses cmake... It may be interesting to see
how they integrate with it as a large project, and to avoid their past
mistakes.
-- 
Michael




Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2016-09-15 Thread Craig Ringer
On 2 September 2016 at 23:29, Petr Jelinek  wrote:

> You could put it to txid.c where all the other txid stuff is in?

Yeah, even though it's in adt/ I think it'll do.

I thought I'd need get_xid_in_recent_past() for catalog_xmin hot
standby feedback, but upon closer examination the needed logic isn't
the same anymore.  txid_status() wants to ensure clog lookups are safe
and limits by the oldest xid, whereas the walsender doesn't actually
care about that and is just avoiding wrapped xids.

So I'm going back to how it was, all in adt/txid.c, making it static
again.  We can move it and make it non-static if a need to do so comes
up.

Attached is the rebased patch, updated and against master.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 462a0ab51935b45d17820b83b8e9f6abd4ad2904 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Tue, 13 Sep 2016 11:06:58 +0800
Subject: [PATCH 1/3] Install the Perl TAP tests

---
 src/Makefile  |  3 ++-
 src/test/Makefile |  2 +-
 src/test/perl/GNUmakefile | 39 +++
 3 files changed, 42 insertions(+), 2 deletions(-)
 create mode 100644 src/test/perl/GNUmakefile

diff --git a/src/Makefile b/src/Makefile
index b526be7..977f80b 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -26,7 +26,8 @@ SUBDIRS = \
 	bin \
 	pl \
 	makefiles \
-	test/regress
+	test/regress \
+	test/perl
 
 # There are too many interdependencies between the subdirectories, so
 # don't attempt parallel make here.
diff --git a/src/test/Makefile b/src/test/Makefile
index 7f7754f..6b40cf5 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,7 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = regress isolation modules recovery
+SUBDIRS = perl regress isolation modules recovery
 
 # We don't build or execute examples/, locale/, or thread/ by default,
 # but we do want "make clean" etc to recurse into them.  Likewise for ssl/,
diff --git a/src/test/perl/GNUmakefile b/src/test/perl/GNUmakefile
new file mode 100644
index 000..3c5dc70
--- /dev/null
+++ b/src/test/perl/GNUmakefile
@@ -0,0 +1,39 @@
+#-
+#
+# Makefile for src/test/perl
+#
+# Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/perl/Makefile
+#
+#-
+
+subdir = src/test/perl
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+installdirs:
+	$(MKDIR_P) '$(DESTDIR)$(pgxsdir)/$(subdir)'
+
+ifeq ($(enable_tap_tests),yes)
+
+install: all installdirs
+	$(INSTALL_DATA) $(srcdir)/TestLib.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/TestLib.pm'
+	$(INSTALL_DATA) $(srcdir)/SimpleTee.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/SimpleTee.pm'
+	$(INSTALL_DATA) $(srcdir)/RecursiveCopy.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/RecursiveCopy.pm'
+	$(INSTALL_DATA) $(srcdir)/PostgresNode.pm '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgresNode.pm'
+
+uninstall:
+	rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/TestLib.pm'
+	rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/SimpleTee.pm'
+	rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/RecursiveCopy.pm'
+	rm -f '$(DESTDIR)$(pgxsdir)/$(subdir)/PostgresNode.pm'
+
+else
+
+install: ;
+
+uninstall: ;
+
+endif
-- 
2.5.5

From 86aff40374e05fec0160cbab0d1879bd01c6f411 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Tue, 13 Sep 2016 11:00:41 +0800
Subject: [PATCH 2/3] Add install rules for isolation tester

Allow 'make install' for the isolation tester to work so it can be
used from PGXS extensions.
---
 src/Makefile|  1 +
 src/test/isolation/Makefile | 11 +++
 2 files changed, 12 insertions(+)

diff --git a/src/Makefile b/src/Makefile
index 977f80b..d4aa06b 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -27,6 +27,7 @@ SUBDIRS = \
 	pl \
 	makefiles \
 	test/regress \
+	test/isolation \
 	test/perl
 
 # There are too many interdependencies between the subdirectories, so
diff --git a/src/test/isolation/Makefile b/src/test/isolation/Makefile
index 3d272d5..e111bf0 100644
--- a/src/test/isolation/Makefile
+++ b/src/test/isolation/Makefile
@@ -66,3 +66,14 @@ installcheck-prepared-txns: all temp-install
 
 check-prepared-txns: all temp-install
 	./pg_isolation_regress --temp-instance=./tmp_check $(TEMP_CONF) $(EXTRA_REGRESS_OPTS) --inputdir=$(srcdir) --schedule=$(srcdir)/isolation_schedule prepared-transactions
+
+install: all installdirs
+	$(INSTALL_PROGRAM) isolationtester$(X) '$(DESTDIR)$(pgxsdir)/$(subdir)/isolationtester$(X)'
+	$(INSTALL_PROGRAM) pg_isolation_regress$(X) '$(DESTDIR)$(pgxsdir)/$(subdir)/pg_isolation_regress$(X)'
+
+installdirs:
+	$(MKDIR_P) '$(DESTDIR)$(pgxsdir)/$(subdir)'
+
+uninstall:
+	rm -f 

Re: [HACKERS] patch: function xmltable

2016-09-15 Thread Craig Ringer
On 15 September 2016 at 19:31, Pavel Stehule  wrote:

> b_expr enforces shift/reduce conflict :(

No problem then. I just thought it'd be worth allowing more if it
worked to do so.

> I found another open question - how can we translate an empty tag to a SQL
> value? Oracle doesn't have to solve this question, but PostgreSQL does.
> Some databases return an empty string.

Oracle doesn't solve the problem? it ERRORs?

> I prefer returning an empty string - not NULL - in this case.

I agree, and that's consistent with how most XML is interpreted. XSLT,
for example, considers <foo/> and <foo></foo> to be pretty much the same
thing.

>  The reason is simple
> - an empty string is some information, and NULL is less information. When
> necessary, I can transform an empty string to NULL - the opposite
> direction is not unique.

Yep, I definitely agree. The only issue is if people want a DEFAULT to
be applied for empty tags. But that's something they can do in a
post-process pass easily enough, since XMLTABLE is callable as a
subquery / WITH expression / etc.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] condition variables

2016-09-15 Thread Peter Geoghegan
Maybe I can leave it up to you to determine if that applies in the context
of my parallel create index patch. You are the obvious candidate to review
that patch anyway, of course.

--
Peter Geoghegan


Re: [HACKERS] shm_mq_set_sender() crash

2016-09-15 Thread Robert Haas
On Thu, Sep 15, 2016 at 5:22 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Of course, it's also possible that the ParallelWorkerNumber code is
>> entirely correct and something overwrote the null bytes that were
>> supposed to be found at that location.  It would be very useful to see
>> (a) the value of ParallelWorkerNumber and (b) the contents of
>> vmq->mq_sender, and in particular whether that's actually a valid
>> pointer to a PGPROC in the ProcArray.  But unless we can reproduce
>> this I don't see how to manage that.
>
> Is it worth replacing that Assert with a test-and-elog that would
> print those values?
>
> Given that we've seen only one such instance in the buildfarm, this
> might've been just a cosmic ray bit-flip.  So one part of me says
> not to worry too much until we see it again.  OTOH, if it is real
> but rare, missing an opportunity to diagnose would be bad.

I wonder if we could persuade somebody to run pgbench on a Windows
machine with a similar environment, at least some concurrency, and
force_parallel_mode=on.  Assuming this is a generic
initialize-the-parallel-stuff bug and not something specific to a
particular query, that might well trigger it a lot quicker than
waiting for it to recur in the BF.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Fix comment in ATExecValidateConstraint

2016-09-15 Thread Robert Haas
On Thu, Aug 18, 2016 at 9:01 PM, Amit Langote wrote:
> Reads much less ambiguous, so +1.
>
> Except in the doc patch:
>
> s/change the type of a column constraint/change the type of a column/g
>
> I fixed that part in the attached.

Thanks.  Committed; sorry for the delay.

(As some of those of you following along at home may have noticed, I'm
catching up on old emails.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] OpenSSL 1.1 breaks configure and more

2016-09-15 Thread Tom Lane
Andreas Karlsson  writes:
> On 09/15/2016 05:38 PM, Alvaro Herrera wrote:
>> I suppose some interested party could grab the patch that Heikki
>> committed to the new branches and produce a back-patch that can be
>> applied to the older branches.

> Here is the result of backporting the sum of the two patches on top of 
> REL9_4_STABLE. Not sure if we need this, but if we do we can apply this 
> patch.

If someone's done the legwork, I think we would be well advised to
back-patch.  Maybe not bother with 9.1 though.

regards, tom lane




Re: [HACKERS] OpenSSL 1.1 breaks configure and more

2016-09-15 Thread Andreas Karlsson

On 09/15/2016 05:38 PM, Alvaro Herrera wrote:
> I suppose some interested party could grab the patch that Heikki
> committed to the new branches and produce a back-patch that can be
> applied to the older branches.

Here is the result of backporting the sum of the two patches on top of
REL9_4_STABLE. Not sure if we need this, but if we do we can apply this
patch.


Andreas
diff --git a/configure b/configure
index 6c6c08d..9470ed1 100755
--- a/configure
+++ b/configure
@@ -8621,9 +8621,9 @@ else
   as_fn_error $? "library 'crypto' is required for OpenSSL" "$LINENO" 5
 fi
 
- { $as_echo "$as_me:${as_lineno-$LINENO}: checking for SSL_library_init in -lssl" >&5
-$as_echo_n "checking for SSL_library_init in -lssl... " >&6; }
-if ${ac_cv_lib_ssl_SSL_library_init+:} false; then :
+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking for SSL_new in -lssl" >&5
+$as_echo_n "checking for SSL_new in -lssl... " >&6; }
+if ${ac_cv_lib_ssl_SSL_new+:} false; then :
   $as_echo_n "(cached) " >&6
 else
   ac_check_lib_save_LIBS=$LIBS
@@ -8637,27 +8637,27 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 #ifdef __cplusplus
 extern "C"
 #endif
-char SSL_library_init ();
+char SSL_new ();
 int
 main ()
 {
-return SSL_library_init ();
+return SSL_new ();
   ;
   return 0;
 }
 _ACEOF
 if ac_fn_c_try_link "$LINENO"; then :
-  ac_cv_lib_ssl_SSL_library_init=yes
+  ac_cv_lib_ssl_SSL_new=yes
 else
-  ac_cv_lib_ssl_SSL_library_init=no
+  ac_cv_lib_ssl_SSL_new=no
 fi
 rm -f core conftest.err conftest.$ac_objext \
 conftest$ac_exeext conftest.$ac_ext
 LIBS=$ac_check_lib_save_LIBS
 fi
-{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_ssl_SSL_library_init" >&5
-$as_echo "$ac_cv_lib_ssl_SSL_library_init" >&6; }
-if test "x$ac_cv_lib_ssl_SSL_library_init" = xyes; then :
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_ssl_SSL_new" >&5
+$as_echo "$ac_cv_lib_ssl_SSL_new" >&6; }
+if test "x$ac_cv_lib_ssl_SSL_new" = xyes; then :
   cat >>confdefs.h <<_ACEOF
 #define HAVE_LIBSSL 1
 _ACEOF
@@ -8727,9 +8727,9 @@ else
   as_fn_error $? "library 'eay32' or 'crypto' is required for OpenSSL" "$LINENO" 5
 fi
 
- { $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing SSL_library_init" >&5
-$as_echo_n "checking for library containing SSL_library_init... " >&6; }
-if ${ac_cv_search_SSL_library_init+:} false; then :
+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing SSL_new" >&5
+$as_echo_n "checking for library containing SSL_new... " >&6; }
+if ${ac_cv_search_SSL_new+:} false; then :
   $as_echo_n "(cached) " >&6
 else
   ac_func_search_save_LIBS=$LIBS
@@ -8742,11 +8742,11 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 #ifdef __cplusplus
 extern "C"
 #endif
-char SSL_library_init ();
+char SSL_new ();
 int
 main ()
 {
-return SSL_library_init ();
+return SSL_new ();
   ;
   return 0;
 }
@@ -8759,25 +8759,25 @@ for ac_lib in '' ssleay32 ssl; do
 LIBS="-l$ac_lib  $ac_func_search_save_LIBS"
   fi
   if ac_fn_c_try_link "$LINENO"; then :
-  ac_cv_search_SSL_library_init=$ac_res
+  ac_cv_search_SSL_new=$ac_res
 fi
 rm -f core conftest.err conftest.$ac_objext \
 conftest$ac_exeext
-  if ${ac_cv_search_SSL_library_init+:} false; then :
+  if ${ac_cv_search_SSL_new+:} false; then :
   break
 fi
 done
-if ${ac_cv_search_SSL_library_init+:} false; then :
+if ${ac_cv_search_SSL_new+:} false; then :
 
 else
-  ac_cv_search_SSL_library_init=no
+  ac_cv_search_SSL_new=no
 fi
 rm conftest.$ac_ext
 LIBS=$ac_func_search_save_LIBS
 fi
-{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_SSL_library_init" >&5
-$as_echo "$ac_cv_search_SSL_library_init" >&6; }
-ac_res=$ac_cv_search_SSL_library_init
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_SSL_new" >&5
+$as_echo "$ac_cv_search_SSL_new" >&6; }
+ac_res=$ac_cv_search_SSL_new
 if test "$ac_res" != no; then :
   test "$ac_res" = "none required" || LIBS="$ac_res $LIBS"
 
@@ -8797,6 +8797,37 @@ _ACEOF
 fi
 done
 
+  # Functions introduced in OpenSSL 1.1.0. We used to check for
+  # OPENSSL_VERSION_NUMBER, but that didn't work with 1.1.0, because LibreSSL
+  # defines OPENSSL_VERSION_NUMBER to claim version 2.0.0, even though it
+  # doesn't have these OpenSSL 1.1.0 functions. So check for individual
+  # functions.
+  for ac_func in OPENSSL_init_ssl BIO_get_data BIO_meth_new ASN1_STRING_get0_data RAND_OpenSSL
+do :
+  as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
+ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
+if eval test \"x\$"$as_ac_var"\" = x"yes"; then :
+  cat >>confdefs.h <<_ACEOF
+#define `$as_echo "HAVE_$ac_func" | $as_tr_cpp` 1
+_ACEOF
+
+fi
+done
+
+  # OpenSSL versions before 1.1.0 required setting callback functions, for
+  # thread-safety. In 1.1.0, it's no longer required, and CRYPTO_lock()
+  # function was removed.
+  for ac_func in CRYPTO_lock
+do :
+  ac_fn_c_check_func "$LINENO" "CRYPTO_lock" "ac_cv_func_CRYPTO_lock"
+if test "x$ac_cv_func_CRYPTO_lock" = xyes; then :
+  cat 

Re: [HACKERS] RLS related docs

2016-09-15 Thread Joe Conway
On 09/15/2016 01:33 PM, Robert Haas wrote:
> On Sun, Aug 28, 2016 at 4:23 PM, Joe Conway  wrote:
>>>> For COPY, I think perhaps it would be more logical to put the new note
>>>> immediately after the third note which describes the privileges
>>>> required, since it's kind of related, and then we can talk about the
>>>> RLS policies required, e.g.:
>>>>
>>>> If row-level security is enabled for the table, COPY table TO is
>>>> internally converted to COPY (SELECT * FROM table) TO, and the
>>>> relevant security policies are applied. Currently, COPY FROM is not
>>>> supported for tables with row-level security.
>>>
>>> This sounds better than what I had, so I will do it that way.
>>
>> Apologies for the delay, but new patch attached. Assuming no more
>> comments, will commit this, backpatched to 9.5, in a day or two.
> 
> I don't think this was ever committed, but my comment is that it seems
> to be exposing rather more of the implementation than is probably
> wise.  Can't we say that SELECT policies will apply rather than saying
> that it is internally converted to a SELECT?

I've not committed it yet because I was going to look into the new info
Dean mentioned first. Seems like your wording is fine, so will make that
change.

Joe


-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] OpenSSL 1.1 breaks configure and more

2016-09-15 Thread Christoph Berg
Re: Heikki Linnakangas 2016-09-15 <7e4991a9-410f-5e1f-2a3a-e918e4a4b...@iki.fi>
> > I'm afraid it's not that easy - Debian 9 (stretch) will release at the
> > beginning of next year, and apt.postgresql.org will want to build
> > 9.2/9.3/9.4 for that distribution. I guess yum.postgresql.org will
> > have the same problem with the next Fedora release.
> 
> > Can you elaborate? Are you saying that Debian 9 (stretch) will not ship
> OpenSSL 1.0.2 anymore, and will require using OpenSSL 1.1.0?

I thought that was the plan, but upon asking on #debian-devel, it
seems it's not set yet. I'll ask the maintainers directly and report
back.

Christoph




Re: [HACKERS] Tuplesort merge pre-reading

2016-09-15 Thread Peter Geoghegan
On Thu, Sep 15, 2016 at 1:51 PM, Heikki Linnakangas  wrote:
> BTW, does a 1-way merge make any sense? I was surprised to see this in the
> log, even without this patch:
>
> LOG:  finished 1-way merge step: CPU 0.62s/7.22u sec elapsed 8.43 sec
> STATEMENT:  SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER BY
> i) t;
> LOG:  finished 1-way merge step: CPU 0.62s/7.22u sec elapsed 8.43 sec
> STATEMENT:  SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER BY
> i) t;
> LOG:  finished 1-way merge step: CPU 0.62s/7.22u sec elapsed 8.43 sec
> STATEMENT:  SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER BY
> i) t;
> LOG:  finished 1-way merge step: CPU 0.62s/7.22u sec elapsed 8.43 sec
> STATEMENT:  SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER BY
> i) t;
> LOG:  finished 3-way merge step: CPU 0.62s/7.23u sec elapsed 8.44 sec
> STATEMENT:  SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER BY
> i) t;
> LOG:  finished 6-way merge step: CPU 0.62s/7.24u sec elapsed 8.44 sec
> STATEMENT:  SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER BY
> i) t;
> LOG:  finished 6-way merge step: CPU 0.62s/7.24u sec elapsed 8.45 sec
> STATEMENT:  SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER BY
> i) t;

Another thing that I think is worth pointing out here is that the
number of merge passes shown is excessive, practically speaking.  I
suggested that we have something like checkpoint_warning for this last
year, which Greg Stark eventually got behind but Robert Haas didn't
seem to like.  Maybe this idea should be revisited.  What do you think?

There is no neat explanation for why it's considered excessive to
checkpoint every 10 seconds, but not every 2 minutes. But, we warn
about the former case by default, and not the latter. It's hard to
know exactly where to draw the line, but that isn't a great reason to
not do it (maybe one extra merge pass is a good threshold -- that's
what I suggested once). I think that other database systems similarly
surface multiple merge passes. It's just inefficient to ever do
multiple merge passes, even if you're very frugal with memory.
Certainly, it's almost impossible to defend doing 3+ passes these
days.

-- 
Peter Geoghegan




Re: [HACKERS] shm_mq_set_sender() crash

2016-09-15 Thread Tom Lane
Robert Haas  writes:
> Of course, it's also possible that the ParallelWorkerNumber code is
> entirely correct and something overwrote the null bytes that were
> supposed to be found at that location.  It would be very useful to see
> (a) the value of ParallelWorkerNumber and (b) the contents of
> vmq->mq_sender, and in particular whether that's actually a valid
> pointer to a PGPROC in the ProcArray.  But unless we can reproduce
> this I don't see how to manage that.

Is it worth replacing that Assert with a test-and-elog that would
print those values?

Given that we've seen only one such instance in the buildfarm, this
might've been just a cosmic ray bit-flip.  So one part of me says
not to worry too much until we see it again.  OTOH, if it is real
but rare, missing an opportunity to diagnose would be bad.

regards, tom lane




Re: [HACKERS] Set log_line_prefix and application name in test drivers

2016-09-15 Thread Robert Haas
On Sat, Aug 27, 2016 at 3:59 PM, Tom Lane  wrote:
> Christoph Berg  writes:
>> I've always been wondering why we don't set a log_line_prefix by
>> default.
>
> I think the odds of getting to something that everyone would agree on
> are nil, so I'm not excited about getting into that particular
> bikeshed-painting discussion.  Look at the amount of trouble we're
> having converging on a default for the regression tests, which are
> a far narrower use-case than "everybody".

Well, practically anything that includes a PID and the timestamp is
going to be an improvement over the status quo.  Just because we can't
all agree on what would be perfect does not mean that we can't do
better than what we've got now.  +1 for trying.
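For the sake of concreteness, a prefix along these lines would already include a millisecond timestamp and the PID (the exact string below is only one possibility, not a settled proposal):

```ini
# postgresql.conf -- hypothetical default prefix: timestamp with ms, PID,
# and (for session processes only, via %q) the application name
log_line_prefix = '%m [%p] %q%a '
```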

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] shm_mq_set_sender() crash

2016-09-15 Thread Robert Haas
On Sat, Aug 27, 2016 at 3:43 AM, Amit Kapila  wrote:
> On Fri, Aug 26, 2016 at 6:20 PM, Tom Lane  wrote:
>> Latest from lorikeet:
>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2016-08-26%2008%3A37%3A27
>>
>> TRAP: FailedAssertion("!(vmq->mq_sender == ((void *)0))", File: 
>> "/home/andrew/bf64/root/REL9_6_STABLE/pgsql.build/../pgsql/src/backend/storage/ipc/shm_mq.c",
>>  Line: 220)
>>
>
> Do you think it is due to some recent change, or are we just seeing it
> now because it could be a timing-specific issue?
>
> So what seems to be happening here is that during worker startup, we
> are trying to set the sender for a shared memory queue and the same is
> already set.  Now, one theoretical possibility of the same could be
> that the two workers get the same ParallelWorkerNumber which is then
> used to access the shm queue (refer
> ParallelWorkerMain/ExecParallelGetReceiver).  We are setting the
> ParallelWorkerNumber in below code which seems to be doing what it is
> supposed to do:
>
> LaunchParallelWorkers()
> {
> ..
> for (i = 0; i < pcxt->nworkers; ++i)
> {
> memcpy(worker.bgw_extra, &i, sizeof(int));
> if (!any_registrations_failed &&
> RegisterDynamicBackgroundWorker(&worker,
> &pcxt->worker[i].bgwhandle))
> ..
> }
>
> Can some reordering impact the above code?

I don't think so.  Your guess that ParallelWorkerNumber is getting
messed up somehow seems like a good one, but I don't see anything
wrong with that code.  There's actually a pretty long chain here.
That code copies the value of the local variable i into
worker.bgw_extra.  Then, RegisterDynamicBackgroundWorker copies the
whole structure into shared memory.  Then, running inside the
postmaster, BackgroundWorkerStateChange copies it into the postmaster
address space.  But, since this is Windows, that copy isn't actually
passed to the worker; instead, BackgroundWorkerEntry() copies the data
from shared memory into the new worker process's MyBgworkerEntry.
Then BackgroundWorkerMain() copies the data from there to
ParallelWorkerNumber.  In theory any of those places could be going
wrong somehow, though none of them can be completely busted because
they all work at least most of the time.

Of course, it's also possible that the ParallelWorkerNumber code is
entirely correct and something overwrote the null bytes that were
supposed to be found at that location.  It would be very useful to see
(a) the value of ParallelWorkerNumber and (b) the contents of
vmq->mq_sender, and in particular whether that's actually a valid
pointer to a PGPROC in the ProcArray.  But unless we can reproduce
this I don't see how to manage that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2016-09-15 Thread Tom Lane
Andres Freund  writes:
> On 2016-09-15 16:48:59 -0400, Tom Lane wrote:
>> The patch that I posted would run both the generate_series(1, 2) and
>> generate_series(2,4) calls in the same SRF node, forcing them to run in
>> lockstep, after which their results would be fed to the SRF node doing
>> the top-level SRFs.  We could probably change it to run them in separate
>> nodes, but I don't see any principled way to decide which one goes first
>> (and in some variants of this example, it would matter).

> I think that's fine. I personally still think we're *much* better off
> getting rid of the non-lockstep variants. You're still on the fence
> about retaining the LCM behaviour (for the same nesting level at least)?

I'm happy to get rid of the LCM behavior, I just want to have some wiggle
room to be able to get it back if somebody really needs it.

regards, tom lane




Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2016-09-15 Thread Andres Freund
On 2016-09-15 16:48:59 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Hm. One thing I wonder about with this approach, is how we're going to
> > handle something absurd like:
> > SELECT generate_series(1, generate_series(1, 2)), generate_series(1, 
> > generate_series(2,4));
> 
> The patch that I posted would run both the generate_series(1, 2) and
> generate_series(2,4) calls in the same SRF node, forcing them to run in
> lockstep, after which their results would be fed to the SRF node doing
> the top-level SRFs.  We could probably change it to run them in separate
> nodes, but I don't see any principled way to decide which one goes first
> (and in some variants of this example, it would matter).

I think that's fine. I personally still think we're *much* better off
getting rid of the non-lockstep variants. You're still on the fence
about retaining the LCM behaviour (for the same nesting level at least)?


> I think the LATERAL approach would face exactly the same issues: how
> many LATERAL nodes do you use, and what's their join order?

I think this specific issue could be handled in a bit easier to grasp
variant. My PoC basically generated one RTE for each "query
level". There'd have been one RTE for generate_series(1,2), one for
gs(2,4) and one for gs(1, var(gs(1,2))), gs(1, var(gs(2,4))). Lateral
machinery would force the join order to have the argument srfs first, and
then the two combined srfs with lateral arguments after that.

> I think we could get away with defining it like this (ie, SRFs at the same
> SRF nesting level run in lockstep) as long as it's documented.  Whatever
> the current behavior is for such cases would be pretty bizarre too.

Indeed.

Greetings,

Andres Freund




Re: [HACKERS] Tuplesort merge pre-reading

2016-09-15 Thread Peter Geoghegan
On Thu, Sep 15, 2016 at 1:51 PM, Heikki Linnakangas  wrote:
>> I still don't get why you're doing all of this within mergeruns() (the
>> beginning of when we start merging -- we merge all quicksorted runs),
>> rather than within beginmerge() (the beginning of one particular merge
>> pass, of which there are potentially more than one). As runs are
>> merged in a non-final merge pass, fewer tapes will remain active for
>> the next merge pass. It doesn't do to do all that up-front when we
>> have multiple merge passes, which will happen from time to time.
>
>
> Now that the pre-reading is done in logtape.c, it doesn't stop at a run
> boundary. For example, when we read the last 1 MB of the first run on a
> tape, and we're using a 10 MB read buffer, we will merrily also read the
> first 9 MB from the next run. You cannot un-read that data, even if the tape
> is inactive in the next merge pass.

I'm not sure that I like that approach. At the very least, it seems to
not be a good fit with the existing structure of things. I need to
think about it some more, and study how that plays out in practice.

> BTW, does a 1-way merge make any sense?

Not really, no, but it's something that I've seen plenty of times.

This is seen when runs are distributed such that mergeonerun() only
finds one real run on all active tapes, with all other active tapes
having only dummy runs. In general, dummy runs are "logically merged"
(decremented) in preference to any real runs on the same tape (that's
the reason why they exist), so you end up with what we call a "1-way
merge" when you see one real one on one active tape only. You never
see something like "0-way merge" within trace_sort output, though,
because that case is optimized to be almost a no-op.

It's almost a no-op because when it happens then mergeruns() knows to
itself directly decrement the number of dummy runs once for each
active tape, making that "logical merge" completed with only that
simple change in metadata (that is, the "merge" completes by just
decrementing dummy run counts -- no actual call to mergeonerun()).

We could optimize away "1-way merge" cases, perhaps, so that tuples
don't have to be spilt out one at a time (there could perhaps instead
be just some localized change to metadata, a bit like the all-dummy
case). That doesn't seem worth bothering with, especially with this
new approach of yours. I prefer to avoid special cases like that.

-- 
Peter Geoghegan




Re: [HACKERS] Renaming some binaries

2016-09-15 Thread Robert Haas
On Fri, Aug 26, 2016 at 5:42 PM, Peter Eisentraut
 wrote:
> On 8/26/16 12:26 PM, Euler Taveira wrote:
>> initdb: we already have 'pg_ctl init' (since 9.0) and could remove initdb.
>
> I have a concern specifically about pg_ctl.  Depending on how your
> PostgreSQL is packaged, not all of the pg_ctl actions are safe or
> sensible to run.  For example, if you are using systemd, then using
> pg_ctl restart will probably not do the right thing.  And depending on
> SELinux (maybe), running initdb directly might also not do the right
> thing.  In most cases, you can just not use pg_ctl and do all these
> things differently, but for example pg_ctl promote is the only
> documented way to do that thing.
>
> Until we have a better way to figure that out, I wouldn't want to put
> more things into pg_ctl, especially as the only way.

+1.  Actually, my reasons for not wanting to put more stuff into
pg_ctl are mostly different from yours, but I endorse the conclusion,
at least.  I think that it's just not going to be convenient.  initdb
is a complicated command with a bunch of options that are mostly all
different from the options to pg_ctl.  If we merge the two, then
either pg_ctl has to be able to parse the initdb options and pass them
through, or else you have to have some other convention for passing
arguments through to initdb, like pg_ctl init -o "-X /ssd/foo" which
is really not that much fun to remember or type, especially if the
path has spaces in it and thus requires some kind of nested quoting.

I do think it would make sense to add "pg_" to the beginning of most
or all of the command names.  And I do think it would make sense
to just get rid of the scripts that don't do anything more than run a
single command via psql.  Binaries that offer additional functionality
we might as well keep.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Tuplesort merge pre-reading

2016-09-15 Thread Heikki Linnakangas

On 09/15/2016 10:12 PM, Peter Geoghegan wrote:

On Wed, Sep 14, 2016 at 10:43 AM, Heikki Linnakangas  wrote:

Spotted another issue with this code just now. Shouldn't it be based
on state->tapeRange? You don't want the destTape to get memory, since
you don't use batch memory for tapes that are written to (and final
on-the-fly merges don't use their destTape at all).



Wait, you're using a local variable maxTapes here, which potentially
differs from state->maxTapes:



I changed that so that it does actually change state->maxTapes. I considered
having a separate numTapes field, that can be smaller than maxTapes, but we
don't need the original maxTapes value after that point anymore, so it
would've been just pro forma to track them separately. I hope the comment
now explains that better.


I still don't get why you're doing all of this within mergeruns() (the
beginning of when we start merging -- we merge all quicksorted runs),
rather than within beginmerge() (the beginning of one particular merge
pass, of which there are potentially more than one). As runs are
merged in a non-final merge pass, fewer tapes will remain active for
the next merge pass. It doesn't do to do all that up-front when we
have multiple merge passes, which will happen from time to time.


Now that the pre-reading is done in logtape.c, it doesn't stop at a run 
boundary. For example, when we read the last 1 MB of the first run on a 
tape, and we're using a 10 MB read buffer, we will merrily also read the 
first 9 MB from the next run. You cannot un-read that data, even if the 
tape is inactive in the next merge pass.


I don't think it makes much difference in practice, because most merge 
passes use all, or almost all, of the available tapes. BTW, I think the 
polyphase algorithm prefers to do all the merges that don't use all 
tapes upfront, so that the last final merge always uses all the tapes. 
I'm not 100% sure about that, but that's my understanding of the 
algorithm, and that's what I've seen in my testing.



Correct me if I'm wrong, but I think that you're more skeptical of the
need for polyphase merge than I am. I at least see no reason to not
keep it around. I also think it has some value. It doesn't make this
optimization any harder, really.


We certainly still need multi-pass merges.

BTW, does a 1-way merge make any sense? I was surprised to see this in 
the log, even without this patch:


LOG:  finished 1-way merge step: CPU 0.62s/7.22u sec elapsed 8.43 sec
STATEMENT:  SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER 
BY i) t;

LOG:  finished 1-way merge step: CPU 0.62s/7.22u sec elapsed 8.43 sec
STATEMENT:  SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER 
BY i) t;

LOG:  finished 1-way merge step: CPU 0.62s/7.22u sec elapsed 8.43 sec
STATEMENT:  SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER 
BY i) t;

LOG:  finished 1-way merge step: CPU 0.62s/7.22u sec elapsed 8.43 sec
STATEMENT:  SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER 
BY i) t;

LOG:  finished 3-way merge step: CPU 0.62s/7.23u sec elapsed 8.44 sec
STATEMENT:  SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER 
BY i) t;

LOG:  finished 6-way merge step: CPU 0.62s/7.24u sec elapsed 8.44 sec
STATEMENT:  SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER 
BY i) t;

LOG:  finished 6-way merge step: CPU 0.62s/7.24u sec elapsed 8.45 sec
STATEMENT:  SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER 
BY i) t;


- Heikki




Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2016-09-15 Thread Tom Lane
Andres Freund  writes:
> Hm. One thing I wonder about with this approach, is how we're going to
> handle something absurd like:
> SELECT generate_series(1, generate_series(1, 2)), generate_series(1, 
> generate_series(2,4));

The patch that I posted would run both the generate_series(1, 2) and
generate_series(2,4) calls in the same SRF node, forcing them to run in
lockstep, after which their results would be fed to the SRF node doing
the top-level SRFs.  We could probably change it to run them in separate
nodes, but I don't see any principled way to decide which one goes first
(and in some variants of this example, it would matter).  I think the
LATERAL approach would face exactly the same issues: how many LATERAL
nodes do you use, and what's their join order?

I think we could get away with defining it like this (ie, SRFs at the same
SRF nesting level run in lockstep) as long as it's documented.  Whatever
the current behavior is for such cases would be pretty bizarre too.

regards, tom lane




Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2016-09-15 Thread Tom Lane
Thomas Munro  writes:
> Very interesting.  Perhaps that is why NetBSD shows a speedup with the
> kqueue patch[1] but FreeBSD doesn't.  I guess that if I could get the
> kqueue patch to perform better on large FreeBSD systems, it would also
> be a solution to this problem.

I just noticed that kqueue appears to offer a solution to this problem,
ie one of the things you can wait for is exit of another process (named
by PID, looks like).  If that's portable to all kqueue platforms, then
integrating a substitute for the postmaster death pipe might push that
patch over the hump to being a net win.

regards, tom lane




Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2016-09-15 Thread Andres Freund
Hi,

On 2016-09-14 19:28:25 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2016-09-12 19:35:22 -0400, Tom Lane wrote:
> >> Anyway I'll draft a prototype and then we can compare.
> 
> > Ok, cool.
> 
> Here's a draft patch that is just meant to investigate what the planner
> changes might look like if we do it in the pipelined-result way.

Nice.


> A difficulty with this restriction is that if you have a query like
> "select f1, generate_series(1,2) / 10 from tab" then you end up with both
> a SRF-executing Result and a separate scalar-projection Result above it,
> because the division-by-ten has to happen in a separate plan level.

Makes sense.  I guess we could teach the SRF pipeline node to execute a
series of such steps. Hm. That makes me think of something:

Hm. One thing I wonder about with this approach, is how we're going to
handle something absurd like:
SELECT generate_series(1, generate_series(1, 2)), generate_series(1, 
generate_series(2,4));

I guess what we have to do here is
Step: generate_series(1,2), 1, 2, 4
Step: generate_series(1, Var(generate_series(1,2))), 1, 2, 4
Step: Var(generate_series(1, Var(generate_series(1,2)))), 1, generate_series(2,4)
Step: Var(generate_series(1, Var(generate_series(1,2)))), generate_series(1, Var(generate_series(2,4)))

But that'd still not have the same lockstepping behaviour, right?  I'm
at a conference, and half-ill, so I might just be standing on my own brain
here.


> The planner's notions about the cost of Result make it think that this is
> quite expensive --- mainly because the upper Result will be iterated once
> per SRF output row, so that you get hit with cpu_tuple_cost per output row.
> And that in turn leads it to change plans in one or two cases in the
> regression tests.  Maybe that's fine.  I'm worried though that it's right
> that this will be unduly expensive.  So I'm kind of tempted to define the
> SRF-executing node as acting more like, say, Agg or WindowFunc, in that
> it has a list of SRFs to execute and then it has the ability to project a
> scalar tlist on top of those results.

Hah, was thinking the same above ;)


> On the whole I'm pretty pleased with this approach, at least from the
> point of view of the planner.  The net addition of planner code is
> smaller than what you had,

Not by much. But I do agree that there's some advantages here.


> and though I'm no doubt biased, I think this
> version is much cleaner.

Certainly seems a bit easier to extend and adjust behaviour. Not having
to deal with enforcing join order, and having fewer issues with
determining what to push where is certainly advantageous.  After all,
that was why I initially was thinking of this approach.

> Also, though this patch doesn't address exactly
> how we might do it, it's fairly clear that it'd be possible to allow
> FDWs and CustomScans to implement SRF execution, eg pushing a SRF down to
> a foreign server, in a reasonably straightforward extension of the
> existing upper-pathification hooks.  If we go with the lateral function
> RTE approach, that's going to be somewhere between hard and impossible.

Hm. Not sure if there's that much point in doing that, but I agree that
the LATERAL approach adds more restrictions.


> So I think we should continue investigating this way of doing things.
> I'll try to take a look at the executor end of it tomorrow.  However
> I'm leaving Friday for a week's vacation, and may not have anything to
> show before that.

If you have something that's halfway recognizable, could you perhaps
post it?

Regards,

Andres




Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2016-09-15 Thread Tom Lane
Marco Pfatschbacher  writes:
> the current implementation of PostmasterIsAlive() uses a pipe to
> monitor the existence of the postmaster process.
> One end of the pipe is held open in the postmaster, while the other end is
> inherited to all the auxiliary and background processes when they fork.
> This leads to multiple processes calling select(2), poll(2) and read(2)
> on the same end of the pipe.
> While this is technically perfectly ok, it has the unfortunate side
> effect that it triggers an inefficient behaviour[0] in the select/poll
> implementation on some operating systems[1]:
> The kernel can only keep track of one pid per select address and
> thus has no other choice than to wakeup(9) every process that
> is waiting on select/poll.

That seems like a rather bad kernel bug.

> In our case the system had to wakeup ~3000 idle ssh processes
> every time postgresql did call PostmasterIsAlive.

Uh, these are processes not even connected to the pipe in question?
That's *really* a kernel bug.

> Attached patch avoids the select contention by using a
> separate pipe for each auxiliary and background process.

I think this would likely move the performance problems somewhere else.
In particular, it would mean that every postmaster child would inherit
pipes leading to all the older children.  We could close 'em again
I guess, but we have previously found that having to do things that
way is a rather serious performance drag --- see the problems we had
with POSIX named semaphores, here for instance:
https://www.postgresql.org/message-id/flat/3830CBEB-F8CE-4EBC-BE16-A415E78A4CBC%40apple.com
I really don't want the postmaster to be holding any per-child kernel
resources.

It'd certainly be nice if we could find another solution besides
the pipe-based one, but I don't think "more pipes" is the answer.
There was some discussion of using Linux's prctl(PR_SET_PDEATHSIG)
when available; do the BSDen have anything like that?

regards, tom lane




Re: [HACKERS] OpenSSL 1.1 breaks configure and more

2016-09-15 Thread Heikki Linnakangas

On 09/15/2016 05:33 PM, Christoph Berg wrote:

Re: Michael Paquier 2016-09-15 


On Thu, Sep 15, 2016 at 8:57 PM, Heikki Linnakangas  wrote:

I backpatched this to 9.5, but not further than that. The functions this
modified were moved around in 9.5, so the patch wouldn't apply as is. It
wouldn't be difficult to back-patch further if there's demand, but I'm not
eager to do that until someone complains.


Not going older than 9.5 may be fine:
https://www.openssl.org/blog/blog/2014/12/23/the-new-release-strategy/
https://wiki.freebsd.org/OpenSSL
As far as I can see 1.0.2 would be supported until Dec 2019, so that
would just overlap with 9.4's EOL.


I'm afraid it's not that easy - Debian 9 (stretch) will release at the
beginning of next year, and apt.postgresql.org will want to build
9.2/9.3/9.4 for that distribution. I guess yum.postgresql.org will
have the same problem with the next Fedora release.


Can you elaborate? Are you saying that Debian 9 (stretch) will not ship 
OpenSSL 1.0.2 anymore, and will require using OpenSSL 1.1.0?


- Heikki





Re: [HACKERS] Postgres abort found in 9.3.11

2016-09-15 Thread Tom Lane
"K S, Sandhya (Nokia - IN/Bangalore)"  writes:
> We tried to replicate the scenario without our patch(exiting postmaster) and 
> still we were able to see the issue.

> Same error was seen this time as well.
> node-0 postgres[8243]: [1-2] HINT:  Is another postmaster already running on 
> port 5433? If not, wait a few seconds and retry.  
> node-1 postgres[8650]: [18-1] PANIC:  btree_xlog_delete_get_latestRemovedXid: 
> cannot operate with inconsistent data

> Crash was not seen in 9.3.9 without the patch but it was reproduced in 9.3.11.
> So something specifically changed between 9.3.9 and 9.3.11 is causing the 
> issue.

Well, I looked through the git history from 9.3.9 to 9.3.11 and I don't
see anything that seems likely to explain a problem here.

If you can reproduce this, which it sounds like you can, maybe you could
create a self-contained test case for other people to try?

Also worth noting is that the current 9.3.x release is 9.3.14.  You
might save yourself some time by updating and seeing if it still
reproduces in 9.3.14.

regards, tom lane




Re: [HACKERS] WIP: About CMake v2

2016-09-15 Thread Yury Zhuravlev

Michael Paquier wrote:

Could it be possible to get a refreshed patch on this thread for at
least the sake of the archives? I'd really like to see somehting
happening here and do some progress for this CF.


Sure, I will do it on Friday.
Today I finished mingw+msys support. (mingw without msys has not passed 
some tests)

--
Yury Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2016-09-15 Thread Andres Freund
On 2016-09-15 15:23:58 -0400, Tom Lane wrote:
> Andres Freund  writes:
> >   In my present patch I've *ripped out* the support for materialization
> >   in nodeFunctionscan.c entirely. That means that rescans referencing
> >   volatile functions can change their behaviour (if a function is
> >   rescanned, without having it's parameters changed), and that native
> >   backward scan support is gone.  I don't think that's actually an issue.
> 
> I think you are wrong on this not being an issue: it is critical that
> rescan deliver the same results as before, else for example having a
> function RTE on the inside of a nestloop will give nonsensical/broken
> results.

I find that quite unconvincing. We quite freely re-evaluate functions in
the targetlist again, even if they're volatile and/or SRFs.

If we implement tSRFs as pipeline nodes, we can "simply" default to the
never materializing behaviour there I guess.


> Moreover, I think we'd all agreed that this effort needs to avoid any
> not-absolutely-necessary semantics changes.

I don't agree with that. Adding pointless complications for niche
edge cases of niche features isn't worth it.


> Another idea is that we could extend the set of ExecInitNode flags
> (EXEC_FLAG_REWIND etc) to tell child nodes whether they need to implement
> rescan correctly in this sense; if they are not RHS children of nestloops,
> and maybe one or two other cases, they don't.  That would give another
> route by which nodeFunctionscan could decide that it can skip
> materialization in common cases.

That's something I've wondered about too. Materializing if rescans are
required is quite acceptable, and probably rather rare in
practice. Seems not unlikely that that information would be valuable for
other node types too.


Regards,

Andres




Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2016-09-15 Thread Thomas Munro
On Fri, Sep 16, 2016 at 1:57 AM, Marco Pfatschbacher
 wrote:
> the current implementation of PostmasterIsAlive() uses a pipe to
> monitor the existence of the postmaster process.
> One end of the pipe is held open in the postmaster, while the other end is
> inherited to all the auxiliary and background processes when they fork.
> This leads to multiple processes calling select(2), poll(2) and read(2)
> on the same end of the pipe.
> While this is technically perfectly ok, it has the unfortunate side
> effect that it triggers an inefficient behaviour[0] in the select/poll
> implementation on some operating systems[1]:
> The kernel can only keep track of one pid per select address and
> thus has no other choice than to wakeup(9) every process that
> is waiting on select/poll.
>
> [...]
>
>  BUGS
>  [...]
>  "Internally to the kernel, select() and pselect() work poorly if multiple
>  processes wait on the same file descriptor. Given that, it is rather
>  surprising to see that many daemons are written that way."
>
> [1]
>  At least OpenBSD and NetBSD are affected, FreeBSD rewrote
>  their select implementation in 8.0.

Very interesting.  Perhaps that is why NetBSD shows a speedup with the
kqueue patch[1] but FreeBSD doesn't.  I guess that if I could get the
kqueue patch to perform better on large FreeBSD systems, it would also
be a solution to this problem.

[1] 
https://www.postgresql.org/message-id/flat/CAEepm%3D2i78TOJeV4O0-0meiihiRfVQ29ur7%3DMBHxsUKaPSWeAg%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2016-09-15 Thread Tom Lane
Andres Freund  writes:
> 0003-Avoid-materializing-SRFs-in-the-FROM-list.patch
>   To avoid performance regressions from moving SRFM_ValuePerCall SRFs to
>   ROWS FROM, nodeFunctionscan.c needs to support not materializing
>   output.

I looked over this patch a bit.

>   In my present patch I've *ripped out* the support for materialization
>   in nodeFunctionscan.c entirely. That means that rescans referencing
>   volatile functions can change their behaviour (if a function is
>   rescanned, without having its parameters changed), and that native
>   backward scan support is gone.  I don't think that's actually an issue.

I think you are wrong on this not being an issue: it is critical that
rescan deliver the same results as before, else for example having a
function RTE on the inside of a nestloop will give nonsensical/broken
results.  I think what we'll have to do is allow the optimization of
skipping the tuplestore only when the function is declared nonvolatile.
(If it is, and it nonetheless gives different results on rescan, it's not
our fault if joins give haywire answers.)  I'm okay with not supporting
backward scan, but wrong answers during rescan is a different animal
entirely.

Moreover, I think we'd all agreed that this effort needs to avoid any
not-absolutely-necessary semantics changes.  This one is not only not
necessary, but it would result in subtle hard-to-detect breakage.

It's conceivable that we could allow the executor to be broken this way
and have the planner fix it by inserting a Material node when joining.
But I think it would be messy and would probably not perform as well as
an internal tuplestore --- for one thing, because the planner can't know
whether the function would return a tuplestore, making the external
materialization redundant.

Another idea is that we could extend the set of ExecInitNode flags
(EXEC_FLAG_REWIND etc) to tell child nodes whether they need to implement
rescan correctly in this sense; if they are not RHS children of nestloops,
and maybe one or two other cases, they don't.  That would give another
route by which nodeFunctionscan could decide that it can skip
materialization in common cases.

regards, tom lane




Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2016-09-15 Thread Andres Freund
Hi,

On 2016-09-15 15:57:55 +0200, Marco Pfatschbacher wrote:
> the current implementation of PostmasterIsAlive() uses a pipe to
> monitor the existence of the postmaster process.
> One end of the pipe is held open in the postmaster, while the other end is
> inherited by all the auxiliary and background processes when they fork.
> This leads to multiple processes calling select(2), poll(2) and read(2)
> on the same end of the pipe.
> While this is technically perfectly ok, it has the unfortunate side
> effect that it triggers an inefficient behaviour[0] in the select/poll
> implementation on some operating systems[1]:
> The kernel can only keep track of one pid per select address and
> thus has no other choice than to wakeup(9) every process that
> is waiting on select/poll.

Yikes, that's a pretty absurd implementation.

Does openbsd's kqueue implementation have the same issue? There's
http://archives.postgresql.org/message-id/CAEepm%3D37oF84-iXDTQ9MrGjENwVGds%2B5zTr38ca73kWR7ez_tA%40mail.gmail.com

> Attached patch avoids the select contention by using a
> separate pipe for each auxiliary and background process.

I'm quite unenthusiastic about forcing that many additional file
descriptors onto the postmaster...

I'm not quite sure I understand why this is an issue here - there shouldn't
ever be events on this fd, so why is the kernel waking up all processes?
It'd kinda make sense for it to wake up all processes if there were an
event waiting, but ... ?


>  BUGS
>  [...]
>  "Internally to the kernel, select() and pselect() work poorly if multiple
>  processes wait on the same file descriptor. Given that, it is rather
>  surprising to see that many daemons are written that way."

Gee. Maybe it's more surprising that that issue isn't being addressed?


Regards,

Andres




[HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2016-09-15 Thread Marco Pfatschbacher
Hi,

the current implementation of PostmasterIsAlive() uses a pipe to
monitor the existence of the postmaster process.
One end of the pipe is held open in the postmaster, while the other end is
inherited by all the auxiliary and background processes when they fork.
This leads to multiple processes calling select(2), poll(2) and read(2)
on the same end of the pipe.
While this is technically perfectly ok, it has the unfortunate side
effect that it triggers an inefficient behaviour[0] in the select/poll
implementation on some operating systems[1]:
The kernel can only keep track of one pid per select address and
thus has no other choice than to wakeup(9) every process that
is waiting on select/poll.

In our case the system had to wake up ~3000 idle ssh processes
every time PostgreSQL called PostmasterIsAlive.
The WAL receiver triggered this at a rate of ~400 calls per second,
with the result that the system performed very badly,
being mostly busy scheduling idle processes.


Attached patch avoids the select contention by using a
separate pipe for each auxiliary and background process.
Since the postmaster has three different ways to create
new processes, the patch got a bit more complicated than 
I anticipated :)

For auxiliary processes, pgstat, pgarch and the autovacuum launcher
get a preallocated pipe each. The pipes are held in:
  postmaster_alive_fds_own[NUM_AUXPROCTYPES];
  postmaster_alive_fds_watch[NUM_AUXPROCTYPES];

Just before we fork a new process we set postmaster_alive_fd
for each process type:
  postmaster_alive_fd = postmaster_alive_fds_watch[type];

Since there can be multiple backend processes, BackendStartup()
allocates a pipe on demand and keeps the reference in the Backend
structure; the pipe is closed when the backend terminates.

The patch was developed and tested under OpenBSD using the REL9_4_STABLE
branch.  I've merged it to current, compile tested and ran make check
on Ubuntu 14.04.

   Marco

[0]
 http://man.openbsd.org/OpenBSD-5.9/man2/select.2?manpath=OpenBSD-5.9

 BUGS
 [...]
 "Internally to the kernel, select() and pselect() work poorly if multiple
 processes wait on the same file descriptor. Given that, it is rather
 surprising to see that many daemons are written that way."

[1]
 At least OpenBSD and NetBSD are affected, FreeBSD rewrote
 their select implementation in 8.0.
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 1a92ca1..a3b4497 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -369,6 +369,13 @@ StartAutoVacLauncher(void)
 {
 	pid_t		AutoVacPID;
 
+#ifndef WIN32
+	/*
+	 * Set the monitoring pipe for PostmasterIsAlive check
+	 */
+	postmaster_alive_fd = postmaster_alive_fds_watch[AutoVacLauncherProcess];
+#endif
+
 #ifdef EXEC_BACKEND
 	switch ((AutoVacPID = avlauncher_forkexec()))
 #else
@@ -386,7 +393,7 @@ StartAutoVacLauncher(void)
 			InitPostmasterChild();
 
 			/* Close the postmaster's sockets */
-			ClosePostmasterPorts(false);
+			ClosePostmasterPorts(false, true);
 
 			AutoVacLauncherMain(0, NULL);
 			break;
@@ -1447,7 +1454,7 @@ StartAutoVacWorker(void)
 			InitPostmasterChild();
 
 			/* Close the postmaster's sockets */
-			ClosePostmasterPorts(false);
+			ClosePostmasterPorts(false, false);
 
 			AutoVacWorkerMain(0, NULL);
 			break;
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 1aa6466..3ff3479 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -138,6 +138,13 @@ pgarch_start(void)
 		return 0;
 	last_pgarch_start_time = curtime;
 
+#ifndef WIN32
+	/*
+	 * Set the monitoring pipe for PostmasterIsAlive check
+	 */
+	postmaster_alive_fd = postmaster_alive_fds_watch[PgArchiverProcess];
+#endif
+
 #ifdef EXEC_BACKEND
 	switch ((pgArchPid = pgarch_forkexec()))
 #else
@@ -155,7 +162,7 @@ pgarch_start(void)
 			InitPostmasterChild();
 
 			/* Close the postmaster's sockets */
-			ClosePostmasterPorts(false);
+			ClosePostmasterPorts(false, true);
 
 			/* Drop our connection to postmaster's shared memory, as well */
 			dsm_detach_all();
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 8a2ce91..600fb14 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -676,6 +676,13 @@ pgstat_start(void)
 		return 0;
 	last_pgstat_start_time = curtime;
 
+#ifndef WIN32
+	/*
+	 * Set the monitoring pipe for PostmasterIsAlive check
+	 */
+	postmaster_alive_fd = postmaster_alive_fds_watch[PgStatProcess];
+#endif
+
 	/*
 	 * Okay, fork off the collector.
 	 */
@@ -696,7 +703,7 @@ pgstat_start(void)
 			InitPostmasterChild();
 
 			/* Close the postmaster's sockets */
-			ClosePostmasterPorts(false);
+			ClosePostmasterPorts(false, true);
 
 			/* Drop our connection to postmaster's shared memory, as well */
 			dsm_detach_all();
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index eaf3f61..6eb8f48 100644
--- 

Re: [HACKERS] Postgres abort found in 9.3.11

2016-09-15 Thread K S, Sandhya (Nokia - IN/Bangalore)
Hello,

We tried to replicate the scenario without our patch (exiting postmaster),
and we were still able to see the issue.

Same error was seen this time as well.
node-0 postgres[8243]: [1-2] HINT:  Is another postmaster already running on 
port 5433? If not, wait a few seconds and retry.  
node-1 postgres[8650]: [18-1] PANIC:  btree_xlog_delete_get_latestRemovedXid: 
cannot operate with inconsistent data

The crash was not seen in 9.3.9 without the patch, but it was reproduced in 9.3.11.
So something that changed between 9.3.9 and 9.3.11 is causing the issue.

Thanks in advance!!!

Sandhya

-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Tuesday, September 06, 2016 5:04 PM
To: K S, Sandhya (Nokia - IN/Bangalore) 
Cc: pgsql-hackers@postgresql.org; Itnal, Prakash (Nokia - IN/Bangalore) 

Subject: Re: [HACKERS] Postgres abort found in 9.3.11

"K S, Sandhya (Nokia - IN/Bangalore)"  writes:
> I was able to find a patch file where there is a call to ExitPostmaster() in 
> postmaster.c . 

> @@ -3081,6 +3081,11 @@
> shmem_exit(1);
> reset_shared(PostPortNumber);
 
> +   /* recovery termination */
> +   ereport(FATAL,
> +   (errmsg("recovery termination due to process crash")));
> +   ExitPostmaster(99);
> +
> StartupPID = StartupDataBase();
> Assert(StartupPID != 0); 
> pmState = PM_STARTUP;

There's no such code in the community sources, and I can't say that
such a patch looks like a bright idea to me.  It would disable any
restart after a crash (not only during recovery).

If you're running a version with assorted random non-community patches,
we can't really offer much support for that.

regards, tom lane




Re: [HACKERS] Tuplesort merge pre-reading

2016-09-15 Thread Peter Geoghegan
On Wed, Sep 14, 2016 at 10:43 AM, Heikki Linnakangas  wrote:
> Addressed all your comments one way or another, new patch attached. Comments
> on some specific points below:

Cool. My response here is written under time pressure, which is not
ideal. I think it's still useful, though.

>> * You should probably point out that typically, access to batch memory
>> will still be sequential, despite your block-based scheme.

> That's not true, the "buffers" in batch memory are not accessed
> sequentially. When we pull the next tuple from a tape, we store it in the
> next free buffer. Usually, that buffer was used to hold the previous tuple
> that was returned from gettuple(), and was just added to the free list.
>
> It's still quite cache-friendly, though, because we only need a small number
> of slots (one for each tape).

That's kind of what I meant, I think -- it's more or less sequential.
Especially in the common case where there is only one merge pass.

> True. I fixed that by putting a MaxAllocSize cap on the buffer size instead.
> I doubt that doing > 1 GB of read-ahead of a single tape will do any good.

You may well be right about that, but ideally that could be verified.
I think that the tuplesort is entitled to have whatever memory the
user makes available, unless that's almost certainly useless. It
doesn't seem like our job to judge that it's always wrong to use extra
memory with only a small expected benefit. If it's actually a
microscopic expected benefit, or just as often negative to the sort
operation's performance, then I'd say it's okay to cap it at
MaxAllocSize. But it's not obvious to me that this is true; not yet,
anyway.

> Hmm. We don't really need the availMem accounting at all, after we have
> started merging. There is nothing we can do to free memory if we run out,
> and we use fairly little memory anyway. But yes, better safe than sorry. I
> tried to clarify the comments on that.

It is true that we don't really care about the accounting at all. But,
the same applies to the existing grow_memtuples() case at the
beginning of merging. The point is, we *do* care about availMem, this
one last time. We must at least produce a sane (i.e. >= 0) number in
any calculation. (I think you understand this already -- just saying.)

> OK. I solved that by calling LogicalTapeRewind(), when we're done reading a
> tape. Rewinding a tape has the side-effect of freeing the buffer. I was
> going to put that into mergereadnext(), but it turns out that it's tricky to
> figure out if there are any more runs on the same tape, because we have the
> "actual" tape number there, but the tp_runs is indexed by "logical" tape
> number. So I put the rewind calls at the end of mergeruns(), and in
> TSS_FINALMERGE processing, instead. It means that we don't free the buffers
> quite as early as we could, but I think this is good enough.

That seems adequate.

>> Spotted another issue with this code just now. Shouldn't it be based
>> on state->tapeRange? You don't want the destTape to get memory, since
>> you don't use batch memory for tapes that are written to (and final
>> on-the-fly merges don't use their destTape at all).

>> Wait, you're using a local variable maxTapes here, which potentially
>> differs from state->maxTapes:

> I changed that so that it does actually change state->maxTapes. I considered
> having a separate numTapes field, that can be smaller than maxTapes, but we
> don't need the original maxTapes value after that point anymore, so it
> would've been just pro forma to track them separately. I hope the comment
> now explains that better.

I still don't get why you're doing all of this within mergeruns() (the
beginning of when we start merging -- we merge all quicksorted runs),
rather than within beginmerge() (the beginning of one particular merge
pass, of which there are potentially more than one). As runs are
merged in a non-final merge pass, fewer tapes will remain active for
the next merge pass. It doesn't do to do all that up-front when we
have multiple merge passes, which will happen from time to time.

Correct me if I'm wrong, but I think that you're more skeptical of the
need for polyphase merge than I am. I at least see no reason to not
keep it around. I also think it has some value. It doesn't make this
optimization any harder, really.

> Hmm, yes, using currentRun here is wrong. It needs to be "currentRun + 1",
> because we need one more tape than there are runs, to hold the output.

As I said, I think it should be the number of active tapes, as you see
today within beginmerge() + mergebatch(). Why not do it that way? If
you don't get the distinction, see my remarks below on final merges
always using batch memory, even when there are to be multiple merge
passes (no reason to add that restriction here). More generally, I
really don't want to mess with the long standing definition of
maxTapes and things like that, because I see no advantage.

> Ah, no, the "+ 1" comes from the 

Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-09-15 Thread Claudio Freire
On Thu, Sep 15, 2016 at 3:48 PM, Tomas Vondra
 wrote:
> For example, we always allocate the TID array as large as we can fit into
> m_w_m, but maybe we don't need to wait with switching to the bitmap until
> filling the whole array - we could wait as long as the bitmap fits into the
> remaining part of the array, build it there and then copy it to the
> beginning (and use the bitmap from that point).

The bitmap can be created like that, but grow from the end of the
segment backwards.

So the scan can proceed until the bitmap fills the whole segment
(filling backwards), no copy required.

I'll try that later, but first I'd like to get multiarray approach
right since that's the basis of it.




Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-09-15 Thread Tomas Vondra



On 09/15/2016 06:40 PM, Robert Haas wrote:
> On Thu, Sep 15, 2016 at 12:22 PM, Tom Lane  wrote:
>> Tomas Vondra  writes:
>>> On 09/14/2016 07:57 PM, Tom Lane wrote:
>>>> People who are vacuuming because they are out of disk space will be very
>>>> very unhappy with that solution.
>>>
>>> The people are usually running out of space for data, while these files
>>> would be temporary files placed wherever temp_tablespaces points to. I'd
>>> argue if this is a source of problems, the people are already in deep
>>> trouble due to sorts, CREATE INDEX, ... as those commands may also
>>> generate a lot of temporary files.
>>
>> Except that if you are trying to recover disk space, VACUUM is what you
>> are doing, not CREATE INDEX.  Requiring extra disk space to perform a
>> vacuum successfully is exactly the wrong direction to be going in.
>> See for example this current commitfest entry:
>> https://commitfest.postgresql.org/10/649/
>> Regardless of what you think of the merits of that patch, it's trying
>> to solve a real-world problem.  And as Robert has already pointed out,
>> making this aspect of VACUUM more complicated is not solving any
>> pressing problem.  "But we made it faster" is going to be a poor answer
>> for the next person who finds themselves up against the wall with no
>> recourse.
>
> I very much agree.



How does VACUUM alone help with recovering disk space? AFAIK it only 
makes the space available for new data, it does not reclaim the disk 
space at all. Sure, we truncate empty pages at the end of the last 
segment, but how likely is that in practice? What I do see people doing 
is usually either VACUUM FULL (which is however doomed for obvious 
reasons) or VACUUM + reindexing to get rid of index bloat (which however 
leads to CREATE INDEX using temporary files).


I'm not sure I agree with your claim there's no pressing problem. We do 
see quite a few people having to do VACUUM with multiple index scans 
(because the TIDs don't fit into m_w_m), which certainly has significant 
impact on production systems - both in terms of performance and it also 
slows down reclaiming the space. Sure, being able to set m_w_m above 1GB 
is an improvement, but perhaps using a more efficient TID storage would 
improve the situation further. Writing the TIDs to a temporary file may 
not be the right approach, but I don't see why that would make the original
problem less severe?


For example, we always allocate the TID array as large as we can fit 
into m_w_m, but maybe we don't need to wait with switching to the bitmap 
until filling the whole array - we could wait as long as the bitmap fits 
into the remaining part of the array, build it there and then copy it to 
the beginning (and use the bitmap from that point).


regards
Tomas




Re: [HACKERS] RLS related docs

2016-09-15 Thread Robert Haas
On Sun, Aug 28, 2016 at 4:23 PM, Joe Conway  wrote:
>>> For COPY, I think perhaps it would be more logical to put the new note
>>> immediately after the third note which describes the privileges
>>> required, since it's kind of related, and then we can talk about the
>>> RLS policies required, e.g.:
>>>
>>> If row-level security is enabled for the table, COPY table TO is
>>> internally converted to COPY (SELECT * FROM table) TO, and the
>>> relevant security policies are applied. Currently, COPY FROM is not
>>> supported for tables with row-level security.
>>
>> This sounds better than what I had, so I will do it that way.
>
> Apologies for the delay, but new patch attached. Assuming no more
> comments, will commit this, backpatched to 9.5, in a day or two.

I don't think this was ever committed, but my comment is that it seems
to be exposing rather more of the implementation than is probably
wise.  Can't we say that SELECT policies will apply rather than saying
that it is internally converted to a SELECT?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RLS related docs

2016-09-15 Thread Robert Haas
On Tue, Aug 30, 2016 at 3:05 AM, Dean Rasheed  wrote:
> On 28 August 2016 at 21:23, Joe Conway  wrote:
>> Apologies for the delay, but new patch attached. Assuming no more
>> comments, will commit this, backpatched to 9.5, in a day or two.
>
> Looking at this again, I think there is something fishy about these
> dump/restore flags.
>
> If you do pg_dump --enable-row-security, then row_security is turned
> on during the dump and only the user-visible portions of the tables
> are dumped. But why does such a dump emit "SET row_security = on;" as
> part of the dump? There doesn't appear to be any reason for having
> row_security turned on during the restore just because it was on
> during the dump. The INSERT policies may well be different from the
> SELECT policies, and so this may lead to a dump that cannot be
> restored. ISTM that row_security should be off inside the dump, and
> only enabled during restore if the user explicitly asks for it,
> regardless of what setting was used to produce the dump.

I think you are right about this.

> Also, isn't it the case that --enable-row-security during pg_restore
> is only relevant when performing a data-only restore (like
> --disable-triggers). Otherwise, it looks to me as though the restore
> will create the tables, restore the data, and then only at the end
> restore the table policies and enable row level security on the
> tables. So it looks like the flag would have no effect (and a
> COPY-format dump would work fine) for a non-data-only dump.

Hmm.  That seems odd.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GiST: interpretation of NaN from penalty function

2016-09-15 Thread Andrew Borodin
> Certainly, "NaN means infinity", as your patch proposes, has no more basis to 
> it than "NaN means zero".
You are absolutely right. Now I see that the best interpretation is "NaN
means NaN". It seems we need only drop a check: nodes with NaN
penalties will then be considered even worse than those with an infinity
penalty.
Penalty calculation is CPU performance critical; it is called for
every tuple on a page along the insertion path. Omitting this check will
speed this up... a tiny bit.
> If the penalty function doesn't like that interpretation, it shouldn't return 
> NaN.
It may return NaN accidentally. If a NaN passes through the union()
function, the index will be poisoned.
That's not a good contract for an extension developer to have to remember.


Regards, Andrey Borodin.




Re: [HACKERS] Use pread and pwrite instead of lseek + write and read

2016-09-15 Thread Tom Lane
Oskari Saarenmaa  writes:
> 17.08.2016, 22:11, Tom Lane kirjoitti:
>> I'd be more excited about this if the claimed improvement were more than
>> 1.5%, but you know as well as I do that that's barely above the noise
>> floor for most performance measurements.  I'm left wondering why bother,
>> and why take any risk of de-optimizing on some platforms.

> I think it makes sense to try to optimize for the platforms that people 
> actually use for performance critical workloads, especially if it also 
> allows us to simplify the code and remove more lines than we add.  It's 
> nice if the software still works on legacy platforms, but I don't think 
> we should be concerned about a hypothetical performance impact on 
> platforms no one uses in production anymore.

Well, my point remains that I see little value in messing with
long-established code if you can't demonstrate a benefit that's clearly
above the noise level.  We don't really know whether this change might
have adverse impacts somewhere --- either performance-wise or bug-wise ---
and for that amount of benefit I don't want to take the trouble to find
out.

regards, tom lane




Re: [HACKERS] LOCK TABLE .. DEFERRABLE

2016-09-15 Thread Robert Haas
On Tue, Sep 6, 2016 at 6:04 AM, Simon Riggs  wrote:
> On 1 September 2016 at 21:28, Simon Riggs  wrote:
>> So the only way to handle multiple locks is to do this roughly the way
>> Rod suggests.
>
> I've just been reading the VACUUM code and it turns out that we
> already use Rod's mechanism internally. So on that basis it seems fine
> to support this as a useful user-level feature. If there is a better
> way of doing it, then that can be added later.
>
> My proposed changes to this patch are these
>
> 1. Rename this WAIT PATIENTLY, which is IMHO a better description of
> what is being requested. Bikeshedding welcome.
>
> 2. Introduce a new API call LockTablePatiently() that returns bool. So
> its usage is similar to ConditionalLockTable(), the only difference is
> you supply some other wait parameters with it. This isolates the
> internal mechanism from the usage, so allows us to more easily support
> any fancy new way of doing this we think of later.
>
> 3. Use LockTablePatiently() within lockcmds.c where appropriate
>
> 4. Replace the code for waiting in VACUUM with the new call to
> LockTablePatiently()
>
> So I see this as 2 patches: 1) new API and make VACUUM use new API, 2)
> Rod's LOCK TABLE patch
>
> First patch attached, requires also lock by Oid.  If we agree, Rod,
> please update your patch to match?

Aside from the fact that polling is generally inefficient and wasteful
of system resources, this allows for undetected deadlocks.  Consider:

S1: LOCK TABLE A;
S2: LOCK TABLE B;
S1: LOCK TABLE B; -- blocks
S2: LOCK TABLE A PATIENTLY; -- retries forever

Livelock might be possible, too.

I think it would be better to think harder about what would be
required to implement this natively in the lock manager.  Suppose we
add a flag to each PROCLOCK which, if true, indicates that the lock
request is low priority.  Also, we add a counter to each LOCK
indicating the number of pending low-priority lock requests.  When
LockAcquireExtended reaches this code here...

    if (lockMethodTable->conflictTab[lockmode] & lock->waitMask)
        status = STATUS_FOUND;
    else
        status = LockCheckConflicts(lockMethodTable, lockmode,
                                    lock, proclock);

...we add an additional check to the upper branch: if the number of
low-priority waiters is not 0, then we walk the wait queue; if all
waiters that conflict with the requested lock mode are low-priority,
then we set status = STATUS_OK.  So, new lock requests refuse to queue
behind low-priority lock waiters.

Is that good enough to implement the requested behavior, or do we need
to do more?  If we only do what's described above, then a
normal-priority waiter which joins the queue after a low-priority
waiter is already waiting will let the low-priority waiter go first.
That's probably not desirable, but it's pretty easy to fix: the logic
that determines where a new waiter enters the wait queue is in
ProcSleep(), and it wouldn't be too hard to arrange for new
normal-priority waiters to skip over any low-priority waiters that are
at the end of the existing wait queue (but not any that are in the
middle, because if we did that we'd also be skipping over
normal-priority waiters, which we shouldn't).

What more?  Well, even after doing those two things, it's still
possible for either the simple deadlock logic in ProcSleep() or the
full deadlock detector to put a low-priority waiter in front of a
normal-priority waiter.  However, our typical policy is to advance
waiters in the wait queue as little as possible.  In other words, if
the wait queue contains A B C and we will deadlock unless C is moved
up, we will move it ahead of B but not A if that is sufficient to
avoid the deadlock.  We will only move it ahead of both B and A if
that is necessary to avoid deadlock.  So, low-priority requests won't
be moved up further than needed, which is good.

Still, it is possible to construct scenarios where we won't get
perfect low-priority behavior without more invasive changes. For
example, suppose we make a low-priority request queue-jump over an
intervening waiter to avoid deadlocking against it.  Next, a
normal-priority waiter enters the queue.  Then, the guy we skipped
aborts.  At this point, we could in theory notice that it's possible
to move the low-priority request behind the new normal-priority
waiter.  However, I think we shouldn't do that.  We certainly can't do
it unconditionally because it might introduce deadlocks.  We could
test whether it will introduce a deadlock and do it only if not, but
that's expensive.  Instead, I think we should document that a
low-priority request will ordinarily cause the request to be satisfied
only after all conflicting normal-priority lock requests, but that
this is not guaranteed in the case where the wait queue is rearranged
to avoid deadlock.  I don't think that limitation ought to be a
tremendous problem for users, and the alternatives are pretty
unappealing.


Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON

2016-09-15 Thread Tom Lane
"Daniel Verite"  writes:
> It looks easy to make the hooks return a bool, and when returning false,
> cancel the assignment of the variable.

Yeah, that's about how I was envisioning it too.

> I volunteer to write that patch.

Thanks.

> It would close the hazard that exists today of an internal psql flag
> getting out of sync with the corresponding variable, due to feeding
> it a wrong value or, in the case of autocommit, setting it in the
> wrong context.

If we want to get to the situation that the variable's value always
reflects the internal state, then we should reject deleting the variable
too.  Currently that's allowed; I was unsure whether we should still
permit it, but this line of thought says no.

Another issue is what to do in SetVariableAssignHook().  Currently,
that's only invoked immediately after creating the variable space,
so it will always go through the hook(NULL) call, which the hook
will interpret as a deletion and therefore reject, given the above change.

What might be cleanest is to modify SetVariableAssignHook so that it
is passed both the initial value and the hook (deleting the separate
initial-value-setting calls that currently exist near startup.c:40).
If the hook rejects the initial value, that would be grounds for
fatal exit.

> Also with that, the behavior of ParseVariableBool() assuming "on"
> when it's fed junk could be changed.

Right.
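The shape of the proposed hook API might look like this toy model (Python for illustration only; psql's real hooks are C functions, and these names and the accepted value list are invented):

```python
def autocommit_hook(new_value):
    """Hypothetical assign hook under the proposed API: return True to
    accept the new value, False to cancel the assignment.  None stands
    in for the hook(NULL) deletion case, which is rejected so the
    variable always mirrors the internal flag."""
    if new_value is None:
        return False
    return new_value.lower() in ("on", "off", "true", "false", "0", "1")

def set_variable(variables, name, value, hook):
    """Assign (or delete) only if the hook accepts the change; on
    rejection the old value is retained."""
    if not hook(value):
        return False
    if value is None:
        variables.pop(name, None)
    else:
        variables[name] = value
    return True
```

The key property is that a rejected assignment leaves both the variable and the internal state exactly as they were.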

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Support OpenSSL 1.1.0.

2016-09-15 Thread Heikki Linnakangas

On 09/15/2016 07:51 PM, Heikki Linnakangas wrote:

Wild guess: curculio is building with LibreSSL, which claims to be
OpenSSL >= 1.1.0, but it doesn't actually implement all the functions
that OpenSSL 1.1.0 does.

Looks like we need some more autoconf scripting to detect LibreSSL. Or
switch to detecting the existence of individual functions, rather than
checking the version number. That would be more autoconf-like anyway.


I downloaded LibreSSL and I'm getting similar errors on my laptop. So 
yes, that seems to be the problem. LibreSSL defines:



/* These will change with each release of LibreSSL-portable */
#define LIBRESSL_VERSION_NUMBER 0x2040200fL
#define LIBRESSL_VERSION_TEXT   "LibreSSL 2.4.2"

/* These will never change */
#define OPENSSL_VERSION_NUMBER  0x20000000L
#define OPENSSL_VERSION_TEXT    LIBRESSL_VERSION_TEXT
#define OPENSSL_VERSION_PTEXT   " part of " OPENSSL_VERSION_TEXT


I'm going to replace the OPENSSL_VERSION_NUMBER #ifdefs with autoconf 
AC_CHECK_FUNCS checks for the actual functions we need.


- Heikki





Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON

2016-09-15 Thread Daniel Verite
Tom Lane wrote:

> I think changing the hook API is a pretty reasonable thing to do here
> (though I'd make it its own patch and then add the autocommit change
> on top).  When was the last time you actually wanted to set VERBOSITY
> to "fubar"?

It looks easy to make the hooks return a bool, and when returning false,
cancel the assignment of the variable.
I volunteer to write that patch.

It would close the hazard that exists today of an internal psql flag
getting out of sync with the corresponding variable, due to feeding
it a wrong value or, in the case of autocommit, setting it in the
wrong context.

Also with that, the behavior of ParseVariableBool() assuming "on"
when it's fed junk could be changed. Instead it could just
reject the assignment rather than emit a warning, and both
the internal flag and the variable would keep the values they had
before the bogus assignment.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: [HACKERS] Hash Indexes

2016-09-15 Thread Jesper Pedersen

On 09/15/2016 02:03 AM, Amit Kapila wrote:

Same thing here - where the fields involving the hash index aren't updated.



Do you mean that you see a 40-60% gain for such cases as well?



No, UPDATEs are around 10-20% for our cases.



I have done a run to look at the concurrency / TPS aspect of the
implementation - to try something different from Mark's work on testing the
pgbench setup.

With definitions as above, with SELECT as

-- select.sql --
\set id random(1,10)
BEGIN;
SELECT * FROM test WHERE id = :id;
COMMIT;

and UPDATE/Indexed with an index on 'val', and finally UPDATE/Nonindexed w/o
one.

new_hash is [1] + [2] + [3] applied; old_hash is the existing hash
implementation on master; btree is master too.

Machine is a 28C/56T with 256GB RAM and 2 x RAID10 SSD for data + WAL.
Clients ran with -M prepared.

[1]
https://www.postgresql.org/message-id/caa4ek1+erbp+7mdkkahjzwq_dtdkocbpt7lswfwcqvuhbxz...@mail.gmail.com
[2]
https://www.postgresql.org/message-id/CAD__OujvYghFX_XVkgRcJH4VcEbfJNSxySd9x=1wp5vylvk...@mail.gmail.com
[3]
https://www.postgresql.org/message-id/caa4ek1juyr_ab7bxfnsg5+jqhiwgklkgacfk9bfd4mlffk6...@mail.gmail.com

Don't know if you find this useful due to the small number of rows, but let
me know if there are other tests I can run, e.g. bumping the number of rows.



It might be useful to test with a higher number of rows, because with so
little data contention is not visible,


Attached is a run with 1000 rows.


but I think in general, with your,
Jeff's, and my own tests, it is clear that there is a significant win
for read-only cases and for read-write cases where the index column is not
updated.  Also, we don't find any regression as compared to HEAD, which
is sufficient to prove the worth of the patch.


Very much agreed.


I think we should not
forget that one of the other main reasons for this patch is to allow
WAL logging for hash indexes.


Absolutely. There are scenarios that will benefit from switching to
a hash index.



I think for now, we have done
sufficient tests for this patch to ensure its benefit; if any
committer wants to see something more, we can surely do it.


Ok.


 I think
the important thing at this stage is to find out what more (if
anything) is left to mark this patch as "ready for committer".



I think for CHI it would be Robert's and others' feedback. For WAL, there 
is [1].


[1] 
https://www.postgresql.org/message-id/5f8b4681-1229-92f4-4315-57d780d9c128%40redhat.com


Best regards,
 Jesper




Re: [HACKERS] [COMMITTERS] pgsql: Support OpenSSL 1.1.0.

2016-09-15 Thread Heikki Linnakangas

On 09/15/2016 07:41 PM, Tom Lane wrote:

Heikki Linnakangas  writes:

Support OpenSSL 1.1.0.


Buildfarm member curculio doesn't like this patch.  I suspect the reason
is it's got some slightly-too-old version of OpenSSL, but if so, we ought
to try to fix configure's probe so the problem gets reported at configure
time, not somewhere down in the build.

Mikael, what openssl version is on that box exactly?  (And could you
fix it to start building the 9.6 branch?)


Hmm, that's odd:


be-secure-openssl.c: In function 'my_BIO_s_socket':
be-secure-openssl.c:732: warning: implicit declaration of function 
'BIO_get_new_index'
be-secure-openssl.c:735: warning: implicit declaration of function 
'BIO_meth_new'
be-secure-openssl.c:735: warning: assignment makes pointer from integer without 
a cast

> ...

It looks like it's taking the OpenSSL 1.1.0 codepath:


#if OPENSSL_VERSION_NUMBER >= 0x10100000L
int my_bio_index;

my_bio_index = BIO_get_new_index();
if (my_bio_index == -1)
return NULL;

> ...

Wild guess: curculio is building with LibreSSL, which claims to be 
OpenSSL >= 1.1.0, but it doesn't actually implement all the functions 
that OpenSSL 1.1.0 does.


Looks like we need some more autoconf scripting to detect LibreSSL. Or 
switch to detecting the existence of individual functions, rather than 
checking the version number. That would be more autoconf-like anyway.


- Heikki





Re: [HACKERS] Declarative partitioning - another take

2016-09-15 Thread Robert Haas
On Thu, Sep 15, 2016 at 4:53 AM, Amit Langote
 wrote:
> Wow, this is bad.  What is needed in this case is "canonicalization" of
> the range partition bounds specified in the command.

I think we shouldn't worry about this.  It seems like unnecessary
scope creep.  All human beings capable of using PostgreSQL will
understand that there are no integers between 4 and 5, so any
practical impact on this would be on someone creating partitions
automatically.  But if someone is creating partitions automatically
they are highly likely to be using the same EXCLUSIVE/INCLUSIVE
settings for all of the partitions, in which case this won't arise.
And if they aren't, then I think we should just make them deal with
this limitation in their code instead of dealing with it in our code.
This patch is plenty complicated enough already; introducing a whole
new canonicalization concept that will help practically nobody seems
to me to be going in the wrong direction.  If somebody really cares
enough to want to try to fix this, they can submit a followup patch
someday.

> To mitigate this, how about we restrict range partition key to contain
> columns of only those types for which we know we can safely canonicalize a
> range bound (ie, discrete range types)?  I don't think we can use, say,
> existing int4range_canonical but will have to write a version of it for
> partitioning usage (range bounds of partitions are different from what
> int4range_canonical is ready to handle).  This approach will be very
> limiting as then range partitions will be limited to columns of int,
> bigint and date type only.

-1.  That is letting the tail wag the dog.  Let's leave it the way you
had it and be happy.
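For what it's worth, the kind of canonicalization being declined here is easy to picture for discrete types. A toy sketch (invented names, purely to illustrate the equivalence at issue — the point above is precisely that the patch need not carry this machinery):

```python
def canonicalize_int_bound(value, inclusive, is_upper):
    """Normalize an integer partition bound to the inclusive-start /
    exclusive-end convention (the one int4range uses), so that, e.g.,
    END (4) INCLUSIVE and END (5) EXCLUSIVE compare equal."""
    if is_upper and inclusive:
        return value + 1, False      # end 4 inclusive -> end 5 exclusive
    if not is_upper and not inclusive:
        return value + 1, True       # start 4 exclusive -> start 5 inclusive
    return value, inclusive
```

This only works because integers are discrete; for numeric or timestamp keys there is no "next value", which is why the proposed restriction would be so limiting.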

>> -- Observation 2 : able to create sub-partition out of the range set for
>> main table, causing not able to insert data satisfying any of the partition.
>>
>> create table test_subpart (c1 int) partition by range (c1);
>> create table test_subpart_p1 partition of test_subpart for values start (1)
>> end (100) inclusive partition by range (c1);
>> create table test_subpart_p1_sub1 partition of test_subpart_p1 for values
>> start (101) end (200);
>
> It seems that DDL should prevent the same column being used in partition
> key of lower level partitions.  I don't know how much sense it would make,
> but being able to use the same column as partition key of lower level
> partitions may be a feature useful to some users if they know what they
> are doing.  But this last part doesn't sound like a good thing.  I
> modified the patch such that lower level partitions cannot use columns
> used by ancestor tables.

I again disagree.  If somebody defines partition bounds that make it
impossible to insert the data that they care about, that's operator
error.  The fact that, across multiple levels, you can manage to make
it impossible to insert any data at all does not make it anything
other than operator error.  If we take the contrary position that it's
the system's job to prevent this sort of thing, we may disallow some
useful cases, like partitioning by the year portion of a date and then
subpartitioning by the month portion of that same date.

I think we'll probably also find that we're making the code
complicated to no purpose.  For example, now you have to check when
attaching a partition that it doesn't violate the rule; otherwise you
end up with a table that can't be created directly (and thus can't
survive dump-and-restore) but can be created indirectly by exploiting
loopholes in the checks.  It's tempting to think that we can check
simple cases - e.g. if the parent and the child are partitioning on
the same exact column, the child's range should be contained within
the parent's range - but more complicated cases are tricky.  Suppose
the table is range-partitioned on (a, b) and range-subpartitioned on
b.  It's not trivial to figure out whether the set of values that the
user can insert into that partition is non-empty.  If we allow
partitioning on expressions, then it quickly becomes altogether
impossible to deduce anything useful - unless you can solve the
halting problem.

And, really, why do we care?  If the user creates a partitioning
scheme that permits no rows in some or all of the partitions, then
they will have an empty table that can be correctly dumped and
restored but which cannot be used for anything useful unless it is
modified first.  They probably don't want that, but it's not any more
broken than a table inheritance setup with mutually exclusive CHECK
constraints, or for that matter a plain table with mutually exclusive
CHECK constraints - and we don't try to prohibit those things.  This
patch is supposed to be implementing partitioning, not artificial
intelligence.

>> -- Observation 3 : Getting cache lookup failed, when selecting list
>> partition table containing array.
>>
>> postgres=# SELECT tableoid::regclass,* FROM test_array;
>> ERROR:  cache lookup failed for type 0
>
> That's a bug.  

Re: [HACKERS] [COMMITTERS] pgsql: Support OpenSSL 1.1.0.

2016-09-15 Thread Tom Lane
Heikki Linnakangas  writes:
> Support OpenSSL 1.1.0.

Buildfarm member curculio doesn't like this patch.  I suspect the reason
is it's got some slightly-too-old version of OpenSSL, but if so, we ought
to try to fix configure's probe so the problem gets reported at configure
time, not somewhere down in the build.

Mikael, what openssl version is on that box exactly?  (And could you
fix it to start building the 9.6 branch?)

regards, tom lane




Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-09-15 Thread Robert Haas
On Thu, Sep 15, 2016 at 12:22 PM, Tom Lane  wrote:
> Tomas Vondra  writes:
>> On 09/14/2016 07:57 PM, Tom Lane wrote:
>>> People who are vacuuming because they are out of disk space will be very
>>> very unhappy with that solution.
>
>> The people are usually running out of space for data, while these files
>> would be temporary files placed wherever temp_tablespaces points to. I'd
>> argue if this is a source of problems, the people are already in deep
>> trouble due to sorts, CREATE INDEX, ... as those commands may also
>> generate a lot of temporary files.
>
> Except that if you are trying to recover disk space, VACUUM is what you
> are doing, not CREATE INDEX.  Requiring extra disk space to perform a
> vacuum successfully is exactly the wrong direction to be going in.
> See for example this current commitfest entry:
> https://commitfest.postgresql.org/10/649/
> Regardless of what you think of the merits of that patch, it's trying
> to solve a real-world problem.  And as Robert has already pointed out,
> making this aspect of VACUUM more complicated is not solving any
> pressing problem.  "But we made it faster" is going to be a poor answer
> for the next person who finds themselves up against the wall with no
> recourse.

I very much agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-09-15 Thread Tom Lane
Tomas Vondra  writes:
> On 09/14/2016 07:57 PM, Tom Lane wrote:
>> People who are vacuuming because they are out of disk space will be very
>> very unhappy with that solution.

> The people are usually running out of space for data, while these files 
> would be temporary files placed wherever temp_tablespaces points to. I'd 
> argue if this is a source of problems, the people are already in deep 
> trouble due to sorts, CREATE INDEX, ... as those commands may also 
> generate a lot of temporary files.

Except that if you are trying to recover disk space, VACUUM is what you
are doing, not CREATE INDEX.  Requiring extra disk space to perform a
vacuum successfully is exactly the wrong direction to be going in.
See for example this current commitfest entry:
https://commitfest.postgresql.org/10/649/
Regardless of what you think of the merits of that patch, it's trying
to solve a real-world problem.  And as Robert has already pointed out,
making this aspect of VACUUM more complicated is not solving any
pressing problem.  "But we made it faster" is going to be a poor answer
for the next person who finds themselves up against the wall with no
recourse.

regards, tom lane




Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-09-15 Thread Claudio Freire
On Thu, Sep 15, 2016 at 12:50 PM, Tomas Vondra
 wrote:
> On 09/14/2016 07:57 PM, Tom Lane wrote:
>>
>> Pavan Deolasee  writes:
>>>
>>> On Wed, Sep 14, 2016 at 10:53 PM, Alvaro Herrera
>>> 
>>> wrote:

 One thing not quite clear to me is how do we create the bitmap
 representation starting from the array representation in midflight
 without using twice as much memory transiently.  Are we going to write
 the array to a temp file, free the array memory, then fill the bitmap by
 reading the array from disk?
>>
>>
>>> We could do that.
>>
>>
>> People who are vacuuming because they are out of disk space will be very
>> very unhappy with that solution.
>
>
> The people are usually running out of space for data, while these files
> would be temporary files placed wherever temp_tablespaces points to. I'd
> argue if this is a source of problems, the people are already in deep
> trouble due to sorts, CREATE INDEX, ... as those commands may also generate
> a lot of temporary files.

One would not expect "CREATE INDEX" to succeed when space is tight,
but VACUUM is quite the opposite.

Still, temporary storage could be used if available, and gracefully
fall back to some other technique (like not using bitmaps) when not.

Not sure it's worth the trouble, though.


On Wed, Sep 14, 2016 at 12:24 PM, Claudio Freire  wrote:
> On Wed, Sep 14, 2016 at 12:17 PM, Robert Haas  wrote:
>>
>> I am kind of doubtful about this whole line of investigation because
>> we're basically trying pretty hard to fix something that I'm not sure
>> is broken.  I do agree that, all other things being equal, the TID
>> lookups will probably be faster with a bitmap than with a binary
>> search, but maybe not if the table is large and the number of dead
>> TIDs is small, because cache efficiency is pretty important.  But even
>> if it's always faster, does TID lookup speed even really matter to
>> overall VACUUM performance? Claudio's early results suggest that it
>> might, but maybe that's just a question of some optimization that
>> hasn't been done yet.
>
> FYI, the reported impact was on CPU time, not runtime. There was no
> significant difference in runtime (real time), because my test is
> heavily I/O bound.
>
> I tested with a few small tables and there was no significant
> difference either, but small tables don't stress the array lookup
> anyway so that's expected.
>
> But on the assumption that some systems may be CPU bound during vacuum
> (particularly those able to do more than 300-400MB/s sequential I/O),
> in those cases the increased or decreased cost of lazy_tid_reaped will
directly correlate to runtime. It's just that none of my systems do; they all
run on Amazon and are heavily bandwidth constrained (the fastest I/O
subsystem I can get my hands on does 200MB/s).

Attached is the patch with the multiarray version.

The tests are weird. Comparing best cases over several runs, to remove
the impact of concurrent activity on this host (I couldn't remove all
background activity even when running the tests overnight; the distro
adds tons of cron jobs and background cleanup tasks, it would seem),
there's only a very mild CPU impact. I'd say insignificant, as it's well
below the mean variance.

Worst case:

DETAIL:  CPU 9.90s/80.94u sec elapsed 1232.42 sec.

Best case:

DETAIL:  CPU 12.10s/63.82u sec elapsed 832.79 sec.

There seems to be more variance with the multiarray approach than the
single array one, but I could not figure out why. Even I/O seems less
stable:

Worst case:

INFO:  "pgbench_accounts": removed 4 row versions in 6557382 pages
DETAIL:  CPU 64.31s/37.60u sec elapsed 2573.88 sec.

Best case:

INFO:  "pgbench_accounts": removed 4 row versions in 6557378 pages
DETAIL:  CPU 54.48s/31.78u sec elapsed 1552.18 sec.

Since this test takes several hours to complete, I could only do a
few runs of each version, so the statistical significance of the test
isn't very strong.

I'll try to compare with smaller pgbench scale numbers and more runs
over the weekend (gotta script that). It's certainly puzzling, I
cannot explain the increased variance, especially in I/O, since the
I/O should be exactly the same. I'm betting it's my system that's
unpredictable somehow. So I'm posting the patch in case someone gets
inspired and can spot the reason, and because there's been a lot of
talk about this very same approach, so I thought I'd better post the
code ;)

I'll also try to get a more predictable system to run the tests on.
From f85fd4213eb6c0b4da8dc9196424f6e8a5a2a9a7 Mon Sep 17 00:00:00 2001
From: Claudio Freire 
Date: Mon, 12 Sep 2016 23:36:42 -0300
Subject: [PATCH] Vacuum: allow using more than 1GB work mem

Turn the dead_tuples array into a structure composed of several
exponentially bigger arrays, to enable usage of more than 1GB
of work mem during vacuum 
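The structure described in the commit message can be sketched roughly like this (a toy model of the idea only, not the patch's C code):

```python
import bisect

class MultiArray:
    """Toy model of the patch's idea: dead TIDs go into a sequence of
    segments whose capacities grow exponentially, so total memory is
    not capped by the limit of a single allocation."""

    def __init__(self, first_capacity=4):
        self.segments = [[]]
        self.capacity = first_capacity

    def append(self, tid):
        # Vacuum collects TIDs in ascending heap order, so each
        # segment stays sorted without any extra work.
        if len(self.segments[-1]) >= self.capacity:
            self.capacity *= 2           # next segment is twice as big
            self.segments.append([])
        self.segments[-1].append(tid)

    def __contains__(self, tid):
        # lazy_tid_reaped-style lookup: binary-search each segment.
        for seg in self.segments:
            i = bisect.bisect_left(seg, tid)
            if i < len(seg) and seg[i] == tid:
                return True
        return False
```

Because segments are filled in order, a real implementation can first compare against each segment's last element to pick the one segment worth searching, keeping the lookup close to a single binary search.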

Re: [HACKERS] File system operations.

2016-09-15 Thread Tom Lane
Robert Haas  writes:
> On Thu, Sep 15, 2016 at 11:01 AM, Anastasia Lubennikova
>  wrote:
>> What do you think about moving stuff from pg_upgrade/file.c to storage/file/
>> to allow reuse of this code? I think it'll be really helpful for developers
>> of contrib modules
>> like backup managers.

> Well, storage/file is backend and pg_upgrade is frontend.  If you want
> to reuse the same code for both it's got to go someplace else.

Also, to the extent that it's important to use those wrappers rather
than libc directly, it's because the wrappers are tied into backend
resource management and error handling conventions.  I don't see how
you convert that into code that also works in a frontend program,
or what the point would be exactly.

regards, tom lane




Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON

2016-09-15 Thread Tom Lane
Robert Haas  writes:
> On Thu, Sep 15, 2016 at 11:10 AM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> On Thu, Sep 15, 2016 at 7:29 AM, Rahila Syed  wrote:
 In keeping with the current design of hooks, instead of rejecting the
 autocommit 'ON' setting inside a transaction, the value can be set to
 'ON' with a psql_error explaining that the value will take effect
 when the current transaction has ended.

>>> Hmm, that seems like a reasonable compromise.

>> I dunno, implementing that seems like it will require some very fragile
>> behavior in the autocommit code, ie that even though the variable is "on"
>> we don't do anything until after reaching an out-of-transaction state.
>> It might work today but I'm afraid we'd break it in future.

> Hmm, I don't think any logic change is being proposed, just a warning
> that it may not work the way you think.  So I don't think it would be
> any more fragile than now.  Am I missing something?

As I see it, up to now we never considered what would happen if you
changed the variable's setting mid-transaction.  The fact that it works
like this is an implementation artifact.  Now that we are considering it,
we should ask ourselves if we like that artifact and are willing to commit
to keeping it working like that forevermore.  I'm not sure that the answer
to either one is "yes".  I think throwing an actual error would be
preferable.

>> I think changing the hook API is a pretty reasonable thing to do here
>> (though I'd make it its own patch and then add the autocommit change
>> on top).  When was the last time you actually wanted to set VERBOSITY
>> to "fubar"?

> I agree that'd be better but I don't know that we should expect Rahila
> to do that as a condition of getting a usability warning accepted.

If it's something that would end up getting thrown away after someone
does the API change, accepting a warning-only patch doesn't seem all
that useful.  But I just work here.

regards, tom lane




[HACKERS] Tackling JsonPath support

2016-09-15 Thread Christian Convey
On Mon, Sep 5, 2016 at 1:44 PM, Pavel Stehule  wrote:
...

> I wrote the XMLTABLE function, and I am thinking about a JSON_TABLE function.
> But there is one blocker - missing JsonPath support in our JSON implementation.
>
> So one idea - implement JsonPath support and related JSON query functions.
> This can help with better standard conformance.

Hi Pavel,

Are you still looking for someone to add the JsonPath support to the
JSON implementation?  And if so, how urgently are people waiting for
it?

I'd be happy to start working on a patch, but since I'm new to PG
development, I'm probably not the fastest person to get it done.

Kind regards,
Christian




Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-09-15 Thread Tomas Vondra



On 09/14/2016 05:17 PM, Robert Haas wrote:

I am kind of doubtful about this whole line of investigation because
we're basically trying pretty hard to fix something that I'm not sure
is broken.  I do agree that, all other things being equal, the TID
lookups will probably be faster with a bitmap than with a binary
search, but maybe not if the table is large and the number of dead
TIDs is small, because cache efficiency is pretty important.  But even
if it's always faster, does TID lookup speed even really matter to
overall VACUUM performance? Claudio's early results suggest that it
might, but maybe that's just a question of some optimization that
hasn't been done yet.


Regarding the lookup performance, I don't think the bitmap alone can 
significantly improve that - it's more efficient memory-wise, no doubt 
about that, but it's still likely larger than CPU caches and accessed 
mostly randomly (when vacuuming the indexes).


IMHO the best way to speed up lookups (if they're really an issue; I haven't 
done any benchmarks) would be to build a small bloom filter in front of 
the TID array / bitmap. It should be fairly small (depending on the 
number of TIDs, error rate, etc.), likely to fit into L2/L3, and would 
eliminate a lot of probes into the much larger array/bitmap.


Of course, it's another layer of complexity - the good thing is we don't 
need to build the filter until after we collect the TIDs, so we have 
pretty good inputs for the bloom filter parameters.


But all this is based on the assumption that the lookups are actually 
expensive, not sure about that.
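The bloom-filter idea above can be sketched as follows (a toy illustration, not vacuum code; the sizes and hash scheme are invented, not tuned):

```python
import hashlib

class BloomFilter:
    """Sketch of the proposed prefilter: a miss means the TID is
    definitely not dead (skip the big lookup); a hit means "maybe",
    so the larger array/bitmap must still be probed."""

    def __init__(self, nbits=1 << 16, nhashes=3):
        self.nbits, self.nhashes = nbits, nhashes
        self.bits = bytearray(nbits // 8)

    def _positions(self, item):
        # Derive nhashes independent bit positions from one keyed hash.
        for i in range(self.nhashes):
            h = hashlib.blake2b(repr((i, item)).encode(), digest_size=8)
            yield int.from_bytes(h.digest(), "big") % self.nbits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

Since the filter is built only after all dead TIDs are collected, the bit count and hash count can be sized from the exact TID count for a chosen false-positive rate.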


regards
Tomas




Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-09-15 Thread Tomas Vondra



On 09/14/2016 07:57 PM, Tom Lane wrote:

Pavan Deolasee  writes:

On Wed, Sep 14, 2016 at 10:53 PM, Alvaro Herrera 
wrote:

One thing not quite clear to me is how do we create the bitmap
representation starting from the array representation in midflight
without using twice as much memory transiently.  Are we going to write
the array to a temp file, free the array memory, then fill the bitmap by
reading the array from disk?



We could do that.


People who are vacuuming because they are out of disk space will be very
very unhappy with that solution.


The people are usually running out of space for data, while these files 
would be temporary files placed wherever temp_tablespaces points to. I'd 
argue if this is a source of problems, the people are already in deep 
trouble due to sorts, CREATE INDEX, ... as those commands may also 
generate a lot of temporary files.


regards
Tomas




Re: [HACKERS] select_parallel test fails with nonstandard block size

2016-09-15 Thread Tom Lane
I wrote:
> OK, I'll take care of it (since I now realize that the inconsistency
> is my own fault --- I committed that GUC not you).  It's unclear what
> this will do for Peter's complaint though.

On closer inspection, the answer is "nothing", because the select_parallel
test overrides the default value of min_parallel_relation_size anyway.
(Without that, I don't think tenk1 is large enough to trigger
consideration of parallel scan at all.)

I find that at BLCKSZ 8K, the planner thinks the best plan is

 HashAggregate  (cost=5320.28..7920.28 rows=1 width=12)
   Group Key: parallel_restricted(unique1)
   ->  Index Only Scan using tenk1_unique1 on tenk1  (cost=0.29..2770.28 
rows=1 width=8)

which is what the regression test script expects.  Forcing the parallel
plan to be chosen, we get this using the cost parameters set up by
select_parallel:

 HashAggregate  (cost=5433.00..8033.00 rows=1 width=12)
   Group Key: parallel_restricted(unique1)
   ->  Gather  (cost=0.00..2883.00 rows=1 width=8)
 Workers Planned: 4
 ->  Parallel Seq Scan on tenk1  (cost=0.00..383.00 rows=2500 width=4)

However, at BLCKSZ 16K, we get these numbers instead:

 HashAggregate  (cost=5264.28..7864.28 rows=1 width=12)
   Group Key: parallel_restricted(unique1)
   ->  Index Only Scan using tenk1_unique1 on tenk1  (cost=0.29..2714.28 
rows=1 width=8)

 HashAggregate  (cost=5251.00..7851.00 rows=1 width=12)
   Group Key: parallel_restricted(unique1)
   ->  Gather  (cost=0.00..2701.00 rows=1 width=8)
 Workers Planned: 4
 ->  Parallel Seq Scan on tenk1  (cost=0.00..201.00 rows=2500 width=4)

so the planner goes for the second one.

I don't think there's anything particularly broken here.  The seqscan
cost estimate is largely dependent on the number of blocks, and there's
half as many blocks at 16K.  The indexscan estimate is also reduced,
but not as much, so it stops looking like the cheaper alternative.
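The arithmetic behind that flip can be sketched with a simplified cost model (block counts here are illustrative, and this is only a caricature of the planner's costing, using default-style cost parameters):

```python
def seqscan_cost(blocks, rows, seq_page_cost=1.0, cpu_tuple_cost=0.01):
    """Very rough seq-scan cost model: the I/O term scales with the
    number of blocks, the CPU term with the number of rows."""
    return blocks * seq_page_cost + rows * cpu_tuple_cost

# Doubling BLCKSZ halves the block count, and therefore only the I/O
# term; the CPU term is unchanged.  An index scan's cost shrinks less
# because it is less page-bound, so the seq scan overtakes it.
cost_8k = seqscan_cost(blocks=345, rows=10000)
cost_16k = seqscan_cost(blocks=172, rows=10000)
```

So no costing bug is implied: the same formulas, fed half as many pages, simply rank the alternatives differently.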

We could maybe twiddle the cost parameters select_parallel uses so that
the same plan is chosen at both block sizes, but it seems like it would
be very fragile, and I'm not sure there's much point.

regards, tom lane




[HACKERS] Parallel sec scan in plpgsql

2016-09-15 Thread Alex Ignatov

Hello!
Does parallel seq scan work in PL/pgSQL?

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] PassDownLimitBound for ForeignScan/CustomScan

2016-09-15 Thread Robert Haas
On Tue, Sep 13, 2016 at 9:07 PM, Kouhei Kaigai  wrote:
> The current implementation calls recompute_limits() on the first
> invocation of ExecLimit and ExecReScanLimit. Do we expect that
> ps->numTuples will also be passed down to the child nodes at the same
> time?

Sure, unless we find some reason why that's not good.

> I also think this new executor contract should be considered a hint
> (but not a requirement) for the child nodes, because it allows the
> parent nodes to re-distribute the upper limit regardless of the type
> of the child nodes, as long as the parent node works correctly and
> benefits even if the child node returns only some of the tuples. It
> makes the decision of whether the upper limit should be passed down
> much simpler. The child node "can" ignore the hint, but it can also
> use it for further optimization.

+1.
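That hint-not-requirement contract could be sketched with toy executor nodes (hypothetical names; the real mechanism would pass ps->numTuples down through the PlanState tree):

```python
class SeqScan:
    """Child node that treats the pushed-down limit purely as a hint."""
    def __init__(self, rows):
        self.rows = rows
        self.limit_hint = None          # None means "no hint received"

    def set_limit_hint(self, n):
        self.limit_hint = n             # the child may use or ignore this

    def execute(self):
        # A child that honors the hint can stop early; one that ignores
        # it returns everything -- the parent stays correct either way.
        if self.limit_hint is not None:
            return self.rows[:self.limit_hint]
        return self.rows

class Limit:
    def __init__(self, child, count):
        self.child, self.count = child, count

    def execute(self):
        self.child.set_limit_hint(self.count)    # push the bound down
        # Re-apply the limit: correctness never depends on the child.
        return self.child.execute()[:self.count]

scan = SeqScan(list(range(100)))
assert Limit(scan, 5).execute() == [0, 1, 2, 3, 4]
```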

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] OpenSSL 1.1 breaks configure and more

2016-09-15 Thread Alvaro Herrera
Christoph Berg wrote:
> Re: Michael Paquier 2016-09-15 
> 
> > On Thu, Sep 15, 2016 at 8:57 PM, Heikki Linnakangas  wrote:
> > > I backpatched this to 9.5, but not further than that. The functions this
> > > modified were moved around in 9.5, so the patch wouldn't apply as is. It
> > > wouldn't be difficult to back-patch further if there's demand, but I'm not
> > > eager to do that until someone complains.
> > 
> > Not going older than 9.5 may be fine:
> > https://www.openssl.org/blog/blog/2014/12/23/the-new-release-strategy/
> > https://wiki.freebsd.org/OpenSSL
> > As far as I can see 1.0.2 would be supported until Dec 2019, so that
> > would just overlap with 9.4's EOL.
> 
> I'm afraid it's not that easy - Debian 9 (stretch) will release at the
> beginning of next year, and apt.postgresql.org will want to build
> 9.2/9.3/9.4 for that distribution. I guess yum.postgresql.org will
> have the same problem with the next Fedora release.

I suppose some interested party could grab the patch that Heikki
committed to the new branches and produce a back-patch that can be
applied to the older branches.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Comment typo in execIndexing.c

2016-09-15 Thread Robert Haas
On Tue, Sep 13, 2016 at 10:48 PM, Amit Langote
 wrote:
> Spotted a typo in the header comment of execIndexing.c.  Attached fixes it.
>
> s/exclusive constraints/exclusion constraints/g

Committed, thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON

2016-09-15 Thread Robert Haas
On Thu, Sep 15, 2016 at 11:10 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Sep 15, 2016 at 7:29 AM, Rahila Syed  wrote:
>>> In keeping with the current design of hooks, instead of rejecting the
>>> autocommit 'ON' setting inside a transaction, the value can be set to 'ON'
>>> with a psql_error displaying that the value will be effective when the
>>> current transaction has ended.
>
>> Hmm, that seems like a reasonable compromise.
>
> I dunno, implementing that seems like it will require some very fragile
> behavior in the autocommit code, ie that even though the variable is "on"
> we don't do anything until after reaching an out-of-transaction state.
> It might work today but I'm afraid we'd break it in future.

Hmm, I don't think any logic change is being proposed, just a warning
that it may not work the way you think.  So I don't think it would be
any more fragile than now.  Am I missing something?

> I think changing the hook API is a pretty reasonable thing to do here
> (though I'd make it its own patch and then add the autocommit change
> on top).  When was the last time you actually wanted to set VERBOSITY
> to "fubar"?

I agree that'd be better but I don't know that we should expect Rahila
to do that as a condition of getting a usability warning accepted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] File system operations.

2016-09-15 Thread Robert Haas
On Thu, Sep 15, 2016 at 11:01 AM, Anastasia Lubennikova
 wrote:
> What do you think about moving stuff from pg_upgrade/file.c to storage/file/
> to allow reuse of this code? I think it'll be really helpful for developers
> of contrib modules like backup managers.

Well, storage/file is backend and pg_upgrade is frontend.  If you want
to reuse the same code for both it's got to go someplace else.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON

2016-09-15 Thread Tom Lane
Robert Haas  writes:
> On Thu, Sep 15, 2016 at 7:29 AM, Rahila Syed  wrote:
>> In keeping with the current design of hooks, instead of rejecting the
>> autocommit 'ON' setting inside a transaction, the value can be set to 'ON'
>> with a psql_error displaying that the value will be effective when the
>> current transaction has ended.

> Hmm, that seems like a reasonable compromise.

I dunno, implementing that seems like it will require some very fragile
behavior in the autocommit code, ie that even though the variable is "on"
we don't do anything until after reaching an out-of-transaction state.
It might work today but I'm afraid we'd break it in future.
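A toy model of the proposed behavior, just to make the fragility concrete (all names here are hypothetical, not psql's actual code):

```python
class Session:
    """Toy model: \\set AUTOCOMMIT ON inside a transaction is accepted
    with a warning, and only takes effect once the transaction ends."""
    def __init__(self):
        self.autocommit = False
        self.in_transaction = False
        self.warnings = []

    def set_autocommit(self, value):
        if value and self.in_transaction:
            self.warnings.append(
                "autocommit will take effect when the current "
                "transaction has ended")
        self.autocommit = value        # the variable changes immediately

    def effective_autocommit(self):
        # The fragile part: even though the variable reads "on", it must
        # be ignored until we reach an out-of-transaction state.
        return self.autocommit and not self.in_transaction

s = Session()
s.in_transaction = True
s.set_autocommit(True)
assert s.autocommit and not s.effective_autocommit()
assert len(s.warnings) == 1
s.in_transaction = False
assert s.effective_autocommit()
```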

I think changing the hook API is a pretty reasonable thing to do here
(though I'd make it its own patch and then add the autocommit change
on top).  When was the last time you actually wanted to set VERBOSITY
to "fubar"?

regards, tom lane




Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-09-15 Thread Masahiko Sawada
On Thu, Sep 15, 2016 at 2:40 AM, Simon Riggs  wrote:
> On 14 September 2016 at 11:19, Pavan Deolasee  
> wrote:
>
>>>  In
>>> theory we could even start with the list of TIDs and switch to the
>>> bitmap if the TID list becomes larger than the bitmap would have been,
>>> but I don't know if it's worth the effort.
>>>
>>
>> Yes, that works too. Or may be even better because we already know the
>> bitmap size requirements, definitely for the tuples collected so far. We
>> might need to maintain some more stats to further optimise the
>> representation, but that seems like unnecessary detailing at this point.
>
> That sounds best to me... build the simple representation, but as we
> do maintain stats to show to what extent that set of tuples is
> compressible.
>
> When we hit the limit on memory we can then selectively compress
> chunks to stay within memory, starting with the most compressible
> chunks.
>
> I think we should use the chunking approach Robert suggests, though
> mainly because that allows us to consider how parallel VACUUM should
> work - writing the chunks to shmem. That would also allow us to apply
> a single global limit for vacuum memory rather than an allocation per
> VACUUM.
> We can then scan multiple indexes at once in parallel, all accessing
> the shmem data structure.
>

Yeah, the chunking approach Robert suggested seems like a good idea,
but considering the implementation of parallel vacuum, it would be
more complicated IMO.
I think it's better for each process to simply allocate its own memory
space than for a single process to allocate a huge memory space in a
complicated way.
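For reference, the chunked representation under discussion might look roughly like this (a toy sketch; the chunk size and lookup strategy are placeholders, and real chunks would hold many thousands of TIDs):

```python
CHUNK_SIZE = 4  # tiny for illustration only

class ChunkedTidStore:
    """Dead-TID storage that grows in fixed-size chunks instead of one
    huge array (which would hit the 1GB palloc limit)."""
    def __init__(self):
        self.chunks = []

    def add(self, tid):
        if not self.chunks or len(self.chunks[-1]) == CHUNK_SIZE:
            self.chunks.append([])      # allocate a new chunk on demand
        self.chunks[-1].append(tid)

    def __contains__(self, tid):
        # TIDs arrive in heap order, so each chunk stays sorted and a
        # per-chunk binary search would work; a linear scan keeps this short.
        return any(tid in c for c in self.chunks)

store = ChunkedTidStore()
for tid in [(1, 1), (1, 2), (2, 5), (3, 7), (4, 1)]:
    store.add(tid)
assert (2, 5) in store and (9, 9) not in store
assert len(store.chunks) == 2           # 5 TIDs at 4 per chunk
```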

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Printing bitmap objects in the debugger

2016-09-15 Thread Tom Lane
Robert Haas  writes:
> This seems like a very complicated mechanism of substituting for a
> very simple patch.

Well, if we're gonna do it, then let's just do it, but please let's
have a patch that doesn't look like somebody's temporary debugging kluge.

I'd suggest that this is parallel to nodeToString() and therefore
(a) should be placed beside it, (b) should be named like it,
bmsToString() perhaps, and (c) should look more like it internally.

regards, tom lane




Re: [HACKERS] select_parallel test fails with nonstandard block size

2016-09-15 Thread Tom Lane
Robert Haas  writes:
> On Thu, Sep 15, 2016 at 10:44 AM, Tom Lane  wrote:
>> Well, sure, but at any reasonable value of min_parallel_relation_size
>> that won't be a factor.  The question here is whether we want the default
>> value to be platform-independent.  I notice that both config.sgml and
>> postgresql.conf.sample claim that the default value is 8MB, which this
>> discussion reveals to be a lie.  If you want to keep the default expressed
>> as "1024" and not "(8 * 1024 * 1024) / BLCKSZ", we need to change the
>> documentation.

> I don't particularly care about that.  Changing it to 8MB always would
> be fine with me.

OK, I'll take care of it (since I now realize that the inconsistency
is my own fault --- I committed that GUC not you).  It's unclear what
this will do for Peter's complaint though.

regards, tom lane




[HACKERS] File system operations.

2016-09-15 Thread Anastasia Lubennikova

Hi, hackers!

Do we have any standard set of functions to operate with files?

I mean link(), copy(), rename(), unlink(), ftruncate() and so on.
Mostly I worry about the first two, since they have different
implementations on Windows.
I found a couple of functions in src/backend/storage/file/copydir.c,
some more in src/bin/pg_upgrade/file.c along with a set of defines
in src/bin/pg_upgrade/pg_upgrade.h. Obviously md.c and fd.c contain
some functions.
Do we have any policy of using them? There is a comment in fd.c

 * For this scheme to work, most (if not all) routines throughout the
 * server should use these interfaces instead of calling the C library
 * routines (e.g., open(2) and fopen(3)) themselves.  Otherwise, we
 * may find ourselves short of real file descriptors anyway.

and even more in fd.h

 *File {Close, Read, Write, Seek, Tell, Sync}
 *{Path Name Open, Allocate, Free} File
 *
 * These are NOT JUST RENAMINGS OF THE UNIX ROUTINES.
 * Use them for all file activity...

but I see that, for example, pg_open_tzfile(), get_controlfile(), and
logfile_open() call open()/fopen() directly. You can find the same
behavior for other C library functions.

Am I missing something important, or is it simply legacy/overlooked code?

What do you think about moving stuff from pg_upgrade/file.c to storage/file/
to allow reuse of this code? I think it'll be really helpful for developers
of contrib modules like backup managers.

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Surprising behaviour of \set AUTOCOMMIT ON

2016-09-15 Thread Robert Haas
On Thu, Sep 15, 2016 at 7:29 AM, Rahila Syed  wrote:
> In keeping with the current design of hooks, instead of rejecting the
> autocommit 'ON' setting inside a transaction, the value can be set to 'ON'
> with a psql_error displaying that the value will be effective when the
> current transaction has ended.

Hmm, that seems like a reasonable compromise.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] select_parallel test fails with nonstandard block size

2016-09-15 Thread Robert Haas
On Thu, Sep 15, 2016 at 10:44 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Sep 15, 2016 at 9:59 AM, Tom Lane  wrote:
>>> Possibly we ought to change things so that the default value of
>>> min_parallel_relation_size is a fixed number of bytes rather
>>> than a fixed number of blocks.  Not sure though.
>
>> The reason why this was originally reckoned in blocks is because the
>> data is divided between the workers on the basis of a block number.
>> In the degenerate case where blocks < workers, the extra workers will
>> get no blocks at all, and thus no rows at all.
>
> Well, sure, but at any reasonable value of min_parallel_relation_size
> that won't be a factor.  The question here is whether we want the default
> value to be platform-independent.  I notice that both config.sgml and
> postgresql.conf.sample claim that the default value is 8MB, which this
> discussion reveals to be a lie.  If you want to keep the default expressed
> as "1024" and not "(8 * 1024 * 1024) / BLCKSZ", we need to change the
> documentation.

I don't particularly care about that.  Changing it to 8MB always would
be fine with me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Block level parallel vacuum WIP

2016-09-15 Thread Robert Haas
On Thu, Sep 15, 2016 at 7:21 AM, Masahiko Sawada  wrote:
> I'm implementing this patch but I need to resolve the problem
> regarding lock for extension by multiple parallel workers.
> In parallel vacuum, multiple workers could try to acquire the
> exclusive lock for extension on same relation.
> Since acquiring the exclusive lock for extension by multiple workers
> is regarded as locking from same locking group, multiple workers
> extend fsm or vm at the same time and end up with error.
> I thought that it might be involved with parallel update operation, so
> I'd like to discuss about this in advance.

Hmm, yeah.  This is one of the reasons why parallel queries currently
need to be entirely read-only.  I think there's a decent argument that
the relation extension lock mechanism should be entirely redesigned:
the current system is neither particularly fast nor particularly
elegant, and some of the services that the heavyweight lock manager
provides, such as deadlock detection, are not relevant for relation
extension locks.  I'm not sure if we should try to fix that right away
or come up with some special-purpose hack for vacuum, such as having
backends use condition variables to take turns calling
visibilitymap_set().
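The condition-variable idea can be sketched with ordinary threads taking turns at the serialized step (Python threading stands in for PostgreSQL's condition variables, and a list append stands in for visibilitymap_set()):

```python
import threading

turn = 0
cond = threading.Condition()
results = []

def worker(my_turn):
    global turn
    with cond:
        while turn != my_turn:          # sleep until it is our turn
            cond.wait()
        results.append(my_turn)         # the serialized step
        turn += 1
        cond.notify_all()               # wake whoever is next

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in reversed(threads):             # start out of order on purpose
    t.start()
for t in threads:
    t.join()

assert results == [0, 1, 2, 3]          # the step ran strictly in turn
```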

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] select_parallel test fails with nonstandard block size

2016-09-15 Thread Tom Lane
Robert Haas  writes:
> On Thu, Sep 15, 2016 at 9:59 AM, Tom Lane  wrote:
>> Possibly we ought to change things so that the default value of
>> min_parallel_relation_size is a fixed number of bytes rather
>> than a fixed number of blocks.  Not sure though.

> The reason why this was originally reckoned in blocks is because the
> data is divided between the workers on the basis of a block number.
> In the degenerate case where blocks < workers, the extra workers will
> get no blocks at all, and thus no rows at all.

Well, sure, but at any reasonable value of min_parallel_relation_size
that won't be a factor.  The question here is whether we want the default
value to be platform-independent.  I notice that both config.sgml and
postgresql.conf.sample claim that the default value is 8MB, which this
discussion reveals to be a lie.  If you want to keep the default expressed
as "1024" and not "(8 * 1024 * 1024) / BLCKSZ", we need to change the
documentation.
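The arithmetic behind the complaint, for concreteness:

```python
# The default is stored as 1024 *blocks*, so the byte value it denotes
# depends on BLCKSZ, while the docs claim a fixed 8MB.
DEFAULT_BLOCKS = 1024

def min_parallel_relation_size_bytes(blcksz):
    return DEFAULT_BLOCKS * blcksz

assert min_parallel_relation_size_bytes(8192) == 8 * 1024 * 1024    # docs correct
assert min_parallel_relation_size_bytes(16384) == 16 * 1024 * 1024  # docs wrong

# A platform-independent default would fix the block count per build:
def default_blocks_for(blcksz, target_bytes=8 * 1024 * 1024):
    return target_bytes // blcksz

assert default_blocks_for(16384) == 512
```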

regards, tom lane




Re: [HACKERS] Printing bitmap objects in the debugger

2016-09-15 Thread Pavan Deolasee
On Thu, Sep 15, 2016 at 7:55 PM, Pavan Deolasee 
wrote:

>
> (lldb) script print print_bms_members(lldb.frame.FindVariable ("a"))
> nwords = 1 bitmap: 0x200
>
>
Or even this if lldb.frame.FindVariable() is pushed inside the function:

(lldb) script print print_bms_members('a')
nwords = 1 bitmap: 0x200

Thanks,
Pavan

--
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] OpenSSL 1.1 breaks configure and more

2016-09-15 Thread Christoph Berg
Re: Michael Paquier 2016-09-15 

> On Thu, Sep 15, 2016 at 8:57 PM, Heikki Linnakangas  wrote:
> > I backpatched this to 9.5, but not further than that. The functions this
> > modified were moved around in 9.5, so the patch wouldn't apply as is. It
> > wouldn't be difficult to back-patch further if there's demand, but I'm not
> > eager to do that until someone complains.
> 
> Not going older than 9.5 may be fine:
> https://www.openssl.org/blog/blog/2014/12/23/the-new-release-strategy/
> https://wiki.freebsd.org/OpenSSL
> As far as I can see 1.0.2 would be supported until Dec 2019, so that
> would just overlap with 9.4's EOL.

I'm afraid it's not that easy - Debian 9 (stretch) will release at the
beginning of next year, and apt.postgresql.org will want to build
9.2/9.3/9.4 for that distribution. I guess yum.postgresql.org will
have the same problem with the next Fedora release.

Christoph




Re: [HACKERS] Printing bitmap objects in the debugger

2016-09-15 Thread Pavan Deolasee
On Thu, Sep 15, 2016 at 7:38 PM, Robert Haas  wrote:

>
>
> This seems like a very complicated mechanism of substituting for a
> very simple patch.


I don't have an objection to the patch per se. The point of posting this was
just to share other mechanisms that exist. BTW, an advantage of using debugger
scripts is that they also work while inspecting core dumps.


> Your LLDB script is about the same number of lines
> as Ashutosh's patch and only works for people who use LLDB.


Alvaro pointed out that gdb also has similar capabilities.


>   Plus,
> once you write it, you've got to enter the Python interpreter to use
> it and then run three more lines of code that aren't easy to remember.
>
> In contrast, with Ashutosh's proposed patch, you just write:
>
> p bms_to_char(bms_object)
>
>
I learnt this yesterday, and I am sure there are easier ways to do the same
thing that I just don't know yet. For example, you can also do this:

(lldb) script print print_bms_members(lldb.frame.FindVariable ("a"))
nwords = 1 bitmap: 0x200

It's still slightly cumbersome, but better than entering the interpreter.


> ...and you're done.  Now, I grant that his approach bloats the binary
> and yours does not, but nobody complains about pprint() bloating the
> binary.


Sure. I wasn't aware of the existence of pprint() either, and maybe that's
enough from a debugging perspective. When I tried it yesterday, the output
went to the logfile instead of appearing at the debugger prompt. Maybe I did
something wrong, or maybe that's not an inconvenience for those who use it
regularly.

So yeah, no objections to the patch. I was happy to discover what I did and
thought of sharing it, assuming others might find it useful too.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Hash Indexes

2016-09-15 Thread Andres Freund
Hi,

On 2016-05-10 17:39:22 +0530, Amit Kapila wrote:
> For making hash indexes usable in production systems, we need to improve
> its concurrency and make them crash-safe by WAL logging them.

One earlier question about this is whether that is actually a worthwhile
goal.  Are the speed and space benefits big enough in the general case?
Could those benefits not be achieved in a more maintainable manner by
adding a layer that uses a btree over hash(columns), and adds
appropriate rechecks after heap scans?
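The layered alternative Andres sketches, an ordered index over hash(columns) plus a recheck, can be illustrated in miniature (a deliberately collision-prone toy, not a proposal for the actual implementation):

```python
import bisect

def h(key):                     # deliberately collision-prone "hash"
    return sum(key.encode()) % 8

class HashOverBtree:
    """Btree-like ordered structure keyed on hash(key), with a recheck
    of the real key, since distinct keys can share a hash value."""
    def __init__(self):
        self.entries = []       # sorted list of (hash(key), key, tid)

    def insert(self, key, tid):
        bisect.insort(self.entries, (h(key), key, tid))

    def lookup(self, key):
        hk = h(key)
        lo = bisect.bisect_left(self.entries, (hk,))
        tids = []
        for eh, ekey, tid in self.entries[lo:]:
            if eh != hk:
                break
            if ekey == key:     # the recheck step: discard collisions
                tids.append(tid)
        return tids

idx = HashOverBtree()
idx.insert("abc", 1)
idx.insert("acb", 2)            # same character sum -> same hash value
idx.insert("xyz", 3)
assert h("abc") == h("acb")     # a genuine collision...
assert idx.lookup("abc") == [1] # ...filtered out by the recheck
```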

Note that I'm not saying that hash indexes are not worthwhile, I'm just
doubtful that question has been explored sufficiently.

Greetings,

Andres Freund




Re: [HACKERS] select_parallel test fails with nonstandard block size

2016-09-15 Thread Alvaro Herrera
Robert Haas wrote:
> On Thu, Sep 15, 2016 at 9:59 AM, Tom Lane  wrote:
> > Possibly we ought to change things so that the default value of
> > min_parallel_relation_size is a fixed number of bytes rather
> > than a fixed number of blocks.  Not sure though.
> 
> The reason why this was originally reckoned in blocks is because the
> data is divided between the workers on the basis of a block number.

Maybe the solution is to fill the table to a given number of blocks
rather than a number of rows.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Hash Indexes

2016-09-15 Thread Robert Haas
On Thu, Sep 15, 2016 at 1:41 AM, Amit Kapila  wrote:
> I think it is possible without breaking pg_upgrade, if we match all
> items of a page at once (and save them as local copy), rather than
> matching item-by-item as we do now.  We are already doing similar for
> btree, refer explanation of BTScanPosItem and BTScanPosData in
> nbtree.h.

If ever we want to sort hash buckets by TID, it would be best to do
that in v10 since we're presumably going to be recommending a REINDEX
anyway.  But is that a good thing to do?  That's a little harder to
say.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] select_parallel test fails with nonstandard block size

2016-09-15 Thread Robert Haas
On Thu, Sep 15, 2016 at 9:59 AM, Tom Lane  wrote:
> Possibly we ought to change things so that the default value of
> min_parallel_relation_size is a fixed number of bytes rather
> than a fixed number of blocks.  Not sure though.

The reason why this was originally reckoned in blocks is because the
data is divided between the workers on the basis of a block number.
In the degenerate case where blocks < workers, the extra workers will
get no blocks at all, and thus no rows at all.  It seemed best to
insist that the relation had a reasonable number of blocks so that we
could hope for a reasonably even distribution of work among a pool of
workers.  I'm not altogether sure that's the right way of thinking
about this problem but I'm not sure it's wrong, either; anyway, it's
as far as my thought process had progressed at the time I wrote the
code.
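The degenerate case is easy to see in a toy block distributor (real parallel seq scan hands out blocks dynamically rather than round-robin, but the starvation effect is the same):

```python
def distribute_blocks(nblocks, nworkers):
    """Deal blocks out to workers; simplified round-robin stand-in for
    the dynamic per-block assignment a parallel seq scan actually does."""
    assignments = [[] for _ in range(nworkers)]
    for blk in range(nblocks):
        assignments[blk % nworkers].append(blk)
    return assignments

# Plenty of blocks: everyone gets an even share of the work.
even = distribute_blocks(1024, 4)
assert all(len(a) == 256 for a in even)

# Degenerate case, blocks < workers: two workers get no blocks at all,
# and thus no rows at all.
starved = distribute_blocks(2, 4)
assert [len(a) for a in starved] == [1, 1, 0, 0]
```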

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Printing bitmap objects in the debugger

2016-09-15 Thread Robert Haas
On Wed, Sep 14, 2016 at 8:01 AM, Pavan Deolasee
 wrote:
> On Wed, Sep 14, 2016 at 3:46 PM, Pavan Deolasee 
> wrote:
>>  lately I'm using the LLDB debugger (which probably does not have an
>> equivalent),
>
>
> And I was so clueless about lldb's powerful scripting interface. For
> example, you can write something like this in bms_utils.py:
>
> import lldb
>
> def print_bms_members (bms):
> words = bms.GetChildMemberWithName("words")
> nwords = int(bms.GetChildMemberWithName("nwords").GetValue())
>
> ret = 'nwords = {0} bitmap: '.format(nwords,)
> for i in range(0, nwords):
> ret += hex(int(words.GetChildAtIndex(0, lldb.eNoDynamicValues,
> True).GetValue()))
>
> return ret
>
> And then do this while attached to lldb debugger:
>
> Process 99659 stopped
> * thread #1: tid = 0x59ba69, 0x0001090b012f
> postgres`bms_add_member(a=0x7fe60a0351f8, x=10) + 15 at bitmapset.c:673,
> queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
> frame #0: 0x0001090b012f
> postgres`bms_add_member(a=0x7fe60a0351f8, x=10) + 15 at bitmapset.c:673
>670 int wordnum,
>671 bitnum;
>672
> -> 673 if (x < 0)
>674 elog(ERROR, "negative bitmapset member not allowed");
>675 if (a == NULL)
>676 return bms_make_singleton(x);
> (lldb) script
> Python Interactive Interpreter. To exit, type 'quit()', 'exit()' or Ctrl-D.
> >>> from bms_utils import *
> >>> bms = lldb.frame.FindVariable ("a")
> >>> print print_bms_members(bms)
> nwords = 1 bitmap: 0x200
>
>
> The complete API reference is available here
> http://lldb.llvm.org/python_reference/index.html
>
> Looks like an interesting SoC project to write useful lldb/gdb scripts to
> print internal structures for ease of debugging :-)

This seems like a very complicated mechanism of substituting for a
very simple patch.  Your LLDB script is about the same number of lines
as Ashutosh's patch and only works for people who use LLDB.  Plus,
once you write it, you've got to enter the Python interpreter to use
it and then run three more lines of code that aren't easy to remember.
In contrast, with Ashutosh's proposed patch, you just write:

p bms_to_char(bms_object)

...and you're done.  Now, I grant that his approach bloats the binary
and yours does not, but nobody complains about pprint() bloating the
binary.  If that's actually an issue people are really concerned about
then let's just reject this and Ashutosh can patch his local copy.  If
that's not a serious problem then let's take the patch.  If anything,
I think this discussion shows that LLDB macros are probably too much
of a pain to be seriously considered for everyday use, unless perhaps
you're the sort of person who plans to spend the day inside the Python
shell anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] select_parallel test fails with nonstandard block size

2016-09-15 Thread Tom Lane
Peter Eisentraut  writes:
> When building with --with-blocksize=16, the select_parallel test fails
> with this difference:

>  explain (costs off)
> select  sum(parallel_restricted(unique1)) from tenk1
> group by(parallel_restricted(unique1));
> - QUERY PLAN
> -
> +QUERY PLAN
> +---
>   HashAggregate
> Group Key: parallel_restricted(unique1)
> -   ->  Index Only Scan using tenk1_unique1 on tenk1
> -(3 rows)
> +   ->  Gather
> + Workers Planned: 4
> + ->  Parallel Seq Scan on tenk1
> +(5 rows)

>  set force_parallel_mode=1;
>  explain (costs off)

> We know that different block sizes cause some test failures, mainly
> because of row ordering differences.  But this looked a bit different.

I suspect what is happening is that min_parallel_relation_size is
being interpreted differently (because the default is set at 1024
blocks, regardless of what BLCKSZ is) and that's affecting the
cost estimate for the parallel seqscan.  The direction of change
seems a bit surprising though; if the table is now half as big
blocks-wise, how did that make the parallel scan look cheaper?
Please step through create_plain_partial_paths and see what
is being done differently.

Possibly we ought to change things so that the default value of
min_parallel_relation_size is a fixed number of bytes rather
than a fixed number of blocks.  Not sure though.

regards, tom lane




Re: [HACKERS] Hash Indexes

2016-09-15 Thread Robert Haas
On Thu, Sep 15, 2016 at 2:13 AM, Amit Kapila  wrote:
> One other point, I would like to discuss is that currently, we have a
> concept for tracking active hash scans (hashscan.c) which I think is
> mainly to protect splits when the backend which is trying to split has
> some scan open. You can read "Other Notes" section of
> access/hash/README for further details.  I think after this patch we
> don't need that mechanism for splits because we always retain a pin on
> bucket buffer till all the tuples are fetched or scan is finished
> which will defend against a split by our own backend which tries to
> ensure cleanup lock on bucket.

Hmm, yeah.  It seems like we can remove it.

> However, we might need it for vacuum
> (hashbulkdelete), if we want to get rid of cleanup lock in vacuum,
> once we have a page-at-a-time scan mode implemented for hash indexes.
> If you agree with above analysis, then we can remove the checks for
> _hash_has_active_scan from both vacuum and split path and also remove
> corresponding code from hashbegin/end scan, but retain that hashscan.c
> for future improvements.

Do you have a plan for that?  I'd be inclined to just blow away
hashscan.c if we don't need it any more, unless you're pretty sure
it's going to get reused.  It's not like we can't pull it back out of
git if we decide we want it back after all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] palloc() too large on pg_buffercache with large shared_buffers

2016-09-15 Thread Robert Haas
On Wed, Sep 14, 2016 at 7:59 PM, Kouhei Kaigai  wrote:
>> On Wed, Sep 14, 2016 at 12:13 AM, Kouhei Kaigai  wrote:
>> > It looks to me pg_buffercache tries to allocate more than 1GB using
>> > palloc(), when shared_buffers is more than 256GB.
>> >
>> > # show shared_buffers ;
>> >  shared_buffers
>> > 
>> >  280GB
>> > (1 row)
>> >
>> > # SELECT buffers, d.datname, coalesce(c.relname, '???')
>> > FROM (SELECT count(*) buffers, reldatabase, relfilenode
>> > FROM pg_buffercache group by reldatabase, relfilenode) b
>> >LEFT JOIN pg_database d ON d.oid = b.reldatabase
>> >LEFT JOIN pg_class c ON d.oid = (SELECT oid FROM pg_database
>> >  WHERE datname = current_database())
>> >AND b.relfilenode = pg_relation_filenode(c.oid)
>> >ORDER BY buffers desc;
>> > ERROR:  invalid memory alloc request size 1174405120
>> >
>> > This is a situation where MemoryContextAllocHuge() should be used instead
>> > of palloc().  Also, might it need back-patching?
>>
>> I guess so.  Although it's not very desirable for it to use that much
>> memory, I suppose if you have a terabyte of shared_buffers you
>> probably have 4GB of memory on top of that to show what they contain.
>>
> Exactly. I found this problem when someone asked me why shared_buffers=280GB
> was slower than shared_buffers=128MB for scanning a 350GB table.
> As I expected, most of the shared buffers were not in use, and the large
> setting also reduced the amount of free memory usable for the page cache.

OK.  Committed and back-patched to 9.4.  There's no support for huge
allocations before that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] sequences and pg_upgrade

2016-09-15 Thread Anastasia Lubennikova

15.09.2016 15:29, Peter Eisentraut:

On 9/14/16 8:52 AM, Anastasia Lubennikova wrote:

Could you clarify, please, why do we dump sequence in schemaOnly mode?
+   if (dopt.schemaOnly && dopt.sequence_data)
+   getSequenceData(, tblinfo, numTables, dopt.oids);

The point of this patch is that with the new option, you can dump
sequence data (but not table data) alongside with the schema.  This
would be used by pg_upgrade for the reasons described at the beginning
of the thread.



Oh, thank you. Now I see.
For some reason I thought that it *always* dumps sequence data in
schemaOnly mode, which is definitely not true.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Declarative partitioning - another take

2016-09-15 Thread Ashutosh Bapat
Hi Amit,

It looks like there is some problem while creating parameterized paths
for multi-level partitioned tables. Here's a longish test case:

CREATE TABLE prt1_l (a int, b int, c varchar) PARTITION BY RANGE(a);
CREATE TABLE prt1_l_p1 PARTITION OF prt1_l FOR VALUES START (0) END
(250) PARTITION BY RANGE (b);
CREATE TABLE prt1_l_p1_p1 PARTITION OF prt1_l_p1 FOR VALUES START (0) END (100);
CREATE TABLE prt1_l_p1_p2 PARTITION OF prt1_l_p1 FOR VALUES START
(100) END (250);
CREATE TABLE prt1_l_p2 PARTITION OF prt1_l FOR VALUES START (250) END
(500) PARTITION BY RANGE (c);
CREATE TABLE prt1_l_p2_p1 PARTITION OF prt1_l_p2 FOR VALUES START
('0250') END ('0400');
CREATE TABLE prt1_l_p2_p2 PARTITION OF prt1_l_p2 FOR VALUES START
('0400') END ('0500');
CREATE TABLE prt1_l_p3 PARTITION OF prt1_l FOR VALUES START (500) END
(600) PARTITION BY RANGE ((b + a));
CREATE TABLE prt1_l_p3_p1 PARTITION OF prt1_l_p3 FOR VALUES START
(1000) END (1100);
CREATE TABLE prt1_l_p3_p2 PARTITION OF prt1_l_p3 FOR VALUES START
(1100) END (1200);
INSERT INTO prt1_l SELECT i, i, to_char(i, 'FM') FROM
generate_series(0, 599, 2) i;
CREATE TABLE uprt1_l AS SELECT * FROM prt1_l;

CREATE TABLE prt2_l (a int, b int, c varchar) PARTITION BY RANGE(b);
CREATE TABLE prt2_l_p1 PARTITION OF prt2_l FOR VALUES START (0) END
(250) PARTITION BY RANGE (a);
CREATE TABLE prt2_l_p1_p1 PARTITION OF prt2_l_p1 FOR VALUES START (0) END (100);
CREATE TABLE prt2_l_p1_p2 PARTITION OF prt2_l_p1 FOR VALUES START
(100) END (250);
CREATE TABLE prt2_l_p2 PARTITION OF prt2_l FOR VALUES START (250) END
(500) PARTITION BY RANGE (c);
CREATE TABLE prt2_l_p2_p1 PARTITION OF prt2_l_p2 FOR VALUES START
('0250') END ('0400');
CREATE TABLE prt2_l_p2_p2 PARTITION OF prt2_l_p2 FOR VALUES START
('0400') END ('0500');
CREATE TABLE prt2_l_p3 PARTITION OF prt2_l FOR VALUES START (500) END
(600) PARTITION BY RANGE ((a + b));
CREATE TABLE prt2_l_p3_p1 PARTITION OF prt2_l_p3 FOR VALUES START
(1000) END (1100);
CREATE TABLE prt2_l_p3_p2 PARTITION OF prt2_l_p3 FOR VALUES START
(1100) END (1200);
INSERT INTO prt2_l SELECT i, i, to_char(i, 'FM') FROM
generate_series(0, 599, 3) i;
CREATE TABLE uprt2_l AS SELECT * FROM prt2_l;

EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM prt1_l t1 LEFT JOIN LATERAL
  (SELECT t2.a AS t2a, t2.c AS t2c, t2.b AS t2b, t3.a AS
t3a, least(t1.a,t2.a,t3.a) FROM prt1_l t2 JOIN prt2_l t3 ON (t2.a =
t3.b AND t2.b = t3.a AND t2.c = t3.c AND t2.b + t2.a = t3.a + t3.b))
ss
  ON t1.a = ss.t2a AND t1.b = ss.t2a AND t1.c = ss.t2c AND
t1.b + t1.a = ss.t2a + ss.t2b WHERE t1.a % 25 = 0 ORDER BY t1.a;
ERROR:  could not devise a query plan for the given query

Let's replace the laterally referenced relation with an unpartitioned table.

EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM uprt1_l t1 LEFT JOIN LATERAL
  (SELECT t2.a AS t2a, t2.c AS t2c, t2.b AS t2b, t3.a AS
t3a, least(t1.a,t2.a,t3.a) FROM prt1_l t2 JOIN prt2_l t3 ON (t2.a =
t3.b AND t2.b = t3.a AND t2.c = t3.c AND t2.b + t2.a = t3.a + t3.b))
ss
  ON t1.a = ss.t2a AND t1.b = ss.t2a AND t1.c = ss.t2c AND
t1.b + t1.a = ss.t2a + ss.t2b WHERE t1.a % 25 = 0 ORDER BY t1.a;
ERROR:  could not devise a query plan for the given query

Let's replace another partitioned table in the inner query with an
unpartitioned table.

EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM uprt1_l t1 LEFT JOIN LATERAL
  (SELECT t2.a AS t2a, t2.c AS t2c, t2.b AS t2b, t3.a AS
t3a, least(t1.a,t2.a,t3.a) FROM prt1_l t2 JOIN uprt2_l t3 ON (t2.a =
t3.b AND t2.b = t3.a AND t2.c = t3.c AND t2.b + t2.a = t3.a + t3.b))
ss
  ON t1.a = ss.t2a AND t1.b = ss.t2a AND t1.c = ss.t2c AND
t1.b + t1.a = ss.t2a + ss.t2b WHERE t1.a % 25 = 0 ORDER BY t1.a;
ERROR:  could not devise a query plan for the given query

Please check if you are able to reproduce these errors in your
repository. I made sure that I cleaned up all partition-wise join code
before testing this, but ... .

I tried to debug the problem somewhat. In set_append_rel_pathlist(),
it finds that at least one child has a parameterized path as the
cheapest path, so it doesn't create an unparameterized path for the append
rel. At the same time there is a parameterization common to all the
children, so it doesn't create any path. There seem to be two problems
here:
1. The children from the second level onwards may not be getting
parameterized for lateral references. That seems unlikely, but possible.
2. Reparameterization should have corrected this, but
reparameterize_path() does not support AppendPaths.

On Thu, Sep 15, 2016 at 2:23 PM, Amit Langote
 wrote:
>
> Hi
>
> On 2016/09/09 17:55, Amit Langote wrote:
>> On 2016/09/06 22:04, Amit Langote wrote:
>>> Will fix.
>>
>> Here is an updated set of patches.
>
> An email from Rajkumar somehow managed to break out of this thread.
> Quoting his message below so that I don't end up replying with patches on
> two different threads.
>
> On 2016/09/14 16:58, Rajkumar Raghuwanshi wrote:
>> I 

Re: [HACKERS] autonomous transactions

2016-09-15 Thread Peter Eisentraut
On 9/3/16 9:08 AM, Serge Rielau wrote:
> Interestingly, despite being supported in PL/SQL on nested BEGIN END blocks,
> we nearly exclusively see ATs covering the entire function or trigger.

Oracle only supports an entire PL/SQL function to be declared
autonomous.  But it was pretty easy in my implementation to allow that
for arbitrary blocks.  In fact, it would have been harder not to do that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] autonomous transactions

2016-09-15 Thread Peter Eisentraut
On 8/31/16 9:11 AM, Craig Ringer wrote:
> Peter, you mention "Oracle-style autonomous transaction blocks".
> 
> What are the semantics to be expected of those with regards to:
> 
> - Accessing objects exclusively locked by the outer xact or where the
> requested lockmode conflicts with a lock held by the outer xact
> 
> - Visibility of data written by the outer xact

The semantics are that the autonomous session is completely isolated
from the parent session.  It has no special knowledge about transactions
happening on the parent.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] autonomous transactions

2016-09-15 Thread Peter Eisentraut
On 8/31/16 8:46 AM, Greg Stark wrote:
> I'm surprised by the background worker. I had envisioned autonomous
> transactions being implemented by allowing a process to reserve a
> second entry in PGPROC with the same pid. Or perhaps save its existing
> information in a separate pgproc slot (or stack of slots) and
> restoring it after the autonomous transaction commits.

There is quite likely room for a feature like that too, but it's not
this one.

> Using a background worker mean that the autonomous transaction can't
> access any state from the process memory.

That is intentional.

"Autonomous transactions" is actually a misnomer; these are really autonomous
sessions, but Oracle named the feature wrong.  Autonomous sessions (or whatever
you want to call them) have their own session state, configuration
parameters, potentially different plugins loaded, different
authorization state, and so on.

> What happens if a
> statement timeout occurs during an autonomous transaction?

Right now not much. ;-)  But I think the behavior should be that the
autonomous session is aborted.

> What
> happens if you use a pl language in the autonomous transaction and if
> it tries to use non-transactional information such as prepared
> statements?

The same thing that happens if you open a new unrelated database
connection and try to do that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WIP: About CMake v2

2016-09-15 Thread Michael Paquier
Yury,

On Mon, Aug 22, 2016 at 7:48 AM, Christian Convey
 wrote:
>>> I glad to hear it. I suppose you can just try build postgres and send all
>>> problems to github tracker.
>>> https://github.com/stalkerg/postgres_cmake/issues
>
> FYI, I had success using your "postgres_cmake" repo.  I tested it up
> through "make check" and "make install".
>
> Here are the details:
>
> * postgres_cmake commit e7e75160d4533cd8caa9f3f0dd7b485dbd4e7bdf
> * compiler = cc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
> * cmake version = 3.5.3

Would it be possible to get a refreshed patch on this thread, at least
for the sake of the archives? I'd really like to see something happening
here and to make some progress during this CF.
-- 
Michael




Re: [HACKERS] autonomous transactions

2016-09-15 Thread Peter Eisentraut
On 8/31/16 12:38 AM, Jaime Casanova wrote:
> On 30 August 2016 at 20:50, Peter Eisentraut
>  wrote:
>>
>> - Patches to PL/pgSQL to implement Oracle-style autonomous transaction
>> blocks:
>>
>> AS $$
>> DECLARE
>>   PRAGMA AUTONOMOUS_TRANSACTION;
>> BEGIN
>>   FOR i IN 0..9 LOOP
>> START TRANSACTION;
>> INSERT INTO test1 VALUES (i);
>> IF i % 2 = 0 THEN
>> COMMIT;
>> ELSE
>> ROLLBACK;
>> END IF;
>>   END LOOP;
>>
>>   RETURN 42;
>> END;
>> $$;
>>
> 
> this is the syntax it will use?

That is the syntax that Oracle uses.  We could make up our own.

> i just compiled this in head and created a function based on this one.
> The main difference is that the column in test1 it's a pk so i used
> INSERT ON CONFLICT DO NOTHING
> 
> and i'm getting this error
> 
> postgres=# select foo();
> LOG:  namespace item variable itemno 1, name val
> CONTEXT:  PL/pgSQL function foo() line 7 at SQL statement
> STATEMENT:  select foo();
> ERROR:  null value in column "i" violates not-null constraint
> DETAIL:  Failing row contains (null).
> STATEMENT:  INSERT INTO test1 VALUES (val) ON CONFLICT DO NOTHING
> ERROR:  null value in column "i" violates not-null constraint
> DETAIL:  Failing row contains (null).
> CONTEXT:  PL/pgSQL function foo() line 7 at SQL statement
> STATEMENT:  select foo();
> ERROR:  null value in column "i" violates not-null constraint
> DETAIL:  Failing row contains (null).
> CONTEXT:  PL/pgSQL function foo() line 7 at SQL statement
> 
> this happens even everytime i use the PRAGMA even if no START
> TRANSACTION, COMMIT or ROLLBACK are used

The PL/pgSQL part doesn't work well yet.  If you want to play around,
use the PL/Python integration.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] OpenSSL 1.1 breaks configure and more

2016-09-15 Thread Michael Paquier
On Thu, Sep 15, 2016 at 8:57 PM, Heikki Linnakangas  wrote:
> I backpatched this to 9.5, but not further than that. The functions this
> modified were moved around in 9.5, so the patch wouldn't apply as is. It
> wouldn't be difficult to back-patch further if there's demand, but I'm not
> eager to do that until someone complains.

Not going older than 9.5 may be fine:
https://www.openssl.org/blog/blog/2014/12/23/the-new-release-strategy/
https://wiki.freebsd.org/OpenSSL
As far as I can see 1.0.2 would be supported until Dec 2019, so that
would just overlap with 9.4's EOL.
-- 
Michael




Re: [HACKERS] sequences and pg_upgrade

2016-09-15 Thread Peter Eisentraut
On 9/14/16 8:52 AM, Anastasia Lubennikova wrote:
> Could you clarify, please, why do we dump sequence in schemaOnly mode?
> + if (dopt.schemaOnly && dopt.sequence_data)
> + getSequenceData(, tblinfo, numTables, dopt.oids);

The point of this patch is that with the new option, you can dump
sequence data (but not table data) alongside with the schema.  This
would be used by pg_upgrade for the reasons described at the beginning
of the thread.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] select_parallel test fails with nonstandard block size

2016-09-15 Thread Peter Eisentraut
When building with --with-blocksize=16, the select_parallel test fails
with this difference:

 explain (costs off)
select  sum(parallel_restricted(unique1)) from tenk1
group by(parallel_restricted(unique1));
-                     QUERY PLAN
------------------------------------------------------
+                QUERY PLAN
+---------------------------------------
  HashAggregate
Group Key: parallel_restricted(unique1)
-   ->  Index Only Scan using tenk1_unique1 on tenk1
-(3 rows)
+   ->  Gather
+ Workers Planned: 4
+ ->  Parallel Seq Scan on tenk1
+(5 rows)

 set force_parallel_mode=1;
 explain (costs off)

We know that different block sizes cause some test failures, mainly
because of row ordering differences.  But this looked a bit different.

The size of the tenk1 table is very similar under either block size:

16k: tenk1 = 2883584
8k:  tenk1 = 2932736

Is there an explanation for this difference, or is there something wrong
in the cost estimation somewhere?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



