Re: [HACKERS] hamerkop is stuck

2015-01-20 Thread Noah Misch
On Mon, Jan 19, 2015 at 10:44:37AM +0900, TAKATSUKA Haruka wrote:
> On Fri, 16 Jan 2015 03:25:00 -0500 Noah Misch  wrote:
> > Buildfarm member hamerkop stopped reporting in after commit f6dc6dd.  After
> > commit 8d9cb0b, it resumed reporting in for 9.3 and earlier branches.  It is
> > still silent for HEAD and REL9_4_STABLE.  It may have stale processes or
> > stale lock files requiring manual cleanup.  Would you check it?

> Sorry about the corrupted reports.
> We are examining what caused them now.

It is good now.  Thanks.




Re: [HACKERS] Parallel Seq Scan

2015-01-20 Thread Amit Langote
On 20-01-2015 PM 11:29, Amit Kapila wrote:
> 
> I have taken care of integrating the parallel sequence scan with the
> latest patch posted (parallel-mode-v1.patch) by Robert at below
> location:
> http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com
> 
> Changes in this version
> ---
> 1. As mentioned previously, I have exposed one parameter
> ParallelWorkerNumber as used in parallel-mode patch.
> 2. Enabled tuple queue to be used for passing tuples from
> worker backend to master backend along with error queue
> as per suggestion by Robert in the mail above.
> 3. Involved master backend to scan the heap directly when
> tuples are not available in any shared memory tuple queue.
> 4. Introduced 3 new parameters (cpu_tuple_comm_cost,
> parallel_setup_cost, parallel_startup_cost) for deciding the cost
> of parallel plan.  Currently, I have kept the default values for
> parallel_setup_cost and parallel_startup_cost as 0.0, as those
> require some experiments.
> 5. Fixed some issues (related to memory increase as reported
> upthread by Thom Brown and general feature issues found during
> test)
> 
> Note - I have yet to handle the new node types introduced at some
> of the places and need to verify prepared queries and some other
> things, however I think it will be good if I can get some feedback
> at current stage.
> 

I got an assertion failure:

In src/backend/executor/execTuples.c: ExecStoreTuple()

/* passing shouldFree=true for a tuple on a disk page is not sane */
Assert(BufferIsValid(buffer) ? (!shouldFree) : true);

when called from:

In src/backend/executor/nodeParallelSeqscan.c: ParallelSeqNext()

I think something like the following would be necessary (reading from
comments in the code):

--- a/src/backend/executor/nodeParallelSeqscan.c
+++ b/src/backend/executor/nodeParallelSeqscan.c
@@ -85,7 +85,7 @@ ParallelSeqNext(ParallelSeqScanState *node)
if (tuple)
ExecStoreTuple(tuple,
   slot,
-  scandesc->rs_cbuf,
+  fromheap ? scandesc->rs_cbuf : InvalidBuffer,
   !fromheap);

After fixing this, the assertion failure seems to be gone, though I
observed the blocked (CPU maxed out) state reported elsewhere by Thom
Brown.

What I was doing:

CREATE TABLE test(a) AS SELECT generate_series(1, 1000);

postgres=# SHOW max_worker_processes;
 max_worker_processes
----------------------
 8
(1 row)

postgres=# SET seq_page_cost TO 100;
SET

postgres=# SET parallel_seqscan_degree TO 4;
SET

postgres=# EXPLAIN SELECT * FROM test;
   QUERY PLAN
--------------------------------------------------------------------------
 Parallel Seq Scan on test  (cost=0.00..1801071.27 rows=8981483 width=4)
   Number of Workers: 4
   Number of Blocks Per Worker: 8849
(3 rows)

It was EXPLAIN ANALYZE, though, that actually triggered the problem.

Thanks,
Amit





Re: [HACKERS] Async execution of postgres_fdw.

2015-01-20 Thread Kyotaro HORIGUCHI
Hello, thank you for looking at this, but I'm sorry that the last patch
was buggy, so the adaptive fetch size did not work.

The attached is the fixed patch. It apparently improves the
performance for the test case shown in the previous mail, in
which the average tuple length is about 140 bytes.

At 21 Jan 2015 05:22:34, Matt Kelly wrote:

> I'm trying to compare v5 and v6 in my laptop right now.  Apparently my
> laptop is quite a bit faster than your machine because the tests complete
> in roughly 3.3 seconds.
> 
> I added more data and didn't see anything other than noise.  (Then again
> the queries were dominated by the disk sort so I should retry with larger
> work_mem).  I'll try it again when I have more time to play with it.  I
> suspect the benefits would be more clear over a network.
> 
> Larger than default work_mem yes, but I think one of the prime use case for
> the fdw is for more warehouse style situations (PostgresXL style use
> cases).  In those cases, work_mem might reasonably be set to 1GB.  Then
> even if you have 10KB rows you can fetch a million rows and still be using
> less than work_mem.  A simpler change would be to vary it with respect to
> work_mem.

Agreed about the nature of the typical workload for postgres
FDW. But I think the server itself, including postgres_fdw, should
not crash even on a sudden explosion of tuple length. The number 100
seems safe enough, but 1000 seems suspicious, and 1 looks
dangerous from that standpoint.

> Half baked idea: I know its the wrong time in the execution phase, but if
> you are using remote estimates for cost there should also be a row width
> estimate which I believe is based from pg_statistic and its mean column
> width.

It reduces the chance of claiming an unexpected amount of memory, but
the chance still remains.

> Its actually a pity that there is no way to set fetch sizes based on "give
> me as many tuples as will fit in less than x amount of memory".  Because
> that is almost always exactly what you want.  Even when writing application
> code, I've never actually wanted precisely 10,000 rows; I've always wanted
> "a reasonable size chunk that could fit into memory" and then backed my way
> into how many rows I wanted.  If we were to extend FETCH to support syntax
> like: FETCH FORWARD '10MB' FROM ...; then we would eliminate the need
> estimate the value on the fly.

I hadn't thought about that. It makes sense, at least to me :) There
would be many cases where the *amount* of data is more crucial
than the number of rows. I'll work on it.
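
For illustration only, here is a minimal sketch of the kind of width-based
fetch-size computation being discussed; the function name, the fallback
width, and the clamping limits are hypothetical and not taken from the
patch:

static int
compute_fetch_size(double est_row_width, double target_bytes)
{
	double		n;

	if (est_row_width <= 0)
		est_row_width = 100.0;	/* no estimate available; guess a width */

	n = target_bytes / est_row_width;

	/* clamp so that huge or tiny rows don't produce absurd fetch sizes */
	if (n < 100)
		n = 100;
	if (n > 10000)
		n = 10000;

	return (int) n;
}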

> The async stuff, however, is a huge improvement over the last time I played
> with the fdw.  The two active postgres processes were easily consuming a
> core and half of CPU.  I think its not worth tying these two things
> together.  Its probably worth it to make these two separate discussions and
> separate patches.

Yes, they can be separated, and they should be. I'll split them
after this.

> - Matt Kelly
> 
> *Just sanity checking myself: Shutting down the server, applying the
> different patch, 'make clean install' in postgres_fdw, and then restarting
> the server should obviously be sufficient to make sure its running the new
> code because that is all linked at runtime, right?

Yes, that's enough, and I did the same. This patch touches only
postgres_fdw.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 12d0e8666871e2272beb1b85903baf0be92881b5 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 13 Jan 2015 19:20:35 +0900
Subject: [PATCH] Asynchronous execution of postgres_fdw v7

This is the bug-fixed version of v6, which was a modified version of
the asynchronous execution of postgres_fdw patch.
---
 contrib/postgres_fdw/Makefile   |   2 +-
 contrib/postgres_fdw/PgFdwConn.c| 200 ++
 contrib/postgres_fdw/PgFdwConn.h|  61 ++
 contrib/postgres_fdw/connection.c   |  82 
 contrib/postgres_fdw/postgres_fdw.c | 391 +---
 contrib/postgres_fdw/postgres_fdw.h |  15 +-
 6 files changed, 629 insertions(+), 122 deletions(-)
 create mode 100644 contrib/postgres_fdw/PgFdwConn.c
 create mode 100644 contrib/postgres_fdw/PgFdwConn.h

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index d2b98e1..d0913e2 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -1,7 +1,7 @@
 # contrib/postgres_fdw/Makefile
 
 MODULE_big = postgres_fdw
-OBJS = postgres_fdw.o option.o deparse.o connection.o $(WIN32RES)
+OBJS = postgres_fdw.o PgFdwConn.o option.o deparse.o connection.o $(WIN32RES)
 PGFILEDESC = "postgres_fdw - foreign data wrapper for PostgreSQL"
 
 PG_CPPFLAGS = -I$(libpq_srcdir)
diff --git a/contrib/postgres_fdw/PgFdwConn.c b/contrib/postgres_fdw/PgFdwConn.c
new file mode 100644
index 000..b13b597
--- /dev/null
+++ b/contrib/postgres_fdw/PgFdwConn.c
@@ -0,0 +1,200 @@
+/*-
+ *
+ * PgFdwConn.c
+ *		  PGconn e

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 6:39 PM, Peter Geoghegan  wrote:
> On Tue, Jan 20, 2015 at 6:34 PM, Robert Haas  wrote:
>> That might be OK.  Probably needs a bit of performance testing to see
>> how it looks.
>
> Well, we're still only doing it when we do our final merge. So that's
> "only" doubling the number of conversions required, which if we're
> blocked on I/O might not matter at all.

You'll probably prefer the attached. This patch works by disabling
abbreviation, but only after writing out runs, with the final merge
left to go. That way, it doesn't matter that abbreviated keys are not
read back from disk (or regenerated).

I believe this bug was missed because it only occurs when there are
multiple runs, and not in the common case where there is one big
initial run that is found already sorted when we reach mergeruns().
-- 
Peter Geoghegan
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 6d3aa88..b6e302f 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -2172,6 +2172,22 @@ mergeruns(Tuplesortstate *state)
 		return;
 	}
 
+	if (state->sortKeys->abbrev_converter)
+	{
+		/*
+		 * If there are multiple runs to be merged, when we go to read back
+		 * tuples from disk, abbreviated keys will not have been stored, and we
+		 * don't care to regenerate them.  Disable abbreviation from this point
+		 * on.
+		 */
+		state->sortKeys->abbrev_converter = NULL;
+		state->sortKeys->comparator = state->sortKeys->abbrev_full_comparator;
+
+		/* Not strictly necessary, but be tidy */
+		state->sortKeys->abbrev_abort = NULL;
+		state->sortKeys->abbrev_full_comparator = NULL;
+	}
+
 	/* End of step D2: rewind all output tapes to prepare for merging */
 	for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
 		LogicalTapeRewind(state->tapeset, tapenum, false);



Re: [HACKERS] parallel mode and parallel contexts

2015-01-20 Thread Amit Kapila
On Tue, Jan 20, 2015 at 9:52 PM, Robert Haas  wrote:
>
> On Tue, Jan 20, 2015 at 9:41 AM, Amit Kapila  wrote:
> > It seems [WaitForBackgroundWorkerShutdown] has possibility to wait forever.
> > Assume one of the worker is not able to start (not able to attach
> > to shared memory or some other reason), then status returned by
> > GetBackgroundWorkerPid() will be BGWH_NOT_YET_STARTED
> > and after that it can wait forever in WaitLatch.
>
> I don't think that's right.  The status only remains
> BGWH_NOT_YET_STARTED until the postmaster forks the child process.

I think the control flow can reach the above location before the
postmaster has forked all the workers.  Consider a case where we
have launched all workers during the ExecutorStart phase, and then,
before the postmaster has started them all, an error occurs in the
master backend.  The master then tries to abort the transaction and
destroy the parallel context; at that moment it will see the above
status and wait forever in the above code.

I am able to reproduce the above scenario with a debugger by using
the parallel_seqscan patch.
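
For illustration, a rough, hypothetical sketch of the wait-loop shape being
discussed (not the patch's actual code; "handle" stands for the
BackgroundWorkerHandle of the unstarted worker).  If the postmaster has not
forked the worker yet, GetBackgroundWorkerPid() keeps returning
BGWH_NOT_YET_STARTED, so a loop that only exits on BGWH_STOPPED can wait on
the latch indefinitely:

for (;;)
{
	pid_t		pid;
	BgwHandleStatus status = GetBackgroundWorkerPid(handle, &pid);

	if (status == BGWH_STOPPED)
		break;				/* worker has exited; we are done */

	/* BGWH_NOT_YET_STARTED and BGWH_STARTED both end up waiting here */
	WaitLatch(&MyProc->procLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, 0);
	ResetLatch(&MyProc->procLatch);
}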


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2015-01-20 Thread Abhijit Menon-Sen
At 2015-01-20 20:36:39 -0500, robertmh...@gmail.com wrote:
>
> I think this is throwing the baby out with the bathwater.  Neither C
> functions nor all-or-nothing are going to be of any practical use.

Do you see some approach that has a realistic chance of making 9.5 and
would also actually be worth putting into 9.5? Suggestions very welcome.

-- Abhijit




Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf

2015-01-20 Thread Amit Kapila
On Tue, Jan 20, 2015 at 9:38 PM, Robert Haas  wrote:
>
> > On Mon, Jan 19, 2015 at 10:39 PM, Amit Kapila  wrote:
> > I think whichever process reads postgresql.conf/postgresql.auto.conf have
> > to do this (unless we restrict that this will be done at some other time)
> > and postmaster is one of them.  It seems to me that it is not good idea
> > unless we do it at other time like populating entries for a view.
>
> Well, the postmaster does not have database access, and neither do
> most of the auxiliary processes.  The autovacuum launcher has very
> limited database access (shared catalogs only).
>
> Making the postmaster access the database is a complete non-starter;
>

Okay and I was also not in favour of this approach.

> > Right, but we can't completely eliminate such a possibility (as an
> > example we have some default settings like max_connections,
> > shared_buffers, etc).  I agree with you that users should use only
> > way of changing the settings, however for certain rare cases (default
> > settings or some other) we can provide a way for user to know which
> > of the settings are duplicate.  I think if we can provide such an
> > information via some view with ease then it's worth to have it.
>
> I'd suggest abandoning the idea of storing anything in a persistent
> basis in a system catalog and look for some way for the backend to
> which the user is connected to expose its own state instead.

Agreed.

>  For
> example, pg_settings already exposes "sourcefile" and "sourceline"
> settings.  Actually, I'm not quite sure why that's not sufficient
> here, but if it isn't, it could be extended.
>

The reason why "sourcefile" and "sourceline" are not sufficient is that
they only give information about the setting in the last file where it
is present.  Assume max_connections (or any other setting) is present
in both postgresql.conf and postgresql.auto.conf; then they will show
the information for the entry in postgresql.auto.conf, so the user
might not be able to decide whether that is the setting he wants to
retain unless he also knows about the entry in postgresql.conf.

Now, as I have suggested upthread, we could have a new view
pg_file_settings which would display information about settings even
when multiple entries for the same setting exist in different files.

I think adding such information to the existing pg_settings view would
be difficult, as the same code is used for SHOW commands, which
we don't want to change.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf

2015-01-20 Thread David Johnston
On Tue, Jan 20, 2015 at 8:46 PM, Robert Haas  wrote:

> On Tue, Jan 20, 2015 at 4:07 PM, David Johnston
>  wrote:
> > sourceline and sourcefile pertain only to the current value while the point
> > of adding these other pieces is to provide a snapshot of all the different
> > mappings that the system knows about; instead of having to tell a user to go
> > look in two different files (and associated includes) and a database catalog
> > to find out what possible values are in place.  That doesn't solve the need
> > to scan the catalog to see other possible values - though you could at least
> > put a counter in pg_settings that indicates how many pg_db_role_setting
> > entries reference the given variable so that if non-zero the user is clued
> > into the fact that they need to check out said catalog table.
>
> This last proposal seems pointless to me.  If the user knows about
> pg_db_role_setting, they will know to check it; if they don't, a
> counter won't fix that.  I can see that there might be some utility to
> a query that would tell you, for a given setting, all sources of that
> setting the system knew about, whether in configuration files,
> pg_db_role_setting, or the current session.  But I don't see that
> putting information that's already available via one system catalog
> query into a different system catalog query helps anything - we should
> presume DBAs know how to write SQL.
>
>
While this is not exactly a high-traffic catalog/view, why have them write
a likely 4-way join query (pg_db_role_setting is not that friendly in its
current form), or even two separate queries, when we can give them a solid
and comprehensive standard view?  I guess part of the mixup is that I am
basically talking about a view of multiple catalogs as a DBA UI, as opposed
to really caring which specific catalogs (existing or otherwise) or
functions are needed to make the whole thing work.  Maybe it does all fit
directly on pg_settings, but tacking some read-only columns onto this
updateable view/table doesn't come across as something that should be
forbidden in general.

Maybe I am imagining a use case that just isn't there, but if two separate
queries are needed, and we call one "consolidated", then having that query
indicate whether the other query has useful information (quite likely it
will not) saves the user the effort of running the wasteful secondary
query.

David J.


Re: [HACKERS] proposal: searching in array function - array_position

2015-01-20 Thread Pavel Stehule
2015-01-20 21:32 GMT+01:00 Jim Nasby :

> On 1/20/15 11:12 AM, Pavel Stehule wrote:
>
>> I am sending updated version - it allow third optional argument that
>> specify where searching should to start. With it is possible repeatably
>> call this function.
>>
>
> What happened to returning an array of offsets? I think that would be both
> easier to use than this version as well as performing better.
>

I am still thinking about this idea. It needs a different function, and I
didn't start with that.

Implementing an optional start parameter for array_offset is cheap, and I
think it can be useful for some use cases.


>
> I see you dropped multi-dimension support, but I think that's fine.


It can be implemented later. There are no barriers to a later
implementation.


> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: [HACKERS] Parallel Seq Scan

2015-01-20 Thread Amit Kapila
On Tue, Jan 20, 2015 at 10:59 PM, Thom Brown  wrote:
>
> On 20 January 2015 at 14:29, Amit Kapila  wrote:
>>
>> Note - I have yet to handle the new node types introduced at some
>> of the places and need to verify prepared queries and some other
>> things, however I think it will be good if I can get some feedback
>> at current stage.
>
>
> I'm getting an issue:
>
> # set parallel_seqscan_degree = 10;
> SET
> Time: 0.219 ms
>
>  ➤ psql://thom@[local]:5488/pgbench
>
> # explain analyse select c1 from t1;
>
> So setting parallel_seqscan_degree above max_worker_processes causes the
> CPU to max out, and the query never returns, or at least not after waiting
> 2 minutes.  Shouldn't it have a ceiling of max_worker_processes?
>

Yes, it should behave that way, but this is not handled in the
patch yet, as we still have to decide on the best execution
strategy (block-by-block or fixed chunks for different workers);
based on that decision, I can handle this scenario in the patch.

I could return an error for such a scenario or do some work
to handle it seamlessly, but it seems to me that I would have to
rework it if we select a different execution approach than the one
used in the patch, so I am waiting for that to be decided.  I am
planning to work on getting the performance data for both
approaches, so that we can decide which is the better way to go.

> The original test I performed where I was getting OOM errors now appears
> to be fine:
>

Thanks for confirming the same.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Async execution of postgres_fdw.

2015-01-20 Thread Matt Kelly
I'm trying to compare v5 and v6 in my laptop right now.  Apparently my
laptop is quite a bit faster than your machine because the tests complete
in roughly 3.3 seconds.

I added more data and didn't see anything other than noise.  (Then again
the queries were dominated by the disk sort so I should retry with larger
work_mem).  I'll try it again when I have more time to play with it.  I
suspect the benefits would be more clear over a network.

Larger than default work_mem, yes, but I think one of the prime use cases
for the fdw is more warehouse-style situations (PostgresXL-style use
cases).  In those cases, work_mem might reasonably be set to 1GB.  Then
even if you have 10KB rows you can fetch a million rows and still be using
less than work_mem.  A simpler change would be to vary it with respect to
work_mem.

Half-baked idea: I know it's the wrong time in the execution phase, but if
you are using remote estimates for cost, there should also be a row width
estimate, which I believe is derived from pg_statistic and its mean column
width.

It's actually a pity that there is no way to set fetch sizes based on "give
me as many tuples as will fit in less than x amount of memory", because
that is almost always exactly what you want.  Even when writing application
code, I've never actually wanted precisely 10,000 rows; I've always wanted
"a reasonable-size chunk that could fit into memory" and then backed my way
into how many rows I wanted.  If we were to extend FETCH to support syntax
like FETCH FORWARD '10MB' FROM ...; then we would eliminate the need to
estimate the value on the fly.

The async stuff, however, is a huge improvement over the last time I played
with the fdw.  The two active postgres processes were easily consuming a
core and a half of CPU.  I think it's not worth tying these two things
together.  It's probably worth making these two separate discussions and
separate patches.

- Matt Kelly

*Just sanity checking myself: shutting down the server, applying the
different patch, 'make clean install' in postgres_fdw, and then restarting
the server should obviously be sufficient to make sure it's running the new
code, because that is all linked at runtime, right?


Re: [HACKERS] Dereferenced pointers checked as NULL in btree_utils_var.c

2015-01-20 Thread Michael Paquier
On Tue, Jan 20, 2015 at 11:38 PM, Tom Lane  wrote:
> Michael Paquier  writes:
>> Coverity is pointing out $subject, with the following stuff in 
>> gbt_var_same():
>> ...
>> As Heikki pointed me out on IM, the lack of crash report in this area,
>> as well as similar coding style in cube/ seem to be sufficient
>> arguments to simply remove those NULL checks instead of doing more
>> solid checks on them. Patch is attached.
>
> The way to form a convincing argument that these checks are unnecessary
> would be to verify that (1) the SQL-accessible functions directly calling
> gbt_var_same() are all marked STRICT, and (2) the core GIST code never
> passes a NULL to these support functions.  I'm prepared to believe that
> (1) and (2) are both true, but it merits checking.

Sure. gbt_var_same is called by those functions in btree_gist/:
- gbt_bit_same
- gbt_bytea_same
- gbt_numeric_same
- gbt_text_same
=# select proname, proisstrict from pg_proc
   where proname in ('gbt_bit_same', 'gbt_bytea_same',
                     'gbt_numeric_same', 'gbt_text_same');
     proname      | proisstrict
------------------+-------------
 gbt_text_same    | t
 gbt_bytea_same   | t
 gbt_numeric_same | t
 gbt_bit_same     | t
(4 rows)

For the second point, I have run regression tests with an assertion in
gbt_var_same checking whether t1 or t2 is NULL, and the tests passed.  Also,
looking at the code of gist, gbt_var_same is called through
gistKeyIsEQ, which is used for an insertion or for a split.  The point
is that when doing an insertion, a call to gistgetadjusted is done and
we check whether one of the keys is NULL.  If one of them is, the code
does not go through gistKeyIsEQ, so the NULL checks do not seem necessary
in gbt_var_same.

Btw, looking at the code of gist, I think that it would be interesting
to add an assertion in gistKeyIsEQ like that:
Assert(DatumGetPointer(a) != NULL && DatumGetPointer(b) != NULL);
Thoughts?
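
To make the proposal concrete, a sketch of where such an assertion could
sit (the surrounding gistKeyIsEQ body is reproduced from memory and may
differ slightly from the tree under discussion):

bool
gistKeyIsEQ(GISTSTATE *giststate, int attno, Datum a, Datum b)
{
	bool		result;

	/* support functions should never be handed NULL key datums */
	Assert(DatumGetPointer(a) != NULL && DatumGetPointer(b) != NULL);

	FunctionCall3Coll(&giststate->equalFn[attno],
					  giststate->supportCollation[attno],
					  a, b,
					  PointerGetDatum(&result));
	return result;
}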
-- 
Michael




Re: [HACKERS] WITH CHECK and Column-Level Privileges

2015-01-20 Thread Stephen Frost
* Noah Misch (n...@leadboat.com) wrote:
> On Mon, Jan 19, 2015 at 11:05:22AM -0500, Stephen Frost wrote:
> > One remaining question is about single-column key violations.  Should we
> > special-case those and allow them to be shown or no?  I can't see a
> > reason not to currently but I wonder if we might have cause to act
> > differently in the future (not that I can think of a reason we'd ever
> > need to).
> 
> Keep them hidden.  Attempting to delete a PK row should not reveal
> otherwise-inaccessible values involved in any constraint violation.  It's
> tempting to make an exception for single-column UNIQUE constraints, but some
> of the ensuing data disclosures are questionable.  What if the violation arose
> from a column default expression or from index corruption?

Interesting point.  I've kept them hidden in this latest version.

> On Mon, Jan 19, 2015 at 11:46:35AM -0500, Stephen Frost wrote:
> > Right, ExecBuildSlotValueDescription() was made to be consistent with
> > BuildIndexValueDescription.  The reason for BuildIndexValueDescription
> > returning an empty string is different from why you hypothesize though-
> > it's exported and I was a bit worried that existing external callers
> > might not be prepared for it to return a NULL instead of a string of
> > some kind.  If this was a green field instead of a back-patch fix, I'd
> > certainly return NULL instead.
> > 
> > If others feel that's not a good reason to avoid returning NULL, I can
> > certainly change it around.
> 
> I won't lose sleep if it does return "" for that reason, but I'm relatively
> unconcerned about extension API compatibility here.  That function is called
> from datatype-independent index uniqueness checks.  This mailing list has
> discussed the difficulties of implementing an index access method in an
> extension, and no such extension has come forward.

Alright, I've reworked this to have those functions return NULL instead.

> Your latest patch has whitespace errors; visit "git diff --check".

Yeah, I had done that but then made a few additional tweaks and didn't
recheck, sorry about that.  I'm playing around w/ my git workflow a bit
and hopefully it won't happen again. :)

Updated patch attached.

Thanks!

Stephen
From d9f0e186a74e4d11e9c7f3181dbf98bae3224111 Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Mon, 12 Jan 2015 17:04:11 -0500
Subject: [PATCH] Fix column-privilege leak in error-message paths

While building error messages to return to the user,
BuildIndexValueDescription, ExecBuildSlotValueDescription and
ri_ReportViolation would happily include the entire key or entire row in
the result returned to the user, even if the user didn't have access to
view all of the columns being included.

Instead, include only those columns which the user is providing or which
the user has select rights on.  If the user does not have any rights
to view the table or any of the columns involved then no detail is
provided and a NULL value is returned from BuildIndexValueDescription
and ExecBuildSlotValueDescription.  Note that, for key cases, the user
must have access to all of the columns for the key to be shown; a
partial key will not be returned.

Back-patch all the way, as column-level privileges are now in all
supported versions.

This has been assigned CVE-2014-8161, but since the issue and the patch
have already been publicized on pgsql-hackers, there's no point in trying
to hide this commit.
---
 src/backend/access/index/genam.c |  59 ++-
 src/backend/access/nbtree/nbtinsert.c|  12 ++-
 src/backend/commands/copy.c  |   6 +-
 src/backend/commands/matview.c   |   7 ++
 src/backend/executor/execMain.c  | 171 ---
 src/backend/executor/execUtils.c |  12 ++-
 src/backend/utils/adt/ri_triggers.c  |  78 ++
 src/backend/utils/sort/tuplesort.c   |   9 +-
 src/test/regress/expected/privileges.out |  31 ++
 src/test/regress/sql/privileges.sql  |  24 +
 10 files changed, 338 insertions(+), 71 deletions(-)

diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 850008b..69cbbd5 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -25,10 +25,12 @@
 #include "lib/stringinfo.h"
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
+#include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
+#include "utils/syscache.h"
 #include "utils/tqual.h"
 
 
@@ -154,6 +156,9 @@ IndexScanEnd(IndexScanDesc scan)
  * form "(key_name, ...)=(key_value, ...)".  This is currently used
  * for building unique-constraint and exclusion-constraint error messages.
  *
+ * Note that if the user does not have permissions to view the columns
+ * involved, a NULL is returned.
+ *
  * The passed-in values/nulls arrays are the "raw" input to the index AM,
  * e.g. results of FormIndexD

Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf

2015-01-20 Thread Robert Haas
On Tue, Jan 20, 2015 at 4:07 PM, David Johnston
 wrote:
> sourceline and sourcefile pertain only to the current value while the point
> of adding these other pieces is to provide a snapshot of all the different
> mappings that the system knows about; instead of having to tell a user to go
> look in two different files (and associated includes) and a database catalog
> to find out what possible values are in place.  That doesn't solve the need
> to scan the catalog to see other possible values - though you could at least
> put a counter in pg_settings that indicates how many pg_db_role_setting
> entries reference the given variable so that if non-zero the user is clued
> into the fact that they need to check out said catalog table.

This last proposal seems pointless to me.  If the user knows about
pg_db_role_setting, they will know to check it; if they don't, a
counter won't fix that.  I can see that there might be some utility to
a query that would tell you, for a given setting, all sources of that
setting the system knew about, whether in configuration files,
pg_db_role_setting, or the current session.  But I don't see that
putting information that's already available via one system catalog
query into a different system catalog query helps anything - we should
presume DBAs know how to write SQL.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2015-01-20 Thread Stephen Frost
Jim,

* Jim Nasby (jim.na...@bluetreble.com) wrote:
> +1. In particular I'm very concerned with the idea of doing this via roles, 
> because that would make it trivial for any superuser to disable auditing. The 
> only good option I could see to provide this kind of flexibility would be 
> allowing the user to provide a function that accepts role, object, etc and 
> make return a boolean. The performance of that would presumably suck with 
> anything but a C function, but we could provide some C functions to handle 
> simple cases.

Superusers will be able to bypass, trivially, anything that's done in
the process space of PG.  The only possible exception to that being an
SELinux or similar solution, but I don't think that's what you were
getting at.

I certainly don't think having the user provide a C function to specify
what should be audited makes any sense- if they can do that, they
can use the same hooks pgaudit is using and skip the middle-man.  As for
the performance concern you raise, I actually don't buy into it at all.
It's not like we worry about the performance of checking permissions on
objects in general and, for my part, I like to think that's because it's
pretty darn quick already.

> That said, I think the best idea at this stage is either log everything or 
> nothing. We can always expand upon that later.

We've already got those options and they are, clearly, insufficient for
a large number of our users.

Thanks,

Stephen




Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2015-01-20 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Tue, Jan 20, 2015 at 1:05 AM, Abhijit Menon-Sen  
> wrote:
> > So when I'm trying to decide what to audit, I have to:
> >
> > (a) check if the current user is mentioned in .roles; if so, audit.
> >
> > (b) check if the current user is a descendant of one of the roles
> > mentioned in .roles; if not, no audit.
> >
> > (c) check for permissions granted to the "root" role and see if that
> > tells us to audit.
> >
> > Is that right? If so, it would work fine. I don't look forward to trying
> > to explain it to people, but yes, it would work (for anything you could
> > grant permissions for).
> 
> I think this points to fundamental weakness in the idea of doing this
> through the GRANT system.  Some people are going want to audit
> everything a particular user does, some people are going to want to
> audit all access to particular objects, and some people will have more
> complicated requirements.  Some people will want to audit even
> super-users, others especially super-users, others only non
> super-users.  None of this necessarily matches up to the usual
> permissions framework.

I'm hopeful that my email from a few minutes ago provides a way to cover
all of the above, while still making it easy for users who just want to
say "audit everything this user does" or "audit everything which touches
this column".

Any auditing of superusers is going to be circumventable by those same
superusers if it's happening in the context of the PG process, so I'm
not sure why they would be any different under this scheme vs. some
other scheme.

Don't get me wrong- I agree completely that using the GRANT system isn't
ideal, but I don't see anyone proposing another way for an extension to
store metadata on catalog tables with the same granularity the GRANT
system provides and its dependency handling and ability to have default
values.  We've been over that ground a number of times and I certainly
don't feel like we're any closer to solving it and I'd hate to see that
block pgaudit.

Put another way, if the requirement for pgaudit is that it has to solve
the metadata-on-catalogs-for-extensions problem, I think I'd rather just
move pgaudit into core instead, give it catalog tables, make it part of
the dependency system, provide syntax for it, etc.  I'm pretty sure
that'd be easier and happen a lot faster.

Thanks!

Stephen




Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-01-20 Thread Arne Scheffer


Andrew Dunstan wrote on 2015-01-20:

> On 01/20/2015 01:26 PM, Arne Scheffer wrote:
> >Interesting patch.
> >I did a quick review looking only into the patch file.

> >The "sum of variances" variable contains
> >the "sum of squared differences" instead, I think.

> Umm, no. It's not.

Umm, yes, I think it is ;-)

>   e->counters.sum_var_time +=
>   (total_time - old_mean) * (total_time - e->counters.mean_time);
> This is not a square that's being added.

That's correct.
Nevertheless, it's the difference between the computed sum of squared
differences and the preceding one that is added at every step.

> old_mean is not the same as
> e->counters.mean_time.

> Since the variance is this value divided by (n - 1), AIUI, I think "sum
> of variances" isn't a bad description. I'm open to alternative suggestions.

> >And a very minor aspect:
> >The term "standard deviation" in your code stands for
> >(corrected) sample standard deviation, I think,
> >because you divide by n-1 instead of n to keep the
> >estimator unbiased.
> >How about mentioning the prefix "sample"
> >to indicate this being the estimator?

> I don't understand. I'm following pretty exactly the calculations stated
> at 

(There is nothing wrong with those calculations; Welford's algorithm
 simply adds up exactly these differences, one step at a time.)
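
For reference, a minimal sketch of Welford's online update with hypothetical
names (the pg_stat_statements counters in the patch follow the same
recurrence): after n observations, m2 holds the sum of squared differences
from the mean, so the sample variance is m2 / (n - 1) and the sample
standard deviation is the square root of that.

typedef struct WelfordState
{
	double		n;		/* number of observations so far */
	double		mean;	/* running mean */
	double		m2;		/* running sum of squared differences */
} WelfordState;

static void
welford_update(WelfordState *st, double x)
{
	double		old_mean = st->mean;

	st->n += 1;
	st->mean += (x - old_mean) / st->n;
	/* not itself a square, but these terms sum to the squared differences */
	st->m2 += (x - old_mean) * (x - st->mean);
}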

VlG-Arne


> I'm not a statistician. Perhaps others who are more literate in
> statistics can comment on this paragraph.

> >And I'm sure I'm missing C specifics (again)
> >(or it's the reduced patch file scope),
> >but you introduce sqrtd, but sqrt is called?


> Good catch. Will fix.

> cheers

> andrew




Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2015-01-20 Thread Stephen Frost
Abhijit,

* Abhijit Menon-Sen (a...@2ndquadrant.com) wrote:
> At 2015-01-19 08:26:59 -0500, sfr...@snowman.net wrote:
> > I'm confused by this statement..
> 
> Let me see if I've understood your clarification:

Thanks much for the example use-case and for working this through with
me.  I actually think I've come up with a further specification which
might allow us to make this extremely flexible, but also simple for
those who want to keep it simple.

Consider this:

Everything is single-level to the roles mentioned in the GUC.

Is the logged in role a member of one of the GUC roles?
  Yes -> Audit

Now to cover the "user X for table Y" case:

Did any of the GUC value roles grant SELECT rights for this table to the
current role?
  Yes -> Audit SELECT on the table by the current role

Did any of the GUC value roles grant INSERT rights for this table to the
current role?
  Yes -> Audit INSERT on the table by the current role

etc.

For the 'log all access to an object' case, under this scheme, I'm
afraid we'd need some special role to GRANT the access to.  We wouldn't
want that to simply be 'public' since then it might actually be
granted access rights that we don't want to.  We can't simply use the
same role because you need to grant that role whatever access 'with
grant option' in order for it to be able to re-grant the privilege.

With the special role, it becomes:

Does the special role have SELECT rights on the table?
  Yes -> Audit SELECTs on the table

Does the special role have INSERT rights on the table?
  Yes -> Audit INSERTs on the table

> Suppose you have pgaudit.roles set to 'r0, r1', and that you have
> granted r0 to u0.

Not quite- I wasn't thinking you'd grant r0 to u0 but rather the other
way around- u0 is granted to r0.  If you granted r0 to u0, then u0 would
have all of r0's rights which could be quite a bit larger than you want
u0 to have.  It only works in the other direction.

> Now, if you're connected to the database as r0 or r1, then everything
> you do is logged. But if you're connected as u0, then only those things
> are logged that can be derived (in a manner discussed later) from the
> permissions explicitly granted to r0 (but not u0)?

> So when I'm trying to decide what to audit, I have to:
> 
> (a) check if the current user is mentioned in .roles; if so, audit.
> 
> (b) check if the current user is a descendant of one of the roles
> mentioned in .roles; if not, no audit.
> 
> (c) check for permissions granted to the "root" role and see if that
> tells us to audit.
> 
> Is that right? If so, it would work fine. I don't look forward to trying
> to explain it to people, but yes, it would work (for anything you could
> grant permissions for).

This is pretty close- (a) and (b) are mostly correct, though I would
strongly discourage users from actually logging in as an audit role.
The one caveat with (b) is that 'if not, no audit' is not correct- all
cases are essentially OR'd together when it comes to auditing.  Roles
can be audited even if they are not descendants of the roles mentioned
in .roles.

Review the opening of this email though and consider that we could look
at "what privileges has the audit role granted to the current role?" as
defining what is to be audited.

> > You can't say that you want r1 to have X actions logged, with r2
> > having Y actions logged.
> 
> Right. But how do you do that for non-GRANT-able actions in your scheme?
> For example, what if I want to see all the tables created and dropped by
> a particular user?

I hadn't been intending to address that with this scheme, but I think we
have that by looking for privilege grants where the audit role is the
grantee and the role-to-be-audited the grantor.

> > Have you considered splitting pgaudit.c into multiple files, perhaps
> > along the pre/post deparse lines?
> 
> If such a split were done properly, then could the backwards-compatible
> version be more acceptable for inclusion in contrib in 9.5? If so, I'll
> look into it.

As Robert says, the short answer is 'no'- but it might make it easier to
get the 9.5 bits into 9.5.. :)

> > The key part above is "any".  We're using the grant system as a proxy
> > for saying "I want to audit column X".  That's different from "I want
> > to audit commands which would be allowed if I *only* had access to
> > column X".  If I want to audit access to column X, then:
> > 
> > select A, B, X from mytable;
> > 
> > Should be audited, even if the audit role doesn't have access to
> > columns A or B.
> 
> Yes, that's what the code already does (modulo handling of the audit
> role's oid, which I tweaked to match the behaviour described earlier
> in this mail). I did the following:
> 
> create table x(a int,b int,c int);
> insert into x(a,b,c) values (1,2,3);
> 
> create user y;
> grant select on x to y;
> 
> create role x;
> grant select (a) on table x to x;
> grant x to y;
> 
> Then, as user y, I did «select a,b,c fro

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 6:34 PM, Robert Haas  wrote:
> That might be OK.  Probably needs a bit of performance testing to see
> how it looks.

Well, we're still only doing it when we do our final merge. So that's
"only" doubling the number of conversions required, which if we're
blocked on I/O might not matter at all.

-- 
Peter Geoghegan




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Robert Haas
On Tue, Jan 20, 2015 at 9:33 PM, Peter Geoghegan  wrote:
> On Tue, Jan 20, 2015 at 6:30 PM, Robert Haas  wrote:
>> I don't want to change the on-disk format for tapes without a lot more
>> discussion.  Can you come up with a fix that avoids that for now?
>
> A more conservative approach would be to perform conversion on-the-fly
> once more. That wouldn't be patently unacceptable from a performance
> perspective, and would also not change the on-disk format for tapes.
>
> Thoughts?

That might be OK.  Probably needs a bit of performance testing to see
how it looks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 6:30 PM, Robert Haas  wrote:
> I don't want to change the on-disk format for tapes without a lot more
> discussion.  Can you come up with a fix that avoids that for now?

A more conservative approach would be to perform conversion on-the-fly
once more. That wouldn't be patently unacceptable from a performance
perspective, and would also not change the on-disk format for tapes.

Thoughts?

-- 
Peter Geoghegan




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Robert Haas
On Tue, Jan 20, 2015 at 8:39 PM, Peter Geoghegan  wrote:
> On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas  wrote:
>> I was assuming we were going to fix this by undoing the abbreviation
>> (as in the abort case) when we spill to disk, and not bothering with
>> it thereafter.
>
> The spill-to-disk case is at least as compelling at the internal sort
> case. The overhead of comparisons is much higher for tapesort.
>
> Attached patch serializes keys. On reflection, I'm inclined to go with
> this approach. Even if the CPU overhead of reconstructing strxfrm()
> blobs is acceptable for text, it might be much more expensive for
> other types. I'm loathe to throw away those abbreviated keys
> unnecessarily.
>
> We don't have to worry about having aborted abbreviation, since once
> we spill to disk we've effectively committed to abbreviation. This
> patch formalizes the idea that there is strictly a pass-by-value
> representation required for such cases (but not that the original
> Datums must be of a pass-by-reference, which is another thing
> entirely). I've tested it some, obviously with Andrew's testcase and
> the regression tests, but also with my B-Tree verification tool.
> Please review it.
>
> Sorry about this.

I don't want to change the on-disk format for tapes without a lot more
discussion.  Can you come up with a fix that avoids that for now?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 5:46 PM, Peter Geoghegan  wrote:
> Would you prefer it if the spill-to-disk case
> aborted in the style of low entropy keys? That doesn't seem
> significantly safer than this, and it certainly not acceptable from a
> performance perspective.

BTW, I can write that patch if that's your preference. Should I?

I just don't favor that even as a short term correctness fix, because
it seems unacceptable to throw away all the strxfrm() work where
that's a very predictable and even likely outcome. I suggest reviewing
and committing my fix as a short-term fix, which may well turn out to
be generally acceptable upon further consideration. Yes, we'll need to
make a point of reviewing an already committed patch, but there is a
precedent for that.

-- 
Peter Geoghegan




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 5:42 PM, Robert Haas  wrote:
> On Tue, Jan 20, 2015 at 8:39 PM, Peter Geoghegan  wrote:
>> On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas  wrote:
>>> I was assuming we were going to fix this by undoing the abbreviation
>>> (as in the abort case) when we spill to disk, and not bothering with
>>> it thereafter.
>>
>> The spill-to-disk case is at least as compelling at the internal sort
>> case. The overhead of comparisons is much higher for tapesort.
>
> First, we need to unbreak this.  Then, we can look at optimizing it.
> The latter task will require performance testing.

I don't see that any alternative isn't a performance trade-off. My
patch accomplishes unbreaking this. I agree that it still needs
review from that perspective, but it doesn't seem any worse than any
other alternative. Would you prefer it if the spill-to-disk case
aborted in the style of low entropy keys? That doesn't seem
significantly safer than this, and it's certainly not acceptable from a
performance perspective.
-- 
Peter Geoghegan




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Robert Haas
On Tue, Jan 20, 2015 at 8:39 PM, Peter Geoghegan  wrote:
> On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas  wrote:
>> I was assuming we were going to fix this by undoing the abbreviation
>> (as in the abort case) when we spill to disk, and not bothering with
>> it thereafter.
>
> The spill-to-disk case is at least as compelling at the internal sort
> case. The overhead of comparisons is much higher for tapesort.

First, we need to unbreak this.  Then, we can look at optimizing it.
The latter task will require performance testing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 5:32 PM, Robert Haas  wrote:
> I was assuming we were going to fix this by undoing the abbreviation
> (as in the abort case) when we spill to disk, and not bothering with
> it thereafter.

The spill-to-disk case is at least as compelling as the internal sort
case. The overhead of comparisons is much higher for tapesort.

Attached patch serializes keys. On reflection, I'm inclined to go with
this approach. Even if the CPU overhead of reconstructing strxfrm()
blobs is acceptable for text, it might be much more expensive for
other types. I'm loath to throw away those abbreviated keys
unnecessarily.

We don't have to worry about having aborted abbreviation, since once
we spill to disk we've effectively committed to abbreviation. This
patch formalizes the idea that there is strictly a pass-by-value
representation required for such cases (but not that the original
Datums must be of a pass-by-reference type, which is another thing
entirely). I've tested it some, obviously with Andrew's testcase and
the regression tests, but also with my B-Tree verification tool.
Please review it.

Sorry about this.
-- 
Peter Geoghegan
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 6d3aa88..a442617 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -413,10 +413,13 @@ struct Tuplesortstate
  *
  * If state->randomAccess is true, then the stored representation of the
  * tuple must be followed by another "unsigned int" that is a copy of the
- * length --- so the total tape space used is actually sizeof(unsigned int)
- * more than the stored length value.  This allows read-backwards.  When
- * randomAccess is not true, the write/read routines may omit the extra
- * length word.
+ * length (less any abbreviated key that is stored, which is also possible) ---
+ * so the total tape space used is actually sizeof(unsigned int) more than the
+ * stored length value, as well as another sizeof(Datum) overhead for storing
+ * abbreviated keys.  This allows read-backwards, and avoids the need to
+ * re-compute abbreviated keys.  When randomAccess is not true, the write/read
+ * routines may omit the extra length word.  Also, when abbreviation is not in
+ * play, abbreviated keys are not stored.
  *
  * writetup is expected to write both length words as well as the tuple
  * data.  When readtup is called, the tape is positioned just after the
@@ -3124,11 +3127,15 @@ writetup_heap(Tuplesortstate *state, int tapenum, SortTuple *stup)
 	char	   *tupbody = (char *) tuple + MINIMAL_TUPLE_DATA_OFFSET;
 	unsigned int tupbodylen = tuple->t_len - MINIMAL_TUPLE_DATA_OFFSET;
 
-	/* total on-disk footprint: */
+	/* total on-disk footprint for tuple: */
 	unsigned int tuplen = tupbodylen + sizeof(int);
 
 	LogicalTapeWrite(state->tapeset, tapenum,
 	 (void *) &tuplen, sizeof(tuplen));
+	/* Store abbreviated key, if any (not accounted for by tuplen) */
+	if (state->sortKeys->abbrev_converter)
+		LogicalTapeWrite(state->tapeset, tapenum,
+		 (void *) &stup->datum1, sizeof(Datum));
 	LogicalTapeWrite(state->tapeset, tapenum,
 	 (void *) tupbody, tupbodylen);
 	if (state->randomAccess)	/* need trailing length word? */
@@ -3150,6 +3157,10 @@ readtup_heap(Tuplesortstate *state, SortTuple *stup,
 	HeapTupleData htup;
 
 	USEMEM(state, GetMemoryChunkSpace(tuple));
+	/* read in the tuple abbreviated key, if any */
+	if (state->sortKeys->abbrev_converter)
+		LogicalTapeReadExact(state->tapeset, tapenum, (void *) &stup->datum1,
+			 sizeof(Datum));
 	/* read in the tuple proper */
 	tuple->t_len = tuplen;
 	LogicalTapeReadExact(state->tapeset, tapenum,
@@ -3161,10 +3172,12 @@ readtup_heap(Tuplesortstate *state, SortTuple *stup,
 	/* set up first-column key value */
 	htup.t_len = tuple->t_len + MINIMAL_TUPLE_OFFSET;
 	htup.t_data = (HeapTupleHeader) ((char *) tuple - MINIMAL_TUPLE_OFFSET);
-	stup->datum1 = heap_getattr(&htup,
-state->sortKeys[0].ssup_attno,
-state->tupDesc,
-&stup->isnull1);
+	/* set up first-column key value (for non-abbreviated case) */
+	if (!state->sortKeys->abbrev_converter)
+		stup->datum1 = heap_getattr(&htup,
+	state->sortKeys[0].ssup_attno,
+	state->tupDesc,
+	&stup->isnull1);
 }
 
 /*
@@ -3355,12 +3368,20 @@ writetup_cluster(Tuplesortstate *state, int tapenum, SortTuple *stup)
 {
 	HeapTuple	tuple = (HeapTuple) stup->tuple;
 	unsigned int tuplen = tuple->t_len + sizeof(ItemPointerData) + sizeof(int);
+	AttrNumber	leading = state->indexInfo->ii_KeyAttrNumbers[0];
 
-	/* We need to store t_self, but not other fields of HeapTupleData */
+	/*
+	 * We need to store t_self, but not other fields of HeapTupleData (although
+	 * possibly abbreviated key value)
+	 */
 	LogicalTapeWrite(state->tapeset, tapenum,
 	 &tuplen, sizeof(tuplen));
 	LogicalTapeWrite(state->tapeset, tapenum,
 	 &tuple->t_self, sizeof(ItemPointerData));
+	/* Store abbreviated key, if any (not accounted fo

Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2015-01-20 Thread Robert Haas
On Tue, Jan 20, 2015 at 5:03 PM, Jim Nasby  wrote:
> +1. In particular I'm very concerned with the idea of doing this via roles,
> because that would make it trivial for any superuser to disable auditing.
> The only good option I could see to provide this kind of flexibility would
> be allowing the user to provide a function that accepts role, object, etc
> and make return a boolean. The performance of that would presumably suck
> with anything but a C function, but we could provide some C functions to
> handle simple cases.
>
> That said, I think the best idea at this stage is either log everything or
> nothing. We can always expand upon that later.

I think this is throwing the baby out with the bathwater.  Neither C
functions nor all-or-nothing are going to be of any practical use.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Robert Haas
On Tue, Jan 20, 2015 at 7:07 PM, Peter Geoghegan  wrote:
> On Tue, Jan 20, 2015 at 3:57 PM, Peter Geoghegan  wrote:
>> It's certainly possible to fix Andrew's test case with the attached.
>> I'm not sure that that's the appropriate fix, though: there is
>> probably a case to be made for not bothering with abbreviation once
>> we've read tuples in for the final merge run. More likely, the
>> strongest case is for storing the abbreviated keys on disk too, and
>> reading those back.
>
> Maybe not, though: An extra 8 bytes per tuple on disk is not free.
> OTOH, if we're I/O bound on the final merge, as we ought to be, then
> recomputing the abbreviated keys could make sense, since there may
> well be an idle CPU core anyway.

I was assuming we were going to fix this by undoing the abbreviation
(as in the abort case) when we spill to disk, and not bothering with
it thereafter.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-20 Thread Tomas Vondra
On 21.1.2015 00:38, Michael Paquier wrote:
> On Wed, Jan 21, 2015 at 1:08 AM, Tomas Vondra
>
>> I've tried to reproduce this on my Raspberry PI 'machine' and it's not
>> very difficult to trigger this. About 7 out of 10 'make check' runs fail
>> because of 'pgstat wait timeout'.
>>
>> All the occurences I've seen were right after some sort of VACUUM
>> (sometimes plain, sometimes ANALYZE or FREEZE), and the I/O at the time
>> looked something like this:
>>
>> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz    await r_await  w_await  svctm  %util
>> mmcblk0     0.00   75.00   0.00   8.00    0.00   36.00     9.00     5.73 15633.75    0.00 15633.75 125.00 100.00
>>
>> So pretty terrible (this is a Class 4 SD card, supposedly able to
>> handle 4 MB/s). If hamster had a faulty SD card, it might have been
>> much worse, I guess.
>
> From experience, a class 10 is at least necessary, with a minimum
> amount of memory, to minimize the appearance of those warnings;
> hamster now has an 8GB class 10 card.

Well, my goal was exactly to produce those warnings ;-) and see if I can
identify some strange cases. That's why I chose just class 4. But even
then it produces a rather low number of those warnings (one or two per
check run), and mostly at the expected places with significant I/O
overload. So I'm not any wiser :-(

regards
Tomas




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 3:57 PM, Peter Geoghegan  wrote:
> It's certainly possible to fix Andrew's test case with the attached.
> I'm not sure that that's the appropriate fix, though: there is
> probably a case to be made for not bothering with abbreviation once
> we've read tuples in for the final merge run. More likely, the
> strongest case is for storing the abbreviated keys on disk too, and
> reading those back.

Maybe not, though: An extra 8 bytes per tuple on disk is not free.
OTOH, if we're I/O bound on the final merge, as we ought to be, then
recomputing the abbreviated keys could make sense, since there may
well be an idle CPU core anyway.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 3:34 PM, Peter Geoghegan  wrote:
> On Tue, Jan 20, 2015 at 3:34 PM, Robert Haas  wrote:
>> Dear me.  Peter, can you fix this RSN?
>
> Investigating.

It's certainly possible to fix Andrew's test case with the attached.
I'm not sure that that's the appropriate fix, though: there is
probably a case to be made for not bothering with abbreviation once
we've read tuples in for the final merge run. More likely, the
strongest case is for storing the abbreviated keys on disk too, and
reading those back.

-- 
Peter Geoghegan
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 6d3aa88..adf4c4d 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -3148,6 +3148,7 @@ readtup_heap(Tuplesortstate *state, SortTuple *stup,
 	MinimalTuple tuple = (MinimalTuple) palloc(tuplen);
 	char	   *tupbody = (char *) tuple + MINIMAL_TUPLE_DATA_OFFSET;
 	HeapTupleData htup;
+	Datum			original;
 
 	USEMEM(state, GetMemoryChunkSpace(tuple));
 	/* read in the tuple proper */
@@ -3161,10 +3162,29 @@ readtup_heap(Tuplesortstate *state, SortTuple *stup,
 	/* set up first-column key value */
 	htup.t_len = tuple->t_len + MINIMAL_TUPLE_OFFSET;
 	htup.t_data = (HeapTupleHeader) ((char *) tuple - MINIMAL_TUPLE_OFFSET);
-	stup->datum1 = heap_getattr(&htup,
-state->sortKeys[0].ssup_attno,
-state->tupDesc,
-&stup->isnull1);
+	/* Once again, store abbreviated key representation */
+	original = heap_getattr(&htup,
+			state->sortKeys[0].ssup_attno,
+			state->tupDesc,
+			&stup->isnull1);
+
+	if (!state->sortKeys->abbrev_converter || stup->isnull1)
+	{
+		/*
+		 * Store ordinary Datum representation, or NULL value.  If there is a
+		 * converter it won't expect NULL values, and cost model is not
+		 * required to account for NULL, so in that case we avoid calling
+		 * converter and just set datum1 to "void" representation (to be
+		 * consistent).
+		 */
+		stup->datum1 = original;
+	}
+	else
+	{
+		/* Store abbreviated key representation */
+		stup->datum1 = state->sortKeys->abbrev_converter(original,
+		 state->sortKeys);
+	}
 }
 
 /*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-01-20 Thread Andrew Dunstan


On 01/20/2015 06:32 PM, David G Johnston wrote:

Andrew Dunstan wrote

On 01/20/2015 01:26 PM, Arne Scheffer wrote:

And a very minor aspect:
The term "standard deviation" in your code stands for
(corrected) sample standard deviation, I think,
because you divide by n-1 instead of n to keep the
estimator unbiased.
How about mentioning the prefix "sample"
to indicate this being the estimator?


I don't understand. I'm following pretty exactly the calculations stated
at ;


I'm not a statistician. Perhaps others who are more literate in
statistics can comment on this paragraph.

I'm largely in the same boat as Andrew but...

I take it that Arne is referring to:

http://en.wikipedia.org/wiki/Bessel's_correction

but the mere presence of an (n-1) divisor does not mean that is what is
happening.  In this particular situation I believe the (n-1) simply is a
necessary part of the recurrence formula and not any attempt to correct for
sampling bias when estimating a population's variance.  In fact, as far as
the database knows, the values provided to this function do represent an
entire population and such a correction would be unnecessary.  I guess it
boils down to whether "future" queries are considered part of the population
or whether the population changes upon each query being run and thus we are
calculating the ever-changing population variance.  Note point 3 in the
linked Wikipedia article.





Thanks. Still not quite sure what to do, though :-) I guess in the end 
we want the answer to come up with similar results to the builtin stddev 
SQL function. I'll try to set up a test program, to see if we do.
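
For reference, the built-in aggregates can be compared directly: stddev() is
an alias for stddev_samp(), which divides by (n - 1), while stddev_pop()
divides by n. A minimal check along those lines (illustrative data only):

  -- Illustration only: compare the sample and population estimators.
  SELECT stddev(x)      AS stddev_default,    -- alias for stddev_samp
         stddev_samp(x) AS sample_stddev,     -- divides by (n - 1)
         stddev_pop(x)  AS population_stddev  -- divides by n
  FROM (VALUES (1.0), (2.0), (3.0), (4.0)) AS t(x);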


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-20 Thread Michael Paquier
On Wed, Jan 21, 2015 at 1:08 AM, Tomas Vondra
 wrote:
> On 25.12.2014 22:28, Tomas Vondra wrote:
>> On 25.12.2014 21:14, Andres Freund wrote:
>>
>>> That's indeed odd. Seems to have been lost when the statsfile was
>>> split into multiple files. Alvaro, Tomas?
>>
>> The goal was to keep the logic as close to the original as possible.
>> IIRC there were "pgstat wait timeout" issues before, and in most cases
>> the conclusion was that it's probably because of overloaded I/O.
>>
>> But maybe there actually was another bug, and it's entirely possible
>> that the split introduced a new one, and that's what we're seeing now.
>> The strange thing is that the split happened ~2 years ago, which is
>> inconsistent with the sudden increase of this kind of issues. So maybe
>> something changed on that particular animal (a failing SD card causing
>> I/O stalls, perhaps)?
>>
>> Anyway, I happen to have a spare Raspberry PI, so I'll try to reproduce
>> and analyze the issue locally. But that won't happen until January.
>
> I've tried to reproduce this on my Raspberry PI 'machine' and it's not
> very difficult to trigger this. About 7 out of 10 'make check' runs fail
> because of 'pgstat wait timeout'.
>
> All the occurences I've seen were right after some sort of VACUUM
> (sometimes plain, sometimes ANALYZE or FREEZE), and the I/O at the time
> looked something like this:
>
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz    await r_await  w_await  svctm  %util
> mmcblk0     0.00   75.00   0.00   8.00    0.00   36.00     9.00     5.73 15633.75    0.00 15633.75 125.00 100.00
>
> So pretty terrible (this is a Class 4 SD card, supposedly able to handle
> 4 MB/s). If hamster had a faulty SD card, it might have been much worse, I
> guess.
From experience, a class 10 is at least necessary, with a minimum amount
of memory, to minimize the appearance of those warnings; hamster now has
an 8GB class 10 card.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 3:34 PM, Robert Haas  wrote:
> Dear me.  Peter, can you fix this RSN?

Investigating.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 3:33 PM, Robert Haas  wrote:
> Peter, this made bowerbird (Windows 8/Visual Studio) build, but it's
> failing make check.  Ditto hamerkop (Windows 2k8/VC++) and currawong
> (Windows XP Pro/MSVC++).  jacana (Windows 8/gcc) and brolga (Windows
> XP Pro/cygwin) are unhappy too, although the failures are showing up
> in different build stages rather than in 'make check'.  narwhal
> (Windows 2k3/mingw) and frogmouth (Windows XP Pro/gcc) are happy,
> though, so it's not affecting ALL of the Windows critters.  Still, I'm
> leaning toward the view that we should disable this optimization
> across-the-board on Windows until somebody has time to do the legwork
> to figure out what it takes to make it work, and what makes it work on
> some of these critters and fail on others.  We can't leave the
> buildfarm red for long periods of time.

Fair enough.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Robert Haas
On Tue, Jan 20, 2015 at 6:27 PM, Andrew Gierth
 wrote:
>> "Robert" == Robert Haas  writes:
>  Robert> All right, it seems Tom is with you on that point, so after
>  Robert> some study, I've committed this with very minor modifications.
>
> While hacking up a patch to demonstrate the simplicity of extending this
> to the Datum sorter, I seem to have run into a fairly major issue with
> this: there seems to be no attempt whatsoever to handle spilling to disk
> correctly. The data spilled to disk only has the un-abbreviated values,
> but nothing tries to re-abbreviate it (or disable abbreviations) when it
> is read back in, and chaos ensues:

Dear me.  Peter, can you fix this RSN?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Robert Haas
On Tue, Jan 20, 2015 at 10:54 AM, Robert Haas  wrote:
> On Mon, Jan 19, 2015 at 9:29 PM, Peter Geoghegan  wrote:
>> I think that the attached patch should at least fix that much. Maybe
>> the problem on the other animal is also explained by the lack of this,
>> since there could also be a MinGW-ish strxfrm_l(), I suppose.
>
> Committed that, rather blindly, since it looks like a reasonable fix.

Peter, this made bowerbird (Windows 8/Visual Studio) build, but it's
failing make check.  Ditto hamerkop (Windows 2k8/VC++) and currawong
(Windows XP Pro/MSVC++).  jacana (Windows 8/gcc) and brolga (Windows
XP Pro/cygwin) are unhappy too, although the failures are showing up
in different build stages rather than in 'make check'.  narwhal
(Windows 2k3/mingw) and frogmouth (Windows XP Pro/gcc) are happy,
though, so it's not affecting ALL of the Windows critters.  Still, I'm
leaning toward the view that we should disable this optimization
across-the-board on Windows until somebody has time to do the legwork
to figure out what it takes to make it work, and what makes it work on
some of these critters and fail on others.  We can't leave the
buildfarm red for long periods of time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-01-20 Thread David G Johnston
Andrew Dunstan wrote
> On 01/20/2015 01:26 PM, Arne Scheffer wrote:
>>
>> And a very minor aspect:
>> The term "standard deviation" in your code stands for
>> (corrected) sample standard deviation, I think,
because you divide by n-1 instead of n to keep the
>> estimator unbiased.
>> How about mentioning the prefix "sample"
to indicate this being the estimator?
> 
> 
> I don't understand. I'm following pretty exactly the calculations stated 
> at ;
> 
> 
> I'm not a statistician. Perhaps others who are more literate in 
> statistics can comment on this paragraph.

I'm largely in the same boat as Andrew but...

I take it that Arne is referring to:

http://en.wikipedia.org/wiki/Bessel's_correction

but the mere presence of an (n-1) divisor does not mean that is what is
happening.  In this particular situation I believe the (n-1) simply is a
necessary part of the recurrence formula and not any attempt to correct for
sampling bias when estimating a population's variance.  In fact, as far as
the database knows, the values provided to this function do represent an
entire population and such a correction would be unnecessary.  I guess it
boils down to whether "future" queries are considered part of the population
or whether the population changes upon each query being run and thus we are
calculating the ever-changing population variance.  Note point 3 in the
linked Wikipedia article.

David J.



--
View this message in context: 
http://postgresql.nabble.com/Add-min-and-max-execute-statement-time-in-pg-stat-statement-tp5774989p5834805.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Andrew Gierth
> "Robert" == Robert Haas  writes:

 Robert> All right, it seems Tom is with you on that point, so after
 Robert> some study, I've committed this with very minor modifications.

While hacking up a patch to demonstrate the simplicity of extending this
to the Datum sorter, I seem to have run into a fairly major issue with
this: there seems to be no attempt whatsoever to handle spilling to disk
correctly. The data spilled to disk only has the un-abbreviated values,
but nothing tries to re-abbreviate it (or disable abbreviations) when it
is read back in, and chaos ensues:

set work_mem = 64;
select v, v > lag(v) over (order by v)
  from (select 'B'||i as v from generate_series(1,10000) i
        union all select 'a'||i from generate_series(1,10000) i offset 0) s
  order by v limit 20;

   v    | ?column? 
--------+----------
 a1     | 
 B1     | f
 a1000  | t
 a1001  | t
 a1002  | t
 a1003  | t
 B1000  | f
 B1001  | t
 B1002  | t
 B1003  | t
 B1004  | t
 B1005  | t
 a1004  | t
 a1005  | t
 a1006  | t
 a1007  | t
 a1008  | t
 B1     | f
 B10    | t
 B100   | t
(20 rows)

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg

2015-01-20 Thread Tomas Vondra
Hi,

On 20.1.2015 21:13, Jeff Davis wrote:
> On Tue, Jan 20, 2015 at 6:44 AM, Tom Lane  wrote:
>> Jeff Davis  writes:
>>> Tom (tgl),
>>> Is my reasoning above acceptable?
>>
>> Uh, sorry, I've not been paying any attention to this thread for awhile.
>> What's the remaining questions at issue?
> 
> This patch is trying to improve the array_agg case where there are 
> many arrays being constructed simultaneously, such as in HashAgg.
> You strongly suggested that a committable patch would differentiate
> between the grouped and ungrouped cases (or perhaps you meant
> differentiate between HashAgg and sorted aggregation?). Tomas's
> current patch does not do so; it does two main things:

I don't think that's entirely true. The problem with the initial
(experimental) patch was that while it fixed aggregate queries, it
mostly ignored all the other callers, and either resulted in memory
corruption (unexpected pfree) or bloat (when not doing the pfree).

Tom's message where he points that out is here:
http://www.postgresql.org/message-id/20707.1396372...@sss.pgh.pa.us

The current patch makes that distinction properly (IMHO): all the callers
using the underlying array functions use the original approach (with
subcontexts), and only array_agg uses the new API and forces
subcontext=false.

>1. Uses a common memory context for arrays being constructed by
> array_agg in any context (ungrouped, sorted agg, and HashAgg)
>2. Reduces the initial array allocation to 8 elements from 64, but
> preserves doubling behavior

Yes, that's true. I'd like to point out that while the current code uses
64 items, it also uses 8kB per-group contexts. That's slightly overkill I
guess ...

> I don't see either as a big problem, but perhaps there are some 
> downsides in some cases. I think a worst case would be a slowdown in 
> the sorted agg case where every group has 64 elements, so I'll try
> to see if that's a real problem or not. If you saw a bigger problem, 
> please let me know; and if not, I'll proceed with the review.

FWIW I've done a fair amount of measurements and not noticed any
measurable difference (unless using a rather crazy testcase, IIRC).
Certainly the issues with excessive memory consumption (and swapping)
were much worse.

> There are also some other concerns I have about the API ugliness,
> which I believe is the reason there's so much discussion about making
> the comments better. The reason the API is ugly is for backwards
> compatibility, so there's no perfect solution. Your opinion is welcome
> here too, but I mainly need to see if your objection above has been
> addressed.

I generally agree that having two API 'facets' with different behavior
is slightly awkward and asymmetric, but I wouldn't call that ugly. I
actually modified both APIs initially, but I think Ali is right that not
breaking the existing API (and keeping the original behavior in that
case) is better. We can break it any time we want in the future, but
it's impossible to "unbreak it" ;-)

regards

-- 
Tomas Vondrahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-01-20 Thread Andrew Dunstan


On 01/20/2015 01:26 PM, Arne Scheffer wrote:

Interesting patch.
I did a quick review looking only into the patch file.

The "sum of variances" variable contains
the "sum of squared differences" instead, I think.


Umm, no. It's not.

   e->counters.sum_var_time +=
   (total_time - old_mean) * (total_time - e->counters.mean_time);

This is not a square that's being added. old_mean is not the same as 
e->counters.mean_time.


Since the variance is this value divided by (n - 1), AIUI, I think "sum 
of variances" isn't a bad description. I'm open to alternative suggestions.
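
For what it's worth, the identity in question is easy to sanity-check in SQL:
whatever the form of the individual increments, the running total ends up
equal to the sum of squared differences from the current mean, and dividing
it by (n - 1) matches the built-in sample variance (data set below is just
for illustration):

  WITH d(x) AS (VALUES (1.0), (2.0), (4.0), (8.0))
  SELECT var_samp(x) AS builtin_sample_variance,
         sum((x - (SELECT avg(x) FROM d)) ^ 2) / (count(*) - 1) AS sum_sq_diff_over_n_minus_1
  FROM d;
  -- both columns return the same value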





And a very minor aspect:
The term "standard deviation" in your code stands for
(corrected) sample standard deviation, I think,
because you divide by n-1 instead of n to keep the
estimator unbiased.
How about mentioning the prefix "sample"
to indicate this being the estimator?



I don't understand. I'm following pretty exactly the calculations stated 
at 



I'm not a statistician. Perhaps others who are more literate in 
statistics can comment on this paragraph.





And I'm sure I'm missing C specifics (again)
(or it's the reduced patch file scope),
but you introduce sqrtd, but sqrt is called?



Good catch. Will fix.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 2:00 PM, Peter Geoghegan  wrote:
> Maybe that's the
> wrong way of fixing that, but for now I don't think it's acceptable
> that abbreviation isn't always used in certain cases where it could
> make sense (e.g. not for simple GroupAggregates with a single
> attribute -- only multiple attribute GroupAggregates). After all, most
> sort cases (e.g. B-Tree builds) didn't use SortSupport for several
> years, simply because no one got around to it until I finally did a
> few months back.


Excuse me. I meant that this *is* an acceptable restriction for the time being.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2015-01-20 Thread Jim Nasby

On 1/20/15 2:20 PM, Robert Haas wrote:

On Tue, Jan 20, 2015 at 1:05 AM, Abhijit Menon-Sen  wrote:

>So when I'm trying to decide what to audit, I have to:
>
> (a) check if the current user is mentioned in .roles; if so, audit.
>
> (b) check if the current user is a descendant of one of the roles
> mentioned in .roles; if not, no audit.
>
> (c) check for permissions granted to the "root" role and see if that
> tells us to audit.
>
>Is that right? If so, it would work fine. I don't look forward to trying
>to explain it to people, but yes, it would work (for anything you could
>grant permissions for).

I think this points to a fundamental weakness in the idea of doing this
through the GRANT system.  Some people are going want to audit
everything a particular user does, some people are going to want to
audit all access to particular objects, and some people will have more
complicated requirements.  Some people will want to audit even
super-users, others especially super-users, others only non
super-users.  None of this necessarily matches up to the usual
permissions framework.


+1. In particular I'm very concerned with the idea of doing this via roles, 
because that would make it trivial for any superuser to disable auditing. The 
only good option I could see to provide this kind of flexibility would be 
allowing the user to provide a function that accepts role, object, etc. and 
returns a boolean. The performance of that would presumably suck with anything 
but a C function, but we could provide some C functions to handle simple cases.
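
For illustration only, a hypothetical filter function of that shape might look
like the following (the name, signature, and policy are all invented here --
pgaudit exposes no such hook today):

  -- Hypothetical sketch of a user-supplied audit filter.
  CREATE FUNCTION audit_filter(role_name text, object_name text, command_tag text)
  RETURNS boolean
  LANGUAGE sql
  AS $$ SELECT role_name <> 'app_readonly' OR command_tag <> 'SELECT' $$;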

That said, I think the best idea at this stage is either log everything or 
nothing. We can always expand upon that later.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Peter Geoghegan
On Tue, Jan 20, 2015 at 3:46 AM, Andrew Gierth
 wrote:
> The comment in tuplesort_begin_datum that abbreviation can't be used
> seems wrong to me; why is the copy of the original value pointed to by
> stup->tuple (in the case of by-reference types, and abbreviation is
> obviously not needed for by-value types) not sufficient?

We haven't formalized the idea that pass-by-value types are not
targets for abbreviation (it's just that the practical application of
abbreviated keys is likely to be limited to pass-by-reference types,
generating a compact pass-by-value abbreviated representation). That
could be a useful restriction to formalize, and certainly seems likely
to be a harmless one, but for now that's the way it is.

It might be sufficient for some tuplesort_begin_datum() callers. Datum
tuple sorts require the original values. Aside from the formalization
of abbreviation only applying to pass-by-reference types, you'd have to
teach tuplesort_getdatum() to reconstruct the non-abbreviated
representation transparently from each SortTuple's "tuple proper".
However, the actual tuplesort_getdatum() calls could be the dominant
cost, not the sort  (I'm not sure of that offhand - that's total
speculation).

Basically, the intersection of the datum sort case with abbreviated
keys seems complicated. I tended to think that the solution was to
force a heaptuple sort instead (where abbreviation naturally can be
used), since clearly that could work in some important cases like
nodeAgg.c, iff the gumption to do it that way was readily available.
Rightly or wrongly, I preferred that idea to the idea of teaching the
Datum case to handle abbreviation across the board. Maybe that's the
wrong way of fixing that, but for now I don't think it's acceptable
that abbreviation isn't always used in certain cases where it could
make sense (e.g. not for simple GroupAggregates with a single
attribute -- only multiple attribute GroupAggregates). After all, most
sort cases (e.g. B-Tree builds) didn't use SortSupport for several
years, simply because no one got around to it until I finally did a
few months back.

Note that most tuplesort non-users of abbreviation don't use
abbreviation for sensible reasons. For example, abbreviation simply
doesn't make sense for Top-N heap sorts, or MJExamineQuals(). The
non-single-attribute GroupAggregate/nodeAgg.c case seems bad, but I
don't have a good sense of how bad the orderedsetaggs.c non-use is... it
might matter less than the other case.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New CF app deployment

2015-01-20 Thread Magnus Hagander
On Tue, Jan 20, 2015 at 3:43 PM, Tom Lane  wrote:

> Michael Paquier  writes:
> > On Tue, Jan 20, 2015 at 4:31 PM, Andres Freund 
> wrote:
> >> I think you misunderstood me ;). I was talking about the old CF
> >> application providing a RSS feed of all changes to all entries.
> >> https://commitfest-old.postgresql.org/action/commitfest_activity.rss
>
> > Oh, I didn't know this one. That's indeed useful.
>
> Personally, I never used the RSS feed as such, but I did often consult the
> "activity log" pages, which I think are equivalent:
>
> https://commitfest-old.postgresql.org/action/commitfest_activity?id=25
>
> That feature seems to be gone too :-(
>

Activity log is back, clickable from the top right corner when viewing the
frontpage or a commitfest.

RSS feed is there as well.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] logical column ordering

2015-01-20 Thread Alvaro Herrera
I've decided to abandon this patch.  I have spent too much time looking
at it now.

If anyone is interested in trying to study, I can provide the patches I
came up with, explanations, and references to prior discussion -- feel
free to ask.

My main motivation for this work is to enable a later patch for column
stores.  Right now, since columns have monotonically increasing attnums,
it's rather difficult to have columns that are stored elsewhere.  My
plan for that now is much more modest, something like adding a constant
1 to attnums and that would let us identify columns that are outside
the heap -- or something like that.  I haven't fully worked it out yet.


Just a few quick notes about this patch: the last thing I was doing was messing
with setrefs.c so that Var nodes carry "varphysnum" annotations, which
are set to 0 during initial planner phases, and are turned into the
correct attphysnum (the value extracted from catalogs) so that
TupleDescs constructed from targetlists by ExecTypeFromTL and friends
can have the correct attphysnum too.  I think this part works correctly,
with the horrible exception that I had to do a relation_open() in
setrefs.c to get hold of the right attphysnum from a tupledesc obtained
from catalogs.  That's not acceptable at all; I think the right way to
do this would be to store a list of numbers earlier (not sure when) and
store it in RelOptInfo or RangeTableEntry; that would be accessible
during setrefs.c.

The other bit I did was modify all the heaptuple.c code so that it could
deal correctly with tupledescs that have attphysnums and attlognum in an
order different from stock attnum.  That took some time to get right,
but I think it's also correct now.

One issue I had was finding places that use "attnum" as an index into
the tupledesc "attrs" array.  I had to examine all these places and
change them to use a "physattrs" array, which is an array that has been
sorted by physical number.  I don't think all the changes are correct,
and I'm not sure that I caught them all.


Anyway it seems to me that this is "mostly there".  If somebody is
interested in learning executor code, this project would be damn cool to
get done.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf

2015-01-20 Thread David Johnston
On Tue, Jan 20, 2015 at 1:24 PM, Jim Nasby  wrote:

> On 1/16/15 10:32 PM, David G Johnston wrote:



> Two changes solve this problem in what seems to be a clean way.
>> 1) Upon each parsing of postgresql.conf we store all assigned variables
>> somewhere
>>
>
> Parsing is relatively cheap, and it's not like we need high performance
> from this. So, -1 on permanent storage.


OK


>
>
>  2) We display these assignments in a new pg_settings column named
>> "system_reset_val"
>>
>> I would also extend this to include:
>> a) upon each parsing of postgresql.auto.conf we store all assigned
>> variables
>> somewhere (maybe the same place as postgresql.conf and simply label the
>> file
>> source)
>>
>
> You can not assume there are only postgresql.conf and
> postgresql.auto.conf. Complex environments will have multiple included
> files.
>

#include files still appear to come from postgresql.conf; I'm not
proposing we try and memorize every single instance of a variable
declaration and provide a global "overlaps" query - though that piece
already seems to be in place...


>  b) add an "alter_system_val" field to show that value (or null)
>> c) add a "db_role_val" to show the current value for the session via
>> pg_db_role_setting
>>
>
> You're forgetting that there are also per-role settings. And I'm with
> Robert; what's wrong with sourcefile and sourceline? Perhaps we just need
> to teach those about ALTER ROLE SET and ALTER DATABASE SET (if they don't
> already know about them).
>

​Actually, there are not separate PER ROLE and PER DATABASE settings even
though there are different SQL commands.  The catalog is simply
"pg_db_role_setting" with the use of "all" tags (i.e., 0) as necessary.
But I see where you are going and do not disagree that precedence
information could be useful.

sourceline and sourcefile pertain only to the current value while the
point of adding these other pieces is to provide a snapshot of all the
different mappings that the system knows about; instead of having to tell a
user to go look in two different files (and associated includes) and a
database catalog to find out what possible values are in place.  That
doesn't solve the need to scan the catalog to see other possible values -
though you could at least put a counter in pg_settings that indicates how
many pg_db_role_setting entries reference the given variable so that if
non-zero the user is clued into the fact that they need to check out said
catalog table.
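
As things stand, the per-database and per-role values can at least be listed
straight from that catalog (rows with setdatabase or setrole = 0 apply to all
databases or roles respectively), for example:

  SELECT d.datname, r.rolname, unnest(s.setconfig) AS setting
  FROM pg_db_role_setting s
       LEFT JOIN pg_database d ON d.oid = s.setdatabase
       LEFT JOIN pg_roles    r ON r.oid = s.setrole;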



>
>  c.1) add a "db_role_id" to show the named user that is being used for the
>> db_role_val lookup
>>
>> The thinking for c.1 is that in situations with role hierarchies and SET
>> ROLE usage it would be nice to be able to query what the connection role -
>> the one used during variable lookup - is.
>>
>
> I'm losing track of exactly what we're trying to solve here, but...
>
> If the goal is to figure out what settings would be in place for a
> specific user connecting to a specific database, then we should create a
> SRF that does just that (accepting a database name and role name). You
> could then do...
>
> SELECT * FROM pg_show_all_settings( 'database', 'role' ) a;
>

This is fine - but I'm thinking about situations where a user immediately
does SET ROLE on their session and typically operates as said user; if they
try doing an ALTER ROLE SET for this SET ROLE user it will not work, because
their login user is what matters.  This is probably a solution looking for
a problem but it is a dynamic that one needs to be aware of.

>
>  I'm probably going overkill on this but there are not a lot of difference
>> sources nor do they change frequently so extending the pg_settings view to
>> be more of a one-stop-shopping for this information seems to make sense.
>>
>
> Speaking of overkill... one thing that you currently can't do is find out
> what #includes have been processed. Perhaps it's worth having a SRF that
> would return that info...
>
>  As it relates back to this thread the desired "merging" ends up being done
>> inside this view and at least gives the DBA a central location (well,
>> along
>> with pg_db_role_setting) to go and look at the configuration landscape for
>> the system.
>>
>
> I think the goal is good, but the interface needs to be rethought.
> --
> Jim Nasby, Data Architect, Blue Treble Consulting
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>

My main concern with the UI would be too much information; but otherwise,
if we accept the premise that we should include as much as possible and
then let the user (or us) provide more useful subsets based upon common
use-cases, the main issue is making sure we've identified all of the
relevant information that needs to be captured - even if it is something as
minor as the whole logon vs. current role dynamic.  I'm ignoring the
cost/benefit aspect of implementation for the moment, largely because I do
not know enough about the backend to make reasonable comments.  But much of
the data is already present in

Re: [HACKERS] Parallel Seq Scan

2015-01-20 Thread Jim Nasby

On 1/19/15 7:20 AM, Robert Haas wrote:

>Another thing is that I think prefetching is not supported on all platforms
>(Windows) and for such systems as per above algorithm we need to
>rely on block-by-block method.

Well, I think we should try to set up a test to see if this is hurting
us.  First, do a sequential scan of a relation too big to cache, at least
twice as large as RAM.  Then, do a parallel sequential scan of the same
relation with 2 workers.  Repeat these in alternation several times.
If the operating system is accomplishing meaningful readahead, and the
parallel sequential scan is breaking it, then since the test is
I/O-bound I would expect to see the parallel scan actually being
slower than the normal way.

Or perhaps there is some other test that would be better (ideas
welcome) but the point is we may need something like this, but we
should try to figure out whether we need it before spending too much
time on it.


I'm guessing that not all supported platforms have prefetching that actually 
helps us... but it would be good to actually know if that's the case.
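
A rough psql sketch of the alternating test described above (table name, row
count, and the parallel_seqscan_degree GUC from the patch under discussion are
used for illustration only; sizes need adjusting so the table is roughly twice
the machine's RAM):

  -- Build a relation well beyond RAM size.
  CREATE TABLE readahead_test AS
    SELECT g AS id, repeat('x', 500) AS filler
    FROM generate_series(1, 50000000) g;

  \timing on
  SET parallel_seqscan_degree = 0;   -- plain sequential scan
  SELECT count(*) FROM readahead_test;
  SET parallel_seqscan_degree = 2;   -- parallel scan with 2 workers
  SELECT count(*) FROM readahead_test;
  -- repeat the two scans in alternation several times and compare timings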

Where I think this gets a lot more interesting is if we could apply this to an 
index scan. My thought is that would result in one worker mostly being 
responsible for advancing the index scan itself while the other workers were 
issuing (and waiting on) heap IO. So even if this doesn't turn out to be a win 
for seqscan, there's other places we might well want to use it.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: searching in array function - array_position

2015-01-20 Thread Jim Nasby

On 1/20/15 11:12 AM, Pavel Stehule wrote:

I am sending an updated version - it allows a third optional argument that
specifies where searching should start. With it, it is possible to call this
function repeatedly.


What happened to returning an array of offsets? I think that would be both 
easier to use and better-performing than this version.

I see you dropped multi-dimension support, but I think that's fine.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf

2015-01-20 Thread Jim Nasby

On 1/16/15 10:32 PM, David G Johnston wrote:

>>> > One thought I have in this line is that currently there doesn't seem to
>>> > be a way to know if the setting has an entry both in postgresql.conf and
>>> > postgresql.auto.conf, if we can have some way of knowing the same
>>> > (pg_settings?), then it could be convenient for user to decide if the value
>>> > in postgresql.auto.conf is useful or not and if it's not useful then use
>>> > Alter System .. Reset command to remove the same from
>>> > postgresql.auto.conf.

>>>
>>>I think one way is that pg_settings has the file name of variables, but
>>>it would not affect the current status of postgresql.conf,
>>>so we would need to parse postgresql.conf again at that time.
>>>

>>
>>Yeah that could be a possibility, but I think that will break the
>>existing
>>command('s) as this is the common infrastructure used for SHOW ..
>>commands as well which displays the guc value that is used by
>>current session rather than the value in postgresql.conf.

>
>You're right.
>pg_setting and SHOW command use value in current session rather than
>config file.
>It might break this common infrastructure.

Two changes solve this problem in what seems to be a clean way.
1) Upon each parsing of postgresql.conf we store all assigned variables
somewhere


Parsing is relatively cheap, and it's not like we need high performance from 
this. So, -1 on permanent storage.


2) We display these assignments in a new pg_settings column named
"system_reset_val"

I would also extend this to include:
a) upon each parsing of postgresql.auto.conf we store all assigned variables
somewhere (maybe the same place as postgresql.conf and simply label the file
source)


You can not assume there are only postgresql.conf and postgresql.auto.conf. 
Complex environments will have multiple included files.


b) add an "alter_system_val" field to show that value (or null)
c) add a "db_role_val" to show the current value for the session via
pg_db_role_setting


You're forgetting that there are also per-role settings. And I'm with Robert; 
what's wrong with sourcefile and sourceline? Perhaps we just need to teach 
those about ALTER ROLE SET and ALTER DATABASE SET (if they don't already know 
about them).
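
For what it's worth, the existing fields already answer "where did this value
come from" for anything set in a file; a quick look is just:

  SELECT name, setting, source, sourcefile, sourceline
  FROM pg_settings
  WHERE source = 'configuration file';

The missing piece is the per-database/per-role information kept in
pg_db_role_setting, which has no sourcefile equivalent today.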


c.1) add a "db_role_id" to show the named user that is being used for the
db_role_val lookup

The thinking for c.1 is that in situations with role hierarchies and SET
ROLE usage it would be nice to be able to query what the connection role -
the one used during variable lookup - is.


I'm losing track of exactly what we're trying to solve here, but...

If the goal is to figure out what settings would be in place for a specific 
user connecting to a specific database, then we should create a SRF that does 
just that (accepting a database name and role name). You could then do...

SELECT * FROM pg_show_all_settings( 'database', 'role' ) a;


I'm probably going overkill on this but there are not a lot of difference
sources nor do they change frequently so extending the pg_settings view to
be more of a one-stop-shopping for this information seems to make sense.


Speaking of overkill... one thing that you currently can't do is find out what 
#includes have been processed. Perhaps it's worth having a SRF that would 
return that info...


As it relates back to this thread the desired "merging" ends up being done
inside this view and at least gives the DBA a central location (well, along
with pg_db_role_setting) to go and look at the configuration landscape for
the system.


I think the goal is good, but the interface needs to be rethought.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL

2015-01-20 Thread Robert Haas
On Tue, Jan 20, 2015 at 1:05 AM, Abhijit Menon-Sen  wrote:
> So when I'm trying to decide what to audit, I have to:
>
> (a) check if the current user is mentioned in .roles; if so, audit.
>
> (b) check if the current user is a descendant of one of the roles
> mentioned in .roles; if not, no audit.
>
> (c) check for permissions granted to the "root" role and see if that
> tells us to audit.
>
> Is that right? If so, it would work fine. I don't look forward to trying
> to explain it to people, but yes, it would work (for anything you could
> grant permissions for).

I think this points to a fundamental weakness in the idea of doing this
through the GRANT system.  Some people are going want to audit
everything a particular user does, some people are going to want to
audit all access to particular objects, and some people will have more
complicated requirements.  Some people will want to audit even
super-users, others especially super-users, others only non
super-users.  None of this necessarily matches up to the usual
permissions framework.

>> Have you considered splitting pgaudit.c into multiple files, perhaps
>> along the pre/post deparse lines?
>
> If such a split were done properly, then could the backwards-compatible
> version be more acceptable for inclusion in contrib in 9.5? If so, I'll
> look into it.

We're not going to include code in contrib that has leftovers in it
for compatibility with earlier source trees.  That's been discussed on
this mailing list many times and the policy is clear.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg

2015-01-20 Thread Jeff Davis
On Tue, Jan 20, 2015 at 6:44 AM, Tom Lane  wrote:
> Jeff Davis  writes:
>> Tom (tgl),
>> Is my reasoning above acceptable?
>
> Uh, sorry, I've not been paying any attention to this thread for awhile.
> What's the remaining questions at issue?

This patch is trying to improve the array_agg case where there are
many arrays being constructed simultaneously, such as in HashAgg. You
strongly suggested that a committable patch would differentiate between
the grouped and ungrouped cases (or perhaps you meant differentiate
between HashAgg and sorted aggregation?). Tomas's current patch does
not do so; it does two main things:

   1. Uses a common memory context for arrays being constructed by
array_agg in any context (ungrouped, sorted agg, and HashAgg)
   2. Reduces the initial array allocation to 8 elements from 64, but
preserves doubling behavior
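
For context, the grouped case at issue is the one where a HashAgg keeps one
in-progress array_agg state per group alive at the same time; with many
distinct groups the per-state overhead is what dominates memory use. A minimal
example of that shape (table and column names invented for illustration):

  SELECT customer_id, array_agg(order_id)
  FROM orders
  GROUP BY customer_id;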

I don't see either as a big problem, but perhaps there are some
downsides in some cases. I think a worst case would be a slowdown in
the sorted agg case where every group has 64 elements, so I'll try to
see if that's a real problem or not. If you saw a bigger problem,
please let me know; and if not, I'll proceed with the review.

There are also some other concerns I have about the API ugliness,
which I believe is the reason there's so much discussion about making
the comments better. The reason the API is ugly is for backwards
compatibility, so there's no perfect solution. Your opinion is welcome
here too, but I mainly need to see if your objection above has been
addressed.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add min and max execute statement time in pg_stat_statement

2015-01-20 Thread Arne Scheffer
Interesting patch.
I did a quick review looking only into the patch file.

The "sum of variances" variable contains
the "sum of squared differences" instead, I think.

And a very minor aspect:
The term "standard deviation" in your code stands for
(corrected) sample standard deviation, I think,
because you divide by n-1 instead of n to keep the
estimator unbiased.
How about mentioning the prefix "sample"
to indicate this being the estimator?

And I'm sure I'm missing C specifics (again)
(or it's the reduced patch file scope),
but you introduce sqrtd, but sqrt is called?

VlG

Arne

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reducing buildfarm disk usage: remove temp installs when done

2015-01-20 Thread Andrew Dunstan


On 01/19/2015 09:53 AM, Tom Lane wrote:

Andrew Dunstan  writes:

But I'm wondering if we should look at using the tricks git-new-workdir
uses, setting up symlinks instead of a full clone. Then we'd have one
clone with a bunch of different work dirs. That plus a bit of explicitly
done garbage collection and possibly a periodic re-clone might do the trick.

Yeah, I was wondering whether it'd be okay to depend on git-new-workdir.
That would fix the problem pretty nicely.  But in the installations I've
seen, that's not in PATH but squirreled away in some hard-to-guess library
directory ...




We should move this discussion to the buildfarm members list.

I'll be publishing a patch there.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New CF app deployment

2015-01-20 Thread Pavel Stehule
2015-01-20 19:16 GMT+01:00 Pavel Stehule :

> Hi
>
> I cannot set my name as author for the patch:
> https://commitfest.postgresql.org/4/112/
>

It is solved now - I didn't understand the autocomplete at first

All works well

Regards

Pavel



>
> Regards
>
> Pavel
>
> 2015-01-13 6:35 GMT+01:00 Magnus Hagander :
>
>> Hi!
>>
>> Last I said something about the new CF app I said I was planning to
>> deploy it over the holidays, and that clearly did not happen.
>>
>> But I've been talking to Michael, as the current cf-chief, and doing some
>> further testing with it, and we think now is a good time to go for it :) As
>> the plan is to close out the current CF just a few days from now, we're
>> going to use that and try to swap it out when traffic is at least at it's
>> lowest (even if we're well aware it's not going to be zero).
>>
>> So based on this, we plan to:
>>
>> In the late evening on Jan 19th (evening European time that is), I will
>> make the current CF app readonly, and move it to a new url where it will
>> remain available for the foreseeable future (we have a lot of useful data
>> in it after all).
>>
>> Once this is done, Michael (primarily) will start working on syncing up
>> the information about the latest patches into the new app. Much info is
>> already synced there, but to make sure all the latest changes are included.
>>
>> In the morning European time on the 20th, I'll take over and try to
>> finish up what's left. And sometime during the day when things are properly
>> in sync, we'll open up the new app for business to all users.
>>
>> There are surely some bugs remaining in the system, so please have some
>> oversight with that over the coming days/weeks. I'll try to get onto fixing
>> them as soon as I can - and some others can look at that as well if it's
>> something critical at least.
>>
>> Further status updates to come as we start working on it...
>>
>> --
>>  Magnus Hagander
>>  Me: http://www.hagander.net/
>>  Work: http://www.redpill-linpro.com/
>>
>
>


Re: [HACKERS] New CF app deployment

2015-01-20 Thread Pavel Stehule
Hi

I cannot set my name as author for the patch:
https://commitfest.postgresql.org/4/112/

Regards

Pavel

2015-01-13 6:35 GMT+01:00 Magnus Hagander :

> Hi!
>
> Last I said something about the new CF app I said I was planning to deploy
> it over the holidays, and that clearly did not happen.
>
> But I've been talking to Michael, as the current cf-chief, and doing some
> further testing with it, and we think now is a good time to go for it :) As
> the plan is to close out the current CF just a few days from now, we're
> going to use that and try to swap it out when traffic is at least at its
> lowest (even if we're well aware it's not going to be zero).
>
> So based on this, we plan to:
>
> In the late evening on Jan 19th (evening European time that is), I will
> make the current CF app readonly, and move it to a new url where it will
> remain available for the foreseeable future (we have a lot of useful data
> in it after all).
>
> Once this is done, Michael (primarily) will start working on syncing up
> the information about the latest patches into the new app. Much info is
> already synced there, but to make sure all the latest changes are included.
>
> In the morning European time on the 20th, I'll take over and try to finish
> up what's left. And sometime during the day when things are properly in
> sync, we'll open up the new app for business to all users.
>
> There are surely some bugs remaining in the system, so please have some
> oversight with that over the coming days/weeks. I'll try to get onto fixing
> them as soon as I can - and some others can look at that as well if it's
> something critical at least.
>
> Further status updates to come as we start working on it...
>
> --
>  Magnus Hagander
>  Me: http://www.hagander.net/
>  Work: http://www.redpill-linpro.com/
>


Re: [HACKERS] Parallel Seq Scan

2015-01-20 Thread Thom Brown
On 20 January 2015 at 14:29, Amit Kapila  wrote:

> On Thu, Jan 15, 2015 at 6:57 PM, Amit Kapila 
> wrote:
> > On Mon, Jan 12, 2015 at 3:25 AM, Robert Haas 
> wrote:
> > >
> > > Yeah, you need two separate global variables pointing to shm_mq
> > > objects, one of which gets used by pqmq.c for errors and the other of
> > > which gets used by printtup.c for tuples.
> > >
> >
> > Okay, I will try to change the way as suggested without doing
> > switching, but this way we need to do it separately for 'T', 'D', and
> > 'C' messages.
> >
>
> I have taken care of integrating the parallel sequence scan with the
> latest patch posted (parallel-mode-v1.patch) by Robert at below
> location:
>
> http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com
>
> Changes in this version
> ---
> 1. As mentioned previously, I have exposed one parameter
> ParallelWorkerNumber as used in parallel-mode patch.
> 2. Enabled tuple queue to be used for passing tuples from
> worker backend to master backend along with error queue
> as per suggestion by Robert in the mail above.
> 3. Involved master backend to scan the heap directly when
> tuples are not available in any shared memory tuple queue.
> 4. Introduced 3 new parameters (cpu_tuple_comm_cost,
> parallel_setup_cost, parallel_startup_cost) for deciding the cost
> of parallel plan.  Currently, I have kept the default values for
> parallel_setup_cost and parallel_startup_cost as 0.0, as those
> require some experiments.
> 5. Fixed some issues (related to memory increase as reported
> upthread by Thom Brown and general feature issues found during
> test)
>
> Note - I have yet to handle the new node types introduced at some
> of the places and need to verify prepared queries and some other
> things, however I think it will be good if I can get some feedback
> at current stage.
>

I'm getting an issue:

 ➤ psql://thom@[local]:5488/pgbench

# set parallel_seqscan_degree = 8;
SET
Time: 0.248 ms

 ➤ psql://thom@[local]:5488/pgbench

# explain select c1 from t1;
  QUERY PLAN
--
 Parallel Seq Scan on t1  (cost=0.00..21.22 rows=100 width=4)
   Number of Workers: 8
   Number of Blocks Per Worker: 11
(3 rows)

Time: 0.322 ms

# explain analyse select c1 from t1;
                              QUERY PLAN
---------------------------------------------------------------------------
 Parallel Seq Scan on t1  (cost=0.00..21.22 rows=100 width=4) (actual
time=0.024..13.468 rows=100 loops=1)
   Number of Workers: 8
   Number of Blocks Per Worker: 11
 Planning time: 0.040 ms
 Execution time: 13.862 ms
(5 rows)

Time: 14.188 ms

 ➤ psql://thom@[local]:5488/pgbench

# set parallel_seqscan_degree = 10;
SET
Time: 0.219 ms

 ➤ psql://thom@[local]:5488/pgbench

# explain select c1 from t1;
  QUERY PLAN
--
 Parallel Seq Scan on t1  (cost=0.00..19.18 rows=100 width=4)
   Number of Workers: 10
   Number of Blocks Per Worker: 9
(3 rows)

Time: 0.375 ms

 ➤ psql://thom@[local]:5488/pgbench

# explain analyse select c1 from t1;


So setting parallel_seqscan_degree above max_worker_processes causes the
CPU to max out, and the query never returns, or at least not after waiting
2 minutes.  Shouldn't it have a ceiling of max_worker_processes?

The original test I performed where I was getting OOM errors now appears to
be fine:

# explain (analyse, buffers, timing) select distinct bid from
pgbench_accounts;
                              QUERY PLAN
---------------------------------------------------------------------------
 HashAggregate  (cost=1400411.11..1400412.11 rows=100 width=4) (actual
time=8504.333..8504.335 rows=13 loops=1)
   Group Key: bid
   Buffers: shared hit=32 read=18183
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..1375411.11
rows=1000 width=4) (actual time=0.054..7183.494 rows=1000 loops=1)
 Number of Workers: 8
 Number of Blocks Per Worker: 18215
 Buffers: shared hit=32 read=18183
 Planning time: 0.058 ms
 Execution time: 8876.967 ms
(9 rows)

Time: 8877.366 ms

Note that I increased seq_page_cost to force a parallel scan in this case.

Thom


Re: [HACKERS] proposal: searching in array function - array_position

2015-01-20 Thread Pavel Stehule
Hi

I am sending an updated version - it allows a third optional argument that
specifies where searching should start. With it, it is possible to call this
function repeatedly.
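
A quick usage sketch, per the proposed signature
array_offset(anyarray, anyelement [, int]) and the behavior described in the
patch (illustration only, not output from the actual build):

  SELECT array_offset(ARRAY['a','b','c','b'], 'b');      -- 2
  SELECT array_offset(ARRAY['a','b','c','b'], 'b', 3);   -- 4, search starts at offset 3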

Regards

Pavel

2015-01-17 23:43 GMT+01:00 Pavel Stehule :

> Hi
>
> here is a proof of concept of the array_offset function
>
> possible question:
>
> * which comparison to use: "=" or "IS NOT DISTINCT FROM"
>
> In this initial proof of concept I used the "IS NOT DISTINCT FROM" operator - but
> my opinion is not strong on this question. Both have some advantages and
> disadvantages.
>
> Regards
>
> Pavel
>
>
> 2015-01-16 19:12 GMT+01:00 Pavel Stehule :
>
>>
>>
>> 2015-01-16 18:37 GMT+01:00 Jim Nasby :
>>
>>> On 1/16/15 11:16 AM, Pavel Stehule wrote:
>>>


 2015-01-16 17:57 GMT+01:00 Jim Nasby :

 On 1/16/15 3:39 AM, Pavel Stehule wrote:

 I am proposing a simple function, that returns a position of
 element in array.


 Yes please!

 FUNCTION array_position(anyarray, anyelement) RETURNS int


 That won't work on a multi-dimensional array. Ideally it needs to
 accept a slice or an element and return the specifier for the slice.


 It is question, what is a result - probably, there can be a
 multidimensional variant, where result will be a array

 array_position([1,2,3],2) --> 2
 array_position([[1,2],[2,3],[3,4]], [2,3]) --> 2 /* 2nd parameter
 should to have N-1 dimension of first parameter */

>>>
>>> The problem with that is you can't actually use '2' to get [2,3] back:
>>>
>>> select (array[[1,2,3],[4,5,6],[7,8,9]])[1] IS NULL;
>>>  ?column?
>>> --
>>>  t
>>> (1 row)
>>>
>>
>> yes, but when you are searching a array in array you can use a full slice
>> selection:
>>
>> postgres=# select (ARRAY[[1,2],[4,5]])[1][1:2]; -- [1:2] should be a
>> constant every time in this case -- so it should not be returned
>>   array
>> -
>>  {{1,2}}
>> (1 row)
>>
>>
>>
>>
>>>
>>> I think the bigger problem here is we need something better than slices
>>> for handling subsets of arrays. Even if the function returned [2:2] it's
>>> still going to behave differently than it will in the non-array case
>>> because you won't be getting the expected number of dimensions back. :(
>>>
>>
>> you cannot to return a slice and I don't propose it, although we can
>> return a range type or array of range type - but still we cannot to use
>> range for a arrays.
>>
>>>
>>>  array_position_md([1,2,3],2) --> [2]
 array_position_md([[1,2],[2,3],[3,4]], 2) --> [2,1]

 another question is how to solve more than one occurrence on one value
 - probably two sets of functions - first returns first occurrence of value,
 second returns set of occurrence

>>>
>>> Gee, if only we had some way to return multiple elements of
>>> something... ;P
>>>
>>> In other words, I think all of these should actually return an array of
>>> positions. I think it's OK for someone that only cares about the first
>>> instance to just do [1].
>>
>>
>> there could be two functions - "position", which returns the first occurrence,
>> and "positions", which returns all of them as an array
>>
>>
>>>
>>> --
>>> Jim Nasby, Data Architect, Blue Treble Consulting
>>> Data in Trouble? Get it in Treble! http://BlueTreble.com
>>>
>>
>>
>
diff --git a/doc/src/sgml/array.sgml b/doc/src/sgml/array.sgml
new file mode 100644
index 9ea1068..b3630b4
*** a/doc/src/sgml/array.sgml
--- b/doc/src/sgml/array.sgml
*** SELECT * FROM sal_emp WHERE pay_by_quart
*** 600,605 
--- 600,614 
index, as described in .
   
  
+  
+   You can also search for any value in an array using the array_offset
+   function (it returns the position of the first occurrence of the value in the array):
+ 
+ 
+ SELECT array_offset(ARRAY['sun','mon','tue','wen','thu','fri','sat'], 'mon');
+ 
+  
+ 
   

 Arrays are not sets; searching for specific array elements
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 5e7b000..62c9f7f
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** SELECT NULLIF(value, '(none)') ...
*** 11474,11479 
--- 11474,11482 
  array_lower


+ array_offset
+   
+   
  array_prepend


*** SELECT NULLIF(value, '(none)') ...
*** 11592,11597 
--- 11595,11613 
 
 
  
+  
+   array_offset(anyarray, anyelement , int)
+  
+ 
+ int
+ returns the offset of the first occurrence of an element in an array. It uses
+ the IS NOT DISTINCT FROM operator for comparisons. The third
+ optional argument can specify the initial offset at which the search starts. 
+ array_offset(ARRAY['sun','mon','tue','wen','thu','fri','sat'], 'mon')
+ 2
+
+
+ 
   
array_prepend(anyelement, anyarray)
   
diff --git a/src/backend/utils/adt/array_userfuncs.c b/src/backend/u

Re: [HACKERS] Parallel Seq Scan

2015-01-20 Thread Thom Brown
On 20 January 2015 at 16:55, Amit Kapila  wrote:

>
> On Tue, Jan 20, 2015 at 9:43 PM, Thom Brown  wrote:
> >
> > On 20 January 2015 at 14:29, Amit Kapila 
> wrote:
> >>
> >> On Thu, Jan 15, 2015 at 6:57 PM, Amit Kapila 
> wrote:
> >> > On Mon, Jan 12, 2015 at 3:25 AM, Robert Haas 
> wrote:
> >> > >
> >> > > Yeah, you need two separate global variables pointing to shm_mq
> >> > > objects, one of which gets used by pqmq.c for errors and the other
> of
> >> > > which gets used by printtup.c for tuples.
> >> > >
> >> >
> >> > Okay, I will try to change the way as suggested without doing
> >> > switching, but this way we need to do it separately for 'T', 'D', and
> >> > 'C' messages.
> >> >
> >>
> >> I have taken care of integrating the parallel sequence scan with the
> >> latest patch posted (parallel-mode-v1.patch) by Robert at below
> >> location:
> >>
> http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com
> >>
> >> Changes in this version
> >> ---
> >> 1. As mentioned previously, I have exposed one parameter
> >> ParallelWorkerNumber as used in parallel-mode patch.
> >> 2. Enabled tuple queue to be used for passing tuples from
> >> worker backend to master backend along with error queue
> >> as per suggestion by Robert in the mail above.
> >> 3. Involved master backend to scan the heap directly when
> >> tuples are not available in any shared memory tuple queue.
> >> 4. Introduced 3 new parameters (cpu_tuple_comm_cost,
> >> parallel_setup_cost, parallel_startup_cost) for deciding the cost
> >> of parallel plan.  Currently, I have kept the default values for
> >> parallel_setup_cost and parallel_startup_cost as 0.0, as those
> >> require some experiments.
> >> 5. Fixed some issues (related to memory increase as reported
> >> upthread by Thom Brown and general feature issues found during
> >> test)
> >>
> >> Note - I have yet to handle the new node types introduced at some
> >> of the places and need to verify prepared queries and some other
> >> things, however I think it will be good if I can get some feedback
> >> at current stage.
> >
> >
> > Which commit is this based against?  I'm getting errors with the latest
> master:
> >
>
> It seems to me that you have not applied parallel-mode patch
> before applying this patch, can you try once again by first applying
> the patch posted by Robert at below link:
>
> http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com
>
> commit-id used for this patch -  0b49642
>

D'oh.  Yes, you're completely right.  Works fine now.

Thanks.

Thom


Re: [HACKERS] Parallel Seq Scan

2015-01-20 Thread Amit Kapila
On Tue, Jan 20, 2015 at 9:43 PM, Thom Brown  wrote:
>
> On 20 January 2015 at 14:29, Amit Kapila  wrote:
>>
>> On Thu, Jan 15, 2015 at 6:57 PM, Amit Kapila 
wrote:
>> > On Mon, Jan 12, 2015 at 3:25 AM, Robert Haas 
wrote:
>> > >
>> > > Yeah, you need two separate global variables pointing to shm_mq
>> > > objects, one of which gets used by pqmq.c for errors and the other of
>> > > which gets used by printtup.c for tuples.
>> > >
>> >
>> > Okay, I will try to change the way as suggested without doing
>> > switching, but this way we need to do it separately for 'T', 'D', and
>> > 'C' messages.
>> >
>>
>> I have taken care of integrating the parallel sequence scan with the
>> latest patch posted (parallel-mode-v1.patch) by Robert at below
>> location:
>>
http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com
>>
>> Changes in this version
>> ---
>> 1. As mentioned previously, I have exposed one parameter
>> ParallelWorkerNumber as used in parallel-mode patch.
>> 2. Enabled tuple queue to be used for passing tuples from
>> worker backend to master backend along with error queue
>> as per suggestion by Robert in the mail above.
>> 3. Involved master backend to scan the heap directly when
>> tuples are not available in any shared memory tuple queue.
>> 4. Introduced 3 new parameters (cpu_tuple_comm_cost,
>> parallel_setup_cost, parallel_startup_cost) for deciding the cost
>> of parallel plan.  Currently, I have kept the default values for
>> parallel_setup_cost and parallel_startup_cost as 0.0, as those
>> require some experiments.
>> 5. Fixed some issues (related to memory increase as reported
>> upthread by Thom Brown and general feature issues found during
>> test)
>>
>> Note - I have yet to handle the new node types introduced at some
>> of the places and need to verify prepared queries and some other
>> things, however I think it will be good if I can get some feedback
>> at current stage.
>
>
> Which commit is this based against?  I'm getting errors with the latest
master:
>

It seems to me that you have not applied the parallel-mode patch
before applying this patch; can you try once again by first applying
the patch posted by Robert at the link below:
http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com

commit-id used for this patch -  0b49642


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Partitioning: issues/ideas (Was: Re: [HACKERS] On partitioning)

2015-01-20 Thread Robert Haas
On Mon, Jan 19, 2015 at 8:48 PM, Amit Langote
 wrote:
>>> Specifically, do we regard a partitions as pg_inherits children of its
>>> partitioning parent?
>>
>> I don't think this is totally an all-or-nothing decision.  I think
>> everyone is agreed that we need to not break things that work today --
>> e.g. Merge Append.  What that implies for pg_inherits isn't altogether
>> clear.
>
> One point is that an implementation may end up establishing the
> parent-partition hierarchy somewhere other than (or in addition to)
> pg_inherits. One intention would be to avoid tying partitioning scheme
> to certain inheritance features that use pg_inherits. For example,
> consider call sites of find_all_inheritors(). One notable example is
> Append/MergeAppend which would be of interest to partitioning. We would
> want to reuse that part of the infrastructure, but we might as well
> write an equivalent, say find_all_partitions(), which scans something
> other than pg_inherits to get all partitions.

IMHO, there's little reason to avoid putting pg_inherits entries in
for the partitions, and then this just works.  We can find other ways
to make it work if that turns out to be better, but if we don't have
one, there's no reason to complicate things.

> Agree that some concrete idea of internal representation should help
> guide the catalog structure. If we are going to cache the partitioning
> info in relcache (which we most definitely will), then we should try to
> make sure to consider the scenario of having a lot of partitioned tables
> with a lot of individual partitions. It looks like it would be similar
> to scenarios where there are a lot of inheritance hierarchies. But,
> availability of partitioning feature would definitely cause these
> numbers to grow larger. Perhaps this is an important point driving this
> discussion.

Yeah, it would be good if the costs of supporting, say, 1000
partitions were negligible.

> A primary question for me about partition-pruning is when do we do it?
> Should we model it after relation_excluded_by_constraints() and hence
> totally plan-time? But, the tone of the discussion is that we postpone
> partition-pruning to execution-time and hence my perhaps misdirected
> attempts to inject it into some executor machinery.

It's useful to prune partitions at plan time, because then you only
have to do the work once.  But sometimes you don't know enough to do
it at plan time, so it's useful to do it at execution time, too.
Then, you can do it differently for every tuple based on the actual
value you have.  There's no point in doing 999 unnecessary relation
scans if we can tell which partition the actual run-time value must be
in.  But I think execution-time pruning can be a follow-on patch.  If
you don't restrict the scope of the first patch as much as possible,
you're not going to have much luck getting this committed.
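
As a rough illustration (the table and partition layout here are hypothetical),
plan-time pruning works when the planner can see the comparison value, while a
parameter only becomes known at execution time:

-- plan time: the constant lets the planner discard every partition that
-- cannot contain '2015-01-20'
SELECT count(*) FROM measurement WHERE logdate = DATE '2015-01-20';

-- execution time: the relevant partition can only be chosen once $1 is known
PREPARE q(date) AS SELECT count(*) FROM measurement WHERE logdate = $1;
EXECUTE q('2015-01-20');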

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2015-01-20 Thread Robert Haas
On Mon, Jan 19, 2015 at 10:53 PM, tim_wilson  wrote:
> Was slightly optimistic that this comment in the release notes might mean
> that my bug with bloat on hot tables might have been fixed in 9.4
>
> /Make VACUUM properly report dead but not-yet-removable rows to the
> statistics collector (Hari Babu)
>
> Previously these were reported as live rows./
>
> Unfortunately that is not the case, and we still have the problem in 9.4

As far as I can tell from reviewing the thread, what we need to do
here is basically invent HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS.
There was a lot of concern about doing that in the back-branches, but
I'm skeptical that the concern is justified.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2

2015-01-20 Thread Robert Haas
On Tue, Jan 20, 2015 at 11:18 AM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>> > Would anybody object to me pushing this commit to branches 8.2 and 8.3?
>>
>> Since those branches are out of support, I am not sure what the point
>> is.  If we want people to be able to use those branches reasonably we
>> need to back-port fixes for critical security and stability issues,
>> not just this one thing.
>>
>> But maybe I am missing something.
>
> I just want to make it easy to compile those branches with current
> toolset so that I can study their behavior to suggest workarounds for
> customer problems etc -- nothing more.  I am not proposing to open them
> up again for support.

Oh, I see.  Well, that doesn't bother me, then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel mode and parallel contexts

2015-01-20 Thread Robert Haas
On Tue, Jan 20, 2015 at 9:41 AM, Amit Kapila  wrote:
> It seems [WaitForBackgroundWorkerShutdown] has possibility to wait forever.
> Assume one of the worker is not able to start (not able to attach
> to shared memory or some other reason), then status returned by
> GetBackgroundWorkerPid() will be BGWH_NOT_YET_STARTED
> and after that it can wait forever in WaitLatch.

I don't think that's right.  The status only remains
BGWH_NOT_YET_STARTED until the postmaster forks the child process.  At
that point it immediately changes to BGWH_STARTED.  If it starts up
and then dies early on, for example because it can't attach to shared
memory or somesuch, the status will change to BGWH_STOPPED.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2

2015-01-20 Thread Alvaro Herrera
Robert Haas wrote:

> > Would anybody object to me pushing this commit to branches 8.2 and 8.3?
> 
> Since those branches are out of support, I am not sure what the point
> is.  If we want people to be able to use those branches reasonably we
> need to back-port fixes for critical security and stability issues,
> not just this one thing.
> 
> But maybe I am missing something.

I just want to make it easy to compile those branches with current
toolset so that I can study their behavior to suggest workarounds for
customer problems etc -- nothing more.  I am not proposing to open them
up again for support.  Of course, I can carry the patched branches
locally if there is strong opposition, but since it's harmless, I don't
see why there would be any.  Another easy workaround is to add -O0
to CFLAGS, and I can script that easily too.

Without this patch or -O0, initdb fails with 

inicializando pg_authid ... FATAL:  wrong number of index expressions
SENTENCIA:  CREATE TRIGGER pg_sync_pg_database   AFTER INSERT OR UPDATE OR 
DELETE ON pg_database   FOR EACH STATEMENT EXECUTE PROCEDURE 
flatfile_update_trigger();


There is the additional problem that contrib/cube fails to compile, but
I don't care enough about that one:

/pgsql/source/REL8_3_STABLE/contrib/cube/cubeparse.y:61:17: error: ‘result’ 
undeclared (first use in this function)
  *((void **)result) = write_box( dim, $2, $4 );
 ^

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2

2015-01-20 Thread Andres Freund
On 2015-01-20 11:10:53 -0500, Robert Haas wrote:
> On Tue, Jan 20, 2015 at 10:30 AM, Alvaro Herrera
>  wrote:
> > Tom Lane wrote:
> >> Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2 branches.
> >> With this optimization flag enabled, recent versions of gcc can generate
> >> incorrect code that assumes variable-length arrays (such as oidvector)
> >> are actually fixed-length because they're embedded in some larger struct.
> >> The known instance of this problem was fixed in 9.2 and up by commit
> >> 8137f2c32322c624e0431fac1621e8e9315202f9 and followon work, which hides
> >> actually-variable-length catalog fields from the compiler altogether.
> >> And we plan to gradually convert variable-length fields to official
> >> "flexible array member" notation over time, which should prevent this type
> >> of bug from reappearing as gcc gets smarter.  We're not going to try to
> >> back-port those changes into older branches, though, so apply this
> >> band-aid instead.
> >
> > Would anybody object to me pushing this commit to branches 8.2 and 8.3?
> 
> Since those branches are out of support, I am not sure what the point
> is.  If we want people to be able to use those branches reasonably we
> need to back-port fixes for critical security and stability issues,
> not just this one thing.
> 
> But maybe I am missing something.

Supporting a branch and being able to compile it and run 'make check' (which
doesn't complete with gcc >= 4.8) aren't the same thing though. And we e.g. try
to provide pg_dump and libpq support for older versions, which is hard to
ensure if you can't run them.

I personally think that being able to at least compile and 'make check' old
versions for a bit longer is a good idea.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-01-20 Thread Thom Brown
On 20 January 2015 at 14:29, Amit Kapila  wrote:

> On Thu, Jan 15, 2015 at 6:57 PM, Amit Kapila 
> wrote:
> > On Mon, Jan 12, 2015 at 3:25 AM, Robert Haas 
> wrote:
> > >
> > > Yeah, you need two separate global variables pointing to shm_mq
> > > objects, one of which gets used by pqmq.c for errors and the other of
> > > which gets used by printtup.c for tuples.
> > >
> >
> > Okay, I will try to change the way as suggested without doing
> > switching, but this way we need to do it separately for 'T', 'D', and
> > 'C' messages.
> >
>
> I have taken care of integrating the parallel sequence scan with the
> latest patch posted (parallel-mode-v1.patch) by Robert at below
> location:
>
> http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com
>
> Changes in this version
> ---
> 1. As mentioned previously, I have exposed one parameter
> ParallelWorkerNumber as used in parallel-mode patch.
> 2. Enabled tuple queue to be used for passing tuples from
> worker backend to master backend along with error queue
> as per suggestion by Robert in the mail above.
> 3. Involved master backend to scan the heap directly when
> tuples are not available in any shared memory tuple queue.
> 4. Introduced 3 new parameters (cpu_tuple_comm_cost,
> parallel_setup_cost, parallel_startup_cost) for deciding the cost
> of parallel plan.  Currently, I have kept the default values for
> parallel_setup_cost and parallel_startup_cost as 0.0, as those
> require some experiments.
> 5. Fixed some issues (related to memory increase as reported
> upthread by Thom Brown and general feature issues found during
> test)
>
> Note - I have yet to handle the new node types introduced at some
> of the places and need to verify prepared queries and some other
> things, however I think it will be good if I can get some feedback
> at current stage.
>

Which commit is this based against?  I'm getting errors with the latest
master:

thom@swift:~/Development/postgresql$ patch -p1 <
~/Downloads/parallel_seqscan_v4.patch
patching file src/backend/access/Makefile
patching file src/backend/access/common/printtup.c
patching file src/backend/access/shmmq/Makefile
patching file src/backend/access/shmmq/shmmqam.c
patching file src/backend/commands/explain.c
Hunk #1 succeeded at 721 (offset 8 lines).
Hunk #2 succeeded at 918 (offset 8 lines).
Hunk #3 succeeded at 1070 (offset 8 lines).
Hunk #4 succeeded at 1337 (offset 8 lines).
Hunk #5 succeeded at 2239 (offset 83 lines).
patching file src/backend/executor/Makefile
patching file src/backend/executor/execProcnode.c
patching file src/backend/executor/execScan.c
patching file src/backend/executor/execTuples.c
patching file src/backend/executor/nodeParallelSeqscan.c
patching file src/backend/executor/nodeSeqscan.c
patching file src/backend/libpq/pqmq.c
Hunk #1 succeeded at 23 with fuzz 2 (offset -3 lines).
Hunk #2 FAILED at 63.
Hunk #3 succeeded at 132 (offset -31 lines).
1 out of 3 hunks FAILED -- saving rejects to file
src/backend/libpq/pqmq.c.rej
patching file src/backend/optimizer/path/Makefile
patching file src/backend/optimizer/path/allpaths.c
patching file src/backend/optimizer/path/costsize.c
patching file src/backend/optimizer/path/parallelpath.c
patching file src/backend/optimizer/plan/createplan.c
patching file src/backend/optimizer/plan/planner.c
patching file src/backend/optimizer/plan/setrefs.c
patching file src/backend/optimizer/util/pathnode.c
patching file src/backend/postmaster/Makefile
patching file src/backend/postmaster/backendworker.c
patching file src/backend/postmaster/postmaster.c
patching file src/backend/tcop/dest.c
patching file src/backend/tcop/postgres.c
Hunk #1 succeeded at 54 (offset -1 lines).
Hunk #2 succeeded at 1132 (offset -1 lines).
patching file src/backend/utils/misc/guc.c
patching file src/backend/utils/misc/postgresql.conf.sample
can't find file to patch at input line 2105
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--
|diff --git a/src/include/access/parallel.h b/src/include/access/parallel.h
|index 761ba1f..00ad468 100644
|--- a/src/include/access/parallel.h
|+++ b/src/include/access/parallel.h
--
File to patch:

-- 
Thom


Re: [HACKERS] Re: [COMMITTERS] pgsql: Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2

2015-01-20 Thread Robert Haas
On Tue, Jan 20, 2015 at 10:30 AM, Alvaro Herrera
 wrote:
> Tom Lane wrote:
>> Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2 branches.
>> With this optimization flag enabled, recent versions of gcc can generate
>> incorrect code that assumes variable-length arrays (such as oidvector)
>> are actually fixed-length because they're embedded in some larger struct.
>> The known instance of this problem was fixed in 9.2 and up by commit
>> 8137f2c32322c624e0431fac1621e8e9315202f9 and followon work, which hides
>> actually-variable-length catalog fields from the compiler altogether.
>> And we plan to gradually convert variable-length fields to official
>> "flexible array member" notation over time, which should prevent this type
>> of bug from reappearing as gcc gets smarter.  We're not going to try to
>> back-port those changes into older branches, though, so apply this
>> band-aid instead.
>
> Would anybody object to me pushing this commit to branches 8.2 and 8.3?

Since those branches are out of support, I am not sure what the point
is.  If we want people to be able to use those branches reasonably we
need to back-port fixes for critical security and stability issues,
not just this one thing.

But maybe I am missing something.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Merging postgresql.conf and postgresql.auto.conf

2015-01-20 Thread Robert Haas
On Mon, Jan 19, 2015 at 10:39 PM, Amit Kapila  wrote:
> I think whichever process reads postgresql.conf/postgresql.auto.conf has
> to do this (unless we restrict that this will be done at some other time),
> and the postmaster is one of them.  It seems to me that it is not a good idea
> unless we do it at some other time, like when populating entries for a view.

Well, the postmaster does not have database access, and neither do
most of the auxiliary processes.  The autovacuum launcher has very
limited database access (shared catalogs only).

Making the postmaster access the database is a complete non-starter;
we have worked very hard to avoid that, for reasons of overall system
robustness.  If the postmaster crashes or locks up, manual
intervention is required; if any other process crashes, the postmaster
can reset the whole system.

>> Independently of that, it sounds like solving the problem from the
>> wrong end.  I think the idea of ALTER SYSTEM .. SET ought to be that
>> you should EITHER edit postgresql.conf for all your configuration
>> changes, OR you should use ALTER SYSTEM .. SET for all of your
>> changes.  If you mix-and-match, well, that's certainly allowed and it
>> will work and everything, but you - as a human being - might get
>> confused.
>
> Right, but we can't completely eliminate such a possibility (as an
> example we have some default settings like max_connections,
> shared_buffers, etc).  I agree with you that users should use only one
> way of changing the settings; however, for certain rare cases (default
> settings or some other) we can provide a way for the user to know which
> of the settings are duplicated.  I think if we can provide such
> information via some view with ease then it's worth having.

I'd suggest abandoning the idea of storing anything on a persistent
basis in a system catalog and looking for some way for the backend to
which the user is connected to expose its own state instead.  For
example, pg_settings already exposes "sourcefile" and "sourceline"
settings.  Actually, I'm not quite sure why that's not sufficient
here, but if it isn't, it could be extended.
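
For instance, something along these lines already tells you which file and line
each active setting came from (a sketch; the output naturally depends on the
installation):

SELECT name, setting, sourcefile, sourceline
  FROM pg_settings
 WHERE name IN ('max_connections', 'shared_buffers');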

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2015-01-20 Thread Tomas Vondra
On 25.12.2014 22:28, Tomas Vondra wrote:
> On 25.12.2014 21:14, Andres Freund wrote:
>
>> That's indeed odd. Seems to have been lost when the statsfile was
>> split into multiple files. Alvaro, Tomas?
> 
> The goal was to keep the logic as close to the original as possible.
> IIRC there were "pgstat wait timeout" issues before, and in most cases
> the conclusion was that it's probably because of overloaded I/O.
> 
> But maybe there actually was another bug, and it's entirely possible
> that the split introduced a new one, and that's what we're seeing now.
> The strange thing is that the split happened ~2 years ago, which is
> inconsistent with the sudden increase of this kind of issues. So maybe
> something changed on that particular animal (a failing SD card causing
> I/O stalls, perhaps)?
> 
> Anyway, I happen to have a spare Raspberry PI, so I'll try to reproduce
> and analyze the issue locally. But that won't happen until January.

I've tried to reproduce this on my Raspberry PI 'machine' and it's not
very difficult to trigger this. About 7 out of 10 'make check' runs fail
because of 'pgstat wait timeout'.

All the occurrences I've seen were right after some sort of VACUUM
(sometimes plain, sometimes ANALYZE or FREEZE), and the I/O at the time
looked something like this:

Device:  rrqm/s  wrqm/s   r/s   w/s   rkB/s   wkB/s avgrq-sz avgqu-sz    await r_await  w_await  svctm  %util
mmcblk0    0.00   75.00  0.00  8.00    0.00   36.00     9.00     5.73 15633.75    0.00 15633.75 125.00 100.00

So pretty terrible (this is a Class 4 SD card, supposedly able to handle
4 MB/s). If hamster had a faulty SD card, it might have been much worse, I
guess.

This of course does not prove the absence of a bug - I plan to dig into
this a bit more. Feel free to point out some suspicious scenarios that
might be worth reproducing and analyzing.

-- 
Tomas Vondrahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Robert Haas
On Mon, Jan 19, 2015 at 9:29 PM, Peter Geoghegan  wrote:
> I think that the attached patch should at least fix that much. Maybe
> the problem on the other animal is also explained by the lack of this,
> since there could also be a MinGW-ish strxfrm_l(), I suppose.

Committed that, rather blindly, since it looks like a reasonable fix.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg

2015-01-20 Thread Tomas Vondra
Hi,

On 20.1.2015 12:23, Ali Akbar wrote:
> 2015-01-20 18:17 GMT+07:00 Ali Akbar 
> Sorry, there is another comment on makeMdArrayResult; I suggest also
> changing it like this:
> @@ -4738,6 +4764,12 @@ makeArrayResult(ArrayBuildState *astate,
>   * beware: no check that specified dimensions match the number of values
>   * accumulated.
>   *
> + * beware: if the astate was not initialized within a separate memory
> + * context (i.e. using subcontext=true when calling initArrayResult),
> + * using release=true is illegal as it releases the whole context,
> + * and that may include other memory still used elsewhere (instead use
> + * release=false and release the memory with the parent context later)
> + *
>   *astate is working state (must not be NULL)
>   *rcontext is where to construct result

I think both comment fixes are appropriate. I'll wait a bit and then
post an updated version of the patch (unless it gets committed with the
comment fixes before that).

-- 
Tomas Vondrahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2

2015-01-20 Thread Alvaro Herrera
Tom Lane wrote:
> Disable -faggressive-loop-optimizations in gcc 4.8+ for pre-9.2 branches.
> 
> With this optimization flag enabled, recent versions of gcc can generate
> incorrect code that assumes variable-length arrays (such as oidvector)
> are actually fixed-length because they're embedded in some larger struct.
> The known instance of this problem was fixed in 9.2 and up by commit
> 8137f2c32322c624e0431fac1621e8e9315202f9 and followon work, which hides
> actually-variable-length catalog fields from the compiler altogether.
> And we plan to gradually convert variable-length fields to official
> "flexible array member" notation over time, which should prevent this type
> of bug from reappearing as gcc gets smarter.  We're not going to try to
> back-port those changes into older branches, though, so apply this
> band-aid instead.

Would anybody object to me pushing this commit to branches 8.2 and 8.3?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?

2015-01-20 Thread Andres Freund
Hi,

I'm analyzing a problem in which a customer had a 9.2 cluster, created with
pg_basebackup (from a standby), that failed with "WAL contains references to
invalid pages". The failing record was an "xlog redo visible",
i.e. XLOG_HEAP2_VISIBLE.

First I thought there might be another bug along the lines of
17fa4c321cc. Looking at the code and the WAL, that didn't seem to be the
case (man, I miss pg_xlogdump). Other, slightly older, standbys didn't
seem to have any problems.

Logs show that an ALTER DATABASE ... SET TABLESPACE ... was running when
the basebackup was started and that it finished *before* pg_basebackup finished.

movedb() basically works in these steps:
1) lock out users of the database
2) RequestCheckpoint(IMMEDIATE|WAIT)
3) DropDatabaseBuffers()
4) copydir()
5) XLogInsert(XLOG_DBASE_CREATE)
6) RequestCheckpoint(CHECKPOINT_IMMEDIATE)
7) rmtree(src_dbpath)
8) XLogInsert(XLOG_DBASE_DROP)
9) unlock database

If a basebackup starts while 4) is in progress and continues until 7)
happens, I think a pretty wide race opens: the basebackup can end up with
a partial copy of the database in the old tablespace because the
rmtree(old_path) was concurrently in progress.  Normally such races are
fixed during replay. But in this case, the replay of the
XLOG_DBASE_CREATE will just try to do rmtree(new); copydir(old, new);,
fixing nothing.

Besides making AD .. ST use sane WAL logging, which doesn't seem
backpatchable, I don't see what could be done against this except
somehow making basebackups fail if an AD .. ST is in progress, which
doesn't look entirely trivial either.

This is a pretty nasty bug imo, because you're in no way guaranteed to
notice it. If the problem happens only in some large, seldom-read
table, the database might appear to be in a correct state.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New CF app deployment

2015-01-20 Thread Magnus Hagander
On Tue, Jan 20, 2015 at 3:43 PM, Tom Lane  wrote:

> Michael Paquier  writes:
> > On Tue, Jan 20, 2015 at 4:31 PM, Andres Freund 
> wrote:
> >> I think you misunderstood me ;). I was talking about the old CF
> >> application providing a RSS feed of all changes to all entries.
> >> https://commitfest-old.postgresql.org/action/commitfest_activity.rss
>
> > Oh, I didn't know this one. That's indeed useful.
>
> Personally, I never used the RSS feed as such, but I did often consult the
> "activity log" pages, which I think are equivalent:
>
> https://commitfest-old.postgresql.org/action/commitfest_activity?id=25
>
> That feature seems to be gone too :-(
>

Yeah, that's annoying.

In fact, I'm the one who forced the RSS feed into the old system, and it's
something I use all the time myself. Oops.


Turns out I have those in a feature branch that it appears I forgot to
merge :( And it also no longer merges cleanly.

I've added this to my short-term TODO, and will look at it this evening.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg

2015-01-20 Thread Tom Lane
Jeff Davis  writes:
> Tom (tgl),
> Is my reasoning above acceptable?

Uh, sorry, I've not been paying any attention to this thread for awhile.
What's the remaining questions at issue?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New CF app deployment

2015-01-20 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Jan 20, 2015 at 4:31 PM, Andres Freund  wrote:
>> I think you misunderstood me ;). I was talking about the old CF
>> application providing a RSS feed of all changes to all entries.
>> https://commitfest-old.postgresql.org/action/commitfest_activity.rss

> Oh, I didn't know this one. That's indeed useful.

Personally, I never used the RSS feed as such, but I did often consult the
"activity log" pages, which I think are equivalent:

https://commitfest-old.postgresql.org/action/commitfest_activity?id=25

That feature seems to be gone too :-(

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel mode and parallel contexts

2015-01-20 Thread Amit Kapila
On Sat, Jan 17, 2015 at 3:40 AM, Robert Haas  wrote:
>
> New patch attached.  I'm going to take the risk of calling this v1
> (previous versions have been 0.x), since I've now done something about
> the heavyweight locking issue, as well as fixed the message-looping
> bug Amit pointed out.  It doubtless needs more work, but it's starting
> to smell a bit more like a real patch.
>

I need some clarification regarding the code below:

+BgwHandleStatus
+WaitForBackgroundWorkerShutdown(BackgroundWorkerHandle *handle)
+{
+    BgwHandleStatus status;
+    int         rc;
+    bool        save_set_latch_on_sigusr1;
+
+    save_set_latch_on_sigusr1 = set_latch_on_sigusr1;
+    set_latch_on_sigusr1 = true;
+
+    PG_TRY();
+    {
+        for (;;)
+        {
+            pid_t       pid;
+
+            CHECK_FOR_INTERRUPTS();
+
+            status = GetBackgroundWorkerPid(handle, &pid);
+            if (status == BGWH_STOPPED)
+                return status;
+
+            rc = WaitLatch(&MyProc->procLatch,
+                           WL_LATCH_SET | WL_POSTMASTER_DEATH, 0);
+
+            if (rc & WL_POSTMASTER_DEATH)
+                return BGWH_POSTMASTER_DIED;

It seems this code has the possibility of waiting forever.
Assume one of the workers is not able to start (not able to attach
to shared memory, or some other reason); then the status returned by
GetBackgroundWorkerPid() will be BGWH_NOT_YET_STARTED
and after that it can wait forever in WaitLatch.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Dereferenced pointers checked as NULL in btree_utils_var.c

2015-01-20 Thread Tom Lane
Michael Paquier  writes:
> Coverity is pointing out $subject, with the following stuff in gbt_var_same():
> ...
> As Heikki pointed me out on IM, the lack of crash report in this area,
> as well as similar coding style in cube/ seem to be sufficient
> arguments to simply remove those NULL checks instead of doing more
> solid checks on them. Patch is attached.

The way to form a convincing argument that these checks are unnecessary
would be to verify that (1) the SQL-accessible functions directly calling
gbt_var_same() are all marked STRICT, and (2) the core GIST code never
passes a NULL to these support functions.  I'm prepared to believe that
(1) and (2) are both true, but it merits checking.
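
Point (1) at least can be checked mechanically; a sketch, assuming the usual
btree_gist naming convention for the SQL-level wrappers around gbt_var_same():

SELECT proname, proisstrict
  FROM pg_proc
 WHERE proname LIKE 'gbt\_%\_same';   -- every row should show proisstrict = t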

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-01-20 Thread Amit Kapila
On Thu, Jan 15, 2015 at 6:57 PM, Amit Kapila 
wrote:
> On Mon, Jan 12, 2015 at 3:25 AM, Robert Haas 
wrote:
> >
> > Yeah, you need two separate global variables pointing to shm_mq
> > objects, one of which gets used by pqmq.c for errors and the other of
> > which gets used by printtup.c for tuples.
> >
>
> Okay, I will try to change the way as suggested without doing
> switching, but this way we need to do it separately for 'T', 'D', and
> 'C' messages.
>

I have taken care of integrating the parallel sequence scan with the
latest patch posted (parallel-mode-v1.patch) by Robert at below
location:
http://www.postgresql.org/message-id/ca+tgmozduk4k3xhbxc9vm-82khourezdvqwtfglhwsd2r2a...@mail.gmail.com

Changes in this version
---
1. As mentioned previously, I have exposed one parameter
ParallelWorkerNumber as used in parallel-mode patch.
2. Enabled tuple queue to be used for passing tuples from
worker backend to master backend along with error queue
as per suggestion by Robert in the mail above.
3. Involved master backend to scan the heap directly when
tuples are not available in any shared memory tuple queue.
4. Introduced 3 new parameters (cpu_tuple_comm_cost,
parallel_setup_cost, parallel_startup_cost) for deciding the cost
of parallel plan.  Currently, I have kept the default values for
parallel_setup_cost and parallel_startup_cost as 0.0, as those
require some experiments.
5. Fixed some issues (related to memory increase as reported
upthread by Thom Brown and general feature issues found during
test)

Note - I have yet to handle the new node types introduced at some
of the places and need to verify prepared queries and some other
things, however I think it will be good if I can get some feedback
at current stage.
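
For anyone who wants to experiment with this, a minimal sketch of exercising the
new knobs (the GUC names are the ones listed above; the table name is hypothetical
and the cost values are only illustrative):

SET parallel_seqscan_degree = 4;
SET cpu_tuple_comm_cost = 0.1;
SET parallel_setup_cost = 0;
SET parallel_startup_cost = 0;
EXPLAIN SELECT * FROM big_table;  -- should show a Parallel Seq Scan node when it comes out cheaper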

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


parallel_seqscan_v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2015-01-20 Thread Andrew Gierth
> "Robert" == Robert Haas  writes:

 Robert> All right, it seems Tom is with you on that point, so after
 Robert> some study, I've committed this with very minor modifications.

This caught my eye (thanks to conflict with GS patch):

 * In the future, we should consider forcing the
 * tuplesort_begin_heap() case when the abbreviated key
 * optimization can thereby be used, even when numInputs is 1.

The comment in tuplesort_begin_datum that abbreviation can't be used
seems wrong to me; why is the copy of the original value pointed to by
stup->tuple (in the case of by-reference types, and abbreviation is
obviously not needed for by-value types) not sufficient?

Or what am I missing?

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg

2015-01-20 Thread Ali Akbar
2015-01-20 18:17 GMT+07:00 Ali Akbar :

> 2015-01-20 14:22 GMT+07:00 Jeff Davis :
>
>> The current patch, which I am evaluating for commit, does away with
>> per-group memory contexts (it uses a common context for all groups), and
>> reduces the initial array allocation from 64 to 8 (but preserves
>> doubling behavior).
>
>
> Jeff & Tomas, spotted this comment in v8 patch:
> @@ -4713,6 +4733,11 @@ accumArrayResult(ArrayBuildState *astate,
>  /*
>   * makeArrayResult - produce 1-D final result of accumArrayResult
>   *
> + * If the array build state was initialized with a separate memory
> context,
> + * this also frees all the memory (by deleting the subcontext). If a
> parent
> + * context was used directly, the memory may be freed by an explicit
> pfree()
> + * call (unless it's meant to be freed by destroying the parent context).
> + *
>   * astate is working state (must not be NULL)
>   * rcontext is where to construct result
>   */
>
> Simple pfree(astate) call is not enough to free the memory. If it's scalar
> accumulation (initArrayResult), the user must pfree(astate->dvalues) and
> pfree(astate->dnulls) before astate. If it's array accumulation,
> pfree(astate->data) and pfree(astate->nullbitmap), with both can be null if
> no array accumulated and some other cases. If its any (scalar or array)
> accumulation, it's more complicated.
>
> I suggest it's simpler to just force the API user to destroy the parent
> context instead. So the comment become like this:
> @@ -4713,6 +4733,11 @@ accumArrayResult(ArrayBuildState *astate,
>  /*
>   * makeArrayResult - produce 1-D final result of accumArrayResult
>   *
> + * If the array build state was initialized with a separate memory
> context,
> + * this also frees all the memory (by deleting the subcontext). If a
> parent
> + * context was used directly, the memory is meant to be freed by
> destroying
> + * the parent context.
> + *
>   * astate is working state (must not be NULL)
>   * rcontext is where to construct result
>   */
>

Sorry, there is another comment on makeMdArrayResult; I suggest also
changing it like this:
@@ -4738,6 +4764,12 @@ makeArrayResult(ArrayBuildState *astate,
  * beware: no check that specified dimensions match the number of values
  * accumulated.
  *
+ * beware: if the astate was not initialized within a separate memory
+ * context (i.e. using subcontext=true when calling initArrayResult),
+ * using release=true is illegal as it releases the whole context,
+ * and that may include other memory still used elsewhere (instead use
+ * release=false and release the memory with the parent context later)
+ *
  * astate is working state (must not be NULL)
  * rcontext is where to construct result


Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg

2015-01-20 Thread Ali Akbar
2015-01-20 14:22 GMT+07:00 Jeff Davis :

> The current patch, which I am evaluating for commit, does away with
> per-group memory contexts (it uses a common context for all groups), and
> reduces the initial array allocation from 64 to 8 (but preserves
> doubling behavior).


Jeff & Tomas, spotted this comment in v8 patch:
@@ -4713,6 +4733,11 @@ accumArrayResult(ArrayBuildState *astate,
 /*
  * makeArrayResult - produce 1-D final result of accumArrayResult
  *
+ * If the array build state was initialized with a separate memory context,
+ * this also frees all the memory (by deleting the subcontext). If a parent
+ * context was used directly, the memory may be freed by an explicit
pfree()
+ * call (unless it's meant to be freed by destroying the parent context).
+ *
  * astate is working state (must not be NULL)
  * rcontext is where to construct result
  */

A simple pfree(astate) call is not enough to free the memory. If it's scalar
accumulation (initArrayResult), the user must pfree(astate->dvalues) and
pfree(astate->dnulls) before astate. If it's array accumulation,
pfree(astate->data) and pfree(astate->nullbitmap), though both can be NULL if
no array was accumulated, plus some other cases. If it's "any" (scalar or array)
accumulation, it's more complicated.

I suggest it's simpler to just force the API user to destroy the parent
context instead. So the comment become like this:
@@ -4713,6 +4733,11 @@ accumArrayResult(ArrayBuildState *astate,
 /*
  * makeArrayResult - produce 1-D final result of accumArrayResult
  *
+ * If the array build state was initialized with a separate memory context,
+ * this also frees all the memory (by deleting the subcontext). If a parent
+ * context was used directly, the memory is meant to be freed by destroying
+ * the parent context.
+ *
  * astate is working state (must not be NULL)
  * rcontext is where to construct result
  */

Regards,
-- 
Ali Akbar


Re: [HACKERS] Bug in pg_dump

2015-01-20 Thread Gilles Darold
Le 19/01/2015 14:41, Robert Haas a écrit :
> On Thu, Jan 15, 2015 at 6:26 AM, Gilles Darold  
> wrote:
>> I attach a patch that solves the issue in pg_dump, let me know if it might
>> be included in Commit Fest or if the three other solutions are a better
>> choice.
> I think a fix in pg_dump is appropriate, so I'd encourage you to add
> this to the next CommitFest.
>
Ok, thanks Robert. The patch has been added to the next CommitFest under
the Bug Fixes section.

I've also done some more review of the patch and further testing. I've rewritten
some comments and added a check for when TableInfo is NULL to avoid a potential
pg_dump crash.

New patch attached: pg_dump.c-extension-FK-v2.patch

Regards

-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org

--- ../postgresql/src/bin/pg_dump/pg_dump.c	2015-01-19 19:03:45.897706879 +0100
+++ src/bin/pg_dump/pg_dump.c	2015-01-20 11:22:01.144794489 +0100
@@ -209,6 +209,7 @@
 
 static void getDomainConstraints(Archive *fout, TypeInfo *tyinfo);
 static void getTableData(DumpOptions *dopt, TableInfo *tblinfo, int numTables, bool oids);
+static bool hasExtensionMember(TableInfo *tblinfo, int numTables);
 static void makeTableDataInfo(DumpOptions *dopt, TableInfo *tbinfo, bool oids);
 static void buildMatViewRefreshDependencies(Archive *fout);
 static void getTableDataFKConstraints(void);
@@ -730,9 +731,20 @@
 
 	if (!dopt.schemaOnly)
 	{
+		bool has_ext_member;
+
 		getTableData(&dopt, tblinfo, numTables, dopt.oids);
+
+		/* Search if there's dumpable table's members in an extension */
+		has_ext_member = hasExtensionMember(tblinfo, numTables);
+
 		buildMatViewRefreshDependencies(fout);
-		if (dopt.dataOnly)
+		/*
+		 * Get FK constraints even with schema+data if extension's
+		 * members have FK because in this case tables need to be
+		 * dump-ordered too.
+		 */
+		if (dopt.dataOnly || has_ext_member)
 			getTableDataFKConstraints();
 	}
 
@@ -1852,6 +1864,26 @@
 }
 
 /*
+ * hasExtensionMember -
+ *    return true when one of the dumpable objects
+ *    is an extension member
+ */
+static bool
+hasExtensionMember(TableInfo *tblinfo, int numTables)
+{
+	int			i;
+
+	for (i = 0; i < numTables; i++)
+	{
+		if (tblinfo[i].dobj.ext_member)
+			return true;
+	}
+
+	return false;
+}
+
+
+/*
  * Make a dumpable object for the data of this specific table
  *
  * Note: we make a TableDataInfo if and only if we are going to dump the
@@ -2026,10 +2058,12 @@
  * getTableDataFKConstraints -
  *	  add dump-order dependencies reflecting foreign key constraints
  *
- * This code is executed only in a data-only dump --- in schema+data dumps
- * we handle foreign key issues by not creating the FK constraints until
- * after the data is loaded.  In a data-only dump, however, we want to
- * order the table data objects in such a way that a table's referenced
+ * This code is executed only in a data-only dump or when there is extension's
+ * member -- in schema+data dumps we handle foreign key issues by not creating
+ * the FK constraints until after the data is loaded. In a data-only dump or
+ * when there is an extension member to dump (schema dumps do not concern
+ * extension's objects, they are created during the CREATE EXTENSION), we want
+ * to order the table data objects in such a way that a table's referenced
  * tables are restored first.  (In the presence of circular references or
  * self-references this may be impossible; we'll detect and complain about
  * that during the dependency sorting step.)
@@ -2930,9 +2964,14 @@
 	PQExpBuffer delqry;
 	const char *cmd;
 
+	/* Policy is SCHEMA only */
 	if (dopt->dataOnly)
 		return;
 
+	/* CREATE EXTENSION should take care of that */
+	if ((tbinfo != NULL) && tbinfo->dobj.ext_member)
+		return;
+
 	/*
 	 * If polname is NULL, then this record is just indicating that ROW
 	 * LEVEL SECURITY is enabled for the table. Dump as ALTER TABLE 
@@ -7884,6 +7923,10 @@
 	if (dopt->dataOnly)
 		return;
 
+	/* CREATE EXTENSION should take care of that */
+	if ((tbinfo != NULL) && tbinfo->dobj.ext_member)
+		return;
+
 	/* Search for comments associated with relation, using table */
 	ncomments = findComments(fout,
 			 tbinfo->dobj.catId.tableoid,
@@ -13138,6 +13181,10 @@
 	if (dopt->dataOnly)
 		return;
 
+	/* CREATE EXTENSION should take care of that */
+	if ((tbinfo != NULL) && tbinfo->dobj.ext_member)
+		return;
+
 	/* Search for comments associated with relation, using table */
 	nlabels = findSecLabels(fout,
 			tbinfo->dobj.catId.tableoid,
@@ -13345,7 +13392,7 @@
 static void
 dumpTable(Archive *fout, DumpOptions *dopt, TableInfo *tbinfo)
 {
-	if (tbinfo->dobj.dump && !dopt->dataOnly)
+	if (tbinfo->dobj.dump && !dopt->dataOnly && !tbinfo->dobj.ext_member)
 	{
 		char	   *namecopy;
 
@@ -13483,6 +13530,10 @@
 	int			j,
 k;
 
+	/* CREATE EXTENSION should take care of that */
+	if ((tbinfo != NULL) && tbinfo->dobj.ext_member)
+		return;
+
 	/* Make sure we are in proper schema */
 	selectSourceSchema(fout, tbinfo->dobj.namespace

Re: [HACKERS] New CF app: changing email sender

2015-01-20 Thread Andrew Gierth
> "Kyotaro" == Kyotaro HORIGUCHI  writes:

 Kyotaro> Hmm. The mail address indeed *was* mine but is now obsolete,
 Kyotaro> so that the email might bounce. But I haven't find how to
 Kyotaro> change it within the app itself, and the PostgreSQL community
 Kyotaro> account page.

Just being able to change the email address on the community account
isn't enough; I for one am subscribed to the lists with a different
email address than the one associated with my community account (for
spam management reasons). There needs to be a way to have multiple
addresses or to specify which is to be used for the post.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Reducing lock strength of trigger and foreign key DDL

2015-01-20 Thread Noah Misch
On Fri, Jan 16, 2015 at 04:59:56PM +0100, Andres Freund wrote:
> On 2015-01-16 15:16:20 +0100, Andreas Karlsson wrote:
> > For this reason I opted to only lower the lock levels of ADD and ALTER
> > TRIGGER, and not DROP TRIGGER. Neither of those requires MVCC of the
> > WHEN clause.
> 
> I'm unconvinced that this is true. Using a snapshot for part of getting
> a definition certainly opens the door for getting strange
> results.
> 
> Acquiring a lock that prevents schema changes on the table and then
> getting the definition using the syscaches guarantees that that
> definition is at least self consistent because no further schema changes
> are possible and the catalog caches will be up2date.
> 
> What you're doing though is doing part of the scan using the
> transaction's snapshot (as used by pg_dump that will usually be a
> repeatable read snapshot and possibly quite old) and the other using a
> fresh catalog snapshot. This can result in rather odd things.
> 
> Just consider:
> S1: CREATE TABLE flubber(id serial primary key, data text);
> S1: CREATE FUNCTION blarg() RETURNS TRIGGER LANGUAGE plpgsql AS $$BEGIN 
> RETURN NEW; END;$$;
> S1: CREATE TRIGGER flubber_blarg BEFORE INSERT ON flubber FOR EACH ROW WHEN 
> (NEW.data IS NOT NULL) EXECUTE PROCEDURE blarg();
> S2: BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> S2: SELECT 'dosomethingelse';
> S1: ALTER TABLE flubber RENAME TO wasflubber;
> S1: ALTER TABLE wasflubber RENAME COLUMN data TO wasdata;
> S1: ALTER TRIGGER flubber_blarg ON wasflubber RENAME TO wasflubber_blarg;
> S1: ALTER FUNCTION blarg() RENAME TO wasblarg;
> S2: SELECT pg_get_triggerdef(oid) FROM pg_trigger;
> 
> This will give you: The old trigger name. The new table name. The new
> column name. The new function name.
> 
> I don't think using a snapshot for tiny parts of these functions
> actually buys anything. Now, this isn't a pattern you introduced. But I
> think we should think about this for a second before expanding it
> further.

Fair enough.  It did reinforce pg_get_constraintdef() as a subroutine of
pg_dump rather than an independent, rigorous interface.  It perhaps made the
function worse for non-pg_dump callers.  In that vein, each one of these hacks
has a cost.  One could make a reasonable argument that ALTER TRIGGER RENAME
locking is not important enough to justify spreading the hack from
pg_get_constraintdef() to pg_get_triggerdef().  Lowering the CREATE TRIGGER
lock level does not require any ruleutils.c change for the benefit of pg_dump,
because pg_dump won't see the pg_trigger row of a too-recent trigger.

> Before you argue that this isn't relevant for pg_dump: It is. Precisely
> the above can happen - just replace the 'dosomethingelse' with several
> LOCK TABLEs as pg_dump does. The first one blocks after a snapshot has been
> acquired; while it waits, all the ALTERs happen.

We wish pg_dump would take a snapshot of the database; that is, we wish its
output always matched some serial execution of transactions.  pg_dump has,
since ancient times, failed to achieve that if non-table DDL commits during
the dump or if table DDL commits between acquiring the dump transaction
snapshot and acquiring the last table lock.  My reviews have defended the
standard that table DDL issued after pg_dump has acquired locks does not
change the dump.  That's what we bought with pg_get_constraintdef()'s use of
the transaction snapshot and would buy with the same in pg_get_triggerdef().
My reviews have deliberately ignored effects on scenarios where pg_dump
already fails to guarantee snapshot-like output.
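A minimal sketch of the ordering in question (this is not pg_dump's literal
command sequence, and the table and query names are only illustrative):

    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SELECT 1;                            -- first query fixes the transaction snapshot
    -- table DDL committing here is invisible to that snapshot, yet not blocked by any lock
    LOCK TABLE wasflubber IN ACCESS SHARE MODE;      -- from here on, table DDL waits
    SELECT pg_get_triggerdef(oid) FROM pg_trigger;   -- syscache lookups see the latest committed state

Table DDL that commits between the snapshot and the locks is exactly the case
where snapshot-like output is already not guaranteed.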

> Arguably the benefit here is that the window for this issue is becoming
> smaller as pg_dump (and hopefully other possible callers) acquire
> exclusive locks on the table. I.e. that the lowering of the lock level
> doesn't introduce new races. But on the other side of the coin, this
> makes the result of pg_get_triggerdef() even *more* inaccurate in many
> cases.

What is this about pg_dump acquiring exclusive locks?

To summarize, the problem you raise has been out of scope, because it affects
pg_dump only at times when pg_dump is already wrong.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] documentation update for doc/src/sgml/func.sgml

2015-01-20 Thread Fabien COELHO



I had a look at this patch.  This patch adds some text below a table
of functions.  Immediately above that table, there is this existing
language:

  The functions working with double precision data are mostly
  implemented on top of the host system's C library; accuracy and behavior in
  boundary cases can therefore vary depending on the host system.

This seems to me to substantially duplicate the information added by the patch.


I would rather say that it makes the potential issues explicit. Taking that
into account, maybe the part about floating point could be moved up after
the above sentence, or the above sentence moved down as an introduction,
with some pruning so that it fits in?


The second paragraph about bitwise ops is not related to these.

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Async execution of postgres_fdw.

2015-01-20 Thread Kyotaro HORIGUCHI
Hello, thank you for the comment. I added an experimental adaptive
fetch size feature in this v6 patch.


At Tue, 20 Jan 2015 04:51:13 +, Matt Kelly  wrote in 

> I think it's telling that varying the fetch size doubled the performance,
> even on localhost.  If you were to repeat this test across a network, the
> performance difference would be far more drastic.

I certainly think so.

> I understand the desire to keep the fetch size small by default, but I
> think your results demonstrate how important the value is.  At the very
> least, it is worth reconsidering this "arbitrary" value.  However, I think
> the real solution is to make this configurable.  It probably should be a
> new option on the foreign server or table, but an argument could be made
> for it to be global across the server just like work_mem.
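
Purely to illustrate that suggestion (the option name and exact syntax below
are hypothetical, not something in any posted patch), a per-server or
per-table knob might look like:

    ALTER SERVER remote_pg OPTIONS (ADD fetch_size '10000');
    ALTER FOREIGN TABLE ft1 OPTIONS (ADD fetch_size '50000');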

The optimal fetch_count varies depending on the query. From a purely
performance point of view, it should equal the table size for a simple
scan of a single table. Most joins also do not need to read the target
relations simultaneously. (A local merge join on remotely sorted results
is not available, since the FDW is not aware of the sort order.) But that
may change in the near future, so I have found no appropriate policy for
deciding the number.

Another point of view is the memory requirement. This wouldn't matter if
we used libpq's single-row mode, but that mode doesn't allow multiple
simultaneous queries. The space needed for the fetch buffer varies widely,
in proportion to the average row length. If it is 1KB, 10000 rows require
over 10MB, which is larger than the default value of work_mem. In this
version I tried an adaptive fetch_size based on the fetch duration and the
buffer size required for the previous turn. But a hard limit cannot be
imposed, since we cannot know the mean row length in advance. So, for
example, if the average row length suddenly grows from 1KB to 10KB while
fetch_size is 10000, 100MB is required for that turn. I think that, for
ordinary cases, the maximum fetch size cannot exceed 1000.
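
As a rough way to see how the per-fetch buffer scales with row width, an
ad-hoc estimate like the following (not part of the patch; the fetch size of
10000 and the 1000-row sample are arbitrary) multiplies the sampled average
row size by a candidate fetch_size:

    SELECT pg_size_pretty((10000 * avg(pg_column_size(s.*)))::bigint)
      FROM (SELECT * FROM ft1 LIMIT 1000) AS s;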


Attached is the new version, which implements the adaptive fetch size.
Simple test runs showed the values below: a single scan was boosted by
about 5% (no real effect?) and a join by 33%. The former case is hard to
understand, so I'll examine it tomorrow. This doesn't seem so promising,
though..


=====
master=# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT * FROM ft1;
                              QUERY PLAN
----------------------------------------------------------------------
 Foreign Scan on ft1 (actual time=1.741..10046.272 rows=100 loops=1)
 Planning time: 0.084 ms
 Execution time: 10145.730 ms
(3 rows)


patched=# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT * FROM ft1;
                              QUERY PLAN
----------------------------------------------------------------------
 Foreign Scan on ft1 (actual time=1.072..9582.980 rows=100 loops=1)
 Planning time: 0.077 ms
 Execution time: 9683.164 ms
(3 rows)


postgres=# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c FROM ft1 AS x
JOIN ft1 AS y on x.a = y.a;
                                 QUERY PLAN
---------------------------------------------------------------------------
 Merge Join (actual time=18191.739..19534.001 rows=100 loops=1)
   Merge Cond: (x.a = y.a)
   ->  Sort (actual time=9031.155..9294.465 rows=100 loops=1)
         Sort Key: x.a
         Sort Method: external sort  Disk: 142728kB
         ->  Foreign Scan on ft1 x (actual time=1.156..6486.632 rows=100 loops=1)
   ->  Sort (actual time=9160.577..9479.076 rows=100 loops=1)
         Sort Key: y.a
         Sort Method: external sort  Disk: 146632kB
         ->  Foreign Scan on ft1 y (actual time=0.641..6517.594 rows=100 loops=1)
 Planning time: 0.203 ms
 Execution time: 19626.881 ms
(12 rows)


patched=# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c FROM ft1 AS x
JOIN ft1 AS y on x.a = y.a;
                                 QUERY PLAN
---------------------------------------------------------------------------
 Merge Join (actual time=11790.690..13134.071 rows=100 loops=1)
   Merge Cond: (x.a = y.a)
   ->  Sort (actual time=8149.225..8413.611 rows=100 loops=1)
         Sort Key: x.a
         Sort Method: external sort  Disk: 142728kB
         ->  Foreign Scan on ft1 x (actual time=0.679..3989.160 rows=100 loops=1)
   ->  Sort (actual time=3641.457..3957.240 rows=100 loops=1)
         Sort Key: y.a
         Sort Method: external sort  Disk: 146632kB
         ->  Foreign Scan on ft1 y (actual time=0.605..1852.655 rows=100 loops=1)
 Planning time: 0.203 ms
 Execution time: 13226.414 ms
(12 rows)


> Obviously, this shouldn't block your current patch but it's worth revisiting.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



[HACKERS] Dereferenced pointers checked as NULL in btree_utils_var.c

2015-01-20 Thread Michael Paquier
Hi all,

Coverity is pointing out $subject, with the following stuff in gbt_var_same():
    GBT_VARKEY *t1 = (GBT_VARKEY *) DatumGetPointer(d1);
    GBT_VARKEY *t2 = (GBT_VARKEY *) DatumGetPointer(d2);
    GBT_VARKEY_R r1,
                 r2;

    r1 = gbt_var_key_readable(t1);    <= t1 dereferenced
    r2 = gbt_var_key_readable(t2);    <= t2 dereferenced

    if (t1 && t2)
        result = ((*tinfo->f_cmp) (r1.lower, r2.lower, collation) == 0 &&
                  (*tinfo->f_cmp) (r1.upper, r2.upper, collation) == 0);
    else
        result = (t1 == NULL && t2 == NULL);    <= Coverity complains here

    return result;

As Heikki pointed out to me on IM, the lack of crash reports in this area,
as well as the similar coding style in cube/, seems to be a sufficient
argument to simply remove those NULL checks instead of doing more
solid checks on them. Patch is attached.
Regards,
-- 
Michael
diff --git a/contrib/btree_gist/btree_utils_var.c b/contrib/btree_gist/btree_utils_var.c
index b7dd060..6ad3347 100644
--- a/contrib/btree_gist/btree_utils_var.c
+++ b/contrib/btree_gist/btree_utils_var.c
@@ -337,7 +337,6 @@ bool
 gbt_var_same(Datum d1, Datum d2, Oid collation,
 			 const gbtree_vinfo *tinfo)
 {
-	bool		result;
 	GBT_VARKEY *t1 = (GBT_VARKEY *) DatumGetPointer(d1);
 	GBT_VARKEY *t2 = (GBT_VARKEY *) DatumGetPointer(d2);
 	GBT_VARKEY_R r1,
@@ -346,13 +345,8 @@ gbt_var_same(Datum d1, Datum d2, Oid collation,
 	r1 = gbt_var_key_readable(t1);
 	r2 = gbt_var_key_readable(t2);
 
-	if (t1 && t2)
-		result = ((*tinfo->f_cmp) (r1.lower, r2.lower, collation) == 0 &&
-  (*tinfo->f_cmp) (r1.upper, r2.upper, collation) == 0);
-	else
-		result = (t1 == NULL && t2 == NULL);
-
-	return result;
+	return ((*tinfo->f_cmp) (r1.lower, r2.lower, collation) == 0 &&
+			(*tinfo->f_cmp) (r1.upper, r2.upper, collation) == 0);
 }
 
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers