Re: [HACKERS] [WIP] showing index maintenance on EXPLAIN
On Wed, May 7, 2014 at 10:52 PM, Amit Kapila wrote:
> On Thu, May 8, 2014 at 5:30 AM, Jaime Casanova wrote:
>> Hi,
>>
>> This patch implements $subject only when ANALYZE and VERBOSE are on.
>> I made it that way because for years nobody seemed interested in this
>> info (at least no one did it), so I decided that maybe it is too much
>> information for most people (actually btree indexes are normally very
>> fast).
>
> Why capture this only for index insert/update and not for read; is it
> because reads will always be fast, or because of implementation complexity?

EXPLAIN ANALYZE already shows that on any SELECT that uses an index in some way. Or are you thinking of something else?

> Why not similar timings for heap?

Well, "actual time" shows us the total time of the operation, so just deducting the time spent on triggers, indexes and planning seems like a way to get "heap modification time". Yes, maybe we still need some additional data; for example, I could want to know how much time we spent extending a relation.

> Why can't we print it when only ANALYZE is used with EXPLAIN? The
> execution time is printed with the ANALYZE option.

I'm not sure the info is useful for everyone; I'm not opposed to showing it all the time, though.

> Could you please tell in what kinds of scenarios you expect it to be
> useful? One I could think of is that if there are multiple indexes on
> a table and the user wants to find out whether any particular index is
> consuming more time.

Exactly my use case. Consider this plan (we spent 78% of the time updating the index uniq_idx_on_text):

                          QUERY PLAN
 ------------------------------------------------------------
 Insert on public.t1 (actual time=0.540..0.540 rows=0 loops=1)
   ->  Result (actual time=0.046..0.049 rows=1 loops=1)
         Output:
 Index uniq_idx_on_text on t1: time=0.421 rows=1
 Index t1_pkey on t1: time=0.027 rows=1
 Total runtime: 0.643 ms
(6 rows)

So I want to answer questions like: how much is an index hurting write performance? Once I know that, I can look for alternative solutions.
In that vein, it was interesting to see how fastupdate affects performance in a GIN index using gin_trgm_ops (5 times slower with fastupdate=off):

(fastupdate=on)  Index idx_ggin on t1: time=0.418 rows=1
(fastupdate=off) Index idx_ggin on t1: time=2.205 rows=1

This is no different from showing trigger time info, which we already do.

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566         Cell: +593 987171157

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
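For reference, toggling the behavior being measured above is a one-line storage-parameter change. This is a sketch: it assumes the pg_trgm extension and a text column t1.v, which are stand-ins, not taken from the thread:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- fastupdate is on by default for GIN indexes
CREATE INDEX idx_ggin ON t1 USING gin (v gin_trgm_ops);
-- Disable the pending-list buffering to compare per-insert index cost:
ALTER INDEX idx_ggin SET (fastupdate = off);
```

With the patch applied, running EXPLAIN (ANALYZE, VERBOSE) on an INSERT before and after the ALTER would show per-index timing lines like the ones quoted above.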
[HACKERS] Ignore files in src/interfaces/libpq generated by windows builds
Hi all,

While doing some builds with mingw and 9.4, I noticed that a couple of files generated by the build are not ignored in the source tree. Those two files are system.c and win32setlocale.c in src/interfaces/libpq. Please find attached a patch fixing that. Note that this is recent and is partially caused by commit a692ee5.

Regards,
--
Michael

Attachment: 20140508_mingw_gitignore.patch (binary data)
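Presumably the fix amounts to adding the two generated files to libpq's ignore list, something like the fragment below (an assumption; the attachment itself isn't shown in the archive):

```
# src/interfaces/libpq/.gitignore (sketch of the likely additions)
/system.c
/win32setlocale.c
```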
[HACKERS] Compilation errors with mingw build caused by undefined optreset
Hi all,

Since commit 60ff2fd, which introduced the centralized getopt-related declarations in a global header file, building on Windows with mingw fails because some declarations guarded by HAVE_INT_OPTRESET cause optreset to become undefined:

postmaster.c: In function 'PostmasterMain':
postmaster.c:853:2: error: 'optreset' undeclared (first use in this function)
postmaster.c:853:2: note: each undeclared identifier is reported only once for each function it appears in

This failure is new with 9.4, and attached is a patch fixing it...

Regards,
--
Michael

Attachment: 20140508_minwg_getopt_error.patch (binary data)
[HACKERS] popen and pclose redefinitions causing many warning in Windows build
Hi all,

Since commit a692ee5, code compilation on Windows is full of warnings caused by the redefinitions of popen and pclose:

In file included from ../../../src/include/c.h:1028:0,
                 from ../../../src/include/postgres.h:47,
                 from analyze.c:25:
../../../src/include/port.h:312:0: warning: "popen" redefined [enabled by default]
In file included from ../../../src/include/c.h:81:0,
                 from ../../../src/include/postgres.h:47,
                 from analyze.c:25:
c:\mingw64\bin\../lib/gcc/x86_64-w64-mingw32/4.7.0/../../../../x86_64-w64-mingw32/include/stdio.h:489:0: note: this is the location of the previous definition
In file included from ../../../src/include/c.h:1028:0,
                 from ../../../src/include/postgres.h:47,
                 from analyze.c:25:
../../../src/include/port.h:313:0: warning: "pclose" redefined [enabled by default]
In file included from ../../../src/include/c.h:81:0,
                 from ../../../src/include/postgres.h:47,
                 from analyze.c:25

The patch attached fixes that.

Regards,
--
Michael

Attachment: 20140508_win_build_warnings.patch (binary data)
Re: [HACKERS] PGDLLEXPORTing all GUCs?
On 05/08/2014 10:53 AM, Tom Lane wrote:
> Craig Ringer writes:
>> On 05/08/2014 12:21 AM, Tom Lane wrote:
>>> If Craig has a concrete argument why all GUCs should be accessible
>>> to external modules, then let's see it
>
>> As for just GUCs: I suggested GUCs because GUCs are what's been coming
>> up repeatedly as an actual practical issue.
>
> Meh. A quick look through the commit logs says that GUC variables are not
> more than 50% of what we've had to PGDLLIMPORT'ify in the past year or
> two. Maybe that's different from 2ndQuadrant's internal experience,
> but then you've not showed us the use-case driving your changes.

That's because the use case isn't that interesting, really, only the hiccups it's caused. I'd just as happily mark all extern vars PGDLLIMPORT, and only suggested focusing on GUCs because I didn't expect to get far with what I really thought was best.

Re your concerns about exposing GUCs that should by rights be internal to the wider world, it'd be interesting to mark functions and vars as something like PGINTERNAL, expanding to:

    __attribute__((visibility("hidden")))

on gcc, so they're just not available for linkage in extensions. That's a weaker form of using -fvisibility=hidden, where we explicitly say "this is private" rather than treating everything as private unless explicitly marked public, which has already been rejected.

Right now they're already exported for !windows, and while it's IMO a bug to have that difference for windows, it doesn't mean the correct answer is to export for all. If we're confident it won't break anything interesting, it'd be OK to instead say "unexport on !windows too".

In terms of ugliness, would you be happier using a pg_extern instead of extern, where we:

    #ifdef WIN32
    #define pg_extern extern PGDLLIMPORT
    #else
    #define pg_extern extern
    #endif

? That makes it easier to pretend that there's nothing windows-y going on -- and despite appearances, I'm also pretty keen not to have to think about that platform's linkage horrors when I don't have to. However, it also makes backpatching ickier.

>> I'd be quite happy to
>> PGDLLEXPORT all extern vars, but I was confident that'd be rejected for
>> aesthetic reasons, and thought that exporting all GUCs would be a
>> reasonable compromise.
>
> From the aesthetic standpoint, what I'd like is to not have to blanket
> our source code with Windows-isms. But I guess I can't have that.

I'd rather prefer that as well, but without the ability to go knocking at Redmond and introduce them to ELF and sane linkage, I don't think any of us are going to get it.

If it weren't for backbranches etc. the first thing I'd do to make it less ugly, personally, would be to rename PGDLLIMPORT to PGEXPORT and BUILDING_DLL to BUILDING_POSTGRES. The current names are unfortunate.

--
Craig Ringer         http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [WIP] showing index maintenance on EXPLAIN
On Thu, May 8, 2014 at 5:30 AM, Jaime Casanova wrote:
> Hi,
>
> This patch implements $subject only when ANALYZE and VERBOSE are on.
> I made it that way because for years nobody seemed interested in this
> info (at least no one did it), so I decided that maybe it is too much
> information for most people (actually btree indexes are normally very
> fast).

Why capture this only for index insert/update and not for read; is it because reads will always be fast, or because of implementation complexity?

Why not similar timings for heap?

Why can't we print it when only ANALYZE is used with EXPLAIN? The execution time is printed with the ANALYZE option.

Could you please tell in what kinds of scenarios you expect it to be useful? One I could think of is that if there are multiple indexes on a table and the user wants to find out whether any particular index is consuming more time.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [v9.5] Custom Plan API
>>> * ForeignScan node that is not associated with a particular foreign-table.
>>>   Once we try to apply ForeignScan node instead of Sort or Aggregate, existing
>>>   FDW implementation needs to be improved. These nodes scan on a materialized
>>>   relation (generated on the fly), however, existing FDW code assumes
>>>   ForeignScan node is always associated with a particular foreign-table.
>>>   We need to eliminate this restriction.
>>
>> I don't think we need to do that, given the above.
>>
> It makes a problem if ForeignScan is chosen as an alternative path of Join.
>
> The target-list of a Join node is determined according to the query form
> on the fly, so we cannot expect a particular TupleDesc to be known
> beforehand. Once we try to apply ForeignScan instead of a Join node, it
> has to have its TupleDesc depend on the set of joined relations.
>
> I think it is the more straightforward approach to allow a ForeignScan
> that is not associated with a particular (cataloged) relation.

> From your description, my understanding is that you would like to stream
> data from 2 standard tables to the GPU, then perform a join on the GPU itself.
>
> I have been told that is not likely to be useful because of the data transfer
> overheads.

Here are two solutions. One is what I'm currently working on: in the case when the numbers of rows in the left and right tables are not well balanced, we can keep a hash table in the GPU DRAM, then transfer the data stream chunk-by-chunk from the other side. Kernel execution and data transfer can run asynchronously, so this allows us to hide the data transfer cost as long as we have enough chunks, like processor pipelining. The other solution is an "integrated" GPU that eliminates the need for data transfer, like Intel's Haswell, AMD's Kaveri or NVIDIA's Tegra K1; all the major vendors are moving in the same direction.

> Or did I misunderstand, and is this instead intended to get around the
> current lack of join pushdown into FDWs?

The logic above is obviously executed on the extension side, so it needs the ForeignScan node to perform like a Join node: it reads two input relation streams and outputs one joined relation stream. It is quite similar to the expected FDW join-pushdown design. It will consume two (remote) relations and generate one output stream, looking like a scan on a particular relation (but with no catalog definition here). It would probably be visible to the local backend as follows (this is a result of the previous prototype based on the custom-plan API):

postgres=# EXPLAIN VERBOSE
             SELECT count(*) FROM pgbench1_branches b
               JOIN pgbench1_accounts a ON a.bid = b.bid WHERE aid < 100;
                                QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=101.60..101.61 rows=1 width=0)
   Output: count(*)
   ->  Custom Scan (postgres-fdw)  (cost=100.00..101.43 rows=71 width=0)
         Remote SQL: SELECT NULL FROM (public.pgbench_branches r1 JOIN
                     public.pgbench_accounts r2 ON ((r1.bid = r2.bid)))
                     WHERE ((r2.aid < 100))
(4 rows)

The place of the "Custom Scan" node would be taken by a ForeignScan, if join pushdown were supported. At that time, what relation should be scanned by this ForeignScan? That is the reason why I proposed a ForeignScan node without a particular relation.

> Can you be specific about the actual architecture you wish for, so we can
> understand how to generalise that into an API?

If we push the role of the CustomPlan node into ForeignScan, I want to use this node to acquire control during query planning/execution. As I did in the custom-plan patch, first of all, I want extensions to have a chance to add alternative paths for a particular scan/join. If an extension can take over the execution, it will generate a ForeignPath (or CustomPath) node and then call add_path(). In the usual manner, the planner decides whether the alternative path is cheaper than the other candidates.

In the case when it replaces a scan of a relation with a ForeignScan, it is almost the same as what the existing API does, except that the underlying relation is a regular one, not a foreign table. In the case when it replaces joined relations with a ForeignScan, it will be almost the same as the expected ForeignScan with join pushdown. Unlike a usual table scan, it does not have an actual relation definition in the catalog, and its result tuple-slot is determined on the fly.

One thing different from a remote join is that this ForeignScan node may have sub-plans locally, if the FDW driver (e.g. GPU execution) has capability for the Join only, but not the relation-scan portion. So, unlike its naming, I want ForeignScan to be able to have sub-plans if the FDW driver supports the capability.

Does that make it clear? Or does it make you more confused??

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei
Re: [HACKERS] Unhappy with error handling in psql's handleCopyOut()
On Tue, Feb 11, 2014 at 03:43:08PM -0500, Tom Lane wrote:
> While looking at the pending patch to make psql report a line count
> after COPY, I came across this business in handleCopyOut():
>
>  * Check command status and return to normal libpq state. After a
>  * client-side error, the server will remain ready to deliver data. The
>  * cleanest thing is to fully drain and discard that data. If the
>  * client-side error happened early in a large file, this takes a long
>  * time. Instead, take advantage of the fact that PQexec() will silently
>  * end any ongoing PGRES_COPY_OUT state. This does cause us to lose the
>  * results of any commands following the COPY in a single command string.
>  * It also only works for protocol version 3. XXX should we clean up
>  * using the slow way when the connection is using protocol version 2?
>
> which git blames on commit 08146775 (committed by Alvaro on behalf of
> Noah).
>
> This does not make me happy. In the first place, we have not dropped
> support for protocol version 2.

As you may realize, that commit helped protocol version 3 most but also strictly improved things for protocol version 2. I didn't feel the need to improve them both to the same extent, a judgment that still seems reasonable.

> In the second place, I fail to see
> what the advantage is of kluging things like this. The main costs of
> draining the remaining COPY data are the server-side work of generating
> the data and the network transmission costs, neither of which will go
> away with this technique.

Agreed.

From your commit b8f00a4 of 2014-02-13:

> +  * If for some reason libpq is still reporting PGRES_COPY_OUT state, we
> +  * would like to forcibly exit that state, since our caller would be
> +  * unable to distinguish that situation from reaching the next COPY in a
> +  * command string that happened to contain two consecutive COPY TO STDOUT
> +  * commands. However, libpq provides no API for doing that, and in
> +  * principle it's a libpq bug anyway if PQgetCopyData() returns -1 or -2
> +  * but hasn't exited COPY_OUT state internally. So we ignore the
> +  * possibility here.
> +  */

PQgetCopyData() returns -2 without exiting COPY_OUT when its malloc() call fails. My submission of the patch that became commit 08146775 included a test procedure for getting an infinite loop in handleCopyOut(); that commit fixed the infinite loop. After commit b8f00a4, the same test procedure once again initiates an infinite loop. To a large extent, that's bad luck on the part of commit b8f00a4. I bet malloc() failure in pqCheckInBufferSpace() could initiate an infinite loop even with the longstanding 9.2/9.3 code.

> So I'm thinking we should revert this kluge
> and just drain the data straightforwardly, which would also eliminate
> the mentioned edge-case misbehavior when there were more commands in
> the query string.
>
> Is there a reason I'm overlooking not to do this?

The moral of the infinite-loop test case story is that draining the data straightforwardly has strictly more ways to fail. Consequently, commit b8f00a4 improved some scenarios while regressing others. Overall, I do think it was an improvement.

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com
Re: [HACKERS] [v9.5] Custom Plan API
2014-05-07 18:06 GMT+09:00 Kouhei Kaigai:
> Let me list up the things to be clarified / developed, in random order.
>
> * Join replacement by FDW; We still don't have consensus about join replacement
>   by FDW. Probably, it will be designed for remote-join implementation primarily,
>   however, the things to do are similar. We may need to revisit Hanada-san's
>   proposition from the past.

I can't recall the details right now, but the reason I gave up was about introducing a ForeignJoinPath node, IIRC. I'll revisit the discussion and my proposal.

--
Shigeru HANADA
Re: [HACKERS] PGDLLEXPORTing all GUCs?
Craig Ringer writes:
> On 05/08/2014 12:21 AM, Tom Lane wrote:
>> If Craig has a concrete argument why all GUCs should be accessible
>> to external modules, then let's see it
>
> As for just GUCs: I suggested GUCs because GUCs are what's been coming
> up repeatedly as an actual practical issue.

Meh. A quick look through the commit logs says that GUC variables are not more than 50% of what we've had to PGDLLIMPORT'ify in the past year or two. Maybe that's different from 2ndQuadrant's internal experience, but then you've not showed us the use-case driving your changes.

> I'd be quite happy to
> PGDLLEXPORT all extern vars, but I was confident that'd be rejected for
> aesthetic reasons, and thought that exporting all GUCs would be a
> reasonable compromise.

From the aesthetic standpoint, what I'd like is to not have to blanket our source code with Windows-isms. But I guess I can't have that.

regards, tom lane
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
Andrew Dunstan writes:
> On 05/07/2014 04:13 PM, Tom Lane wrote:
>> A README file would be better,
>> perhaps, but there's not a specific directory associated with the jsonb
>> code; so I think this sort of info belongs either in jsonb.h or in the
>> file header comment for jsonb_gin.c.
>
> Is there any reason we couldn't have a README.jsonb?

We could, but the only place I can see to put it would be in backend/utils/adt/, which seems like a poor precedent; I don't want to end up with forty-two README.foo files in there. The header comments for the jsonbxxx.c files are probably better candidates for collecting this sort of info.

(The larger problem here is that utils/adt/ has become a catchbasin for all kinds of stuff that can barely squeeze under the rubric of "abstract data type". But fixing that is something I don't care to tackle now.)

regards, tom lane
Re: [HACKERS] Minor improvement to fdwhandler.sgml
(2014/05/05 23:05), Robert Haas wrote:
> On Wed, Apr 30, 2014 at 4:10 AM, Etsuro Fujita wrote:
>> (2014/04/28 23:31), Robert Haas wrote:
>>> On Thu, Apr 24, 2014 at 7:59 AM, Etsuro Fujita wrote:
>>>> The patch attached improves the docs in fdwhandler.sgml a little bit.
>>>
>>> When you submit a patch, it's helpful to describe what the patch
>>> actually does, rather than just saying it makes things better. For
>>> example, I think that this patch could be described as "in
>>> fdwhandler.sgml, mark references to scan_clauses with tags".
>>
>> I thought so. Sorry, my explanation wasn't enough.
>>
>>> A problem with that idea is that scan_clauses is not a field in any
>>> struct.
>>
>> I was mistaken. I think those should be marked with tags.
>> Patch attached.
>
> OK, committed.

Thanks!

Best regards,
Etsuro Fujita
Re: [HACKERS] [v9.5] Custom Plan API
Simon,

* Simon Riggs (si...@2ndquadrant.com) wrote:
> I'm proposing something that is like an index, not like a plan node.
>
> The reason that proposal is being made is that we need to consider
> data structure, data location and processing details.
>
> * In the case of Mat Views, if there is no Mat View, then we can't use
> it - we can't replace that with just any mat view instead

I agree with you about matviews. There are clear trade-offs there, similar to those with indexes.

> * GPUs and other special processing units have finite data transfer
> rates, so other people have proposed that they retain data on the
> GPU/SPU - so we want to do a lookaside only for situations where the
> data is already prepared to handle a lookaside.

I've heard this and I'm utterly unconvinced that it could be made to work at all -- and it's certainly moving the bar of usefulness quite far away, making the whole thing much less practical. If we can't cost for this transfer rate and make use of GPUs for medium-to-large-size queries which are only transient, then perhaps shoving all GPU work out across an FDW is actually the right solution, and making that like some kind of matview as you're proposing -- but I don't see how you're going to manage updates and invalidation of that data in a sane way for a multi-user PG system.

> * The other cases I cited of in-memory data structures are all
> pre-arranged items with structures suited to processing particular
> types of query

If it's transient in-memory work, I'd like to see our generalized optimizer consider them all, instead of pushing the job on to the user to decide when the optimizer should consider certain methods.

> Given that I count 4-5 beneficial use cases for this index-like
> lookaside, it seems worth investing time in.

I'm all for making use of matviews and GPUs, but there's more than one way to get there, and look-asides feel like pushing the decision, unnecessarily, on to the user.

Thanks,

Stephen
Re: [HACKERS] [v9.5] Custom Plan API
Simon,

* Simon Riggs (si...@2ndquadrant.com) wrote:
> On 8 May 2014 01:49, Kouhei Kaigai wrote:
> From your description, my understanding is that you would like to
> stream data from 2 standard tables to the GPU, then perform a join on
> the GPU itself.
>
> I have been told that is not likely to be useful because of the data
> transfer overheads.

That was my original understanding and, I believe, the case at one point, however...

> Or did I misunderstand, and that this is intended to get around the
> current lack of join pushdown into FDWs?

I believe the issue with the transfer speeds to the GPU has been either eliminated or at least reduced to the point where it's practical now. This is all based on prior discussions with KaiGai -- I've not done any testing myself. In any case, this is exactly what they're looking to do, as I understand it, and to do the same with aggregates that work well on GPUs.

> Can you be specific about the actual architecture you wish for, so we
> can understand how to generalise that into an API?

It's something that *could* be done with FDWs, once they have the ability to do join push-down and aggregate push-down, but I (and, as I understand it, Tom) feel that isn't really the right answer for this, because the actual *data* is completely under PG in this scenario. It's just in-memory processing that's being done on the GPU and in the GPU's memory.

KaiGai has speculated about other possibilities (eg: having the GPU's memory also used as some kind of multi-query cache, which would reduce the transfer costs, but at a level of complexity regarding that cache that I'm not sure it'd be sensible to try to do; in any case, it could be done later and might make sense independently, if we could make it work for, say, a memcached environment too. I'm thinking it would be transaction-specific, but even that would be pretty tricky unless we held locks across every row...).

Thanks,

Stephen
Re: [HACKERS] [v9.5] Custom Plan API
On 7 May 2014 18:39, Stephen Frost wrote:
> * Simon Riggs (si...@2ndquadrant.com) wrote:
>> On 7 May 2014 17:43, Stephen Frost wrote:
>>> It's the optimizer's job to figure out which path to pick though, based
>>> on which will have the lowest cost.
>>
>> Of course. I'm not suggesting otherwise.
>>
>>>> If you do want that, you can write an Event Trigger that automatically
>>>> adds a lookaside for any table.
>>>
>>> This sounds terribly ugly and like we're pushing optimization decisions
>>> on to the user instead of just figuring out what the best answer is.
>>
>> I'm proposing that we use a declarative approach, just like we do when
>> we say CREATE INDEX.
>
> There's quite a few trade-offs when it comes to indexes though. I'm
> trying to figure out when you wouldn't want to use a GPU, if it's
> available to you and the cost model says it's faster? To me, that's
> kind of like saying you want a declarative approach for when to use a
> HashJoin.

I'm proposing something that is like an index, not like a plan node. The reason that proposal is being made is that we need to consider data structure, data location and processing details.

* In the case of mat views, if there is no mat view, then we can't use it -- we can't replace that with just any mat view instead.

* GPUs and other special processing units have finite data transfer rates, so other people have proposed that they retain data on the GPU/SPU -- so we want to do a lookaside only for situations where the data is already prepared to handle a lookaside.

* The other cases I cited of in-memory data structures are all pre-arranged items with structures suited to processing particular types of query.

Given that I count 4-5 beneficial use cases for this index-like lookaside, it seems worth investing time in. It appears that Kaigai wishes for something else in addition to this concept, so there may be some confusion from that. I'm sure it will take a while to really understand all the ideas and possibilities.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [v9.5] Custom Plan API
On 8 May 2014 01:49, Kouhei Kaigai wrote:
>>> * ForeignScan node that is not associated with a particular foreign-table.
>>>   Once we try to apply ForeignScan node instead of Sort or Aggregate, existing
>>>   FDW implementation needs to be improved. These nodes scan on a materialized
>>>   relation (generated on the fly), however, existing FDW code assumes
>>>   ForeignScan node is always associated with a particular foreign-table.
>>>   We need to eliminate this restriction.
>>
>> I don't think we need to do that, given the above.
>>
> It makes a problem if ForeignScan is chosen as an alternative path of Join.
>
> The target-list of a Join node is determined according to the query form
> on the fly, so we cannot expect a particular TupleDesc to be known
> beforehand. Once we try to apply ForeignScan instead of a Join node, it
> has to have its TupleDesc depend on the set of joined relations.
>
> I think it is the more straightforward approach to allow a ForeignScan
> that is not associated with a particular (cataloged) relation.

From your description, my understanding is that you would like to stream data from 2 standard tables to the GPU, then perform a join on the GPU itself.

I have been told that is not likely to be useful because of the data transfer overheads.

Or did I misunderstand, and is this instead intended to get around the current lack of join pushdown into FDWs?

Can you be specific about the actual architecture you wish for, so we can understand how to generalise that into an API?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] 9.4 release notes
> I have completed the initial version of the 9.4 release notes. You can
> view them here:
>
>     http://www.postgresql.org/docs/devel/static/release-9-4.html
>
> I will be adding additional markup in the next few days.
>
> Feedback expected and welcomed. I expect to be modifying this until we
> release 9.4 final. I have marked items where I need help with question
> marks.

E.1.3.7.1. System Information Functions

    Add functions for error-free pg_class, pg_proc, pg_type, and
    pg_operator lookups (Yugo Nagata, Nozomi Anzai, Robert Haas)

    For example, to_regclass() does error-free lookups of pg_class, and
    returns NULL for lookup failures.

Probably "error-free" is too strong wording, because these functions are not actually error-free:

test=# select to_regclass('a.b.c.d');
ERROR:  improper relation name (too many dotted names): a.b.c.d
STATEMENT:  select to_regclass('a.b.c.d');

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
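The contrast being pointed out can be sketched in two lines (the first behavior is what the release-note item describes; the second is the error case shown above):

```sql
SELECT to_regclass('no_such_table');  -- NULL, no error: the lookup simply failed
SELECT to_regclass('a.b.c.d');        -- still raises "improper relation name"
```

So the functions avoid errors for *failed lookups* of syntactically valid names, but malformed input still errors out, which is why "error-free" overstates it.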
Re: [HACKERS] [v9.5] Custom Plan API
> > Let me list up the things to be clarified / developed randomly. > > > > * Join replacement by FDW; We still don't have consensus about join > replacement > > by FDW. Probably, it will be designed to remote-join implementation > primarily, > > however, things to do is similar. We may need to revisit the Hanada-san's > > proposition in the past. > > Agreed. We need to push down joins into FDWs and we need to push down > aggregates also, so they can be passed to FDWs. I'm planning to look at > aggregate push down. > Probably, it's a helpful feature. > > * Lookaside for ANY relations; I want planner to try GPU-scan for any > relations > > once installed, to reduce user's administration cost. > > It needs lookaside allow to specify a particular foreign-server, not > foreign- > > table, then create ForeignScan node that is not associated with a > particular > > foreign-table. > > IMHO we would not want to add indexes to every column, on every table, nor > would we wish to use lookaside for all tables. It is a good thing to be > able to add optimizations for individual tables. GPUs are not good for > everything; it is good to be able to leverage their strengths, yet avoid > their weaknesses. > > If do you want that, you can write an Event Trigger that automatically adds > a lookaside for any table. > It may be a solution if we try to replace scan on a relation by a ForeignScan, in other words, a case when we can describe 1:1 relationship between a table and a foreign-table; being alternatively scanned. Is it possible to fit a case when a ForeignScan replaces a built-in Join plans? I don't think it is a realistic assumption to set up lookaside configuration for all the possible combination of joins, preliminary. I have an idea; if lookaside accept a function, foreign-server or something subjective entity as an alternative path, it will be able to create paths on the fly, not only preconfigured foreign-tables. 
This idea would take two forms of DDL command, roughly: CREATE LOOKASIDE ON <target> TO <alternative>; CREATE LOOKASIDE ON <target> EXECUTE <function>; What is done internally is the same. The TO-form kicks a built-in routine, instead of a user-defined function, to add alternative scan/join paths according to the supplied table/matview/foreign table and so on. > > * ForeignScan node that is not associated with a particular foreign-table. > > Once we try to apply ForeignScan node instead of Sort or Aggregate, existing > > FDW implementation needs to be improved. These nodes scan on a materialized > > relation (generated on the fly), however, existing FDW code assumes > > ForeignScan node is always associated with a particular foreign-table. > > We need to eliminate this restriction. > > I don't think we need to do that, given the above. > That becomes a problem if a ForeignScan is chosen as the alternative path for a Join. The target list of a Join node is determined on the fly according to the query, so we cannot expect a particular TupleDesc in advance. If we apply a ForeignScan in place of a Join node, its TupleDesc has to depend on the set of joined relations. I think the more straightforward approach is to allow a ForeignScan that is not associated with a particular (cataloged) relation. > > * FDW method for MultiExec. In case when we can stack multiple ForeignScan > > nodes, it's helpful to support to exchange scanned tuples in their own > > data format. Let's assume two ForeignScan nodes are stacked. One performs > > like Sort, another performs like Scan. If they internally handle column- > > oriented data format, TupleTableSlot is not a best way for data exchange. > > I agree TupleTableSlot may not be best way for bulk data movement. We > probably need to look at buffering/bulk movement between executor nodes > in general, which would be of benefit for the FDW case also. > This would be a problem even for Custom Scans as originally presented also, > so I don't see much change there. 
> Yes. It is the reason why my Custom Scan proposition supports the MultiExec method. > > * Lookaside on the INSERT/UPDATE/DELETE. Probably, it can be implemented > > using writable FDW feature. Not a big issue, but don't forget it... > > Yes, possible. > > > I hope these ideas make sense. This is early days and there may be other > ideas and much detail yet to come. > I agree with the general direction. My biggest concern with FDW is transparency for the application. If lookaside allows a reference to a scan/join on regular relations to be redirected to a ForeignScan (as an alternative execution method), there is no strong reason to stick with custom plans. However, the existing ForeignScan node cannot work without a particular foreign table. That may become a restriction if we try to replace a Join node with a ForeignScan, and that is my worry. (It may well be solved in the course of the join-replacement-by-FDW work.) One other point I noticed. * SubPlan support; if an extension support its specia
Re: [HACKERS] bgworker crashed or not?
On 05/08/2014 05:45 AM, Robert Haas wrote: > I've pushed your rebased patch now with some further kibitzing, > especially in regards to the documentation. Thanks for tackling that, Robert. It's great to have that API sorted out before 9.4 hits and it's locked in. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PGDLLEXPORTing all GUCs?
On 05/08/2014 12:21 AM, Tom Lane wrote: > If Craig has a concrete argument why all GUCs should be accessible > to external modules, then let's see it Because they already are. The only difference here is that that access works only on !windows. I agree (strongly) that we should have a better-defined API in terms of what is "accessible to external modules" and what is not. However, we don't, as you stressed in a prior discussion when I raised the idea of using -fvisibility=hidden to limit access to some symbols. Given that we don't have any kind of external vs internal API division, why pretend we do just for one platform? As for just GUCs: I suggested GUCs because GUCs are what's been coming up repeatedly as an actual practical issue. I'd be quite happy to PGDLLEXPORT all extern vars, but I was confident that'd be rejected for aesthetic reasons, and thought that exporting all GUCs would be a reasonable compromise. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 12:06 PM, Josh Berkus wrote: > For that matter, our advice on shared_buffers ... and our design for it > ... is going to need to change radically soon, since Linux is getting an > ARC with a frequency cache as well as a recency cache, and FreeBSD and > OpenSolaris already have them. I knew about ZFS, but Linux is implementing ARC? There are good reasons to avoid ARC. CAR seems like a more plausible candidate, since it apparently acknowledges ARC's shortcomings and fixes them. -- Peter Geoghegan
Re: [HACKERS] [WIP] showing index maintenance on EXPLAIN
On Wed, May 7, 2014 at 7:03 PM, Josh Berkus wrote: > On 05/07/2014 05:00 PM, Jaime Casanova wrote: >> Hi, >> >> This patch implements $subject only when ANALYZE and VERBOSE are on. >> I made it that way because for years nobody seemed interested in this >> info (at least no one did it) so i decided that maybe is to much >> information for most people (actually btree indexes are normally very >> fast). > > What's "index maintenance"? > Maybe it's not the right term... I mean: the time it takes to update indexes on an INSERT/UPDATE operation -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157
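As a sketch of what that means in practice (hypothetical table and timings; the per-index output lines follow the format shown elsewhere in this thread for the patched EXPLAIN):

```sql
CREATE TABLE t1 (id serial PRIMARY KEY, t text UNIQUE);

EXPLAIN (ANALYZE, VERBOSE) INSERT INTO t1 (t) VALUES ('hello');
--  Insert on public.t1 (actual time=... rows=0 loops=1)
--    ->  Result (actual time=... rows=1 loops=1)
--  Index t1_t_key on t1: time=... rows=1
--  Index t1_pkey on t1: time=... rows=1
--  Total runtime: ... ms
```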
Re: [HACKERS] [WIP] showing index maintenance on EXPLAIN
On 05/07/2014 05:00 PM, Jaime Casanova wrote: > Hi, > > This patch implements $subject only when ANALYZE and VERBOSE are on. > I made it that way because for years nobody seemed interested in this > info (at least no one did it) so i decided that maybe is to much > information for most people (actually btree indexes are normally very > fast). What's "index maintenance"? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
[HACKERS] [WIP] showing index maintenance on EXPLAIN
Hi, This patch implements $subject only when ANALYZE and VERBOSE are on. I made it that way because for years nobody seemed interested in this info (at least no one did it), so I decided that maybe it's too much information for most people (actually btree indexes are normally very fast). But now that we have GiST and GIN this has become an interesting piece of data (there are other cases even when using btree). The current patch doesn't have docs yet; I will add them soon. -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157 From be4d94a7ca49b176d9f67476f2edde9e1f3d2a21 Mon Sep 17 00:00:00 2001 From: Jaime Casanova Date: Wed, 7 May 2014 18:28:46 -0500 Subject: [PATCH] Make EXPLAIN show measurements of updating indexes. This adds a line in the output of EXPLAIN ANALYZE VERBOSE showing the time it took to update indexes in INSERT/UPDATE operations. Also, add that info to the auto_explain module using a new variable auto_explain.log_indexes. 
--- contrib/auto_explain/auto_explain.c| 14 ++ src/backend/catalog/indexing.c |2 +- src/backend/commands/copy.c|6 +-- src/backend/commands/explain.c | 75 src/backend/executor/execUtils.c | 19 +++- src/backend/executor/nodeModifyTable.c |6 +-- src/include/commands/explain.h |1 + src/include/executor/executor.h|4 +- src/include/nodes/execnodes.h |2 + 9 files changed, 118 insertions(+), 11 deletions(-) diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c index c8ca7c4..f2c11e5 100644 --- a/contrib/auto_explain/auto_explain.c +++ b/contrib/auto_explain/auto_explain.c @@ -25,6 +25,7 @@ static int auto_explain_log_min_duration = -1; /* msec or -1 */ static bool auto_explain_log_analyze = false; static bool auto_explain_log_verbose = false; static bool auto_explain_log_buffers = false; +static bool auto_explain_log_indexes = false; static bool auto_explain_log_triggers = false; static bool auto_explain_log_timing = false; static int auto_explain_log_format = EXPLAIN_FORMAT_TEXT; @@ -114,6 +115,17 @@ _PG_init(void) NULL, NULL); + DefineCustomBoolVariable("auto_explain.log_indexes", + "Include index statistics in plans.", + "This has no effect unless log_analyze is also set.", + &auto_explain_log_indexes, + false, + PGC_SUSET, + 0, + NULL, + NULL, + NULL); + DefineCustomBoolVariable("auto_explain.log_triggers", "Include trigger statistics in plans.", "This has no effect unless log_analyze is also set.", @@ -307,6 +319,8 @@ explain_ExecutorEnd(QueryDesc *queryDesc) ExplainBeginOutput(&es); ExplainQueryText(&es, queryDesc); ExplainPrintPlan(&es, queryDesc); + if (es.analyze && auto_explain_log_indexes) +ExplainPrintIndexes(&es, queryDesc); if (es.analyze && auto_explain_log_triggers) ExplainPrintTriggers(&es, queryDesc); ExplainEndOutput(&es); diff --git a/src/backend/catalog/indexing.c b/src/backend/catalog/indexing.c index 4bf412f..b2d3a1e 100644 --- a/src/backend/catalog/indexing.c +++ b/src/backend/catalog/indexing.c @@ -46,7 +46,7 @@ 
CatalogOpenIndexes(Relation heapRel) resultRelInfo->ri_RelationDesc = heapRel; resultRelInfo->ri_TrigDesc = NULL; /* we don't fire triggers */ - ExecOpenIndices(resultRelInfo); + ExecOpenIndices(resultRelInfo, 0); return resultRelInfo; } diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 70ee7e5..fbb1f13 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -2179,7 +2179,7 @@ CopyFrom(CopyState cstate) 1, /* dummy rangetable index */ 0); - ExecOpenIndices(resultRelInfo); + ExecOpenIndices(resultRelInfo, 0); estate->es_result_relations = resultRelInfo; estate->es_num_result_relations = 1; @@ -2333,7 +2333,7 @@ CopyFrom(CopyState cstate) if (resultRelInfo->ri_NumIndices > 0) recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self), - estate); + estate, NULL); /* AFTER ROW INSERT Triggers */ ExecARInsertTriggers(estate, resultRelInfo, tuple, @@ -2440,7 +2440,7 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid, ExecStoreTuple(bufferedTuples[i], myslot, InvalidBuffer, false); recheckIndexes = ExecInsertIndexTuples(myslot, &(bufferedTuples[i]->t_self), - estate); + estate, NULL); ExecARInsertTriggers(estate, resultRelInfo, bufferedTuples[i], recheckIndexes); diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 08f3167..8e2b10d 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -502,6 +502,10 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
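To enable the new setting the patch adds, the session setup would look something like this (a sketch; auto_explain.log_indexes is the new GUC from the patch above, the other parameters already exist in auto_explain):

```sql
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;  -- log plans for all statements
SET auto_explain.log_analyze = on;      -- required, or log_indexes has no effect
SET auto_explain.log_indexes = on;      -- new: include per-index update timings
```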
Re: [HACKERS] bgworker crashed or not?
On 07/05/14 23:45, Robert Haas wrote: It was, but I felt that the different pieces of this should be separated into separate commits, and (regardless of how I committed it) I needed to review each change separately. I wasn't saying it was your fault that it wasn't done; just that I hadn't gotten it committed yet. Oh ok, I got confused and thought you might have overlooked that part while modifying related parts of code. I've pushed your rebased patch now with some further kibitzing, especially in regards to the documentation. Cool, thanks :) -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pg_shmem_allocations view
On 2014-05-07 17:48:15 -0400, Robert Haas wrote: > On Tue, May 6, 2014 at 6:09 PM, Andres Freund wrote: > >> I guess I'd vote for > >> ditching the allocated column completely and outputting the memory > >> allocated without ShmemIndex using some fixed tag (like "ShmemIndex" > >> or "Bootstrap" or "Overhead" or something). > > > > My way feels slightly cleaner, but I'd be ok with that as well. There's > > no possible conflicts with an actual segment... In your variant the > > unallocated/slop memory would continue to have a NULL key? > > Yeah, that seems all right. Hm. Not sure what you're ACKing here ;). > One way to avoid conflict with an actual segment would be to add an > after-the-fact entry into ShmemIndex representing the amount of memory > that was used to bootstrap it. There's lots of allocations from shmem that cannot be associated with any index entry though. Not just ShmemIndex's own entry. Most prominently, most of the memory used for SharedBufHash isn't actually associated with the "Shared Buffer Lookup Table" entry - imo a dynahash.c deficiency. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 2014-05-07 16:24:53 -0500, Merlin Moncure wrote: > On Wed, May 7, 2014 at 4:15 PM, Andres Freund wrote: > > On 2014-05-07 13:51:57 -0700, Jeff Janes wrote: > >> On Wed, May 7, 2014 at 11:38 AM, Andres Freund > >> wrote: > >> > >> > On 2014-05-07 13:32:41 -0500, Merlin Moncure wrote: > >> > > > >> > > *) raising shared buffers does not 'give more memory to postgres for > >> > > caching' -- it can only reduce it via double paging > >> > > >> > That's absolutely not a necessary consequence. If pages are in s_b for a > >> > while the OS will be perfectly happy to throw them away. > >> > > >> > >> Is that an empirical observation? > > > > Yes. > > > >> I've run some simulations a couple years > >> ago, and also wrote some instrumentation to test that theory under > >> favorably engineered (but still plausible) conditions, and couldn't get > >> more than a small fraction of s_b to be so tightly bound in that the kernel > >> could forget about them. Unless of course the entire workload or close to > >> it fits in s_b. > > > > I think it depends on your IO access patterns. If the whole working set > > fits into the kernel's page cache and there's no other demand for pages > > it will stay in. If you constantly rewrite most all your pages they'll > > also stay in the OS cache because they'll get written out. If the churn > > in shared_buffers is so high (because it's so small in comparison to the > > core hot data set) that there'll be dozens if not hundreds clock sweeps > > a second you'll also have no locality. > > It's also *hugely* kernel version specific :( > > right. This is, IMNSHO, exactly the sort of language that belongs in the > docs. Well, that's just the tip of the iceberg though. Whether you can accept small shared_buffers to counteract double buffering or not is also a hard to answer question... That again heavily depends on the usage patterns. 
If you have high concurrency and your working set has some locality it's very important to have a high s_b lest you fall afoul of the freelist lock. If you have high concurrency but 90%+ of your page lookups *aren't* going to be in the cache, you need to be very careful with a large s_b, because the clock sweeps to lower the usage counts can enlarge the lock contention. Then there are both memory and cache efficiency questions around the PrivateRefCount array and the lwlocks. In short: I think it's pretty hard to transfer this into language that's a) agreed upon b) understandable to someone that hasn't discovered several of the facts for him/herself. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pg_shmem_allocations view
On Tue, May 6, 2014 at 6:09 PM, Andres Freund wrote: >> I guess I'd vote for >> ditching the allocated column completely and outputting the memory >> allocated without ShmemIndex using some fixed tag (like "ShmemIndex" >> or "Bootstrap" or "Overhead" or something). > > My way feels slightly cleaner, but I'd be ok with that as well. There's > no possible conflicts with an actual segment... In your variant the > unallocated/slop memory would continue to have a NULL key? Yeah, that seems all right. One way to avoid conflict with an actual segment would be to add an after-the-fact entry into ShmemIndex representing the amount of memory that was used to bootstrap it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
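Under that scheme a query against the proposed view might look like this (a sketch; the view is still WIP, and "Bootstrap" is just one of the fixed tags suggested upthread for the bootstrap memory):

```sql
SELECT key, size
FROM pg_shmem_allocations
ORDER BY size DESC;
-- e.g. rows for "Buffer Blocks", "Shared Buffer Lookup Table", "Bootstrap", ...
```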
Re: [HACKERS] bgworker crashed or not?
On Wed, May 7, 2014 at 4:55 PM, Petr Jelinek wrote: >> This isn't done yet. > > Unless I am missing something this change was included in every patch I sent > - setting rw->rw_terminate = true; in CleanupBackgroundWorker for zero exit > code + comment changes. Or do you have objections to this approach? > > Anyway missing parts attached. It was, but I felt that the different pieces of this should be separated into separate commits, and (regardless of how I committed it) I needed to review each change separately. I wasn't saying it was your fault that it wasn't done; just that I hadn't gotten it committed yet. I've pushed your rebased patch now with some further kibitzing, especially in regards to the documentation. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 2:24 PM, Merlin Moncure wrote: > right. This is, IMNSHO, exactly the sort of language that belongs in the > docs. +1 -- Peter Geoghegan
Re: [HACKERS] jsonb existence queries are misimplemented by jsonb_ops
On Wed, May 7, 2014 at 1:39 PM, Tom Lane wrote: > The indexscan is incorrectly returning rows where the queried key exists > but isn't at top-level. > > We could fix this either by giving up on no-recheck for existence queries, > or by changing the way that non-top-level keys get indexed. However > I suspect the latter would break containment queries, or at least make > their lives a lot more difficult. Thanks for looking into this. I don't know that it would make life all that much harder for containment. jsonb_ops is already not compelling for testing containment where there is a lot of nesting. ISTM that make_scalar_key() could be taught to respect that distinction. ISTM that containment will indifferently treat the top-level keys as top level, and the nested keys as nested, because if you're looking at something nested you have to put a top-level key in the rhs value to do so, and it will be marked as top level, or not top level in a bijective fashion. -- Peter Geoghegan
Re: [HACKERS] jsonb existence queries are misimplemented by jsonb_ops
Peter Geoghegan writes: > On Wed, May 7, 2014 at 1:47 PM, Tom Lane wrote: >> No, wait, containment *doesn't* look into sub-objects: >> >> regression=# select * from j where f1 @> '{"foo": {"bar": "baz"}}'; >> f1 >> - >> {"foo": {"bar": "baz"}} >> (1 row) >> >> regression=# select * from j where f1 @> '{"bar": "baz"}'; >> f1 >> >> (0 rows) > Yes it does. It's just not intended to work like that. You have to > "give a path" to what you're looking for. Right, so the top-level item in the query has to be at top level in the searched-for row. This means that we *could* distinguish top-level from non-top-level keys in the index items, and still be able to generate correct index probes for a containment search (since we could similarly mark the generated keys as to whether they came from top level in the query object). So the question is whether that's a good idea or not. It seems like it is a good idea for the current existence operator, and about a wash for the current containment operator, but perhaps a bad idea for other definitions of containment. In any case, something here needs to change. Thoughts? regards, tom lane
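To restate the bug in query form (a sketch; the operator semantics follow the documented definition of ?, which tests only top-level keys):

```sql
SELECT '{"foo": {"bar": "baz"}}'::jsonb ? 'foo';  -- true: top-level key
SELECT '{"foo": {"bar": "baz"}}'::jsonb ? 'bar';  -- false per the operator,
-- yet the current jsonb_ops indexscan wrongly returns such rows without a recheck
```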
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 4:15 PM, Andres Freund wrote: > On 2014-05-07 13:51:57 -0700, Jeff Janes wrote: >> On Wed, May 7, 2014 at 11:38 AM, Andres Freund wrote: >> >> > On 2014-05-07 13:32:41 -0500, Merlin Moncure wrote: >> > > >> > > *) raising shared buffers does not 'give more memory to postgres for >> > > caching' -- it can only reduce it via double paging >> > >> > That's absolutely not a necessary consequence. If pages are in s_b for a >> > while the OS will be perfectly happy to throw them away. >> > >> >> Is that an empirical observation? > > Yes. > >> I've run some simulations a couple years >> ago, and also wrote some instrumentation to test that theory under >> favorably engineered (but still plausible) conditions, and couldn't get >> more than a small fraction of s_b to be so tightly bound in that the kernel >> could forget about them. Unless of course the entire workload or close to >> it fits in s_b. > > I think it depends on your IO access patterns. If the whole working set > fits into the kernel's page cache and there's no other demand for pages > it will stay in. If you constantly rewrite most all your pages they'll > also stay in the OS cache because they'll get written out. If the churn > in shared_buffers is so high (because it's so small in comparison to the > core hot data set) that there'll be dozens if not hundreds clock sweeps > a second you'll also have no locality. > It's also *hugely* kernel version specific :( right. This is, IMNSHO, exactly the sort of language that belongs in the docs. merlin
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
On 05/07/2014 04:13 PM, Tom Lane wrote: A README file would be better, perhaps, but there's not a specific directory associated with the jsonb code; so I think this sort of info belongs either in jsonb.h or in the file header comment for jsonb_gin.c. Is there any reason we couldn't have a README.jsonb? cheers andrew
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 2014-05-07 13:51:57 -0700, Jeff Janes wrote: > On Wed, May 7, 2014 at 11:38 AM, Andres Freund wrote: > > > On 2014-05-07 13:32:41 -0500, Merlin Moncure wrote: > > > > > > *) raising shared buffers does not 'give more memory to postgres for > > > caching' -- it can only reduce it via double paging > > > > That's absolutely not a necessary consequence. If pages are in s_b for a > > while the OS will be perfectly happy to throw them away. > > > > Is that an empirical observation? Yes. > I've run some simulations a couple years > ago, and also wrote some instrumentation to test that theory under > favorably engineered (but still plausible) conditions, and couldn't get > more than a small fraction of s_b to be so tightly bound in that the kernel > could forget about them. Unless of course the entire workload or close to > it fits in s_b. I think it depends on your IO access patterns. If the whole working set fits into the kernel's page cache and there's no other demand for pages it will stay in. If you constantly rewrite most all your pages they'll also stay in the OS cache because they'll get written out. If the churn in shared_buffers is so high (because it's so small in comparison to the core hot data set) that there'll be dozens if not hundreds clock sweeps a second you'll also have no locality. It's also *hugely* kernel version specific :( Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] jsonb existence queries are misimplemented by jsonb_ops
On Wed, May 7, 2014 at 1:51 PM, Peter Geoghegan wrote: > Yes it does. It's just not intended to work like that. You have to > "give a path" to what you're looking for. There is exactly one exception explicitly noted as such in the user-visible docs, which is arrays and the containment of single elements. -- Peter Geoghegan
Re: [HACKERS] bgworker crashed or not?
On 07/05/14 22:32, Robert Haas wrote: On Tue, May 6, 2014 at 8:25 PM, Petr Jelinek wrote: I've committed the portion of your patch which does this, with pretty extensive changes. I moved the function which resets the crash times to bgworker.c, renamed it, and made it just reset all of the crash times unconditionally; there didn't seem to be any point in skipping the irrelevant ones, so it seemed best to keep things simple. I also moved the call from reaper() where you had it to PostmasterStateMachine(), because the old placement did not work for me when I carried out the following steps: 1. Kill a background worker with a 60-second restart time using SIGTERM, so that it does exit(1). 2. Before it restarts, kill a regular backend with SIGQUIT. Let me know if you think I've got it wrong. No I think it's fine, I didn't try that combination and just wanted to put it as deep in the call as possible. (2) If a shmem-connected backend fails to release the deadman switch or exits with an exit code other than 0 or 1, we crash-and-restart. A non-shmem-connected backend never causes a crash-and-restart. +1 I did this as a separate commit, e2ce9aa27bf20eff2d991d0267a15ea5f7024cd7, just moving the check for ReleasePostmasterChildSlot inside the if statement. Then I realized that was bogus, so I just pushed eee6cf1f337aa488a20e9111df446cdad770e645 to fix that. Hopefully it's OK now. The fixed one looks ok to me. (3) When a background worker exits without triggering a crash-and-restart, an exit code of precisely 0 causes the worker to be unregistered; any other exit code has no special effect, so bgw_restart_time controls. +1 This isn't done yet. Unless I am missing something this change was included in every patch I sent - setting rw->rw_terminate = true; in CleanupBackgroundWorker for zero exit code + comment changes. Or do you have objections to this approach? Anyway missing parts attached. 
-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services diff --git a/doc/src/sgml/bgworker.sgml b/doc/src/sgml/bgworker.sgml index fd32d6c..5f86100 100644 --- a/doc/src/sgml/bgworker.sgml +++ b/doc/src/sgml/bgworker.sgml @@ -166,10 +166,12 @@ typedef struct BackgroundWorker - Background workers are expected to be continuously running; if they exit - cleanly, postgres will restart them immediately. Consider doing - interruptible sleep when they have nothing to do; this can be achieved by - calling WaitLatch(). Make sure the + Background workers are normally expected to be running continuously; + they get restarted automaticaly in case of crash (see + bgw_restart_time documentation for details), + but if they exit cleanly, postgres will not restart them. + Consider doing interruptible sleep when they have nothing to do; this can be + achieved by calling WaitLatch(). Make sure the WL_POSTMASTER_DEATH flag is set when calling that function, and verify the return code for a prompt exit in the emergency case that postgres itself has terminated. diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c index 64c9722..b642f37 100644 --- a/src/backend/postmaster/bgworker.c +++ b/src/backend/postmaster/bgworker.c @@ -884,8 +884,9 @@ RegisterDynamicBackgroundWorker(BackgroundWorker *worker, * running but is no longer. * * In the latter case, the worker may be stopped temporarily (if it is - * configured for automatic restart, or if it exited with code 0) or gone - * for good (if it is configured not to restart and exited with code 1). + * configured for automatic restart and exited with code 1) or gone for + * good (if it exited with code other than 1 or if it is configured not + * to restart). 
*/ BgwHandleStatus GetBackgroundWorkerPid(BackgroundWorkerHandle *handle, pid_t *pidp) diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 79d1c50..a0331e6 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -2849,7 +2849,11 @@ CleanupBackgroundWorker(int pid, if (!EXIT_STATUS_0(exitstatus)) rw->rw_crashed_at = GetCurrentTimestamp(); else + { + /* Zero exit status means terminate */ rw->rw_crashed_at = 0; + rw->rw_terminate = true; + } /* * Additionally, for shared-memory-connected workers, just like a diff --git a/src/include/postmaster/bgworker.h b/src/include/postmaster/bgworker.h index c9550cc..d3dd1f2 100644 --- a/src/include/postmaster/bgworker.h +++ b/src/include/postmaster/bgworker.h @@ -16,10 +16,11 @@ * that the failure can only be transient (fork failure due to high load, * memory pressure, too many processes, etc); more permanent problems, like * failure to connect to a database, are detected later in the worker and dealt - * with just by having the worker exit normally. A worker which exits with a - * return code of 0 will be immediately re
Re: [HACKERS] jsonb existence queries are misimplemented by jsonb_ops
On Wed, May 7, 2014 at 1:47 PM, Tom Lane wrote: > No, wait, containment *doesn't* look into sub-objects: > > regression=# select * from j where f1 @> '{"foo": {"bar": "baz"}}'; >f1 > - > {"foo": {"bar": "baz"}} > (1 row) > > regression=# select * from j where f1 @> '{"bar": "baz"}'; > f1 > > (0 rows) Yes it does. It's just not intended to work like that. You have to "give a path" to what you're looking for. -- Peter Geoghegan
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 11:38 AM, Andres Freund wrote: > On 2014-05-07 13:32:41 -0500, Merlin Moncure wrote: > > > > *) raising shared buffers does not 'give more memory to postgres for > > caching' -- it can only reduce it via double paging > > That's absolutely not a necessary consequence. If pages are in s_b for a > while the OS will be perfectly happy to throw them away. > Is that an empirical observation? I ran some simulations a couple of years ago, and also wrote some instrumentation to test that theory under favorably engineered (but still plausible) conditions, and couldn't get more than a small fraction of s_b to be so tightly bound in that the kernel could forget about them. Unless of course the entire workload or close to it fits in s_b. Cheers, Jeff
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 05/07/2014 01:36 PM, Jeff Janes wrote: > On Wed, May 7, 2014 at 11:04 AM, Josh Berkus wrote: >> Unfortunately nobody has the time/resources to do the kind of testing >> required for a new recommendation for shared_buffers. > I think it is worse than that. I don't think we know what such testing > would even look like. SSD? BBU? max_connections=2 with 256 cores? > pgbench -N? capture and replay of Amazon's workload? > > If we could spell out/agree upon what kind of testing we would find > convincing, that would probably go a long way to getting some people to > work on carrying out the tests. Unless the conclusion was "please have 3TB > or RAM and a 50 disk RAID", then there might be few takers. Well, step #1 would be writing some easy-to-run benchmarks which carry out selected workloads and measure response times. The minimum starting set would include one OLTP/Web benchmark, and one DW benchmark. I'm not talking about the software to run the workload; we have that, in several varieties. I'm talking about the actual database generator and queries to run. That's the hard work. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb existence queries are misimplemented by jsonb_ops
I wrote: > Another idea would be to change the definition of the exists operator > so that it *does* look into sub-objects. It seems rather random to me > that containment looks into sub-objects but exists doesn't. However, > possibly there are good reasons for the non-orthogonality. No, wait, containment *doesn't* look into sub-objects: regression=# select * from j where f1 @> '{"foo": {"bar": "baz"}}'; f1 - {"foo": {"bar": "baz"}} (1 row) regression=# select * from j where f1 @> '{"bar": "baz"}'; f1 (0 rows) This is rather surprising in view of the way that section 8.14.4 goes on about nesting. But I guess the user-facing docs for jsonb are in little better shape than the internal docs. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
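The top-level-only containment behavior Tom demonstrates can be sketched in Python (a hypothetical simplification covering objects only; the real jsonb @> operator also handles arrays and scalar operands):

```python
def contains(doc, query):
    """Sketch of jsonb @> semantics for objects: every key/value pair in
    `query` must match starting at the top level of `doc`; the operator
    does not go hunting for query keys inside nested sub-objects."""
    if isinstance(query, dict):
        if not isinstance(doc, dict):
            return False
        return all(k in doc and contains(doc[k], v) for k, v in query.items())
    return doc == query

doc = {"foo": {"bar": "baz"}}
contains(doc, {"foo": {"bar": "baz"}})  # True: matches Tom's first query
contains(doc, {"bar": "baz"})           # False: the second query returns 0 rows
```

The recursion only descends along keys that the query itself names, which is why `{"bar": "baz"}` is not contained even though it appears verbatim one level down.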
Re: [HACKERS] 9.4 checksum errors in recovery with gin index
On 05/07/2014 10:35 AM, Jeff Janes wrote: When recovering from a crash (with injection of a partial page write at time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum verification failure. 16396 is a gin index. If I have it ignore checksum failures, there is no apparent misbehavior. I'm trying to bisect it, but it could take a while and I thought someone might have some theories based on the log: 29075 2014-05-06 23:29:51.411 PDT:LOG: 0: database system was not properly shut down; automatic recovery in progress 29075 2014-05-06 23:29:51.411 PDT:LOCATION: StartupXLOG, xlog.c:6361 29075 2014-05-06 23:29:51.412 PDT:LOG: 0: redo starts at 11/323FE1C0 29075 2014-05-06 23:29:51.412 PDT:LOCATION: StartupXLOG, xlog.c:6600 29075 2014-05-06 23:29:51.471 PDT:WARNING: 01000: page verification failed, calculated checksum 35967 but expected 7881 29075 2014-05-06 23:29:51.471 PDT:CONTEXT: xlog redo Delete list pages A-ha. The WAL record of list page deletion doesn't create a full-page images of the deleted pages. That's pretty sensible, as a deleted page doesn't contain any data that needs to be retained. However, if a full-page image is not taken, then the page should be completely recreated from scratch at replay. It's not doing that. Thanks for the testing! I'll have a stab at fixing that tomorrow. Basically, ginRedoDeleteListPages() needs to re-initialize the deleted pages. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] jsonb existence queries are misimplemented by jsonb_ops
I wrote: > The readability of that comment starts to go downhill with its use of > "reset" to refer to what everything else calls a "recheck" flag, and in > any case it's claiming that we *don't* need a recheck for exists (a > statement I suspect to be false, but more later). And, indeed, it's false: regression=# create table j (f1 jsonb); CREATE TABLE regression=# insert into j values ('{"foo": {"bar": "baz"}}'); INSERT 0 1 regression=# insert into j values ('{"foo": {"blah": "baz"}}'); INSERT 0 1 regression=# insert into j values ('{"fool": {"bar": "baz"}}'); INSERT 0 1 regression=# create index on j using gin(f1); CREATE INDEX regression=# select * from j where f1 ? 'bar'; f1 (0 rows) regression=# set enable_seqscan to 0; SET regression=# select * from j where f1 ? 'bar'; f1 -- {"foo": {"bar": "baz"}} {"fool": {"bar": "baz"}} (2 rows) The indexscan is incorrectly returning rows where the queried key exists but isn't at top-level. We could fix this either by giving up on no-recheck for existence queries, or by changing the way that non-top-level keys get indexed. However I suspect the latter would break containment queries, or at least make their lives a lot more difficult. Another idea would be to change the definition of the exists operator so that it *does* look into sub-objects. It seems rather random to me that containment looks into sub-objects but exists doesn't. However, possibly there are good reasons for the non-orthogonality. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
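The false positives above follow from what the index stores. A minimal Python sketch (not the actual GIN code; `extract_keys` and `exists_top_level` are illustrative names) of why a recheck is unavoidable when keys from every nesting level land in the index but `?` only tests the top level:

```python
def extract_keys(value, keys=None):
    """Collect object keys from every nesting level, roughly how
    jsonb_ops indexes keys (a deliberate simplification)."""
    if keys is None:
        keys = set()
    if isinstance(value, dict):
        for k, v in value.items():
            keys.add(k)
            extract_keys(v, keys)
    elif isinstance(value, list):
        for v in value:
            extract_keys(v, keys)
    return keys

def exists_top_level(doc, key):
    """The ? operator: the key must exist at the top level only."""
    return isinstance(doc, dict) and key in doc

rows = [{"foo": {"bar": "baz"}},
        {"foo": {"blah": "baz"}},
        {"fool": {"bar": "baz"}}]

# The index scan alone returns every row whose key set contains "bar"...
candidates = [r for r in rows if "bar" in extract_keys(r)]
# ...so a recheck against the actual row is needed to get the right answer:
matches = [r for r in candidates if exists_top_level(r, "bar")]
```

Here `candidates` holds the two rows the broken indexscan returned, while `matches` is empty, agreeing with the seqscan.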
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 11:04 AM, Josh Berkus wrote: > On 05/06/2014 10:35 PM, Peter Geoghegan wrote: > > +1. In my view, we probably should have set it to a much higher > > absolute default value. The main problem with setting it to any > > multiple of shared_buffers that I can see is that shared_buffers is a > > very poor proxy for what effective_cache_size is supposed to > > represent. In general, the folk wisdom around sizing shared_buffers > > has past its sell-by date. > > Unfortunately nobody has the time/resources to do the kind of testing > required for a new recommendation for shared_buffers. > I think it is worse than that. I don't think we know what such testing would even look like. SSD? BBU? max_connections=2 with 256 cores? pgbench -N? capture and replay of Amazon's workload? If we could spell out/agree upon what kind of testing we would find convincing, that would probably go a long way to getting some people to work on carrying out the tests. Unless the conclusion was "please have 3TB or RAM and a 50 disk RAID", then there might be few takers. Cheers, Jeff
Re: [HACKERS] bgworker crashed or not?
On Tue, May 6, 2014 at 8:25 PM, Petr Jelinek wrote: >> What I'm inclined to do is change the logic so that: >> >> (1) After a crash-and-restart sequence, zero rw->rw_crashed_at, so >> that anything which is still registered gets restarted immediately. > > Yes, that's quite obvious change which I missed completely :). I've committed the portion of your patch which does this, with pretty extensive changes. I moved the function which resets the crash times to bgworker.c, renamed it, and made it just reset all of the crash times unconditionally; there didn't seem to be any point in skipping the irrelevant ones, so it seemed best to keep things simple. I also moved the call from reaper() where you had it to PostmasterStateMachine(), because the old placement did not work for me when I carried out the following steps: 1. Kill a background worker with a 60-second restart time using SIGTERM, so that it does exit(1). 2. Before it restarts, kill a regular backend with SIGQUIT. Let me know if you think I've got it wrong. >> (2) If a shmem-connected backend fails to release the deadman switch >> or exits with an exit code other than 0 or 1, we crash-and-restart. A >> non-shmem-connected backend never causes a crash-and-restart. > > +1 I did this as a separate commit, e2ce9aa27bf20eff2d991d0267a15ea5f7024cd7, just moving the check for ReleasePostmasterChildSlot inside the if statement. Then I realized that was bogus, so I just pushed eee6cf1f337aa488a20e9111df446cdad770e645 to fix that. Hopefully it's OK now. >> (3) When a background worker exits without triggering a >> crash-and-restart, an exit code of precisely 0 causes the worker to be >> unregistered; any other exit code has no special effect, so >> bgw_restart_time controls. > > +1 This isn't done yet. 
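The rules (1)-(3) being agreed on here can be sketched as a small decision function (illustrative only, not the postmaster's actual code; `on_worker_exit` is a hypothetical name, though BGW_NEVER_RESTART is the real API's sentinel):

```python
BGW_NEVER_RESTART = -1  # sentinel value, as in PostgreSQL's bgworker API

def on_worker_exit(exit_code, shmem_connected, restart_time):
    """What the postmaster should do when a background worker exits,
    per the proposed semantics (a sketch under stated assumptions)."""
    if shmem_connected and exit_code not in (0, 1):
        return "crash-and-restart"           # rule (2): treat as a crash
    if exit_code == 0:
        return "unregister"                  # rule (3): exit 0 drops the worker
    if restart_time == BGW_NEVER_RESTART:
        return "stopped for good"
    return "restart after %ds" % restart_time  # bgw_restart_time controls
```

A non-shmem-connected worker never triggers the crash-and-restart branch, matching point (2).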
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
Peter Geoghegan writes: > On Wed, May 7, 2014 at 12:27 PM, Tom Lane wrote: >> The jsonb_ops storage format for values is inherently lossy, because it >> cannot distinguish the string values "n", "t", "f" from JSON null or >> boolean true, false respectively; nor does it know the difference between >> a number and a string containing digits. This appears to not quite be a >> bug because the consistent functions force recheck for all queries that >> care about values (as opposed to keys). But if it's documented anywhere >> I don't see where. > The fact that we *don't* set the reset flag for > JsonbExistsStrategyNumber, and why that's okay is prominently > documented. So I'd say that it is. Meh. Are you talking about the large comment block in gin_extract_jsonb? The readability of that comment starts to go downhill with its use of "reset" to refer to what everything else calls a "recheck" flag, and in any case it's claiming that we *don't* need a recheck for exists (a statement I suspect to be false, but more later). It entirely fails to explain that other query types *do* need a recheck because of arbitrary decisions about not representing JSON datatype information fully. There's another comment in gin_consistent_jsonb that's just as misleading, because it mentions (vaguely) some reasons why recheck is necessary without mentioning this one. A larger issue here is that, to the extent that this comment even has the information I'm after, the comment is in the wrong place. It is not attached to the code that's actually making the lossy representational choices (ie, make_scalar_key), nor to the code that decides whether or not a recheck is needed (ie, gin_consistent_jsonb). I don't think that basic architectural choices like these should be relegated to comment blocks inside specific functions to begin with. 
A README file would be better, perhaps, but there's not a specific directory associated with the jsonb code; so I think this sort of info belongs either in jsonb.h or in the file header comment for jsonb_gin.c. > Anyway, doing things that way for values won't obviate the need to set > the reset flag, unless you come up with a much more sophisticated > scheme. Existence (of keys) is only tested in respect of the > top-level. Containment (where values are tested) is more complicated. I'm not expecting that it will make things better for the current query operators. I am thinking that exact rather than lossy storage might help for future operators wanting to use this same index representation. Once we ship 9.4, it's going to be very hard to change the index representation, especially for the default opclass (cf the business about which opclass is default for type inet). So it behooves us to not throw information away needlessly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 checksum errors in recovery with gin index
On Wed, May 7, 2014 at 10:34 AM, Andres Freund wrote: > Hi, > > On 2014-05-07 10:21:26 -0700, Jeff Janes wrote: > > On Wed, May 7, 2014 at 12:48 AM, Andres Freund >wrote: > > > If you have the WAL a pg_xlogdump grepping for everything referring to > > > that block would be helpful. > > > > > > > The only record which mentions block 28486 by name is this one: > > Hm, try running it with -b specified. > Still nothing more. > > > rmgr: Gin len (rec/tot): 1576/ 1608, tx: 77882205, lsn: > > 11/30F4C2C0, prev 11/30F4C290, bkp: , desc: Insert new list page, > node: > > 1663/16384/16396 blkno: 28486 > > > > However, I think that that record precedes the recovery start point. > > If that's the case it seems likely that a PageSetLSN() or PageSetDirty() > are missing somewhere... > Wouldn't that result in corrupted data after crash recovery when checksum failures are ignored? I haven't seen any of that. Cheers, Jeff
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
Peter Geoghegan writes: > On Wed, May 7, 2014 at 12:08 PM, Tom Lane wrote: >> Is this an obsolete speculation about writing jsonb_hash_ops, or is there >> still something worth preserving there? If so, what? > This is not obsolete. It would equally apply to a GiST opclass that > wanted to support our current definition of existence. Array elements > are keys simply by virtue of being strings, but otherwise are treated > as values. See the large comment block within gin_extract_jsonb(). It's not that aspect I'm complaining about, it's the bit about "one day we may wish to write". This comment should restrict itself to describing what jsonb_ops does, not make already-or-soon-to-be-obsolete statements about what other opclasses might or might not do. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
On Wed, May 7, 2014 at 12:27 PM, Tom Lane wrote: > The jsonb_ops storage format for values is inherently lossy, because it > cannot distinguish the string values "n", "t", "f" from JSON null or > boolean true, false respectively; nor does it know the difference between > a number and a string containing digits. This appears to not quite be a > bug because the consistent functions force recheck for all queries that > care about values (as opposed to keys). But if it's documented anywhere > I don't see where. The fact that we *don't* set the reset flag for JsonbExistsStrategyNumber, and why that's okay is prominently documented. So I'd say that it is. > And in any case, is it a good idea? We could fairly > easily change things so that these cases are guaranteed distinguishable. > We're using an entire byte to convey one bit of information (key or > value); I'm inclined to redefine the flag byte so that it tells not just > that but which JSON datatype is involved. It seemed simpler to do it that way. As I've said before, jsonb_ops is mostly concerned with hstore-style indexing. It could also be particularly useful for expressional indexes on "tags" arrays of strings, which is a common use-case. jsonb_hash_ops on the other hand is for testing containment, which is useful for querying heavily nested documents, where typically there is a very low selectivity for keys. It's not the default largely because I was concerned about not supporting all indexable operators by default, and because I suspected that it would be preferred to have a default closer to hstore. Anyway, doing things that way for values won't obviate the need to set the reset flag, unless you come up with a much more sophisticated scheme. Existence (of keys) is only tested in respect of the top-level. Containment (where values are tested) is more complicated. 
-- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 2:58 PM, Peter Geoghegan wrote: > On Wed, May 7, 2014 at 11:50 AM, Robert Haas wrote: >> But that does not mean, as the phrase "folk >> wisdom" might be taken to imply, that we don't know anything at all >> about what actually works well in practice. > > Folk wisdom doesn't imply that. It implies that we think this works, > and we may well be right, but there isn't all that much rigor behind > some of it. I'm not blaming anyone for this state of affairs. I've > heard plenty of people repeat the "don't exceed 8GB" rule - I > regularly repeated it myself. I cannot find any rigorous defense of > this, though. If you're aware of one, please point it out to me. I'm not sure the level of rigor you'd like to see is going to be available here. Complex systems have complex behavior; that's life. At any rate, I'm not aware of any rigorous defense of the "don't exceed 8GB" rule. But, #1, I'd never put it that simply. What I've found is more like this: If it's possible to size shared_buffers so that the working set fits entirely within shared_buffers, that configuration is worthy of strong consideration. Otherwise, you probably want to keep shared_buffers low in order to avoid checkpoint-related I/O spikes and minimize double buffering; try 25% of system memory up to 512MB on Windows or up to 2GB on 32-bit Linux or up to 8GB on 64-bit Linux for starters, and then tune based on your workload. And #2, I think the origin of the 8GB number on 64-bit non-Windows systems is that people found that checkpoint-related I/O spikes became intolerable when you went too much above that number. On some systems, the threshold is lower than that - for example, I believe Merlin and others have reported numbers more like 2GB than 8GB - and on other systems, the threshold is higher - indeed, some people go way higher and never hit it at all. 
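Those starting points can be written down as a toy heuristic (assumption-laden: the caps are the starting values named above, not hard limits, and the final setting still has to be tuned against the actual workload):

```python
def shared_buffers_start_gb(ram_gb, platform):
    """Starting-point heuristic from this discussion: 25% of system
    memory, capped per platform; tune from there (a sketch, not an
    official recommendation)."""
    caps_gb = {"windows": 0.5, "linux32": 2.0, "linux64": 8.0}
    return min(ram_gb * 0.25, caps_gb[platform])

shared_buffers_start_gb(64, "linux64")  # 8.0 -- the cap kicks in
shared_buffers_start_gb(16, "linux64")  # 4.0 -- plain 25% of RAM
```

The point of the cap is exactly the checkpoint-related I/O spikes described above; on systems where those never bite, people go well past it.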
I agree that it would be nice to better-characterize why different users hit it at different levels, but it's probably highly dependent on hardware, workload, and kernel version, so I tend to doubt it can be characterized very simply. If I had go to guess, I'd bet that fixing Linux's abominable behavior around the fsync() call would probably go a long way toward making higher values of shared_buffers more practical. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
And while I'm looking at it ... The jsonb_ops storage format for values is inherently lossy, because it cannot distinguish the string values "n", "t", "f" from JSON null or boolean true, false respectively; nor does it know the difference between a number and a string containing digits. This appears to not quite be a bug because the consistent functions force recheck for all queries that care about values (as opposed to keys). But if it's documented anywhere I don't see where. And in any case, is it a good idea? We could fairly easily change things so that these cases are guaranteed distinguishable. We're using an entire byte to convey one bit of information (key or value); I'm inclined to redefine the flag byte so that it tells not just that but which JSON datatype is involved. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
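The collision, and the proposed fix of spending the flag byte on type information, can be illustrated with a small Python sketch (hypothetical names; the real flag is a byte in jsonb_ops's GIN key representation):

```python
def lossy_key(value):
    """Roughly the lossy scheme described above: null/bool/number all
    collapse to strings indistinguishable from genuine string values."""
    if value is None:
        return "n"
    if value is True:
        return "t"
    if value is False:
        return "f"
    return str(value)

def tagged_key(value):
    """Prefix a one-byte type tag so each JSON datatype stays distinct.
    (bool must be tested before int: True is an int in Python.)"""
    if value is None:
        tag = "N"
    elif isinstance(value, bool):
        tag = "B"
    elif isinstance(value, (int, float)):
        tag = "D"
    else:
        tag = "S"
    return tag + lossy_key(value)

lossy_key(None) == lossy_key("n")    # True: the collision forces a recheck
tagged_key(None) == tagged_key("n")  # False: the tag keeps them apart
```

With tagged keys, queries that care about values would no longer need a recheck merely to tell `null` from `"n"` or `42` from `"42"`.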
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
On Wed, May 7, 2014 at 12:08 PM, Tom Lane wrote: > * Jsonb Keys and string array elements are treated equivalently when > * serialized to text index storage. One day we may wish to create an opclass > * that only indexes values, but for now keys and values are stored in GIN > * indexes in a way that doesn't really consider their relationship to each > * other. > > Is this an obsolete speculation about writing jsonb_hash_ops, or is there > still something worth preserving there? If so, what? This is not obsolete. It would equally apply to a GiST opclass that wanted to support our current definition of existence. Array elements are keys simply by virtue of being strings, but otherwise are treated as values. See the large comment block within gin_extract_jsonb(). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 11:50 AM, Robert Haas wrote: >> Doesn't match my experience. Even with the current buffer manager >> there's usually enough locality to keep important pages in s_b for a >> meaningful time. I *have* seen workloads that should have fit into >> memory not fit because of double buffering. > > Same here. I think that it depends on whether or not you're thinking about the worst case. Most people are not going to be in the category you describe here. Plenty of people in the Postgres community run with very large shared_buffers settings, on non i/o bound workloads, and report good results - often massive, quickly apparent improvements. I'm mostly concerned with obsoleting the 8GB hard ceiling rule here. It probably doesn't matter whether and by how much one factor is worse than the other, though. I found the section "5.2 Temporal Control: Buffering" in the following paper, that speaks about the subject quite interesting: http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
btw ... in jsonb.h there is this comment: * Jsonb Keys and string array elements are treated equivalently when * serialized to text index storage. One day we may wish to create an opclass * that only indexes values, but for now keys and values are stored in GIN * indexes in a way that doesn't really consider their relationship to each * other. Is this an obsolete speculation about writing jsonb_hash_ops, or is there still something worth preserving there? If so, what? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 05/07/2014 11:52 AM, Peter Geoghegan wrote: > On Wed, May 7, 2014 at 11:40 AM, Josh Berkus wrote: >> So, as one of several people who put literally hundreds of hours into >> the original benchmarking which established the sizing recommendations >> for shared_buffers (and other settings), I find the phrase "folk wisdom" >> personally offensive. So, can we stop with this? > > I have also put a lot of time into benchmarking. No personal offence > was intended, and I'm glad that we have some advice to give to users, > but the fact of the matter is that current *official* recommendations > are very vague. Well, they should be vague; the only hard data we have is rather out-of-date (I think 8.2 was our last set of tests). If we gave users specific, detailed recommendations, we'd be misleading them. For that matter, our advice on shared_buffers ... and our design for it ... is going to need to change radically soon, since Linux is getting an ARC with a frequency cache as well as a recency cache, and FreeBSD and OpenSolaris already have them. FWIW, if someone could fund me for a month, I'd be happy to create a benchmarking setup where we could test these kinds of things; I have pretty clear ideas how to build one. I imagine some of our other consultants could make the same offer. However, it's too much work for anyone to get done "in their spare time". -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
On 05/07/2014 02:52 PM, Andres Freund wrote: On 2014-05-07 14:48:51 -0400, Tom Lane wrote: Andres Freund writes: * The jentry representation should be changed so it's possible to get the type of an entry without checking individual types. That'll make code like findJsonbValueFromSuperHeader() (well, whatever you've renamed it to) much easier to read. Preferably so it can have the same values (after shifting/masking) as the JBE variants. And it needs space for further types (or representations thereof). Further types? What further types? JSON seems unlikely to grow any other value types than what it's got. I am not thinking about user-level exposed types, but e.g. a hash/object representation that allows duplicated keys and keeps the original order. Because I am pretty sure that'll come. That makes one of you. We've had hstore for years and nobody that I recall has asked for preservation of key order or duplicates. And any app that relies on either in JSON is still, IMNSHO, broken by design. OTOH, there are suggestions of "superjson" types that support other scalar types such as timestamps. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] making bgworkers without shmem access actually not have shmem access
On 07/05/14 20:37, Robert Haas wrote: At a minimum, it's got to be better than the status quo, where shared memory is accessible throughout the entire lifetime of non-shmem-access background workers. Seems reasonable to me, it might need to be revisited to at least try to figure out if we can make EXEC_BACKEND safer, but it's definitely better than the current state. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 11:50 AM, Robert Haas wrote: > But that does not mean, as the phrase "folk > wisdom" might be taken to imply, that we don't know anything at all > about what actually works well in practice. Folk wisdom doesn't imply that. It implies that we think this works, and we may well be right, but there isn't all that much rigor behind some of it. I'm not blaming anyone for this state of affairs. I've heard plenty of people repeat the "don't exceed 8GB" rule - I regularly repeated it myself. I cannot find any rigorous defense of this, though. If you're aware of one, please point it out to me. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] making bgworkers without shmem access actually not have shmem access
On Wed, May 7, 2014 at 2:44 PM, Tom Lane wrote: > Robert Haas writes: >> I've complained about this problem a few times before: there's nothing >> to prevent a background worker which doesn't request shared memory >> access from calling InitProcess() and then accessing shared memory >> anyway. The attached patch is a first crack at fixing it. > >> Comments? > > Looks reasonable to me. Thanks for the fast review. Committed, after fixing one further bug I spotted. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 11:40 AM, Josh Berkus wrote: > So, as one of several people who put literally hundreds of hours into > the original benchmarking which established the sizing recommendations > for shared_buffers (and other settings), I find the phrase "folk wisdom" > personally offensive. So, can we stop with this? I have also put a lot of time into benchmarking. No personal offence was intended, and I'm glad that we have some advice to give to users, but the fact of the matter is that current *official* recommendations are very vague. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
On 2014-05-07 14:48:51 -0400, Tom Lane wrote: > Andres Freund writes: > > * The jentry representation should be changed so it's possible to get the > > type > > of an entry without checking individual types. That'll make code like > > findJsonbValueFromSuperHeader() (well, whatever you've renamed it to) > > much easier to read. Preferably so it can have the same values (after > > shifting/masking) as the JBE variants. And it needs space for further > > types (or representations thereof). > > Further types? What further types? JSON seems unlikely to grow any > other value types than what it's got. I am not thinking about user-level exposed types, but e.g. a hash/object representation that allows duplicated keys and keeps the original order. Because I am pretty sure that'll come. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 2014-05-07 11:45:04 -0700, Peter Geoghegan wrote: > On Wed, May 7, 2014 at 11:38 AM, Andres Freund wrote: > >> *) raising shared buffers does not 'give more memory to postgres for > >> caching' -- it can only reduce it via double paging > > > > That's absolutely not a necessary consequence. If pages are in s_b for a > > while the OS will be perfectly happy to throw them away. > > The biggest problem with double buffering is not that it wastes > memory. Rather, it's that it wastes memory bandwidth. Doesn't match my experience. Even with the current buffer manager there's usually enough locality to keep important pages in s_b for a meaningful time. I *have* seen workloads that should have fit into memory not fit because of double buffering. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 2:49 PM, Andres Freund wrote:
> On 2014-05-07 11:45:04 -0700, Peter Geoghegan wrote:
>> On Wed, May 7, 2014 at 11:38 AM, Andres Freund wrote:
>> >> *) raising shared buffers does not 'give more memory to postgres for caching' -- it can only reduce it via double paging
>> >
>> > That's absolutely not a necessary consequence. If pages are in s_b for a while the OS will be perfectly happy to throw them away.
>>
>> The biggest problem with double buffering is not that it wastes memory. Rather, it's that it wastes memory bandwidth.
>
> Doesn't match my experience. Even with the current buffer manager there's usually enough locality to keep important pages in s_b for a meaningful time. I *have* seen workloads that should have fit into memory not fit because of double buffering.

Same here.

--
Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 2:40 PM, Josh Berkus wrote:
> On 05/07/2014 11:13 AM, Peter Geoghegan wrote:
>> We ought to be realistic about the fact that the current recommendations around sizing shared_buffers are nothing more than folk wisdom. That's the best we have right now, but that seems quite unsatisfactory to me.
>
> So, as one of several people who put literally hundreds of hours into the original benchmarking which established the sizing recommendations for shared_buffers (and other settings), I find the phrase "folk wisdom" personally offensive. So, can we stop with this?
>
> Otherwise, I don't think I can usefully participate in this discussion.

+1.

I think it is quite accurate to say that we can't predict precisely what value of shared_buffers will perform best for a particular workload and on a particular system. There are people out there using very large values and very small ones, according to what they have found most effective. But that does not mean, as the phrase "folk wisdom" might be taken to imply, that we don't know anything at all about what actually works well in practice. Because we do know quite a bit about that. I and people I work with have been able to improve performance greatly on many systems by providing guidance based on what this community has been able to understand on this topic, and dismissing it as rubbish is wrong.

Also, I seriously doubt that a one-size-fits-all guideline about setting shared_buffers will ever be right for every workload. Workloads, by their nature, are complex beasts. The size of the workload varies, and which portions of it are how hot vary, and the read-write mix varies, and those are not problems with PostgreSQL; those are problems with data.

That is not to say that we can't do anything to make PostgreSQL work better across a wider range of settings for shared_buffers, but it is to say that no matter how much work we do on the code, setting this optimally for every workload will probably remain complex.

--
Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
Andres Freund writes:
> * The jentry representation should be changed so it's possible to get the type of an entry without checking individual types. That'll make code like findJsonbValueFromSuperHeader() (well, whatever you've renamed it to) much easier to read. Preferably so it can have the same values (after shifting/masking) as the JBE variants. And it needs space for further types (or representations thereof).

Further types? What further types? JSON seems unlikely to grow any other value types than what it's got.

regards, tom lane
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 11:38 AM, Andres Freund wrote:
>> *) raising shared buffers does not 'give more memory to postgres for caching' -- it can only reduce it via double paging
>
> That's absolutely not a necessary consequence. If pages are in s_b for a while the OS will be perfectly happy to throw them away.

The biggest problem with double buffering is not that it wastes memory. Rather, it's that it wastes memory bandwidth. I think that lessening that problem will be the major benefit of making larger shared_buffers settings practical.

--
Peter Geoghegan
Re: [HACKERS] making bgworkers without shmem access actually not have shmem access
Robert Haas writes:
> I've complained about this problem a few times before: there's nothing to prevent a background worker which doesn't request shared memory access from calling InitProcess() and then accessing shared memory anyway. The attached patch is a first crack at fixing it.
> Comments?

Looks reasonable to me.

regards, tom lane
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 05/07/2014 11:13 AM, Peter Geoghegan wrote:
> We ought to be realistic about the fact that the current recommendations around sizing shared_buffers are nothing more than folk wisdom. That's the best we have right now, but that seems quite unsatisfactory to me.

So, as one of several people who put literally hundreds of hours into the original benchmarking which established the sizing recommendations for shared_buffers (and other settings), I find the phrase "folk wisdom" personally offensive. So, can we stop with this?

Otherwise, I don't think I can usefully participate in this discussion.

--
Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 2014-05-07 13:32:41 -0500, Merlin Moncure wrote:
> On Wed, May 7, 2014 at 1:13 PM, Peter Geoghegan wrote:
> > On Wed, May 7, 2014 at 11:04 AM, Josh Berkus wrote:
> >> Unfortunately nobody has the time/resources to do the kind of testing required for a new recommendation for shared_buffers.
> >
> > I meant to suggest that the buffer manager could be improved to the point that the old advice becomes obsolete. Right now, it's much harder to analyze shared_buffers than it should be, presumably because of the problems with the buffer manager. I think that if we could formulate better *actionable* advice around what we have right now, that would have already happened.
> >
> > We ought to be realistic about the fact that the current recommendations around sizing shared_buffers are nothing more than folk wisdom. That's the best we have right now, but that seems quite unsatisfactory to me.
>
> I think the stock advice is worse than nothing because it is A. based on obsolete assumptions and B. doesn't indicate what the tradeoffs are or what kinds of symptoms adjusting the setting could alleviate. The documentation should be reduced to things that are known, for example:
>
> *) raising shared buffers does not 'give more memory to postgres for caching' -- it can only reduce it via double paging

That's absolutely not a necessary consequence. If pages are in s_b for a while the OS will be perfectly happy to throw them away.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] making bgworkers without shmem access actually not have shmem access
I've complained about this problem a few times before: there's nothing to prevent a background worker which doesn't request shared memory access from calling InitProcess() and then accessing shared memory anyway. The attached patch is a first crack at fixing it.

Unfortunately, there's still a window between when we fork() and when we detach shared memory, but that's also true for processes like syslogger, whose death nevertheless does not cause a crash-and-restart. Also unfortunately, in the EXEC_BACKEND case, we actually have to attach shared memory first, to get our background worker entry details. This is undesirable, but I'm not sure there's any good way to fix it at all, and certainly not in time for 9.4. Hopefully, the window when shared memory is attached with this patch is short enough to make things relatively safe. At a minimum, it's got to be better than the status quo, where shared memory is accessible throughout the entire lifetime of non-shmem-access background workers.

Attached is also a new background worker called say_hello which I used for testing this patch. That's obviously not for commit.

Comments?
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c index a6b25d8..8b8ae89 100644 --- a/src/backend/postmaster/bgworker.c +++ b/src/backend/postmaster/bgworker.c @@ -20,9 +20,11 @@ #include "postmaster/bgworker_internals.h" #include "postmaster/postmaster.h" #include "storage/barrier.h" +#include "storage/dsm.h" #include "storage/ipc.h" #include "storage/latch.h" #include "storage/lwlock.h" +#include "storage/pg_shmem.h" #include "storage/pmsignal.h" #include "storage/proc.h" #include "storage/procsignal.h" @@ -400,12 +402,16 @@ BackgroundWorkerStopNotifications(pid_t pid) BackgroundWorker * BackgroundWorkerEntry(int slotno) { + static BackgroundWorker myEntry; BackgroundWorkerSlot *slot; Assert(slotno < BackgroundWorkerData->total_slots); slot = &BackgroundWorkerData->slot[slotno]; Assert(slot->in_use); - return &slot->worker; /* can't become free while we're still here */ + + /* must copy this in case we don't intend to retain shmem access */ + memcpy(&myEntry, &slot->worker, sizeof myEntry); + return &myEntry; } #endif @@ -542,6 +548,19 @@ StartBackgroundWorker(void) snprintf(buf, MAXPGPATH, "bgworker: %s", worker->bgw_name); init_ps_display(buf, "", "", ""); + /* + * If we're not supposed to have shared memory access, then detach from + * shared memory. If we didn't request shared memory access, the + * postmaster won't force a cluster-wide restart if we exit unexpectedly, + * so we'd better make sure that we don't mess anything up that would + * require that sort of cleanup. 
+ */ + if ((worker->bgw_flags & BGWORKER_SHMEM_ACCESS) == 0) + { + dsm_detach_all(); + PGSharedMemoryDetach(); + } + SetProcessingMode(InitProcessing); /* Apply PostAuthDelay */ @@ -616,19 +635,29 @@ StartBackgroundWorker(void) /* We can now handle ereport(ERROR) */ PG_exception_stack = &local_sigjmp_buf; - /* Early initialization */ - BaseInit(); - /* - * If necessary, create a per-backend PGPROC struct in shared memory, - * except in the EXEC_BACKEND case where this was done in - * SubPostmasterMain. We must do this before we can use LWLocks (and in - * the EXEC_BACKEND case we already had to do some stuff with LWLocks). + * If the background worker request shared memory access, set that up now; + * else, detach all shared memory segments. */ -#ifndef EXEC_BACKEND if (worker->bgw_flags & BGWORKER_SHMEM_ACCESS) + { + /* + * Early initialization. Some of this could be useful even for + * background workers that aren't using shared memory, but they can + * call the individual startup routines for those subsystems if needed. + */ + BaseInit(); + + /* + * Create a per-backend PGPROC struct in shared memory, except in the + * EXEC_BACKEND case where this was done in SubPostmasterMain. We must + * do this before we can use LWLocks (and in the EXEC_BACKEND case we + * already had to do some stuff with LWLocks). + */ +#ifndef EXEC_BACKEND InitProcess(); #endif + } /* * If bgw_main is set, we use that value as the initial entrypoint. diff --git a/contrib/say_hello/Makefile b/contrib/say_hello/Makefile new file mode 100644 index 000..2da89d7 --- /dev/null +++ b/contrib/say_hello/Makefile @@ -0,0 +1,17 @@ +# contrib/say_hello/Makefile + +MODULES = say_hello + +EXTENSION = say_hello +DATA = say_hello--1.0.sql + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/say_hello +top_builddir = ../.. 
+include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/contrib/say_hello/say_hello--1.0.sql b/contrib/say_hello/say_hello--1.0.sql new file mode 100644 index 000..c82b7df --- /d
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 1:13 PM, Peter Geoghegan wrote:
> On Wed, May 7, 2014 at 11:04 AM, Josh Berkus wrote:
>> Unfortunately nobody has the time/resources to do the kind of testing required for a new recommendation for shared_buffers.
>
> I meant to suggest that the buffer manager could be improved to the point that the old advice becomes obsolete. Right now, it's much harder to analyze shared_buffers than it should be, presumably because of the problems with the buffer manager. I think that if we could formulate better *actionable* advice around what we have right now, that would have already happened.
>
> We ought to be realistic about the fact that the current recommendations around sizing shared_buffers are nothing more than folk wisdom. That's the best we have right now, but that seems quite unsatisfactory to me.

I think the stock advice is worse than nothing because it is A. based on obsolete assumptions and B. doesn't indicate what the tradeoffs are or what kinds of symptoms adjusting the setting could alleviate. The documentation should be reduced to things that are known, for example:

*) raising shared buffers does not 'give more memory to postgres for caching' -- it can only reduce it via double paging
*) hits in shared buffers are generally somewhat faster than faulting to o/s buffers
*) an s_b larger than the working dataset size can be a good configuration, especially for read-only loads
*) large settings have bad interplay with the o/s in some configurations
*) shared buffers can reduce write i/o in certain workloads
*) interplay with checkpoint
*) shared buffers have different mechanisms for managing contention than o/s buffers

merlin
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Tue, May 6, 2014 at 9:55 AM, Andres Freund wrote:
> On 2014-05-06 17:43:45 +0100, Simon Riggs wrote:
> > All this changes is the cost of IndexScans that would use more than 25% of shared_buffers worth of data. Hopefully not many of those in your workload. Changing the cost doesn't necessarily prevent index scans either. And if there are many of those in your workload AND you run more than one at same time, then the larger setting will work against you. So the benefit window for such a high setting is slim, at best.

Not only do you need to run more than one at a time, but they also must use mostly disjoint sets of data, in order for the larger estimate to be bad.

> Why? There's many workloads where indexes are larger than shared buffers but fit into the operating system's cache. And that's precisely what effective_cache_size is about.

It is more about the size of the table referenced by the index, rather than the size of the index. The point is that doing a large index scan might lead you to visit the same table blocks repeatedly within quick succession. (If a small index scan is on the inner side of a nested loop, then you might access the same index leaf blocks and the same table blocks repeatedly--that is why it is only mostly about the table size, rather than exclusively.)

Cheers, Jeff
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
On 2014-05-07 10:55:28 -0700, Peter Geoghegan wrote:
> On Wed, May 7, 2014 at 4:40 AM, Andres Freund wrote:
> > * The jentry representation should be changed so it's possible to get the type of an entry without checking individual types. That'll make code like findJsonbValueFromSuperHeader() (well, whatever you've renamed it to) much easier to read. Preferably so it can have the same values (after shifting/masking) as the JBE variants. And it needs space for further types (or representations thereof).
>
> I don't know what you mean. Every place that macros like JBE_ISNUMERIC() are used subsequently involves access to (say) numeric union fields. At best, you could have those functions check that the types match, and then handle each case in a switch that only looked at (say) the "candidate", but that doesn't really save much at all. It wouldn't take much to have the macros give enum constant values back as you suggest, though.

Compare

    for (i = 0; i < count; i++)
    {
        JEntry *e = array + i;

        if (JBE_ISNULL(*e) && key->type == jbvNull)
        {
            result->type = jbvNull;
            result->estSize = sizeof(JEntry);
        }
        else if (JBE_ISSTRING(*e) && key->type == jbvString)
        {
            result->type = jbvString;
            result->val.string.val = data + JBE_OFF(*e);
            result->val.string.len = JBE_LEN(*e);
            result->estSize = sizeof(JEntry) + result->val.string.len;
        }
        else if (JBE_ISNUMERIC(*e) && key->type == jbvNumeric)
        {
            result->type = jbvNumeric;
            result->val.numeric = (Numeric) (data + INTALIGN(JBE_OFF(*e)));
            result->estSize = 2 * sizeof(JEntry) + VARSIZE_ANY(result->val.numeric);
        }
        else if (JBE_ISBOOL(*e) && key->type == jbvBool)
        {
            result->type = jbvBool;
            result->val.boolean = JBE_ISBOOL_TRUE(*e) != 0;
            result->estSize = sizeof(JEntry);
        }
        else
            continue;

        if (compareJsonbScalarValue(key, result) == 0)
            return result;
    }

with

    for (i = 0; i < count; i++)
    {
        JEntry *e = array + i;

        if (!JBE_TYPE_IS_SCALAR(*e))
            continue;
        if (JBE_TYPE(*e) != key->type)
            continue;
        result = getJsonbValue(e);
        if (compareJsonbScalarValue(key, result) == 0)
            return result;
    }

Yes, it's not a fair comparison because I made up getJsonbValue(). But it's *much* more readable regardless. And there's more places that could use it. Like the second half of findJsonbValueFromSuperHeader(). FWIW, that's one of the requests I definitely made before.

> > * I wonder if the hash/object pair representation is wise and if it shouldn't be keys combined with offsets first, and then the values. That will make access much faster. And that's what jsonb essentially is about.
>
> I like that the physical layout reflects the layout of the original JSON document.

Don't see much value in that. This is a binary representation and it'd be bijective.

> Besides, I don't think this is obviously the case. Are you sure that it won't be more useful to make children as close to their parents as possible? Particularly given idiomatic usage.

Because - if done right - it would allow elementwise access without scanning previous values.

> > * I think both arrays and hashes should grow individual structs. With space for additional flags.
> Why?

Because a) it will make the code more readable b) it'll allow for adding different representations of hashes/arrays.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On Wed, May 7, 2014 at 11:04 AM, Josh Berkus wrote:
> Unfortunately nobody has the time/resources to do the kind of testing required for a new recommendation for shared_buffers.

I meant to suggest that the buffer manager could be improved to the point that the old advice becomes obsolete. Right now, it's much harder to analyze shared_buffers than it should be, presumably because of the problems with the buffer manager. I think that if we could formulate better *actionable* advice around what we have right now, that would have already happened.

We ought to be realistic about the fact that the current recommendations around sizing shared_buffers are nothing more than folk wisdom. That's the best we have right now, but that seems quite unsatisfactory to me.

--
Peter Geoghegan
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 05/06/2014 10:35 PM, Peter Geoghegan wrote:
> +1. In my view, we probably should have set it to a much higher absolute default value. The main problem with setting it to any multiple of shared_buffers that I can see is that shared_buffers is a very poor proxy for what effective_cache_size is supposed to represent. In general, the folk wisdom around sizing shared_buffers is past its sell-by date.

Unfortunately nobody has the time/resources to do the kind of testing required for a new recommendation for shared_buffers.

--
Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Wanted: jsonb on-disk representation documentation
On Wed, May 7, 2014 at 4:40 AM, Andres Freund wrote:
> * Imo we need space in jsonb ondisk values to indicate a format version. We won't fully get this right.

That would be nice.

> * The jentry representation should be changed so it's possible to get the type of an entry without checking individual types. That'll make code like findJsonbValueFromSuperHeader() (well, whatever you've renamed it to) much easier to read. Preferably so it can have the same values (after shifting/masking) as the JBE variants. And it needs space for further types (or representations thereof).

I don't know what you mean. Every place that macros like JBE_ISNUMERIC() are used subsequently involves access to (say) numeric union fields. At best, you could have those functions check that the types match, and then handle each case in a switch that only looked at (say) the "candidate", but that doesn't really save much at all. It wouldn't take much to have the macros give enum constant values back as you suggest, though.

> * I wonder if the hash/object pair representation is wise and if it shouldn't be keys combined with offsets first, and then the values. That will make access much faster. And that's what jsonb essentially is about.

I like that the physical layout reflects the layout of the original JSON document. Besides, I don't think this is obviously the case. Are you sure that it won't be more useful to make children as close to their parents as possible? Particularly given idiomatic usage. Jsonb, in point of fact, *is* fast.

> * I think both arrays and hashes should grow individual structs. With space for additional flags.

Why?

--
Peter Geoghegan
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 05/07/2014 07:31 AM, Andrew Dunstan wrote:
> +1. If we ever want to implement an auto-tuning heuristic it seems we're going to need some hard empirical evidence to support it, and that doesn't seem likely to appear any time soon.

4GB default it is, then.

--
Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] [v9.5] Custom Plan API
* Simon Riggs (si...@2ndquadrant.com) wrote:
> On 7 May 2014 17:43, Stephen Frost wrote:
> > It's the optimizer's job to figure out which path to pick though, based on which will have the lowest cost.
>
> Of course. I'm not suggesting otherwise.
>
> >> If you do want that, you can write an Event Trigger that automatically adds a lookaside for any table.
> >
> > This sounds terribly ugly and like we're pushing optimization decisions on to the user instead of just figuring out what the best answer is.
>
> I'm proposing that we use a declarative approach, just like we do when we say CREATE INDEX.

There's quite a few trade-offs when it comes to indexes though. I'm trying to figure out when you wouldn't want to use a GPU, if it's available to you and the cost model says it's faster? To me, that's kind of like saying you want a declarative approach for when to use a HashJoin.

> The idea is that we only consider a lookaside when a lookaside has been declared. Same as when we add an index, the optimizer considers whether to use that index. What we don't want to happen is that the optimizer considers a GIN plan, even when a GIN index is not available.

Yes, I understood your proposal- I just don't agree with it. ;) For MatViews and/or Indexes, there are trade-offs to be had as it relates to disk space, insert speed, etc.

Thanks, Stephen
Re: [HACKERS] 9.4 checksum errors in recovery with gin index
Hi,

On 2014-05-07 10:21:26 -0700, Jeff Janes wrote:
> On Wed, May 7, 2014 at 12:48 AM, Andres Freund wrote:
> > Hi,
> >
> > On 2014-05-07 00:35:35 -0700, Jeff Janes wrote:
> > > When recovering from a crash (with injection of a partial page write at time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum verification failure.
> > >
> > > 16396 is a gin index.
> >
> > Over which type? What was the load? make check?
>
> A gin index on text[].
>
> The load is a variation of the crash recovery tester I've been using the last few years, this time adapted to use a gin index in a rather unnatural way. I just increment a counter on a random row repeatedly via a unique key, but for this purpose that unique key is stuffed into text[] along with a bunch of cruft. The cruft is text representations of negative integers, the actual key is text representation of nonnegative integers.
>
> The test harness (patch to induce crashes, and two driving programs) and a preserved data directory are here:
>
> https://drive.google.com/folderview?id=0Bzqrh1SO9FcESDZVeFk5djJaeHM&usp=sharing
>
> (role jjanes, database jjanes)
>
> As far as I can tell, this problem goes back to the beginning of page checksums.

Interesting.

> > > If I have it ignore checksum failures, there is no apparent misbehavior. I'm trying to bisect it, but it could take a while and I thought someone might have some theories based on the log:
> >
> > If you have the WAL a pg_xlogdump grepping for everything referring to that block would be helpful.
>
> The only record which mentions block 28486 by name is this one:

Hm, try running it with -b specified.

> rmgr: Gin len (rec/tot): 1576/ 1608, tx: 77882205, lsn: 11/30F4C2C0, prev 11/30F4C290, bkp: , desc: Insert new list page, node: 1663/16384/16396 blkno: 28486
>
> However, I think that that record precedes the recovery start point.

If that's the case it seems likely that a PageSetLSN() or PageSetDirty() is missing somewhere...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [v9.5] Custom Plan API
On 7 May 2014 17:43, Stephen Frost wrote:
> * Simon Riggs (si...@2ndquadrant.com) wrote:
>> IMHO we would not want to add indexes to every column, on every table, nor would we wish to use lookaside for all tables. It is a good thing to be able to add optimizations for individual tables. GPUs are not good for everything; it is good to be able to leverage their strengths, yet avoid their weaknesses.
>
> It's the optimizer's job to figure out which path to pick though, based on which will have the lowest cost.

Of course. I'm not suggesting otherwise.

>> If you do want that, you can write an Event Trigger that automatically adds a lookaside for any table.
>
> This sounds terribly ugly and like we're pushing optimization decisions on to the user instead of just figuring out what the best answer is.

I'm proposing that we use a declarative approach, just like we do when we say CREATE INDEX.

The idea is that we only consider a lookaside when a lookaside has been declared. Same as when we add an index, the optimizer considers whether to use that index. What we don't want to happen is that the optimizer considers a GIN plan, even when a GIN index is not available.

I'll explain it more at the developer meeting. It probably sounds a bit weird at first.

--
Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PGDLLEXPORTing all GUCs?
On Wed, May 7, 2014 at 1:08 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Wed, May 7, 2014 at 12:21 PM, Tom Lane wrote:
>>> If Craig has a concrete argument why all GUCs should be accessible to external modules, then let's see it (after which we'd better debate exposing the few that are in fact static in guc.c).
>>
>> I think there's actually a very good reason to think that GUCs are good candidates for this treatment, which is that, by definition, the GUC is a public interface: you can change it with a SET command.
>
> Sure, and we provide public APIs for accessing/setting GUCs. The SET side of that is most emphatically *not* "just set the C variable". Yeah, you can get away with reading them like that, assuming you want the internal representation not the user-visible one. In any case, I've not heard the use-case why all (and only) GUCs might need to be readable in that way.

My experience is that GUCs are a common thing to want to expose to extensions, and that C code usually wants the internal form, not the string form. I'm not arguing that nothing else needs to be exposed, but if there's a better argument possible for exposing the GUC variables than the fact that a bunch of people with experience developing PostgreSQL extensions view that as a real need, I can't think what it is.

--
Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] 9.4 checksum errors in recovery with gin index
On Wed, May 7, 2014 at 12:48 AM, Andres Freund wrote:
> Hi,
>
> On 2014-05-07 00:35:35 -0700, Jeff Janes wrote:
> > When recovering from a crash (with injection of a partial page write at time of crash) against 7c7b1f4ae5ea3b1b113682d4d I get a checksum verification failure.
> >
> > 16396 is a gin index.
>
> Over which type? What was the load? make check?

A gin index on text[].

The load is a variation of the crash recovery tester I've been using the last few years, this time adapted to use a gin index in a rather unnatural way. I just increment a counter on a random row repeatedly via a unique key, but for this purpose that unique key is stuffed into text[] along with a bunch of cruft. The cruft is text representations of negative integers, the actual key is text representation of nonnegative integers.

The test harness (patch to induce crashes, and two driving programs) and a preserved data directory are here:

https://drive.google.com/folderview?id=0Bzqrh1SO9FcESDZVeFk5djJaeHM&usp=sharing

(role jjanes, database jjanes)

As far as I can tell, this problem goes back to the beginning of page checksums.

> > If I have it ignore checksum failures, there is no apparent misbehavior. I'm trying to bisect it, but it could take a while and I thought someone might have some theories based on the log:
>
> If you have the WAL a pg_xlogdump grepping for everything referring to that block would be helpful.

The only record which mentions block 28486 by name is this one:

rmgr: Gin len (rec/tot): 1576/ 1608, tx: 77882205, lsn: 11/30F4C2C0, prev 11/30F4C290, bkp: , desc: Insert new list page, node: 1663/16384/16396 blkno: 28486

However, I think that that record precedes the recovery start point.

Cheers, Jeff
Re: [HACKERS] PGDLLEXPORTing all GUCs?
On 2014-05-07 13:08:52 -0400, Tom Lane wrote: > Robert Haas writes: > > On Wed, May 7, 2014 at 12:21 PM, Tom Lane wrote: > >> If Craig has a concrete argument why all GUCs should be accessible > >> to external modules, then let's see it (after which we'd better debate > >> exposing the few that are in fact static in guc.c). > > > I think there's actually a very good reason to think that GUCs are > > good candidates for this treatment, which is that, by definition, the > > GUC is a public interface: you can change it with a SET command. > > Sure, and we provide public APIs for accessing/setting GUCs. As strings. Not a useful representation for C code... And to avoid units getting tacked on you need to first get the config option number, then allocate an array on the stack, call GetConfigOptionByNum(), then parse the result into the underlying type. Meh. > The SET > side of that is most emphatically *not* "just set the C variable". > Yeah, you can get away with reading them like that, assuming you want > the internal representation not the user-visible one. In any case, > I've not heard the use-case why all (and only) GUCs might need to be > readable in that way. I think you're making up the 'and only' part. There's lots of variables that should/need to be exported. Just look at the amount of mess you cleaned up with various extensions not actually working on windows... Last time round you argued against exporting all variables. So Craig seems to have chosen a subset that's likely to be needed. > Again, I'm not arguing against a proposal that we should automatically > export all globally-declared variables for platform-levelling reasons. > I *am* saying that I find a proposal to do that just to GUCs to be > unsupported by any argument made so far. Well, then let's start discussing that proposal. I propose defining a 'pg_export' macro that's suitably defined by the build system.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PGDLLEXPORTing all GUCs?
Robert Haas writes: > On Wed, May 7, 2014 at 12:21 PM, Tom Lane wrote: >> If Craig has a concrete argument why all GUCs should be accessible >> to external modules, then let's see it (after which we'd better debate >> exposing the few that are in fact static in guc.c). > I think there's actually a very good reason to think that GUCs are > good candidates for this treatment, which is that, by definition, the > GUC is a public interface: you can change it with a SET command. Sure, and we provide public APIs for accessing/setting GUCs. The SET side of that is most emphatically *not* "just set the C variable". Yeah, you can get away with reading them like that, assuming you want the internal representation not the user-visible one. In any case, I've not heard the use-case why all (and only) GUCs might need to be readable in that way. Again, I'm not arguing against a proposal that we should automatically export all globally-declared variables for platform-levelling reasons. I *am* saying that I find a proposal to do that just to GUCs to be unsupported by any argument made so far. regards, tom lane
Re: [HACKERS] Schizophrenic coding in gin_extract_jsonb(_hash)
Heikki Linnakangas writes: > gin_extract_jsonb recursively extracts all the elements, keys and values > of any sub-object too, but JB_ROOT_COUNT only counts the top-level elements. Got it. So if the top level is empty, we can exit early, but otherwise we use its length * 2 as a guess at how big the output will be; which will be right if it's an object without further substructure, and otherwise might need enlargement. > (I hope this is made a bit more clear in the comments I added in the > patch I posted this morning) Didn't read that yet, but will incorporate this info into the jsonb_gin patch I'm working on. regards, tom lane
Re: [HACKERS] PGDLLEXPORTing all GUCs?
On Wed, May 7, 2014 at 12:21 PM, Tom Lane wrote: > Robert Haas writes: >> I don't accept this argument. In EnterpriseDB's Advanced Server fork >> of PostgreSQL, we've marked a bunch of extra things PGDLLEXPORT >> precisely because we have external modules that need access to them. > > Well, that's an argument for marking every darn global variable as > PGDLLEXPORT. But it's *not* an argument for marking GUCs in particular > that way. In particular, you are conveniently ignoring the point that > GUCs are much more likely to be global as an artifact of the way guc.c > is modularized than because we actually think they should be globally > accessible. > > If Craig has a concrete argument why all GUCs should be accessible > to external modules, then let's see it (after which we'd better debate > exposing the few that are in fact static in guc.c). Or if you want > to hang your hat on the platform-leveling argument, then we should be > re-debating exporting *all* global variables. But as far as the actually > proposed patch goes, all I'm hearing is very confused thinking. I think there's actually a very good reason to think that GUCs are good candidates for this treatment, which is that, by definition, the GUC is a public interface: you can change it with a SET command. It's certainly easier to imagine an extension wanting access to update_process_title than, say, criticalRelcachesBuilt. But maybe you're right and we should revisit the idea of exposing everything. A quick grep through src/include suggests that GUCs are a big percentage of what's not marked PGDLLEXPORT anyway, and the other stuff that's in there is stuff like PgStartTime and PostmasterPid which hardly seem like silly things to expose. I certainly think we should err on the side of exposing stuff that people think might be useful rather than pretending that we can stop them from using symbols by refusing to PGDLLEXPORT them. We can't. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [v9.5] Custom Plan API
* Simon Riggs (si...@2ndquadrant.com) wrote: > IMHO we would not want to add indexes to every column, on every table, > nor would we wish to use lookaside for all tables. It is a good thing > to be able to add optimizations for individual tables. GPUs are not > good for everything; it is good to be able to leverage their > strengths, yet avoid their weaknesses. It's the optimizer's job to figure out which path to pick though, based on which will have the lowest cost. > If do you want that, you can write an Event Trigger that automatically > adds a lookaside for any table. This sounds terribly ugly and like we're pushing optimization decisions onto the user instead of just figuring out what the best answer is. > I agree TupleTableSlot may not be best way for bulk data movement. We > probably need to look at buffering/bulk movement between executor > nodes in general, which would be of benefit for the FDW case also. > This would be a problem even for Custom Scans as originally presented > also, so I don't see much change there. Being able to do bulk movement would be useful, but (as I proposed months ago) being able to do asynchronous returns would be extremely useful also, when you consider FDWs and Append() - the main point there being that you want to keep the FDWs busy and working in parallel. Thanks, Stephen
Re: [HACKERS] PGDLLEXPORTing all GUCs?
On 2014-05-07 12:21:55 -0400, Tom Lane wrote: > Robert Haas writes: > > I don't accept this argument. In EnterpriseDB's Advanced Server fork > > of PostgreSQL, we've marked a bunch of extra things PGDLLEXPORT > > precisely because we have external modules that need access to them. > > Well, that's an argument for marking every darn global variable as > PGDLLEXPORT. But it's *not* an argument for marking GUCs in particular > that way. In particular, you are conveniently ignoring the point that > GUCs are much more likely to be global as an artifact of the way guc.c > is modularized than because we actually think they should be globally > accessible. GUCs in general are user configurable things. So it's not particularly unreasonable to assume that a significant fraction of them are of interest to extensions. And it's not like exporting them gives way too many additional dangers - they already can be overwritten. In fact, I am pretty sure that nearly all of these cases are about *reading* the underlying variable, not writing them. It's a darn sight less convenient and far slower to get the config variable as text and then convert it to the underlying type. > (after which we'd better debate exposing the few that are in fact > static in guc.c). I plan to do that for most of them - completely independently of this topic. I think 'export'ing a variable in several files is a horrible idea. > Or if you want > to hang your hat on the platform-leveling argument, then we should be > re-debating exporting *all* global variables. Yes. I am wondering whether that's not the most sensible course. It's a pita, but essentially we'd have to do a global s/export/pg_export/ in the headers. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Schizophrenic coding in gin_extract_jsonb(_hash)
On 05/07/2014 06:27 PM, Tom Lane wrote: I think you're just proving the point that this code is woefully underdocumented. If there were, somewhere, some comment explaining what the heck JB_ROOT_COUNT actually counts, maybe I wouldn't be asking this question. jsonb.h is certainly not divulging any such information. After having reverse-engineered the convertJsonb code, I think I can explain what JB_ROOT_COUNT is. If the root of the Jsonb datum is an array, it's the number of elements in that top-level array. If it's an object, it's the number of key/value pairs in that top-level object. Some of the elements of that array (or values of the object) can be arrays or objects themselves. gin_extract_jsonb recursively extracts all the elements, keys and values of any sub-object too, but JB_ROOT_COUNT only counts the top-level elements. (I hope this is made a bit more clear in the comments I added in the patch I posted this morning) - Heikki
Re: [HACKERS] [v9.5] Custom Plan API
* Simon Riggs (si...@2ndquadrant.com) wrote: > Agreed. My proposal is that if the planner allows the lookaside to an > FDW then we pass the query for full execution on the FDW. That means > that the scan, aggregate and join could take place via the FDW. i.e. > "Custom Plan" == lookaside + FDW How about we get that working for FDWs to begin with and then we can come back to this idea..? We're pretty far from join-pushdown or aggregate-pushdown to FDWs, last I checked, and having those would be a massive win for everyone using FDWs. Thanks, Stephen
Re: [HACKERS] PGDLLEXPORTing all GUCs?
Robert Haas writes: > I don't accept this argument. In EnterpriseDB's Advanced Server fork > of PostgreSQL, we've marked a bunch of extra things PGDLLEXPORT > precisely because we have external modules that need access to them. Well, that's an argument for marking every darn global variable as PGDLLEXPORT. But it's *not* an argument for marking GUCs in particular that way. In particular, you are conveniently ignoring the point that GUCs are much more likely to be global as an artifact of the way guc.c is modularized than because we actually think they should be globally accessible. If Craig has a concrete argument why all GUCs should be accessible to external modules, then let's see it (after which we'd better debate exposing the few that are in fact static in guc.c). Or if you want to hang your hat on the platform-leveling argument, then we should be re-debating exporting *all* global variables. But as far as the actually proposed patch goes, all I'm hearing is very confused thinking. regards, tom lane
Re: [HACKERS] PGDLLEXPORTing all GUCs?
On Wed, May 7, 2014 at 11:19 AM, Tom Lane wrote: > Andres Freund writes: >> On 2014-05-07 10:29:36 -0400, Tom Lane wrote: >>> To my mind, we PGDLLEXPORT some variable only after deciding that yeah, >>> we're okay with having third-party modules touching that. Craig's >>> proposal is to remove human judgement from that process. > >> It's just levelling the planefield between platforms. If I had an idea >> how I'd still like to just PGDDLIMPORT *all* 'export'ed variables on >> windows. >> The problem is that there's lot of variables which aren't exported and >> which we'll only discover after the release. Just look at what >> e.g. postgres_fdw needed. It's not particularly unlikely that others >> fdws need some of those as well. But they can't change the release at >> the same time. > > [ shrug... ] That problem is uncorrelated with GUC status, however. > If that's your argument for a patch, then the patch should DLLEXPORT > *every single non-static variable*. Which is a discussion we've already > had, and rejected. > > I'd not be against an automatic mechanism for that, and indeed put > considerable work into trying to make it happen a few months ago. But > I'll resist wholesale cluttering of the source code with those markers. > As long as we have to have them, I think we should use them in the way > I outlined, that we mark only variables that are "considered okay to > access". In fact, GUCs are exactly the *last* variables that should get > marked that way automatically, because so many of them are global only > because of the need for guc.c to communicate with the owning module, > not because we want anything else touching them. I don't accept this argument. In EnterpriseDB's Advanced Server fork of PostgreSQL, we've marked a bunch of extra things PGDLLEXPORT precisely because we have external modules that need access to them. 
Of course, what that means is that when PostgreSQL changes things around in a future release, we have to go revise those external modules as well. However, that just doesn't happen often enough to actually pose a significant problem for us. It's not really accurate to think that people are only going to rely on the things that we choose to PGDLLEXPORT. It's more accurate to think that when we don't mark things PGDLLEXPORT, we're forcing people to decide between (1) giving up on writing their extension, (2) having that extension not work on Windows, (3) submitting a patch to add a PGDLLEXPORT marking and waiting an entire release cycle for that to go G.A., and then still not being able to support older versions, or (4) forking PostgreSQL. That's an unappealing list of options. I would not go so far as to say that we should PGDLLEXPORT absolutely every non-static variable. But I think adding those markings to every GUC we've got is perfectly reasonable. I am quite sure that the 2ndQuadrant folks know that they'll be on the hook to update any code they write that uses those variables if and when a future version of PostgreSQL whacks things around. But that's not a new problem - the same thing happens when a function signature changes, or when a variable that does happen to have a PGDLLEXPORT marking changes. And at least in my experience it's also not a large problem. The amount of time EnterpriseDB spends updating our (large!) amount of proprietary code in response to such changes is a small percentage of our overall development time. Enabling extensibility is a far more important goal than keeping people from committing to interfaces that may change in the future, especially since the latter is a losing battle anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Schizophrenic coding in gin_extract_jsonb(_hash)
Peter Geoghegan writes: > On Tue, May 6, 2014 at 8:08 PM, Tom Lane wrote: >> The early-exit code path supposes that JB_ROOT_COUNT is absolutely >> reliable as an indicator that there's nothing in the jsonb value. >> On the other hand, the realloc logic inside the iteration loop implies >> that JB_ROOT_COUNT is just an untrustworthy estimate. Which theory is >> correct? And why is there not a comment to be seen anywhere? If the code >> is correct then this logic is certainly worthy of a comment or three. > JsonbIteratorNext() is passed "false" as its skipNested argument. It's > recursive. And? I think you're just proving the point that this code is woefully underdocumented. If there were, somewhere, some comment explaining what the heck JB_ROOT_COUNT actually counts, maybe I wouldn't be asking this question. jsonb.h is certainly not divulging any such information. regards, tom lane
Re: [HACKERS] PGDLLEXPORTing all GUCs?
Andres Freund writes: > On 2014-05-07 10:29:36 -0400, Tom Lane wrote: >> To my mind, we PGDLLEXPORT some variable only after deciding that yeah, >> we're okay with having third-party modules touching that. Craig's >> proposal is to remove human judgement from that process. > It's just levelling the planefield between platforms. If I had an idea > how I'd still like to just PGDDLIMPORT *all* 'export'ed variables on > windows. > The problem is that there's lot of variables which aren't exported and > which we'll only discover after the release. Just look at what > e.g. postgres_fdw needed. It's not particularly unlikely that others > fdws need some of those as well. But they can't change the release at > the same time. [ shrug... ] That problem is uncorrelated with GUC status, however. If that's your argument for a patch, then the patch should DLLEXPORT *every single non-static variable*. Which is a discussion we've already had, and rejected. I'd not be against an automatic mechanism for that, and indeed put considerable work into trying to make it happen a few months ago. But I'll resist wholesale cluttering of the source code with those markers. As long as we have to have them, I think we should use them in the way I outlined, that we mark only variables that are "considered okay to access". In fact, GUCs are exactly the *last* variables that should get marked that way automatically, because so many of them are global only because of the need for guc.c to communicate with the owning module, not because we want anything else touching them. regards, tom lane
Re: [HACKERS] PGDLLEXPORTing all GUCs?
On 2014-05-07 10:29:36 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2014-05-07 09:35:06 -0400, Tom Lane wrote: > >> Craig Ringer writes: > >>> Is there any reason _not_ to PGDLLEXPORT all GUCs, other than cosmetic > >>> concerns? > > >> That seems morally equivalent to "is there a reason not to make every > >> static variable global?". > > > I think what Craig actually tries to propose is to mark all GUCs > > currently exported in headers PGDLLIMPORT. > > There are few if any GUCs that aren't exposed in headers, just so that > guc.c can communicate with the owning modules. That doesn't mean that > we want everybody in the world messing with them. > > To my mind, we PGDLLEXPORT some variable only after deciding that yeah, > we're okay with having third-party modules touching that. Craig's > proposal is to remove human judgement from that process. It's just levelling the playing field between platforms. If I had an idea how, I'd still like to just PGDLLIMPORT *all* 'export'ed variables on windows. The problem is that there's a lot of variables which aren't exported and which we'll only discover after the release. Just look at what e.g. postgres_fdw needed. It's not particularly unlikely that other fdws need some of those as well. But they can't change the release at the same time. If we want to declare variables off limits to extension/external code we need a solution that works on !windows as well. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PGDLLEXPORTing all GUCs?
Tom Lane wrote: > Andres Freund writes: >> On 2014-05-07 09:35:06 -0400, Tom Lane wrote: >>> Craig Ringer writes: >>>> Is there any reason _not_ to PGDLLEXPORT all GUCs, other than cosmetic >>>> concerns? >>> That seems morally equivalent to "is there a reason not to make every >>> static variable global?". >> I think what Craig actually tries to propose is to mark all GUCs >> currently exported in headers PGDLLIMPORT. > There are few if any GUCs that aren't exposed in headers, just so that > guc.c can communicate with the owning modules. That doesn't mean that > we want everybody in the world messing with them. > To my mind, we PGDLLEXPORT some variable only after deciding that yeah, > we're okay with having third-party modules touching that. Craig's > proposal is to remove human judgement from that process. So third-party modules that use GUCs that are not PGDLLEXPORT are doing so improperly - even if it works for them because they only care about/test non-Windows platforms? David J.
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 7 May 2014 15:10, Merlin Moncure wrote: > The core issues are: > 1) There is no place to enter total system memory available to the > database in postgresql.conf > 2) Memory settings (except for the above) are given as absolute > amounts, not percentages. Those sound useful starting points. The key issue for me is that effective_cache_size is a USERSET. It applies per-query, just like work_mem (though work_mem is per query node). If we had "total system memory" we wouldn't know how to divide it up amongst users since we have no functionality for "workload management". It would be very nice to be able to tell Postgres that "I have 64GB RAM, use it wisely". At present, any and all users can set effective_cache_size and work_mem to any value they please, any time they wish and thus overuse available memory. Which is why I've had to write plugins to manage the memory allocations better in userspace. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 7 May 2014 15:07, Tom Lane wrote: > Simon Riggs writes: >> I think I'm arguing myself towards using a BufferAccessStrategy of >> BAS_BULKREAD for large IndexScans, BitMapIndexScans and >> BitMapHeapScans. > > As soon as you've got some hard evidence to present in favor of such > changes, we can discuss it. I've got other things to do besides > hypothesize. Now we have a theory to test, I'll write a patch and we can collect evidence for, or against. > In the meantime, it seems like there is an emerging consensus that nobody > much likes the existing auto-tuning behavior for effective_cache_size, > and that we should revert that in favor of just increasing the fixed > default value significantly. I see no problem with a value of say 4GB; > that's very unlikely to be worse than the pre-9.4 default (128MB) on any > modern machine. > > Votes for or against? +1 for fixed 4GB and remove the auto-tuning code. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers
On 05/07/2014 10:12 AM, Andres Freund wrote: On 2014-05-07 10:07:07 -0400, Tom Lane wrote: In the meantime, it seems like there is an emerging consensus that nobody much likes the existing auto-tuning behavior for effective_cache_size, and that we should revert that in favor of just increasing the fixed default value significantly. I see no problem with a value of say 4GB; that's very unlikely to be worse than the pre-9.4 default (128MB) on any modern machine. Votes for or against? +1 for increasing it to 4GB and remove the autotuning. I don't like the current integration into guc.c much and a new static default doesn't seem to be worse than the current autotuning. +1. If we ever want to implement an auto-tuning heuristic it seems we're going to need some hard empirical evidence to support it, and that doesn't seem likely to appear any time soon. cheers andrew
Re: [HACKERS] PGDLLEXPORTing all GUCs?
Andres Freund writes: > On 2014-05-07 09:35:06 -0400, Tom Lane wrote: >> Craig Ringer writes: >>> Is there any reason _not_ to PGDLLEXPORT all GUCs, other than cosmetic >>> concerns? >> That seems morally equivalent to "is there a reason not to make every >> static variable global?". > I think what Craig actually tries to propose is to mark all GUCs > currently exported in headers PGDLLIMPORT. There are few if any GUCs that aren't exposed in headers, just so that guc.c can communicate with the owning modules. That doesn't mean that we want everybody in the world messing with them. To my mind, we PGDLLEXPORT some variable only after deciding that yeah, we're okay with having third-party modules touching that. Craig's proposal is to remove human judgement from that process. regards, tom lane