Re: [HACKERS] 9.5 CF1

2014-07-14 Thread Abhijit Menon-Sen
Hi.

We're now at the nearly done stage of the first 9.5 CF.

Twenty-four patches have been committed so far, and five more are ready
for committer (and of those, four are small). Thirty patches are still
marked needs review, and twenty-one are waiting on author.

I'll go through the patches and follow up or move them to the 2014-08
CF as appropriate, so that this one can be closed out as scheduled.

Many thanks to everyone for their participation in this CF.

-- Abhijit




Re: [HACKERS] Allowing NOT IN to use ANTI joins

2014-07-14 Thread David Rowley
On Mon, Jul 14, 2014 at 3:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 David Rowley dgrowle...@gmail.com writes:
  I had another look at this and it appears you were right the first time,
 we
  need to ensure there's no NULLs on both sides of the join condition.

 Ugh.  I'm back to being discouraged about the usefulness of the
 optimization.


Are you worried about the planning overhead of the NOT NULL checks, or is
it more that you think there's a much smaller chance of a real-world
situation in which the optimisation will succeed? At least the planning
overhead is limited to queries that have NOT IN clauses.

I'm still quite positive about the patch. I think it would just be a
matter of modifying query_outputs_are_not_nullable(), giving it a nice new
name, and changing the parameter list to accept not only a Query but also a
List of Expr. This would likely be quite a nice reusable function that
could be used in a handful of other places in the planner to optimise
various other cases.

When I first decided to work on this I was more interested in getting some
planner knowledge about NOT NULL constraints than in speeding up NOT IN,
but it seemed like a perfect target, or even excuse, to draft up some code
that checks whether an expr can never be NULL.
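
For illustration, a sketch of the shape of query the patch targets (table
and column names here are hypothetical):

create table big (id int not null primary key);
create table sub (big_id int not null);
-- with NOT NULL provable on both sides of the condition, this could be
-- planned as an anti join rather than a hashed subplan:
select * from big where id not in (select big_id from sub);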

Since the patch has not been marked as rejected I was thinking that I'd
take a bash at fixing it up, but if you think this is a waste of time,
please let me know.

Regards

David Rowley


Re: [HACKERS] WAL replay bugs

2014-07-14 Thread Kyotaro HORIGUCHI
Hello. Let me apologize for continuing the discussion even though
the deadline is approaching.

At Fri, 11 Jul 2014 09:49:55 +0900, Michael Paquier michael.paqu...@gmail.com 
wrote in cab7npqtjezoz-fotibsejyg0eabngx2plqywodchyfxzfqy...@mail.gmail.com
 Updated patches attached.
 
 On Thu, Jul 10, 2014 at 7:13 PM, Kyotaro HORIGUCHI
 horiguchi.kyot...@lab.ntt.co.jp wrote:
 
  The usage of this is a bit irregular as an (extension) module, but
  that is the nature of this 'contrib'. The rearranged page type
  detection logic seems neater and keeps working as intended. This
  logic will be changed after the new page type detection scheme
  becomes ready in the other patch.
 
 No disk format changes will be allowed just to make page detection
 easier (Tom mentioned that earlier in this thread btw). We'll have to
 live with what the current code offers, 
 especially considering that adding
 new bytes for page detection for gin pages would double the size of
 its special area after applying MAXALIGN to it.

That's awkward, but I agree. By the way, I found that
PageHeaderData.pd_flags has 9 bits of room. Those seem usable if no
other usage is in sight right now, and if a formal method to identify
page types is worth the 3-4 bits.

# This is a separate discussion from this patch itself.

  - make check runs regress --use-existing, but IMHO the make
    target expected to do that is installcheck. I had mistakenly
    convinced myself that it should run the postgres built
    dedicatedly :(
 
 Yes, the patch is abusing that. --use-existing is specified in this
 case because the test itself is controlling Postgres servers to build
 and fetch the buffer captures. This allows more flexible machinery
 IMO.

Although I doubt the necessity of that flexibility given the current
testing framework, I don't have a strong objection to it.
Nevertheless, it would be appreciated if you put a notice in the
README or somewhere.

  - postgres processes are left running after
test_default(custom).sh has stopped halfway. This can be fixed
with the attached patch, but, to be honest, this seems too
much. I'll follow your decision whether or not to do this.
(bufcapt_test_sh_20140710.patch)
 
 I had considered that first, thinking that it was the user's
 responsibility if things are screwed up by his custom scripts. I
 guess the way you are doing it is a simple enough safeguard
 though, so I included it with some editing, particularly reporting to
 the user the error code returned by the test script.

Agreed.

  - test_default.sh is not only an example script that uses this
    facility, but also the test script for the facility itself.
    So I think it would be better to add some queries so that all
    possible page types are available for the default testing. What
    do you think about the attached patch?  (The hash index is
    unlogged, but I dared to put it in for clarity.)
 
 Yeah, having a default set of queries run just to let the user get an
 idea of how it works improves things.
 Once again thanks for taking the time to look at that.

Thank you.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] [v9.5] Custom Plan API

2014-07-14 Thread Shigeru Hanada
Kaigai-san,

The v3 patch had a conflict in src/backend/tcop/utility.c with the newly
added IMPORT FOREIGN SCHEMA statement, but it was trivial.

2014-07-08 20:55 GMT+09:00 Kouhei Kaigai kai...@ak.jp.nec.com:
 * System catalog name was changed to pg_custom_plan_provider;
   that reflects role of the object being defined.

ISTM that doc/src/sgml/custom-plan.sgml should also be renamed to
custom-plan-provider.sgml.

 * Also, prefix of its variable names are changed to cpp; that
   means custom-plan-provider.

A "custclass" remains in a comment in
src/include/catalog/pg_custom_plan_provider.h.

 * Syntax also reflects what the command does more. New syntax to
   define custom plan provider is:
 CREATE CUSTOM PLAN PROVIDER cpp_name
   FOR cpp_class HANDLER cpp_function;
 * Add custom-plan.sgml to introduce interface functions defined
   for path/plan/exec methods.
 * FinalizeCustomPlan() callback was simplified to support scan
   (and join in the future) at the starting point. As long as
   scan/join requirement, no need to control paramids bitmap in
   arbitrary way.
 * Unnecessary forward declaration in relation.h and plannode.h
   were removed, but a few structures still needs to have
   forward declarations.
 * Fix typos being pointed out.

Check.  I found some typos and a wording "datatype" which is not used
in any other place. Please refer to the attached patch.

-- 
Shigeru HANADA


fix_typo_in_v3.patch
Description: Binary data



[HACKERS] Re: issue log message to suggest VACUUM FULL if a table is nearly empty

2014-07-14 Thread Abhijit Menon-Sen
Hi.

Do we have any consensus on this patch?

I took a quick look at it because no review was posted. The patch itself
does what it claims to, but from the discussion it doesn't seem like we
want the feature; or perhaps we only don't want it in its present form.

So which is more appropriate, "returned with feedback" or "rejected"?

(In the latter case, the TODO item should also be removed.)

-- Abhijit




Re: [HACKERS] pg_shmem_allocations view

2014-07-14 Thread Abhijit Menon-Sen
At 2014-05-08 15:28:22 +0200, and...@2ndquadrant.com wrote:

   Hm. Not sure what you're ACKing here ;).
  
  The idea of giving the unallocated memory a NULL key.
 
 Ok. A new version of the patches implementing that is attached.
 Including a couple of small fixups and docs. The latter aren't
 extensive, but that doesn't seem to be warranted anyway.

I realise now that this email didn't actually have an attachment. Could
you please repost the latest version of this patch?

Thanks.

-- Abhijit




Re: [HACKERS] Incorrect comment in postgres_fdw.c

2014-07-14 Thread Shigeru Hanada
Fujita-san,

2014-07-11 18:22 GMT+09:00 Etsuro Fujita fujita.ets...@lab.ntt.co.jp:
 I think the following comment for store_returning_result() in
 postgres_fdw.c is not right.

 /* PGresult must be released before leaving this function. */

 I think PGresult should not be released before leaving this function *on
 success* in that function.

 (I guess the comment has been copied and pasted from that for
 get_remote_estimate().)

+1 for just removing the comment, because the header comment clearly
mentions the necessity of releasing the PGresult.


-- 
Shigeru HANADA




Re: [HACKERS] Use unique index for longer pathkeys.

2014-07-14 Thread Kyotaro HORIGUCHI
Thank you for reviewing; the revised patch will come later.

At Mon, 14 Jul 2014 11:01:52 +0530, Amit Kapila amit.kapil...@gmail.com wrote 
in caa4ek1+6b6wjwf51ozmrl+mkfh8xup9j-pehqvor8se7swy...@mail.gmail.com
 On Fri, Jun 13, 2014 at 1:11 PM, Kyotaro HORIGUCHI 
 horiguchi.kyot...@lab.ntt.co.jp wrote:
 
  Hello, This is the continuation from the last CF.
 
  This patch intends to make PG use an index for pathkeys longer
  than the index columns when,
 
   - The index is a unique index.
   - All index columns are NOT NULL.
   - The index column list is a subset of query_pathkeys.
 
  
 
  This patch consists of mainly three parts.
(snip)
 
 I have spent some time on this patch and would like to share my
 findings as below with you.
 
 1.
 It seems to me that the new flag (IndexOptInfo.full_ordered) introduced in
 this patch is not getting set properly.  Please refer below test:

Yeah, you probably omitted the second condition above.

 create table t (a int, b int, c int, d text);

The following definition instead will make this work. NULLs
cannot be unique.

| create table t (a int not null, b int not null, c int, d text);

 create unique index i_t_pkey on t(a, b);
 insert into t (select a % 10, a / 10, a, 't' from generate_series(0,
 10) a);
 analyze;
 explain (costs off, analyze on) select distinct * from t;
...
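
(For reference, a corrected test sequence would look something like the
following sketch; the insert row count is illustrative, not the original:)

create table t (a int not null, b int not null, c int, d text);
create unique index i_t_pkey on t (a, b);
insert into t select a % 10, a / 10, a, 't' from generate_series(0, 99999) a;
analyze t;
explain (costs off) select distinct * from t;
-- if the patch works as described, the unique index over the NOT NULL
-- columns (a, b) lets the planner trim the distinct pathkeys from
-- (a, b, c, d) down to just (a, b).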
 2.
 typedef struct IndexOptInfo
   bool predOK; /* true if predicate matches query */
   bool unique; /* true if a unique index */
   bool immediate; /* is uniqueness enforced immediately? */
 + bool full_ordered; /* don't key columns duplicate? */
 
 It seems you have forgotten to update the out function _outIndexOptInfo().

Mmm, that's surely not what I intended to make :( , but the
member actually works in collect_common_primary_pathkeys.

The correct (desirable) code should use (allnotnull && unique)
instead of (full_ordered). I'll fix this in the coming version
later, but they are equivalent in functionality. It's not a
mistake of forgetting to update the out function (which would be
broken), but of branching from the wrong point (so it works
contrary to the expectation) :)

 3.
 + root->distinct_pathkeys =
 + shorten_pathkeys_following(root, root->distinct_pathkeys,pk_guide,
 +   true);
 +
 + root->query_pathkeys =
 + shorten_pathkeys_following(root,root->query_pathkeys,pk_guide,
 +   true);
 
 Add a space after each parameter in function call.

Thank you for pointing that out.

 4.
 +PathKeysComparison
 +compare_pathkeys_ignoring_strategy(List *keys1, List *keys2)
 
 a. This function returns 4 different types of values (PATHKEYS_EQUAL,
    PATHKEYS_DIFFERENT, PATHKEYS_BETTER1, PATHKEYS_BETTER2),
    however the caller just checks for PATHKEYS_EQUAL; isn't it better to
    just return either PATHKEYS_EQUAL or PATHKEYS_DIFFERENT?

Hmm.. I agree that it is practically removable, but I think it is
necessary from the viewpoint of similarity to the function with the
similar name. Perhaps I want another name which doesn't suggest
similarity to compare_pathkeys(), say bool
pathkeys_are_equal_ignoring_strategy(), and to change the set of
return values.

 b. Another idea could be that instead of writing a separate function,
    pass an additional parameter to compare_pathkeys() to distinguish
    the ignore-strategy case.

It sounds reasonable. According to my faint memory, the reason
why the function was separated from the original one is perhaps
simply an editorial one.

 c. In any case, I think it is good to add comments on top of the newly
    introduced function (compare_pathkeys_ignoring_strategy) or extend
    the comments of compare_pathkeys() if you want to add a new parameter
    to it.

Ouch, thank you for pointing that out. I'll add a comment if it
remains in the next patch.

  Finally, the new patch has grown to twice the size.. Although
  it seems a bit large, I think that side-effects are clearly
  avoided.
 
 I am a bit worried about the extra cycles added by this patch as compared
 to your previous patch; for example:
 During trimming of paths, it will build index pathkeys (build_index_pathkeys())
 which it will anyway do again during creation of index paths:
 create_index_paths()->get_index_paths()->build_index_paths()->build_index_pathkeys()

I felt the same thing. One stupid measure to reduce cycles would
be caching the created pathkeys in IndexOptInfo. I'll try that in the
next patch.

 I am not sure if this has a direct impact, however I suspect the trimming
 logic can add quite a few instructions even for cases where it is not
 beneficial.
 At this moment, I also don't have a better idea, so let's proceed with the
 review of this patch unless there is any other better way to achieve it.

I suppose there's some shortcut which can exclude the cases where
pathkey trimming obviously has no use, for example by using only the
pathkey lengths. I'll look for it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] Incorrect comment in postgres_fdw.c

2014-07-14 Thread Fujii Masao
On Mon, Jul 14, 2014 at 7:31 PM, Shigeru Hanada
shigeru.han...@gmail.com wrote:
 Fujita-san,

 2014-07-11 18:22 GMT+09:00 Etsuro Fujita fujita.ets...@lab.ntt.co.jp:
 I think the following comment for store_returning_result() in
 postgres_fdw.c is not right.

 /* PGresult must be released before leaving this function. */

 I think PGresult should not be released before leaving this function *on
 success* in that function.

 (I guess the comment has been copied and pasted from that for
 get_remote_estimate().)

 +1 for just removing the comment, because the header comment clearly
 mentions the necessity of releasing the PGresult.

Committed. Thanks for the report!

Regards,

-- 
Fujii Masao




Re: [HACKERS] Re: issue log message to suggest VACUUM FULL if a table is nearly empty

2014-07-14 Thread Tom Lane
Abhijit Menon-Sen a...@2ndquadrant.com writes:
 Do we have any consensus on this patch?

 I took a quick look at it because no review was posted. The patch itself
 does what it claims to, but from the discussion it doesn't seem like we
 want the feature; or perhaps we only don't want it in its present form.

Re-reading the thread quickly, it seemed like there was considerable
pushback about the cost of collecting the stats, and worries about whether
it wouldn't just be log spam.  But I think the opposite of the latter
concern is also valid, namely that people who could benefit from the
warning are not going to see it because they don't peruse the postmaster
log carefully/at all.  That's a generic problem for warning messages
emitted by background tasks, which we ought to think about how to fix.
In the meantime though it seems like this patch is far more likely to be
annoying than helpful.

 So which is more appropriate, "returned with feedback" or "rejected"?
 (In the latter case, the TODO item should also be removed.)

I'd vote for rejecting and annotating the TODO item with a link to
this thread.  And maybe taking off the "easy" notation.  I think the
TODO item is reflecting a real usability issue, but solving it usefully
is quite a bit harder than it looks.

regards, tom lane




Re: [HACKERS] No exact/lossy block information in EXPLAIN ANALYZE for a bitmap heap scan

2014-07-14 Thread Fujii Masao
On Fri, Jul 11, 2014 at 7:21 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Jul 11, 2014 at 5:45 PM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp wrote:
 I've noticed that EXPLAIN ANALYZE shows no information on exact/lossy
 blocks for a bitmap heap scan when both the numbers of exact/lossy pages
 retrieved by the node are zero.  Such an example is shown below.  I
 think it would be better to suppress the 'Heap Blocks' line in that
 case, based on the same idea of the 'Buffers' line.  Patch attached.
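
A sketch of the kind of case in question (table name hypothetical):

create table t (a int);
create index on t (a);
insert into t select generate_series(1, 10000);
vacuum analyze t;
set enable_seqscan = off;
set enable_indexscan = off;
explain (analyze, costs off) select * from t where a = -1;
-- the bitmap heap scan fetches no heap pages, so the exact and lossy
-- counts are both zero and the 'Heap Blocks' line carried no information;
-- with the patch it is suppressed, like the 'Buffers' line.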

 The patch looks good to me. Barring any objection, I will commit this both
 in HEAD and 9.4.

Committed!

Regards,




Re: [HACKERS] Re: issue log message to suggest VACUUM FULL if a table is nearly empty

2014-07-14 Thread Abhijit Menon-Sen
At 2014-07-14 07:55:34 -0400, t...@sss.pgh.pa.us wrote:

 I'd vote for rejecting and annotating the TODO item with a link to
 this thread.

I've marked the patch as rejected and edited the TODO list to remove the
"easy". There was already a link to this thread there.

Thank you.

-- Abhijit




Re: [HACKERS] 9.5 CF1

2014-07-14 Thread Fabrízio de Royes Mello
On Mon, Jul 14, 2014 at 3:21 AM, Abhijit Menon-Sen a...@2ndquadrant.com
wrote:

 Hi.

 We're now at the nearly done stage of the first 9.5 CF.

 Twenty-four patches have been committed so far, and five more are ready
 for committer (and of those, four are small). Thirty patches are still
 marked needs review, and twenty-one are waiting on author.


Actually, thirty-one are marked "needs review", because I already sent a
new reviewed patch but forgot to mark it "needs review"... sorry!


Greetings,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] Allowing NOT IN to use ANTI joins

2014-07-14 Thread Tom Lane
David Rowley dgrowle...@gmail.com writes:
 On Mon, Jul 14, 2014 at 3:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Ugh.  I'm back to being discouraged about the usefulness of the
 optimization.

 Are you worried about the planning overhead of the NOT NULL checks, or is
 it more that you think there's a much smaller chance of a real-world
 situation in which the optimisation will succeed?

Both.  We need to look at how much it costs the planner to run these
checks, and think about how many real queries it will help for.  The
first is quantifiable, the second probably not so much :-( but we still
need to ask the question.

regards, tom lane




Re: [HACKERS] Use unique index for longer pathkeys.

2014-07-14 Thread Amit Kapila
On Mon, Jul 14, 2014 at 4:02 PM, Kyotaro HORIGUCHI 
horiguchi.kyot...@lab.ntt.co.jp wrote:
 At Mon, 14 Jul 2014 11:01:52 +0530, Amit Kapila amit.kapil...@gmail.com
wrote in caa4ek1+6b6wjwf51ozmrl+mkfh8xup9j-pehqvor8se7swy...@mail.gmail.com

  On Fri, Jun 13, 2014 at 1:11 PM, Kyotaro HORIGUCHI 
  horiguchi.kyot...@lab.ntt.co.jp wrote:
  I am a bit worried about the extra cycles added by this patch as compared
  to your previous patch; for example:
  During trimming of paths, it will build index pathkeys
(build_index_pathkeys())
  which it will anyway do again during creation of index paths:
 
create_index_paths()->get_index_paths()->build_index_paths()->build_index_pathkeys()

 I felt the same thing. One stupid measure to reduce cycles would
 be caching the created pathkeys in IndexOptInfo. I'll try that in the
 next patch.

I suggest waiting for the overall review of the patch before trying this
out; we can see what the best way to avoid it is, if required, once the
rest of the patch is reviewed and proved to be problem-free.

Other than this, I agree with the other points you mentioned in the mail,
and maybe you can just handle those in the next version of your patch.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [PATCH] Fix search_path default value separator.

2014-07-14 Thread Robert Haas
On Fri, Jul 11, 2014 at 6:09 AM, Christoph Martin
christoph.r.mar...@gmail.com wrote:
 I noticed a minor inconsistency with the search_path separator used in the
 default configuration.

 The schemas of any search_path set using `SET search_path TO...` are
 separated by ", " (comma, space), while the default value is only
 separated by "," (comma).

 The attached patch against master changes the separator of the default value
 to be consistent with the usual comma-space separators, and updates the
 documentation of `SHOW search_path;` accordingly.

 This massive three-byte change passes all 144 tests of make check.

Heh.  I'm not particularly averse to changing this, but I guess I
don't see any particular benefit of changing it either.  Either comma
or comma-space is a legal separator, so why worry about it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Over-optimization in ExecEvalWholeRowVar

2014-07-14 Thread Merlin Moncure
On Fri, Jul 11, 2014 at 2:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 This example in the regression database is a simplified version of a bug
 I was shown off-list:

 regression=# select (
 select q from
 ( select 1,2,3 where f1<>0
   union all
   select 4,5,6.0 where f1=0
 ) q
 )
 from int4_tbl;
 ERROR:  record type has not been registered

Thanks Tom!

If anybody gets hit by this, the workaround I use is to put the inner
subquery as a CTE:

 select (
 select q from
 (
 with data as (
 select 1,2,3 where f1<>0
 union all
 select 4,5,6.0 where f1=0
   )
   select * from data
 ) q
 )
 from int4_tbl;

merlin




Re: [HACKERS] [PATCH] Fix search_path default value separator.

2014-07-14 Thread Christoph Martin
True. Both variants are legal, and most people won't ever notice. I
stumbled across this while writing a test case for a transaction helper
that sets/restores search_path before committing. The test was to simply
compare the string values of `SHOW search_path;` before `BEGIN
TRANSACTION;` and after `COMMIT;`.

It's a non-issue, really, but since there's a patch and I cannot come up
with a more common use case that would depend on the use of just-comma
separators in the default value, I'd say it's more a question of "why
not" than of "why", isn't it?
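
For illustration, the mismatch looks like this in psql (a sketch, with the
results shown as comments):

show search_path;                     -- "$user",public
set search_path to "$user", public;
show search_path;                     -- "$user", public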


On 14 July 2014 16:58, Robert Haas robertmh...@gmail.com wrote:

 On Fri, Jul 11, 2014 at 6:09 AM, Christoph Martin
 christoph.r.mar...@gmail.com wrote:
  I noticed a minor inconsistency with the search_path separator used in
  the default configuration.
 
  The schemas of any search_path set using `SET search_path TO...` are
  separated by ", " (comma, space), while the default value is only
  separated by "," (comma).
 
  The attached patch against master changes the separator of the default
  value to be consistent with the usual comma-space separators, and
  updates the documentation of `SHOW search_path;` accordingly.
 
  This massive three-byte change passes all 144 tests of make check.

 Heh.  I'm not particularly averse to changing this, but I guess I
 don't see any particular benefit of changing it either.  Either comma
 or comma-space is a legal separator, so why worry about it?

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



Re: [HACKERS] things I learned from working on memory allocation

2014-07-14 Thread Robert Haas
On Sun, Jul 13, 2014 at 2:23 PM, Andres Freund and...@2ndquadrant.com wrote:
 The actual "if (lock != NULL)" bit costs significant amounts of cycles?
 I'd have assumed that branch prediction takes care of that. Or is it
 actually the icache not keeping up? Did you measure icache vs. dcache
 misses?
 Have you played with unlikely()/likely() type of macros?

I have not.  I think it's really got more to do with how much stuff
needs to be saved in the stack frame, but I might be wrong about that.

 Unfortunately, there's some incremental overhead
 in pfree, amounting to a single test of global variable, even when the
 new allocator is never used, which is measurable on a microbenchmark;
 I don't remember the exact numbers off-hand.

 Hm. Why's that needed? Because you're searching for your allocator's
 superblocks in pfree()? I guess you don't store information about the
 size/context for every allocation like aset.c does?

Since the chunks don't have a StandardChunkHeader, pfree has got to
decide whether it's safe to look at the 16 bytes preceding the chunk
before doing so.  If the new allocator's not in use (single global
variable test) that's pretty cheap, but not so cheap as you might
hope.  I found that it was possible to buy back most of the cost by
replacing (*context->methods->free_p) (context, pointer); with a
hard-coded AllocSetFree(context, pointer), so that gives you some idea
what order of magnitude we're talking about here.

 When the new allocator
 *is* used, pfree gets a lot slower - mostly because one of the data
 structures used by the new allocator suffers occasional cache misses
 during free operations.  I think there might be an argument that some
 hit on pfree would be worth taking in exchange for better
 space-efficiency, because we mostly free contexts by resetting them
 anyway; but I haven't yet managed to make that hit small enough to
 feel especially good about it.

 The speed bump is hit when pfree()ing an allocation that's been allocated
 with the new allocator or also when there's any concurrent activity for
 that allocator?

The latter.

 3. The current MemoryContext abstraction layer is inadequate for
 almost everything one might want to do.  The idea of a context that
 allows only allocation, and ignores requests to free memory, has been
 discussed more than once.  Such an allocator could skip the
 power-of-two rounding that aset.c does, but it couldn't omit the chunk
 header, which means that in practice it would save no memory at all
 for cases like the one mentioned above (REINDEX
 pgbench_accounts_pkey).

 I personally think that the performance benefit of not doing the
 rounding, not accessing the freelist, and such is more interesting for
 such an allocator than the memory savings. I'm pretty sure such an
 'allocation only' allocator for e.g. the parser and parts of the
 executor would be a rather noticeable performance benefit.

I don't have any reason to believe that.  All palloc() does in the
common case is bump the boundary pointer and stick the chunk header in
there.  You're not going to be able to make that much faster.

 But even for the cases where the space savings are the important bit: To
 me it sounds feasible to declare that some allocators don't allow
 reallocations and freeing. Those then would be allowed to omit the chunk
 header.  To make that debuggable we would need to add a chunk header
 when assertions are enabled to error out when such an operation is
 performed - but that's an acceptable price imo.

Hmm, yeah, that could be done.  However, it does have the disadvantage
of requiring you to affirmatively avoid pfreeing anything that might
have come from such a context, rather than simply making pfree a noop.

 Btw, am I the only one that wondered if it wouldn't be rather
 beneficial to make aset.c add the chunk header before rounding?

Well, it would help in some cases, hurt in others, and make no
difference at all in still others.  I mean, it just has to do with how
well your size classes fit your actual memory allocation pattern;
you'd be effectively changing the size classes from 32, 64, 128, 256
to 16, 48, 112, 240, and that could be a win or a loss depending on the
actual allocation pattern.  I'm sure it's pretty trivial to construct
a test case favoring either outcome.
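
To make that concrete with the 16-byte chunk header: a 40-byte request
today rounds to a 64-byte chunk plus the header, 80 bytes in total, while
rounding after adding the header would store 40 + 16 = 56 rounded to 64
bytes, saving 16; but a 120-byte request goes from 128 + 16 = 144 bytes
total to 120 + 16 = 136 rounded up to 256, wasting far more.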

 5. It's tempting to look at other ways of solving the parallel sort
 problem that don't need an allocator - perhaps by simply packing all
 the tuples into a DSM one after the next.  But this is not easy to do,
 or at least it's not easy to do efficiently.  Tuples enter tuplesort.c
 through one of the tuplesort_put*() functions, most of which end up
 calling one of the copytup_*() functions.  But you can't easily just
 change those functions to stick the tuple someplace else, because they
 don't necessarily get the address of the space to be used from palloc
 directly.  In particular, copytup_heap() calls
 ExecCopySlotMinimalTuple(), and copytup_cluster() calls
 heap_copytuple().  

Re: [HACKERS] Pg_upgrade and toast tables bug discovered

2014-07-14 Thread Robert Haas
On Fri, Jul 11, 2014 at 9:55 AM, Bruce Momjian br...@momjian.us wrote:
 On Fri, Jul 11, 2014 at 09:48:06AM -0400, Bruce Momjian wrote:
  Uh, why does this need to be in ALTER TABLE?  Can't this be part of
  table creation done by pg_dump?

 Uh, I think you need to read the thread.  We have to delay the toast
 creation part so we don't use an oid that will later be required by
 another table from the old cluster.  This has to be done after all
 tables have been created.

 We could have pg_dump spit out those ALTER lines at the end of the dump,
 but it seems simpler to do it in pg_upgrade.

 Even if we have pg_dump create all the tables that require pre-assigned
 TOAST oids first, then the other tables that _might_ need a TOAST table,
 those later tables might create a toast oid that matches a later
 non-TOAST-requiring table, so I don't think that fixes the problem.

 What would be nice is if I could mark just the tables that will need
 toast tables created in that later phase (those tables that didn't have
 a toast table in the old cluster, but need one in the new cluster).
 However, I can't see where to store that or how to pass that back into
 pg_upgrade.  I don't see a logical place in pg_class to put it.

reloptions?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] things I learned from working on memory allocation

2014-07-14 Thread Andres Freund
On 2014-07-14 11:24:26 -0400, Robert Haas wrote:
 On Sun, Jul 13, 2014 at 2:23 PM, Andres Freund and...@2ndquadrant.com wrote:
  The actual "if (lock != NULL)" bit costs significant amounts of cycles?
  I'd have assumed that branch prediction takes care of that. Or is it
  actually the icache not keeping up? Did you measure icache vs. dcache
  misses?
  Have you played with unlikely()/likely() type of macros?
 
 I have not.  I think it's really got more to do with how much stuff
 needs to be saved in the stack frame, but I might be wrong about that.

I don't really see why that'd play such a big role. The register
pressure on ppc/amd64 shouldn't be high enough that the (lock != NULL)
alone requires pushing anything onto the stack. Sure, the call to
LWLockAcquire() will, but if that's done in a separate branch...

  Unfortunately, there's some incremental overhead
  in pfree, amounting to a single test of global variable, even when the
  new allocator is never used, which is measurable on a microbenchmark;
  I don't remember the exact numbers off-hand.
 
  Hm. Why's that needed? Because you're searching for your allocator's
  superblocks in pfree()? I guess you don't store information about the
  size/context for every allocation like aset.c does?
 
 Since the chunks don't have a StandardChunkHeader, pfree has got to
 decide whether it's safe to look at the 16 bytes preceding the chunk
 before doing so.  If the new allocator's not in use (single global
 variable test) that's pretty cheap, but not so cheap as you might
 hope.

That's what I was afraid of :/

 I found that it was possible to buy back most of the cost by
 replacing (*context->methods->free_p) (context, pointer); with a
 hard-coded AllocSetFree(context, pointer), so that gives you some idea
 what order of magnitude we're talking about here.

That's measured with a microbenchmark or with actual postgres workloads?
Because in my measurements there wasn't a consistent benefit in doing so,
even when benchmarking workloads where allocation is a bottleneck.

  3. The current MemoryContext abstraction layer is inadequate for
  almost everything one might want to do.  The idea of a context that
  allows only allocation, and ignores requests to free memory, has been
  discussed more than once.  Such an allocator could skip the
  power-of-two rounding that aset.c does, but it couldn't omit the chunk
  header, which means that in practice it would save no memory at all
  for cases like the one mentioned above (REINDEX
  pgbench_accounts_pkey).
 
  I personally think that the performance benefit of not doing the
  rounding, not accessing the freelist, and such is more interesting for
  such an allocator than the memory savings. I'm pretty sure such an
  'allocation only' allocator for e.g. the parser and parts of the
  executor would be a rather noticeable performance benefit.
 
 I don't have any reason to believe that.  All palloc() does in the
 common case is bump the boundary pointer and stick the chunk header in
 there.  You're not going to be able to make that much faster.

In my profiling, the access to the freelist (to test whether there are free
chunks in there) actually is a noticeable stall. Removing it makes things
measurably faster. It also adds memory overhead :) If you look at it, a
simple AllocSetAlloc() will currently access three cachelines inside
AllocSetContext - with some care that can be one.

  But even for the cases where the space savings are the important bit: To
  me it sounds feasible to declare that some allocators don't allow
  reallocations and freeing. Those then would be allowed to omit the chunk
  header.  To make that debuggable we would need to add a chunk header
  when assertions are enabled to error out when such an operation is
  performed - but that's an acceptable price imo.
 
 Hmm, yeah, that could be done.  However, it does have the disadvantage
 of requiring you to affirmatively avoid pfreeing anything that might
 have come from such a context, rather than simply making pfree a noop.

Yep. Sounds feasible for a number of cases imo. I have serious doubts
that adding a measurable cost to pfree()s for allocations from all
contexts is going to be fun. There are some operators that do a lot of
those - especially inside btree opclasses, which have to do so. Since
those are the ones you'll often be calling from parallel sort...

  Btw, am I the only one that wondered if it wouldn't be rather
  beneficial to make aset.c add the chunk header before rounding?
 
 Well, it would help in some cases, hurt in others, and make no
 difference at all in still others.  I mean, it just has to do with how
 well your size classes fit your actual memory allocation pattern;
 you'd be effectively changing the size classes from 32, 64, 128, 256
  to 16, 48, 112, 240, and that could be a win or a loss depending on the
  actual allocation pattern.  I'm sure it's pretty trivial to construct
 a test case favoring either outcome.

I was actually thinking about it 

Re: [HACKERS] 9.4 logical replication - walsender keepalive replies

2014-07-14 Thread Steve Singer

On 07/06/2014 10:11 AM, Andres Freund wrote:

Hi Steve,



Right. I thought about this for a while, and I think we should change
two things. For one, don't request replies here. It's simply not needed,
as this isn't dealing with timeouts. For another don't just check ->flush
< sentPtr but also ->write < sentPtr. The reason we're sending these
feedback messages is to inform the 'logical standby' that there's been
WAL activity which it can't see because they don't correspond to
anything that's logically decoded (e.g. vacuum stuff).
Would that suit your needs?

Greetings,


Yes I think that will work for me.
I tested with the attached patch, which I think does what you describe,
and it seems okay.








diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
new file mode 100644
index 3189793..844a5de
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
*** WalSndWaitForWal(XLogRecPtr loc)
*** 1203,1211 
  		 * possibly are waiting for a later location. So we send pings
  		 * containing the flush location every now and then.
  		 */
! 		if (MyWalSnd->flush < sentPtr && !waiting_for_ping_response)
  		{
! 			WalSndKeepalive(true);
  			waiting_for_ping_response = true;
  		}
  
--- 1203,1213 
  		 * possibly are waiting for a later location. So we send pings
  		 * containing the flush location every now and then.
  		 */
! 		if (MyWalSnd->flush < sentPtr &&
! 			MyWalSnd->write < sentPtr &&
! 			!waiting_for_ping_response)
  		{
! 			WalSndKeepalive(false);
  			waiting_for_ping_response = true;
  		}
  



Re: [HACKERS] Pg_upgrade and toast tables bug discovered

2014-07-14 Thread Fabrízio de Royes Mello
On Mon, Jul 14, 2014 at 12:26 PM, Robert Haas robertmh...@gmail.com wrote:

 On Fri, Jul 11, 2014 at 9:55 AM, Bruce Momjian br...@momjian.us wrote:
  On Fri, Jul 11, 2014 at 09:48:06AM -0400, Bruce Momjian wrote:
   Uh, why does this need to be in ALTER TABLE?  Can't this be part of
   table creation done by pg_dump?
 
  Uh, I think you need to read the thread.  We have to delay the toast
  creation part so we don't use an oid that will later be required by
  another table from the old cluster.  This has to be done after all
  tables have been created.
 
  We could have pg_dump spit out those ALTER lines at the end of the
dump,
  but it seems simpler to do it in pg_upgrade.
 
  Even if we have pg_dump create all the tables that require pre-assigned
  TOAST oids first, then the other tables that _might_ need a TOAST
table,
  those later tables might create a toast oid that matches a later
  non-TOAST-requiring table, so I don't think that fixes the problem.
 
  What would be nice is if I could mark just the tables that will need
  toast tables created in that later phase (those tables that didn't have
  a toast table in the old cluster, but need one in the new cluster).
  However, I can't see where to store that or how to pass that back into
  pg_upgrade.  I don't see a logical place in pg_class to put it.

 reloptions?


Is this another use case for the custom reloptions idea?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-07-14 Thread Peter Geoghegan
On Thu, Jun 12, 2014 at 2:09 PM, Peter Geoghegan p...@heroku.com wrote:
 Right. It was a little bit incautious of me to say that we had the
 full benefit of 8 bytes of storage with en_US.UTF-8, since that is
 only true of lower case characters (I think that FreeBSD can play
 tricks with this. Sometimes, it will give you the benefit of 8 bytes
 of entropy for an 8 byte string, with only non-differentiating
 trailing bytes, so that the first 8 bytes of Aaaa are distinct
 from the first eight bytes of , while any trailing bytes are
 non-distinct for both). In any case it's pretty clear that a goal of
 the glibc implementation is to concentrate as much entropy as possible
 into the first part of the string, and that's the important point.

I thought about using an order-preserving compression technique like
Hu Tucker [1] in order to get additional benefit from those 8 bytes.
Even my apparently sympathetic cities example isn't all that
sympathetic, since the number of distinct normalized keys is only
about 75% of the total number of cities (while a strcmp()-only
reliable tie-breaker isn't expected to be useful for the remaining
25%). Here is the improvement I see when I setup things so that there
is a 1:1 correspondence between city rows and distinct normalized
keys:

postgres=# with cc as
(select count(*), array_agg(ctid) ct,
substring(strxfrm_test(city)::text from 0 for 19 ) blob from cities
group by 3 having count(*) > 1 order by 1),
ff as
(
  select unnest(ct[2:400]) u, blob from cc
)
delete from cities using ff where cities.ctid = ff.u;

postgres=# vacuum full cities ;
VACUUM

Afterwards:

postgres=# select count(*) from (select distinct
substring(strxfrm_test(city)::text from 0 for 19) from cities) i;
 count
--------
 243782
(1 row)

postgres=# select count(*) from cities;
 count
--------
 243782
(1 row)

$ cat sort-city.sql
select * from (select * from cities order by city offset 100) d;

Patch results
==

pgbench -M prepared -f sort-city.sql -T 300 -n

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 300 s
number of transactions actually processed: 1734
latency average: 173.010 ms
tps = 5.778545 (including connections establishing)
tps = 5.778572 (excluding connections establishing)

pgbench -M prepared -j 4 -c 4 -f sort-city.sql -T 300 -n

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 4
number of threads: 4
duration: 300 s
number of transactions actually processed: 2916
latency average: 411.523 ms
tps = 9.706683 (including connections establishing)
tps = 9.706776 (excluding connections establishing)

Master results
==

pgbench -M prepared -f sort-city.sql -T 300 -n

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 300 s
number of transactions actually processed: 390
latency average: 769.231 ms
tps = 1.297545 (including connections establishing)
tps = 1.297551 (excluding connections establishing)

pgbench -M prepared -j 4 -c 4 -f sort-city.sql -T 300 -n

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 4
number of threads: 4
duration: 300 s
number of transactions actually processed: 535
latency average: 2242.991 ms
tps = 1.777005 (including connections establishing)
tps = 1.777024 (excluding connections establishing)

So that seems like a considerable further improvement that would be
nice to see more frequently. I don't know that it's worth it to go to
the trouble of compressing given the existing ameliorating measures
(HyperLogLog, etc), at least if using compression is motivated by
worst case performance. There is some research on making this work for
this kind of thing specifically [2].

My concern is that it won't be worth it to do the extra work,
particularly given that I already have 8 bytes to work with. Supposing
I only had 4 bytes to work with (as researchers writing [2] may have
only had in 1994), that would leave me with a relatively small number
of distinct normalized keys in many representative cases. For example,
I'd have a mere 40,665 distinct normalized keys in the case of my
cities database, rather than 243,782 (out of a set of 317,102 rows)
for 8 bytes of storage. But if I double that to 16 bytes (which might
be taken as a proxy for what a good compression scheme could get me),
I only get a modest improvement - 273,795 distinct keys. To be fair,
that's in no small part because there are only 275,330 distinct city
names overall (and so most dups get away with a cheap memcmp() on
their tie-breaker), but this is a reasonably organic, representative
dataset.

Now, it's really hard to judge something like that, and I don't
imagine that this analysis is all that scientific. I am inclined to
think that we're better off just aborting if the optimization doesn't
work out while copying, and forgetting about order preserving
compression. 

Re: [HACKERS] SSL information view

2014-07-14 Thread Stefan Kaltenbrunner
On 07/13/2014 10:35 PM, Magnus Hagander wrote:
 On Sun, Jul 13, 2014 at 10:32 PM, Stefan Kaltenbrunner
 ste...@kaltenbrunner.cc wrote:
 On 07/12/2014 03:08 PM, Magnus Hagander wrote:
 As an administrator, I find that you fairly often want to know what
 your current connections are actually using as SSL parameters, and
 there is currently no other way than gdb to find that out - something
 we definitely should fix.

 Yeah that would be handy - however I often wish to be able to figure
 that out based on the logfile as well, any chance of getting these into
 connection-logging/log_line_prefix as well?
 
 We do already log some of it if you have enabled log_connections -
 protocol and cipher. Anything else in particular you'd be looking for
 - compression info?

DN mostly, not sure I care about compression info...


Stefan




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-07-14 Thread Claudio Freire
On Mon, Jul 14, 2014 at 2:53 PM, Peter Geoghegan p...@heroku.com wrote:
 My concern is that it won't be worth it to do the extra work,
 particularly given that I already have 8 bytes to work with. Supposing
 I only had 4 bytes to work with (as researchers writing [2] may have
 only had in 1994), that would leave me with a relatively small number
 of distinct normalized keys in many representative cases. For example,
 I'd have a mere 40,665 distinct normalized keys in the case of my
 cities database, rather than 243,782 (out of a set of 317,102 rows)
 for 8 bytes of storage. But if I double that to 16 bytes (which might
 be taken as a proxy for what a good compression scheme could get me),
 I only get a modest improvement - 273,795 distinct keys. To be fair,
 that's in no small part because there are only 275,330 distinct city
 names overall (and so most dups get away with a cheap memcmp() on
 their tie-breaker), but this is a reasonably organic, representative
 dataset.


Are those numbers measured on Mac's strxfrm?

That was the one with suboptimal entropy on the first 8 bytes.




Re: [HACKERS] pg_shmem_allocations view

2014-07-14 Thread Robert Haas
On Mon, Jul 14, 2014 at 6:20 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote:
 At 2014-05-08 15:28:22 +0200, and...@2ndquadrant.com wrote:
   Hm. Not sure what you're ACKing here ;).
 
  The idea of giving the unallocated memory a NULL key.

  Ok. A new version of the patches implementing that is attached.
 Including a couple of small fixups and docs. The latter aren't
 extensive, but that doesn't seem to be warranted anyway.

 I realise now that this email didn't actually have an attachment. Could
 you please repost the latest version of this patch?

That's odd.  I received two attachments with that email.  Of course, I
was copied directly, but why would the archives have lost the
attachments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-07-14 Thread Peter Geoghegan
On Mon, Jul 14, 2014 at 11:03 AM, Claudio Freire klaussfre...@gmail.com wrote:
 Are those numbers measured on Mac's strxfrm?

 That was the one with suboptimal entropy on the first 8 bytes.

No, they're from a Linux system which uses glibc 2.19. The
optimization will simply not be used on implementations that don't
meet a certain standard (see my AC_TRY_RUN test program). I'm
reasonably confident that that test program will pass on most systems.
Just not Mac OSX. The optimization is never used on Windows and 32-bit
systems for other reasons.

-- 
Peter Geoghegan




Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED

2014-07-14 Thread Christoph Berg
Re: Fabrízio de Royes Mello 2014-07-12 
cafcns+qf5fukkp9vdjwokiabn_rrm4+hjqxeevpd_7-to0l...@mail.gmail.com
  ... that being the non-WAL-logging with wal_level=minimal, or more?
 
 This is the first of the additional goals, but we have others. Please see [1].

Oh, I wasn't aware of the wiki page; I had just read the old thread.
Thanks for the pointer.

   diff --git a/doc/src/sgml/ref/alter_table.sgml
 b/doc/src/sgml/ref/alter_table.sgml
   index 69a1e14..424f2e9 100644
   --- a/doc/src/sgml/ref/alter_table.sgml
   +++ b/doc/src/sgml/ref/alter_table.sgml
   @@ -58,6 +58,7 @@ ALTER TABLE [ IF EXISTS ] replaceable
 class=PARAMETERname/replaceable
ENABLE REPLICA RULE replaceable
 class=PARAMETERrewrite_rule_name/replaceable
ENABLE ALWAYS RULE replaceable
 class=PARAMETERrewrite_rule_name/replaceable
CLUSTER ON replaceable class=PARAMETERindex_name/replaceable
   +SET {LOGGED | UNLOGGED}
SET WITHOUT CLUSTER
SET WITH OIDS
SET WITHOUT OIDS
 
  This must not be between the two CLUSTER lines. I think the best spot
  would just be one line down, before SET WITH OIDS.
 
 Fixed.

The (long) SET LOGGED paragraph is still between CLUSTER and SET
WITHOUT CLUSTER.

  This grammar bug pops up consistently: This form *changes*...
 
 
 Fixed.

Two more:

+ * The AlterTableChangeCatalogToLoggedOrUnlogged function perform the 
+ * The AlterTableChangeIndexesToLoggedOrUnlogged function scan all indexes

  This unnecessarily rewrites all the tabs, but see below.
 
 
 I did that because the new constant AT_PASS_SET_LOGGED_UNLOGGED is longer
 than the others.

Ah, ok then.

  I'm wondering if you shouldn't make a single ATPrepSetLogged function
  that takes an additional toLogged argument. Or alternatively get rid
  of the if() by putting the code also into case AT_SetLogged.
 
 
 Actually I started that way... with just one ATPrep function we have some
 additional complexity to check relpersistence, define the error message,
 and run the AlterTableSetLoggedCheckForeignConstraints(rel) function. So to
 simplify the code I decided to split it into two small functions.

Nod.

 relation_close(rel, NoLock);
   +
   + if (pass == AT_PASS_SET_LOGGED_UNLOGGED)
   + 	ATPostAlterSetLoggedUnlogged(RelationGetRelid(rel));
 
  This must be done before relation_close() which releases all locks.

You didn't address that. I'm not sure about it, but either way, this
deserves a comment on the lock level necessary.

  Moreover, I think you can get rid of that extra PASS here.
  AT_PASS_ALTER_TYPE has its own pass because you can alter several
  columns in a single ALTER TABLE statement, but you can have only one
  SET (UN)LOGGED, so you can do the cluster_rel() directly in
  AlterTableSetLoggedOrUnlogged() (unless cluster_rel() is too intrusive
  and would interfere with other ALTER TABLE operations in this command,
  no idea).
 
 
 I had some troubles here so I decided to do it that way, but I confess I'm
 not comfortable with this implementation. Looking more carefully at the
 tablecmds.c code, at ATController we have three phases and the third is
 'scan/rewrite tables as needed', so my doubt is whether I can use that
 instead of calling 'cluster_rel'?

I've just tried some SET (UN)LOGGED operations with altering column
types in the same operation, that works. But:

Yes, you should use the existing table rewriting machinery, or at
least clearly document (in comments) why it doesn't work for you.

Also looking at ATController, there's a wqueue mechanism to queue
catalog updates. You should probably use this, too, or again document
why it doesn't work for you.

  Here's the big gotcha: Just like SET LOGGED must check for outgoing
  FKs to unlogged tables, SET UNLOGGED must check for incoming FKs from
  permanent tables. This is missing.
 
 
 I don't think so... we can create an unlogged table with a FK referring to
 a regular table...
 ... but it is not possible to create a FK from a regular table referring to
 an unlogged table:
 ... and a FK from an unlogged table referring to another unlogged table
 works:
 So we must be careful just when changing an unlogged table to a regular
 table.
 
 Am I correct or did I miss something?

You miss the symmetric case the other way round. When changing a table
to unlogged, you need to make sure no other permanent table is
referencing our table.
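
A sketch of that case (names hypothetical):

create table parent (id int primary key);        -- permanent
create table child (pid int references parent);  -- permanent, FK to parent
alter table parent set unlogged;
-- this must be rejected: the permanent table child would then reference
-- an unlogged table, which (as shown above) cannot even be created
-- directly.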

   +AlterTableChangeCatalogToLoggedOrUnlogged(Relation rel, Relation
 relrelation, bool toLogged)

You are using "relrelation" and "relrel". I'd change all occurrences
to "relrelation" because that's also used elsewhere.

  The comment on heap_open() suggests that you could directly invoke
  relation_open() because you know this is a toast table, similarly for
  index_open(). (I can't say which is better style.)
 
 
 I don't know which is better style either... other opinions??

Both are used several times in tablecmds.c, so both are probably fine.
(Didn't check the contexts, though.)

Christoph
-- 
c...@df7cb.de | 

Re: [HACKERS] Pg_upgrade and toast tables bug discovered

2014-07-14 Thread Bruce Momjian
On Mon, Jul 14, 2014 at 11:26:19AM -0400, Robert Haas wrote:
 On Fri, Jul 11, 2014 at 9:55 AM, Bruce Momjian br...@momjian.us wrote:
  On Fri, Jul 11, 2014 at 09:48:06AM -0400, Bruce Momjian wrote:
   Uh, why does this need to be in ALTER TABLE?  Can't this be part of
   table creation done by pg_dump?
 
  Uh, I think you need to read the thread.  We have to delay the toast
  creation part so we don't use an oid that will later be required by
  another table from the old cluster.  This has to be done after all
  tables have been created.
 
  We could have pg_dump spit out those ALTER lines at the end of the dump,
  but it seems simpler to do it in pg_upgrade.
 
  Even if we have pg_dump create all the tables that require pre-assigned
  TOAST oids first, then the other tables that _might_ need a TOAST table,
  those later tables might create a toast oid that matches a later
  non-TOAST-requiring table, so I don't think that fixes the problem.
 
  What would be nice is if I could mark just the tables that will need
  toast tables created in that later phase (those tables that didn't have
  a toast table in the old cluster, but need one in the new cluster).
  However, I can't see where to store that or how to pass that back into
  pg_upgrade.  I don't see a logical place in pg_class to put it.
 
 reloptions?

Yes, that might work.  I thought about that but did not have time to see
if we can easily add/remove reloptions at the C level in that area of
the code.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Securing make check (CVE-2014-0067)

2014-07-14 Thread Christoph Berg
Re: Noah Misch 2014-07-12 20140712170151.ga1985...@tornado.leadboat.com
 Thanks.  Preliminary questions:
 
  +#ifdef HAVE_UNIX_SOCKETS
  +/* make_temp_sockdir() is invoked at most twice from pg_upgrade.c via get_sock_dir() */
  +#define MAX_TEMPDIRS 2
  +static int n_tempdirs = 0; /* actual number of directories created */
  +static const char *temp_sockdir[MAX_TEMPDIRS];
  +#endif
 
 get_sock_dir() currently returns the same directory, the CWD, for both calls;
 can't it continue to do so?  We already have good reason not to start two
 postmasters simultaneously during pg_upgrade.
 
  +/*
  + * Remove the socket temporary directories.  pg_ctl waits for postmaster
  + * shutdown, so we expect the directory to be empty, unless we are interrupted
  + * by a signal, in which case the postmaster will clean up the sockets, but
  + * there's a race condition with us removing the directory.
 
 What's the reason for addressing that race condition in pg_regress and not
 addressing it in pg_upgrade?

I didn't want to have too many arrays for additionally storing the
socket and lockfile names, and unlike pg_regress, pg_upgrade usually
doesn't need to delete the files by itself, so it seemed like a good
choice to rely on the postmaster removing them.

Now, if get_sock_dir() should only return a single directory instead
of many (well, two), that makes the original code from pg_regress more
attractive to use. (Possibly it will even be a candidate for moving to
pgcommon.a, though the static/global variables might defeat that.)
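
Roughly what I have in mind, as a sketch only (the template path and
error handling are placeholders, not the final patch):

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static const char *temp_sockdir = NULL;

    /* Create the temporary socket directory once, then keep reusing it. */
    static const char *
    make_temp_sockdir(void)
    {
        if (temp_sockdir == NULL)
        {
            char *template = strdup("/tmp/pg_upgrade-XXXXXX");

            temp_sockdir = mkdtemp(template);
            if (temp_sockdir == NULL)
            {
                fprintf(stderr, "could not create directory \"%s\": %s\n",
                        template, strerror(errno));
                exit(2);
            }
        }
        return temp_sockdir;
    }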

I'll send an updated patch soonish.

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tab completion for setting search_path

2014-07-14 Thread Christoph Berg
Re: Andres Freund 2014-07-12 20140712135128.gd3...@awork2.anarazel.de
 I'm also not really happy with the fact that we only complete a single
 search_path item. But it's not easy to do better and when looking around
 other places (e.g. DROP TABLE) don't support it either.

The difference is that the other places don't really need it, i.e. you
can just issue two DROP TABLE. (And I wasn't even aware that DROP
TABLE a, b; exists specifically).

That said, it's great to have the feature, though I'd say making
search_path list-aware should be much higher on the todo list than a
generic solution for other cases.

 I've thought about adding $user to the set of completed things as

If we only support one item atm, $user isn't very relevant anyway.

 Fujii wondered about it, but it turns out completions containing $ don't
 work really great because $ is part of WORD_BREAKS.
 E.g. check out what happens if you do
 CREATE TABLE foo$01();
 CREATE TABLE foo$02();
 DROP TABLE foo$tab
 which means that a single schema that requires quoting will break
 completion of $user.

Schemas requiring quoting should be rare, so that wouldn't be a big
problem. (Or at least it could solve the problem for most users.)

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pg_upgrade and toast tables bug discovered

2014-07-14 Thread Andres Freund
On 2014-07-11 09:55:34 -0400, Bruce Momjian wrote:
 On Fri, Jul 11, 2014 at 09:48:06AM -0400, Bruce Momjian wrote:
   Uh, why does this need to be in ALTER TABLE?  Can't this be part of
   table creation done by pg_dump?
  
  Uh, I think you need to read the thread.  We have to delay the toast
  creation part so we don't use an oid that will later be required by
  another table from the old cluster.  This has to be done after all
  tables have been created.
  
  We could have pg_dump spit out those ALTER lines at the end of the dump,
  but it seems simpler to do it in pg_upgrade.
  
  Even if we have pg_dump create all the tables that require pre-assigned
  TOAST oids first, then the other tables that _might_ need a TOAST table,
  those later tables might create a toast oid that matches a later
  non-TOAST-requiring table, so I don't think that fixes the problem.
 
 What would be nice is if I could mark just the tables that will need
 toast tables created in that later phase (those tables that didn't have
 a toast table in the old cluster, but need one in the new cluster). 
 However, I can't see where to store that or how to pass that back into
 pg_upgrade.  I don't see a logical place in pg_class to put it.

This seems overengineered. Why not just do
SELECT pg_upgrade.maintain_toast(oid) FROM pg_class WHERE relkind = 'r';

and in maintain_toast() just call AlterTableCreateToastTable()?
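
A sketch of what maintain_toast() could look like (the exact
AlterTableCreateToastTable() signature has moved around between
branches, so treat the second argument as a placeholder):

    #include "postgres.h"
    #include "catalog/toasting.h"
    #include "fmgr.h"

    PG_FUNCTION_INFO_V1(pg_upgrade_maintain_toast);

    Datum
    pg_upgrade_maintain_toast(PG_FUNCTION_ARGS)
    {
        Oid     relid = PG_GETARG_OID(0);

        /* creates a TOAST table only if the relation actually needs one */
        AlterTableCreateToastTable(relid, (Datum) 0);

        PG_RETURN_VOID();
    }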

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] returning SETOF RECORD

2014-07-14 Thread Robert Haas
populate_record_worker in jsonfuncs.c says this:

    if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("function returning record called in context "
                        "that cannot accept type record"),
                 errhint("Try calling the function in the FROM clause "
                         "using a column definition list.")));

dblink.c has a similar incantation.

Is there any reasonable alternative?  That is, if you have a function
returning SETOF record, and the details of the record type aren't
specified, is there anything you can do other than error out like
this?
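
(For context, the workaround the hint suggests looks like this, here
using json_to_recordset as the example function:

    SELECT *
    FROM json_to_recordset('[{"a":1,"b":"foo"},{"a":2,"b":"bar"}]')
         AS t(a int, b text);

The column definition list "t(a int, b text)" is what supplies the
record shape that the function itself cannot provide.)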

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-07-14 Thread Robert Haas
On Wed, Jul 9, 2014 at 5:16 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 The way it works now, each opclass needs to have three support
 procedures; I've called them getOpers, maybeUpdateValues, and compare.
 (I realize these names are pretty bad, and will be changing them.)
 getOpers is used to obtain information about what is stored for that
 data type; it says how many datum values are stored for a column of that
 type (two for sortable: min and max), and how many operators it needs
 setup.  Then, the generic code fills in a MinmaxDesc(riptor) and creates
 an initial DeformedMMTuple (which is a rather ugly name for a minmax
 tuple held in memory).  The maybeUpdateValues amproc can then be called
 when there's a new heap tuple, which updates the DeformedMMTuple to
 account for the new tuple (in essence, it's a union of the original
 values and the new tuple).  This can be done repeatedly (when a new
 index is being created) or only once (when a new heap tuple is inserted
 into an existing index).  There is no need for an aggregate.

 This DeformedMMTuple can easily be turned into the on-disk
 representation; there is no hardcoded assumption on the number of index
 values stored per heap column, so it is possible to build an opclass
 that stores a bounding box column for a geometry heap column, for
 instance.

 Then we have the compare amproc.  This is used during index scans;
 after extracting an index tuple, it is turned into DeformedMMTuple, and
 the compare amproc for each column is called with the values of scan
keys.  (Now that I think about this, it seems pretty much what
"consistent" is for GiST opclasses).  A true return value indicates that
 the scan key matches the page range boundaries and thus all pages in the
 range are added to the output TID bitmap.

This sounds really great.  I agree that it needs some renaming.  I
think renaming what you are calling "compare" to "consistent" would be
an excellent idea, to match GiST.  "maybeUpdateValues" sounds like it
does the equivalent of GiST's "compress" on the new value followed by
a "union" with the existing summary item.  I don't think it's
necessary to separate those out, though.  You could perhaps call it
something like "add_item".

Also, FWIW, I liked Peter's idea of calling these "summarizing"
indexes, or perhaps "summary" would be a bit shorter and mean the same
thing.  "minmax" wouldn't be the end of the world, but since you've
gone to the trouble of making this more generic I think giving it a
more generic name would be a very good idea.
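
To check my understanding of the interface, here is roughly what I'd
expect the consistent function of a sortable opclass to look like --
every name below is hypothetical, based only on your description, and
datum_lt()/datum_le()/datum_gt() stand in for calls through the
operators fetched at getOpers time:

    static bool
    minmax_sortable_consistent(DeformedMMTuple *dtuple, int attno,
                               ScanKey key)
    {
        Datum   minval = dtuple->columns[attno].values[0];  /* stored min */
        Datum   maxval = dtuple->columns[attno].values[1];  /* stored max */

        switch (key->sk_strategy)
        {
            case BTLessStrategyNumber:
                /* the range can hold a smaller value only if min < key */
                return datum_lt(minval, key->sk_argument);
            case BTEqualStrategyNumber:
                /* the key must fall inside [min, max] */
                return datum_le(minval, key->sk_argument) &&
                       datum_le(key->sk_argument, maxval);
            case BTGreaterStrategyNumber:
                return datum_gt(maxval, key->sk_argument);
            default:
                return true;    /* be conservative: scan the range */
        }
    }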

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] returning SETOF RECORD

2014-07-14 Thread Andrew Dunstan


On 07/14/2014 03:44 PM, Robert Haas wrote:

populate_record_worker in jsonfuncs.c says this:

         if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
             ereport(ERROR,
                     (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                      errmsg("function returning record called in context "
                             "that cannot accept type record"),
                      errhint("Try calling the function in the FROM clause "
                              "using a column definition list.")));

dblink.c has a similar incantation.

Is there any reasonable alternative?  That is, if you have a function
returning SETOF record, and the details of the record type aren't
specified, is there anything you can do other than error out like
this?




Not that I can see. What would you suggest?

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] returning SETOF RECORD

2014-07-14 Thread Robert Haas
On Mon, Jul 14, 2014 at 4:39 PM, Andrew Dunstan and...@dunslane.net wrote:
 Is there any reasonable alternative?  That is, if you have a function
 returning SETOF record, and the details of the record type aren't
 specified, is there anything you can do other than error out like
 this?

 Not that I can see. What would you suggest?

Dunno.  Was hoping someone else had an idea.  It'd certainly be nice
to have some way of calling functions like this without specifying the
shape of the return value, but I doubt there's a way to make that work
without a lot of new infrastructure.  For example, if a function could
be called at the point where we need to know the record shape with a
special flag that says "just tell me what kind of record you're going
to return", and then called again at execution time to actually produce
the results, that would be nifty.

But mostly, I think it's slightly odd that the function gets called at
all if nothing useful can be done.  Why not just error out in the
caller?  So that made me wonder if maybe there is a way to do
something useful, and I'm just not seeing it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_shmem_allocations view

2014-07-14 Thread Alvaro Herrera
Robert Haas wrote:
 On Mon, Jul 14, 2014 at 6:20 AM, Abhijit Menon-Sen a...@2ndquadrant.com 
 wrote:
  At 2014-05-08 15:28:22 +0200, and...@2ndquadrant.com wrote:
Hm. Not sure what you're ACKing here ;).
  
   The idea of giving the unallocated memory a NULL key.
 
  Ok. A new version of the patches implementing that are attached.
  Including a couple of small fixups and docs. The latter aren't
  extensive, but that doesn't seem to be warranted anyway.
 
  I realise now that this email didn't actually have an attachment. Could
  you please repost the latest version of this patch?
 
 That's odd.  I received two attachments with that email.  Of course, I
 was copied directly, but why would the archives have lost the
 attachments?

The attachments are there on the archives, and also on my mbox -- and
unlike Robert, I was not copied.  I think this is a problem on Abhijit's
end.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Getting list of held lwlocks from debugger

2014-07-14 Thread Jeff Janes
Is there an easy way to get a list of held lwlocks out of gdb attached to a
core dump (or for that matter attached to a live process)?

I can try manually walking the internal data structure, but I am hoping for
something analogous to the way you get memory contexts:

(gdb) p MemoryContextStats(TopMemoryContext)
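
Walking it by hand would presumably look something like this -- assuming
gdb can see lwlock.c's static variables, which as of 9.4 are
num_held_lwlocks and held_lwlocks (an array of LWLock pointers):

(gdb) p num_held_lwlocks
(gdb) p held_lwlocks[0]
(gdb) p *held_lwlocks[0]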

Cheers,

Jeff


Re: [HACKERS] returning SETOF RECORD

2014-07-14 Thread Andrew Dunstan


On 07/14/2014 04:46 PM, Robert Haas wrote:

On Mon, Jul 14, 2014 at 4:39 PM, Andrew Dunstan and...@dunslane.net wrote:

Is there any reasonable alternative?  That is, if you have a function
returning SETOF record, and the details of the record type aren't
specified, is there anything you can do other than error out like
this?

Not that I can see. What would you suggest?

Dunno.  Was hoping someone else had an idea.  It'd certainly be nice
to have some way of calling functions like this without specifying the
shape of the return value, but I doubt there's a way to make that work
without a lot of new infrastructure.  For example, if a function could
be called at the point where we need to know the record shape with a
special flag that says "just tell me what kind of record you're going
to return", and then called again at execution time to actually produce
the results, that would be nifty.

But mostly, I think it's slightly odd that the function gets called at
all if nothing useful can be done.  Why not just error out in the
caller?  So that made me wonder if maybe there is a way to do
something useful, and I'm just not seeing it.





For json{b}, this only happens if you call json{b}_to_record{set}. 
json{b}_populate_record{set} will always have the required info. The 
downside of these is that you have to supply a value of a named type 
rather than an anonymous type expression.
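
For example (a quick sketch, with a throwaway type name):

    CREATE TYPE rt AS (a int, b text);

    SELECT *
    FROM json_populate_recordset(null::rt, '[{"a":1,"b":"foo"}]');

The null::rt value contributes nothing but its type, which is where the
record shape comes from.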


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] view reloptions

2014-07-14 Thread Alvaro Herrera
Jaime Casanova wrote:

 Attached is a patch moving the reloptions of views into its own structure.
 I also created a view_reloptions() function in order to not use
 heap_reloptions() for views, but maybe that was too much?

No, that was part of what I was thinking too -- I have pushed this now.
Many thanks.

 I haven't seen the check_option_offset thingy yet, but I hope to take
 a look at that tomorrow.

I'm guessing you didn't get around to doing this.  I gave it a quick
look and my conclusion is that it requires somewhat more work than it's
worth.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql \db+ lack of size column

2014-07-14 Thread Alvaro Herrera
Fabrízio de Royes Mello wrote:
 On Fri, May 16, 2014 at 2:03 AM, Fabrízio de Royes Mello 
 fabriziome...@gmail.com wrote:
 
  Hi all,
 
  Is there some reason not to show the tablespace size in the \db+
  psql command?
 
 The attached patch show tablespace size in \db+ psql command.

Thanks, pushed.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSL compression info in psql header

2014-07-14 Thread Robert Haas
On Sat, Jul 12, 2014 at 8:49 AM, Magnus Hagander mag...@hagander.net wrote:
 Today it's really hard to figure out if your SSL connection is
 actually *using* SSL compression. This got extra hard when the
 default value started getting influenced by environment variables,
 at least on many platforms, after the CRIME attacks. ISTM we should be
 making this easier for the user.

 Attached patch adds compression info at least to the header of the
 psql banner, as that's very non-intrusive. I think this is a small
 enough change, yet very useful, that we should squeeze it into 9.4
 before the next beta. Not sure if it can be qualified enough of a bug
 to backpatch further than that though.

 As far as my research shows, the function
 SSL_get_current_compression() which it uses was added in OpenSSL
 0.9.6, which is a long time ago (stopped being maintained in 2004).
 AFAICT even RHEL *3* shipped with 0.9.7. So I think we can safely rely
 on it, especially since we only check for whether it returns NULL or
 not.

 Comments?

Seems like a fine change.  I think it would be OK to slip it into 9.4,
too, but I don't think we should back-patch it further than that.
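
(Incidentally, anyone who wants to test this by hand today can do the
same check client-side; a sketch, assuming a libpq connection that is
already using SSL:

    #include <openssl/ssl.h>
    #include <libpq-fe.h>

    /* Returns 1 if the connection is using SSL compression, else 0. */
    static int
    uses_ssl_compression(PGconn *conn)
    {
        SSL *ssl = (SSL *) PQgetssl(conn);   /* NULL if not SSL */

        return ssl != NULL && SSL_get_current_compression(ssl) != NULL;
    }

which is essentially the test Magnus describes the patch making.)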

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gaussian distribution pgbench

2014-07-14 Thread Robert Haas
On Sun, Jul 13, 2014 at 2:27 AM, Mitsumasa KONDO
kondo.mitsum...@gmail.com wrote:
 I still agree with Fabien-san. I cannot understand why our logical
 proposal isn't being accepted...

Well, I think the feedback has been pretty clear, honestly.  Here's
what I'm unhappy about: I can't understand what these options are
actually doing.

And this isn't helping me a bit:

  [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=10
 starting vacuum...end.
 transaction type: Exponential distribution TPC-B (sort of)
 scaling factor: 1
 exponential threshold: 10.0

 decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
 highest/lowest percent of the range: 9.5% 0.0%

I don't have a clue what that means.  None.

Here is an example of an explanation that would make sense to me.
This is not the actual behavior of your patch, I'm quite sure, so this
is just an example of the *kind* of explanation that I think is
needed:

The --exponential option causes pgbench to select lower-numbered
account IDs exponentially more frequently than higher-numbered account
IDs.  The argument to --exponential controls the strength of the
preference for lower-numbered account IDs, with a smaller value
indicating a stronger preference.  Specifically, it is the percentage
of the total number of account IDs which will receive half the total
accesses.  For example, with --exponential=10, half the accesses will
be to the smallest 10 percent of the account IDs; half the remaining
accesses will be to the next-smallest 10 percent of account IDs, and
so on.  --exponential=50 therefore represents a completely flat
distribution; larger values are not allowed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Incorrect comment in postgres_fdw.c

2014-07-14 Thread Etsuro Fujita

(2014/07/14 19:46), Fujii Masao wrote:

On Mon, Jul 14, 2014 at 7:31 PM, Shigeru Hanada
shigeru.han...@gmail.com wrote:

Fujita-san,

2014-07-11 18:22 GMT+09:00 Etsuro Fujita fujita.ets...@lab.ntt.co.jp:

I think the following comment for store_returning_result() in
postgres_fdw.c is not right.

 /* PGresult must be released before leaving this function. */

I think the PGresult should not be released before leaving that
function *on success*.

(I guess the comment has been copied and pasted from that for
get_remote_estimate().)


+1 for just removing the comment, because the header comment clearly
mentions the necessity of releasing the PGresult.


Committed. Thanks for the report!


Thank you for committing the patch!  And thanks, Hanada-san!

Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED

2014-07-14 Thread Fabrízio de Royes Mello
On Mon, Jul 14, 2014 at 3:31 PM, Christoph Berg c...@df7cb.de wrote:

 Oh I wasn't aware of the wiki page, I had just read the old thread.
 Thanks for the pointer.


:-)

Thanks again for your review!


diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 69a1e14..424f2e9 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -58,6 +58,7 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
     ENABLE REPLICA RULE <replaceable class="PARAMETER">rewrite_rule_name</replaceable>
     ENABLE ALWAYS RULE <replaceable class="PARAMETER">rewrite_rule_name</replaceable>
     CLUSTER ON <replaceable class="PARAMETER">index_name</replaceable>
+    SET {LOGGED | UNLOGGED}
     SET WITHOUT CLUSTER
     SET WITH OIDS
     SET WITHOUT OIDS
  
   This must not be between the two CLUSTER lines. I think the best spot
   would just be one line down, before SET WITH OIDS.
 
  Fixed.

 The (long) SET LOGGED paragraph is still between CLUSTER and SET
 WITHOUT CLUSTER.


Fixed.


   This grammar bug pops up consistently: This form *changes*...
  
 
  Fixed.

 Two more:

 + * The AlterTableChangeCatalogToLoggedOrUnlogged function perform the
 + * The AlterTableChangeIndexesToLoggedOrUnlogged function scan all indexes


Fixed.


   relation_close(rel, NoLock);
 +
 +   if (pass == AT_PASS_SET_LOGGED_UNLOGGED)
 +       ATPostAlterSetLoggedUnlogged(RelationGetRelid(rel));
  
   This must be done before relation_close() which releases all locks.

 You didn't address that. I'm not sure about it, but either way, this
 deserves a comment on the lock level necessary.


Actually, relation_close(rel, NoLock) doesn't release the locks.

See src/backend/access/heap/heapam.c:1167
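
Paraphrasing the comments there: the lock is released only when a real
lock mode is passed, so

    relation_close(rel, NoLock);    /* drops the refcount only; any lock
                                     * we hold is kept until xact end */

keeps the lock until transaction end.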


   Moreover, I think you can get rid of that extra PASS here.
   AT_PASS_ALTER_TYPE has its own pass because you can alter several
   columns in a single ALTER TABLE statement, but you can have only one
   SET (UN)LOGGED, so you can do the cluster_rel() directly in
   AlterTableSetLoggedOrUnlogged() (unless cluster_rel() is too intrusive
   and would interfere with other ALTER TABLE operations in this command,
   no idea).
  
 
  I had some trouble here so I decided to do it that way, but I confess I'm
  not comfortable with this implementation. Looking more carefully at the
  tablecmds.c code, at the ATController we have three phases and the third is
  'scan/rewrite tables as needed', so my doubt is if I can use it instead of
  calling 'cluster_rel'?

 I've just tried some SET (UN)LOGGED operations with altering column
 types in the same operation; that works. But:

 Yes, you should use the existing table rewriting machinery, or at
 least clearly document (in comments) why it doesn't work for you.

 Also looking at ATController, there's a wqueue mechanism to queue
 catalog updates. You should probably use this, too, or again document
 why it doesn't work for you.


This works... fixed!



   Here's the big gotcha: Just like SET LOGGED must check for outgoing
   FKs to unlogged tables, SET UNLOGGED must check for incoming FKs from
   permanent tables. This is missing.
  
 
  I don't think so... we can create an unlogged table with a FK referring to
  a regular table...
  ... but it is not possible to create a FK from a regular table referring to
  an unlogged table:
  ... and a FK from an unlogged table referring to another unlogged table
  works:
  So we must be careful just when changing an unlogged table to a regular
  table.
 
  Am I correct or am I missing something?

 You're missing the symmetric case the other way round: when changing a
 table to unlogged, you need to make sure no other permanent table is
 referencing our table.


Ohh yes... sorry... you're completely correct... fixed!


+AlterTableChangeCatalogToLoggedOrUnlogged(Relation rel, Relation relrelation, bool toLogged)

  You are using "relrelation" and "relrel". I'd change all occurrences
  to "relrelation" because that's also used elsewhere.


Fixed.


   The comment on heap_open() suggests that you could directly invoke
   relation_open() because you know this is a toast table, similarly for
   index_open(). (I can't say which is better style.)
  
 
  I don't know which is better style too... other opinions??

 Both are used several times in tablecmds.c, so both are probably fine.
 (Didn't check the contexts, though.)


Then we can leave it that way. Is that OK for you?

Greetings,

--
Fabrízio de Royes Mello
PostgreSQL Consulting/Coaching
 Timbira: http://www.timbira.com.br
 Blog about IT: http://fabriziomello.blogspot.com
 LinkedIn profile: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 69a1e14..2d131df 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -59,16 +59,17 @@ ALTER TABLE [ IF EXISTS ] 

Re: [HACKERS] No exact/lossy block information in EXPLAIN ANALYZE for a bitmap heap scan

2014-07-14 Thread Etsuro Fujita

(2014/07/14 21:01), Fujii Masao wrote:

On Fri, Jul 11, 2014 at 7:21 PM, Fujii Masao masao.fu...@gmail.com wrote:

On Fri, Jul 11, 2014 at 5:45 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:

I've noticed that EXPLAIN ANALYZE shows no information on exact/lossy
blocks for a bitmap heap scan when both the numbers of exact/lossy pages
retrieved by the node are zero.  Such an example is shown below.  I
think it would be better to suppress the 'Heap Blocks' line in that
case, based on the same idea of the 'Buffers' line.  Patch attached.


The patch looks good to me. Barring any objection, I will commit this both
in HEAD and 9.4.


Committed!


Thanks!

I have another proposal for show_tidbitmap_info().  That is about the 
following comment for show_tidbitmap_info():


 /*
  * If it's EXPLAIN ANALYZE, show exact/lossy pages for a BitmapHeapScan node
  */

ISTM that the words "If it's EXPLAIN ANALYZE" are unnecessary.  As the
function is only called for EXPLAIN ANALYZE, the words are not wrong, but
they don't seem to me suitable for the comment on the function itself.
Patch attached.


Thanks,

Best regards,
Etsuro Fujita
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 781a736..07da169 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1925,7 +1925,7 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 }
 
 /*
- * If it's EXPLAIN ANALYZE, show exact/lossy pages for a BitmapHeapScan node
+ * Show exact/lossy pages for a BitmapHeapScan node
  */
 static void
 show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_shmem_allocations view

2014-07-14 Thread Abhijit Menon-Sen
At 2014-07-14 16:48:09 -0400, alvhe...@2ndquadrant.com wrote:

 The attachments are there on the archives, and also on my mbox -- and
 unlike Robert, I was not copied.  I think this is a problem on
 Abhijit's end.

Yes, it is. I apologise for the noise.

-- Abhijit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Selectivity estimation for inet operators

2014-07-14 Thread Dilip kumar
On 12 July 2014 23:25, Emre Hasegeli Wrote,

  I have one last comment; after clarifying this I can move it to
 "ready for committer".
  1. In networkjoinsel, to avoid the case of huge statistics, only
 some of the values from the MCV lists and histograms are used (the
 number is calculated using SQRT).
  -- But in my opinion, if both histograms and MCVs exist then it's
 fine, but if only MCVs are there, we can match the complete MCV
 lists; it will give better accuracy.
 Other functions like eqjoinsel also match the complete MCV lists.
 
 I was not sure about reducing the statistics at all.  I could not find
 any other selectivity estimation function which does this.  After
 testing it some more, I reached the conclusion that it would be better
 to only reduce the values of the outer loop on the histogram match.
 Now it matches the complete MCV lists to each other.  I also switched
 back to log2() from sqrt() to make the outer list smaller.

OK

 
 I rethought your previous advice to treat the histogram bucket as
 partially matched when the constant matches the last boundary, and
 changed it that way.  It is better than using the selectivity for only
 one value.  Removing this part also makes the function simpler.  The
 new version of the patch is attached.

 This seems good to me.

 
 While looking at it I found some other small problems and fixed them.
 I also realized that I forgot to support join types other than inner
 join.  Currently, the default estimation is used for anti joins.
 I think the patch will need a more than trivial amount of change to
 support anti joins.  I can work on it later.  While doing it, outer
 join selectivity estimation can also be improved.  I think the patch is
 better than nothing in its current state.

I agree with you that we can support the other join types and anti joins
later.  If others don't have any objection to doing the other parts later,
I will mark it as Ready for Committer.

Regards,
Dilip




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers