Re: [HACKERS] Selectivity estimation for inet operators

2014-07-14 Thread Dilip kumar
On 12 July 2014 23:25, Emre Hasegeli wrote:

> > I have one last comment; after clarifying this I can move it to
> > "ready for committer".
> > 1. In networkjoinsel, to avoid the case of huge statistics, only
> > some of the values from the MCV lists and histograms are used
> > (calculated using SQRT).
> > -- But in my opinion, if both histograms and MCVs exist then that is
> > fine; but if only MCVs are there, we can match the complete MCV
> > lists, which will give better accuracy.
> > Other functions like eqjoinsel also match the complete MCV lists.
> 
> I was not sure about reducing the statistics at all.  I could not find
> any other selectivity estimation function which does this.  After testing
> it some more, I reached the conclusion that it would be better to only
> reduce the values of the outer loop on a histogram match.  Now it matches
> the complete MCV lists to each other.  I also switched back to
> log2() from sqrt() to make the outer list smaller.

OK

> 
> I rethought your previous advice to treat a histogram bucket as partially
> matched when the constant matches the last boundary, and changed it
> that way.  It is better than using the selectivity for only one value.
> Removing this part also makes the function simpler.  The new version
> of the patch is attached.

 This seems good to me.

> 
> While looking at it I found some other small problems and fixed them.
> I also realized that I forgot to support join types other than inner
> join.  Currently, the default estimation is used for anti joins.
> I think the patch will need more than a trivial amount of change to
> support anti joins.  I can work on it later.  While doing that, outer
> join selectivity estimation can also be improved.  I think the patch is
> better than nothing in its current state.

I agree with you that we can support the other join types and anti joins later.
If others don't have any objection to doing those parts later, I will mark it
as "Ready For Committer".
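
For readers following the thread: the MCV-vs-MCV half of such a join estimate
can be sketched in a few lines of C (a simplified model with made-up values,
not the patch itself; the real code lives in the backend's selectivity
functions). The contribution of the MCV pairs is the sum of the frequency
products over every pair the join operator accepts, which is essentially what
eqjoinsel does for equality.

```c
#include <string.h>

typedef struct McvEntry
{
    const char *value;          /* a most-common value, as text */
    double      freq;           /* its fraction of the table */
} McvEntry;

/*
 * Sum of frequency products over every MCV pair accepted by the join
 * operator.  The patch under discussion does this for the complete MCV
 * lists, and only shrinks the outer loop (to roughly log2(n) entries)
 * for the histogram-matching part.
 */
double
mcv_join_selectivity(const McvEntry *mcv1, int n1,
                     const McvEntry *mcv2, int n2,
                     int (*op) (const char *, const char *))
{
    double      matchfreq = 0.0;

    for (int i = 0; i < n1; i++)
        for (int j = 0; j < n2; j++)
            if (op(mcv1[i].value, mcv2[j].value))
                matchfreq += mcv1[i].freq * mcv2[j].freq;
    return matchfreq;
}

/* A stand-in "operator": plain equality on the text form. */
int
text_eq(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```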

Regards,
Dilip




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_shmem_allocations view

2014-07-14 Thread Abhijit Menon-Sen
At 2014-07-14 16:48:09 -0400, alvhe...@2ndquadrant.com wrote:
>
> The attachments are there on the archives, and also on my mbox -- and
> unlike Robert, I was not copied.  I think this is a problem on
> Abhijit's end.

Yes, it is. I apologise for the noise.

-- Abhijit




Re: [HACKERS] No exact/lossy block information in EXPLAIN ANALYZE for a bitmap heap scan

2014-07-14 Thread Etsuro Fujita

(2014/07/14 21:01), Fujii Masao wrote:

On Fri, Jul 11, 2014 at 7:21 PM, Fujii Masao  wrote:

On Fri, Jul 11, 2014 at 5:45 PM, Etsuro Fujita  wrote:

I've noticed that EXPLAIN ANALYZE shows no information on exact/lossy
blocks for a bitmap heap scan when both the numbers of exact/lossy pages
retrieved by the node are zero.  Such an example is shown below.  I
think it would be better to suppress the 'Heap Blocks' line in that
case, based on the same idea of the 'Buffers' line.  Patch attached.


The patch looks good to me. Barring any objection, I will commit this both
in HEAD and 9.4.


Committed!


Thanks!

I have another proposal for show_tidbitmap_info().  It is about the
following comment for that function:


 /*
  * If it's EXPLAIN ANALYZE, show exact/lossy pages for a BitmapHeapScan node
  */

ISTM that the words "If it's EXPLAIN ANALYZE" are unnecessary.  Since the
function is only called during EXPLAIN ANALYZE, the words are not wrong, but
they don't seem suitable as the comment for the function itself.
Patch attached.


Thanks,

Best regards,
Etsuro Fujita
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 781a736..07da169 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1925,7 +1925,7 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 }
 
 /*
- * If it's EXPLAIN ANALYZE, show exact/lossy pages for a BitmapHeapScan node
+ * Show exact/lossy pages for a BitmapHeapScan node
  */
 static void
 show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)



Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED

2014-07-14 Thread Fabrízio de Royes Mello
On Mon, Jul 14, 2014 at 3:31 PM, Christoph Berg  wrote:
>
> Oh I wasn't aware of the wiki page, I had just read the old thread.
> Thanks for the pointer.
>

:-)

Thanks again for your review!


> > > > diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
> > > > index 69a1e14..424f2e9 100644
> > > > --- a/doc/src/sgml/ref/alter_table.sgml
> > > > +++ b/doc/src/sgml/ref/alter_table.sgml
> > > > @@ -58,6 +58,7 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
> > > >  ENABLE REPLICA RULE <replaceable class="PARAMETER">rewrite_rule_name</replaceable>
> > > >  ENABLE ALWAYS RULE <replaceable class="PARAMETER">rewrite_rule_name</replaceable>
> > > >  CLUSTER ON <replaceable class="PARAMETER">index_name</replaceable>
> > > > +SET {LOGGED | UNLOGGED}
> > > >  SET WITHOUT CLUSTER
> > > >  SET WITH OIDS
> > > >  SET WITHOUT OIDS
> > >
> > > This must not be between the two CLUSTER lines. I think the best spot
> > > would just be one line down, before SET WITH OIDS.
> >
> > Fixed.
>
> The (long) SET LOGGED paragraph is still between CLUSTER and SET
> WITHOUT CLUSTER.
>

Fixed.


> > > This grammar bug pops up consistently: This form *changes*...
> > >
> >
> > Fixed.
>
> Two more:
>
> + * The AlterTableChangeCatalogToLoggedOrUnlogged function perform the
> + * The AlterTableChangeIndexesToLoggedOrUnlogged function scan all indexes
>

Fixed.


> > > >   relation_close(rel, NoLock);
> > > > +
> > > > + if (pass == AT_PASS_SET_LOGGED_UNLOGGED)
> > > > + ATPostAlterSetLoggedUnlogged(RelationGetRelid(rel));
> > >
> > > This must be done before relation_close() which releases all locks.
>
> You didn't address that. I'm not sure about it, but either way, this
> deserves a comment on the lock level necessary.
>

Actually, relation_close(rel, NoLock) doesn't release the locks.

See src/backend/access/heap/heapam.c:1167


> > > Moreover, I think you can get rid of that extra PASS here.
> > > AT_PASS_ALTER_TYPE has its own pass because you can alter several
> > > columns in a single ALTER TABLE statement, but you can have only one
> > > SET (UN)LOGGED, so you can to the cluster_rel() directly in
> > > AlterTableSetLoggedOrUnlogged() (unless cluster_rel() is too intrusive
> > > and would interfere with other ALTER TABLE operations in this command,
> > > no idea).
> > >
> >
> > I had some troubles here so I decided to do it that way, but I confess I'm
> > not comfortable with this implementation. Looking more carefully at the
> > tablecmds.c code, at the ATController we have three phases and the third is
> > 'scan/rewrite tables as needed', so my doubt is whether I can use it instead
> > of calling 'cluster_rel'.
>
> I've just tried some SET (UN)LOGGED operations with altering column
> types in the same operation, that works. But:
>
> Yes, you should use the existing table rewriting machinery, or at
> least clearly document (in comments) why it doesn't work for you.
>
> Also looking at ATController, there's a wqueue mechanism to queue
> catalog updates. You should probably use this, too, or again document
> why it doesn't work for you.
>

This works... fixed!



> > > Here's the big gotcha: Just like SET LOGGED must check for outgoing
> > > FKs to unlogged tables, SET UNLOGGED must check for incoming FKs from
> > > permanent tables. This is missing.
> > >
> >
> > I don't think so... we can create an unlogged table with a FK referring to
> > a regular table...
> > ... but it is not possible to create a FK from a regular table referring to
> > an unlogged table:
> > ... and a FK from an unlogged table referring to another unlogged table
> > works:
> > So we must be careful just when changing an unlogged table to a regular
> > table.
> >
> > Am I correct or I miss something?
>
> You miss the symmetric case the other way round. When changing a table
> to unlogged, you need to make sure no other permanent table is
> referencing our table.
>

Ohh yes... sorry... you're completely correct... fixed!


> > > > +AlterTableChangeCatalogToLoggedOrUnlogged(Relation rel, Relation relrelation, bool toLogged)
>
> You are using "relrelation" and "relrel". I'd change all occurrences
> to "relrelation" because that's also used elsewhere.
>

Fixed.


> > > The comment on heap_open() suggests that you could directly invoke
> > > relation_open() because you know this is a toast table, similarly for
> > > index_open(). (I can't say which is better style.)
> > >
> >
> > I don't know which is better style too... other opinions??
>
> Both are used several times in tablecmds.c, so both are probably fine.
> (Didn't check the contexts, though.)
>

Then we can leave it that way. Is that OK for you?

Greetings,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 69a1e14..2d131df 100644

Re: [HACKERS] Incorrect comment in postgres_fdw.c

2014-07-14 Thread Etsuro Fujita

(2014/07/14 19:46), Fujii Masao wrote:

On Mon, Jul 14, 2014 at 7:31 PM, Shigeru Hanada  wrote:

Fujita-san,

2014-07-11 18:22 GMT+09:00 Etsuro Fujita :

I think the following comment for store_returning_result() in
postgres_fdw.c is not right.

 /* PGresult must be released before leaving this function. */

I think the PGresult should not be released before leaving that function *on
success*.

(I guess the comment has been copied and pasted from that for
get_remote_estimate().)


+1 for just removing the comment, because the header comment clearly
mentions the necessity of releasing the PGresult.


Committed. Thanks for the report!


Thank you for committing the patch!  And thanks, Hanada-san!

Best regards,
Etsuro Fujita




Re: [HACKERS] gaussian distribution pgbench

2014-07-14 Thread Robert Haas
On Sun, Jul 13, 2014 at 2:27 AM, Mitsumasa KONDO  wrote:
> I still agree with Fabien-san. I cannot understand why our logical proposal
> isn't accepted...

Well, I think the feedback has been pretty clear, honestly.  Here's
what I'm unhappy about: I can't understand what these options are
actually doing.

And this isn't helping me a bit:

>  [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=10
> starting vacuum...end.
> transaction type: Exponential distribution TPC-B (sort of)
> scaling factor: 1
> exponential threshold: 10.0
>
> decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
> highest/lowest percent of the range: 9.5% 0.0%

I don't have a clue what that means.  None.

Here is an example of an explanation that would make sense to me.
This is not the actual behavior of your patch, I'm quite sure, so this
is just an example of the *kind* of explanation that I think is
needed:

The --exponential option causes pgbench to select lower-numbered
account IDs exponentially more frequently than higher-numbered account
IDs.  The argument to --exponential controls the strength of the
preference for lower-numbered account IDs, with a smaller value
indicating a stronger preference.  Specifically, it is the percentage
of the total number of account IDs which will receive half the total
accesses.  For example, with --exponential=10, half the accesses will
be to the smallest 10 percent of the account IDs; half the remaining
accesses will be to the next-smallest 10 percent of account IDs, and
so on.  --exponential=50 therefore represents a completely flat
distribution; larger values are not allowed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] SSL compression info in psql header

2014-07-14 Thread Robert Haas
On Sat, Jul 12, 2014 at 8:49 AM, Magnus Hagander  wrote:
> It's really hard today to figure out if your SSL connection is
> actually *using* SSL compression. This got extra hard when the
> default value started being influenced by environment variables on
> many platforms after the CRIME attacks. ISTM we should be
> making this easier for the user.
>
> Attached patch adds compression info at least to the header of the
> psql banner, as that's very non-intrusive. I think this is a small
> enough change, yet very useful, that we should squeeze it into 9.4
> before the next beta. Not sure if it can be qualified enough of a bug
> to backpatch further than that though.
>
> As far as my research shows, the function
> SSL_get_current_compression() which it uses was added in OpenSSL
> 0.9.6, which is a long time ago (stopped being maintained in 2004).
> AFAICT even RHEL *3* shipped with 0.9.7. So I think we can safely rely
> on it, especially since we only check for whether it returns NULL or
> not.
>
> Comments?

Seems like a fine change.  I think it would be OK to slip it into 9.4,
too, but I don't think we should back-patch it further than that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] psql \db+ lack of size column

2014-07-14 Thread Alvaro Herrera
Fabrízio de Royes Mello wrote:
> On Fri, May 16, 2014 at 2:03 AM, Fabrízio de Royes Mello <
> fabriziome...@gmail.com> wrote:
> >
> > Hi all,
> >
> > Is there some reason not to show the tablespace size in the \db+
> > psql command?
> 
> The attached patch shows the tablespace size in the \db+ psql command.

Thanks, pushed.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] view reloptions

2014-07-14 Thread Alvaro Herrera
Jaime Casanova wrote:

> Attached is a patch moving the reloptions of views into their own structure.
> I also created a view_reloptions() function in order to not use
> heap_reloptions() for views, but maybe that was too much?

No, that was part of what I was thinking too -- I have pushed this now.
Many thanks.

> I haven't seen the check_option_offset thingy yet, but I hope to take
> a look at that tomorrow.

I'm guessing you didn't get around to doing this.  I gave it a quick
look and my conclusion is that it requires somewhat more work than it's
worth.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] returning SETOF RECORD

2014-07-14 Thread Andrew Dunstan


On 07/14/2014 04:46 PM, Robert Haas wrote:

On Mon, Jul 14, 2014 at 4:39 PM, Andrew Dunstan  wrote:

Is there any reasonable alternative?  That is, if you have a function
returning SETOF record, and the details of the record type aren't
specified, is there anything you can do other than error out like
this?

Not that I can see. What would you suggest?

Dunno.  Was hoping someone else had an idea.  It'd certainly be nice
to have some way of calling functions like this without specifying the
shape of the return value, but I doubt there's a way to make that work
without a lot of new infrastructure.  For example, if a function could
be called at the point where we need to know the record shape with a
special flag that says "just tell me what kind of record you're going
to return" and then called again at execution time to actually produce
the results, that would be nifty.

But mostly, I think it's slightly odd that the function gets called at
all if nothing useful can be done.  Why not just error out in the
caller?  So that made me wonder if maybe there is a way to do
something useful, and I'm just not seeing it.





For json{b}, this only happens if you call json{b}_to_record{set}. 
json{b}_populate_record{set} will always have the required info. The 
downside of these is that you have to supply a value of a named type 
rather than an anonymous type expression.
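
To make the distinction concrete, a sketch against the 9.4-era functions (the
composite type used by the populate variant -- here a hypothetical table
mytab(a int, b text) -- must already exist):

```sql
-- json_to_recordset returns SETOF record, so the caller must attach a
-- column definition list in the FROM clause, as the errhint suggests:
SELECT *
  FROM json_to_recordset('[{"a":1,"b":"x"},{"a":2,"b":"y"}]')
       AS t(a int, b text);

-- json_populate_recordset gets the tuple descriptor from a named
-- composite type instead, so no definition list is needed:
SELECT *
  FROM json_populate_recordset(null::mytab,
                               '[{"a":1,"b":"x"},{"a":2,"b":"y"}]');
```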


cheers

andrew




[HACKERS] Getting list of held lwlocks from debugger

2014-07-14 Thread Jeff Janes
Is there an easy way to get a list of held lwlocks out of gdb attached to a
core dump (or for that matter attached to a live process)?

I can try manually walking the internal data structure, but I am hoping for
something analogous to the way you get memory contexts:

(gdb) p MemoryContextStats(TopMemoryContext)
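
For reference, assuming the lwlock.c internals of this era (the file-static
variables num_held_lwlocks and held_lwlocks[] -- names taken from that file,
and static symbols are only visible with debug info), gdb's artificial-array
syntax comes close:

```
(gdb) print num_held_lwlocks
(gdb) print held_lwlocks[0]@num_held_lwlocks
```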

Cheers,

Jeff


Re: [HACKERS] pg_shmem_allocations view

2014-07-14 Thread Alvaro Herrera
Robert Haas wrote:
> On Mon, Jul 14, 2014 at 6:20 AM, Abhijit Menon-Sen  wrote:
> > At 2014-05-08 15:28:22 +0200, and...@2ndquadrant.com wrote:
> >> > > Hm. Not sure what you're ACKing here ;).
> >> >
> >> > The idea of giving the unallocated memory a NULL key.
> >>
> >> Ok. A new version of the patches implementing that are attached.
> >> Including a couple of small fixups and docs. The latter aren't
> >> extensive, but that doesn't seem to be warranted anyway.
> >
> > I realise now that this email didn't actually have an attachment. Could
> > you please repost the latest version of this patch?
> 
> That's odd.  I received two attachments with that email.  Of course, I
> was copied directly, but why would the archives have lost the
> attachments?

The attachments are there on the archives, and also on my mbox -- and
unlike Robert, I was not copied.  I think this is a problem on Abhijit's
end.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] returning SETOF RECORD

2014-07-14 Thread Robert Haas
On Mon, Jul 14, 2014 at 4:39 PM, Andrew Dunstan  wrote:
>> Is there any reasonable alternative?  That is, if you have a function
>> returning SETOF record, and the details of the record type aren't
>> specified, is there anything you can do other than error out like
>> this?
>
> Not that I can see. What would you suggest?

Dunno.  Was hoping someone else had an idea.  It'd certainly be nice
to have some way of calling functions like this without specifying the
shape of the return value, but I doubt there's a way to make that work
without a lot of new infrastructure.  For example, if a function could
be called at the point where we need to know the record shape with a
special flag that says "just tell me what kind of record you're going
to return" and then called again at execution time to actually produce
the results, that would be nifty.

But mostly, I think it's slightly odd that the function gets called at
all if nothing useful can be done.  Why not just error out in the
caller?  So that made me wonder if maybe there is a way to do
something useful, and I'm just not seeing it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] returning SETOF RECORD

2014-07-14 Thread Andrew Dunstan


On 07/14/2014 03:44 PM, Robert Haas wrote:

populate_record_worker in jsonfuncs.c says this:

 if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
 ereport(ERROR,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  errmsg("function returning record called in context "
 "that cannot accept type record"),
  errhint("Try calling the function in the FROM clause "
  "using a column definition list.")));

dblink.c has a similar incantation.

Is there any reasonable alternative?  That is, if you have a function
returning SETOF record, and the details of the record type aren't
specified, is there anything you can do other than error out like
this?




Not that I can see. What would you suggest?

cheers

andrew




Re: [HACKERS] Minmax indexes

2014-07-14 Thread Robert Haas
On Wed, Jul 9, 2014 at 5:16 PM, Alvaro Herrera  wrote:
> The way it works now, each opclass needs to have three support
> procedures; I've called them getOpers, maybeUpdateValues, and compare.
> (I realize these names are pretty bad, and will be changing them.)
> getOpers is used to obtain information about what is stored for that
> data type; it says how many datum values are stored for a column of that
> type (two for sortable: min and max), and how many operators it needs
> setup.  Then, the generic code fills in a MinmaxDesc(riptor) and creates
> an initial DeformedMMTuple (which is a rather ugly name for a minmax
> tuple held in memory).  The maybeUpdateValues amproc can then be called
> when there's a new heap tuple, which updates the DeformedMMTuple to
> account for the new tuple (in essence, it's a union of the original
> values and the new tuple).  This can be done repeatedly (when a new
> index is being created) or only once (when a new heap tuple is inserted
> into an existing index).  There is no need for an "aggregate".
>
> This DeformedMMTuple can easily be turned into the on-disk
> representation; there is no hardcoded assumption on the number of index
> values stored per heap column, so it is possible to build an opclass
> that stores a bounding box column for a geometry heap column, for
> instance.
>
> Then we have the "compare" amproc.  This is used during index scans;
> after extracting an index tuple, it is turned into DeformedMMTuple, and
> the "compare" amproc for each column is called with the values of scan
> keys.  (Now that I think about this, it seems pretty much what
> "consistent" is for GiST opclasses).  A true return value indicates that
> the scan key matches the page range boundaries and thus all pages in the
> range are added to the output TID bitmap.

This sounds really great.  I agree that it needs some renaming.  I
think renaming what you are calling "compare" to "consistent" would be
an excellent idea, to match GiST.  "maybeUpdateValues" sounds like it
does the equivalent of GIST's "compress" on the new value followed by
a "union" with the existing summary item.  I don't think it's
necessary to separate those out, though.  You could perhaps call it
something like "add_item".

Also, FWIW, I liked Peter's idea of calling these "summarizing
indexes" or perhaps "summary" would be a bit shorter and mean the same
thing.  "minmax" wouldn't be the end of the world, but since you've
gone to the trouble of making this more generic I think giving it a
more generic name would be a very good idea.
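
To make the naming discussion concrete, here is a rough C sketch of the two
per-opclass operations for a sortable type, using invented names and a plain
int column (illustrative only -- not the patch's actual amproc API):

```c
#include <stdbool.h>

/* Per-page-range summary for a sortable column: the stored min and max. */
typedef struct MinmaxSummary
{
    int         min;
    int         max;
} MinmaxSummary;

typedef enum { STRAT_LT, STRAT_EQ, STRAT_GT } Strategy;

/*
 * The "consistent" side (called "compare" in the current patch): return
 * true when some tuple in the page range *may* satisfy "col OP key", so
 * the range's pages must be added to the output TID bitmap.
 */
bool
minmax_consistent(const MinmaxSummary *s, Strategy op, int key)
{
    switch (op)
    {
        case STRAT_LT:
            return s->min < key;
        case STRAT_EQ:
            return s->min <= key && key <= s->max;
        case STRAT_GT:
            return s->max > key;
    }
    return true;                /* unknown operator: cannot exclude */
}

/*
 * The "add_item" side (currently "maybeUpdateValues"): widen the summary
 * so it also covers a newly inserted heap value -- a union of the old
 * summary and the new tuple.
 */
void
minmax_add_item(MinmaxSummary *s, int value)
{
    if (value < s->min)
        s->min = value;
    if (value > s->max)
        s->max = value;
}
```

A false return from minmax_consistent is what lets a scan skip the whole page
range; a true return only means "maybe", which is why such indexes are lossy.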

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] returning SETOF RECORD

2014-07-14 Thread Robert Haas
populate_record_worker in jsonfuncs.c says this:

if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg("function returning record called in context "
"that cannot accept type record"),
 errhint("Try calling the function in the FROM clause "
 "using a column definition list.")));

dblink.c has a similar incantation.

Is there any reasonable alternative?  That is, if you have a function
returning SETOF record, and the details of the record type aren't
specified, is there anything you can do other than error out like
this?

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Pg_upgrade and toast tables bug discovered

2014-07-14 Thread Andres Freund
On 2014-07-11 09:55:34 -0400, Bruce Momjian wrote:
> On Fri, Jul 11, 2014 at 09:48:06AM -0400, Bruce Momjian wrote:
> > > Uh, why does this need to be in ALTER TABLE?  Can't this be part of
> > > table creation done by pg_dump?
> > 
> > Uh, I think you need to read the thread.  We have to delay the toast
> > creation part so we don't use an oid that will later be required by
> > another table from the old cluster.  This has to be done after all
> > tables have been created.
> > 
> > We could have pg_dump spit out those ALTER lines at the end of the dump,
> > but it seems simpler to do it in pg_upgrade.
> > 
> > Even if we have pg_dump create all the tables that require pre-assigned
> > TOAST oids first, then the other tables that _might_ need a TOAST table,
> > those later tables might create a toast oid that matches a later
> > non-TOAST-requiring table, so I don't think that fixes the problem.
> 
> What would be nice is if I could mark just the tables that will need
> toast tables created in that later phase (those tables that didn't have
> a toast table in the old cluster, but need one in the new cluster). 
> However, I can't see where to store that or how to pass that back into
> pg_upgrade.  I don't see a logical place in pg_class to put it.

This seems overengineered. Why not just do
SELECT pg_upgrade.maintain_toast(oid) FROM pg_class WHERE relkind = 'r';

and in maintain_toast() just call AlterTableCreateToastTable()?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] tab completion for setting search_path

2014-07-14 Thread Christoph Berg
Re: Andres Freund 2014-07-12 <20140712135128.gd3...@awork2.anarazel.de>
> I'm also not really happy with the fact that we only complete a single
> search_path item. But it's not easy to do better and when looking around
> other places (e.g. DROP TABLE) don't support it either.

The difference is that the other places don't really need it, i.e. you
can just issue two DROP TABLE. (And I wasn't even aware that DROP
TABLE a, b; exists specifically).

That said, it's great to have the feature, though I'd say making
search_path list-aware should be much higher on the todo list than a
generic solution for other cases.

> I've thought about adding "$user" to the set of completed things as

If we only support one item atm, $user isn't very relevant anyway.

> Fujii wondered about it, but it turns out completions containing $ don't
> work really great because $ is part of WORD_BREAKS.
> E.g. check out what happens if you do
> CREATE TABLE "foo$01"();
> CREATE TABLE "foo$02"();
> DROP TABLE "foo$
> which means that a single schema that requires quoting will break
> completion of "$user".

Schemas requiring quoting should be rare, so that wouldn't be a big
problem. (Or at least it could solve the problem for most users.)

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/




Re: [HACKERS] Securing "make check" (CVE-2014-0067)

2014-07-14 Thread Christoph Berg
Re: Noah Misch 2014-07-12 <20140712170151.ga1985...@tornado.leadboat.com>
> Thanks.  Preliminary questions:
> 
> > +#ifdef HAVE_UNIX_SOCKETS
> > +/* make_temp_sockdir() is invoked at most twice from pg_upgrade.c via 
> > get_sock_dir() */
> > +#define MAX_TEMPDIRS 2
> > +static int n_tempdirs = 0; /* actual number of directories created */
> > +static const char *temp_sockdir[MAX_TEMPDIRS];
> > +#endif
> 
> get_sock_dir() currently returns the same directory, the CWD, for both calls;
> can't it continue to do so?  We already have good reason not to start two
> postmasters simultaneously during pg_upgrade.
> 
> > +/*
> > + * Remove the socket temporary directories.  pg_ctl waits for postmaster
> > + * shutdown, so we expect the directory to be empty, unless we are 
> > interrupted
> > + * by a signal, in which case the postmaster will clean up the sockets, but
> > + * there's a race condition with us removing the directory.
> 
> What's the reason for addressing that race condition in pg_regress and not
> addressing it in pg_upgrade?

I didn't want to have too many arrays for additionally storing the
socket and lockfile names, and unlike pg_regress, pg_upgrade usually
doesn't need to delete the files by itself, so it seemed like a good
choice to rely on the postmaster removing them.

Now, if get_sock_dir() should only return a single directory instead
of many (well, two), that makes the original code from pg_regress more
attractive to use. (Possibly it will even be a candidate for moving to
pgcommon.a, though the static/global variables might defeat that.)

I'll send an updated patch soonish.

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/




Re: [HACKERS] Pg_upgrade and toast tables bug discovered

2014-07-14 Thread Bruce Momjian
On Mon, Jul 14, 2014 at 11:26:19AM -0400, Robert Haas wrote:
> On Fri, Jul 11, 2014 at 9:55 AM, Bruce Momjian  wrote:
> > On Fri, Jul 11, 2014 at 09:48:06AM -0400, Bruce Momjian wrote:
> >> > Uh, why does this need to be in ALTER TABLE?  Can't this be part of
> >> > table creation done by pg_dump?
> >>
> >> Uh, I think you need to read the thread.  We have to delay the toast
> >> creation part so we don't use an oid that will later be required by
> >> another table from the old cluster.  This has to be done after all
> >> tables have been created.
> >>
> >> We could have pg_dump spit out those ALTER lines at the end of the dump,
> >> but it seems simpler to do it in pg_upgrade.
> >>
> >> Even if we have pg_dump create all the tables that require pre-assigned
> >> TOAST oids first, then the other tables that _might_ need a TOAST table,
> >> those later tables might create a toast oid that matches a later
> >> non-TOAST-requiring table, so I don't think that fixes the problem.
> >
> > What would be nice is if I could mark just the tables that will need
> > toast tables created in that later phase (those tables that didn't have
> > a toast table in the old cluster, but need one in the new cluster).
> > However, I can't see where to store that or how to pass that back into
> > pg_upgrade.  I don't see a logical place in pg_class to put it.
> 
> reloptions?

Yes, that might work.  I thought about that but did not have time to see
if we can easily add/remove reloptions at the C level in that area of
the code.
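The collision being described can be sketched with a toy model of sequential oid assignment (a hypothetical simplification for illustration only; get_new_oid and claim_oid stand in for GetNewObjectId and the binary-upgrade oid-preassignment calls, and are not pg_upgrade's actual API):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: oids are handed out sequentially, skipping ones in use. */
#define MAX_OID 64

static bool oid_in_use[MAX_OID];
static int  oid_counter = 0;

/* Stand-in for GetNewObjectId(): hand out the next unused oid. */
static int
get_new_oid(void)
{
	do {
		oid_counter++;
	} while (oid_counter < MAX_OID && oid_in_use[oid_counter]);
	oid_in_use[oid_counter] = true;
	return oid_counter;
}

/* Stand-in for binary-upgrade preassignment: a table restored from the
 * old cluster must get exactly this oid, or the restore fails. */
static bool
claim_oid(int oid)
{
	if (oid_in_use[oid])
		return false;			/* collision: something already took it */
	oid_in_use[oid] = true;
	return true;
}
```

In this model, a toast table created eagerly during restore grabs the next free oid, which may be exactly the oid a later table from the old cluster was preassigned; deferring toast-table creation until all preassigned oids are claimed avoids the collision.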

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED

2014-07-14 Thread Christoph Berg
Re: Fabrízio de Royes Mello 2014-07-12 

> > ... that being the non-WAL-logging with wal_level=minimal, or more?
> 
> This is the first of the additional goals, but we have others. Please see [1].

Oh I wasn't aware of the wiki page, I had just read the old thread.
Thanks for the pointer.

> > > diff --git a/doc/src/sgml/ref/alter_table.sgml
> b/doc/src/sgml/ref/alter_table.sgml
> > > index 69a1e14..424f2e9 100644
> > > --- a/doc/src/sgml/ref/alter_table.sgml
> > > +++ b/doc/src/sgml/ref/alter_table.sgml
> > > @@ -58,6 +58,7 @@ ALTER TABLE [ IF EXISTS ]  class="PARAMETER">name
> > >  ENABLE REPLICA RULE  class="PARAMETER">rewrite_rule_name
> > >  ENABLE ALWAYS RULE  class="PARAMETER">rewrite_rule_name
> > >  CLUSTER ON index_name
> > > +SET {LOGGED | UNLOGGED}
> > >  SET WITHOUT CLUSTER
> > >  SET WITH OIDS
> > >  SET WITHOUT OIDS
> >
> > This must not be between the two CLUSTER lines. I think the best spot
> > would just be one line down, before SET WITH OIDS.
> 
> Fixed.

The (long) SET LOGGED paragraph is still between CLUSTER and SET
WITHOUT CLUSTER.

> > This grammar bug pops up consistently: This form *changes*...
> >
> 
> Fixed.

Two more:

+ * The AlterTableChangeCatalogToLoggedOrUnlogged function perform the 
+ * The AlterTableChangeIndexesToLoggedOrUnlogged function scan all indexes

> > This unnecessarily rewrites all the tabs, but see below.
> >
> 
> I did that because the new constant AT_PASS_SET_LOGGED_UNLOGGED is longer
> than the others.

Ah, ok then.

> > I'm wondering if you shouldn't make a single ATPrepSetLogged function
> > that takes an additional toLogged argument. Or alternatively get rid
> > of the if() by putting the code also into case AT_SetLogged.
> >
> 
> Actually I started that way... with just one ATPrep function we have some
> additional complexity to check relpersistence, define the error message and
> to run AlterTableSetLoggedCheckForeignConstraints(rel) function. So to
> simplify the code I decided to split it into two small functions.

Nod.

> > >   relation_close(rel, NoLock);
> > > +
> > > + if (pass == AT_PASS_SET_LOGGED_UNLOGGED)
> > > +
> ATPostAlterSetLoggedUnlogged(RelationGetRelid(rel));
> >
> > This must be done before relation_close() which releases all locks.

You didn't address that. I'm not sure about it, but either way, this
deserves a comment on the lock level necessary.

> > Moreover, I think you can get rid of that extra PASS here.
> > AT_PASS_ALTER_TYPE has its own pass because you can alter several
> > columns in a single ALTER TABLE statement, but you can have only one
> > SET (UN)LOGGED, so you can do the cluster_rel() directly in
> > AlterTableSetLoggedOrUnlogged() (unless cluster_rel() is too intrusive
> > and would interfere with other ALTER TABLE operations in this command,
> > no idea).
> >
> 
> I had some trouble here so I decided to do it that way, but I confess I'm
> not comfortable with this implementation. Looking more carefully at the
> tablecmds.c code, at ATController we have three phases, and the third is
> 'scan/rewrite tables as needed'; my doubt is whether I can use that instead
> of calling 'cluster_rel'.

I've just tried some SET (UN)LOGGED operations with altering column
types in the same operation, that works. But:

Yes, you should use the existing table rewriting machinery, or at
least clearly document (in comments) why it doesn't work for you.

Also looking at ATController, there's a wqueue mechanism to queue
catalog updates. You should probably use this, too, or again document
why it doesn't work for you.

> > Here's the big gotcha: Just like SET LOGGED must check for outgoing
> > FKs to unlogged tables, SET UNLOGGED must check for incoming FKs from
> > permanent tables. This is missing.
> >
> 
> I don't think so... we can create an unlogged table with a FK referring to
> a regular table...
> ... but it is not possible to create a FK from a regular table referring to
> an unlogged table:
> ... and a FK from an unlogged table referring to another unlogged table works:
> So we must be careful only when changing an unlogged table to a regular
> table.
> 
> Am I correct, or did I miss something?

You're missing the symmetric case the other way round. When changing a table
to unlogged, you need to make sure no other permanent table is
referencing our table.

> > > +AlterTableChangeCatalogToLoggedOrUnlogged(Relation rel, Relation
> relrelation, bool toLogged)

You are using "relrelation" and "relrel". I'd change all occurrences
to "relrelation" because that's also used elsewhere.

> > The comment on heap_open() suggests that you could directly invoke
> > relation_open() because you know this is a toast table, similarly for
> > index_open(). (I can't say which is better style.)
> >
> 
> I don't know which style is better either... other opinions?

Both are used several times in tablecmds.c, so both are probably fine.
(Didn't check the contexts, though.)

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/



Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-07-14 Thread Peter Geoghegan
On Mon, Jul 14, 2014 at 11:03 AM, Claudio Freire  wrote:
> Are those numbers measured on MAC's strxfrm?
>
> That was the one with suboptimal entropy on the first 8 bytes.

No, they're from a Linux system which uses glibc 2.19. The
optimization will simply be not used on implementations that don't
meet a certain standard (see my AC_TRY_RUN test program). I'm
reasonably confident that that test program will pass on most systems.
Just not Mac OSX. The optimization is never used on Windows and 32-bit
systems for other reasons.

-- 
Peter Geoghegan




Re: [HACKERS] pg_shmem_allocations view

2014-07-14 Thread Robert Haas
On Mon, Jul 14, 2014 at 6:20 AM, Abhijit Menon-Sen  wrote:
> At 2014-05-08 15:28:22 +0200, and...@2ndquadrant.com wrote:
>> > > Hm. Not sure what you're ACKing here ;).
>> >
>> > The idea of giving the unallocated memory a NULL key.
>>
>> Ok. A new version of the patches implementing that are attached.
>> Including a couple of small fixups and docs. The latter aren't
>> extensive, but that doesn't seem to be warranted anyway.
>
> I realise now that this email didn't actually have an attachment. Could
> you please repost the latest version of this patch?

That's odd.  I received two attachments with that email.  Of course, I
was copied directly, but why would the archives have lost the
attachments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-07-14 Thread Claudio Freire
On Mon, Jul 14, 2014 at 2:53 PM, Peter Geoghegan  wrote:
> My concern is that it won't be worth it to do the extra work,
> particularly given that I already have 8 bytes to work with. Supposing
> I only had 4 bytes to work with (as researchers writing [2] may have
> only had in 1994), that would leave me with a relatively small number
> of distinct normalized keys in many representative cases. For example,
> I'd have a mere 40,665 distinct normalized keys in the case of my
> "cities" database, rather than 243,782 (out of a set of 317,102 rows)
> for 8 bytes of storage. But if I double that to 16 bytes (which might
> be taken as a proxy for what a good compression scheme could get me),
> I only get a modest improvement - 273,795 distinct keys. To be fair,
> that's in no small part because there are only 275,330 distinct city
> names overall (and so most dups get away with a cheap memcmp() on
> their tie-breaker), but this is a reasonably organic, representative
> dataset.


Are those numbers measured on MAC's strxfrm?

That was the one with suboptimal entropy on the first 8 bytes.




Re: [HACKERS] SSL information view

2014-07-14 Thread Stefan Kaltenbrunner
On 07/13/2014 10:35 PM, Magnus Hagander wrote:
> On Sun, Jul 13, 2014 at 10:32 PM, Stefan Kaltenbrunner
>  wrote:
>> On 07/12/2014 03:08 PM, Magnus Hagander wrote:
>>> As an administrator, I find that you fairly often want to know what
>>> your current connections are actually using as SSL parameters, and
>>> there is currently no other way than gdb to find that out - something
>>> we definitely should fix.
>>
>> Yeah that would be handy - however I often wish to be able to figure
>> that out based on the logfile as well, any chance of getting these into
>> connection-logging/log_line_prefix as well?
> 
> We do already log some of it if you have enabled log_connections -
> protocol and cipher. Anything else in particular you'd be looking for
> - compression info?

DN mostly, not sure I care about compression info...


Stefan




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-07-14 Thread Peter Geoghegan
On Thu, Jun 12, 2014 at 2:09 PM, Peter Geoghegan  wrote:
> Right. It was a little bit incautious of me to say that we had the
> full benefit of 8 bytes of storage with "en_US.UTF-8", since that is
> only true of lower case characters (I think that FreeBSD can play
> tricks with this. Sometimes, it will give you the benefit of 8 bytes
> of entropy for an 8 byte string, with only non-differentiating
> trailing bytes, so that the first 8 bytes of "Aaaa" are distinct
> from the first eight bytes of "", while any trailing bytes are
> non-distinct for both). In any case it's pretty clear that a goal of
> the glibc implementation is to concentrate as much entropy as possible
> into the first part of the string, and that's the important point.
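The prefix-comparison idea being discussed can be sketched as follows (a hypothetical illustration, not PostgreSQL's implementation; it assumes the default "C" locale, where strxfrm() is a pass-through — under glibc with a real locale the transformed bytes differ, but the comparison works the same way):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build an 8-byte "abbreviated key" from the prefix of strxfrm() output.
 * Comparing keys as integers stands in for comparing the full transformed
 * strings; equal keys still require a tie-breaker comparison. */
static uint64_t
abbrev_key(const char *s)
{
	char		blob[64] = {0};	/* zero padding for short strings */
	uint64_t	key = 0;
	int			i;

	strxfrm(blob, s, sizeof(blob) - 1);

	/* Pack the first 8 bytes big-endian, so that an integer comparison
	 * of two keys matches memcmp() of the transformed prefixes. */
	for (i = 0; i < 8; i++)
		key = (key << 8) | (uint8_t) blob[i];
	return key;
}
```

Strings that differ only after the eighth transformed byte get equal keys, which is exactly the case where the tie-breaker comparison on the original values still has to run.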

I thought about using an order-preserving compression technique like
Hu Tucker [1] in order to get additional benefit from those 8 bytes.
Even my apparently sympathetic cities example isn't all that
sympathetic, since the number of distinct normalized keys is only
about 75% of the total number of cities (while a strcmp()-only
reliable tie-breaker isn't expected to be useful for the remaining
25%). Here is the improvement I see when I setup things so that there
is a 1:1 correspondence between city rows and distinct normalized
keys:

postgres=# with cc as
(select count(*), array_agg(ctid) ct,
substring(strxfrm_test(city)::text from 0 for 19 ) blob from cities
group by 3 having count(*) > 1 order by 1),
ff as
(
  select unnest(ct[2:400]) u, blob from cc
)
delete from cities using ff where cities.ctid = ff.u;

postgres=# vacuum full cities ;
VACUUM

Afterwards:

postgres=# select count(*) from (select distinct
substring(strxfrm_test(city)::text from 0 for 19) from cities) i;
 count

 243782
(1 row)

postgres=# select count(*) from cities;
 count

 243782
(1 row)

$ cat sort-city.sql
select * from (select * from cities order by city offset 100) d;

Patch results
==

pgbench -M prepared -f sort-city.sql -T 300 -n

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 300 s
number of transactions actually processed: 1734
latency average: 173.010 ms
tps = 5.778545 (including connections establishing)
tps = 5.778572 (excluding connections establishing)

pgbench -M prepared -j 4 -c 4 -f sort-city.sql -T 300 -n

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 4
number of threads: 4
duration: 300 s
number of transactions actually processed: 2916
latency average: 411.523 ms
tps = 9.706683 (including connections establishing)
tps = 9.706776 (excluding connections establishing)

Master results
==

pgbench -M prepared -f sort-city.sql -T 300 -n

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 300 s
number of transactions actually processed: 390
latency average: 769.231 ms
tps = 1.297545 (including connections establishing)
tps = 1.297551 (excluding connections establishing)

pgbench -M prepared -j 4 -c 4 -f sort-city.sql -T 300 -n

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 4
number of threads: 4
duration: 300 s
number of transactions actually processed: 535
latency average: 2242.991 ms
tps = 1.777005 (including connections establishing)
tps = 1.777024 (excluding connections establishing)

So that seems like a considerable further improvement that would be
nice to see more frequently. I don't know that it's worth it to go to
the trouble of compressing given the existing ameliorating measures
(HyperLogLog, etc), at least if using compression is motivated by
worst case performance. There is some research on making this work for
this kind of thing specifically [2].

My concern is that it won't be worth it to do the extra work,
particularly given that I already have 8 bytes to work with. Supposing
I only had 4 bytes to work with (as researchers writing [2] may have
only had in 1994), that would leave me with a relatively small number
of distinct normalized keys in many representative cases. For example,
I'd have a mere 40,665 distinct normalized keys in the case of my
"cities" database, rather than 243,782 (out of a set of 317,102 rows)
for 8 bytes of storage. But if I double that to 16 bytes (which might
be taken as a proxy for what a good compression scheme could get me),
I only get a modest improvement - 273,795 distinct keys. To be fair,
that's in no small part because there are only 275,330 distinct city
names overall (and so most dups get away with a cheap memcmp() on
their tie-breaker), but this is a reasonably organic, representative
dataset.

Now, it's really hard to judge something like that, and I don't
imagine that this analysis is all that scientific. I am inclined to
think that we're better off just aborting if the optimization doesn't
work out while copying, and forgetting about order preserving
compression.

Re: [HACKERS] Pg_upgrade and toast tables bug discovered

2014-07-14 Thread Fabrízio de Royes Mello
On Mon, Jul 14, 2014 at 12:26 PM, Robert Haas  wrote:
>
> On Fri, Jul 11, 2014 at 9:55 AM, Bruce Momjian  wrote:
> > On Fri, Jul 11, 2014 at 09:48:06AM -0400, Bruce Momjian wrote:
> >> > Uh, why does this need to be in ALTER TABLE?  Can't this be part of
> >> > table creation done by pg_dump?
> >>
> >> Uh, I think you need to read the thread.  We have to delay the toast
> >> creation part so we don't use an oid that will later be required by
> >> another table from the old cluster.  This has to be done after all
> >> tables have been created.
> >>
> >> We could have pg_dump spit out those ALTER lines at the end of the
dump,
> >> but it seems simpler to do it in pg_upgrade.
> >>
> >> Even if we have pg_dump create all the tables that require pre-assigned
> >> TOAST oids first, then the other tables that _might_ need a TOAST
table,
> >> those later tables might create a toast oid that matches a later
> >> non-TOAST-requiring table, so I don't think that fixes the problem.
> >
> > What would be nice is if I could mark just the tables that will need
> > toast tables created in that later phase (those tables that didn't have
> > a toast table in the old cluster, but need one in the new cluster).
> > However, I can't see where to store that or how to pass that back into
> > pg_upgrade.  I don't see a logical place in pg_class to put it.
>
> reloptions?
>

Is this another use case for the "custom reloptions" idea?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] 9.4 logical replication - walsender keepalive replies

2014-07-14 Thread Steve Singer

On 07/06/2014 10:11 AM, Andres Freund wrote:

Hi Steve,



Right. I thought about this for a while, and I think we should change
two things. For one, don't request replies here. It's simply not needed,
as this isn't dealing with timeouts. For another, check not just ->flush
< sentPtr but also ->write < sentPtr. The reason we're sending these
feedback messages is to inform the 'logical standby' that there's been
WAL activity which it can't see because they don't correspond to
anything that's logically decoded (e.g. vacuum stuff).
Would that suit your needs?

Greetings,


Yes, I think that will work for me.
I tested with the attached patch, which I think does what you describe,
and it seems okay.








diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
new file mode 100644
index 3189793..844a5de
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
*** WalSndWaitForWal(XLogRecPtr loc)
*** 1203,1211 
  		 * possibly are waiting for a later location. So we send pings
  		 * containing the flush location every now and then.
  		 */
! 		if (MyWalSnd->flush < sentPtr && !waiting_for_ping_response)
  		{
! 			WalSndKeepalive(true);
  			waiting_for_ping_response = true;
  		}
  
--- 1203,1213 
  		 * possibly are waiting for a later location. So we send pings
  		 * containing the flush location every now and then.
  		 */
! 		if (MyWalSnd->flush < sentPtr &&
! 			MyWalSnd->write < sentPtr &&
! 			!waiting_for_ping_response)
  		{
! 			WalSndKeepalive(false);
  			waiting_for_ping_response = true;
  		}
  



Re: [HACKERS] things I learned from working on memory allocation

2014-07-14 Thread Andres Freund
On 2014-07-14 11:24:26 -0400, Robert Haas wrote:
> On Sun, Jul 13, 2014 at 2:23 PM, Andres Freund  wrote:
> > The actual if (lock != NULL) bit costs significant amounts of cycles?
> > I'd have assumed that branch prediction takes care of that. Or is it
> > actually the icache not keeping up? Did you measure icache vs. dcache
> > misses?
> > Have you played with unlikely()/likely() type of macros?
> 
> I have not.  I think it's really got more to do with how much stuff
> needs to be saved in the stack frame, but I might be wrong about that.

I don't really see why that'd play such a big role. The register
pressure on ppc/amd64 shouldn't be high enough that the (lock != NULL)
alone requires to push anything on the stack. Sure, the call to
LWLockAcquire() will, but if that's done in the separate branch...

> >> Unfortunately, there's some incremental overhead
> >> in pfree, amounting to a single test of global variable, even when the
> >> new allocator is never used, which is measurable on a microbenchmark;
> >> I don't remember the exact numbers off-hand.
> >
> > Hm. Why's that needed? Because you're searching for your allocator's
> > superblocks in pfree()? I guess you don't store information about the
> > size/context for every allocatation like aset.c does?
> 
> Since the chunks don't have a StandardChunkHeader, pfree has got to
> decide whether it's safe to look at the 16 bytes preceding the chunk
> before doing so.  If the new allocator's not in use (single global
> variable test) that's pretty cheap, but not so cheap as you might
> hope.

That's what I was afraid of :/

> I found that it was possible to buy back most of the cost by
> replacing (*context->methods->free_p) (context, pointer); with a
> hard-coded AllocSetFree(context, pointer), so that gives you some idea
> what order of magnitude we're talking about here.

That's measured with a microbenchmark or actual postgres workloads?
Because in my measurements there wasn't consistent benefit in doing so
even when benchmarking workloads where allocation is a bottleneck.

> >> 3. The current MemoryContext abstraction layer is inadequate for
> >> almost everything one might want to do.  The idea of a context that
> >> allows only allocation, and ignores requests to free memory, has been
> >> discussed more than once.  Such an allocator could skip the
> >> power-of-two rounding that aset.c does, but it couldn't omit the chunk
> >> header, which means that in practice it would save no memory at all
> >> for cases like the one mentioned above (REINDEX
> >> pgbench_accounts_pkey).
> >
> > I personally think that the performance benefit of not doing the
> > rounding, not accessing the freelist, and such is more interesting for
> > such an allocator than the memory savings. I'm pretty sure such a
> > 'allocation only' allocator for e.g. the parser and parts of the
> > executor would be a rather noticeable performance benefit.
> 
> I don't have any reason to believe that.  All palloc() does in the
> common case is bump the boundary pointer and stick the chunk header in
> there.  You're not going to be able to make that much faster.

In my profiling the access to the freelist (to test whether there's free
chunks in there) actually is noticeable stall. Removing it makes things
measurably faster. Also adds memory overhead :) If you look at it, a
simple AllocSetAlloc() will currently access three cachelines inside
AllocSetContext - with some care that can be one.

> > But even for the cases where the space savings are the important bit: To
> > me it sounds feasible to declare that some allocators don't allow
> > reallocations and freeing. Those then would be allowed to omit the chunk
> > header.  To make that debuggable we would need to add a chunk header
> > when assertions are enabled to error out when such an operation is
> > performed - but that's an acceptable price imo.
> 
> Hmm, yeah, that could be done.  However, it does have the disadvantage
> of requiring you to affirmatively avoid pfreeing anything that might
> have come from such a context, rather than simply making pfree a noop.

Yep. Sounds feasible for a number of cases imo. I have serious doubts
that adding a measurable cost to pfree()s for allocations from all
contexts is going to be fun. There are some operators that do a lot of
those - especially inside btree opclasses which have to do so. Since
those are the ones you'll often be calling from parallel sort...

> > Btw, am I the only one that wondered if it  wouldn't be rather
> > beneficial to make aset.c add the chunk header before rounding?
> 
> Well, it would help in some cases, hurt in others, and make no
> difference at all in still others.  I mean, it just has to do with how
> well your size classes fit your actual memory allocation pattern;
> you'd be effectively changing the size classes from 32, 64, 128, 256
> to 16, 48, 112, 240 and that could be a win or a loss depending on the
actual allocation pattern.  I'm sure it's pretty trivial to construct
a test case favoring either outcome.

Re: [HACKERS] Pg_upgrade and toast tables bug discovered

2014-07-14 Thread Robert Haas
On Fri, Jul 11, 2014 at 9:55 AM, Bruce Momjian  wrote:
> On Fri, Jul 11, 2014 at 09:48:06AM -0400, Bruce Momjian wrote:
>> > Uh, why does this need to be in ALTER TABLE?  Can't this be part of
>> > table creation done by pg_dump?
>>
>> Uh, I think you need to read the thread.  We have to delay the toast
>> creation part so we don't use an oid that will later be required by
>> another table from the old cluster.  This has to be done after all
>> tables have been created.
>>
>> We could have pg_dump spit out those ALTER lines at the end of the dump,
>> but it seems simpler to do it in pg_upgrade.
>>
>> Even if we have pg_dump create all the tables that require pre-assigned
>> TOAST oids first, then the other tables that _might_ need a TOAST table,
>> those later tables might create a toast oid that matches a later
>> non-TOAST-requiring table, so I don't think that fixes the problem.
>
> What would be nice is if I could mark just the tables that will need
> toast tables created in that later phase (those tables that didn't have
> a toast table in the old cluster, but need one in the new cluster).
> However, I can't see where to store that or how to pass that back into
> pg_upgrade.  I don't see a logical place in pg_class to put it.

reloptions?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] things I learned from working on memory allocation

2014-07-14 Thread Robert Haas
On Sun, Jul 13, 2014 at 2:23 PM, Andres Freund  wrote:
> The actual if (lock != NULL) bit costs significant amounts of cycles?
> I'd have assumed that branch prediction takes care of that. Or is it
> actually the icache not keeping up? Did you measure icache vs. dcache
> misses?
> Have you played with unlikely()/likely() type of macros?

I have not.  I think it's really got more to do with how much stuff
needs to be saved in the stack frame, but I might be wrong about that.

>> Unfortunately, there's some incremental overhead
>> in pfree, amounting to a single test of global variable, even when the
>> new allocator is never used, which is measurable on a microbenchmark;
>> I don't remember the exact numbers off-hand.
>
> Hm. Why's that needed? Because you're searching for your allocator's
> superblocks in pfree()? I guess you don't store information about the
> size/context for every allocatation like aset.c does?

Since the chunks don't have a StandardChunkHeader, pfree has got to
decide whether it's safe to look at the 16 bytes preceding the chunk
before doing so.  If the new allocator's not in use (single global
variable test) that's pretty cheap, but not so cheap as you might
hope.  I found that it was possible to buy back most of the cost by
replacing (*context->methods->free_p) (context, pointer); with a
hard-coded AllocSetFree(context, pointer), so that gives you some idea
what order of magnitude we're talking about here.

>> When the new allocator
>> *is* used, pfree gets a lot slower - mostly because one of the data
>> structures used by the new allocator suffer occasional cache misses
>> during free operations.  I think there might be an argument that some
>> hit on pfree would be worth taking in exchange for better
>> space-efficiency, because we mostly free contexts by resetting them
>> anyway; but I haven't yet managed to make that hit small enough to
>> feel especially good about it.
>
> The speed bump is hit when pfree()ing a allocation that's been allocated
> with the new allocator or also when there's any concurrent activity for
> that allocator?

The latter.

>> 3. The current MemoryContext abstraction layer is inadequate for
>> almost everything one might want to do.  The idea of a context that
>> allows only allocation, and ignores requests to free memory, has been
>> discussed more than once.  Such an allocator could skip the
>> power-of-two rounding that aset.c does, but it couldn't omit the chunk
>> header, which means that in practice it would save no memory at all
>> for cases like the one mentioned above (REINDEX
>> pgbench_accounts_pkey).
>
> I personally think that the performance benefit of not doing the
> rounding, not accessing the freelist, and such is more interesting for
> such an allocator than the memory savings. I'm pretty sure such a
> 'allocation only' allocator for e.g. the parser and parts of the
> executor would be a rather noticeable performance benefit.

I don't have any reason to believe that.  All palloc() does in the
common case is bump the boundary pointer and stick the chunk header in
there.  You're not going to be able to make that much faster.
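That common case can be sketched like this (a hypothetical simplification: real aset.c additionally maxaligns requests, consults freelists, and allocates new blocks on demand):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the palloc() fast path: bump a boundary pointer and write
 * a chunk header in front of the returned space. */
typedef struct ChunkHeaderSketch
{
	void	   *context;		/* owning context, as in StandardChunkHeader */
	size_t		size;			/* requested size */
} ChunkHeaderSketch;

typedef struct BlockSketch
{
	char	   *freeptr;		/* next free byte within the block */
	char	   *endptr;			/* first byte past the block */
	char		space[8192];
} BlockSketch;

static void
block_init(BlockSketch *b)
{
	b->freeptr = b->space;
	b->endptr = b->space + sizeof(b->space);
}

static void *
bump_alloc(BlockSketch *b, void *context, size_t size)
{
	size_t		total = sizeof(ChunkHeaderSketch) + size;
	ChunkHeaderSketch *hdr;

	if ((size_t) (b->endptr - b->freeptr) < total)
		return NULL;			/* real code would grab a fresh block */
	hdr = (ChunkHeaderSketch *) b->freeptr;
	hdr->context = context;
	hdr->size = size;
	b->freeptr += total;
	return (void *) (hdr + 1);	/* user data starts right after the header */
}
```

There is essentially one bounds check, two stores, and a pointer bump per allocation, which is why the chunk-header write rather than the bump itself is where any savings would have to come from.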

> But even for the cases where the space savings are the important bit: To
> me it sounds feasible to declare that some allocators don't allow
> reallocations and freeing. Those then would be allowed to omit the chunk
> header.  To make that debuggable we would need to add a chunk header
> when assertions are enabled to error out when such an operation is
> performed - but that's an acceptable price imo.

Hmm, yeah, that could be done.  However, it does have the disadvantage
of requiring you to affirmatively avoid pfreeing anything that might
have come from such a context, rather than simply making pfree a noop.

> Btw, am I the only one that wondered if it  wouldn't be rather
> beneficial to make aset.c add the chunk header before rounding?

Well, it would help in some cases, hurt in others, and make no
difference at all in still others.  I mean, it just has to do with how
well your size classes fit your actual memory allocation pattern;
you'd be effectively changing the size classes from 32, 64, 128, 256
to 16, 48, 112, 240 and that could be a win or a loss depending on the
actual allocation pattern.   I'm sure it's pretty trivial to construct
a test case favoring either outcome.
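A quick sketch makes the trade-off concrete (assuming a 16-byte chunk header and power-of-two size classes, per the numbers above; the helper names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_HDR 16			/* assumed chunk header size */

/* Smallest power of two >= n (minimum 8, the smallest chunk). */
static size_t
pow2_ceil(size_t n)
{
	size_t		p = 8;

	while (p < n)
		p <<= 1;
	return p;
}

/* Current behaviour: round the request up, then add the header. */
static size_t
round_then_header(size_t request)
{
	return pow2_ceil(request) + CHUNK_HDR;
}

/* Alternative: add the header first, then round the total. */
static size_t
header_then_round(size_t request)
{
	return pow2_ceil(request + CHUNK_HDR);
}
```

A 48-byte request costs 80 bytes today but only 64 with header-first rounding, while a 64-byte request goes the other way, 80 versus 128 — a win or a loss depending on the allocation pattern, just as described.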

>> 5. It's tempting to look at other ways of solving the parallel sort
>> problem that don't need an allocator - perhaps by simply packing all
>> the tuples into a DSM one after the next.  But this is not easy to do,
>> or at least it's not easy to do efficiently.  Tuples enter tuplesort.c
>> through one of the tuplesort_put*() functions, most of which end up
>> calling one of the copytup_*() functions.  But you can't easily just
>> change those functions to stick the tuple someplace else, because they
>> don't necessarily get the address of the space to be used from palloc
>> directly.  In particular, copytup_heap() calls
>> ExecCopySlot

Re: [HACKERS] [PATCH] Fix search_path default value separator.

2014-07-14 Thread Christoph Martin
True. Both variants are legal, and most people won't ever notice. I
stumbled across this while writing a test case for a transaction helper
that sets/restores search_path before committing. The test was to simply
compare the string values of `SHOW search_path;` before `BEGIN
TRANSACTION;` and after `COMMIT;`.

It's a non-issue, really, but since there's a patch and I cannot come up
with a more common use case that would depend on the use of just-comma
separators in the default value, I'd say it's more of a question of "why
not" instead of "why", isn't it?


On 14 July 2014 16:58, Robert Haas  wrote:

> On Fri, Jul 11, 2014 at 6:09 AM, Christoph Martin
>  wrote:
> > I noticed a minor inconsistency with the search_path separator used in
> the
> > default configuration.
> >
> > The schemas of any search_path set using `SET search_path TO...` are
> > separated by ", " (comma, space), while the default value is only
> separated
> > by "," (comma).
> >
> > The attached patch against master changes the separator of the default
> value
> > to be consistent with the usual comma-space separators, and updates the
> > documentation of `SHOW search_path;` accordingly.
> >
> > This massive three-byte change passes all 144 tests of make check.
>
> Heh.  I'm not particularly averse to changing this, but I guess I
> don't see any particular benefit of changing it either.  Either comma
> or comma-space is a legal separator, so why worry about it?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] Over-optimization in ExecEvalWholeRowVar

2014-07-14 Thread Merlin Moncure
On Fri, Jul 11, 2014 at 2:43 PM, Tom Lane  wrote:
> This example in the regression database is a simplified version of a bug
> I was shown off-list:
>
> regression=# select (
> select q from
> ( select 1,2,3 where f1>0
>   union all
>   select 4,5,6.0 where f1<=0
> ) q
> )
> from int4_tbl;
> ERROR:  record type has not been registered

Thanks Tom!

If anybody gets hit by this, the workaround I use is to put the inner
subquery as a CTE:

 select (
 select q from
 (
 with data as (
 select 1,2,3 where f1>0
 union all
 select 4,5,6.0 where f1<=0
   )
   select * from data
 ) q
 )
 from int4_tbl;

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Fix search_path default value separator.

2014-07-14 Thread Robert Haas
On Fri, Jul 11, 2014 at 6:09 AM, Christoph Martin
 wrote:
> I noticed a minor inconsistency with the search_path separator used in the
> default configuration.
>
> The schemas of any search_path set using `SET search_path TO...` are
> separated by ", " (comma, space), while the default value is only separated
> by "," (comma).
>
> The attached patch against master changes the separator of the default value
> to be consistent with the usual comma-space separators, and updates the
> documentation of `SHOW search_path;` accordingly.
>
> This massive three-byte change passes all 144 tests of make check.

Heh.  I'm not particularly averse to changing this, but I guess I
don't see any particular benefit of changing it either.  Either comma
or comma-space is a legal separator, so why worry about it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Use unique index for longer pathkeys.

2014-07-14 Thread Amit Kapila
On Mon, Jul 14, 2014 at 4:02 PM, Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:
> At Mon, 14 Jul 2014 11:01:52 +0530, Amit Kapila 
wrote in 
> > On Fri, Jun 13, 2014 at 1:11 PM, Kyotaro HORIGUCHI <
> > horiguchi.kyot...@lab.ntt.co.jp> wrote:
> > I am bit worried about the extra cycles added by this patch as compare
> > to your previous patch; example
> > During trimming of paths, it will build index paths
(build_index_pathkeys())
> > which it will anyway do again during creation of index paths:
> >
create_index_paths()->get_index_paths()->build_index_paths()->build_index_pathkeys()
>
> I felt the same thing. One stupid measure to reduce cycles would
> be caching the created pathkeys in IndexOptInfo. I'll try it in the
> next patch.

I suggest waiting for the overall review of the patch before trying this
out; we can look for the best way to avoid it, if required, once the
other parts of the patch have been reviewed and proved problem-free.

Other than this, I agree with the other points you mentioned in your mail,
and maybe you can just handle those in the next version of your patch.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Allowing NOT IN to use ANTI joins

2014-07-14 Thread Tom Lane
David Rowley  writes:
> On Mon, Jul 14, 2014 at 3:00 AM, Tom Lane  wrote:
>> Ugh.  I'm back to being discouraged about the usefulness of the
>> optimization.

> Are you worried about the planning overhead of the not null checks, or is
> it more that you think there's a much smaller chance of a real world
> situation that the optimisation will succeed?

Both.  We need to look at how much it costs the planner to run these
checks, and think about how many real queries it will help for.  The
first is quantifiable, the second probably not so much :-( but we still
need to ask the question.

regards, tom lane




Re: [HACKERS] 9.5 CF1

2014-07-14 Thread Fabrízio de Royes Mello
On Mon, Jul 14, 2014 at 3:21 AM, Abhijit Menon-Sen 
wrote:
>
> Hi.
>
> We're now at the "nearly done" stage of the first 9.5 CF.
>
> Twenty-four patches have been committed so far, and five more are ready
> for committer (and of those, four are small). Thirty patches are still
> marked "needs review", and twenty-one are waiting on author.
>

Actually, thirty-one are marked "needs review", because I already sent a
newly reviewed patch but forgot to mark it "needs review"... sorry!


Greetings,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] Re: issue log message to suggest VACUUM FULL if a table is nearly empty

2014-07-14 Thread Abhijit Menon-Sen
At 2014-07-14 07:55:34 -0400, t...@sss.pgh.pa.us wrote:
>
> I'd vote for rejecting and annotating the TODO item with a link to
> this thread.

I've marked the patch as rejected and edited the TODO list to remove the
"easy". There was already a link to this thread there.

Thank you.

-- Abhijit




Re: [HACKERS] No exact/lossy block information in EXPLAIN ANALYZE for a bitmap heap scan

2014-07-14 Thread Fujii Masao
On Fri, Jul 11, 2014 at 7:21 PM, Fujii Masao  wrote:
> On Fri, Jul 11, 2014 at 5:45 PM, Etsuro Fujita
>  wrote:
>> I've noticed that EXPLAIN ANALYZE shows no information on exact/lossy
>> blocks for a bitmap heap scan when both the numbers of exact/lossy pages
>> retrieved by the node are zero.  Such an example is shown below.  I
>> think it would be better to suppress the 'Heap Blocks' line in that
>> case, based on the same idea of the 'Buffers' line.  Patch attached.
>
> The patch looks good to me. Barring any objection, I will commit this both
> in HEAD and 9.4.

Committed!

Regards,




Re: [HACKERS] Re: issue log message to suggest VACUUM FULL if a table is nearly empty

2014-07-14 Thread Tom Lane
Abhijit Menon-Sen  writes:
> Do we have any consensus on this patch?

> I took a quick look at it because no review was posted. The patch itself
> does what it claims to, but from the discussion it doesn't seem like we
> want the feature; or perhaps we only don't want it in its present form.

Re-reading the thread quickly, it seemed like there was considerable
pushback about the cost of collecting the stats, and worries about whether
it wouldn't just be log spam.  But I think the opposite of the latter
concern is also valid, namely that people who could benefit from the
warning are not going to see it because they don't peruse the postmaster
log carefully/at all.  That's a generic problem for warning messages
emitted by background tasks, which we ought to think about how to fix.
In the meantime though it seems like this patch is far more likely to be
annoying than helpful.

> So which is more appropriate, returned with feedback or rejected?
> (In the latter case, the TODO item should also be removed.)

I'd vote for rejecting and annotating the TODO item with a link to
this thread.  And maybe taking off the "easy" notation.  I think the
TODO item is reflecting a real usability issue, but solving it usefully
is quite a bit harder than it looks.

regards, tom lane




Re: [HACKERS] Incorrect comment in postgres_fdw.c

2014-07-14 Thread Fujii Masao
On Mon, Jul 14, 2014 at 7:31 PM, Shigeru Hanada
 wrote:
> Fujita-san,
>
> 2014-07-11 18:22 GMT+09:00 Etsuro Fujita :
>> I think the following comment for store_returning_result() in
>> postgres_fdw.c is not right.
>>
>> /* PGresult must be released before leaving this function. */
>>
>> I think PGresult should not be released before leaving this function *on
>> success* in that function.
>>
>> (I guess the comment has been copied and pasted from that for
>> get_remote_estimate().)
>
> +1 for just removing the comment, because header comment clearly
> mentions necessity of releasing the PGresult.

Committed. Thanks for the report!

Regards,

-- 
Fujii Masao




Re: [HACKERS] Use unique index for longer pathkeys.

2014-07-14 Thread Kyotaro HORIGUCHI
Thank you for reviewing; the revised patch will come later.

At Mon, 14 Jul 2014 11:01:52 +0530, Amit Kapila  wrote 
in 
> On Fri, Jun 13, 2014 at 1:11 PM, Kyotaro HORIGUCHI <
> horiguchi.kyot...@lab.ntt.co.jp> wrote:
> >
> > Hello, This is the continuation from the last CF.
> >
> > This patch intends to make PG to use index for longer pathkeys
> > than index columns when,
> >
> >  - The index is a unique index.
> >  - All index columns are NOT NULL.
> >  - The index column list is a subset of query_pathkeys.
> >
> > 
> >
> > This patch consists of mainly three parts.
(snip)
> 
> I have spent some time on this patch and would like to share my
> findings as below with you.
> 
> 1.
> It seems to me that the new flag (IndexOptInfo.full_ordered) introduced in
> this patch is not getting set properly.  Please refer below test:

Yeah, you probably omitted the second condition above.

> create table t (a int, b int, c int, d text);

The following definition instead will make this work. NULLs
cannot be unique.

| create table t (a int not null, b int not null, c int, d text);
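Putting the corrected reproduction together (a sketch based on the test above; the row count in the original generate_series call was line-wrapped in the mail, so the value shown there is kept):

```sql
-- Unique index columns must be NOT NULL for the index to guarantee
-- a full ordering of the rows.
create table t (a int not null, b int not null, c int, d text);
create unique index i_t_pkey on t (a, b);
insert into t select a % 10, a / 10, a, 't' from generate_series(0, 10) a;
analyze t;
-- With the patch, the unique NOT NULL index on (a, b) should let the
-- planner trim the DISTINCT pathkeys down to (a, b):
explain (costs off, analyze on) select distinct * from t;
```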

> create unique index i_t_pkey on t(a, b);
> insert into t (select a % 10, a / 10, a, 't' from generate_series(0,
> 10) a);
> analyze;
> explain (costs off, analyze on) select distinct * from t;
...
> 2.
> typedef struct IndexOptInfo
>   bool predOK; /* true if predicate matches query */
>   bool unique; /* true if a unique index */
>   bool immediate; /* is uniqueness enforced immediately? */
> + bool full_ordered; /* don't key columns duplicate? */
> 
> It seems you have forgotten to update out function _outIndexOptInfo().

Mmm, it's surely not what I intended to write :(, but the
member actually works in collect_common_primary_pathkeys.

The correct (desirable) code should use (allnotnull && unique)
instead of (full_ordered). I'll fix this in the coming version,
but they are equivalent in functionality. It's not a mistake of
forgetting to update the out function (so broken), but of branching
from the wrong point (so it works contrary to the expectation) :)

> 3.
> + root->distinct_pathkeys =
> + shorten_pathkeys_following(root, root->distinct_pathkeys,pk_guide,
> +   true);
> +
> + root->query_pathkeys =
> + shorten_pathkeys_following(root,root->query_pathkeys,pk_guide,
> +   true);
> 
> Add a space after each parameter in function call.

Thank you for pointing out.

> 4.
> +PathKeysComparison
> +compare_pathkeys_ignoring_strategy(List *keys1, List *keys2)
> 
> a. This function return 4 different type of values (PATHKEYS_EQUAL,
>PATHKEYS_DIFFERENT, PATHKEYS_BETTER1, PATHKEYS_BETTER2),
>however the caller just checks for PATHKEYS_EQUAL, isn't it better to
> just
>return either PATHKEYS_EQUAL or PATHKEYS_DIFFERENT.

Hmm.. I agree that it is practically removable, but I thought it
necessary for consistency with the similarly named function. Perhaps
I should give it another name that doesn't suggest similarity to
compare_pathkeys(), say bool pathkeys_are_equal_ignoring_strategy(),
and change the return value set.

> b. Another idea could be that instead of writting separate function,
>pass an additional parameter to compare_pathkeys() to distinguish
>for ignore strategy case.

It sounds reasonable. According to my faint memory, the reason
the function was separated from the original one is perhaps
simply an editorial one.

> c. In any case, I think it is good to add comments on top of newly
>introduced function (compare_pathkeys_ignoring_strategy) or extend
>the comments of compare_pathkeys() if you want to add a new parameter
>to it.

Ouch, thank you for pointing that out. I'll add a comment if the
function remains in the next patch.

> > Finally, the new patch has grown up to twice in size.. Although
> > it seems a bit large, I think that side-effects are clearly
> > avoided.
> 
> I am bit worried about the extra cycles added by this patch as compare
> to your previous patch; example
> During trimming of paths, it will build index paths (build_index_pathkeys())
> which it will anyway do again during creation of index paths:
> create_index_paths()->get_index_paths()->build_index_paths()->build_index_pathkeys()

I felt the same thing. One stupid measure to reduce cycles would
be caching the created pathkeys in IndexOptInfo. I'll try it in the
next patch.

> I am not sure if this has direct impact, however I am suspecting trimming
> logic can add quite a few instructions even for cases when it is not
> beneficial.
> At this moment, I also don't have better idea, so lets proceed with review
> of this
> patch unless there is any other better way to achieve it.

I suppose there's some shortcut which can exclude the cases where
pathkeys trimming is obviously of no use, for example by comparing
only pathkey lengths. I'll look for one.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] Incorrect comment in postgres_fdw.c

2014-07-14 Thread Shigeru Hanada
Fujita-san,

2014-07-11 18:22 GMT+09:00 Etsuro Fujita :
> I think the following comment for store_returning_result() in
> postgres_fdw.c is not right.
>
> /* PGresult must be released before leaving this function. */
>
> I think PGresult should not be released before leaving this function *on
> success* in that function.
>
> (I guess the comment has been copied and pasted from that for
> get_remote_estimate().)

+1 for just removing the comment, because header comment clearly
mentions necessity of releasing the PGresult.


-- 
Shigeru HANADA




Re: [HACKERS] pg_shmem_allocations view

2014-07-14 Thread Abhijit Menon-Sen
At 2014-05-08 15:28:22 +0200, and...@2ndquadrant.com wrote:
>
> > > Hm. Not sure what you're ACKing here ;).
> > 
> > The idea of giving the unallocated memory a NULL key.
> 
> Ok. A new version of the patches implementing that are attached.
> Including a couple of small fixups and docs. The latter aren't
> extensive, but that doesn't seem to be warranted anyway.

I realise now that this email didn't actually have an attachment. Could
you please repost the latest version of this patch?

Thanks.

-- Abhijit




[HACKERS] Re: issue log message to suggest VACUUM FULL if a table is nearly empty

2014-07-14 Thread Abhijit Menon-Sen
Hi.

Do we have any consensus on this patch?

I took a quick look at it because no review was posted. The patch itself
does what it claims to, but from the discussion it doesn't seem like we
want the feature; or perhaps we only don't want it in its present form.

So which is more appropriate, returned with feedback or rejected?

(In the latter case, the TODO item should also be removed.)

-- Abhijit




Re: [HACKERS] [v9.5] Custom Plan API

2014-07-14 Thread Shigeru Hanada
Kaigai-san,

The v3 patch had a conflict in src/backend/tcop/utility.c with the newly
added IMPORT FOREIGN SCHEMA statement, but it was trivial.

2014-07-08 20:55 GMT+09:00 Kouhei Kaigai :
> * System catalog name was changed to pg_custom_plan_provider;
>   that reflects role of the object being defined.

ISTM that doc/src/sgml/custom-plan.sgml should also be renamed to
custom-plan-provider.sgml.

> * Also, prefix of its variable names are changed to "cpp"; that
>   means custom-plan-provider.

A "custclass" remains in a comment in
src/include/catalog/pg_custom_plan_provider.h.

> * Syntax also reflects what the command does more. New syntax to
>   define custom plan provider is:
> CREATE CUSTOM PLAN PROVIDER 
>   FOR  HANDLER ;
> * Add custom-plan.sgml to introduce interface functions defined
>   for path/plan/exec methods.
> * FinalizeCustomPlan() callback was simplified to support scan
>   (and join in the future) at the starting point. As long as
>   scan/join requirement, no need to control paramids bitmap in
>   arbitrary way.
> * Unnecessary forward declaration in relation.h and plannode.h
>   were removed, but a few structures still needs to have
>   forward declarations.
> * Fix typos being pointed out.

Check.  I found some typos and a wording "datatype" which is not used
in any other place. Please refer to the attached patch.

-- 
Shigeru HANADA


fix_typo_in_v3.patch
Description: Binary data



Re: [HACKERS] WAL replay bugs

2014-07-14 Thread Kyotaro HORIGUCHI
Hello. Let me apologize for continuing the discussion even though
the deadline is approaching.

At Fri, 11 Jul 2014 09:49:55 +0900, Michael Paquier  
wrote in 
> Updated patches attached.
> 
> On Thu, Jul 10, 2014 at 7:13 PM, Kyotaro HORIGUCHI
>  wrote:
> >
> > The usage of this is a bit irregular for an (extension) module, but
> > it is the nature of this 'contrib'. The rearranged page type
> > detection logic seems neater and keeps to work as intended. This
> > logic will be changed after the new page type detection scheme
> > becomes ready by the another patch.
> 
> No disk format changes will be allowed just to make page detection
> easier (Tom mentioned that earlier in this thread btw). We'll have to
> live with what current code offers, 
> especially considering that adding
> new bytes for page detection for gin pages would double the size of
> its special area after applying MAXALIGN to it.

That's awkward, but I agree with it. By the way, I found that
PageHeaderData.pd_flags has 9 bits of room.  It seems usable, if no
other usage is in sight right now and a formal method of identifying
page types is worth the 3-4 bits there.

# This is a separate discussion from this patch itself.

> > - "make check" runs "regress --use-existing", but IMHO the make
> >   target which is expected to do that is installcheck. I had
> >   fooled myself into believing that it should run a dedicatedly
> >   built postgres:(
> 
> Yes, the patch is abusing of that. --use-existing is specified in this
> case because the test itself is controlling Postgres servers to build
> and fetch the buffer captures. This allows more flexible machinery
> IMO.

Although I doubt the necessity of that flexibility given the current
testing framework, I don't have a strong objection to it. Nevertheless,
it would be appreciated if you put a notice in the README or
somewhere.

> > - postgres processes are left running after
> >   test_default(custom).sh has stopped halfway. This can be fixed
> >   with the attached patch, but, to be honest, this seems too
> >   much. I'll follow your decision whether or not to do this.
> >   (bufcapt_test_sh_20140710.patch)
> 
> I had considered that first, thinking that it was the user's
> responsibility if things are screwed up with his custom scripts. I
> guess that the way you are doing it is a safeguard simple enough
> though, so included with some editing, particularly reporting to the
> user the error code returned by the test script.

Agreed.

> > - test_default.sh is not only an example script which will run
> >   while utilize this facility, but the test script for this
> >   facility itself.
> >   So I think it would be better be added some queries so that all
> >   possible page types available for the default testing. What do
> >   you think about the attached patch?  (hash index is unlogged
> >   but I dared to put it for clarity.)
> 
> Yeah, having a default set of queries run just to let the user get an
> idea of how it works improves things.
> Once again thanks for taking the time to look at that.

Thank you.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Allowing NOT IN to use ANTI joins

2014-07-14 Thread David Rowley
On Mon, Jul 14, 2014 at 3:00 AM, Tom Lane  wrote:

> David Rowley  writes:
> > I had another look at this and it appears you were right the first time,
> we
> > need to ensure there's no NULLs on both sides of the join condition.
>
> Ugh.  I'm back to being discouraged about the usefulness of the
> optimization.
>
>
Are you worried about the planning overhead of the not null checks, or is
it more that you think there's a much smaller chance of a real world
situation where the optimisation will succeed? At least the planning
overhead is limited to queries that have NOT IN clauses.

I'm still quite positive about the patch. I think that it would just be a
matter of modifying query_outputs_are_not_nullable(), giving it a nice new
name, and changing the parameter list to accept not only a Query but also a
List of Expr. This would likely be quite a nice reusable function that
could be used in a handful of other places in the planner to optimise
various other cases.

When I first decided to work on this I was more interested in getting some
planner knowledge about NOT NULL constraints than I was interested in
speeding up NOT IN, but it seemed like a perfect target or even "excuse" to
draft up some code that checks if an expr can never be NULL.
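To make the NULL hazard being discussed concrete, here is a sketch of the case in question (hypothetical table names, not from the thread):

```sql
create table orders  (customer_id int not null, total numeric);
create table blocked (customer_id int not null);

-- With NOT NULL provable on both sides, the NOT IN cannot produce NULL,
-- so it is semantically equivalent to an anti join and could be planned
-- as one:
select o.*
from orders o
where o.customer_id not in (select customer_id from blocked);

-- Without those constraints, a single NULL in blocked.customer_id makes
-- the NOT IN predicate unknown for every row, returning no rows at all,
-- so the anti-join rewrite would be unsafe.
```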

Since the patch has not been marked as rejected I was thinking that I'd
take a bash at fixing it up, but if you think this is a waste of time,
please let me know.

Regards

David Rowley