Re: [HACKERS] pg_rewind failure by file deletion in source server

2015-06-11 Thread Fujii Masao
On Fri, Jun 12, 2015 at 3:17 PM, Michael Paquier
 wrote:
> On Thu, Jun 11, 2015 at 5:48 PM, Fujii Masao  wrote:
>> On Thu, Jun 11, 2015 at 2:14 PM, Michael Paquier
>>  wrote:
>>> On Thu, Jun 11, 2015 at 1:51 AM, Fujii Masao  wrote:
 Shouldn't pg_rewind ignore that failure of operation? If the file is not
 found in the source server, the file obviously doesn't need to be copied to
 the destination server. So ISTM that pg_rewind can safely skip copying that
 file. Thought?
>>>
>>> I think that you should fail. Let's imagine that the master to be
>>> rewound removed, before being stopped cleanly, a relation file that
>>> existed at the last checkpoint before the fork, while its promoted
>>> standby still has that relation file. You need to be able to copy the
>>> file so that WAL can be replayed on it. If the standby has removed a
>>> file from the file map after the file map was taken, I guess that the
>>> best thing to do is fail, because a file that should be there for the
>>> rewound node cannot be fetched.
>>
>> In this case, why do you think that the file should exist in the old master?
>> Even if it doesn't exist, ISTM that the old master can safely replay the WAL
>> records related to the file when it restarts. So what's the problem
>> if the file doesn't exist in the old master?
>
> Well, some user may want to rewind the master down to the point where
> the WAL forked, and then recover it immediately once a consistent point
> is reached at restart, instead of plugging it back into the cluster. In
> this case I think that you need the relation file of the dropped
> relation to reach a consistent state. That's still cheaper than
> recreating the node from a fresh base backup in some cases, particularly
> if the last base backup for this cluster was taken far in the past.

So it's the case where a user wants to recover the old master up to a point
BEFORE the file in question was deleted in the new master. At that point,
since the file must exist, pg_rewind should fail if the file cannot be copied
from the new master. Is my understanding right?

As far as I read the code of pg_rewind, ISTM that your scenario can never
happen, because pg_rewind sets the minimum recovery point to the latest WAL
location in the new master, i.e., AFTER the file is deleted. So the old
master cannot stop recovering before the file is deleted in the new master.
If the recovery stops at that point, it fails because the minimum recovery
point has not been reached yet.

IOW, after pg_rewind runs, the old master has to replay the WAL records
which were generated by the deletion of the file in the new master.
So it's okay if the old master doesn't have the file after pg_rewind runs,
I think.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_rewind failure by file deletion in source server

2015-06-11 Thread Michael Paquier
On Thu, Jun 11, 2015 at 5:48 PM, Fujii Masao  wrote:
> On Thu, Jun 11, 2015 at 2:14 PM, Michael Paquier
>  wrote:
>> On Thu, Jun 11, 2015 at 1:51 AM, Fujii Masao  wrote:
>>> Shouldn't pg_rewind ignore that failure of operation? If the file is not
>>> found in the source server, the file obviously doesn't need to be copied
>>> to the destination server. So ISTM that pg_rewind can safely skip copying
>>> that file. Thought?
>>
>> I think that you should fail. Let's imagine that the master to be
>> rewound removed, before being stopped cleanly, a relation file that
>> existed at the last checkpoint before the fork, while its promoted
>> standby still has that relation file. You need to be able to copy the
>> file so that WAL can be replayed on it. If the standby has removed a
>> file from the file map after the file map was taken, I guess that the
>> best thing to do is fail, because a file that should be there for the
>> rewound node cannot be fetched.
>
> In this case, why do you think that the file should exist in the old master?
> Even if it doesn't exist, ISTM that the old master can safely replay the WAL
> records related to the file when it restarts. So what's the problem
> if the file doesn't exist in the old master?

Well, some user may want to rewind the master down to the point where
the WAL forked, and then recover it immediately once a consistent point
is reached at restart, instead of plugging it back into the cluster. In
this case I think that you need the relation file of the dropped
relation to reach a consistent state. That's still cheaper than
recreating the node from a fresh base backup in some cases, particularly
if the last base backup for this cluster was taken far in the past.

>> Documentation should be made clearer about that with a better error
>> message...
>
> I'm wondering how we can recover (or rewind again) the old master from
> that error. This also would need to be documented if we decide not to
> fix any code regarding this problem...

FWIW, here is a scenario able to trigger the error with 1 master (port
5432, data at ~/data/5432) and 1 standby (port 5433, data at
~/data/5433).
$ psql -c 'create table aa as select generate_series(1,100)'
# Promote standby
$ pg_ctl promote -D ~/data/5433/
# Drop table on master
$ psql -c 'drop table aa'
DROP TABLE
$ pg_ctl stop -D ~/data/5432/

At this point there is no relation file for 'aa' on the master anymore, while
it is still present on the standby. Running pg_rewind at this point will
work: the relation file is copied from the promoted standby to the master.

$ lldb -- pg_rewind -D 5432 --source-server="port=5433 dbname=postgres"
Set a breakpoint in pg_rewind after fetchSourceFileList() and before the
changes from the block map are replayed, then drop table 'aa' on the standby
and checkpoint it; the source file list is now inconsistent and pg_rewind
will fail. This can only happen with --source-server, not with
--source-pgdata.

Adding a sleep() of a couple of seconds in pg_rewind may be an easier way
to trigger the error directly ;), with DROP DATABASE for example.

Regards,
-- 
Michael




Re: [HACKERS] Comfortably check BackendPID with psql

2015-06-11 Thread Naoya Anzai
> > >Not a big fan of that abbreviation itself. What I'd wondered about
> > >instead - and actually had patched into my psql at some point - is
> > >adding an appropriate escape to psql's PROMPT. I think that'd serve your
> > >purpose as well?
> > 
> > +3.14159; that would be hugely helpful when using gdb.
> 
> You can get that today.  In ~/.psqlrc:
> 
> SELECT pg_catalog.pg_backend_pid() AS backend_pid \gset
> \set PROMPT1 '%m %:backend_pid: %/%R%# '
> 
> It doesn't update after \connect, but the overlap between my use of \connect
> and my use of debuggers is tiny.

Thank you all!
My hack is going to be much smoother.

Regards,

Naoya

---
Naoya Anzai
Engineering Department
NEC Solution Innovators, Ltd.
E-Mail: nao-an...@xc.jp.nec.com
---




Re: [HACKERS] Is it possible to have a "fast-write" Index?

2015-06-11 Thread Simon Riggs
On 5 June 2015 at 18:07, deavid  wrote:

> There are several use cases where I see an index as useful, but adding it
> would slow down inserts and updates too much.
> For example, when we have 10 million rows in a table with frequent
> updates, we need several indexes to speed up selects, but then we slow
> down updates a lot, especially when we have 10 or more indexes.
> Other cases involve indexes for text search, which are used only for user
> search and aren't that important, so we want to have them, but we don't
> want the overhead they impose whenever we write to the table.
> I know different approaches that already solve some of those problems in
> some ways (table partitioning, partial indexes, etc), but I don't feel
> they are the solution to every problem of this kind.
>
> Some people have already asked for "delayed write" indexes, but the idea
> gets discarded because the index could get out of sync, so it can omit
> results, and this is unacceptable. But I think maybe that could be fixed
> in several ways and we can have a fast and reliable index (but maybe not
> so fast on selects).
>

This is exactly the use case and mechanism for BRIN indexes.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] The purpose of the core team

2015-06-11 Thread Simon Riggs
On 12 June 2015 at 06:48, Noah Misch  wrote:

> On Thu, Jun 11, 2015 at 03:47:06PM -0300, Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> > > http://www.postgresql.org/developer/core/
>
> > After going over this a few times, there is one thing that strikes me
> > that nobody has mentioned: the list of tasks mentioned there has one
> > that's completely unlike the others.  These are related to human
> > relations:
> >
> > Acting as a conduit for confidential communication.
> > Making policy announcements.
> > Managing permissions for commits, infrastructure, etc.
> > Handling disciplinary issues.
> > Making difficult decisions when consensus is lacking.
> >
> > while this one is highly technical:
> > Coordinating release activities.
>
> Quite so.  Deciding "it's time for a release" requires the same knowledge
> and skills as deciding "it's time to commit patch P", yet we have a
> special-case decision procedure.  A release does require people acting in
> concert for a span of a few days, but that precise scheduling is work for
> an administrative assistant, not work befitting -core.


Deciding "WHAT goes in the next release?" is what Committers do, by
definition.

It seems strange to have a different mailing list for "WHEN is the next
release needed?", so those two things should be combined.


> > It seems that only this last one is where most people seem to have a
> > problem.  I wonder if it makes sense to create a separate group that
> > handles release activities -- the "release team."
>
> I think the decision to initiate or revoke release scheduling belongs in
> the same forum as patch development, usually -hackers or -security.  We'd
> need to pick a way to clearly signal the discussion's conclusion, analogous
> to how a pushed commit unambiguously disposes a patch proposal.  The
> balance of coordinating release activities is mechanical, and -packagers
> seems adequate for it.


Packagers should be about "HOW do we make the next release", which is
separate from the above.

Ultimately, "How" affects "When", but "When is it needed?" is an earlier
thought.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Reconsidering the behavior of ALTER COLUMN TYPE

2015-06-11 Thread Noah Misch
On Thu, Jun 11, 2015 at 03:41:49PM -0500, Merlin Moncure wrote:
> On Thu, Jun 11, 2015 at 3:12 PM, Tom Lane  wrote:
> > Specifically, it's not clear why you can change the type of a uuid[]
> > column with
> >
> > alter table t alter u type text[];
> >
> > but then you can't change it back with
> >
> > alter table t alter u type uuid[];
> >
> > The reason of course is that uuid-to-text is considered an
> > assignment-grade coercion while text-to-uuid is not.

> > Specifically, after a bit of thought, I suggest that
> >
> > (1) If there's no USING, attempt to coerce the column value as though
> > an *explicit* coercion were used.

> > This could be documented as "if there is no USING, the default behavior
> > is as if you'd written USING column::newtype".

This removes the training wheels from varchar(N) in particular, and
potentially from other three-argument cast functions:

[local] test=# create table t (c) as select 1234;
SELECT 1
[local] test=# alter table t alter c type varchar(2);
ERROR:  value too long for type character varying(2)
[local] test=# alter table t alter c type varchar(2) using c::varchar(2);
ALTER TABLE
[local] test=# table t;
 c
----
 12
(1 row)

I suppose you could accept explicit-only casts yet pass "false" for the
isExplicit argument, but that is a wart.

> > In any case, we oughta use two different error messages for the two cases,
> > as per my comment in the above thread.  That seems like a back-patchable
> > bug fix, though of course any semantics change should only be in HEAD.
> 
> I have a slight preference to keep it to tightening up the wording on
> both the hint and the error (for example, "Perhaps you meant USING
> foo::type?") but leaving the behavior alone.

+1.  The HINT could certainly provide situation-specific help.




Re: [HACKERS] Comfortably check BackendPID with psql

2015-06-11 Thread Noah Misch
On Thu, Jun 11, 2015 at 04:05:13PM -0500, Jim Nasby wrote:
> On 6/11/15 4:55 AM, Andres Freund wrote:
> >On 2015-06-11 09:41:17 +, Naoya Anzai wrote:
> >>This is a tiny patch, but I think it is very useful for hackers and DBAs.
> >>When we debug with psql, we frequently use "SELECT pg_backend_pid();".
> >>This shortens the input from 24 characters to only 4!
> >
> >Not a big fan of that abbreviation itself. What I'd wondered about
> >instead - and actually had patched into my psql at some point - is
> >adding an appropriate escape to psql's PROMPT. I think that'd serve your
> >purpose as well?
> 
> +3.14159; that would be hugely helpful when using gdb.

You can get that today.  In ~/.psqlrc:

SELECT pg_catalog.pg_backend_pid() AS backend_pid \gset
\set PROMPT1 '%m %:backend_pid: %/%R%# '

It doesn't update after \connect, but the overlap between my use of \connect
and my use of debuggers is tiny.




Re: [HACKERS] The purpose of the core team

2015-06-11 Thread Noah Misch
On Thu, Jun 11, 2015 at 03:47:06PM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > http://www.postgresql.org/developer/core/

> After going over this a few times, there is one thing that strikes me
> that nobody has mentioned: the list of tasks mentioned there has one
> that's completely unlike the others.  These are related to human
> relations:
> 
> Acting as a conduit for confidential communication.
> Making policy announcements.
> Managing permissions for commits, infrastructure, etc.
> Handling disciplinary issues.
> Making difficult decisions when consensus is lacking.
> 
> while this one is highly technical:
> Coordinating release activities.

Quite so.  Deciding "it's time for a release" requires the same knowledge and
skills as deciding "it's time to commit patch P", yet we have a special-case
decision procedure.  A release does require people acting in concert for a
span of a few days, but that precise scheduling is work for an administrative
assistant, not work befitting -core.

> It seems that only this last one is where most people seem to have a
> problem.  I wonder if it makes sense to create a separate group that
> handles release activities -- the "release team."

I think the decision to initiate or revoke release scheduling belongs in the
same forum as patch development, usually -hackers or -security.  We'd need to
pick a way to clearly signal the discussion's conclusion, analogous to how a
pushed commit unambiguously disposes a patch proposal.  The balance of
coordinating release activities is mechanical, and -packagers seems adequate
for it.




Re: [HACKERS] On columnar storage

2015-06-11 Thread Amit Kapila
On Fri, Jun 12, 2015 at 4:33 AM, Alvaro Herrera 
wrote:
>
> We hope to have a chance to discuss this during the upcoming developer
> unconference in Ottawa.  Here are some preliminary ideas to shed some
> light on what we're trying to do.
>
>
> I've been trying to figure out a plan to enable native column stores
> (CS or "colstore") for Postgres.  Motivations:
>
> * avoid the 32 TB limit for tables
> * avoid the 1600 column limit for tables
> * increased performance
>
> There already are some third-party CS implementations for Postgres; some
> of these work on top of the FDW interface, others are simply proprietary
> forks.  Since I don't have access to any of their code, there's not much I
> can learn from them.  If people with insider knowledge on them can chime
> in, perhaps we can work together -- collaboration is very welcome.
>
> We're not interested in perpetuating the idea that a CS needs to go
> through the FDW mechanism.  Even if there's a lot of simplicity of
> implementation, it's almost certain to introduce too many limitations.
>
> Simply switching all our code to use columnar storage rather than
> row-based storage is unlikely to go well.  We're aiming at letting some
> columns of tables be part of a CS, while other parts would continue to
> be in the heap.  At the same time, we're aiming at opening the way for
> different CS implementations instead of trying to provide a single
> one-size-fits-all one.
>
>
> There are several parts to this:
>
> 1. the CSM API
> 2. Cataloguing column stores
> 3. Query processing: rewriter, optimizer, executor
>

I think another important point is the format of the column stores: will
they use the page format used by index/heap, and how will they be organised?

>
> The Column Store Manager API
> 
>
> Since we want to have pluggable implementations, we need to have a
> registry of store implementations.  I propose we add a catalog
> pg_cstore_impl with OID, name, and a bunch of function references to
> "open" a store, "getvalue" from it, "getrows" (to which we pass a qual
> and get a bunch of tuple IDs back), "putvalue".
>
> This is in line with our procedural language support.
>
> One critical detail is what will be used to identify a heap row when
> talking to a CS implementation.  There are two main possibilities:
>
> 1. use CTIDs
> 2. use some logical tuple identifier
>
> Using CTIDs is simpler.  One disadvantage is that every UPDATE of a row
> needs to let the CS know about the new location of the tuple, so that
> the value is known to be associated with the new tuple location as well as the
> old.  This needs to happen even if the value of the column itself is not
> changed.

Isn't this somewhat similar to an index segment?
Will the column store obey the snapshot model like current heap tuples do,
and if so, will it derive the transaction information from the heap tuple?



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] 9.5 release notes

2015-06-11 Thread Fujii Masao
On Thu, Jun 11, 2015 at 1:15 PM, Bruce Momjian  wrote:
> I have committed the first draft of the 9.5 release notes.  You can view
> the output here:
>
> http://momjian.us/pgsql_docs/release-9-5.html
>
> and it will eventually appear here:
>
> http://www.postgresql.org/docs/devel/static/release.html

I found some minor issues.

e.g. IDENTIFY_COMMAND, are not logged, even when
log_statements is set to all.

Typos.
s/IDENTIFY_COMMAND/IDENTIFY_SYSTEM
s/log_statements/log_statement


   
RETURN WHERE
   

Looks like garbage.


   Add VERBOSE option to REINDEX (Fujii Masao)

Could you change the author name to Sawada Masahiko?
He is the author of the feature.

Regards,

-- 
Fujii Masao




Re: [HACKERS] 9.5 release notes

2015-06-11 Thread Amit Kapila
On Thu, Jun 11, 2015 at 8:24 PM, Bruce Momjian  wrote:
>
> On Thu, Jun 11, 2015 at 10:20:13AM +0530, Amit Kapila wrote:
> > On Thu, Jun 11, 2015 at 9:45 AM, Bruce Momjian  wrote:
> > >
> > > I have committed the first draft of the 9.5 release notes.  You can
> > > view the output here:
> > >
> > > http://momjian.us/pgsql_docs/release-9-5.html
> > >
> >
> > Thanks for writing the Release notes.
> >
> > Some comments:
> >
> > Have pg_basebackup use a tablespace mapping file, to allow for file
> > paths of 100+ characters in length
> >
> >
> > Also shall we mention about below in Migrations to 9.5 section
> >
> > "pg_basebackup will not work in tar mode against 9.4 and older servers,
> >  as we have introduced a new protocol option in that mode."
>
> Yes, added.  The attached, applied patch has both of these mentions, and
> mentions 'tar' mode.
>

Attached patch makes sense.  Thanks.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Useless mention of RMGRDESCSOURCES in src/bin/pg_rewind/Makefile

2015-06-11 Thread Fujii Masao
On Tue, Jun 9, 2015 at 1:39 PM, Michael Paquier
 wrote:
> Hi all,
>
> pg_rewind's Makefile uses RMGRDESCSOURCES:
> EXTRA_CLEAN = $(RMGRDESCSOURCES) xlogreader.c
> However this variable is defined only in the Makefile of pg_xlogdump
> so it has no utility in this case.
> Attached is a cleanup patch.

Good catch. Patch applied. Thanks!

Regards,

-- 
Fujii Masao




Re: [HACKERS] Is it possible to have a "fast-write" Index?

2015-06-11 Thread Qingqing Zhou
On Fri, Jun 5, 2015 at 10:59 AM, Tom Lane  wrote:
> So I really doubt that anyone would have any enthusiasm for saddling btree
> with a similar mechanism.  It's complicated (and has been the cause of
> multiple bugs); it's hard to figure out when is the optimal time to flush
> the pending insertions; and it slows down searches in favor of making
> inserts cheap, which is generally not the way to bet --- if that's the
> tradeoff you want, why not drop the index altogether?
>
I have seen a case with a major fact table with up to 7 indexes, large
amounts of data loaded every 15~60 minutes, and concurrent seeks against
the indexes at the same time. We can play with partitioning, or sacrifice
some application semantics, to alleviate the pressure, but it would be good
to see if we can improve: sorting and batching inserts into the btree is
helpful for better IO and locking behavior. So can we guard against the
cases that are hard to handle, e.g., indexes enforcing constraints (like
uniqueness), and improve the loading scenario?

Hint bit updates are also painful in the above case, but that is off topic here.

Thanks,
Qingqing




Re: [HACKERS] On columnar storage

2015-06-11 Thread Stephen Frost
Josh,

* Josh Berkus (j...@agliodbs.com) wrote:
> On 06/11/2015 04:03 PM, Alvaro Herrera wrote:
> > We hope to have a chance to discuss this during the upcoming developer
> > unconference in Ottawa.  Here are some preliminary ideas to shed some
> > light on what we're trying to do.
> 
> Added to:
> https://wiki.postgresql.org/wiki/PgCon_2015_Developer_Unconference#Topics

I believe it was already there?

Look for 'Native Columnar Storage'.

Thanks!

Stephen




Re: [HACKERS] The purpose of the core team

2015-06-11 Thread Josh Berkus
On 06/11/2015 05:08 PM, Peter Geoghegan wrote:
> On Thu, Jun 11, 2015 at 9:49 AM, Andrew Dunstan  wrote:
>>> JoshB: Advocacy. There is a strong argument that does not need to be a
>>> core position.
>>>
>>
>> I strongly disagree with this. On the contrary, I think there is a very
>> good argument FOR such a position in core.
> 
> +1.

FYI, what I mostly do for Core is:

a) Press Relations

b) Corporate Relations

Both of those need to be handled by a core team member, because they
often fall under the "confidential contact" portion of Core's duties, or
require nonpublic knowledge of things like security releases.  I agree
that general advocacy can certainly be handled outside core, and should
be -- and, for that matter, is.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] On columnar storage

2015-06-11 Thread Josh Berkus
On 06/11/2015 04:03 PM, Alvaro Herrera wrote:
> We hope to have a chance to discuss this during the upcoming developer
> unconference in Ottawa.  Here are some preliminary ideas to shed some
> light on what we're trying to do.

Added to:
https://wiki.postgresql.org/wiki/PgCon_2015_Developer_Unconference#Topics

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] The purpose of the core team

2015-06-11 Thread Peter Geoghegan
On Thu, Jun 11, 2015 at 9:49 AM, Andrew Dunstan  wrote:
>> JoshB: Advocacy. There is a strong argument that does not need to be a
>> core position.
>>
>
> I strongly disagree with this. On the contrary, I think there is a very
> good argument FOR such a position in core.

+1.

-- 
Peter Geoghegan




Re: [HACKERS] On columnar storage

2015-06-11 Thread Qingqing Zhou
On Thu, Jun 11, 2015 at 4:03 PM, Alvaro Herrera
 wrote:
> I've been trying to figure out a plan to enable native column stores
> (CS or "colstore") for Postgres.  Motivations:
>
> * avoid the 32 TB limit for tables
> * avoid the 1600 column limit for tables
> * increased performance
>
And a better compression ratio.

> We're not interested in perpetuating the idea that a CS needs to go
> through the FDW mechanism.
>
Agreed. It is cleaner to add a ColumnScan node which does a scan
against a columnar table, and possibly a ColumnIndexScan for seeks on an
indexed columnar table.

> Since we want to have pluggable implementations, we need to have a
> registry of store implementations.
>
If we do a real native implementation, where the columnar store sits on par
with the heap, it can give us arbitrary flexibility to control performance
and transactions, without worrying about compatibility with the interface
(which you defined below).

> One critical detail is what will be used to identify a heap row when
> talking to a CS implementation.  There are two main possibilities:
>
> 1. use CTIDs
> 2. use some logical tuple identifier
>
I like the concept of a half-row, half-columnar table: the row part is good
for "select *" and updates, and the columnar part for other purposes.
Popular columnar-only tables use position alignment, which is virtual (no
storage), to associate each column value. CTIDs are still needed, but not
for this purpose. An alternative is:
1. Allow column groups, where several columns are physically stored together;
2. Handle updates via a separate row store table associated with each
columnar table.

> Query Processing
> 
>
If we treat columnar storage as a first-class citizen like the heap, we can
model it after the heap, which enables a much more natural change in the
parser, rewriter, planner and executor.

Regards,
Qingqing




Re: [HACKERS] DBT-3 with SF=20 got failed

2015-06-11 Thread Tomas Vondra

Hi,

On 06/11/15 16:28, Robert Haas wrote:
> On Wed, Jun 10, 2015 at 10:57 PM, Kouhei Kaigai  wrote:
>> The attached patch replaces this palloc0() by MemoryContextAllocHuge() +
>> memset(). Indeed, this hash table is constructed for a relation with
>> nrows=119994544, so it is not strange even if the hash slots themselves
>> are larger than 1GB.
>
> You forgot to attach the patch, I think.  It looks to me like the size
> of a HashJoinTuple is going to be 16 bytes, so 1GB/16 = ~64 million.
> That's a lot of buckets, but maybe not unreasonably many if you've got
> enough memory.

Actually, HashJoinTuple is just a pointer, so it's 8 bytes, so 1GB is
enough for 134217728 rows, which is more than the 119994544 rows
from the plan.


Also, looking at the error message again:

ERROR:  invalid memory alloc request size 1073741824

but this time with beer goggles, I noticed that the amount reported is 
exactly 1GB. The backtrace also shows the error happens right inside 
ExecHashTableCreate (and not in the resize which may happen later), 
which means it gets the nbuckets from ExecChooseHashTableSize directly.


The resize is probably still broken as I mentioned before, but this crashes
before reaching that code, as the estimates are high enough to trigger the
issue earlier. But ExecChooseHashTableSize is supposed to keep all the
checks from previous versions, and I think it actually does.


But I don't see any checks there regarding the 1GB boundary. What I see
is this:


  max_pointers = (work_mem * 1024L) / sizeof(void *);
  max_pointers = Min(max_pointers, INT_MAX / 2);

  ...

  dbuckets = Min(dbuckets, max_pointers);

That has nothing to do with 1GB, and it has been in the code since the time
work_mem was limited to 2GB, so perhaps there was some reasoning that it's
sufficient (because the tuples stored in the hash table will need more than
half of the memory, or something like that).


But today this issue is more likely, because people have more RAM and use
higher work_mem values, so the max_pointers value gets much higher. In the
extreme it may get to INT_MAX/2, so ~1 billion, and the bucket array would
then take ~8GB on 64-bit machines (on 32-bit machines we'd also get twice
the number of pointers compared to 64 bits, but that's mostly irrelevant
because of the memory size limits).


It's also true, that the hash-join improvements in 9.5 - namely the 
decrease of NTUP_PER_BUCKET from 10 to 1, made this error more likely. 
With 9.4 we'd use only 16777216 buckets (128MB), because that gets us 
below 10 tuples per bucket. But now we're shooting for 1 tuple per 
bucket, so we end up with 131M buckets, and that's 1GB.


I see two ways to fix this:

(1) enforce the 1GB limit (probably better for back-patching, if that's
necessary)

(2) make it work with hash tables over 1GB

I'm in favor of (2) if there's a good way to do that. It seems a bit
stupid not to be able to use a fast hash table because of some artificial
limit. Are there any fundamental reasons not to use the
MemoryContextAllocHuge fix proposed by KaiGai-san?



--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] On columnar storage

2015-06-11 Thread Alvaro Herrera
We hope to have a chance to discuss this during the upcoming developer
unconference in Ottawa.  Here are some preliminary ideas to shed some
light on what we're trying to do.


I've been trying to figure out a plan to enable native column stores
(CS or "colstore") for Postgres.  Motivations:

* avoid the 32 TB limit for tables
* avoid the 1600 column limit for tables
* increased performance

There already are some third-party CS implementations for Postgres; some
of these work on top of the FDW interface, others are simply proprietary
forks.  Since I don't have access to any of their code, there's not much I
can learn from them.  If people with insider knowledge on them can chime
in, perhaps we can work together -- collaboration is very welcome.

We're not interested in perpetuating the idea that a CS needs to go
through the FDW mechanism.  Even if there's a lot of simplicity of
implementation, it's almost certain to introduce too many limitations.

Simply switching all our code to use columnar storage rather than
row-based storage is unlikely to go well.  We're aiming at letting some
columns of tables be part of a CS, while other parts would continue to
be in the heap.  At the same time, we're aiming at opening the way for
different CS implementations instead of trying to provide a single
one-size-fits-all one.


There are several parts to this:

1. the CSM API
2. Cataloguing column stores
3. Query processing: rewriter, optimizer, executor


The Column Store Manager API


Since we want to have pluggable implementations, we need to have a
registry of store implementations.  I propose we add a catalog
pg_cstore_impl with OID, name, and a bunch of function references to
"open" a store, "getvalue" from it, "getrows" (to which we pass a qual
and get a bunch of tuple IDs back), "putvalue".

This is in line with our procedural language support.
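To picture the proposal, the implementation registry might look something like this (a hypothetical sketch only: real system catalogs are declared via BKI headers, not DDL, and the column names are invented here; only the four function slots come from the text above):

```sql
-- Hypothetical illustration of the proposed pg_cstore_impl catalog:
-- one row per pluggable column-store implementation, referencing its
-- handler functions.
CREATE TABLE pg_cstore_impl (
    cstiname     name,      -- implementation name
    cstiopen     regproc,   -- "open" a store
    cstigetvalue regproc,   -- fetch a value for a given tuple ID
    cstigetrows  regproc,   -- pass a qual, get a bunch of tuple IDs back
    cstiputvalue regproc    -- store a value
);
```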

One critical detail is what will be used to identify a heap row when
talking to a CS implementation.  There are two main possibilities:

1. use CTIDs
2. use some logical tuple identifier

Using CTIDs is simpler.  One disadvantage is that every UPDATE of a row
needs to let the CS know about the new location of the tuple, so that
the value is known to be associated with the new tuple location as well as the
old.  This needs to happen even if the value of the column itself is not
changed.

Using logical tuple identifiers solves this problem: an update does not
change the LTID, so the colstore needn't be involved unless the
attribute(s) in the colstore are being changed.  The downside is that the
logical tuple identifier must come from somewhere.  We could use
some user attribute, if there's something appropriate, but it's
probably not good to simply use any primary key that the user has
specified.  (Also, having an UPDATE change the primary key would be
troublesome.)  We could also offer the choice of having an autogenerated
value that's not user-visible; we currently don't have non-user-visible
columns, so this would be additional implementation effort.
Furthermore, we should think about interactions between this and the
IDENTITY stuff we currently have for replication -- my impression is
that IDENTITY does not necessarily represent a useful identifier for
column store purposes.

All in all, it seems prudent to limit the initial implementation to use
CTIDs only, and leave LTIDs for a later stage.


Cataloguing Column Stores
-------------------------

Each table with columns in a separate store will have relhasstore=t.
This tells relcache entry construction to obtain rows from
pg_cstore for that table.  The new catalog pg_cstore looks like this:

 cstname | cststoreid | cstrelid | cstnatts | cstatts 


cstname is the store name; unique within each relation.
cststoreid is the OID of the pg_cstore_impl row.
cstrelid is the OID of the table that this cstore is for.
cstnatts is the number of columns in the store.
cstatts is an array of attnums contained in this store.

This design permits having multiple stores for a table, and one or
more columns in a store.  We will focus on the case that a table has a
single column store, and a column store has a single column, because we
believe this simplifies several things.
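As with pg_cstore_impl, the catalog can be pictured roughly as follows (a hypothetical sketch only: real catalogs are declared via BKI headers, not DDL, and the column types are assumptions; the column names come from the description above):

```sql
-- Hypothetical illustration of the proposed pg_cstore catalog.
CREATE TABLE pg_cstore (
    cstname    name,      -- store name, unique within each relation
    cststoreid oid,       -- OID of the pg_cstore_impl row
    cstrelid   oid,       -- OID of the table this store is for
    cstnatts   smallint,  -- number of columns in the store
    cstatts    int2vector -- attnums contained in this store
);
```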

Query Processing


Rewriter

Parsing occurs as currently.  During query rewrite, specifically at the
bottom of the per-relation loop in fireRIRrules(), we will modify the
query tree: each relation RTE containing a colstore will be replaced
with a JoinExpr containing the relation as left child and the colstore
as right child (1).  The colstore RTE will be of a new RTEKind.  For
each such change, all Var nodes that point to attnums stored in the
colstore will be modified so that they reference the RTE of the colstore
instead (2).

(1) This seems very similar to what convert_ANY_sublink_to_join() does.

(2) This is very similar to what ChangeVarNodes does, except that we modify
only some of the var nodes pointing to the 

Re: [HACKERS] CREATE POLICY and RETURNING

2015-06-11 Thread Stephen Frost
Dean,

* Dean Rasheed (dean.a.rash...@gmail.com) wrote:
> On 8 June 2015 at 16:53, Stephen Frost  wrote:
> > I definitely don't like the idea of returning a portion of the data in a
> > RETURNING clause.  Returning an error if the RETURNING clause ends up
> > not passing the SELECT policy is an interesting idea, but I do have
> > doubts about how often that will be a useful exercise.  Another issue to
> > note is that, currently, SELECT policies never cause errors to be
> > returned, and this would change that.
> 
> True. Before UPSERT was added, it was the case that USING clauses from
> all kinds of policies didn't cause errors, and CHECK clauses did, but
> that has now changed for UPDATE, and I don't think it's necessarily a
> problem to change it for SELECT, if we decide that it makes sense to
> apply SELECT policies to rows returned by RETURNING clauses.

Fair enough.

> > There was discussion about a VISIBILITY policy instead of a SELECT
> > policy at one point.  What I think we're seeing here is confusion about
> > the various policies which exist, because the USING policy of an UPDATE
> > is precisely its VISIBILITY policy, in my view, which is why I wasn't
> > bothered by the RETURNING clause being able to be used in that case.  I
> > recall working on the documentation to make this clear, but it clearly
> > needs work.
> 
> Yeah, perhaps there's scope for improving the documentation,
> regardless of whether or not we change the current behaviour. One
> place that we could add additional documentation is the pages
> describing the INSERT, UPDATE and DELETE commands. These currently
> each have a paragraph in their main description sections describing
> what privileges they require, so perhaps we should add similar
> paragraphs describing what RLS policies are checked, if RLS is enabled
> on the table.

Agreed.

> > The primary use-case for having a different VISIBILITY for UPDATE (or
> > DELETE) than for SELECT is that you wish to allow users to only modify a
> > subset of the rows in the table which they can see.  I agree that the
> > current arrangement is that this can be used to allow users to UPDATE
> > records which they can't see (except through a RETURNING).  Perhaps
> > that's a mistake and we should, instead, force the SELECT policy to be
> > included for the UPDATE and DELETE policies and have the USING clauses
> > from those be AND'd with the SELECT policy.  That strikes me as a much
> > simpler approach than applying the SELECT policy against the RETURNING
> > clause and then throwing an error if it fails,
> 
> I don't think that quite addresses the concern with RETURNING though
> because the resulting clauses would only be applied to the old
> (pre-update) data, whereas a RETURNING clause might still return new
> data that you couldn't otherwise see.

This part doesn't make sense to me.  You can issue a SELECT statement
which will return records that are not visible to you also, by simply
running some transformation of the appropriate column as it comes out.
I don't believe it makes any sense to try and protect the results of a
transformation from the user who is defining what the transformations
are.

Worse, with this approach, it might *look* like RETURNING results are
being filtered in a way that prevents users from seeing data they
shouldn't be allowed to see in the table via the SELECT policy, when in
fact they can: they just have to transform the appropriate column during
the UPDATE to get it to pass the UPDATE CHECK and SELECT USING policies.

Here is an example of what I mean:

CREATE TABLE my_data (
  color text,
  secret text
);

CREATE POLICY only_green ON my_data
  FOR SELECT USING (color = 'green');

CREATE POLICY allow_update ON my_data FOR UPDATE
  USING (true) -- a mistake
  WITH CHECK (color = 'green');

If the attacker was interested in rows which had 'red' for the color,
they could simply run:

UPDATE my_data SET color = 'green' WHERE color = 'red' RETURNING *;

The above command returns all the 'secret' data for rows which had
a color of 'red' in the table, even though the SELECT policy clearly
doesn't allow the user to see those rows and the UPDATE WITH CHECK
policy only allows the user to create records with the color 'green',
and this would still be the case even with the proposed changes.
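For contrast, the non-leaky version of the example's UPDATE policy would restrict which existing rows can be targeted in the first place, rather than USING (true). A sketch (this fixes the example's mistake, but deliberately does not settle the broader RETURNING question under discussion):

```sql
-- Corrected allow_update policy for the example: only rows the user
-- can already see (color = 'green') may be targeted by an UPDATE.
CREATE POLICY allow_update ON my_data FOR UPDATE
  USING (color = 'green')
  WITH CHECK (color = 'green');
```

With this policy, the attacker's UPDATE above matches no 'red' rows, so nothing is leaked through RETURNING.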

Note that the above WHERE clause isn't actually necessary at all, but
the 'color' column would need to be set to 'green' to pass the SELECT
and UPDATE policies, and therefore the attacker would lose the data in
that column. But we're not trying to hide just the data in that
particular column, nor do we want to have to define policies for every
column.

Basically, I don't think it makes much sense to try and enforce a SELECT
policy on a value which an attacker can set.  This is very different
from enforcing a policy about adding rows to a table or enforcing a
policy about what rows are visible to a user based on the data already
in the table.

Ultimately, we're not actually 

Re: [HACKERS] Construction of Plan-node by CSP (RE: Custom/Foreign-Join-APIs)

2015-06-11 Thread Kouhei Kaigai
> Robert Haas  writes:
> > Tom, do you want to review this patch and figure out how to solve the
> > underlying problem?  If not, I will take care of it.  But I will be
> > unhappy if I put time and effort into this and then you insist on
> > changing everything afterwards, again.
> 
> [ sorry for slow response, been busy ]  I will take a look.
> 
>   regards, tom lane
>
Tom, please don't forget the problem.

It is still problematic for a custom-scan provider that tries to
implement its own join logic; we still have to apply an additional
patch (or copy and paste createplan.c into the module's source).

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 



custom-join-children.v2.patch
Description: custom-join-children.v2.patch



Re: [HACKERS] 9.5 release notes

2015-06-11 Thread Peter Geoghegan
On Thu, Jun 11, 2015 at 2:17 PM, Robert Haas  wrote:
> On Thu, Jun 11, 2015 at 4:23 PM, Peter Geoghegan  wrote:
>> Secondly, Robert didn't credit himself as an author in his commit
>> message for the abbreviated keys infrastructure + text opclass support
>> *at all*. However, I think that Robert should be listed as a secondary
>> author of the abbreviated keys infrastructure, and that he would agree
>> that I am clearly the primary author. Andrew Gierth did work on the
>> datum case for sortsupport + abbreviation, so I agree he should be
>> listed as a secondary author of the infrastructure too, after Robert.
>
> I'd probably say Peter, Andrew, me.

If you bunch everything together, then yes, I'd agree.

>> I think there should be a total of 4 items related to sorting. The
>> wording I come up with may not be appropriate, but will give you an
>> idea:
>>
>> * Allow sorting to be performed by inlined, non-SQL-callable
>> comparison functions for CREATE INDEX, REINDEX and CLUSTER operations
>> based on a B-Tree operator class. (5ea86e6e6 -- Geoghegan)
>>
>> * Add abbreviated key sorting infrastructure. This allows B-Tree
>> operator classes to provide compact abbreviated representations of
>> pass-by-reference types which are sorted with inexpensive comparisons.
>> This makes sort operations with support for the infrastructure very
>> significantly faster in the common case where most comparisons can be
>> resolved with the abbreviated representation alone. (4ea51cdfe85 --
>> Geoghegan, Haas, Gierth, with Gierth's contribution coming from
>> 78efd5c1 alone)
>>
>> * Add sortsupport (support for non-SQL callable interface for
>> comparators) with abbreviation capability to text/varlena operator
>> class. This significantly accelerates sorting on text columns.
>> (4ea51cdfe85 too, but also b34e37bf. Worth noting separately IMV.
>> Geoghegan, Haas).
>>
>> * Add sortsupport (support for non-SQL callable interface for
>> comparators) with abbreviation capability to numeric operator class.
>> This significantly accelerates sorting on numeric columns. (abd94bcac,
>> Gierth)
>
> I think this is overkill.  Bruce's single entry seems fine to me,
> although we could adjust the wording a bit.  We need this to be simple
> enough that someone not immersed in the code can understand it, and as
> soon as you talk about sortsupport as if it were a thing, you lose
> that.

I think that that is a valid point of view, but that we should get
across that sortsupport works for all types with CREATE INDEX +
CLUSTER now, including integers, for example. The reviewer of that
particular piece of work, Andreas Karlsson, put it at a 10% - 15%
improvement for CREATE INDEX on an integer column IIRC. That's not a
bad little boost.

-- 
Peter Geoghegan




Re: [HACKERS] Is it possible to have a "fast-write" Index?

2015-06-11 Thread Jim Nasby

On 6/5/15 6:54 PM, deavid wrote:


So the problem is: i see a low iowait, and CPU time for one core is at
80-90% most of the time. I can buy more ram, better disks, or cpu's with
more cores. But one cpu core would have more-or-less the same speed no
matter how much money you invest.

Wanting a delayed-write index is similar to setting
"synchronous_commit = off". We want to give an OK to the backend as
soon as possible and do this work in the background. But we also want
some reliability against crashes.

Also, if the task is done in the background it may be done from another
backend, so several indexes could probably be packed at once using
different backend processes. We could use all CPU cores if our index
writes aren't tied to the session that wrote the row.


Something that might help here would be doing the index maintenance in 
parallel via background workers. There's now enough parallelism 
infrastructure that it shouldn't be too hard to hack up a quick test of 
that.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] 9.5 release notes

2015-06-11 Thread Robert Haas
On Thu, Jun 11, 2015 at 4:23 PM, Peter Geoghegan  wrote:
> Secondly, Robert didn't credit himself as an author in his commit
> message for the abbreviated keys infrastructure + text opclass support
> *at all*. However, I think that Robert should be listed as a secondary
> author of the abbreviated keys infrastructure, and that he would agree
> that I am clearly the primary author. Andrew Gierth did work on the
> datum case for sortsupport + abbreviation, so I agree he should be
> listed as a secondary author of the infrastructure too, after Robert.

I'd probably say Peter, Andrew, me.

> I think there should be a total of 4 items related to sorting. The
> wording I come up with may not be appropriate, but will give you an
> idea:
>
> * Allow sorting to be performed by inlined, non-SQL-callable
> comparison functions for CREATE INDEX, REINDEX and CLUSTER operations
> based on a B-Tree operator class. (5ea86e6e6 -- Geoghegan)
>
> * Add abbreviated key sorting infrastructure. This allows B-Tree
> operator classes to provide compact abbreviated representations of
> pass-by-reference types which are sorted with inexpensive comparisons.
> This makes sort operations with support for the infrastructure very
> significantly faster in the common case where most comparisons can be
> resolved with the abbreviated representation alone. (4ea51cdfe85 --
> Geoghegan, Haas, Gierth, with Gierth's contribution coming from
> 78efd5c1 alone)
>
> * Add sortsupport (support for non-SQL callable interface for
> comparators) with abbreviation capability to text/varlena operator
> class. This significantly accelerates sorting on text columns.
> (4ea51cdfe85 too, but also b34e37bf. Worth noting separately IMV.
> Geoghegan, Haas).
>
> * Add sortsupport (support for non-SQL callable interface for
> comparators) with abbreviation capability to numeric operator class.
> This significantly accelerates sorting on numeric columns. (abd94bcac,
> Gierth)

I think this is overkill.  Bruce's single entry seems fine to me,
although we could adjust the wording a bit.  We need this to be simple
enough that someone not immersed in the code can understand it, and as
soon as you talk about sortsupport as if it were a thing, you lose
that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Comfortably check BackendPID with psql

2015-06-11 Thread Jim Nasby

On 6/11/15 4:55 AM, Andres Freund wrote:

Hi,

On 2015-06-11 09:41:17 +, Naoya Anzai wrote:

This is a tiny patch, but I think it is very useful for hackers and DBAs.
When we debug with psql, we frequently use "SELECT pg_backend_pid();".
This shortens what you type from 24 characters to only 4!


Not a big fan of that abbreviation itself. What I'd wondered about
instead - and actually had patched into my psql at some point - is
adding an appropriate escape to psql's PROMPT. I think that'd serve your
purpose as well?


+3.14159; that would be hugely helpful when using gdb.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] The purpose of the core team

2015-06-11 Thread Robert Haas
On Thu, Jun 11, 2015 at 3:22 PM, Alvaro Herrera
 wrote:
> I think #1 is the part that we seem to have the most trouble with.  It
> seems easily fixable: establish a new mailing list for that task (say
> pgsql-release) and get all the current -core in there, plus the set of
> active committers.  That group would handle tasks #1 and #2 above.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Cancel race condition

2015-06-11 Thread Shay Rojansky
Thanks for the extra consideration Robert.

Since I'm implementing a generic driver, users can send either
single-statement transactions or actual multiple-statement transaction.
However, whether we're in a transaction or not doesn't seem to affect
Npgsql logic (unless I'm missing something) - if the cancellation does hit
a query the transaction will be cancelled and it's up to the user to roll
it back as is required in PostgreSQL...


On Thu, Jun 11, 2015 at 9:50 PM, Robert Haas  wrote:

> On Tue, Jun 9, 2015 at 4:42 AM, Shay Rojansky  wrote:
> > Ah, OK - I wasn't aware that cancellation was actually delivered as a
> > regular POSIX signal... You're right about the lack of guarantees then.
> >
> > In that case, I'm guessing not much could be do to guarantee sane
> > cancellation behavior... I do understand the "best effort" idea around
> > cancellations. However, it seems different to say "we'll try our best and
> > the cancellation may not be delivered" (no bad consequences except maybe
> > performance), and to say "we'll try our best but the cancellation may be
> > delivered randomly to any query you send from the moment you send the
> > cancellation". The second makes it very difficult to design a 100% sane,
> > deterministic application... Any plans to address this in protocol 4?
> >
> > Would you have any further recommendations or guidelines to make the
> > situation as sane as possible? I guess I could block any new SQL queries
> > while a cancellation on that connection is still outstanding (meaning
> that
> > the cancellation connection hasn't yet been closed). As you mentioned
> this
> > wouldn't be a 100% solution since it would only cover signal sending, but
> > better than nothing?
>
> Blocking new queries seems like a good idea.  Note that the entire
> transaction (whether single-statement or multi-statement) will be
> aborted, or at least the currently-active subtransaction, not just the
> current query.  If you're using single-statement transactions I guess
> there is not much practical difference, but if you are using
> multi-statement transactions the application kind of needs to be aware
> of this, since it needs to know that any work it did got rolled back,
> and everything's going to fail up until the current (sub)transaction
> is rolled back.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
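The "block new queries while a cancellation is outstanding" idea could be sketched as below. This is hypothetical driver-side logic, not Npgsql's actual code; the method names and the protocol stubs are invented for illustration:

```python
import threading

class Connection:
    """Sketch: hold back new queries until an in-flight cancellation
    request has been fully delivered (cancel socket closed)."""

    def __init__(self):
        self._cancel_done = threading.Event()
        self._cancel_done.set()  # no cancellation outstanding initially

    def cancel(self):
        self._cancel_done.clear()
        try:
            self._send_cancel_request()  # open cancel socket, send key, close
        finally:
            self._cancel_done.set()

    def execute(self, sql):
        # Gate: wait until any outstanding cancellation has been sent,
        # so the signal cannot target a query issued after the cancel.
        self._cancel_done.wait()
        return self._send_query(sql)

    # Stubs standing in for real protocol I/O.
    def _send_cancel_request(self):
        pass

    def _send_query(self, sql):
        return "ok"
```

As noted above, this only narrows the race (it covers signal sending, not server-side delivery), but it prevents the driver itself from issuing a new query that the pending cancellation could hit.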


Re: [HACKERS] Reconsidering the behavior of ALTER COLUMN TYPE

2015-06-11 Thread Merlin Moncure
On Thu, Jun 11, 2015 at 3:12 PM, Tom Lane  wrote:
> A recent thread in pgsql-general shows yet another user who's befuddled by
> the need to add a USING clause to an ALTER TABLE ... ALTER COLUMN TYPE
> command:
>
> http://www.postgresql.org/message-id/flat/CAD25zGDiaqCG=eqXp=byvzcdgvcqubh7kbpjpjqsespowfv...@mail.gmail.com
>
> Specifically, it's not clear why you can change the type of a uuid[]
> column with
>
> alter table t alter u type text[];
>
> but then you can't change it back with
>
> alter table t alter u type uuid[];
>
> The reason of course is that uuid-to-text is considered an
> assignment-grade coercion while text-to-uuid is not.
>
> I've lost count of the number of times we've had to tell someone to
> use a USING clause for this.  Maybe it's time to be a little bit less
> rigid about this situation, and do what the user obviously wants rather
> than make him spell out a rather pointless USING.
>
> Specifically, after a bit of thought, I suggest that
>
> (1) If there's no USING, attempt to coerce the column value as though
> an *explicit* coercion were used.
>
> (2) If there is a USING, maintain the current behavior that the result
> has to be assignment-coercible to the new column type.  We could use
> explicit-coercion semantics here too, but I think that might be throwing
> away a bit too much error checking, in a case where the odds of a typo
> are measurably higher than for the default situation.
>
> This could be documented as "if there is no USING, the default behavior
> is as if you'd written USING column::newtype".
>
> Thoughts?
>
> In any case, we oughta use two different error messages for the two cases,
> as per my comment in the above thread.  That seems like a back-patchable
> bug fix, though of course any semantics change should only be in HEAD.

I have a slight preference to keep it to tightening up the wording on
both the hint and the error (for example, "Perhaps you meant USING
foo::type?") but leaving the behavior alone.  In other components of
the system, forcing explicit casts has added safety and there is no
more dangerous footgun than 'ALTER TABLE'.  IMSNHO, the issue is the
somewhat arcane syntax, not the casting rules.

merlin




Re: [HACKERS] Postgres GSSAPI Encryption

2015-06-11 Thread Stephen Frost
Robbie,

* Robbie Harwood (rharw...@redhat.com) wrote:
> I've coded up the GSSAPI encryption and is on my github[0].  It's
> missing a number of things before merge, including proper error
> handling, correct ACL behavior (by and large, it currently doesn't
> respect hba.conf), and exposing configuration handles in hba.conf and
> the client for the settings we've talked about above, as well as
> documentation of all that.

Neat!

We're currently focusing on stabilizing for PostgreSQL 9.5, but this
work is great and we'd like to make sure to review it for inclusion
post-9.5.  Our process is that patches are posted to the mailing list
and then an entry is created on http://commitfest.postgresql.org which
references the mailing list post.  This way, we won't forget about the
patch in a month or so, after we've branched off 9.5 and resumed
reviewing development work.

So, when you're ready for it to be reviewed for feedback, please post a
full patch against whatever current master is at the time, and register
the patch in the commitfest application.

Thanks!

Stephen




Re: [HACKERS] 9.5 release notes

2015-06-11 Thread Peter Geoghegan
On Thu, Jun 11, 2015 at 1:23 PM, Peter Geoghegan  wrote:
> * Add sortsupport (support for non-SQL callable interface for
> comparators) with abbreviation capability to text/varlena operator
> class. This significantly accelerates sorting on text columns.
> (4ea51cdfe85 too, but also b34e37bf. Worth noting separately IMV.
> Geoghegan, Haas).

Also, note that there is no character(n) support for abbreviation as
yet (nor is there any contrib/citext support), so it's not quite true
that the sorting stuff accelerates sorts on "character fields".

-- 
Peter Geoghegan




Re: [HACKERS] 9.5 release notes

2015-06-11 Thread Peter Geoghegan
On Wed, Jun 10, 2015 at 9:15 PM, Bruce Momjian  wrote:
> I have committed the first draft of the 9.5 release notes.  You can view
> the output here:

+  
+   
+Improve the speed of sorting character and numeric fields (Robert
+Haas, Peter Geoghegan, Andrew Gierth)
+   
+  

A few comments on this.

First of all, I think it should be separately noted that the
sortsupport infrastructure is now used in virtually all places where
it's useful (see commit 5ea86e6e6). So for example, CREATE INDEX on
integer columns ought to be notably faster (and CLUSTER, too). The 9.2
era sortsupport stuff was simply never adapted to do that until now.
That has nothing to do with abbreviated keys, except that the idea of
abbreviated keys gave me a strong reason to care about sortsupport a
lot more. But commit 5ea86e6e6 predates abbreviated keys, and is
certainly independently useful (this really should have made it into
9.2).

Secondly, Robert didn't credit himself as an author in his commit
message for the abbreviated keys infrastructure + text opclass support
*at all*. However, I think that Robert should be listed as a secondary
author of the abbreviated keys infrastructure, and that he would agree
that I am clearly the primary author. Andrew Gierth did work on the
datum case for sortsupport + abbreviation, so I agree he should be
listed as a secondary author of the infrastructure too, after Robert.

I think there should be a total of 4 items related to sorting. The
wording I come up with may not be appropriate, but will give you an
idea:

* Allow sorting to be performed by inlined, non-SQL-callable
comparison functions for CREATE INDEX, REINDEX and CLUSTER operations
based on a B-Tree operator class. (5ea86e6e6 -- Geoghegan)

* Add abbreviated key sorting infrastructure. This allows B-Tree
operator classes to provide compact abbreviated representations of
pass-by-reference types which are sorted with inexpensive comparisons.
This makes sort operations with support for the infrastructure very
significantly faster in the common case where most comparisons can be
resolved with the abbreviated representation alone. (4ea51cdfe85 --
Geoghegan, Haas, Gierth, with Gierth's contribution coming from
78efd5c1 alone)

* Add sortsupport (support for non-SQL callable interface for
comparators) with abbreviation capability to text/varlena operator
class. This significantly accelerates sorting on text columns.
(4ea51cdfe85 too, but also b34e37bf. Worth noting separately IMV.
Geoghegan, Haas).

* Add sortsupport (support for non-SQL callable interface for
comparators) with abbreviation capability to numeric operator class.
This significantly accelerates sorting on numeric columns. (abd94bcac,
Gierth)

I'm not sure if it's worth mentioning the "cheap equality for text"
commit (e246b3d6eac09). I guess that it probably is, because it will
help with things like index scans, too. Arguably that isn't a sorting
thing (it's certainly not *just* a sorting thing).

I've blogged on the abbreviated key stuff quite a bit, which may be
useful should you require additional background information:

http://pgeoghegan.blogspot.com/2015/01/abbreviated-keys-exploiting-locality-to.html

http://pgeoghegan.blogspot.com/2015/04/abbreviated-keys-for-numeric-to.html

Thanks
-- 
Peter Geoghegan




[HACKERS] Reconsidering the behavior of ALTER COLUMN TYPE

2015-06-11 Thread Tom Lane
A recent thread in pgsql-general shows yet another user who's befuddled by
the need to add a USING clause to an ALTER TABLE ... ALTER COLUMN TYPE
command:

http://www.postgresql.org/message-id/flat/CAD25zGDiaqCG=eqXp=byvzcdgvcqubh7kbpjpjqsespowfv...@mail.gmail.com

Specifically, it's not clear why you can change the type of a uuid[]
column with

alter table t alter u type text[];

but then you can't change it back with

alter table t alter u type uuid[];

The reason of course is that uuid-to-text is considered an
assignment-grade coercion while text-to-uuid is not.

I've lost count of the number of times we've had to tell someone to
use a USING clause for this.  Maybe it's time to be a little bit less
rigid about this situation, and do what the user obviously wants rather
than make him spell out a rather pointless USING.

Specifically, after a bit of thought, I suggest that

(1) If there's no USING, attempt to coerce the column value as though
an *explicit* coercion were used.

(2) If there is a USING, maintain the current behavior that the result
has to be assignment-coercible to the new column type.  We could use
explicit-coercion semantics here too, but I think that might be throwing
away a bit too much error checking, in a case where the odds of a typo
are measurably higher than for the default situation.

This could be documented as "if there is no USING, the default behavior
is as if you'd written USING column::newtype".
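Concretely, for the uuid[] case above, the clause users currently have to spell out is:

```sql
-- The USING clause that is required today to convert back:
ALTER TABLE t ALTER COLUMN u TYPE uuid[] USING u::uuid[];
```

Under proposal (1), omitting the USING clause would behave exactly as if this cast had been written.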

Thoughts?

In any case, we oughta use two different error messages for the two cases,
as per my comment in the above thread.  That seems like a back-patchable
bug fix, though of course any semantics change should only be in HEAD.

regards, tom lane




Re: [HACKERS] Entities created in one query not available in another in extended protocol

2015-06-11 Thread Shay Rojansky
Thanks everyone for your time (or rather sorry for having wasted it).

Just in case it's interesting to you... The reason we implemented things
this way is in order to avoid a deadlock situation - if we send two queries
as P1/D1/B1/E1/P2/D2/B2/E2, and the first query has a large resultset,
PostgreSQL may block writing the resultset, since Npgsql isn't reading it
at that point. Npgsql on its part may get stuck writing the second query
(if it's big enough) since PostgreSQL isn't reading on its end (thanks to
Emil Lenngren for pointing this out originally).

Of course this isn't an excuse for anything, we're looking into ways of
solving this problem differently in our driver implementation.

Shay

On Thu, Jun 11, 2015 at 6:17 PM, Simon Riggs  wrote:

> On 11 June 2015 at 16:56, Shay Rojansky  wrote:
>
> Npgsql (currently) sends Parse for the second command before sending
>> Execute for the first one.
>>
>
> Look no further than that.
>
> --
> Simon Riggs    http://www.2ndQuadrant.com/
> 
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


[HACKERS] Why does replication need the old history file?

2015-06-11 Thread Josh Berkus
Hackers,

Sequence of events:

1. PITR backup of server on timeline 2.

2. Restored the backup to a new server, new-master.

3. Restored the backup to another new server, new-replica.

4. Started and promoted new-master (now on Timeline 3).

5. Started new-replica, connecting over streaming to new-master.

6. Get error message:

2015-06-11 12:24:14.503 PDT,,,7465,,5579e05e.1d29,1,,2015-06-11 12:24:14
PDT,,0,LOG,0,"fetching timeline history file for timeline 2 from
primary server",""
2015-06-11 12:24:14.503 PDT,,,7465,,5579e05e.1d29,2,,2015-06-11 12:24:14
PDT,,0,FATAL,XX000,"could not receive timeline history file from the
primary server: ERROR:  could not open file
""pg_xlog/0002.history"": No such file or directory

Questions:

A. Why does the replica need 0002.history?  Shouldn't it only need
0003.history?

B. Did something change in this regard in 9.3.6, 9.3.7 or 9.3.8?  It was
working in our previous setup, on 9.3.5, although that could have just
been that the history file hadn't been removed from the backups yet.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] The purpose of the core team

2015-06-11 Thread Alvaro Herrera
Robert Haas wrote:

> The release process has multiple parts:
> 
> 1. Deciding that we need to do a release, either because $BUG is
> really bad or because we have security fixes to release or because
> enough time has gone by.
> 2. Updating translations and time zones and release notes and stamping
> version numbers and building tarballs.
> 3. Packaging and releasing tarballs.
> 4. Writing and publicizing the release announcement.
> 
> #3 happens on pgsql-packagers and AFAICT it works fine.  The problems
> are primarily with #1, and sometimes with #2 to the extent that Tom
> and Peter pretty much do them every time, so if they're not available,
> nobody else can step in.  I have no complaints about #4.

I am familiar with the part of #2 that Peter does, so I could do that in
case of need.  Not sure about the tzdata updates, but I expect that it
should be reasonably straightforward.  Stamping version numbers and
building tarballs are tasks now scripted, so I think any well-documented
release officer could do them also.  Of that bunch, writing the release
notes seems the most difficult.

I think #1 is the part that we seem to have the most trouble with.  It
seems easily fixable: establish a new mailing list for that task (say
pgsql-release) and get all the current -core in there, plus the set of
active committers.  That group would handle tasks #1 and #2 above.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] 9.5 release notes

2015-06-11 Thread Magnus Hagander
On Thu, Jun 11, 2015 at 8:56 PM, Josh Berkus  wrote:

> On 06/10/2015 09:50 PM, Amit Kapila wrote:
> > Also shall we mention about below in Migrations to 9.5 section
> >
> > "pg_basebackup will not work in tar mode against 9.4 and older
> servers,
> >  as we have introduced a new protocol option in that mode."
>
> AFAIK, pg_basebackup has never worked across versions.  So there's no
> reason for this note.
>

It has. The resulting backup has not been usable cross-version, but
pg_basebackup itself has. Not always, and I'm not sure we've ever
claimed it was supported, but it has worked.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] 9.5 release notes

2015-06-11 Thread Josh Berkus
On 06/10/2015 09:50 PM, Amit Kapila wrote:
> Also shall we mention about below in Migrations to 9.5 section
> 
> "pg_basebackup will not work in tar mode against 9.4 and older servers,
>  as we have introduced a new protocol option in that mode."

AFAIK, pg_basebackup has never worked across versions.  So there's no
reason for this note.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] The purpose of the core team

2015-06-11 Thread Robert Haas
On Thu, Jun 11, 2015 at 2:50 PM, Josh Berkus  wrote:
> On 06/11/2015 11:47 AM, Alvaro Herrera wrote:
>> After going over this a few times, there is one thing that strikes me
>> that nobody has mentioned: the list of tasks mentioned there has one
>> that's completely unlike the others.  These are related to human
>> relations:
>>
>> Acting as a conduit for confidential communication.
>> Making policy announcements.
>> Managing permissions for commits, infrastructure, etc.
>> Handling disciplinary issues.
>> Making difficult decisions when consensus is lacking.
>>
>> while this one is highly technical:
>> Coordinating release activities.
>>
>> It seems that only this last one is where most people seem to have a
>> problem.  I wonder if it makes sense to create a separate group that
> handles release activities -- the "release team."
>
> De-facto, this is Packagers.  Which is maybe not the best system, but
> it's what we're doing now.

The release process has multiple parts:

1. Deciding that we need to do a release, either because $BUG is
really bad or because we have security fixes to release or because
enough time has gone by.
2. Updating translations and time zones and release notes and stamping
version numbers and building tarballs.
3. Packaging and releasing tarballs.
4. Writing and publicizing the release announcement.

#3 happens on pgsql-packagers and AFAICT it works fine.  The problems
are primarily with #1, and sometimes with #2 to the extent that Tom
and Peter pretty much do them every time, so if they're not available,
nobody else can step in.  I have no complaints about #4.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] The purpose of the core team

2015-06-11 Thread Josh Berkus
On 06/11/2015 11:47 AM, Alvaro Herrera wrote:
> After going over this a few times, there is one thing that strikes me
> that nobody has mentioned: the list of tasks mentioned there has one
> that's completely unlike the others.  These are related to human
> relations:
> 
> Acting as a conduit for confidential communication.
> Making policy announcements.
> Managing permissions for commits, infrastructure, etc.
> Handling disciplinary issues.
> Making difficult decisions when consensus is lacking.
> 
> while this one is highly technical:
> Coordinating release activities.
> 
> It seems that only this last one is where most people seem to have a
> problem.  I wonder if it makes sense to create a separate group that
> handles release activities -- the "release team."

De-facto, this is Packagers.  Which is maybe not the best system, but
it's what we're doing now.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Cancel race condition

2015-06-11 Thread Robert Haas
On Tue, Jun 9, 2015 at 4:42 AM, Shay Rojansky  wrote:
> Ah, OK - I wasn't aware that cancellation was actually delivered as a
> regular POSIX signal... You're right about the lack of guarantees then.
>
> In that case, I'm guessing not much could be done to guarantee sane
> cancellation behavior... I do understand the "best effort" idea around
> cancellations. However, it seems different to say "we'll try our best and
> the cancellation may not be delivered" (no bad consequences except maybe
> performance), and to say "we'll try our best but the cancellation may be
> delivered randomly to any query you send from the moment you send the
> cancellation". The second makes it very difficult to design a 100% sane,
> deterministic application... Any plans to address this in protocol 4?
>
> Would you have any further recommendations or guidelines to make the
> situation as sane as possible? I guess I could block any new SQL queries
> while a cancellation on that connection is still outstanding (meaning that
> the cancellation connection hasn't yet been closed). As you mentioned this
> wouldn't be a 100% solution since it would only cover signal sending, but
> better than nothing?

Blocking new queries seems like a good idea.  Note that the entire
transaction (whether single-statement or multi-statement) will be
aborted, or at least the currently-active subtransaction, not just the
current query.  If you're using single-statement transactions I guess
there is not much practical difference, but if you are using
multi-statement transactions the application kind of needs to be aware
of this, since it needs to know that any work it did got rolled back,
and everything's going to fail up until the current (sub)transaction
is rolled back.
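The "block new queries while a cancellation is outstanding" idea can be sketched as a driver-side guard. This is a minimal hypothetical illustration (GuardedConnection and its method names are invented, not Npgsql's or libpq's actual API):

```python
import threading

class GuardedConnection:
    """Refuse to send new queries while a cancel request is in flight,
    since the backend may deliver the cancel signal to whichever query
    happens to be active by the time it arrives."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cancel_pending = False

    def execute(self, sql):
        with self._lock:
            if self._cancel_pending:
                raise RuntimeError("cancellation outstanding; query refused")
            # ... a real driver would send Parse/Bind/Execute here ...
            return "ok: " + sql

    def request_cancel(self):
        with self._lock:
            self._cancel_pending = True
        # ... a real driver would open the separate cancel connection here ...

    def cancel_finished(self):
        """Call when the server closes the cancel connection."""
        with self._lock:
            self._cancel_pending = False

conn = GuardedConnection()
first = conn.execute("SELECT 1")
conn.request_cancel()
try:
    conn.execute("SELECT 2")
    blocked = False
except RuntimeError:
    blocked = True
conn.cancel_finished()
second = conn.execute("SELECT 2")
```

As noted above, this only narrows the race on the driver side; it cannot make signal delivery on the server deterministic.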

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] The purpose of the core team

2015-06-11 Thread Alvaro Herrera
Bruce Momjian wrote:
> There has been some confusion by old and new community members about the
> purpose of the core team, and this lack of understanding has caused some
> avoidable problems.  Therefore, the core team has written a core charter
> and published it on our website:
> 
> http://www.postgresql.org/developer/core/
> 
> Hopefully this will be helpful to people.

After going over this a few times, there is one thing that strikes me
that nobody has mentioned: the list of tasks mentioned there has one
that's completely unlike the others.  These are related to human
relations:

Acting as a conduit for confidential communication.
Making policy announcements.
Managing permissions for commits, infrastructure, etc.
Handling disciplinary issues.
Making difficult decisions when consensus is lacking.

while this one is highly technical:
Coordinating release activities.

It seems that only this last one is where most people seem to have a
problem.  I wonder if it makes sense to create a separate group that
handles release activities -- the "release team."

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Draft release notes for 9.4.4 et al

2015-06-11 Thread Josh Berkus
On 06/10/2015 11:35 PM, Noah Misch wrote:
> On Tue, Jun 09, 2015 at 04:31:43PM -0700, Josh Berkus wrote:
>> First draft of the release announcement.

Noah, thank you for corrections!


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] The purpose of the core team

2015-06-11 Thread Joshua D. Drake


On 06/11/2015 10:20 AM, Robert Haas wrote:


True but that isn't the fault of core outside of communication. The hackers,
reviewers and committers of those patches should be required to communicate
with core in a way that expresses the true severity of a situation so core
can make a:

Management decision.


I feel I've been making an honest and sincere effort to do that with
limited success.


I have no disagreement with this statement. Bruce, you and I have all 
been advocating slowing down a bit (when it comes to the recent 
releases), we are obviously in the minority.


Instead of a huge thread of complaining from a bunch of people, how about
we just say, "These are the productive steps I would like to see":


Here are a few of mine:

1. I would like to see core elected to terms. I think the terms should 
be multi-year but no more than 2 or 3 years.


2. I would like to see more transparent discussion from core.

This is a fine line. We shouldn't be talking about potential security 
issues publicly. On the other hand, there is a question about whether or
not core had any business putting out the statement on the Russian 
conference.


Note: I am not saying whether I agree or disagree with the statement. I 
am only talking about whether or not it was appropriate for core to 
handle it.


3. I would like to see core institute a different release policy.

I think something similar to Ubuntu would be a big boon for us.

4. I would like to see core be a strictly technical committee.

I think that advocacy and such, with guidance from the community
including core, should be reflective of the community as a whole.


JD



--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.




Re: [HACKERS] [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

2015-06-11 Thread Jeff Janes
On Wed, Jun 10, 2015 at 7:16 PM, Noah Misch  wrote:

> On Mon, Jun 08, 2015 at 03:15:04PM +0200, Andres Freund wrote:
> > One more thing:
> > Our testing infrastructure sucks. Without writing C code it's basically
> > impossible to test wraparounds and such. Even if not particularly useful
> > for non-devs, I really think we should have functions for creating
> > burning xids/multixacts in core. Or at least in some extension.
>
> +1.  This keeps coming up, so it's worth maintaining a verified and speedy
> implementation.


+1 from me as well.

Also, I've pretty much given up on testing this area myself, because of the
issue pointed out here:

http://www.postgresql.org/message-id/CAMkU=1wbi5afhytawdkawease_mc00i4y_7ojhp1y-8sgci...@mail.gmail.com

I think this is the same issue as part of Andres' point 1.

It is pretty frustrating and futile to test wraparound when the database
doesn't live long enough to wrap around under the high-stress conditions.

I had thought that all changes to ShmemVariableCache except nextXid should
be WAL logged at the time they occur, not just at the next checkpoint.  But
that wouldn't fix the problem, as the change to ShmemVariableCache has to
be transactional with the change to pg_database.  So it would have to be
WAL logged inside the commit record or any transaction which changes
pg_database.

Cheers,

Jeff


Re: [HACKERS] 9.5 release notes

2015-06-11 Thread Robert Haas
On Thu, Jun 11, 2015 at 12:14 PM, Bruce Momjian  wrote:
> On Thu, Jun 11, 2015 at 12:13:26PM -0400, Robert Haas wrote:
>> On Thu, Jun 11, 2015 at 11:32 AM, Bruce Momjian  wrote:
>> > Improve hash creation and lookup performance (Tomas Vondra,
>> > Teodor Sigaev, Tom Lane, Robert Haas)
>>
>> I suggest having two separate items.  One of those is about the Hash
>> executor node and the other is about our dynahash stuff.  They're
>> completely different code bases.
>
> OK, can you give me some text?  Do users really care which part it is
> in?

One item should say something like:

Improve hash join performance.

The Teodor/Tom thing should say something like:

Improve the performance of various in-memory hash tables.  In
particular, this can improve the performance of bitmap index scans.

I assume users would view those as separate things.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] 9.5 release notes

2015-06-11 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jun 11, 2015 at 11:32 AM, Bruce Momjian  wrote:
>> Improve hash creation and lookup performance (Tomas Vondra,
>> Teodor Sigaev, Tom Lane, Robert Haas)

> I suggest having two separate items.  One of those is about the Hash
> executor node and the other is about our dynahash stuff.  They're
> completely different code bases.

As far as 4a14f13a0 goes, I would think that ought to be mentioned under
"Source Code" since it's a change in a rather widely used API.  I doubt
that the performance aspect of it is really all that exciting to end
users, but third-party modules calling the dynahash code would care.
The hash join changes are a completely different thing.

regards, tom lane




Re: [HACKERS] 9.5 release notes

2015-06-11 Thread Tom Lane
Bruce Momjian  writes:
> On Thu, Jun 11, 2015 at 05:16:07PM +1200, David Rowley wrote:
>> Would you also be able to mention something about f15821e and d222585?

> I am going to defer to Tom on that.  I have added optimizer changes in
> the past but he didn't feel it made sense unless there was some
> user-visible change.

I'd be inclined to document both of those.  We mentioned outer join
removal when it was first added, in 9.0, so making a significant
improvement in it seems worthy of mention also.  Both of these things
are "user visible" to the extent that they affect EXPLAIN output.

I'm not sure whether we need to document the semantic hazard that the
second commit message worries about.

regards, tom lane




Re: [HACKERS] The purpose of the core team

2015-06-11 Thread Robert Haas
On Thu, Jun 11, 2015 at 12:29 PM, Joshua D. Drake  
wrote:
>> However, the core team needs to know when it should initiate a
>> release, and to do that it needs to understand the impact of bugs that
>> have been fixed and bugs that have not been fixed.  The recent
>> discussion of multixacts seems to indicate that the number of core
>> team members who had a clear understanding of the issues was zero,
>
> True but that isn't the fault of core outside of communication. The hackers,
> reviewers and committers of those patches should be required to communicate
> with core in a way that expresses the true severity of a situation so core
> can make a:
>
> Management decision.

I feel I've been making an honest and sincere effort to do that with
limited success.  If you're confident that that is my fault rather
than a sign of any possible problem with core, then I certainly
respect your right to hold that opinion.

>> As a non-core team member, I find it quite frustrating that getting a
>> release triggered requires emailing a closed mailing list.  I am not a
>> party to all of the discussion on my request, and the other people who
>> might know whether my request is technically sound or not are not
>> party to that discussion either.
>
> Nor should you be. This idea that all communications must be open is a joke
> and shows a lack of maturity in a community. There are things that must be
> discussed in private for many reasons.

You are arguing against a straw man, since I explicitly said they
should not be.  What I know, though, is that over the last four weeks,
four committers and one other contributor worked round the clock for
days to fix MultiXact bugs and test the fixes, and after that, it
emerged that (at least as far as I can tell) nobody from core was even
reading the -hackers thread closely enough to understand what problems
we were actually fixing, or even the long commit message I wrote
explaining it.  I think it's silly to argue that there is no need for
any overlap between the set of people who know why we need to do a
release and the set of people deciding when to do it, but if I am in
the minority, then so be it.

>> I disagreed with the decision to
>> stamp 9.4.3 without waiting for
>> b6a3444fa63519a0192447b8f9a332dddc66018f, but of course I couldn't
>> comment on it, because it was decided in a forum in which I don't get
>> to participate, on a thread on which I was not copied.
>
> We do not all have to agree and further there is nothing stopping you from
> commenting on -hackers. If enough people agree with you, core is going to
> listen.

It doesn't do much good when you only find out about the decision
after it has been made.

> There is one change to core that I (and I know others) would like to see:
>
> They should serve a finite term and be elected.

In the rest of the email, you seemed to be arguing that there were no
problems with core and that everything is working great.  Here you are
saying perhaps some change would be helpful and constructive for the
project.  I agree with the latter position, but not the former.  On
the question of whether that means elections or something else, I
don't know what is best.

I am honestly not trying to completely overturn the apple cart here.
Obviously, many things that core has done - and is doing - for the
project have worked out very well.  The fact that things are not
perfect is, as you say, to be expected.  And certainly I appreciate
the time that everyone puts into this project, which for the core team
members adds up to a whole lot of time over very many years.
Nevertheless, our release scheduling has been sluggish; Andres
mentioned to me one occasion on which it took, IIRC, two months before
we did a release of a fix for a data-corrupting multixact bug, which I
think is too long; and there was a gap of more than 6 months between
9.3.5 and 9.3.6, which IMHO is too long even if no individual
top-priority issue was fixed during that time.  The fact that the core
team (and the packagers!) are very dedicated is not a reason not to
talk about these problems and how to fix them.  I don't believe that
we need to completely change the current system in order to make
things better, but I don't believe that we need to change nothing,
either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] The purpose of the core team

2015-06-11 Thread Joshua D. Drake


On 06/11/2015 10:10 AM, Magnus Hagander wrote:


Magnus: Committer, primary Windows dude and reviews patches here and
there.


Not sure that's a fair title at this point. Both Andrew and Michael seem
to be doing more of that than me these days, for example. (I do review
patches here and there, but not as much as I'd like)


Peter: Committer, reviews patches not only on -hackers but also -docs

Tom: Enough said

Dave: Committer, agreed that he doesn't do much -hackers work but I
guarantee you he provides a unique perspective to the rest of his
group due to his management of PgAdmin and an entire team at EDB.



Dave is not and never was a committer on the actual postgresql code -
only on other subprojects.


Thank you for the clarification.

JD



--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.




Re: [HACKERS] The purpose of the core team

2015-06-11 Thread Magnus Hagander
On Thu, Jun 11, 2015 at 6:29 PM, Joshua D. Drake 
wrote:

>
> On 06/11/2015 07:12 AM, Robert Haas wrote:
>
>>
>>  Hopefully this will be helpful to people.
>>>
>>
>> I believe the core team is suffering from a lack of members who are
>> involved in writing, reviewing, and committing patches.  Those things
>> are not core functions of the core team, as that charter illustrates.
>>
>
> Bruce: Committer, maintains pg_upgrade and reviews patches here and there.
>
> Magnus: Committer, primary Windows dude and reviews patches here and there.
>

Not sure that's a fair title at this point. Both Andrew and Michael seem to
be doing more of that than me these days, for example. (I do review patches
here and there, but not as much as I'd like)



>
> Peter: Committer, reviews patches not only on -hackers but also -docs
>
> Tom: Enough said
>
> Dave: Committer, agreed that he doesn't do much -hackers work but I
> guarantee you he provides a unique perspective to the rest of his group due
> to his management of PgAdmin and an entire team at EDB.
>


Dave is not and never was a committer on the actual postgresql code - only
on other subprojects.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] The purpose of the core team

2015-06-11 Thread Joshua D. Drake


On 06/11/2015 09:49 AM, Andrew Dunstan wrote:



On 06/11/2015 12:29 PM, Joshua D. Drake wrote:




JoshB: Advocacy. There is a strong argument that this does not need to be a
core position.



I strongly disagree with this. On the contrary, I think there is a very
good argument FOR such a position in core.


In the past, absolutely. However we have a very strong and powerful 
advocacy network now including two non-profits that are heavily involved.


That said, I am not advocating that the position be removed as much
as stating that in my opinion it isn't necessary.


JD



--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.




Re: [HACKERS] The purpose of the core team

2015-06-11 Thread Robert Haas
On Thu, Jun 11, 2015 at 12:13 PM, Joshua D. Drake  
wrote:
> This is crap. I am on the packagers list. Core always asks what people think
> and no, it is not always accepted. There have been many times that the
> release has been pushed off because of resources available or new
> information being provided. There have also been plenty of well worn FOSS
> arguments on that list to make sure that everything is done in a mature,
> reliable and professional way.

I think I've gotten sucked into arguing about something I don't really
want to argue about.  I agree that -core is always very polite when
they asked on -packagers, and that -core would probably move the
release date if subscribers to -packagers asked them to do so.  I also
do not dispute Dave's statement that PostgreSQL's packagers are very
committed.  That is certainly true.  From what I can see they bend
over backwards to get the job done, which is great.

Now, I cannot personally recall, nor find in my email archives, an
occasion on which -packagers asked for the release date to be moved.
But you or others may be able to, and that is fine, but it's not what
I'm unhappy about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The purpose of the core team

2015-06-11 Thread Andrew Dunstan


On 06/11/2015 12:29 PM, Joshua D. Drake wrote:




JoshB: Advocacy. There is a strong argument that this does not need to be a
core position.




I strongly disagree with this. On the contrary, I think there is a very 
good argument FOR such a position in core.


cheers

andrew




Re: [HACKERS] The purpose of the core team

2015-06-11 Thread Joshua D. Drake


On 06/11/2015 07:12 AM, Robert Haas wrote:



Hopefully this will be helpful to people.


I believe the core team is suffering from a lack of members who are
involved in writing, reviewing, and committing patches.  Those things
are not core functions of the core team, as that charter illustrates.


Bruce: Committer, maintains pg_upgrade and reviews patches here and there.

Magnus: Committer, primary Windows dude and reviews patches here and there.

Peter: Committer, reviews patches not only on -hackers but also -docs

Tom: Enough said

Dave: Committer, agreed that he doesn't do much -hackers work but I 
guarantee you he provides a unique perspective to the rest of his group 
due to his management of PgAdmin and an entire team at EDB.


JoshB: Advocacy. There is a strong argument that this does not need to be a
core position.


In short, I don't agree with you.



However, the core team needs to know when it should initiate a
release, and to do that it needs to understand the impact of bugs that
have been fixed and bugs that have not been fixed.  The recent
discussion of multixacts seems to indicate that the number of core
team members who had a clear understanding of the issues was zero,


True but that isn't the fault of core outside of communication. The 
hackers, reviewers and committers of those patches should be required to 
communicate with core in a way that expresses the true severity of a 
situation so core can make a:


Management decision.



which I view as unfortunate.  The core team also needs to make good
decisions about who should be made a committer, and the people who are
doing reviews and commits of other people's patches are in the best
position to have an informed opinion on that topic.


Which happens.



As a non-core team member, I find it quite frustrating that getting a
release triggered requires emailing a closed mailing list.  I am not a
party to all of the discussion on my request, and the other people who
might know whether my request is technically sound or not are not
party to that discussion either.


Nor should you be. This idea that all communications must be open is a 
joke and shows a lack of maturity in a community. There are things that 
must be discussed in private for many reasons.


Now, if you are saying that core isn't reaching out directly to people 
involved in suspect work to make a quality decision that may be one 
thing but I also know that happens.



I disagreed with the decision to
stamp 9.4.3 without waiting for
b6a3444fa63519a0192447b8f9a332dddc66018f, but of course I couldn't
comment on it, because it was decided in a forum in which I don't get
to participate, on a thread on which I was not copied.


We do not all have to agree and further there is nothing stopping you 
from commenting on -hackers. If enough people agree with you, core is 
going to listen.



I realize
that, because decisions about whether to release and when to release
often touch on security issues, not all of this discussion can be
carried on in public.  But when the cone of secrecy is drawn in so
tightly that it excludes everyone who actually understands the technical
issues related to the proposed release, we have lost our way, and do
our users a disservice.



We can't lose our way when this is the way it has always been.


I am not sure whether the solution to this problem is to add more
people to the core team, or whether the solution is to move release
timing decisions and committer selection out of the core team to some
newly-created group.  But I believe that change is needed.



If you have a people problem and you add people, you only have a bigger 
people problem. In terms of timing and committer selection, core does a 
very good job. Core has been around longer than you have and has shown 
great respect, maturity and management skills with most (not all of 
course) aspects of this community.


There is one change to core that I (and I know others) would like to see:

They should serve a finite term and be elected.

Sincerely,

JD



--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.5 release notes

2015-06-11 Thread Robert Haas
On Thu, Jun 11, 2015 at 11:32 AM, Bruce Momjian  wrote:
> Improve hash creation and lookup performance (Tomas Vondra,
> Teodor Sigaev, Tom Lane, Robert Haas)

I suggest having two separate items.  One of those is about the Hash
executor node and the other is about our dynahash stuff.  They're
completely different code bases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] 9.5 release notes

2015-06-11 Thread Bruce Momjian
On Thu, Jun 11, 2015 at 12:13:26PM -0400, Robert Haas wrote:
> On Thu, Jun 11, 2015 at 11:32 AM, Bruce Momjian  wrote:
> > Improve hash creation and lookup performance (Tomas Vondra,
> > Teodor Sigaev, Tom Lane, Robert Haas)
> 
> I suggest having two separate items.  One of those is about the Hash
> executor node and the other is about our dynahash stuff.  They're
> completely different code bases.

OK, can you give me some text?  Do users really care which part it is
in?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] The purpose of the core team

2015-06-11 Thread Joshua D. Drake


On 06/11/2015 08:26 AM, Robert Haas wrote:


Timing *decisions* are not made by -core, as I've told you in the
past. They are made by the packagers who do the actual work, based on
suggestions from -core.


You have told me that in the past, and I do not accept that it is true.

The suggestions from -core are always accepted, or as near as makes no
difference.  So in effect, -core decides.


Robert,

This is crap. I am on the packagers list. Core always asks what people 
think and no it is not always accepted. There have been many times that 
the release has been pushed off because of resources available or new 
information being provided. There have also been plenty of well worn 
FOSS arguments on that list to make sure that everything is done in a 
mature, reliable and professional way.


JD






Re: [HACKERS] The purpose of the core team

2015-06-11 Thread Dave Page
On Thu, Jun 11, 2015 at 4:26 PM, Robert Haas  wrote:
>
>> Timing *decisions* are not made by -core, as I've told you in the
>> past. They are made by the packagers who do the actual work, based on
>> suggestions from -core.
>
> You have told me that in the past, and I do not accept that it is true.
>
> The suggestions from -core are always accepted, or as near as makes no
> difference.  So in effect, -core decides.

No, that means we have some very committed people handling the release
process, who are mostly able to put in the effort on the dates
suggested, and on the odd occasion when they can't for some reason, we
(core and packagers) figure out the best date for everyone involved.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] 9.5 release notes

2015-06-11 Thread Bruce Momjian
On Thu, Jun 11, 2015 at 02:16:59PM +0200, Tomas Vondra wrote:
> Hi,
> 
> On 06/11/15 06:15, Bruce Momjian wrote:
> >I have committed the first draft of the 9.5 release notes.  You can view
> >the output here:
> >
> > http://momjian.us/pgsql_docs/release-9-5.html
> > 
> >and it will eventually appear here:
> >
> > http://www.postgresql.org/docs/devel/static/release.html
> >
> >I am ready to make suggested adjustments, though I am traveling for
> >conferences for the next ten days so there might be a delay in my replies.
> 
> I wonder whether this point:
> 
> * Improve hash creation performance (Tomas Vondra, Teodor Sigaev, Tom
>   Lane)
> 
> is really about 45f6240a, 8cce08f1 and 30d7ae3c. I can't
> remember or find other hash-related patches I've been working on.
> 
> If that's the case, it really is not about hash creation performance
> (except maybe partially the first commit), but about the lookup
> performance on the hash table. So it should rather be 'improve
> hash-join performance'.

That is a merged item; here is the other one:

Improve hash_create's API for selecting simple-binary-key hash functions.

Previously, if you wanted anything besides C-string hash keys, you had to
specify a custom hashing function to hash_create().  Nearly all such
callers were specifying tag_hash or oid_hash; which is tedious, and rather
error-prone, since a caller could easily miss the opportunity to optimize
by using hash_uint32 when appropriate.  Replace this with a design whereby
callers using simple binary-data keys just specify HASH_BLOBS and don't
need to mess with specific support functions.  hash_create() itself will
take care of optimizing when the key size is four bytes.

This nets out saving a few hundred bytes of code space, and offers
a measurable performance improvement in tidbitmap.c (which was not
exploiting the opportunity to use hash_uint32 for its 4-byte keys).
There might be some wins elsewhere too, I didn't analyze closely.

In future we could look into offering a similar optimized hashing function
for 8-byte keys.  Under this design that could be done in a centralized
and machine-independent fashion, whereas getting it right for keys of
platform-dependent sizes would've been notationally painful before.

For the moment, the old way still works fine, so as not to break source
code compatibility for loadable modules.  Eventually we might want to
remove tag_hash and friends from the exported API altogether, since there's
no real need for them to be explicitly referenced from outside dynahash.c.

Teodor Sigaev and Tom Lane

(Tom Lane)
[4a14f13a0] 2014-12-18 13:36:36 -0500


> Also, I think the list of authors is wrong. Robert Haas did a lot of
> work on those patches, so he should be mentioned there, and I don't
> remember Teodor working on this.
> 
> But maybe I was working on another patch and managed to forget about it.

New text:

Improve hash creation and lookup performance (Tomas Vondra,
Teodor Sigaev, Tom Lane, Robert Haas)





Re: [HACKERS] The purpose of the core team

2015-06-11 Thread Robert Haas
On Thu, Jun 11, 2015 at 11:13 AM, Dave Page  wrote:
> Yes, and we have recently been discussing how best to solicit those
> opinions this year.

Great!

>> As a non-core team member, I find it quite frustrating that getting a
>> release triggered requires emailing a closed mailing list.
>
> It does not, unless you're talking about a security release. You might
> have to prod people if they overlook an email on -hackers, but you can
> certainly suggest releasing updates there.

I certainly can suggest it in a variety of ways on a variety of
mailing lists.  Getting it to happen is a different thing.

> Timing *decisions* are not made by -core, as I've told you in the
> past. They are made by the packagers who do the actual work, based on
> suggestions from -core.

You have told me that in the past, and I do not accept that it is true.

The suggestions from -core are always accepted, or as near as makes no
difference.  So in effect, -core decides.





Re: [HACKERS] Entities created in one query not available in another in extended protocol

2015-06-11 Thread Simon Riggs
On 11 June 2015 at 16:56, Shay Rojansky  wrote:

> Npgsql (currently) sends Parse for the second command before sending
> Execute for the first one.

Look no further than that.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] 9.5 release notes

2015-06-11 Thread Bruce Momjian
On Thu, Jun 11, 2015 at 05:16:07PM +1200, David Rowley wrote:
> 
> On 11 June 2015 at 16:15, Bruce Momjian  wrote:
> 
> I have committed the first draft of the 9.5 release notes.  You can view
> the output here:
> 
>         http://momjian.us/pgsql_docs/release-9-5.html
> 
> 
> 
> Thanks Bruce.
> 
> Would you also be able to mention something about f15821e and d222585 ?

I am going to defer to Tom on that.  I have added optimizer changes in
the past but he didn't feel it made sense unless there was some
user-visible change.





Re: [HACKERS] 9.5 release notes

2015-06-11 Thread Bruce Momjian
On Thu, Jun 11, 2015 at 02:00:08PM +0900, Amit Langote wrote:
> On 2015-06-11 PM 01:15, Bruce Momjian wrote:
> > I have committed the first draft of the 9.5 release notes.  You can view
> > the output here:
> > 
> > http://momjian.us/pgsql_docs/release-9-5.html
> > 
> > and it will eventually appear here:
> > 
> > http://www.postgresql.org/docs/devel/static/release.html
> > 
> > I am ready to make suggested adjustments, though I am traveling for
> > conferences for the next ten days so there might be a delay in my replies.
> > 
> 
> In the last section E.1.3.11.1. pgbench, there is:
> 
> +  
> +   
> +Add information about buffer pins to pg_buffercache
> +display (Andres Freund)
> +   
> +  
> 
> Should it be moved to its own section?

Oops, wrong section.   Moved.  Thanks.





Re: [HACKERS] The purpose of the core team

2015-06-11 Thread Dave Page
On Thu, Jun 11, 2015 at 3:12 PM, Robert Haas  wrote:
> On Tue, Jun 9, 2015 at 6:35 PM, Bruce Momjian  wrote:
>> There has been some confusion by old and new community members about the
>> purpose of the core team, and this lack of understanding has caused some
>> avoidable problems.  Therefore, the core team has written a core charter
>> and published it on our website:
>>
>> http://www.postgresql.org/developer/core/
>>
>> Hopefully this will be helpful to people.
>
> I believe the core team is suffering from a lack of members who are
> involved in writing, reviewing, and committing patches.  Those things
> are not core functions of the core team, as that charter illustrates.
> However, the core team needs to know when it should initiate a
> release, and to do that it needs to understand the impact of bugs that
> have been fixed and bugs that have not been fixed.  The recent
> discussion of multixacts seems to indicate that the number of core
> team members who had a clear understanding of the issues was zero,
> which I view as unfortunate.  The core team also needs to make good
> decisions about who should be made a committer, and the people who are
> doing reviews and commits of other people's patches are in the best
> position to have an informed opinion on that topic.

Yes, and we have recently been discussing how best to solicit those
opinions this year.

> As a non-core team member, I find it quite frustrating that getting a
> release triggered requires emailing a closed mailing list.

It does not, unless you're talking about a security release. You might
have to prod people if they overlook an email on -hackers, but you can
certainly suggest releasing updates there.

> I am not a
> party to all of the discussion on my request, and the other people who
> might know whether my request is technically sound or not are not
> party to that discussion either.  I disagreed with the decision to
> stamp 9.4.3 without waiting for
> b6a3444fa63519a0192447b8f9a332dddc66018f, but of course I couldn't
> comment on it, because it was decided in a forum in which I don't get
> to participate, on a thread on which I was not copied.

All of the technical discussion was done outside -core, in lists on
which you are a member. We simply discussed the possible impacts of
scheduling constraints given our personal availability to deal with
the release process, and the possible PR impact of waiting. Even then
I think there were all of maybe half a dozen short comments on the
thread.

> I realize
> that, because decisions about whether to release and when to release
> often touch on security issues, not all of this discussion can be
> carried on in public.  But when the cone of secrecy is drawn in so
> tightly that it excludes everyone who actually understands the technical
> issues related to the proposed release, we have lost our way, and do
> our users a disservice.
>
> I am not sure whether the solution to this problem is to add more
> people to the core team, or whether the solution is to move release
> timing decisions and committer selection out of the core team to some
> newly-created group.  But I believe that change is needed.

Timing *decisions* are not made by -core, as I've told you in the
past. They are made by the packagers who do the actual work, based on
suggestions from -core.





Re: [HACKERS] 9.5 release notes

2015-06-11 Thread Bruce Momjian
On Thu, Jun 11, 2015 at 01:54:23PM +0900, Michael Paquier wrote:
> On Thu, Jun 11, 2015 at 1:15 PM, Bruce Momjian  wrote:
> > I have committed the first draft of the 9.5 release notes.  You can view
> > the output here:
> >
> > http://momjian.us/pgsql_docs/release-9-5.html
> >
> > and it will eventually appear here:
> >
> > http://www.postgresql.org/docs/devel/static/release.html
> >
> > I am ready to make suggested adjustments, though I am traveling for
> > conferences for the next ten days so there might be a delay in my replies.
> 
> Here are some review comments:
> +   
> +RETURN WHERE
> +   
> What is that?

It says the OID is returned, but where?  I don't see it in psql.

> +   
> +WHAT IS A STATISTICS SNAPSHOT?
> +   
> +  
> It defines the last time when the global statistics file of pgstat has
> been written. Perhaps documentation should be made clearer.

Update with this text:

This represents the last time the snapshot file was written to
the file system.

> +   
> +The remote snapshot must have been exported by
> +pg_export_snapshot() or been defined as a
> +logical replication slot.  This can be used by parallel
> +pg_dump to use a consistent snapshot across
> +pg_dump processes.
> +   
> Perhaps "or been defined when creating a logical replication slot
> through a replication connection".

Sure, updated paragraph:

The remote snapshot must have been exported by
pg_export_snapshot() or been defined when creating
a logical replication slot.  This can be used by parallel
pg_dump to use a consistent snapshot across
pg_dump processes.

> +  
> +   
> +Simplify WAL record format (Heikki Linnakangas)
> +   
> +
> +   
> +This allows external tools to more easily process WAL
> +files.
> +   
> +  
> More precision could be brought here. What the new format actually
> allows is to track more easily which blocks are modified for
> relations, something that is possible without knowing the record
> type directly.

OK, new text:

This allows external tools to more easily track what blocks
are modified.

> +   
> +This is particularly helpful for warm standbys.
> +   
> "for warm standbys to control the timing at which WAL segment files
> are retrieved from a WAL archive."

That feels redundant to the major description of the item.

> I think that the following items should be added as well:
> - Improvement of file version information for Windows builds (Noah
> Misch, Michael Paquier), commit ee9569e. The file version information
> was missing for a set of contrib modules as well as a handful of
> libraries and binaries (like conversion_procs, zic, pg_regress, etc.).
> This item should mention that all the binaries and libraries produced
> by a Windows build now contain file version information. This could be
> merged as well with this item as both are related:
> +   
> +Add icons to all MSVC-built binaries (Noah Misch)
> +   

OK, merged into the existing item:

Add icons to all MSVC-built binaries and version information to
all MS Windows binaries (Noah Misch)

> - Move pg_lzcompress and pg_lzdecompress to libpqcommon, commit
> 40bede54. This was some legwork for wal_compression but external
> binary tools can take advantage of using it now more freely. Those
> APIs have been reworked as well to be more generic, somewhat similarly
> to the interface lz4 exposes to the user.

Uh, do we actually want to document that API for users?  I didn't think so.


> - Addition of palloc_extended (8c8a886) to give module developers a
> fallback plan instead of OOM ERROR that palloc throws mandatorily.
> MemoryContextAllocExtended() can be used on another memory context
> than the current one similarly (bd4e2fd9). Feel free to discard this
> one if this is not appropriate in the release notes.

Same question.  I am happy to mention it, but if we mention it, we are
encouraging people to use it.





Re: [HACKERS] DBT-3 with SF=20 got failed

2015-06-11 Thread Tomas Vondra



On 06/11/15 16:54, Tom Lane wrote:

Tomas Vondra  writes:

Interestingly, the hash code checks for INT_MAX overflows on a number of
places, but does not check for this ...


Yeah, and at least at one time there were checks to prevent the hash
table request from exceeding MaxAllocSize. Did those get removed by
somebody?


I think the problem is in this piece of code:

    if ((hashtable->nbatch == 1) &&
        (hashtable->nbuckets_optimal <= INT_MAX / 2) && /* overflow protection */
        (ntuples >= (hashtable->nbuckets_optimal * NTUP_PER_BUCKET)))
    {
        hashtable->nbuckets_optimal *= 2;
        hashtable->log2_nbuckets_optimal += 1;
    }

ISTM it does not check against the max_pointers (that's only done in 
ExecChooseHashTableSize).




--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Entities created in one query not available in another in extended protocol

2015-06-11 Thread Shay Rojansky
I just understood the same thing Tom wrote: yes, Npgsql (currently) sends
Parse for the second command before sending Execute for the first one. I
will look into that implementation decision. It might be worth looking into
Simon's comment though; I'll report if I still see the same problematic
behavior after reordering the messages (assuming we do reorder).

Thanks for your inputs...

On Thu, Jun 11, 2015 at 5:50 PM, Tom Lane  wrote:

> Simon Riggs  writes:
> > On 11 June 2015 at 11:20, Shay Rojansky  wrote:
> >> It appears that when we send two messages in an extended protocol (so two
> >> Parse/Bind/Execute followed by a single Sync), where the first one creates
> >> some entity (function, table), the second one can't query that entity
> >> (not found). This isn't terribly important but does seem a bit odd, I
> >> wanted to make sure you're aware of this.
>
> > Sounds somewhat unlikely, but thank you for the report. Can we see a test
> > case?
>
> > Most commonly in such cases the first request failed and error messages
> > weren't checked before running second message.
>
> I'm wondering if it was really more like
> Parse/Parse/Bind/Bind/Execute/Execute/Sync, in which case the described
> behavior wouldn't be too surprising at all.
>
> I do note that if a transaction is implicitly started to execute these
> messages, it's not closed until Sync.  But that should only affect the
> visibility of the results to other sessions, not to the current one.
> There's definitely a CommandCounterIncrement in exec_execute_message ...
>
> regards, tom lane
>


Re: [HACKERS] Entities created in one query not available in another in extended protocol

2015-06-11 Thread Tom Lane
Andres Freund  writes:
> On 2015-06-11 10:50:31 -0400, Tom Lane wrote:
>> I do note that if a transaction is implicitly started to execute these
>> messages, it's not closed until Sync.  But that should only affect the
>> visibility of the results to other sessions, not to the current one.
>> There's definitely a CommandCounterIncrement in exec_execute_message ...

> exec_execute_message() only does so if the command has run to
> completion. Shay, Is it possible that a row limit was used and the
> commands didn't run fully?

That wouldn't affect utility statements like CREATE FUNCTION or CREATE
TABLE, though.

regards, tom lane




Re: [HACKERS] Entities created in one query not available in another in extended protocol

2015-06-11 Thread Andres Freund
On 2015-06-11 10:50:31 -0400, Tom Lane wrote:
> I do note that if a transaction is implicitly started to execute these
> messages, it's not closed until Sync.  But that should only affect the
> visibility of the results to other sessions, not to the current one.
> There's definitely a CommandCounterIncrement in exec_execute_message ...

exec_execute_message() only does so if the command has run to
completion. Shay, is it possible that a row limit was used and the
commands didn't run fully?




Re: [HACKERS] DBT-3 with SF=20 got failed

2015-06-11 Thread Tom Lane
Tomas Vondra  writes:
> Interestingly, the hash code checks for INT_MAX overflows on a number of 
> places, but does not check for this ...

Yeah, and at least at one time there were checks to prevent the hash table
request from exceeding MaxAllocSize.  Did those get removed by somebody?

regards, tom lane




Re: [HACKERS] 9.5 release notes

2015-06-11 Thread Bruce Momjian
On Thu, Jun 11, 2015 at 10:20:13AM +0530, Amit Kapila wrote:
> On Thu, Jun 11, 2015 at 9:45 AM, Bruce Momjian  wrote:
> >
> > I have committed the first draft of the 9.5 release notes.  You can view
> > the output here:
> >
> >         http://momjian.us/pgsql_docs/release-9-5.html
> >
> 
> Thanks for writing the Release notes.
> 
> Some comments:
> 
> Have pg_basebackup use a tablespace mapping file, to allow for file paths of
> 100+ characters in length
> 
> I think this is not completely correct. This is mainly done to allow
> usage of tar format in Windows when tablespaces are present
> in database, although we have eventually done it for both
> Windows and Linux in the same way.  So how about: 
> 
> Have pg_basebackup use a tablespace mapping file, to allow usage of tar format
> consistently across all platforms

Actually, the file fixes 100+ character paths on all platforms _and_
symbolic links on Windows:

Map basebackup tablespaces using a tablespace_map file

Windows can't reliably restore symbolic links from a tar format, so
instead during backup start we create a tablespace_map file, which is
used by the restoring postgres to create the correct links in pg_tblspc.
The backup protocol also now has an option to request this file to be
included in the backup stream, and this is used by pg_basebackup when
operating in tar mode.

This is done on all platforms, not just Windows.

This means that pg_basebackup will not work in tar mode against 9.4
and older servers, as this protocol option isn't implemented there.

Amit Kapila, reviewed by Dilip Kumar, with a little editing from me.

(Andrew Dunstan)
[72d422a52] 2015-05-12 09:29:10 -0400

pg_basebackup -F t now succeeds with a long symlink target

Error when creating names too long for tar format

The tar format (at least the version we are using) does not support
file names or symlink targets longer than 99 bytes.  Until now, the tar
creation code would silently truncate any names that are too long.  (Its
original application was pg_dump, where this never happens.)  This
creates problems when running base backups over the replication
protocol.

The most important problem is when a tablespace path is longer than 99
bytes, which will result in a truncated tablespace path being backed up.
Less importantly, the basebackup protocol also promises to back up any
other files it happens to find in the data directory, which would also
lead to file name truncation if someone put a file with a long name in
there.

Now both of these cases result in an error during the backup.

Add tests that fail when a too-long file name or symlink is attempted to
be backed up.

Reviewed-by: Robert Hass 

(Peter Eisentraut)
[23a78352c] 2015-02-24 13:41:07 -0500

> 
> Also shall we mention about below in Migrations to 9.5 section
> 
> "pg_basebackup will not work in tar mode against 9.4 and older servers,
>  as we have introduced a new protocol option in that mode."

Yes, added.  The attached, applied patch has both of these mentions, and
mentions 'tar' mode.

diff --git a/doc/src/sgml/release-9.5.sgml b/doc/src/sgml/release-9.5.sgml
new file mode 100644
index 2073f74..f022050
*** a/doc/src/sgml/release-9.5.sgml
--- b/doc/src/sgml/release-9.5.sgml
***
*** 1322,1329 
  
   

!Have pg_basebackup use a tablespace mapping file,
!to allow for file paths of 100+ characters in length (Amit Kapila)

   
  
--- 1322,1336 
  
   

!Have pg_basebackup use a tablespace mapping
!file with the tar format, to handle file paths
!of 100+ characters in length and symbolic links on MS Windows (Amit Kapila)
!   
! 
!   
!This change prevents pg_basebackup from backing
!up pre-9.5 servers.

   
  



Re: [HACKERS] DBT-3 with SF=20 got failed

2015-06-11 Thread Kohei KaiGai
2015-06-11 23:33 GMT+09:00 Tomas Vondra :
> Hi,
>
> On 06/11/15 16:20, Jan Wieck wrote:
>>
>> On 06/11/2015 09:53 AM, Kouhei Kaigai wrote:

 curious: what was work_mem set to?

>>> work_mem=48GB
>>>
>>> My machine mounts 256GB physical RAM.
>>
>>
>> work_mem can be allocated several times per backend. Nodes like sort
>> and hash_aggregate may each allocate that much. You should set
>> work_mem to a fraction of physical-RAM / concurrent-connections
>> depending on the complexity of your queries. 48GB does not sound
>> reasonable.
>
>
> That's true, but there are cases where values like this may be useful (e.g.
> for a particular query). We do allow such work_mem values, so I consider
> this failure to be a bug.
>
> It probably existed in the past, but was amplified by the hash join
> improvements I did for 9.5, because that uses NTUP_PER_BUCKET=1 instead of
> NTUP_PER_BUCKET=10. So the arrays of buckets are much larger, and we also
> use much more memory than we did in the past.
>
> Interestingly, the hash code checks for INT_MAX overflows on a number of
> places, but does not check for this ...
>
Which number should be changed in this case?

Indeed, nbuckets is declared as int, so INT_MAX is the hard limit on the
number of hash slots. However, some extreme usage can easily create a
situation where we hit this restriction.

Should we make nbuckets a long int?

-- 
KaiGai Kohei 




Re: [HACKERS] Entities created in one query not available in another in extended protocol

2015-06-11 Thread Tom Lane
Simon Riggs  writes:
> On 11 June 2015 at 11:20, Shay Rojansky  wrote:
>> It appears that when we send two messages in an extended protocol (so two
>> Parse/Bind/Execute followed by a single Sync), where the first one creates
>> some entity (function, table), the second one can't query that entity
>> (not found). This isn't terribly important but does seem a bit odd, I
>> wanted to make sure you're aware of this.

> Sounds somewhat unlikely, but thank you for the report. Can we see a test
> case?

> Most commonly in such cases the first request failed and error messages
> weren't checked before running second message.

I'm wondering if it was really more like
Parse/Parse/Bind/Bind/Execute/Execute/Sync, in which case the described
behavior wouldn't be too surprising at all.

I do note that if a transaction is implicitly started to execute these
messages, it's not closed until Sync.  But that should only affect the
visibility of the results to other sessions, not to the current one.
There's definitely a CommandCounterIncrement in exec_execute_message ...

regards, tom lane




Re: [HACKERS] skipping pg_log in basebackup (was Re: pg_basebackup and pg_stat_tmp directory)

2015-06-11 Thread Alvaro Herrera
Michael Paquier wrote:

> After spending the night thinking about that, honestly, I think that
> we should go with (2) and keep the base backup as light-weight as
> possible and not bother about a GUC. (3) would need some extra
> intelligence to decide if some files can be skipped or not. Imagine
> for example --skip-files=global/pg_control or --skip-files=pg_clog
> (because it *is* a log file with much data), that would just corrupt
> silently your backup, but I guess that it is what you had in mind. In
> any case (3) is not worth the maintenance burden because we would need
> to update the things to filter each time a new important folder is
> added in PGDATA by a patch.

If somebody sets log_directory=pg_clog/ they are screwed pretty badly,
aren't they? (I guess this is just a case of "don't do that".)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] DBT-3 with SF=20 got failed

2015-06-11 Thread Kohei KaiGai
2015-06-11 23:28 GMT+09:00 Robert Haas :
> On Wed, Jun 10, 2015 at 10:57 PM, Kouhei Kaigai  wrote:
>> The attached patch replaces this palloc0() by MemoryContextAllocHuge() +
>> memset().
>> Indeed, this hash table is constructed for the relation with nrows=119994544,
>> so it is not strange even if the hash-slot array itself is larger than 1GB.
>
> You forgot to attach the patch, I think.
>
Oops, I forgot to attach indeed.

>  It looks to me like the size
> of a HashJoinTuple is going to be 16 bytes, so 1GB/16 = ~64 million.
> That's a lot of buckets, but maybe not unreasonably many if you've got
> enough memory.
>
EXPLAIN says this Hash node hashes an underlying SeqScan with
119994544 (~119 million) rows, but that is still much smaller than my
work_mem setting.

-- 
KaiGai Kohei 


hashslot-allocation-by-huge-alloc.patch
Description: Binary data



Re: [HACKERS] The Future of Aggregation

2015-06-11 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jun 9, 2015 at 11:00 AM, Alvaro Herrera
>  wrote:
>> Uh, this also requires serialization and deserialization of non-
>> finalized transition state, no?

> A bunch of this stuff does, but I recently had a Brilliant Insight: we
> don't need to add a new method for serializing and deserializing
> transition functions.  We can already do that: to serialize an
> aggregate transition state, you run it through the typoutput (or
> typsend) function and to deserialize it, you run it through the
> typinput (or typreceive) function.  The only problem is that we have
> some aggregate functions that use an internal type.  Those could,
> however, be changed: we could invent new types for each aggregate that
> uses a distinctive internal representation, rather than lumping it all
> under internal, and then give those types real input and output
> functions.  That way, we don't really need to invent anything new
> here.

Yeah.  Now, there are reasons why some of those aggregates are using
"internal" and not, say, "bytea": they want the core aggregate logic to be
just passing a pointer around and not trying to copy the aggregate's
actual state value.  However, I have been wondering whether the "expanded
objects" stuff I did recently could provide a more principled way to do
that kind of thing.

regards, tom lane




Re: [HACKERS] DBT-3 with SF=20 got failed

2015-06-11 Thread Tomas Vondra

Hi,

On 06/11/15 16:20, Jan Wieck wrote:

On 06/11/2015 09:53 AM, Kouhei Kaigai wrote:

curious: what was work_mem set to?


work_mem=48GB

My machine mounts 256GB physical RAM.


work_mem can be allocated several times per backend. Nodes like sort
and hash_aggregate may each allocate that much. You should set
work_mem to a fraction of physical-RAM / concurrent-connections
depending on the complexity of your queries. 48GB does not sound
reasonable.


That's true, but there are cases where values like this may be useful 
(e.g. for a particular query). We do allow such work_mem values, so I 
consider this failure to be a bug.


It probably existed in the past, but was amplified by the hash join 
improvements I did for 9.5, because that uses NTUP_PER_BUCKET=1 instead 
of NTUP_PER_BUCKET=10. So the arrays of buckets are much larger, and we 
also use much more memory than we had in the past.


Interestingly, the hash code checks for INT_MAX overflows on a number of 
places, but does not check for this ...


regards
--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Entities created in one query not available in another in extended protocol

2015-06-11 Thread Robert Haas
On Thu, Jun 11, 2015 at 5:38 AM, Simon Riggs  wrote:
> On 11 June 2015 at 11:20, Shay Rojansky  wrote:
>>
>> In Npgsql, the .NET driver for PostgreSQL, we've switched from simple to
>> extended protocol and have received a user complaint.
>>
>> It appears that when we send two messages in the extended protocol (two
>> Parse/Bind/Execute sequences followed by a single Sync), where the first one
>> creates some entity (function, table), the second one can't query that
>> entity (not found). This isn't terribly important but does seem a bit odd;
>> I wanted to make sure you're aware of this.
>
> Sounds somewhat unlikely, but thank you for the report. Can we see a test
> case?

Actually, I think I've seen this before.   The code that handles the
Sync message does this:

case 'S':   /* sync */
pq_getmsgend(&input_message);
finish_xact_command();
send_ready_for_query = true;
break;

finish_xact_command() calls CommitTransactionCommand(), which does
CommandCounterIncrement() or CommitTransaction() as appropriate.  So
without the Sync, I think it's expected that you don't see the results
of the previous command.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] DBT-3 with SF=20 got failed

2015-06-11 Thread Robert Haas
On Wed, Jun 10, 2015 at 10:57 PM, Kouhei Kaigai  wrote:
> The attached patch replaces this palloc0() by MemoryContextAllocHuge() +
> memset().
> Indeed, this hash table is constructed for the relation with nrows=119994544,
> so it is not strange even if the hash-slot array itself is larger than 1GB.

You forgot to attach the patch, I think.  It looks to me like the size
of a HashJoinTuple is going to be 16 bytes, so 1GB/16 = ~64 million.
That's a lot of buckets, but maybe not unreasonably many if you've got
enough memory.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] DBT-3 with SF=20 got failed

2015-06-11 Thread Kohei KaiGai
2015-06-11 23:20 GMT+09:00 Jan Wieck :
> On 06/11/2015 09:53 AM, Kouhei Kaigai wrote:
>>>
>>> curious: what was work_mem set to?
>>>
>> work_mem=48GB
>>
>> My machine mounts 256GB physical RAM.
>
>
> work_mem can be allocated several times per backend. Nodes like sort and
> hash_aggregate may each allocate that much. You should set work_mem to a
> fraction of physical-RAM / concurrent-connections depending on the
> complexity of your queries. 48GB does not sound reasonable.
>
A small max_connections combined with a large work_mem is a usual
configuration for typical OLAP workloads.

Even if the configuration is not reasonable, this is not the right error
message. People cannot tell how to fix the problem from it.

psql:query21.sql:50: ERROR:  invalid memory alloc request size 1073741824




Re: [HACKERS] DBT-3 with SF=20 got failed

2015-06-11 Thread Jan Wieck

On 06/11/2015 09:53 AM, Kouhei Kaigai wrote:

curious: what was work_mem set to?


work_mem=48GB

My machine mounts 256GB physical RAM.


work_mem can be allocated several times per backend. Nodes like sort and 
hash_aggregate may each allocate that much. You should set work_mem to a 
fraction of physical-RAM / concurrent-connections depending on the 
complexity of your queries. 48GB does not sound reasonable.



Regards, Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info




Re: [HACKERS] The purpose of the core team

2015-06-11 Thread Robert Haas
On Tue, Jun 9, 2015 at 6:35 PM, Bruce Momjian  wrote:
> There has been some confusion by old and new community members about the
> purpose of the core team, and this lack of understanding has caused some
> avoidable problems.  Therefore, the core team has written a core charter
> and published it on our website:
>
> http://www.postgresql.org/developer/core/
>
> Hopefully this will be helpful to people.

I believe the core team is suffering from a lack of members who are
involved in writing, reviewing, and committing patches.  Those things
are not core functions of the core team, as that charter illustrates.
However, the core team needs to know when it should initiate a
release, and to do that it needs to understand the impact of bugs that
have been fixed and bugs that have not been fixed.  The recent
discussion of multixacts seems to indicate that the number of core
team members who had a clear understanding of the issues was zero,
which I view as unfortunate.  The core team also needs to make good
decisions about who should be made a committer, and the people who are
doing reviews and commits of other people's patches are in the best
position to have an informed opinion on that topic.

As a non-core team member, I find it quite frustrating that getting a
release triggered requires emailing a closed mailing list.  I am not a
party to all of the discussion on my request, and the other people who
might know whether my request is technically sound or not are not
party to that discussion either.  I disagreed with the decision to
stamp 9.4.3 without waiting for
b6a3444fa63519a0192447b8f9a332dddc66018f, but of course I couldn't
comment on it, because it was decided in a forum in which I don't get
to participate, on a thread on which I was not copied.  I realize
that, because decisions about whether to release and when to release
often touch on security issues, not all of this discussion can be
carried on in public.  But when the cone of secrecy is drawn in so
tightly that excludes everyone who actually understands the technical
issues related to the proposed release, we have lost our way, and do
our users a disservice.

I am not sure whether the solution to this problem is to add more
people to the core team, or whether the solution is to move release
timing decisions and committer selection out of the core team to some
newly-created group.  But I believe that change is needed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] The Future of Aggregation

2015-06-11 Thread Robert Haas
On Tue, Jun 9, 2015 at 11:00 AM, Alvaro Herrera
 wrote:
> Uh, this also requires serialization and deserialization of non-
> finalized transition state, no?

A bunch of this stuff does, but I recently had a Brilliant Insight: we
don't need to add a new method for serializing and deserializing
transition functions.  We can already do that: to serialize an
aggregate transition state, you run it through the typoutput (or
typsend) function and to deserialize it, you run it through the
typinput (or typreceive) function.  The only problem is that we have
some aggregate functions that use an internal type.  Those could,
however, be changed: we could invent new types for each aggregate that
uses a distinctive internal representation, rather than lumping it all
under internal, and then give those types real input and output
functions.  That way, we don't really need to invent anything new
here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] DBT-3 with SF=20 got failed

2015-06-11 Thread Kouhei Kaigai
> curious: what was work_mem set to?
>
work_mem=48GB

My machine mounts 256GB physical RAM.
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


> -Original Message-
> From: Merlin Moncure [mailto:mmonc...@gmail.com]
> Sent: Thursday, June 11, 2015 10:52 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] DBT-3 with SF=20 got failed
> 
> On Wed, Jun 10, 2015 at 9:57 PM, Kouhei Kaigai  wrote:
> > Hello,
> >
> > I got the following error during DBT-3 benchmark with SF=20.
> >
> >   psql:query21.sql:50: ERROR:  invalid memory alloc request size 1073741824
> >   psql:query21.sql:50: ERROR:  invalid memory alloc request size 1073741824
> >
> > It looks to me Hash node tries to 1GB area using palloc0(), but it exceeds
> > the limitation of none-huge interface.
> >
> > (gdb) bt
> > #0  0x7f669d29a989 in raise () from /lib64/libc.so.6
> > #1  0x7f669d29c098 in abort () from /lib64/libc.so.6
> > #2  0x0090ccfd in ExceptionalCondition (conditionName=0xb18130
> "!(((Size) (size) <= ((Size) 0x3fff)))",
> > errorType=0xb17efd "FailedAssertion", fileName=0xb17e40 "mcxt.c",
> lineNumber=856) at assert.c:54
> > #3  0x0093ad53 in palloc0 (size=1073741824) at mcxt.c:856
> > #4  0x00673045 in ExecHashTableCreate (node=0x7f669de951f0,
> hashOperators=0x24dbf90, keepNulls=0 '\000') at nodeHash.c:391
> > #5  0x006752e1 in ExecHashJoin (node=0x24d74e0) at 
> > nodeHashjoin.c:169
> > #6  0x0065abf4 in ExecProcNode (node=0x24d74e0) at 
> > execProcnode.c:477
> > #7  0x00681026 in ExecNestLoop (node=0x24d6668) at 
> > nodeNestloop.c:123
> > #8  0x0065abca in ExecProcNode (node=0x24d6668) at 
> > execProcnode.c:469
> > #9  0x00681026 in ExecNestLoop (node=0x24d61f8) at 
> > nodeNestloop.c:123
> > #10 0x0065abca in ExecProcNode (node=0x24d61f8) at 
> > execProcnode.c:469
> > #11 0x00681026 in ExecNestLoop (node=0x24d5478) at 
> > nodeNestloop.c:123
> > #12 0x0065abca in ExecProcNode (node=0x24d5478) at 
> > execProcnode.c:469
> > #13 0x00681026 in ExecNestLoop (node=0x24d51d0) at 
> > nodeNestloop.c:123
> > #14 0x0065abca in ExecProcNode (node=0x24d51d0) at 
> > execProcnode.c:469
> >
> > The attached patch replaces this palloc0() by MemoryContextAllocHuge() +
> memset().
> > Indeed, this hash table is constructed towards the relation with
> nrows=119994544,
> > so, it is not strange even if hash-slot itself is larger than 1GB.
> >
> > Another allocation request potentially reset of expand hash-slot may also 
> > needs
> > to be "Huge" version of memory allocation, I think.
> >
> > Thanks,
> >
> > Below is the query itself and EXPLAIN result.
> > 
> > dbt3c=# EXPLAIN VERBOSE
> > dbt3c-# select
> > dbt3c-# s_name,
> > dbt3c-# count(*) as numwait
> > dbt3c-# from
> > dbt3c-# supplier,
> > dbt3c-# lineitem l1,
> > dbt3c-# orders,
> > dbt3c-# nation
> > dbt3c-# where
> > dbt3c-# s_suppkey = l1.l_suppkey
> > dbt3c-# and o_orderkey = l1.l_orderkey
> > dbt3c-# and o_orderstatus = 'F'
> > dbt3c-# and l1.l_receiptdate > l1.l_commitdate
> > dbt3c-# and exists (
> > dbt3c(# select
> > dbt3c(# *
> > dbt3c(# from
> > dbt3c(# lineitem l2
> > dbt3c(# where
> > dbt3c(# l2.l_orderkey = l1.l_orderkey
> > dbt3c(# and l2.l_suppkey <> l1.l_suppkey
> > dbt3c(# )
> > dbt3c-# and not exists (
> > dbt3c(# select
> > dbt3c(# *
> > dbt3c(# from
> > dbt3c(# lineitem l3
> > dbt3c(# where
> > dbt3c(# l3.l_orderkey = l1.l_orderkey
> > dbt3c(# and l3.l_suppkey <> l1.l_suppkey
> > dbt3c(# and l3.l_receiptdate > l3.l_commitdate
> > dbt3c(# )
> > dbt3c-# and s_nationkey = n_nationkey
> > dbt3c-# and n_name = 'UNITED KINGDOM'
> > dbt3c-# group by
> > dbt3c-# s_name
> > dbt3c-# order by
> > dbt3c-# numwait desc,
> > dbt3c-# s_name
> > dbt3c-# LIMIT 100;
> >
> > QUERY PLAN
> >
> >
> 
> --
> >
> 
> --
> > --
> >  Limit  (cost=6792765.24..6792765.24 rows=1 width=26)
> >Output: supplier.s_name, (count(*))
> >->  Sort  (cost=6792765.24..6792765.24 rows=1 width=26)
> >  Output: supplier.s_name, (count(*))
> >  Sort Key: (count(*)) DESC, supplier.s_na

Re: [HACKERS] DBT-3 with SF=20 got failed

2015-06-11 Thread Merlin Moncure
On Wed, Jun 10, 2015 at 9:57 PM, Kouhei Kaigai  wrote:
> Hello,
>
> I got the following error during DBT-3 benchmark with SF=20.
>
>   psql:query21.sql:50: ERROR:  invalid memory alloc request size 1073741824
>   psql:query21.sql:50: ERROR:  invalid memory alloc request size 1073741824
>
> It looks to me like the Hash node tries to allocate a 1GB area using palloc0(),
> but it exceeds the limitation of the non-huge interface.
>
> (gdb) bt
> #0  0x7f669d29a989 in raise () from /lib64/libc.so.6
> #1  0x7f669d29c098 in abort () from /lib64/libc.so.6
> #2  0x0090ccfd in ExceptionalCondition (conditionName=0xb18130 
> "!(((Size) (size) <= ((Size) 0x3fff)))",
> errorType=0xb17efd "FailedAssertion", fileName=0xb17e40 "mcxt.c", 
> lineNumber=856) at assert.c:54
> #3  0x0093ad53 in palloc0 (size=1073741824) at mcxt.c:856
> #4  0x00673045 in ExecHashTableCreate (node=0x7f669de951f0, 
> hashOperators=0x24dbf90, keepNulls=0 '\000') at nodeHash.c:391
> #5  0x006752e1 in ExecHashJoin (node=0x24d74e0) at nodeHashjoin.c:169
> #6  0x0065abf4 in ExecProcNode (node=0x24d74e0) at execProcnode.c:477
> #7  0x00681026 in ExecNestLoop (node=0x24d6668) at nodeNestloop.c:123
> #8  0x0065abca in ExecProcNode (node=0x24d6668) at execProcnode.c:469
> #9  0x00681026 in ExecNestLoop (node=0x24d61f8) at nodeNestloop.c:123
> #10 0x0065abca in ExecProcNode (node=0x24d61f8) at execProcnode.c:469
> #11 0x00681026 in ExecNestLoop (node=0x24d5478) at nodeNestloop.c:123
> #12 0x0065abca in ExecProcNode (node=0x24d5478) at execProcnode.c:469
> #13 0x00681026 in ExecNestLoop (node=0x24d51d0) at nodeNestloop.c:123
> #14 0x0065abca in ExecProcNode (node=0x24d51d0) at execProcnode.c:469
>
> The attached patch replaces this palloc0() by MemoryContextAllocHuge() +
> memset().
> Indeed, this hash table is constructed for the relation with nrows=119994544,
> so it is not strange even if the hash-slot array itself is larger than 1GB.
>
> Another allocation request, which potentially resets or expands the hash-slot
> array, may also need to be the "Huge" version of memory allocation, I think.
>
> Thanks,
>
> Below is the query itself and EXPLAIN result.
> 
> dbt3c=# EXPLAIN VERBOSE
> dbt3c-# select
> dbt3c-# s_name,
> dbt3c-# count(*) as numwait
> dbt3c-# from
> dbt3c-# supplier,
> dbt3c-# lineitem l1,
> dbt3c-# orders,
> dbt3c-# nation
> dbt3c-# where
> dbt3c-# s_suppkey = l1.l_suppkey
> dbt3c-# and o_orderkey = l1.l_orderkey
> dbt3c-# and o_orderstatus = 'F'
> dbt3c-# and l1.l_receiptdate > l1.l_commitdate
> dbt3c-# and exists (
> dbt3c(# select
> dbt3c(# *
> dbt3c(# from
> dbt3c(# lineitem l2
> dbt3c(# where
> dbt3c(# l2.l_orderkey = l1.l_orderkey
> dbt3c(# and l2.l_suppkey <> l1.l_suppkey
> dbt3c(# )
> dbt3c-# and not exists (
> dbt3c(# select
> dbt3c(# *
> dbt3c(# from
> dbt3c(# lineitem l3
> dbt3c(# where
> dbt3c(# l3.l_orderkey = l1.l_orderkey
> dbt3c(# and l3.l_suppkey <> l1.l_suppkey
> dbt3c(# and l3.l_receiptdate > l3.l_commitdate
> dbt3c(# )
> dbt3c-# and s_nationkey = n_nationkey
> dbt3c-# and n_name = 'UNITED KINGDOM'
> dbt3c-# group by
> dbt3c-# s_name
> dbt3c-# order by
> dbt3c-# numwait desc,
> dbt3c-# s_name
> dbt3c-# LIMIT 100;
>
> QUERY PLAN
>
> --
> --
> --
>  Limit  (cost=6792765.24..6792765.24 rows=1 width=26)
>Output: supplier.s_name, (count(*))
>->  Sort  (cost=6792765.24..6792765.24 rows=1 width=26)
>  Output: supplier.s_name, (count(*))
>  Sort Key: (count(*)) DESC, supplier.s_name
>  ->  HashAggregate  (cost=6792765.22..6792765.23 rows=1 width=26)
>Output: supplier.s_name, count(*)
>Group Key: supplier.s_name
>->  Nested Loop Anti Join  (cost=4831094.94..6792765.21 rows=1 
> width=26)
>  Output: supplier.s_name
>  ->  Nested Loop  (cost=4831094.37..6792737.52 rows=1 
> width=34)
>Output: supplier.s_name, l1.l_suppkey, 
> l1.l_orderkey
>Join Filter: (supplier.s_nationkey = 
> nation.n_nationkey)
>->  Nested Loop  (cost=4831094.37..

Re: [HACKERS] minor issues in pg_rewind

2015-06-11 Thread Fujii Masao
On Thu, Jun 11, 2015 at 10:21 AM, Michael Paquier
 wrote:
> On Thu, Jun 11, 2015 at 2:38 AM, Fujii Masao  wrote:
>> * Remove invalid option character "N" from the third argument (valid option
>> string) of getopt_long().
>> * Use pg_free() or pfree() to free the memory allocated by pg_malloc() or
>> palloc() instead of always using free().
>> * Assume problem is no disk space if write() fails but doesn't set errno.
>> * Fix several typos.
>
> This looks good to me, especially the ENOSPC enforcement for
> pg_control.

Thanks for reviewing the patch! I just pushed it.

> We may want to rename datapagemap_iterate to
> datapagemap_init as well. I can't really see what iterate means in
> this context.

Because it returns an iterator object, maybe. The iterator()/next() pair is a
common way to walk through entries, so I can live with the current naming of
the function.

Regards,

-- 
Fujii Masao




Re: [HACKERS] 9.5 release notes

2015-06-11 Thread Tomas Vondra

Hi,

On 06/11/15 06:15, Bruce Momjian wrote:

I have committed the first draft of the 9.5 release notes.  You can view
the output here:

http://momjian.us/pgsql_docs/release-9-5.html

and it will eventually appear here:

http://www.postgresql.org/docs/devel/static/release.html

I am ready to make suggested adjustments, though I am traveling for
conferences for the next ten days so there might a delay in my replies.


I wonder whether this point:

* Improve hash creation performance (Tomas Vondra, Teodor Sigaev, Tom
  Lane)

is really about commits 45f6240a, 8cce08f1 and 30d7ae3c. I can't remember or 
find other hash-related patches I've been working on.


If that's the case, it really is not about hash creation performance 
(except maybe partially the first commit), but about the lookup 
performance on the hash table. So it should rather be 'improve hash-join 
performance'.


Also, I think the list of authors is wrong. Robert Haas did a lot of 
work on those patches, so he should be mentioned there, and I don't 
remember Teodor working on this.


But maybe I was working on another patch and managed to forget about it.

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Comfortably check BackendPID with psql

2015-06-11 Thread Naoya Anzai
Hi, Andres, Marko

> Seems easier to set this in .psqlrc:
oops! I've never noticed..
Thank you for your comment.

Regards,

Naoya

---
Naoya Anzai
Engineering Department
NEC Solution Innovators, Ltd.
E-Mail: nao-an...@xc.jp.nec.com
---




Re: [HACKERS] Comfortably check BackendPID with psql

2015-06-11 Thread Andres Freund
Hi,

On 2015-06-11 09:41:17 +, Naoya Anzai wrote:
> This is a tiny patch, but I think it is very useful for hackers and DBAs.
> When we debug with psql, we frequently use "SELECT pg_backend_pid();".
> This shortens the input from 24 characters to only 4 characters!

Not a big fan of that abbreviation itself. What I'd wondered about
instead - and actually had patched into my psql at some point - is
adding an appropriate escape to psql's PROMPT. I think that'd serve your
purpose as well?

Regards,

Andres




Re: [HACKERS] skipping pg_log in basebackup (was Re: pg_basebackup and pg_stat_tmp directory)

2015-06-11 Thread Magnus Hagander
On Jun 11, 2015 7:38 AM, "Amit Langote"  wrote:
>
> On 2015-06-11 PM 02:20, Abhijit Menon-Sen wrote:
> > At 2015-06-10 13:22:27 -0400, robertmh...@gmail.com wrote:
> >>
> >> (1) include pg_log in pg_basebackup as we do currently
> >> (2) exclude it
> >> (3) add a switch controlling whether or not it gets excluded
> >>
> >> I can live with (3), but I bet most people want (2).
> >
> > Thanks for spelling out the options.
> >
> > I strongly prefer (2), but I could live with (3) if it were done as a
> > GUC setting. (And if that's what we decide to do, I'm willing to write
> > up the patch.)
> >
> > Whether or not it's a good idea to let one's logfiles grow to >8GB, the
> > fact that doing so breaks base backups means that being able to exclude
> > pg_log *somehow* is more of a necessity than personal preference.
> >
> > On the other hand, I don't like the idea of doing (3) by adding command
> > line arguments to pg_basebackup and adding a new option to the command.
> > I don't think that level of "flexibility" is justified; it would also
> > make it easier to end up with a broken base backup (by inadvertently
> > excluding more than you meant to).
> >
>
> Maybe a combination of (2) and part of (3). In the absence of any command line
> argument, the behavior is (2), to exclude. Provide an option to *include* it
> (-S/--serverlog).

I think it's useful enough to have a switch, but no problem excluding it
by default. So I can definitely go for Amit's suggestion.

I also don't feel strongly enough about it to put up any kind of fight if
the majority wants something different :-)

/Magnus


Re: [HACKERS] Comfortably check BackendPID with psql

2015-06-11 Thread Marko Tiikkaja
On 6/11/15 11:41 AM, Naoya Anzai wrote:
> This shortens the input from 24 characters to only 4 characters!
> 
> Image.
> --
> naoya=# \bid
> Backend Process ID
>   pid
> --
>   1716
> (1 row)
> ---
> 
> How do you like it?

Seems easier to set this in .psqlrc:

  \set bid 'select pg_backend_pid();'

and then:

=# :bid
 pg_backend_pid
----------------
          84430
(1 row)

No patches or concerns for psql version necessary.


.m




[HACKERS] Comfortably check BackendPID with psql

2015-06-11 Thread Naoya Anzai
Hi, hackers!

This is a tiny patch, but I think it is very useful for hackers and DBAs.
When we debug with psql, we frequently use "SELECT pg_backend_pid();".
This shortens the input from 24 characters to only 4 characters!

Image.
--
naoya=# \bid
Backend Process ID
 pid  
--
 1716
(1 row)
---

How do you like it?

Regards,

Naoya

---
Naoya Anzai
Engineering Department
NEC Solution Innovators, Ltd.
E-Mail: nao-an...@xc.jp.nec.com
---



psql_show_backend_pid.patch
Description: psql_show_backend_pid.patch



Re: [HACKERS] Entities created in one query not available in another in extended protocol

2015-06-11 Thread Simon Riggs
On 11 June 2015 at 11:20, Shay Rojansky  wrote:

> In Npgsql, the .NET driver for PostgreSQL, we've switched from simple to
> extended protocol and have received a user complaint.
>
> It appears that when we send two messages in the extended protocol (two
> Parse/Bind/Execute sequences followed by a single Sync), where the first one
> creates some entity (function, table), the second one can't query that
> entity (not found). This isn't terribly important but does seem a bit odd;
> I wanted to make sure you're aware of this.
>

Sounds somewhat unlikely, but thank you for the report. Can we see a test
case?

Most commonly in such cases the first request failed and error messages
weren't checked before running second message.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


[HACKERS] Entities created in one query not available in another in extended protocol

2015-06-11 Thread Shay Rojansky
In Npgsql, the .NET driver for PostgreSQL, we've switched from simple to
extended protocol and have received a user complaint.

It appears that when we send two messages in the extended protocol (two
Parse/Bind/Execute sequences followed by a single Sync), where the first one
creates some entity (function, table), the second one can't query that entity
(not found). This isn't terribly important but does seem a bit odd; I wanted
to make sure you're aware of this.

Thanks,

Shay


Re: [HACKERS] pg_rewind failure by file deletion in source server

2015-06-11 Thread Fujii Masao
On Thu, Jun 11, 2015 at 2:14 PM, Michael Paquier
 wrote:
> On Thu, Jun 11, 2015 at 1:51 AM, Fujii Masao  wrote:
>> Shouldn't pg_rewind ignore that failure of operation? If the file is not
>> found in source server, the file doesn't need to be copied to destination
>> server obviously. So ISTM that pg_rewind safely can skip copying that file.
>> Thought?
>
> I think that you should fail. Let's imagine that the master to be
> rewound removed, before being stopped cleanly after its standby was
> promoted, a relation file that was present at the last checkpoint
> before forking, and that the standby still has the relation file after
> promotion. You should be able to copy it to be able to replay WAL on
> it. If the standby has removed a file listed in the file map after the
> file map was taken, I guess that the best thing to do is fail, because a
> file that should be there for the rewound node cannot be fetched.

In this case, why do you think that the file should exist in the old master?
Even if it doesn't exist, ISTM that the old master can safely replay the WAL
records related to the file when it restarts. So what's the problem
if the file doesn't exist in the old master?

> Documentation should be made clearer about that with a better error
> message...

I'm wondering how we can recover (or rewind again) the old master from
that error. This also would need to be documented if we decide not to
fix any code regarding this problem...

Regards,

-- 
Fujii Masao




Re: [HACKERS] skipping pg_log in basebackup (was Re: pg_basebackup and pg_stat_tmp directory)

2015-06-11 Thread Michael Paquier
On Thu, Jun 11, 2015 at 2:39 PM, Abhijit Menon-Sen  wrote:
> At 2015-06-11 14:28:36 +0900, michael.paqu...@gmail.com wrote:
>>
>> After spending the night thinking about that, honestly, I think that
>> we should go with (2) and keep the base backup as light-weight as
>> possible and not bother about a GUC.
>
> OK. Then the patch I posted earlier should be sufficient.

Btw, one thing that 010_pg_basebackup.pl does not check is whether the
files filtered by basebackup.c actually end up excluded from the base
backup. We may want to add some extra checks for that, especially with
your patch, which filters things depending on whether log_directory is
an absolute path or not.
-- 
Michael




Re: [HACKERS] skipping pg_log in basebackup (was Re: pg_basebackup and pg_stat_tmp directory)

2015-06-11 Thread Amit Langote
On 2015-06-11 PM 02:20, Abhijit Menon-Sen wrote:
> At 2015-06-10 13:22:27 -0400, robertmh...@gmail.com wrote:
>>
>> (1) include pg_log in pg_basebackup as we do currently
>> (2) exclude it
>> (3) add a switch controlling whether or not it gets excluded
>>
>> I can live with (3), but I bet most people want (2).
> 
> Thanks for spelling out the options.
> 
> I strongly prefer (2), but I could live with (3) if it were done as a
> GUC setting. (And if that's what we decide to do, I'm willing to write
> up the patch.)
> 
> Whether or not it's a good idea to let one's logfiles grow to >8GB, the
> fact that doing so breaks base backups means that being able to exclude
> pg_log *somehow* is more of a necessity than personal preference.
> 
> On the other hand, I don't like the idea of doing (3) by adding command
> line arguments to pg_basebackup and adding a new option to the command.
> I don't think that level of "flexibility" is justified; it would also
> make it easier to end up with a broken base backup (by inadvertently
> excluding more than you meant to).
> 

Maybe a combination of (2) and part of (3): in the absence of any command line
argument, the behavior is (2), exclude. Then provide an option to *include* it
(-S/--serverlog).

Thanks,
Amit





Re: [HACKERS] skipping pg_log in basebackup (was Re: pg_basebackup and pg_stat_tmp directory)

2015-06-11 Thread Abhijit Menon-Sen
At 2015-06-11 14:28:36 +0900, michael.paqu...@gmail.com wrote:
>
> After spending the night thinking about that, honestly, I think that
> we should go with (2) and keep the base backup as light-weight as
> possible and not bother about a GUC.

OK. Then the patch I posted earlier should be sufficient.

-- Abhijit




Re: [HACKERS] skipping pg_log in basebackup (was Re: pg_basebackup and pg_stat_tmp directory)

2015-06-11 Thread Michael Paquier
On Thu, Jun 11, 2015 at 2:20 PM, Abhijit Menon-Sen  wrote:
> At 2015-06-10 13:22:27 -0400, robertmh...@gmail.com wrote:
>>
>> I'm not clear on which of these options you are voting for:
>>
>> (1) include pg_log in pg_basebackup as we do currently
>> (2) exclude it
>> (3) add a switch controlling whether or not it gets excluded
>>
>> I can live with (3), but I bet most people want (2).
>
> Thanks for spelling out the options.
>
> I strongly prefer (2), but I could live with (3) if it were done as a
> GUC setting. (And if that's what we decide to do, I'm willing to write
> up the patch.)
>
> Whether or not it's a good idea to let one's logfiles grow to >8GB, the
> fact that doing so breaks base backups means that being able to exclude
> pg_log *somehow* is more of a necessity than personal preference.
>
> On the other hand, I don't like the idea of doing (3) by adding command
> line arguments to pg_basebackup and adding a new option to the command.
> I don't think that level of "flexibility" is justified; it would also
> make it easier to end up with a broken base backup (by inadvertently
> excluding more than you meant to).

After spending the night thinking about that, honestly, I think that
we should go with (2), keep the base backup as light-weight as
possible, and not bother with a GUC. (3) would need some extra
intelligence to decide whether a given file can be skipped or not.
Imagine for example --skip-files=global/pg_control or
--skip-files=pg_clog (because it *is* a log file holding much data);
that would just silently corrupt your backup, though I guess that is
what you had in mind. In any case, (3) is not worth the maintenance
burden, because we would need to update the filter list each time a
patch adds a new important folder in PGDATA.
-- 
Michael


