Re: [HACKERS] Desirable pgbench features?

2016-03-30 Thread Josh berkus
On 03/30/2016 08:29 AM, Fabien wrote:

> (1) TPC-B test driver must obtain a value from a query (the branch is
> the one
> of the chosen teller, not any random branch) and reuse it in another
> query. Currently the communication is one way, all results are silently
> discarded.
> 
> This is not representative of client applications (say a web app) which
> interact with the server with some logic of their own, including
> reading
> things and writing others depending on the previous reading.
> 
> This may be simulated on server side with a plpgsql script, but that
> would not exercise the client/server protocol logic and its performance
> impact, so I think that this simple read capability is important and
> missing.

Yes.  Particularly, one of the things I'd like to benchmark is
load-balancing between masters and replicas, including checks for
coherency.  Without being able to retrieve and reuse values, this can't
be tested.

The simplest way I'd see doing this is being able to SELECT INTO a
pgbench variable.


> (5) Consistency check: after a run, some properties are expected to be
> true, such as the balances of branches is the balance of its
> tellers and also of its accounts... This should/could be checked,
> maybe with an additional query.

I'd also love to have a consistency check which would be client-only
which I could run in the pgbench unit itself.  That is, a way to log
"errors" if, say, two variables were not equal at the end of the unit of
work.

An example of this would be using this to test if load-balanced
connections were getting "stale reads", especially since the
*percentage* of stale reads is what I want to know.  5% is acceptable,
50% is not.
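
For the server-side property Fabien describes, a query along these lines would
do the job today (a sketch only, written against the standard pgbench tables;
whatever check pgbench itself grows could of course look different):

  SELECT b.bid
  FROM pgbench_branches b
  WHERE b.bbalance <> (SELECT sum(tbalance) FROM pgbench_tellers t
                       WHERE t.bid = b.bid)
     OR b.bbalance <> (SELECT sum(abalance) FROM pgbench_accounts a
                       WHERE a.bid = b.bid);

Any row coming back means the balances have diverged.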

> * using values from a query
> 
> For this use case (1), the best syntax and implementation is unclear. In
> particular, I'm not fond of the \gset syntax used in psql because the ';'
> is dropped and the \gset seems to act as a statement terminator.
> 
> After giving it some thought, I would suggest a simple two-line explicit
> syntax compatible with current conventions, with a SELECT statement
> terminated with a ';', on one side and where to put the results on the
> other, something like:
> 
>   SELECT ... ;
>   \into some variable names

This works for me if it works for the parser.

> 
> Or maybe in the other way around:
> 
>   \setsql some variable names
>   SELECT ... ;

This also works, but is not my preference.  It would be somewhat harder
to avoid variable/column mismatches.
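
To make the first form concrete, the stale-read script I described above might
end up looking something like this (a sketch only: the \into syntax is the
proposal quoted above, :aid is assumed to be set earlier in the script, and the
variable names are made up):

  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
  \into master_bal
  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
  \into replica_bal

with the second SELECT routed to a replica by the load balancer, and the two
variables compared (and logged) at the end of the unit of work.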

One more wishlist item, which would make my request above for unit tests
unnecessary:

* Allow custom logging:

\vlog TAG varname1, varname2

Which would produce a custom log file called:

PID.TAG.varlog

With the format:

timestamp, var1, var2

e.g. if I had this:

SELECT id, abalance FROM account WHERE id = :aid
\into :lid, :lbal

\vlog balancelog :lid, :lbal

It would create a file called:

2247.balancelog.varlog

and/or append a line:

2016-03-30 21:37:33.899, 511, 2150

This would allow CSV logging of all sorts of user custom information,
including de-facto response times.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




[HACKERS] Please correct/improve wiki page about abbreviated keys bug

2016-03-29 Thread Josh berkus
Hackers,

For Thursday's release, I've added a wiki page to give users more
information about the strxfrm() issue, especially since we're going to
ask them to do a bunch of REINDEXing.

Please help me improve this.  Particularly, I need help on the following:

* is my explanation of the issue correct?
* how does this affect non-Linux platforms?  Windows?
* can someone write a search query for columns with indexes in non-C
collation?
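
For the last item, here is a rough starting point (a sketch only: it lists
indexed text-like columns and the collation they were declared with, treats
"default" as meaning the database's lc_collate, and ignores expression
indexes entirely):

  SELECT n.nspname AS schema, t.relname AS "table", i.relname AS index,
         a.attname AS "column", co.collname AS collation
  FROM pg_index x
  JOIN pg_class i ON i.oid = x.indexrelid
  JOIN pg_class t ON t.oid = x.indrelid
  JOIN pg_namespace n ON n.oid = t.relnamespace
  JOIN pg_attribute a ON a.attrelid = t.oid AND a.attnum = ANY (x.indkey)
  LEFT JOIN pg_collation co ON co.oid = a.attcollation
  WHERE a.atttypid IN ('text'::regtype, 'varchar'::regtype, 'bpchar'::regtype)
  ORDER BY 1, 2, 3;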

Thanks!

http://wiki.postgresql.org/wiki/Abbreviatedkeys_issue

(if you have editing rights, please just edit the wiki page instead of
commenting here)

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] JPUG wants to have a copyright notice on the translated doc

2016-03-08 Thread Josh berkus
On 03/04/2016 06:01 PM, Tatsuo Ishii wrote:

> I imagine kind of an extreme case: a bad guy removes "Copyright
> 1996-2016 The PostgreSQL Global Development Group" and replaces it
> with his/her copyright.

The PostgreSQL license does not permit that; you have to retain the
original copyright notice.  You can *add* whatever you want.


-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] New competition from Microsoft?

2016-03-07 Thread Josh berkus

On 03/07/2016 01:43 PM, Josh berkus wrote:

All,

http://blogs.microsoft.com/?p=67248

Once SQL Server is available on Linux, we're going to see more people
using it as an alternative to PostgreSQL.  Especially since they're
picking up a lot of our better features, like R support.



Sorry, that was meant to be posted to pgsql-advocacy.

--
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




[HACKERS] New competition from Microsoft?

2016-03-07 Thread Josh berkus

All,

http://blogs.microsoft.com/?p=67248

Once SQL Server is available on Linux, we're going to see more people 
using it as an alternative to PostgreSQL.  Especially since they're 
picking up a lot of our better features, like R support.


--
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] How can we expand PostgreSQL ecosystem?

2016-03-07 Thread Josh berkus
de technical assistance to those vendors as an organization so that 
> they can support PostgreSQL smoothly.
> If the vendors aren't proactive, we verify the interoperability with 
> their software by executing it.
> * Make a directory of software/services that can be used with PostgreSQL 
> on the community web site (wiki.postgresql.org or www.postgresql.org).
> Software/services vendors and PostgreSQL developers/users can edit this 
> directory.
> This list not only has the names of software and its vendors, but also 
> other information such as the level of interoperability (certified by 
> the vendor, or verified by the community/users) and remarks about 
> configuration, tuning, and whatever useful for users.
> PostgreSQL Enterprise Consortium (PGECons)
> https://www.pgecons.org/en
> CONSULTATION AND DISCUSSION
> ==
> I'd like to discuss and hear opinions on how can we expand the ecosystem 
> of PostgreSQL.  Example points are:
> * How/Where can we get the knowledge of expanding the software 
> ecosystem?  Is there any OSS project that we can learn from?
> How can we attract software vendors to support PostgreSQL?  What words 
> are convincing to appeal the increasing potential of PostgreSQL as a 
> good replacement for commercial databases?
> * How can we gain momentum for the activity?
> Can we involve globally influential companies like Dell, HPE, and Red Hat?
> * Do we need some kind of assistance from a foundation or establish a 
> foundation?
> There should be many, many software to address, so the ecosystem 
> activity has to be long-lasting. Plus, money and expertise is necessary.
> Would we benefit if we join the Linux Foundation Collaborative Projects?
> Linux Foundation Collaborative Projects
> http://collabprojects.linuxfoundation.org/
> * Which software/services in what category should we address preferentially?
> What software would many users desire to be interoperable when migrating 
> from commercial databases?
> What is the effective way to absorb user requests for this?  Is it 
> enough to make a questionnaire like the following?  What is the popular 
> questionnaire site which can catch many users (SurveyMonkey?)
> https://postgresql.uservoice.com/forums/21853-general
> Regards
> MauMau


-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Josh berkus

On 02/24/2016 01:22 AM, Konstantin Knizhnik wrote:

Sorry, but based on this plan it is possible to make a conclusion that
there are only two possible cluster solutions for Postgres:
XC/XL and FDW-based.  From my point of view there are many more
possible alternatives.


Definitely.

Currently we have five approaches to sharding inside postgres in the 
field, in chronological order:


1. Greenplum's executor-based approach with motion nodes

2. Skype's function-based approach (PL/proxy)

3. XC/XL's approach, which I believe is also query executor-based

4. CitusDB's pg_shard which is based on query hooks

5. FDW-based (currently theoretical)

One of the things which causes bad reactions and arguments, Bruce, is 
that a lot of your posts and presentations detailing plans for the FDW 
approach carry the subtext that all four of the other approaches are 
dead ends and not worth considering.  Given that the other approaches, 
whatever their limitations, have working code in the field and the FDW 
approach does not, that's more than a little offensive.


If we want to move forwards on serious work on FDW-based sharding, the 
folks working on it should stop treating it as a "fait accompli" that 
this is the Chosen Way for the PostgreSQL project.  Otherwise, you'll 
spend all of your time arguing that point instead of working on features 
that matter.


Bruce made a long comparison with built-in replication, but there's a 
big difference here.  We decided that WAL-based replication was the way 
to go for built-in as a community decision here on -hackers and at 
various conferences.  Both the plan and the implementation for 
replication transcended company backing, involving even active 
competitors, and involved discussions with maintainers of the older 
replication projects.


In contrast, this FDW plan *still* feels very much like a small group 
made up of employees of only two companies came up with it in private 
and decided that it should be the plan for the whole project.  I know 
that Bruce and others have good reasons for starting the FDW project, 
but there hasn't been much of an attempt to obtain community consensus 
around it. If Bruce and others want contributors to work on FDWs instead 
of other sharding approaches, then they need to win over those people as 
to why they should do that.  It's how this community works.


Alternately, you can just work on the individual FDW features, which 
*everyone* thinks are a good idea, and when most of them are done, 
FDW-based scaleout will be such an obvious solution that nobody will 
argue with it.


--
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] about google summer of code 2016

2016-02-19 Thread Josh berkus

On 02/19/2016 10:10 AM, Álvaro Hernández Tortosa wrote:


 Hi.

 Oleg and I discussed recently that a really good addition to a GSoC
item would be to study whether it's convenient to have a binary
serialization format for jsonb over the wire. Some argue this should be
benchmarked first. So the scope for this project would be to benchmark
and analyze the potential improvements and then agree on which format
jsonb could be serialized to (apart from the current on-disk format,
there are many json or nested k-v formats that could be used for sending
over the wire).

 I would like to mentor this project with Oleg.


+1


--
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] exposing pg_controldata and pg_config as functions

2016-02-17 Thread Josh berkus

On 02/17/2016 03:02 PM, Tom Lane wrote:

Joe Conway <m...@joeconway.com> writes:

On 02/17/2016 02:14 PM, Tom Lane wrote:

I thought we'd agreed on requiring superuser access for this function.
I concur that letting just anyone see the config data is inappropriate.



It does not let anyone see config data out of the box:



+ CREATE VIEW pg_config AS
+ SELECT * FROM pg_config();
+
+ REVOKE ALL on pg_config FROM PUBLIC;
+ REVOKE EXECUTE ON FUNCTION pg_config() FROM PUBLIC;


Ah, that's fine.  I'd looked for a superuser() check and not seen one,
but letting the SQL permissions system handle it seems good enough.


What I like about this is that if I want to expose it to a 
non-superuser, I can just do a GRANT instead of needing to write a 
security definer view.
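
Something like this, say for a monitoring role (the role name is made up; the
view and function names are the ones from the quoted patch):

  GRANT SELECT ON pg_config TO monitoring;
  GRANT EXECUTE ON FUNCTION pg_config() TO monitoring;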



--
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] exposing pg_controldata and pg_config as functions

2016-02-17 Thread Josh berkus

On 02/17/2016 01:31 PM, Peter Eisentraut wrote:

On 1/31/16 7:34 AM, Michael Paquier wrote:

I am marking this patch as returned with feedback for now, not all the
issues have been fixed yet, and there are still no docs (the
conclusion being that people would like to have this stuff, right?).
Feel free to move it to the next CF should a new version be written.


I think we still don't have a real use case for this feature, and a
couple of points against it.


I have a use case for this feature, part of it being containerized 
PostgreSQL. Right now, there is certain diagnostic information (like 
timeline) which is exposed ONLY in pg_controldata.  That leaves no 
reasonable way to expose this information in an API.
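
The sort of thing I want to be able to run from a monitoring agent is then
simply (a sketch, assuming the patch ends up exposing the checkpoint data as a
set-returning function along the lines of pg_control_checkpoint()):

  SELECT timeline_id FROM pg_control_checkpoint();

rather than shelling out to pg_controldata and scraping its text output.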


(and yes, we have a bigger issue with stuff which is only in pg_log, but 
one thing at a time)


--
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Declarative partitioning

2016-02-15 Thread Josh berkus

On 02/15/2016 04:28 PM, Amit Langote wrote:

Also, you won't see any optimizer and executor changes. Queries will still
use the same plans as existing inheritance-based partitioned tables,
although as I mentioned, constraint exclusion won't yet kick in. That will
be fixed very shortly.


We're not going to use CE for the new partitioning long-term, are we? 
This is just the first version, right?


--
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




[HACKERS] Code of Conduct plan

2016-01-26 Thread Josh Berkus
Community members:

A number of people have contacted the Core Team about taking action
regarding a Code of Conduct (CoC) for the project. After some
discussion, the plan we have come up with is below.

**Please do not reply-all to this email, as we do not wish to generate
additional list traffic regarding CoCs**

1. The Core Team will appoint an exploration committee which will look
at various proposals (including the one drafted on pgsql-general) for
CoCs and discuss them. This committee will include both major community
members and less central folks who have hands-on experience with CoCs
and community management issues.  If you know of PostgreSQL community
members who have relevant experience, please nominate them by emailing
the core team: pgsql-c...@postgresql.org.

2. We will also hire a professional consultant to advise the committee
on CoC development, adoption, training, and enforcement.  Again, if
community members have a consultant to recommend, please email the core
team.

3. This committee will post a draft CoC or possibly a selection of draft
CoCs by or before late April for community comment.  Likely the
committee will be publishing drafts more frequently, but that will be up
to them to work out.

4. At the pgCon Community Unconference, and again at pgconf.EU, we will
have sessions where people can discuss and provide feedback about
proposed (or adopted) CoCs.  Possibly we will have CoC-related trainings
as well.

5. Once a draft is agreed upon, it will be circulated to our various
sub-communities for comment.

6. A "final" CoC will be endorsed by the committee and the Core Team
shortly after pgConf.EU, unless there is sufficiently strong consensus to
adopt one before then.

Yes, we realize this is a long timeline.  The PostgreSQL Project has
never been about implementing things in a hurry; our practice has always
been to take all of the time required to develop the right feature the
right way.  Adopting a CoC is no different; if anything, we need to take
*more* time in order to get input from community members who do not
speak up frequently or assertively.

In the meantime, our policy remains: if you have experienced harassment
or feel that you are being treated unfairly by other project members,
email the Core Team and we will investigate your complaint and take
appropriate action.

Also, we want to thank Josh Drake for raising the CoC issue and getting
it off the TODO list and into process, and devising an initial "seed"
CoC.  Such things are all too easy to keep postponing.

Again, Please DO NOT comment on this plan on-list; one of the pieces of
feedback we have received loud and clear is that many community members
are unhappy with the amount of list traffic devoted to the subject of
CoCs.  As such, if you have comments on the plan above, please email the
core team instead of replying on-list, or wait for the committee and
address comments to them.

--Josh Berkus
  PostgreSQL Core Team




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2016-01-07 Thread Josh Berkus
On 01/07/2016 10:30 AM, Jeff Janes wrote:
> I don't completely agree with that.  I have often wanted to know when
> a specific item was added to the TODO page, and/or its individual edit
> history.  With only a unified history of the entire TODO page, and
> with no wiki equivalent of "git blame", figuring this out is extremely
> tedious.  A tracker would precisely solve this problem, if nothing
> else.  And when I edit the wiki and forget to make a coherent edit
> summary, there is no way to fix that, while presumably an issue
> tracker would be more tolerant of people's imperfections.

Yeah, we could also get rid of this conversation:

"Here's a patch for X, which is on the TODO list"

"Oh, we've obsolesced that, that was added to the TODO before we had Y"

... by auto-closing TODO items at a certain age.

-- 
Josh Berkus
Red Hat OSAS
(opinions are my own)




Re: [HACKERS] Another little thing about psql wrapped expanded output

2015-12-02 Thread Josh Berkus
On 12/01/2015 11:20 AM, Tom Lane wrote:
> Notice that the dashed lines go all the way to the right margin of my
> 80-column terminal window, even though the data requires no more than
> 22 columns.  While this doesn't look so awful as-is, when I'm working
> in a very wide window it starts to look a little silly.
> 
> The behavior I'd have expected is that if the data is narrower than
> the window, the lines only go to the right margin of the data.  This
> is a trivial change to the logic in print_aligned_vertical, but before
> I go make it, does anyone want to argue that the current behavior is
> preferable to that?

If you're fixing the dashed-line code, is there a way to say that we
never have more than a reasonable number of dashes (ideally, the width
of the terminal) no matter how wide the data is?  Having 4000 dashes
because of large text on one row is kinda painful, and not at all useful.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: Fwd: [HACKERS] Another little thing about psql wrapped expanded output

2015-12-02 Thread Josh Berkus
On 12/02/2015 05:24 PM, Tom Lane wrote:
> Don't think I agree.  Suppose that you have a wider-than-screen table
> and you use a pager to scroll left and right in that.  If we shorten the
> dashed lines, then once you scroll to the right of wherever they stop,
> you lose that visual cue separating the rows.  This matters a lot if
> only a few of the column values are very wide: everywhere else, there's
> gonna be lots of whitespace.

What pager lets me scroll right infinitely?  Because I wanna install that.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[HACKERS] Beta2 Next week

2015-11-05 Thread Josh Berkus
Folks,

We are going to try to get Beta2 wrapped on Monday for a release next
week.  So if you're currently working on a 9.5 Open Item, getting it
checked in tomorrow or this weekend would be great.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Request: pg_cancel_backend variant that handles 'idle in transaction' sessions

2015-11-04 Thread Josh Berkus
On 11/04/2015 01:55 PM, Stephen Frost wrote:
> * Joe Conway (m...@joeconway.com) wrote:
>> On 11/04/2015 01:24 PM, Alvaro Herrera wrote:
>>> I agree with Pavel.  Having a transaction timeout just does not make any
>>> sense.  I can see absolutely no use for it.  An idle-in-transaction
>>> timeout, on the other hand, is very useful.
>>
>> +1 -- agreed
> 
> I'm not sure of that.  I can certainly see a use for transaction
> timeouts- after all, they hold locks and can be very disruptive in the
> long run.  Further, there are cases where a transaction is normally very
> fast and in a corner case it becomes extremely slow and disruptive to
> the rest of the system.  In those cases, having a timeout for it is
> valuable.

I could see a use for both, having written scripts which do both.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Patent warning about the Greenplum source code

2015-11-02 Thread Josh Berkus
On 11/01/2015 06:37 PM, Bruce Momjian wrote:
> Let me add that this is more than hypothetical.  While we don't think
> any of these companies would sue the community for patent infringement,
> they could sue users, and the company could be bought by a sinister
> company that could enforce those patents.  For example, few had problems
> with Sun's control over Java, 

You only say this because you're not part of the Java world.  LOTS of
people had issues with Sun's control over Java; some of them even went
to court.

> but when Oracle bought Sun, more people
> were concerned.  Someone could buy the company _just_ to sue for patent
> infringement --- happens all the time.

Not as often as you'd think, and it hasn't happened in the database
world yet, for some good reasons.  This is all besides the point,
though; PostgreSQL has been accepting contributions from patent-holding
companies for over a decade, and that doesn't seem likely to stop any
time soon.  Greenplum is not in any way special, especially since we
already accepted contributions from Greenplum Inc. back in 2005-2006.

Overall, this thread seems designed to kick up a lot of fuss with no
potential useful outcome.  How about we terminate it now?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] ALTER SYSTEM vs symlink

2015-11-02 Thread Josh Berkus
On 11/02/2015 06:36 AM, Tom Lane wrote:
> Andrew Dunstan <and...@dunslane.net> writes:
>> I don't know if this was discussed at the time ALTER SYSTEM was 
>> implemented, but I have just discovered that if postgresql.auto.conf is 
>> a symlink to a file elsewhere, ALTER SYSTEM will happily break that link 
>> and write its own local copy. That strikes me as rather unfriendly. Why 
>> not just truncate the file rather than unlink it as it's being 
>> rewritten? Even if we don't want to do that a warning in the docs might 
>> help users avoid the "mistake" I just made.
> 
> Frankly, that behavior strikes me as a good idea.  There is no situation,
> IMV, where it's sane to try to put a symlink there.

So, just a doc patch then?

Since we have both include files and config_dir, I really don't
understand why anyone symlinks conf files anymore.  Anyone care to
enlighten me?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Patent warning about the Greenplum source code

2015-11-01 Thread Josh Berkus
On 10/31/2015 11:47 PM, Bruce Momjian wrote:
> On Sun, Nov  1, 2015 at 01:27:13AM -0500, Bruce Momjian wrote:
>> On Fri, Oct 30, 2015 at 04:47:35AM -0400, Bruce Momjian wrote:
>>> Therefore, I caution people from viewing the Greenplum source code as
>>> you might see patented ideas that could be later implemented in
>>> Postgres, opening Postgres up to increased patent violation problems.  I
>>> am also concerned about existing community members who work for
>>> Pivotal/Greenplum and therefore are required to view the patented source
>>> code.  The license issue might eventually be improved by
>>> Pivotal/Greenplum, but, for now, I think caution is necessary.

Do let me point out that *code* isn't patented.  *techniques* are.  So
those techniques are patented whether or not you read the code.  It's
just that if you read the code, copy the technique directly, and put it
in Postgres, that's considered "willful" instead of "innocent"
infringement and the penalties are different.  Its effect on our project
is the same, though: we have to rip out the code in a hurry.

Maybe we should just relicense PostgreSQL as Apache and cover all of the
patent issues ;-)

>>>
>>> Of course, never mention known-patented ideas in any community forum,
>>> including this email list.
>>
>> I just found out that Citus Data has patent applications pending, so
>> viewing Citus Data source code has the same problems as Greenplum.
> 
> Actually, it might only be their closed source software that contains
> patents, i.e. not pg_shard.  I will check and report back when I can
> unless someone else reports here first.

I will ask Citus Data for an official statement.  I will point out that
cstore_fdw is Apache-licensed, which also contains a patent grant.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-10-30 Thread Josh Berkus
On 10/30/2015 08:29 AM, Christopher Browne wrote:
> I think it makes the feature WORSE.  I am getting more and more convinced
> that the Correct Solution is for this feature to be handled by submitting
> multiple URIs, and my argument isn't even based on any aspects of
> implementation complexity.

+1

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Within CF app, "Bug Fixes" should be "Bug Fixes/Refactoring"

2015-10-29 Thread Josh Berkus
On 10/29/2015 01:10 PM, Tom Lane wrote:
> Michael Paquier <michael.paqu...@gmail.com> writes:
>> On Thu, Oct 29, 2015 at 8:33 PM, Peter Geoghegan <p...@heroku.com> wrote:
>>> I think that within the CF app, we should either rename the patch
>>> topic "Bug Fixes" to "Bug Fixes/Refactoring", or introduce a new
>>> "Refactoring" topic. I prefer the first approach.
> 
>> I would vote for the second approach, with a separate category for 
>> refactoring.
> 
> Ditto.  Bug fixes are not at all like refactoring --- in particular, we'd
> usually not consider refactoring as fit material for back-patching.
> 
> "Refactoring" seems rather a narrow definition of what might show up
> in such a category, btw.  Maybe "Code Beautification" would be a
> suitable title?  I'm bikeshedding though.

"Miscellaneous"?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Is there any ordering to the values in guc.c?

2015-10-28 Thread Josh Berkus
On 10/28/2015 01:58 PM, Peter Eisentraut wrote:
> On 10/28/15 10:27 AM, Bill Moran wrote:
>> See subject. Aside from them being divvied up by datatype, they seem
>> to be ordered randomly. Since I'm putting together a patch that will
>> add some GUCs, do I just add them to the end of the list?
> 
> The initial commit grouped them logically, and it went downhill from
> there. :)

Yeah, we're overdue for another overhaul of GUC ordering.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Patch: Implement failover on libpq connect level.

2015-10-27 Thread Josh Berkus
On 10/27/2015 01:42 AM, Shulgin, Oleksandr wrote:
> On Mon, Oct 26, 2015 at 10:02 PM, Christopher Browne <cbbro...@gmail.com
> <mailto:cbbro...@gmail.com>> wrote:
> 
> 
> On 26 October 2015 at 16:25, Peter Eisentraut <pete...@gmx.net
> <mailto:pete...@gmx.net>> wrote:
> 
> On 10/14/15 6:41 AM, Victor Wagner wrote:
> > 1. It is allowed to specify several hosts in the connect string, 
> either
> > in URL-style (separated by comma) or in param=value form (several 
> host
> > parameters).
> 
> I'm not fond of having URLs that are not valid URLs according to the
> applicable standards.  Because then they can't be parsed or
> composed by
> standard libraries.
> 
> Also, this assumes that all the components other than host and
> port are
> the same.  Earlier there was a discussion about why the ports
> would ever
> need to be different.  Well, why can't the database names be
> different?
>  I could have use for that.
> 
> I think you should just accept multiple URLs.
> 
> 
> I'd give a "+1" on this...
> 
> As an area of new behaviour, I don't see a big problem with declining to
> support every wee bit of libpq configuration, and instead requiring the
> use of URLs.
> 
> Trying to put "multiplicities" into each parameter (and then considering
> it at the pg_service level, too) is WAY more complicated, and for a
> feature where it seems to me that it is pretty reasonable to have a
> series of fully qualified URLs.
> 
> Specifying several URLs should be easier to understand, easier to
>     test, easier to code, and easier to keep from blowing up badly.
> 
> 
> Setting aside all other concerns, have a +1 from me on that too.

I'm good with this.  +1


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Rework the way multixact truncations work

2015-10-27 Thread Josh Berkus
On 10/27/2015 07:44 AM, Andres Freund wrote:
>> Unlinking old pg_clog files is strictly an optimization.  If you were to
>> > comment out every unlink() call in slru.c, the only ill effect on CLOG is 
>> > the
>> > waste of disk space.  Is the same true of MultiXact?
> Well, multixacts are a lot larger than the other SLRUs, I think that
> makes some sort of difference.

And by "a lot larger" we're talking like 50X to 100X.  I regularly see
pg_multixact directories larger than 1GB.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] plpython is broken for recursive use

2015-10-15 Thread Josh Berkus
On 10/15/2015 01:10 PM, Tom Lane wrote:
> I think this means that we should get rid of proc->globals and instead
> manufacture a new globals dict locally in each call to PLy_exec_function
> or PLy_exec_trigger.  For SETOF functions it would be necessary to keep
> the globals dict reference somewhere in the FunctionCallInfo struct,
> probably.  Not sure about cleaning up after an error that occurs between
> SETOF callbacks --- we might need plpython to grow an at-abort callback to
> do decref's on unreleased dicts.

Don't people currently specifically treat the state of the globals dict
as a feature?  That is, make use of the fact that you can store
session-persistent data in it?
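
The pattern I have in mind is the usual call-to-call caching trick, e.g. (a
trivial sketch; SD here is the per-function dictionary the docs point people
at for exactly this kind of state):

  CREATE FUNCTION calls_so_far() RETURNS int AS $$
      SD['n'] = SD.get('n', 0) + 1   # survives across calls in this session
      return SD['n']
  $$ LANGUAGE plpythonu;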

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-10-13 Thread Josh Berkus
On 10/13/2015 11:02 AM, Masahiko Sawada wrote:
> I thought that this feature for postgresql should be simple at first
> implementation.
> It would be good even if there are some restriction such as the
> nesting level, the group setting.
> The another new approach that I came up with is,
> * Add new parameter synchronous_replication_method (say s_r_method)
> which can have two names: 'priority', 'quorum'
> * If s_r_method = 'priority', the value of s_s_names (e.g. 'n1,n2,n3')
> is handled using priority. It's same as '[n1,n2,n3]' in dedicated
> language.
> * If s_r_method = 'quorum', the value of s_s_names is handled using
> quorum commit, It's same as '(n1,n2,n3)' in dedicated language.

Well, the first question is: can you implement both of these things for
9.6, realistically?  If you can implement them, then we can argue about
configuration format later.  It's even possible that the nature of your
implementation will enforce a particular syntax.

For example, if your implementation requires sync groups to be named,
then we have to include group names in the syntax.  If you can't
implement nesting in the near future, there's no reason to have a syntax
for it.

> * Setting of synchronous_standby_names is same as today. That is, the
> storing the nesting value is not supported.
> * If we want to support more complex syntax like what we are
> discussing, we can add the new value to s_r_method, for example
> 'complex', 'json'.

I think having two different syntaxes is a bad idea.  I'd rather have a
wholly proprietary configuration markup than deal with two alternate ones.
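
For reference, the quoted proposal boils down to a pair of settings along these
lines (a sketch of the proposal only; s_r_method is a hypothetical GUC and none
of this is necessarily what will end up being committed):

  ALTER SYSTEM SET synchronous_replication_method = 'quorum';  -- or 'priority'
  ALTER SYSTEM SET synchronous_standby_names = 'n1,n2,n3';
  SELECT pg_reload_conf();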

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Release of CVEs

2015-10-11 Thread Josh Berkus
On 10/11/2015 04:54 AM, Greg Sabino Mullane wrote:
> The release notes for the new version reference some CVEs that 
> have not been publically released yet. Are they slow, or is 
> this something that needs to be added to the release 
> process checklist? 

These days MITRE is lagging 2-6 weeks behind publication for getting
CVEs on their website.  That's why I didn't bother to link them from the
announcement.

I don't know that there's anything the PostgreSQL project can do about
it.  If anyone on this list is connected with MITRE, please ask them
what they need to be more prompt.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] bugs and bug tracking

2015-10-09 Thread Josh Berkus
Nathan,

If you're going further with this, one thing that we'll need is a bug
triage interface.  This is something which is better done by web; that
is, a way for volunteers to go through bugs by status, and quickly
update their status and/or post follow-up questions, and then go to the
next one in the queue.

I have yet to see a bug tracker which does this particular task well, so
please don't emulate existing art.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] bugs and bug tracking

2015-10-07 Thread Josh Berkus
On 10/07/2015 10:25 AM, Alvaro Herrera wrote:
> Hmm, I guess we could have the bug form add
>  To: n...@bugs.postgresql.org
>  CC: pgsql-b...@postgresql.org
> as headers, which should work for most people (since we reply-all), Josh
> Berkus being the exception.

Well, this will just give you more opportunities to ignore me.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] bugs and bug tracking

2015-10-07 Thread Josh Berkus
On 10/07/2015 11:05 AM, Alvaro Herrera wrote:
> Josh Berkus wrote:
>> On 10/07/2015 10:25 AM, Alvaro Herrera wrote:
>>> Hmm, I guess we could have the bug form add
>>>  To: n...@bugs.postgresql.org
>>>  CC: pgsql-b...@postgresql.org
>>> as headers, which should work for most people (since we reply-all), Josh
>>> Berkus being the exception.
>>
>> Well, this will just give you more opportunities to ignore me.
> 
> Currently, it just means I have to look harder for stuff you post -- I
> certainly do not ignore you.  

I know, I'm just "Joshing" you.

> I worry that in the future it might mean
> your messages don't make it to the "bug threads", if we have such a
> thing.  See my followup to Stephen,
> https://www.postgresql.org/message-id/20151007175841.GE4405%40alvherre.pgsql
> where some traffic was not present in the mailing list archives but it
> does appear in the bug report itself.  I think we need to make it very
> hard for that to happen, and lack of CCs might become a problem.

As a serious response, "reply-all" is NOT the default for common GUI
mail clients (TB, Apple, Gmail, etc.), and I know on TB that it's not
even possible to make it the default (I miss Kmail).  So a system which
depends on the user ... including outside bug reporters ... always
hitting "reply all" is going to have some issues.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] bugs and bug tracking

2015-10-06 Thread Josh Berkus
On 10/06/2015 12:03 PM, Bruce Momjian wrote:
> On Tue, Oct  6, 2015 at 03:33:20PM -0300, Alvaro Herrera wrote:
>> Joshua D. Drake wrote:
>>> On 10/06/2015 10:57 AM, Josh Berkus wrote:
>>>> On 10/06/2015 10:17 AM, Bruce Momjian wrote:
>>
>>>> Speaking of which ... this project is rich in skilled users who are
>>>> involved in the community but don't code.  Bug triage is exactly the
>>>> kind of thing very part-time community supporters can do, if we make it
>>>> easy for them to do.
>>>
>>> That is an understatement. There is a huge pool of non-hackers that can
>>> help contribute to this sort of thing.
>>
>> It was said, way back when, "adding Windows support will add a huge pool
>> of Windows-only developers".  I'm not sure that the impact was really
>> all that big there.  We have a few Windows-enabled people, but how many
>> of them are Windows-only?
>>
>> We similarly said, "moving the TODO list to the wiki will add a huge
>> pool of users that cannot edit the current CVS-only file".  To date,
>> most of what has happened is that the old items have become stale and
>> Bruce continues to do 99% of the work of maintaining it.
>>
>> So I am dubious that people that currently do not contribute will
>> contribute in the future just because we change the system.
> 
> Agreed, though that did work for the commitfest --- I almost never deal
> with those patches, for good and bad.

There isn't a huge pool, but then we don't need a huge pool.  We need
like three people, and I think that's not unreasonable to expect.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] bugs and bug tracking

2015-10-06 Thread Josh Berkus
On 10/06/2015 10:17 AM, Bruce Momjian wrote:
> First, let me say I am glad we are talking about this, and am open to
> the criticism that my and other's tracking open items by keeping them in
> our personal mailboxes is not only odd, but bizarre given the size of
> our community and the responsibility we have.

On the other hand, until we do have some kind of tracker, it is
absolutely essential.  But at this point, it's more than one person can do.

This is kind of like CVS.  We didn't upgrade to Subversion, because we
said "we already have a user-friendly interface to CVS, called Marc."
We only moved to git when it could provide us with solid advantages.

I believe the same thing is happening here.  The inefficiency of the old
system (Bruce's mailbox) is becoming higher than the inefficiency of a
new, hypothetical system.

> Therefore, our current default behavior is to ignore user reports,
> unless someone takes an action to reply, record, or retain the email for
> later review.  What a tracker does is to make the default user report be
> _retained_, meaning we have to take action to _not_ retain a user report
> as an open item.

Well, we can determine how that's handled.  There are bug trackers out
there that automatically archive unconfirmed bug reports after a certain
amount of time.  I'd personally recommend it.

Of course, that requires a bug tracker which can have an "unconfirmed"
status.

> Second, we have a mix of user reports.  Some bug reports are not bugs
> and must be reclassified.  In other cases, users ask questions via
> non-tracked communication channels, e.g. pgsql-general, but they are
> really bugs.  So, to do this right, we need a way of marking tracked
> bugs as not bugs, and a way of adding bugs that were reported in a
> non-tracked manner.

Yeah, I was wondering about that.

> My point is that we have our current workflow not because we are idiots,
> but because it fit our workflow and resources best.  I am not sure if we
> have succeeded because of our current non-retain mode, or in spite of
> it.  It might be time to switch to a default-retain mode, especially
> since most other projects have that mode, but we should be clear what we
> are getting into.

FWIW, when I talk about bugs which we lost track of, they're not
generally unconfirmed bug reports.  Usually, it's stuff which a
contributor replied to, but the bug was low-impact, circumstantial, and
hard to reproduce, and came in during a busy period (like release time).
 So I'd be perfectly OK with the idea that unconfirmed bugs hang around
in the system for 60 days, then automatically convert to "stale" status.
Until we build up a team of volunteers for bug triage, we might have to
do that.

Speaking of which ... this project is rich in skilled users who are
involved in the community but don't code.  Bug triage is exactly the
kind of thing very part-time community supporters can do, if we make it
easy for them to do.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] No Issue Tracker - Say it Ain't So!]

2015-10-04 Thread Josh Berkus
On 10/04/2015 03:42 PM, Nathan Wagner wrote:
> I downloaded the archives for pgsql-bugs, and fed them into a database.  This
> part was easy, since I have already written a pg backed usenet server and had
> the code on hand for storing and parsing out bits of rfc 2822 messages.

That would be the key part, wouldn't it?  Nice that you have that.

So this would be the other option if adopting debbugs doesn't work out.
 I think it's likely that we'll end up recreating most of debbugs in the
process (or bugzilla or something else) but whatever.  As long as we
have some kind of bug tracker, I'm happy.

> It's dirt simple.  If the system sees a message with 'Bug #(\d+)' in the
> subject line, it creates an entry in a bugs table with that bug number (if
> needed), and then marks the message as belonging to that bug.  If there seems
> to be metadata about the bug in the format of the (unquoted)
> 
> Bug reference:
> Logged by:  
> Email address:
> PostgreSQL version:
> Operating system:
> Description:
> Details:
> 
> it pulls that out and puts it in the bugs table.  There's also an "open"
> boolean in the table, defaulting to true.
> 
> The results can be found at https://granicus.if.org/pgbugs/
> 
> Ok.  So now we have a bug tracker, but...
> 
> Some open questions that I don't think have really been addressed, with my
> commentary interspersed:
> 
> 1: Can a bug be more than "open" or "closed"?
> 
> I think yes.  At least we probably want to know why a bug is closed.  Is it 
> not
> a bug at all, not our bug, a duplicate submission, a duplicate of another bug,
> something we won't fix for some reason (e.g. a bug against version 7)

We'd want the usual statuses:

* fixed
* duplicate
* unreproducible
* timed out
* not a bug
* won't fix
* reopened

We'd also want a way to link a bug fix to a commit, and probably a way
to give the bug a list of searchable keywords (and add to that list).
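
In terms of the bugs table Nathan describes, that could be as little as the
following (a sketch only; the column names are made up):

  CREATE TYPE bug_status AS ENUM
      ('open', 'fixed', 'duplicate', 'unreproducible', 'timed out',
       'not a bug', 'won''t fix', 'reopened');
  ALTER TABLE bugs
      ADD COLUMN status bug_status NOT NULL DEFAULT 'open',
      ADD COLUMN fixing_commit text,      -- git commit hash, if any
      ADD COLUMN keywords text[];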

> 2: Who can declare a bug closed.
> 
> Ugh.  I'm going to close some of them if it seems obvious to me that they
> should be closed.  But what if it's not obvious?  I could probably maintain it
> to some extent, but I don't know how much time that would actually take.
> 
> Related to the next point, it probably makes sense to just close up front
> bugs that are marked against unsupported pg versions, or haven't had
> any activity for too long, perhaps two years.  Just closing bugs with no
> mailing list activity for two years closes 5280 of 6376 bugs.

I'm reluctant to close all of those unexamined, since part of the
purpose of this is to find bugs which were never fixed.  Probably we
should organize a posse to comb through all of the old bugs and
hand-close them.

> 3: How far back should I actually import data from the bugs list?
> 
> I have imported each archived month from December of 1998.  It looks like the
> bug sequence was started at 1000 in December of 2003.  Emails with no bug id 
> in
> the subject line don't get associated with any bug, they're in the DB bug not
> really findable.
> 
> 4: What should I do with emails that don't reference a bug id but seem to be
> talking about a bug?
> 
> I suggest we do nothing with them as far as the bug tracker is concerned.  If
> people want to mark their message as pertaining to a bug, they can put that in
> the subject line.  However, I don't think a bug id can be assigned via email,
> that is, I think you have to use a web form to create a bug report with a bug
> id.  Presumably that could change if whoever runs the bug counter wants it to.

Yeah, fixing this would probably be tied to the possible change to
mailman.  Unless someone already has a way to get majordomo to append a
bug ID.

> 5: How can we use email to update the status of a bug?
> 
> I suggest using email headers to do this.  'X-PGBug-Fixed: ' and the
> like.  I assume here that everyone who might want to do such a thing uses an
> MUA that would allow this, and they know how.

I guess that depends on who we expect to use this, at least for closing
stuff.

> 6: Does there need to be any security on updating the status?
> 
> Probably not.  I don't think it's the sort of thing that would attract
> malicious adjustments.  If I'm wrong, I'd need to rethink this.  I realize I'm
> making security an afterthought, which makes my teeth itch, but I think layers
> of security would make it much less likely to be actually adopted.

I think there needs to be some kind of administrative access which
allows, for example, an issue to be closed so that it can't be reopened.

Anyway, I'm not convinced we want to reinvent this particular wheel, but
if we do, you've done a yeoman's job.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Idea for improving buildfarm robustness

2015-10-03 Thread Josh Berkus
On 10/02/2015 09:39 PM, Tom Lane wrote:
> I wrote:
>> Here's a rewritten patch that looks at postmaster.pid instead of
>> pg_control.  It should be effectively the same as the prior patch in terms
>> of response to directory-removal cases, and it should also catch many
>> overwrite cases.
> 
> BTW, my thought at the moment is to wait till after next week's releases
> to push this in.  I think it's probably solid, but it doesn't seem like
> it's worth taking the risk of pushing shortly before a wrap date.
> 
> If anyone wants to argue for including it in the releases, speak up ...

Wait, we're backpatching this?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-10-01 Thread Josh Berkus
On 10/01/2015 07:55 AM, Tom Lane wrote:
> Playing devil's advocate ... would this really do much other than bloat
> the release notes?  The entire assumption of this thread is that people
> don't, or don't want to, use the release notes to find out what got fixed;
> they'd rather search a tracker.

It's not a question of "rather", it's a question of how searchable the
release notes are, which is "not really at all".  Yes, you can scan the
release notes for the latest update, but consider users who have an
issue and are running 9.2.7.  Reasonably enough, they want to know that
their issue is fixed in 9.2.13 (or in 9.4 if it turns out to be a
feature, not a bug) before they ask their boss for a downtime.  Figuring
that out now is really hard.

I tried to tackle this three or four years ago, by writing a tool which
would slurp the release notes and put them into a full-text search
database.  This turned out to be very hard to do; our formatting for the
release notes makes it very difficult for an automated import program to
interpret (SGML doesn't help), especially on point releases to old
versions.  It also turned out that the resulting database was useful
mostly to me, because you had to figure out what terms to search on
based on the bug report in front of you.  As a result, it never went online.

So today, the only time the release notes are useful for a "is this
issue fixed or not" is when a release note message mentions the specific
error message the user is getting, which is a minority of the time.

So in addition to what Haas mentions, I think we want to be able to link
the release notes to the original issues for our hypothetical bug tracker.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Freeze avoidance of very large table.

2015-10-01 Thread Josh Berkus
On 10/01/2015 07:43 AM, Robert Haas wrote:
> On Thu, Oct 1, 2015 at 9:44 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
>> I wonder how much it's worth renaming only the file extension while
>> there are many places where "visibility map" and "vm" are used,
>> for example, log messages, function names, variables, etc.
> 
> I'd be inclined to keep calling it the visibility map (vm) even if it
> also contains freeze information.
> 

-1 to rename.  Visibility Map is a perfectly good name.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-30 Thread Josh Berkus
On 09/30/2015 07:44 AM, Merlin Moncure wrote:
> I'm not trolling in any way.  I'm just challenging you to back up your
> blanket assertions with evidence.  For example, you're assertion that
> mailing lists are insufficient is simply stated and expected to be
> taken on faith: *How* is it insufficient and *what* do things like in
> the new world?  Be specific: glossing over these details doesn't
> really accomplish anything and avoids the careful examination that may
> suggest small tweaks to the current processes that could get similar
> results with a lot less effort.  In this entire massive thread, so far
> only Josh has come up with what I'd consider to be actionable problem
> cases.

I don't see any way to make small tweaks to the existing process which
would fix any of these problems.  I think if that were possible, we'd
already have done it.  Suggestions welcome, of course.

For example, "just use the wiki for this" has been mentioned as an
alternative.  But we've tried "just using the wiki" for a number of
things, and it doesn't really work.  For example, using the wiki as a
way of breaking down the various multixact issues manifestly didn't
work.  A big part of the problem there is that there's no good way for
the wiki to notify people when there's been an update; a smaller part is
that the formatting gets messed up and impossible to follow.

> Josh's point, "2. Not losing track of minor bugs." is an example of
> what's bugging (pun intended) me.  Do you think issues don't get lost
> in issue trackers? 

More accurately: losing track of *fewer* minor bugs.

> As I noted upthread google is incredibly efficient at tying up a
> observed issue with the relevant fix and commentary, all based on
> search engine integration with the mailing list that you've summarily
> dismissed (without any evidence whatsoever) as ineffective.

In my experience it has not been effective.  Generally when a client
asks me the question about which release a particular bug is fixed in,
it takes me 15-30 minutes to determine the answer using
google/list/commitlog.  The client would not be able to determine it for
themselves at all.  While I appreciate the billable hours, it doesn't
seem like a good use of the customer's money, you know?

> You're
> just assuming it's better without any examination of the costs
> involved.  For example, implementation and training aside, I expect
> one of the immediate downsides of moving away from google + mailing
> list is that you're going to be suffering from a deluge of duplicate
> issues.

As opposed to duplicate emails, which we already get?

> Note, I'm not picking on Josh here.   The points pertaining to
> querying issues are certainly better than wading through the release
> notes which I've always felt to be kind of a pain.  What I'm driving
> at is that you should identify actual pain points in the process and
> explain clearly how things would improve.  Also, consider low impact
> solutions first (for example what low tech method makes bug
> identification to release easier?) and move into a big tooling change
> only after discarding them.

Well, if you have suggestions that don't involve an email-driven bug
tracker, please make them.  I don't have any other ideas.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-30 Thread Josh Berkus
On 09/30/2015 12:02 AM, Jim Nasby wrote:
> I wish people would at least consider this as an option because it
> integrates a ton of different features together. It has *the potential*
> to eliminate our need to keep maintaining CommitFest and buildfarm and
> could also replace mediawiki.
> 
> If people are hell-bent on every tool being separate then fine, but I
> get the distinct impression that everyone is discarding GitLab out of
> hand based on completely bogus information.

Well, Gitlab was introduced into this discussion in the context of being
an OSS version of Github Issues.  If it's more than that, you're going
to have to explain.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-30 Thread Josh Berkus
On 09/30/2015 03:27 PM, Tom Lane wrote:
> Josh Berkus <j...@agliodbs.com> writes:
>> On 09/30/2015 03:10 PM, Tom Lane wrote:
>>> I'd be feeling a lot more positive about this whole thread if any people
>>> had stepped up and said "yes, *I* will put in a lot of grunt-work to make
>>> something happen here".  The lack of any volunteers suggests strongly
>>> that this thread is a waste of time, just as the several similar ones
>>> before it have been.
> 
>> Hmmm?  Frost volunteered to stand up debbugs.
> 
> So he did, and did anyone volunteer to put data into it, or to do ongoing
> curation of said data?  If we simply connect it up to the mailing lists,
> and then stand back and wait for magic to happen, we will not ever have
> anything that's any more useful than the existing mailing list archives.

Well, it's hard for anyone to volunteer when we don't know what the
actual volunteer tasks are.  I certainly intend to do *something* to
support the bug tracker system, but I don't know yet what that something is.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-30 Thread Josh Berkus
On 09/30/2015 03:10 PM, Tom Lane wrote:
> I'd be feeling a lot more positive about this whole thread if any people
> had stepped up and said "yes, *I* will put in a lot of grunt-work to make
> something happen here".  The lack of any volunteers suggests strongly
> that this thread is a waste of time, just as the several similar ones
> before it have been.

Hmmm?  Frost volunteered to stand up debbugs.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Idea for improving buildfarm robustness

2015-09-30 Thread Josh Berkus
So, testing:

1. I tested running an AWS instance (Ubuntu 14.04) into 100% IOWAIT, and
the shutdown didn't kick in even when storage went into a full "D"
(uninterruptible sleep) state.  It's
possible that other kinds of remote storage failures would cause a
shutdown, but don't we want them to?

2. I tested deleting /pgdata/* several times (with pgbench running), and
Postgres shut down within 20 seconds each time.

3. I tested messing with the permissions on pg_control and global, and
Postgres threw other errors but continued running.

4. I mv'd the files and that didn't trigger a shutdown.

5. I did a fast swap:

rm -rf /pgdata/*
cp -p -r /pgdata2/* /pgdata/

... as expected, this did NOT cause postgres to shut down.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Idea for improving buildfarm robustness

2015-09-29 Thread Josh Berkus
On 09/29/2015 12:47 PM, Tom Lane wrote:
> Josh Berkus <j...@agliodbs.com> writes:
>> In general, having the postmaster survive deletion of PGDATA is
>> suboptimal.  In rare cases of having it survive installation of a new
>> PGDATA (via PITR restore, for example), I've even seen the zombie
>> postmaster corrupt the data files.
> 
> However ... if you'd simply deleted everything *under* $PGDATA but not
> that directory itself, then this type of failure mode is 100% plausible.
> And that's not an unreasonable thing to do, especially if you've set
> things up so that $PGDATA's parent is not a writable directory.

I don't remember the exact setup, but this is likely the case.  Probably
1/3 of the systems I monitor have a root-owned mount point for PGDATA's
parent directory.

> Testing accessibility of "global/pg_control" would be enough to catch this
> case, but only if we do it before you create a new one.  So that seems
> like an argument for making the test relatively often.  The once-a-minute
> option is sounding better and better.
> 
> We could possibly add additional checks, like trying to verify that
> pg_control has the same inode number it used to.  But I'm afraid that
> would add portability issues and false-positive hazards that would
> outweigh the value.

It's not worth doing extra stuff for this.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-29 Thread Josh Berkus
On 09/29/2015 03:08 PM, Merlin Moncure wrote:
> I've read this email about three times now and it's not clear at all
> to me what a issue/bug tracker brings to the table.

Here are the problems I'd like to solve:

1. "Was this issue fixed in a Postgres update?  Which one?"

2. Not losing track of minor bugs.

3. Having a better way to track bugs which require multi-part solutions
(e.g. multixact).

4. Having a place for downstream projects/packagers to report bugs.

5. Not answering this question ever again: "Why doesn't your project
have a bug tracker?"

Note that all of the above requires a bug *tracker*, that is, a tool
which tracks the bug activity which was happening anyway, just makes it
more visible.  Rather than an Issue Resolution System, which would be
intended to remake our workflow.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Idea for improving buildfarm robustness

2015-09-29 Thread Josh Berkus
On 09/29/2015 11:48 AM, Tom Lane wrote:
> But today I thought of another way: suppose that we teach the postmaster
> to commit hara-kiri if the $PGDATA directory goes away.  Since the
> buildfarm script definitely does remove all the temporary data directories
> it creates, this ought to get the job done.

This would also be useful for production.  I can't count the number of
times I've accidentally blown away a replica's PGDATA without shutting
the postmaster down first, and then had to do a bunch of kill -9.

In general, having the postmaster survive deletion of PGDATA is
suboptimal.  In rare cases of having it survive installation of a new
PGDATA (via PITR restore, for example), I've even seen the zombie
postmaster corrupt the data files.

So if you want this change to be useful beyond the buildfarm, it should
check every few minutes, and the reaction should be a SIGQUIT-style
immediate shutdown.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Idea for improving buildfarm robustness

2015-09-29 Thread Josh Berkus
On 09/29/2015 12:18 PM, Tom Lane wrote:
> Andrew Dunstan <and...@dunslane.net> writes:
>> On 09/29/2015 02:48 PM, Tom Lane wrote:
>>> Also, perhaps we'd only enable this behavior in --enable-cassert builds,
>>> to avoid any risk of a postmaster incorrectly choosing to suicide in a
>>> production scenario.  Or maybe that's overly conservative.
> 
>> Not every buildfarm member uses cassert, so I'm not sure that's the best 
>> way to go. axolotl doesn't, and it's one of those that regularly has 
>> speed problems. Maybe a not-very-well-publicized GUC, or an environment 
>> setting? Or maybe just enable this anyway. If the data directory is gone 
>> what's the point in keeping the postmaster around? Shutting it down 
>> doesn't seem likely to cause any damage.
> 
> The only argument I can see against just turning it on all the time is
> the possibility of false positives.  I mentioned ENFILE and EPERM as
> foreseeable false-positive conditions, and I'm worried that there might be
> others.  It might be good if we have a small list of specific errnos that
> cause us to conclude we should die, rather than a small list that cause us
> not to.  But as long as we're reasonably confident that we're seeing an
> error that means somebody deleted pg_control, I think abandoning ship
> is just fine.

Give me source with the change, and I'll put it on a cheap, low-bandwidth
AWS instance and hammer the heck out of it.  That should raise any false
positives we can expect.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] 9.3.9 and pg_multixact corruption

2015-09-28 Thread Josh Berkus
On 09/28/2015 08:10 AM, Robert Haas wrote:
> -1 on that idea.  I really don't think that we should categorically
> decide we don't support higher optimization levels.  If the compiler
> has a bug, then the compiler manufacturer should fix it, and it's not
> our fault.  If the compiler doesn't have a bug and our stuff is
> blowing up, then we have a bug and should fix it.  I suppose there
> could be some grey area but hopefully not too much.

Or it's PILBChAK (problem is located between chair and keyboard).  I
know Sun-CC used to warn that -O3 was unsuitable for most programs
because it could change behavior.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-28 Thread Josh Berkus
On 09/28/2015 03:40 PM, Alvaro Herrera wrote:
> Tom Lane wrote:
> 
>> Now, running gitlab on community-owned hardware would potentially be an
>> option, if we find gitlab attractive from a functionality standpoint.
>> The question I'd have about that is whether it has a real development
>> community, or is open-source in name only.  If github did go belly up,
>> would we find ourselves maintaining the gitlab code all by ourselves?
>> That might not be the end of the world, but it wouldn't be a good use
>> of community time either.
>>
>> Fundamentally, we're playing the long game here.  We do not want to make
>> a choice of tools that we're going to regret ten years from now.
> 
> We already made a similar choice some years ago when we started
> depending on the then-recently open sourced SourceForge code for
> pgFoundry.  That didn't turn out all that well in the long run.

No kidding.

Anyway, we don't have to have this discussion because the Github Issue
model is insufficiently sophisticated for our usage:

* crappy-to-nonexistent email integration
* flat "tag" categorization system
* no concept of releases
* too-simple two-level permissions model
* poor search tools
* no ability to add new fields to extend issues
* dependency on markup-based cross-referencing
* inability to flag issues for specific people's attention
* no workflow other than open/closed
* no support for attachments

... in short, Github issues is great for a small 6-dev project, but is
utterly inadequate for a project the size of PostgreSQL.

Now, if those issues were common to other tools we could find, then
maybe it would be worth fixing them.  But we already have access to
other tools which are more mature, so why would we bother?

The infra team seems to be good with debbugs, and several committers
seem to like it, so why not go with it?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-25 Thread Josh Berkus
On 09/25/2015 10:27 AM, Simon Riggs wrote:
> On 25 September 2015 at 11:32, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> 1. We don't have a good process for making sure things don't "slip through
>> the cracks".  I think everyone more or less relies on Bruce to run through
>> his mailbox periodically and nag them about threads that don't seem to
>> have been closed off.  The deficiencies of that are obvious.
> 
> 
> I don't rely on that myself. That sounds like a personal viewpoint only.
> I welcome more discussion amongst Committers with regard to
> coordination, but formal systems aren't what I think will help there.
> That situation has recently improved anyway, so no further change needed
> at present, IMHO.

??? Improved how?

>> 2. There's no visibility for outsiders as to what issues are open or
>> recently fixed.  Not being outsiders, I'm not sure that we are terribly
>> well qualified to describe this problem precisely or identify a good
>> solution --- but I grant that there's a problem there.
> 
> 
> If they can perform "git log" they can view what has happened recently.
> Tracking what might happen is much harder for active contributors.

It takes a lot of technical knowledge of PostgreSQL to relate a commit
message to a bug report, given that the bug report may not be
referenced, and the report and the commit often use completely different
terminology.  Also, users are often wanting to look for bug fixes from
months or even years ago, and git log has crappy searchability.

I can't say how many times I've had a conversation like this:

"Oh, that's a known issue.  It was fixed later in the 9.3 series."

"Really?  In which release specificially?"

"Ummm lemme search the release notes ... I know it's here somewhere ..."

> 
> I've never had a user ask me for such a list. All I hear is compliments
> that our software is incredibly robust.

I have. I've had users ask for it, I've had customers ask for it, I've
had companies thinking of adopting PostgreSQL ask for it ... and for a
few of them, our lack of an issue tracker was a deciding factor in
saying that PostgreSQL "wasn't mature enough".  Certainly it was major
points off when Forrester rated us.

Also, members of downstream projects would really like us to get a bug
tracker they can kick up bugs to if they're determined to be in
Postgres.   There are bugs we're not even hearing about because it's too
confusing for someone who has a lot of projects to cover to figure out
our idiosyncratic system.

Today, having an issue tracker is considered "normal" for any
significant OSS project. The fact that we don't have one is regarded as
aberrant, and not in a good way ... we're at the point where if we're
not going to adopt one, we'd better have a darned good reason which we
can explain clearly to the public.

> The only time this info is required is for people that provide a Support
> service based upon PostgreSQL, yet are not themselves sufficiently
> involved to know what bugs have been reported and are as yet unfixed. I
> expect such people are extremely interested in getting other people to
> do things that will help their business.

That has absolutely nothing to do with any reason for the PostgreSQL
project to have a bug tracker.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] multivariate statistics / patch v7

2015-09-24 Thread Josh Berkus
Tomas,

> attached is v7 of the multivariate stats patch. The main improvement is
> major refactoring of the clausesel.c portion - splitting the awfully
> long spaghetti-style functions into smaller pieces, making it much more
> understandable etc.

So presumably v7 handles varlena attributes as well, yes?   I have a
destruction test case for correlated column stats, so I'd like to test
your patch on it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-24 Thread Josh Berkus
On 09/24/2015 10:24 AM, Stephen Frost wrote:
> * Joe Conway (m...@joeconway.com) wrote:
>> On 09/24/2015 10:08 AM, Stephen Frost wrote:
>>> debbugs does most of the above by default, no programming needed...  I'm
>>> sure we could get it to integrate with the commitfest and have a git
>>> commit hook which sends the appropriate email to it also.
>>>
>>> That the emacs folks are using it makes me *much* more interested in the
>>> idea of getting debbugs up and running..
>>
>> I'm not familiar with debbugs myself, but given that description it
>> sounds to me like it would be worth giving it a try.
> 
> It started out as Debian's bug tracking system, but apparently others
> are using it now also.
> 
> Here's an example.
> 
> The main view for a particular package:
> 
> https://bugs.debian.org/cgi-bin/pkgreport.cgi?pkg=util-linux;dist=unstable
> 
> A specific bug:
> 
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=786804

I adore "Toggle useless messages" as a feature.  ;-)


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-24 Thread Josh Berkus
On 09/24/2015 12:55 PM, Tom Lane wrote:
> Stephen Frost <sfr...@snowman.net> writes:
>> Are there any objections to pginfra standing up bugs.postgresql.org with
>> debbugs?  Obviously, it'd be more-or-less beta as we play with it, and
>> we could set it up as beta-bugs.p.o, if there's concern about that.
> 
> I agree with the idea that we don't yet want to give the impression that
> this is the official bug tracker.  However, "beta-bugs" could give the
> impression that it was specifically for bugs about 9.5beta, without
> dispelling the idea that it is official.  Maybe "bugs-test.p.o"?

I'd suggest instead just having a big banner up in the page header which
says "this system is currently beta and not yet the canonical source for
postgres bug information".  That way, if it does become the canonical
source, we won't go breaking everyone's links when we change the domain
name.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-24 Thread Josh Berkus
On 09/23/2015 10:25 PM, Thomas Munro wrote:
> On Thu, Sep 24, 2015 at 1:31 PM, Joe Conway <m...@joeconway.com> wrote:
>> On 09/23/2015 05:21 PM, Thomas Munro wrote:
>>> Do you think it would make any sense to consider evolving what we have
>>> already?  At the moment, we have a bug form, and when you submit it it
>>> does this (if I'm looking at the right thing, please correct me if I'm
>>> not):

I know we're big on reinventing the wheel here, but it would really be a
better idea to use an established product than starting over from
scratch. Writing a bug tracker is a lot of work and maintenance.

> The two most common interactions could go something like this:
> 
> 1.  User enters bug report via form, creating an issue in NEW state
> and creating a pgsql-bugs thread.  Someone responds by email that this
> is expected behaviour, not a bug, not worth fixing or not a Postgres
> issue etc using special trigger words.  The state is automatically
> switched to WORKS_AS_DESIGNED or WONT_FIX.  No need to touch the web
> interface: the only change from today's workflow is awareness of the
> right wording to trigger the state change.
> 
> 2.  User enters bug report via form, creating issue #1234 in NEW
> state.   Someone responds by email to acknowledge that that may indeed
> be an issue, and any response to an issue in NEW state that doesn't
> reject it switches it to UNDER_DISCUSSION.  Maybe if a commitfest item
> references the same thread (or somehow references the issue number?)
> its state is changed to IN_COMMITFEST, or maybe as you say there could
> be a way to generate the commitfest item from the issue, not sure
> about that.  Eventually a commit log message says "Fixes bug #1234"
> and the state automatically goes to FIXED.

I don't know debbugs, but I know that it would be possible to program RT
to do all of the above, except add the item to the commitfest.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Decimal64 and Decimal128

2015-09-24 Thread Josh Berkus
On 09/24/2015 02:23 PM, Feng Tian wrote:
> If there is enough interest, would be great for it to go into the
> official contrib dir.
> Thanks, 
> 
> 
> Second thought, the extension depends on decNumber, which is either GPL,
> or ICU license.  Maybe this is trouble.
>  

Yes.  Please just build an external extension and submit it to PGXN.
Thanks!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-23 Thread Josh Berkus
On 09/23/2015 11:43 AM, Robert Haas wrote:
> If somebody does do the work, then we get to the next question: if we
> had an accurate list of open bugs, would anybody who currently doesn't
> work on fixing those bugs step up to help fix them?  I hope so, but I
> don't know.  If not, we might not feel that the effort of maintaining
> the bug tracker paid much of a dividend.

I don't anticipate that getting additional bug fixers would be a benefit
of having a bug tracker, at least not in the first year.  In fact, I
would say that we don't need a bug tracker to fix most significant bugs
at all.  We're pretty good at that.

What we need a bug tracker for is:

1. so users and downstream projects know where to report bugs (and no,
our idiosyncratic bug form doesn't fit into anyone's workflow).

2. so that users know when a bug is fixed, and what release it's fixed
in, rather than depending on "ask someone on IRC".

3. so that we don't completely lose track of low-importance, hard-to-fix
bugs and trivial bugs, which we currently certainly do.

4. so that we have a clearer, more immediate picture of whether we've
fixed all known bugs in upcoming PostgreSQL releases, instead of
depending on Bruce catching up on his email.

5. so that we have a place to track bugs which require hard, multi-step
fixes and don't lose track of some of the steps like we did with Multixact.

Those are the main reasons to have a BT.  Offering a place for new
hackers to get started with trivial code fixes might be a side benefit,
but isn't a good enough reason to have one.

Obviously, everything said about "who's going to maintain this" is
completely valid.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-23 Thread Josh Berkus
On 09/23/2015 11:18 AM, Kam Lasater wrote:
> 
> At this point not having one is borderline negligent. I'd suggest:
> Github Issues, Pivotal Tracker or Redmine (probably in that order).
> There are tens to hundreds of other great ones out there, I'm sure one
> of them would also work.

First, understand that the Postgres project was created before bug
trackers existed. And people are very slow to change their habits,
especially since not having a bug tracker was actually a benefit up
until around 2005.  It's not anymore, but I'm sure people will argue
with my statement on that.

We have to use something OSS; open source projects depending on
closed-source infra is bad news.  Out of what's available, I'd actually
choose Bugzilla; as much as BZ frustrates the heck out of me at times,
it's the only OSS tracker that's at all sophisticated.

The alternative would be someone building a sophisticated system on top
of RequestTracker, which would also let us have tight mailing list
integration given RT's email-driven model.  However, that would require
someone with the time to build a custom workflow system and web UI on
top of RT.  It's quite possible that Best Practical would be willing to
help here.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] No Issue Tracker - Say it Ain't So!

2015-09-23 Thread Josh Berkus
On 09/23/2015 03:05 PM, Jim Nasby wrote:
> On 9/23/15 3:12 PM, Thomas Kellerer wrote:
>> They also support Postgres as their backend (and you do find hints
>> here and
>> there
>> that it is the recommended open source DBMS for them - but they don't
>> explicitly state it like that). We are using Jira at the company I
>> work for
>> and
>> all Jira installations run on Postgres there.
> 
> I'll second Jira as well. It's the only issue tracker I've seen that you
> can actually use for multiple different things without it becoming a
> mess. IE: it could track Postgres bugs, infrastructure issues, and the
> TODO list if we wanted, allow issues to reference each other
> intelligently, yet still keep them as 3 separate bodies.

Speaking as someone who uses Jira for commercial work, I'm -1 on them.
I simply don't find Jira to be superior to OSS BT systems, and I find
it inferior in several ways (like that you can't have more than one
person assigned to a bug).  And email integration for Jira is
nonexistent.

When we discussed this 8 years ago, Debian said debbugs wasn't ready for
anyone else to use.  Has that changed?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Rework the way multixact truncations work

2015-09-21 Thread Josh Berkus
On 09/21/2015 07:36 AM, Andres Freund wrote:
> On 2015-09-21 10:31:17 -0400, Robert Haas wrote:
>> So, where are we with this patch?
> 
> Uh. I'd basically been waiting on further review and then forgot about
> it.

Does the current plan to never expire XIDs in 9.6 affect multixact
truncation at all?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Streaming Replication clusters and load balancing

2015-09-21 Thread Josh Berkus
On 09/17/2015 07:27 PM, James Sewell wrote:
> Hello all,
> 
> I have recently been working with PostgreSQL and HAProxy to provide
> seamless load balancing to a group of database servers. This on its own
> isn't a hard thing: I have an implementation finished and am now
> thinking about the best way to bring it to a production ready state
> which could be used by others, and used in load-balancers other than
> HAProxy with minimal config changes.

Funny, I've been working on this exact problem today to add to the
Patroni project: https://github.com/zalando/patroni

My solution will depend on patroni's included HTTP access, though, so
I'm not sure it will work for you.  Anyway, this isn't a topic for
pgsql-hackers mailing list, so reply offlist if you want to discuss further.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] exposing pg_controldata and pg_config as functions

2015-09-07 Thread Josh Berkus
On 09/06/2015 12:34 PM, Joe Conway wrote:
> To the extent that we want specific pg_controldata output in non-text
> form, we should identify which items those are and provide individual
> functions for them.

Well, I think it's pretty simple; let's break it down (a rough SQL
sketch of what this could look like follows the list):

# function pg_control_control_data() returning INT, TSTZ
pg_control version number:942
pg_control last modified: Thu 20 Aug 2015 10:05:33 AM PDT

#function pg_catversion() returning BIGINT
Catalog version number:   201409291

# have function for this, no?
Database system identifier:   6102142380557650900

# not relevant, if we can connect, it's running
Database cluster state:   shut down

# Do we have functions for all of the below?
# if not I suggest virtual table pg_checkpoint_status
# returning the below 17 columns
# that would be useful even if we have some of them
Latest checkpoint location:   0/1A1BF178
Prior checkpoint location:0/1A1BF0D8
Latest checkpoint's REDO location:0/1A1BF178
Latest checkpoint's REDO WAL file:0001001A
Latest checkpoint's TimeLineID:   1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:  0/2038
Latest checkpoint's NextOID:  19684
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:711
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint:Thu 20 Aug 2015 10:05:33 AM PDT

# Not in any way useful
Fake LSN counter for unlogged rels:   0/1

# add another system view,
# pg_recovery_state, holding the
# below 5 columns
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location:0/0
Backup end location:  0/0
End-of-backup record required:no

# duplicates system settings, not needed
Current wal_level setting:logical
Current wal_log_hints setting:off
Current max_connections setting:  100
Current max_worker_processes setting: 8
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:   8
Database block size:  8192

# do we have the below anywhere else?
# this is somewhat duplicative of config info
Blocks per segment of large relation: 131072
WAL block size:   8192
Bytes per WAL segment:16777216
Maximum length of identifiers:64
Maximum columns in an index:  32
Maximum size of a TOAST chunk:1996
Size of a large-object chunk: 2048
Date/time type storage:   64-bit integers
Float4 argument passing:  by value
Float8 argument passing:  by value

# return INT function pg_data_page_checksum_version()
Data page checksum version:   0
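
To make that concrete, here is a rough SQL sketch of how the proposed
pieces might be used.  This is purely illustrative: none of these
functions or views exist yet, and the names are simply the ones
suggested in the annotations above.

    -- hypothetical scalar accessors
    SELECT pg_catversion();                  -- bigint, e.g. 201409291
    SELECT pg_data_page_checksum_version();  -- int, 0 = checksums disabled

    -- hypothetical one-row views wrapping the checkpoint/recovery fields
    SELECT * FROM pg_checkpoint_status;      -- the 17 checkpoint columns above
    SELECT * FROM pg_recovery_state;         -- the 5 recovery columns above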


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Horizontal scalability/sharding

2015-09-03 Thread Josh Berkus
On 09/03/2015 03:57 AM, Bruce Momjian wrote:
>> > 
>> > Yes, the logical replication has similar syncrep properties as the
>> > binary one (feedback works same way).

Oh?  What does UDR/BDR currently support for sync?

> Yes, I assumed that.  Logical replication uses WAL, so if you are
> synchronous with WAL, logical replication is synchronous too.  However,
> of course, it is synchronous in being durable, not synchronous in terms
> of applying the WAL.  This is true of binary and logical replication.

Well, there's no such thing as simultaneity in scalable architectures.
But users are already used to that ... anybody who load-balances to read
slaves knows about lag.  The only way* to ensure near-simultenaity is to
have some kind of single-node, single-process GTM for the cluster, and
then your actual scalability goes bye-bye.

The bigger issue we'll need to address with this is the fight between
lag and load-balancing, which would become a much worse issue with
read-load-balanced shards which are transparent to the user.  They'd see
the effects of lag, without having actually chosen to use this or that
replica.  This is the other reason to look at logical replication;
presumably with logrep, we can be more discriminating about what
activities cause lag (for one thing, vacuum won't).

Also:
On 09/03/2015 07:00 AM, Kevin Grittner wrote:
> There is another approach to this that we should consider how (if?)
> we are going to cover: database affinity.  I have seen cases where
> there are multiple databases which are targets of asynchronous
> replication, with a web application load balancing among them.  The
> application kept track of which copy each connection was using, so
> that when they were not exactly in sync the user never saw "time
> moving backward".  Two different users might see versions of the
> data from different points in time, but that generally doesn't
> matter, especially if the difference is just a few minutes.  If one
> copy got too far behind for some reason, they would load-shift to
> the other servers (time still moves forward, only there is a "jump"
> forward at the shift).  This would allow the tardy database to be
> dedicated to catching up again.
>
> Bottom line is that this very smooth behavior required two features
> -- the ability for the application to control database affinity,
> and the ability to shift that affinity gracefully (with no down
> time).

Yes.  Frankly, it would be *easier* to code things so that the same
session always gets its requests load balanced to the same copies;
making that a feature, too, is nice.


(* there are actually other ways to come close to simultaneity, but they
are much more complicated)

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Freeze avoidance of very large table.

2015-09-03 Thread Josh Berkus
On 09/03/2015 05:11 PM, Bruce Momjian wrote:
> On Thu, Sep  3, 2015 at 11:37:09PM +0200, Petr Jelinek wrote:
>>> I don't understand.  I'm just proposing that the source code for the
>>> extension to live in src/extensions/, and have the shared library
>>> installed by toplevel make install; I'm not suggesting that the
>>> extension is installed automatically.  For that, you still need a
>>> superuser to run CREATE EXTENSION.
>>>
>>
>> +! for this
> 
> OK, what does "+!" mean?  (I know it is probably a shift-key mistype,
> but it looks interesting.)

Add the next factorial value?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Horizontal scalability/sharding

2015-09-02 Thread Josh Berkus
On 09/01/2015 04:14 PM, Petr Jelinek wrote:
> On 2015-09-02 00:09, Josh Berkus wrote:
>> On 09/01/2015 02:29 PM, Tomas Vondra wrote:
>>> So while you may be right in single-DC deployments, with multi-DC
>>> deployments the situation is quite different - not only that the network
>>> bandwidth is not unlimited, but because latencies within DC may be a
>>> fraction of latencies between the locations (to the extent that the
>>> increase due to syncrep may be just noise). So the local replication may
>>> be actually way faster.
>>
>> I'm not seeing how the above is better using syncrep than using shard
>> copying?
> 
> Shard copying usually assumes that the origin node does the copy - the
> data has to go twice through the slow connection. With replication you
> can replicate locally over fast connection.

Ah, I was thinking of the case of having a single set of copies in the
remote DC, but of course that isn't going to be the case with a highly
redundant setup.

Basically this seems to be saying that, in an ideal setup, we'd have
some kind of synchronous per-shard replication.  We don't have that at
present (sync rep is whole-node, and BDR is asynchronous).  There's also
the question of how to deal with failures and taking bad nodes out of
circulation in such a setup, especially considering that the writes
could be coming from multiple other nodes.

>> Not really, the mechanism is different and the behavior is different.
>> One critical deficiency in using binary syncrep is that you can't do
>> round-robin redundancy at all; every redundant node has to be an exact
>> mirror of another node.  In a good HA distributed system, you want
>> multiple shards per node, and you want each shard to be replicated to a
>> different node, so that in the event of node failure you're not dumping
>> the full load on one other server.
>>
> 
> This assumes that we use binary replication, but we can reasonably use
> logical replication which can quite easily do filtering of what's
> replicated where.

Is there a way to do logical synchronous replication?  I didn't think
there was.

>>> IMHO the design has to address the multi-DC setups somehow. I think that
>>> many of the customers who are so concerned about scaling to many shards
>>> are also concerned about availability in case of DC outages, no?
>>
>> Certainly.  But users located in a single DC shouldn't pay the same
>> overhead as users who are geographically spread.
>>
> 
> Agreed, so we should support both ways, but I don't think it's necessary
> to support both ways in version 0.1. It's just important to not paint
> ourselves into a corner with design decisions that would make one of the
> ways impossible.

Exactly!

Let me explain why I'm so vocal on this point.  PostgresXC didn't deal
with the redundancy/node replacement at all until after version 1.0.
Then, when they tried to address it, they discovered that the code was
chock full of assumptions that "1 node == 1 shard", and breaking that
assumption would require a total refactor of the code (which never
happened).  I don't want to see a repeat of that mistake.

Even if it's only on paper, any new sharding design needs to address
these questions:

1. How do we ensure no/minimal data is lost if we lose a node?
2. How do we replace a lost node (without taking the cluster down)?
   2. a. how do we allow an out-of-sync node to "catch up"?
3. How do we maintain metadata about good/bad nodes (and shard locations)?
4. How do we add nodes to expand the cluster?

There doesn't need to be code for all of the above from version 0.1, but
there needs to be a plan to tackle those problems.  Otherwise, we'll
just end up with another dead-end, not-useful-in-production technology.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Horizontal scalability/sharding

2015-09-02 Thread Josh Berkus
On 09/02/2015 11:41 AM, Robert Haas wrote:
> On Wed, Sep 2, 2015 at 1:57 PM, Josh Berkus <j...@agliodbs.com> wrote:
>> Even if it's only on paper, any new sharding design needs to address
>> these questions:
>>
>> 1. How do we ensure no/minimal data is lost if we lose a node?
>> 2. How do we replace a lost node (without taking the cluster down)?
>>2. a. how do we allow an out-of-sync node to "catch up"?
>> 3. How do we maintain metadata about good/bad nodes (and shard locations)?
>> 4. How do we add nodes to expand the cluster?
>>
>> There doesn't need to be code for all of the above from version 0.1, but
>> there needs to be a plan to tackle those problems.  Otherwise, we'll
>> just end up with another dead-end, not-useful-in-production technology.
> 
> This is a good point, and I think I agree with it.  Let me make a few
> observations:
> 
> 1. None of this stuff matters very much when the data is strictly
> read-only. 

Yep.

> 2. None of this stuff matters when you only have one copy of the data.
> Your system is low-availability, but you just don't care for whatever
> reason. 

Uh-huh.

> 3. IIUC, Postgres-XC handles this problem by reducing at least
> volatile functions, maybe all functions, to constants.  Then it
> generates an SQL statement to be sent to the data node to make the
> appropriate change.  If there's more than one copy of the data, we
> send a separate copy of the SQL statement to every node.  I'm not sure
> exactly what happens if some of those nodes are not available, but I
> don't think it's anything good.  Fundamentally, this model doesn't
> allow for many good options in that case.

pg_shard also sends the data to each node, and automatically notices
which nodes are not responding and takes them out of availability.
There isn't a "catch up" feature yet (AFAIK), or any attempt to reduce
volatile functions.

For that matter, the last time I worked on it, Greenplum also did
multiplexing via the writing node (or via the data loader).  So this is
a popular
approach; it has a number of drawbacks, though, of which volatile
functions are a major one.

> 4. Therefore, I think that we should instead use logical replication,
> which might be either synchronous or asynchronous.  When you modify
> one copy of the data, that change will then be replicated to all other
> nodes.  If you are OK with eventual consistency, this replication can
> be asynchronous, and nodes that are off-line will catch up when they
> are on-line.  If you are not OK with that, then you must replicate
> synchronously to every node before transaction commit; or at least you
> must replicate synchronously to every node that is currently on-line.
> This presents some challenges: logical decoding currently can't
> replicate transactions that are still in process - replication starts
> when the transaction commits.  Also, we don't have any way for
> synchronous replication to wait for multiple nodes.  

Well, there is a WIP patch for that, which IMHO would be much improved
by having a concrete use-case like this one.  What nobody is working on
-- and we've vetoed in the past -- is a way of automatically failing and
removing from replication any node which repeatedly fails to sync, which
would be a requirement for this model.

You'd also need a way to let the connection nodes know when a replica
has fallen behind so that they can be taken out of
load-balancing/sharding for read queries.  For the synchronous model,
that would be "fallen behind at all"; for asynchronous it would be
"fallen more than ### behind".

> But in theory
> those seem like limitations that can be lifted.  Also, the GTM needs
> to be aware that this stuff is happening, or it will DTWT.  That too
> seems like a problem that can be solved.

Yeah?  I'd assume that a GTM would be antithetical to two-stage copying.
 I'm not a big fan of a GTM at all, frankly; it makes clusters much
harder to set up, and becomes a SPOF.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Horizontal scalability/sharding

2015-09-02 Thread Josh Berkus
On 09/02/2015 12:30 PM, Robert Haas wrote:
> On Wed, Sep 2, 2015 at 3:03 PM, Josh Berkus <j...@agliodbs.com> wrote:
>>> 4. Therefore, I think that we should instead use logical replication,
>>> which might be either synchronous or asynchronous.  When you modify
>>> one copy of the data, that change will then be replicated to all other
>>> nodes.  If you are OK with eventual consistency, this replication can
>>> be asynchronous, and nodes that are off-line will catch up when they
>>> are on-line.  If you are not OK with that, then you must replicate
>>> synchronously to every node before transaction commit; or at least you
>>> must replicate synchronously to every node that is currently on-line.
>>> This presents some challenges: logical decoding currently can't
>>> replicate transactions that are still in process - replication starts
>>> when the transaction commits.  Also, we don't have any way for
>>> synchronous replication to wait for multiple nodes.
>>
>> Well, there is a WIP patch for that, which IMHO would be much improved
>> by having a concrete use-case like this one.  What nobody is working on
>> -- and we've vetoed in the past -- is a way of automatically failing and
>> removing from replication any node which repeatedly fails to sync, which
>> would be a requirement for this model.
> 
> Yep.  It's clear to me we need that in general, not just for sharding.
> To me, the key is to make sure there's a way for the cluster-ware to
> know about the state transitions.  Currently, when the synchronous
> standby changes, PostgreSQL doesn't tell anyone.  That's a problem.

There are many parts of our replication which are still effectively
unmonitorable. For example, there's still no way to tell from the
replica that it's lost contact with the master except by tailing the
log.  If we try to build bigger systems on top of these components,
we'll find that we need to add a lot of instrumentation.
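
To illustrate the gap: about the best a replica can do today is infer
trouble from its own replay progress, along the lines of the sketch
below, and none of it distinguishes "lost contact with the master" from
"the master is simply idle":

    -- run on the standby; real 9.x functions, but only indirect evidence
    SELECT pg_is_in_recovery(),
           pg_last_xlog_receive_location(),
           pg_last_xlog_replay_location(),
           now() - pg_last_xact_replay_timestamp() AS apparent_staleness;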

> 
>> You'd also need a way to let the connection nodes know when a replica
>> has fallen behind so that they can be taken out of
>> load-balancing/sharding for read queries.  For the synchronous model,
>> that would be "fallen behind at all"; for asynchronous it would be
>> "fallen more than ### behind".
> 
> How is that different from the previous thing?  Just that we'd treat
> "lagging" as "down" beyond some threshold?  That doesn't seem like a
> mandatory feature.

It's a mandatory feature if you want to load-balance reads.  We have to
know which nodes not to send reads to because they are out of sync.

>> Yeah?  I'd assume that a GTM would be antithetical to two-stage copying.
> 
> I don't think so.  If transaction A writes data on X which is
> replicated to Y and then commits, a new snapshot which shows A as
> committed can't be used on Y until A's changes have been replicated
> there.  That could be enforced by having the commit of A wait for
> replication, or by having an attempt by a later transaction to use the
> snapshot on Y wait until replication completes, or some even more
> sophisticated strategy that considers whether the replication backlog
> touches the same data that the new transaction will read.  It's
> complicated, but it doesn't seem intractable.

I need to see this on a chalkboard to understand it.

>>  I'm not a big fan of a GTM at all, frankly; it makes clusters much
>> harder to set up, and becomes a SPOF.
> 
> I partially agree.  I think it's very important that the GTM is an
> optional feature of whatever we end up with, rather than an
> indispensable component.  People who don't want it shouldn't have to
> pay the price in performance and administrative complexity.  But at
> the same time, I think a lot of people will want it, because without
> it, the fact that sharding is in use is much less transparent to the
> application.

If it can be optional, then we're pretty close to covering most use
cases with one general infrastructure.  That would be nice.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Allow a per-tablespace effective_io_concurrency setting

2015-09-02 Thread Josh Berkus
On 09/02/2015 02:25 PM, Tomas Vondra wrote:
> 
> As I explained, spindles have very little to do with it - you need
> multiple I/O requests per device, to get the benefit. Sure, the DBAs
> should know how many spindles they have and should be able to determine
> optimal IO depth. But we actually say this in the docs:

My experience with performance tuning is that values above 3 have no
real effect on how queries are executed.
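
The A/B check behind that observation is easy enough to repeat;
effective_io_concurrency only affects prefetching during bitmap heap
scans, so a comparison along these lines is the sort of test that shows
it (table and column names are made up; assume an index on
scattered_col so the planner chooses a bitmap heap scan):

    SET effective_io_concurrency = 1;
    EXPLAIN (ANALYZE, BUFFERS)
      SELECT count(*) FROM big_table
       WHERE scattered_col BETWEEN 1000 AND 2000;

    SET effective_io_concurrency = 4;
    EXPLAIN (ANALYZE, BUFFERS)
      SELECT count(*) FROM big_table
       WHERE scattered_col BETWEEN 1000 AND 2000;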

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] exposing pg_controldata and pg_config as functions

2015-09-02 Thread Josh Berkus
On 09/02/2015 02:34 PM, Alvaro Herrera wrote:
> Tom Lane wrote:
>> Robert Haas <robertmh...@gmail.com> writes:
>>> But I'm not sure I like the idea of adding a server dependency on the
>>> ability to exec pg_controldata.  That seems like it could be
>>> unreliable at best, and a security vulnerability at worst.
>>
>> I hadn't been paying attention --- the proposed patch actually depends on
>> exec'ing pg_controldata?  That's horrid!  There is no expectation that
>> that's installed.
> 
> No, it doesn't.  For the pg_controldata output it processes the
> pg_control file directly, and for pg_config it relies on compile-time
> CPPFLAGS.
> 
> I think trying to duplicate the exact strings isn't too nice an
> interface.

Well, for pg_controldata, no, but what else would you do for pg_config?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Horizontal scalability/sharding

2015-09-01 Thread Josh Berkus
On 09/01/2015 11:36 AM, Tomas Vondra wrote:
>> We want multiple copies of shards created by the sharding system itself.
>>   Having a separate, and completely orthogonal, redundancy system to the
>> sharding system is overly burdensome on the DBA and makes low-data-loss
>> HA impossible.
> 
> IMHO it'd be quite unfortunate if the design would make it impossible to
> combine those two features (e.g. creating standbys for shards and
> failing over to them).
> 
> It's true that solving HA at the sharding level (by keeping multiple
> copies of a each shard) may be simpler than combining sharding and
> standbys, but I don't see why it makes low-data-loss HA impossible.

Other way around, that is, having replication standbys as the only
method of redundancy requires either high data loss or high latency for
all writes.

In the case of async rep, every time we fail over a node, the entire
cluster would need to roll back to the last common known-good replay
point, hence high data loss.

In the case of sync rep, we are required to wait for at least double
network lag time in order to do a single write ... making
write-scalability quite difficult.

Further, if using replication, the sharding system would have no way to
(a) find out immediately if a copy was bad and (b) fail over quickly to
a copy of the shard if the first requested copy was not responding.
With async replication, we also can't use multiple copies of the same
shard as a way to balance read workloads.

If we write to multiple copies as a part of the sharding feature, then
that can be parallelized, so that we are waiting only as long as the
slowest write (or in failure cases, as long as the shard timeout).
Further, we can check for shard-copy health and update shard
availability data with each user request, so that the ability to see
stale/bad data is minimized.

There are obvious problems with multiplexing writes, which you can
figure out if you knock pg_shard around a bit.  But I really think that
solving those problems is the only way to go.

Mind you, I see a strong place for binary replication and BDR for
multi-region redundancy; you really don't want that to be part of the
sharding system if you're aiming for write scalability.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Horizontal scalability/sharding

2015-09-01 Thread Josh Berkus
On 09/01/2015 02:39 AM, Bruce Momjian wrote:
> On Mon, Aug 31, 2015 at 01:16:21PM -0700, Josh Berkus wrote:
>> I'm also going to pontificate that, for a future solution, we should not
>> focus on write *IO*, but rather on CPU and RAM. The reason for this
>> thinking is that, with the latest improvements in hardware and 9.5
>> improvements, it's increasingly rare for machines to be bottlenecked on
>> writes to the transaction log (or the heap). This has some implications
>> for system design.  For example, solutions which require all connections
>> to go through a single master node do not scale sufficiently to be worth
>> bothering with.
> 
> Well, I highlighted write IO for sharding because sharding is the only
> solution that allows write scaling.  If we want to scale CPU, we are
> better off using server parallelism, and to scale CPU and RAM, a
> multi-master/BDR solution seems best.  (Multi-master doesn't do write
> scaling because you eventually have to write all the data to each node.)

You're assuming that our primary bottleneck for writes is IO.  It's not
at present for most users, and it certainly won't be in the future.  You
need to move your thinking on systems resources into the 21st century,
instead of solving the resource problems from 15 years ago.

Currently, CPU resources and locking are the primary bottlenecks on
writing for the vast majority of the hundreds of servers I tune every
year.  This even includes AWS, with EBS's horrible latency; even in that
environment, most users can outstrip PostgreSQL's ability to handle
requests by getting 20K PIOPS (provisioned IOPS).

Our real future bottlenecks are:

* ability to handle more than a few hundred connections
* locking limits on the scalability of writes
* ability to manage large RAM and data caches


The only place where IO becomes the bottleneck is for the
batch-processing, high-throughput DW case ... and I would argue that
existing forks already handle that case.

Any sharding solution worth bothering with will solve some or all of the
above by extending our ability to process requests across multiple
nodes.  Any solution which does not is merely an academic curiosity.

> For these reasons, I think sharding has a limited use, and hence, I
> don't think the community will be willing to add a lot of code just to
> enable auto-sharding.  I think it has to be done in a way that adding
> sharding also gives other benefits, like better FDWs and cross-node ACID
> control.
> 
> In summary, I don't think adding a ton of code just to do sharding will
> be acceptable.  A corollary of that, is that if FDWs are unable to
> provide useful sharding, I don't see an acceptable way of adding
> built-in sharding to Postgres.

So, while I fully agree with you about the value of having side
benefits to our sharding tools, I think you're missing the big picture
entirely.  In
a few years, clustered/sharded PostgreSQL will be the default
installation, or we'll be a legacy database.  Single-node and
single-master databases are rapidly becoming history.

From my perspective, we don't need an awkward, limited, bolt-on solution
for write-scaling.  We need something which will become core to how
PostgreSQL works.  I just don't see us getting there with the described
FDW approach, which is why I keep raising issues with it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Horizontal scalability/sharding

2015-09-01 Thread Josh Berkus
On 09/01/2015 10:17 AM, Robert Haas wrote:
> On Tue, Sep 1, 2015 at 1:06 PM, Josh Berkus <j...@agliodbs.com> wrote:
>> Any sharding solution worth bothering with will solve some or all of the
>> above by extending our ability to process requests across multiple
>> nodes.  Any solution which does not is merely an academic curiosity.
> 
> I think the right solution to those problems is to attack them
> head-on.  Sharding solutions should cater to use cases where using all
> the resources of one machine isn't sufficient no matter how
> efficiently we do it.

As long as "all the resources" != "just IO", I'm completely on board
with that.  The reason I raised this is that the initial FDW-based
proposals pretty much scale IO and nothing else.

pg_shard also currently only scales IO, but they're working on that.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horizontal scalability/sharding

2015-09-01 Thread Josh Berkus
On 09/01/2015 02:29 PM, Tomas Vondra wrote:
> Hi,
> 
> On 09/01/2015 09:19 PM, Josh Berkus wrote:
>> Other way around, that is, having replication standbys as the only
>> method of redundancy requires either high data loss or high latency
>> for all writes.
> 
> I haven't said that. I said that we should allow that topology, not that
> it should be the only method of redundancy.

Ah, OK, I didn't understand you.  Of course I'm in favor of supporting
both methods of redundancy if we can.

>> In the case of sync rep, we are required to wait for at least double
>>  network lag time in order to do a single write ... making
>> write-scalability quite difficult.
> 
> Which assumes that latency (or rather the increase due to syncrep) is a
> problem for the use case. Which may be the case for many use cases, but
> certainly is not a problem for many BI/DWH use cases performing mostly
> large batch loads. In those cases the network bandwidth may be quite
> important resource.

I'll argue that BI/DW is the least interesting use case for mainstream
PostgreSQL because there are production-quality forks which do this
(mostly proprietary, but we can work on that).  We really need a solution
which works for OLTP.

> For example assume that there are just two shards in two separate data
> centers, connected by a link with limited bandwidth. Now, let's assume
> you always keep a local replica for failover. So you have A1+A2 in DC1,
> B1+B2 in DC2. If you're in DC1, then writing data to B1 means you also
> have to write data to B2 and wait for it. So either you send the data to
> each node separately (consuming 2x the bandwidth), or send it to B1 and
> let it propagate to B2 e.g. through sync rep.
> 
> So while you may be right in single-DC deployments, with multi-DC
> deployments the situation is quite different - not only that the network
> bandwidth is not unlimited, but because latencies within DC may be a
> fraction of latencies between the locations (to the extent that the
> increase due to syncrep may be just noise). So the local replication may
> be actually way faster.

I'm not seeing how the above is better using syncrep than using shard
copying?

> I can imagine forwarding the data between B1 and B2 even with a purely
> sharding solution, but at that point you effectively re-implemented
> syncrep.

Not really, the mechanism is different and the behavior is different.
One critical deficiency in using binary syncrep is that you can't do
round-robin redundancy at all; every redundant node has to be an exact
mirror of another node.  In a good HA distributed system, you want
multiple shards per node, and you want each shard to be replicated to a
different node, so that in the event of node failure you're not dumping
the full load on one other server.

> IMHO the design has to address the multi-DC setups somehow. I think that
> many of the customers who are so concerned about scaling to many shards
> are also concerned about availability in case of DC outages, no?

Certainly.  But users located in a single DC shouldn't pay the same
overhead as users who are geographically spread.

> I don't follow. With sync rep we do know whether the copy is OK or not,
> because the node either confirms writes or not. The failover certainly
> is more complicated and is not immediate (to the extent of keeping a
> copy at the sharding level), but it's a question of trade-offs.
> 
> It's true we don't have auto-failover solution at the moment, but as I
> said - I can easily imagine most people using just sharding, while some
> deployments use syncrep with manual failover.

As long as direct shard copying is available, I'm happy.  I have no
complaints about additional mechanisms.

I'm bringing this up because the FDW proposal made at pgCon did not
include *any* mechanism for HA/redundancy, just some handwaving about
replication and/or BDR. This was one of the critical design failures of
PostgresXC.  A multinode system without automated node failover and
replacement is a low-availability system.

>> If we write to multiple copies as a part of the sharding feature,
>> then that can be parallelized, so that we are waiting only as long as
>> the slowest write (or in failure cases, as long as the shard
>> timeout). Further, we can check for shard-copy health and update
>> shard availability data with each user request, so that the ability
>> to see stale/bad data is minimized.
> 
> Again, this assumes infinite network bandwidth.

In what way is the total network bandwidth used in the system different
for shard copying than for sync replication?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horizontal scalability/sharding

2015-08-31 Thread Josh Berkus
All, Bruce:

First, let me put out there that I think the horizontal scaling project
which has buy-in from the community and we're working on is infinitely
better than the one we're not working on or is an underresourced fork.
So we're in agreement on that.  However, I think there's a lot of room
for discussion; I feel like the FDW approach was decided in exclusive
meetings involving a very small number of people.  The FDW approach
*may* be the right approach, but I'd like to see some rigorous
questioning of that before it's final.

Particularly, I'm concerned that we already have two projects in process
aimed at horizontal scalability, and it seems like we could bring either
(or both) projects to production quality MUCH faster than we could make
an FDW-based solution work.  These are:

* pg_shard
* BDR

It seems worthwhile, just as a thought experiment, if we can get where
we want using those, faster, or by combining those with new FDW features.

It's also important to recognize that there are three major use-cases
for write-scalable clustering:

* OLTP: small-medium cluster, absolute ACID consistency,
  bottlenecked on small writes per second
* DW: small-large cluster, ACID optional,
  bottlenecked on bulk reads/writes
* Web: medium to very large cluster, ACID optional,
  bottlenecked on # of connections

We cannot possibly solve all of the above at once, but to the extent
that we recognize all 3 use cases, we can build core features which can
be adapted to all of them.

I'm also going to pontificate that, for a future solution, we should not
focus on write *IO*, but rather on CPU and RAM. The reason for this
thinking is that, with the latest improvements in hardware and 9.5
improvements, it's increasingly rare for machines to be bottlenecked on
writes to the transaction log (or the heap). This has some implications
for system design.  For example, solutions which require all connections
to go through a single master node do not scale sufficiently to be worth
bothering with.

On some other questions from Mason:

> Do we want multiple copies of shards, like the pg_shard approach? Or
> keep things simpler and leave it up to the DBA to add standbys? 

We want multiple copies of shards created by the sharding system itself.
 Having a separate, and completely orthogonal, redundancy system to the
sharding system is overly burdensome on the DBA and makes low-data-loss
HA impossible.

> Do we want to leverage table inheritance? If so, we may want to spend
> time improving performance for when the number of shards becomes large
> with what currently exists. If using table inheritance, we could add the
> ability to specify what node (er, foreign server) the subtable lives on.
> We could create top level sharding expressions that allow these to be
> implicitly created.

IMHO, given that we're looking at replacing inheritance because of its
many documented limitations, building sharding on top of inheritance
seems unwise.  For example, many sharding systems are hash-based; how
would an inheritance system transparently use hash keys?

> Should we allow arbitrary expressions for shards, not just range, list
> and hash?

That seems like a 2.0 feature.  It also doesn't seem necessary to
support it for the moderately skilled user; that is, requiring a special
C sharding function for this seems fine to me.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horizontal scalability/sharding

2015-08-31 Thread Josh Berkus
On 08/31/2015 02:47 PM, Robert Haas wrote:
> On Mon, Aug 31, 2015 at 4:16 PM, Josh Berkus <j...@agliodbs.com> wrote:
>> First, let me put out there that I think the horizontal scaling project
>> which has buy-in from the community and we're working on is infinitely
>> better than the one we're not working on or is an underresourced fork.
>> So we're in agreement on that.  However, I think there's a lot of room
>> for discussion; I feel like the FDW approach was decided in exclusive
>> meetings involving a very small number of people.  The FDW approach
>> *may* be the right approach, but I'd like to see some rigorous
>> questioning of that before it's final.
> 
> It seems to me that sharding consists of (1) breaking your data set up
> into shards, (2) possibly replicating some of those shards onto
> multiple machines, and then (3) being able to access the remote data
> from local queries.   As far as (1) is concerned, we need declarative
> partitioning, which is being worked on by Amit Langote.  As far as (2)
> is concerned, I hope and expect BDR, or technology derived therefrom,
> to eventually fill that need.  

Well, maybe.  If you look at pg_shard, you'll see that it works by
multiplexing writes to all copies.  There's a good reason to do that; it
allows you to have a tight feedback loop between the success of writes
and the availability of "good" nodes.  If you're depending on a separate
replication system to handle getting row copies from one shard to
another, then you need a different way to deal with bad nodes and with
inconsistency between copies of shards.  That's why the existing
multinode non-relational databases don't separate replication from
writes, either.

For that matter, if what you want is transactional fully ACID sharding,
I really don't see a way to do it via BDR, since BDR is purely
asynchronous replication, as far as I know.

Also, if we want BDR to do this, that's pretty far afield of what BDR is
currently capable of, so someone will need to put serious work into it
rather than just assuming functionality will show up.

> As far as (3) is concerned, why
> wouldn't we use the foreign data wrapper interface, and specifically
> postgres_fdw?  That interface was designed for the explicit purpose of
> allowing access to remote data sources, and a lot of work has been put
> into it, so it would be highly surprising if we decided to throw that
> away and develop something completely new from the ground up.

Well, query hooks are also a capability which we already have, and is
mature.  Citus has already posted about why they chose to use them instead.

As long as you recognize that the FDW API (not just the existing fdws)
will need to expand to make this work, it's a viable path.

Also consider that (3) includes both reads and writes.

> I think it's abundantly clear that we need a logical replication
> solution as part of any horizontal scalability story.  People will
> want to do things like have 10 machines with each piece of data on 3
> of them, and there won't be any reasonable way of doing that without
> logical replication.  I assume that BDR, or some technology derived
> from it, will end up in core and solve that problem.  I had actually
> hoped we were going to get that in 9.5, but it didn't happen that way.
> Still, I think that getting first single-master, and then eventually
> multi-master, logical replication in core is absolutely critical.  And
> not just for sharding specifically: replicating your whole database to
> several nodes and load-balancing your clients across them isn't
> sharding, but it does give you read scalability and is a good fit for
> people with geographically dispersed data with good geographical
> locality.  I think a lot of people will want that.

Well, the latter thing is something which BDR is designed for, so all
that needs to happen with that is getting the rest of the plumbing into
core.  Also documentation, packaging, productization, etc.  But the
heavy lifting has already been done.

However, integrating BDR with sharding has some major design issues
which aren't trivial and may be unresolvable, per above.

> I'm not quite sure yet how we can marry declarative partitioning and
> better FDW-pushdown and logical replication into one seamless, easy to
> deploy solution that requires very low administrator effort.  But I am
> sure that each of those things, taken individually, is very useful,
> and that being able to construct a solution from those building blocks
> would be a big improvement over what we have today.  I can't imagine
> that trying to do one monolithic project that provides all of those
> things, but only if you combine them in the specific way that the
> designer had in mind, is ever going to be successful.  People _will_
> want access to each of those fe

Re: [HACKERS] Planned release for PostgreSQL 9.5

2015-08-25 Thread Josh Berkus
On 08/24/2015 11:26 AM, Tom Lane wrote:
> Paragon Corporation <l...@pcorp.us> writes:
>> Just checking to see if you guys have settled on a date for 9.5.0 release.
>
> No.  Considering we don't have a beta out yet, it's not imminent ...

This is the timeline, effectively:

https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items

When that list gets down to a handful of non-critical items, we'll be in
beta.  Help wanted.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2015-08-23 Thread Josh Berkus
On 08/21/2015 08:34 PM, Jim Nasby wrote:
> On 8/18/15 12:31 PM, Josh Berkus wrote:
>> Also this would be useful for range
>> partitions:
>>
>> CREATE PARTITION ON parent_table USING ( start_value );
>>
>> ... where start_value is the start range of the new partition.  Again,
>> easier for users to get correct.
>
> Instead of that, I think it would be more foolproof to do
>
> CREATE PARTITION ON parent_table FOR ( value1, ... );
>
> instead of trusting the user to get the exact start value correct.
>
> Though... I guess there could be value in allowing an exact start value
> but throwing an error if it doesn't sit exactly on a boundary. Might
> make it less likely to accidentally create the wrong partition.

Well, I'm figuring that most people would use CREATE NEXT PARTITION
instead.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2015-08-20 Thread Josh Berkus
On 08/20/2015 06:19 AM, David Fetter wrote:
> On Thu, Aug 20, 2015 at 06:58:24PM +0900, Amit Langote wrote:
>> Do you mean ATTACH and DETACH, if they require access exclusive lock on
>> the parent, should not be in the first cut? Or am I misreading?
>
> Sorry I was unclear.
>
> ATTACH and DETACH should be in the first cut even if they require an
> access exclusive lock.
>
> Cheers,
> David.

I don't see a way for them to *ever* not require an access exclusive lock.

We could eventually implement:

DETACH PARTITION CONCURRENTLY

... but that's the only way I can see around it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb array-style subscripting

2015-08-20 Thread Josh Berkus
On 08/20/2015 12:24 PM, Jim Nasby wrote:
> On 8/17/15 4:25 PM, Josh Berkus wrote:
>> On 08/17/2015 02:18 PM, Jim Nasby wrote:
>>> On 8/17/15 3:33 PM, Josh Berkus wrote:
>>>> Again, how do we handle missing keys?  Just return NULL?  or ERROR?
>>>> I'd prefer the former, but there will be arguments the other way.
>>>
>>> I've been wondering if we should add some kind of strict JSON. My big
>>> concern is throwing an error if you try to provide duplicate keys, but
>>> it seems reasonable that json_strict would throw an error if you try to
>>> reference something that doesn't exist.
>> Only if there's demand for it.  Is there?
>
> I'm certainly worried (paranoid?) about it. Postgres is very good about
> not silently dropping data and this seems a glaring departure from that.
> I haven't looked yet but I'm hoping this could at least be added as an
> extension without duplicating a bunch of the existing JSON stuff.

There's a big difference between silently dropping data and implicitly
creating it. As I said, the only reason I could see wanting a strict
mode is because AppDev users expect it to be consistent with their
programming languages.  Otherwise, from a user perspective, being able
to create a whole nested chain in one statement is a big win.

What could be added as an extension?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2015-08-19 Thread Josh Berkus
On 08/19/2015 04:59 AM, Simon Riggs wrote:
> I like the idea of a regular partitioning step because it is how you
> design such tables - lets use monthly partitions.
>
> This gives sanely terse syntax, rather than specifying pages and pages
> of exact values in DDL
>
>    PARTITION BY RANGE ON (columns) INCREMENT BY (INTERVAL '1 month' )
>    START WITH value;

Oh, I like that syntax!

How would it work if there were multiple columns?  Maybe we don't want
to allow that for this form?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Our trial to TPC-DS but optimizer made unreasonable plan

2015-08-19 Thread Josh Berkus
On 08/18/2015 04:40 PM, Qingqing Zhou wrote:
> Attached please find the WIP patch and also the ANALYZE results.
> Notes: the patch may not directly apply to head as some network issue
> here so my Linux box can't talk to git server.

So, one of the things we previously mentioned is that currently many
users deliberately use CTEs as an optimization barrier in order to force
the planner.  Given that, we need some kind of option to force the old
behavior; either SQL syntax or a GUC option.  Otherwise this will cause
a bunch of backwards-compatibility breakage.

Ideas?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Our trial to TPC-DS but optimizer made unreasonable plan

2015-08-19 Thread Josh Berkus
On 08/19/2015 01:32 PM, Tom Lane wrote:
> Josh Berkus <j...@agliodbs.com> writes:
>> On 08/18/2015 04:40 PM, Qingqing Zhou wrote:
>>> Attached please find the WIP patch and also the ANALYZE results.
>>> Notes: the patch may not directly apply to head as some network issue
>>> here so my Linux box can't talk to git server.
>>
>> So, one of the things we previously mentioned is that currently many
>> users deliberately use CTEs as an optimization barrier in order to force
>> the planner.  Given that, we need some kind of option to force the old
>> behavior; either SQL syntax or a GUC option.
>
> I think we already agreed what the syntax would be: ye good olde OFFSET 0
> in the subquery.
>
> We could have a GUC option too if people are sufficiently worried about
> it, but I think that the need for one hasn't really been proven.

Asking users to refactor their applications to add OFFSET 0 is a bit
painful, if we could take care of it via a backwards-compatibility GUC.
 We have many users who are specifically using the CTE optimization
barrier to work around planner failures.
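
To make the kind of refactor concrete, a sketch of the rewrite involved
(table and column names are invented for illustration):

    -- today, the CTE acts as an optimization fence:
    WITH recent AS (
        SELECT * FROM orders
        WHERE order_date > now() - interval '7 days'
    )
    SELECT * FROM recent WHERE customer_id = 42;

    -- the equivalent fence once CTEs are inlined, per the OFFSET 0 idiom:
    SELECT *
    FROM (SELECT * FROM orders
          WHERE order_date > now() - interval '7 days'
          OFFSET 0) AS recent
    WHERE customer_id = 42;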

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2015-08-19 Thread Josh Berkus
On 08/19/2015 01:18 PM, Thom Brown wrote:
> On 19 August 2015 at 21:10, Josh Berkus <j...@agliodbs.com> wrote:
>> On 08/19/2015 04:59 AM, Simon Riggs wrote:
>>> I like the idea of a regular partitioning step because it is how you
>>> design such tables - lets use monthly partitions.
>>>
>>> This gives sanely terse syntax, rather than specifying pages and pages
>>> of exact values in DDL
>>>
>>>    PARTITION BY RANGE ON (columns) INCREMENT BY (INTERVAL '1 month' )
>>>    START WITH value;
>>
>> Oh, I like that syntax!
>>
>> How would it work if there were multiple columns?  Maybe we don't want
>> to allow that for this form?
>
> If we went with that, and had:
>
> CREATE TABLE orders (order_id serial, order_date date, item text)
>   PARTITION BY RANGE ON (order_date) INCREMENT BY (INTERVAL '1 month')
>   START WITH '2015-01-01';
>
> Where would the following go?
>
> INSERT INTO orders (order_date, item) VALUES ('2014-11-12', 'Old item');
>
> Would there automatically be an "others" partition?  Or would it produce
> an error and act like a constraint?

The "others" partition was brought up upthread, as an addition to the
original proposal.  I really think that an "others" partition needs to
be up to the DBA; I've seen apps where they'd want to capture it, and
apps where they'd want such an insert to error.

I, for one, would be OK with a new partitioning which didn't address the
"others" partition issue until 9.7; I see it as a wholly separable
improvement.

Plus, you can always *manually* add high/low catchall partitions.
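
For example, using the FOR VALUES START/END bounds from Amit's proposal
(hypothetical syntax; table name and dates are just to illustrate the
manual catchalls):

    -- everything below the first regular partition
    CREATE TABLE orders_low  PARTITION OF orders FOR VALUES END ('2015-01-01');
    -- everything at or above the last regular partition
    CREATE TABLE orders_high PARTITION OF orders FOR VALUES START ('2017-01-01');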

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2015-08-18 Thread Josh Berkus
Amit,

> I would like propose $SUBJECT for this development cycle. Attached is a
> WIP patch that implements most if not all of what's described below. Some
> yet unaddressed parts are mentioned below, too. I'll add this to the CF-SEP.

First of all, wow!  Really happy to see this.

 
> Syntax
> ======
>
> 1. Creating a partitioned table
>
> CREATE TABLE table_name
> PARTITION BY {RANGE|LIST}
> ON (column_list);
>
> Where column_list consists of simple column names or expressions:
>
> PARTITION BY LIST ON (name)
> PARTITION BY RANGE ON (year, month)
>
> PARTITION BY LIST ON ((lower(left(name, 2)))
> PARTITION BY RANGE ON ((extract(year from d)), (extract(month from d)))

So far so good.  Have you given any thought as to how a user will
determine which partition corresponds to which values (for purposes of
dropping/maintaining a partition)?

Also, won't doing things like extract() for range partitions make it
much harder for you to develop the planner parts of this solution?

What about defining an interval instead, such as:

PARTITION BY RANGE USING ( interval ) ON ( column );

i.e.

PARTITION BY RANGE USING ( INTERVAL '1 month' ) ON ( submitted_date );
PARTITION BY RANGE USING ( 10 ) ON ( user_id );

This would make it easy for you to construct range type values defining
the range of each partition, which would then make the planner work much
easier than calling a function would, no?

Or am I misunderstanding how you're using ranges here? It kind of seems
like you're still leaving specific range definitions up to the user,
which is (from my perspective) unsatisfactory (see below).

I'm assuming that all range partitions will be [ closed, open ) ranges.
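
For illustration, the constraint for each monthly partition could then be
represented with an existing range type (a sketch, not part of the
proposal; the column and dates are hypothetical):

    -- [closed, open) bounds for two consecutive monthly partitions
    SELECT tstzrange('2015-01-01', '2015-02-01', '[)') AS jan_2015,
           tstzrange('2015-02-01', '2015-03-01', '[)') AS feb_2015;

    -- partition routing then becomes a simple containment test
    SELECT tstzrange('2015-01-01', '2015-02-01', '[)') @> '2015-01-15'::timestamptz;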


> 2. Creating a partition of a partitioned table
>
> CREATE TABLE table_name
> PARTITION OF partitioned_table_name
> FOR VALUES values_spec;
>
> Where values_spec is:
>
> listvalues: [IN] (val1, ...)
>
> rangevalues: START (col1min, ... ) END (col1max, ... )
>            | START (col1min, ... )
>            | END (col1max, ... )

So, one thing I missed in here is anything about automated partitioning
of tables; that is, creating new partitions based on incoming data or a
simple statement which doesn't require knowledge of the partitioning
scheme.  It's possible (and entirely acceptable) that you're
considering automated partition creation outside of the scope of this
patch.  However, for range partitions, it would be *really* useful to
have this syntax:

CREATE NEXT PARTITION ON parent_table;

Which would just create the next partition based on whatever the range
partitioning scheme is, instead of requiring the user to calculate start
and end values which might or might not match the parent partitioning
scheme, and might leave gaps.  Also this would be useful for range
partitions:

CREATE PARTITION ON parent_table USING ( start_value );

... where start_value is the start range of the new partition.  Again,
easier for users to get correct.

Both of these require the idea of regular intervals for range
partitions, that is, on a table partitioned by month on a timestamptz
column, each partition will have the range [ month:1, nextmonth:1 ).
This is the most common use-case for range partitions (like, 95% of all
partitioning cases I've seen), so a new partitioning scheme ought to
address it.

While there are certainly users who desire the ability to define
arbitrary ranges for each range partition, these are by far the minority
and could be accommodated by a different path with more complex syntax.
Further, I'd wager that most users who want to define arbitrary ranges
for range partitions aren't going to be satisfied with the other
restrictions on declarative partitioning (e.g. same constraints, columns
for all partitions) and are going to use inheritance partitioning anyway.

> 5. Detach partition
>
> ALTER TABLE partitioned_table
> DETACH PARTITION partition_name [USING table_name]
>
> This removes partition_name as partition of partitioned_table. The table
> continues to exist with the same name or 'table_name', if specified.
> pg_class.relispartition is set to false for the table, so it behaves like
> a normal table.

What about DROPping partitions?  Do they need to be detached first?

> Creating index on parent is not allowed. They should be defined on (leaf)
> partitions. Because of this limitation, primary keys are not allowed on a
> partitioned table. Perhaps, we should be able to just create a dummy
> entry somewhere to represent an index on parent (which every partition
> then copies.)

This would be preferable, yes.  Making users remember to manually create
indexes on each partition is undesirable.

> What should TRUNCATE on partitioned table do?

On the master table?  Truncate all individual partitions.  Do not drop
the partitions.

On a partitition?  Truncate just that partition.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] More WITH

2015-08-17 Thread Josh Berkus

> EXPLAIN [ANALYZE]

Would be tricky.  We don't currently have any way to wrap an EXPLAIN in
any larger statement, do we?  Would be very useful for automated query
analysis, though.

> SHOW

Not very useful, easy to work around (pg_settings).

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb array-style subscripting

2015-08-17 Thread Josh Berkus
On 08/17/2015 02:18 PM, Jim Nasby wrote:
> On 8/17/15 3:33 PM, Josh Berkus wrote:
>> Again, how do we handle missing keys?  Just return NULL?  or ERROR?  I'd
>> prefer the former, but there will be arguments the other way.
>
> I've been wondering if we should add some kind of strict JSON. My big
> concern is throwing an error if you try to provide duplicate keys, but
> it seems reasonable that json_strict would throw an error if you try to
> reference something that doesn't exist.

Only if there's demand for it.  Is there?

>> array/key ambiguity is going to be painful.
>
> JSON keys are required to be strings, so maybe it's OK to differentiate
> based on whether the index is a string or a number. Or perhaps we use
> different nomenclature (ie: {} for objects).

Well, we did get rid of all of those implicit conversions for a reason.
 So maybe that's good enough?  i.e.

json['a']['b'][1] = 5
assign 5 as the first element in the array 'b' of object 'a'

json['a']['b']['1'] = 5
assign 5 to key '1' of object 'b' of object 'a'

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb array-style subscripting

2015-08-17 Thread Josh Berkus
On 08/17/2015 10:57 AM, Dmitry Dolgov wrote:
> Hi,
>
> Some time ago the array-style subscripting for the jsonb data type was
> discussed in this mailing list. I think it will be quite convenient to
> have a such nice syntax to update jsonb objects, so I'm trying to
> implement this. I created a patch, that allows doing something like this:

Yaaay!

> =# create TEMP TABLE test_jsonb_subscript (
>        id int,
>        test_json jsonb
> );
>
> =# insert into test_jsonb_subscript values
> (1, '{}'),
> (2, '{}');
>
> =# update test_jsonb_subscript set test_json['a']['a1']['a2'] = 42;
> =# select * from test_jsonb_subscript;
>  id |         test_json
> ----+---------------------------
>   1 | {"a": {"a1": {"a2": 42}}}
>   2 | {"a": {"a1": {"a2": 42}}}
> (2 rows)

So, both perl and python do not allow deep nesting of assignments.
For example:

>>> d = { "a" : { } }
>>> d["a"]["a1"]["a2"] = 42
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'a1'

... you have to append one key level at a time.  Your approach, on the
other hand, feels more user-friendly to me; I can't tell you the number
of if 'a2' in dic[key] tests I've written.

So, is there any reason why consistency with perl/python behavior would
be more desirable than user-friendliness?  I'm thinking no, but figured
that it's something which needs to come up.

There is one ambiguous case you need to address:

testjson = '{ "a" : { } }'

SET testjson['a']['a1']['1'] = 42

... so in this case, is '1' a key, or the first item of an array?  how
do we determine that? How does the user assign something to an array?
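
One possible precedent: jsonb_set(), added during the 9.5 cycle, faces the
same question and resolves it by the type of the container it is
traversing, treating the path element as an array index only when the
existing value at that level is an array:

    SELECT jsonb_set('{"a": [1, 2, 3]}',   '{a,1}', '9');  -- {"a": [1, 9, 3]}
    SELECT jsonb_set('{"a": {"1": true}}', '{a,1}', '9');  -- {"a": {"1": 9}}

If the subscripting syntax did the same, '1' would be a key or an index
depending on what is already stored there, which still leaves open what to
do when nothing is stored there yet.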

 
> =# select test_json['a']['a1'] from test_jsonb_subscript;
>  test_json
> ------------
>  {"a2": 42}
>  {"a2": 42}
> (2 rows)

Again, how do we handle missing keys?  Just return NULL?  or ERROR?  I'd
prefer the former, but there will be arguments the other way.
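
For what it's worth, the existing extraction operators already set the
precedent of returning NULL rather than erroring on a missing key:

    SELECT '{"a": 1}'::jsonb -> 'b';      -- NULL
    SELECT '{"a": 1}'::jsonb #> '{b,c}';  -- NULL for a missing path, too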

> This patch has a status "work in progress" of course. Generally
> speaking, this implementation extends the `ArrayRef` usage for the jsonb.
> And I need some sort of advice about several questions:
>
> * is it interesting for the community?
> * is that a good idea to extend the `ArrayRef` for jsonb? If it's
> appropriate, probably we can rename it to `ArrayJsonbRef` of something.
> * what can be improved in the code at the top level (function placement,
> probably, functionality duplication, etc.)?
> * are there any special cases, that I should take care of in this
> implementation?

array/key ambiguity is going to be painful.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SCRAM authentication

2015-08-13 Thread Josh Berkus
On 08/12/2015 06:36 PM, Stephen Frost wrote:
> I attempted to address that also by stating that, should an attacker
> compromise a system with the goal of gaining the cleartext password,
> they would attempt the following, in order:
>
> 1) attempt to compromise a superuser account, if not already done, and
> then modify the system to get the 'password' auth mechanism to be used
> whereby the password is sent in the clear
>
> 2) change the existing password, or encourge the user to do so and
> somehow capture that activity
>
> 3) social engineering attacks
>
> 4) attempt to crack the md5 hash
>
> 5) attempt to crack the SCRAM password verifier
>
> 6) try to work out a way to use both the md5 hash and the SCRAM password
> verifier to figure out the password
 

I don't feel like you've correctly assessed the risk inherent in the
md5 auth method, which is that, having captured an md5auth string by
whatever means, an attacker can reuse that md5 string on other
databases in the network *without* cracking it.  That's the biggest risk
as long as md5 is present.

Aside from code complexity, the user-security concern with a
multiple-verifier-per-role approach is that DBAs would never remember to
completely disable md5 auth, leaving attackers free to capture md5 hashes
either in flight or from backups.  A hash captured that way from a poorly
secured, non-critical database can then be re-used against an important
database.

Now, the counter-argument to this is that a DBA is just as likely to
remember to remove md5 verifiers as she is to remember to remove roles
with md5auth.

Regardless of the approach we take, encouraging users to migrate is
going to be more of a matter of documentation, publicity, and
administrative tools than one of multiple verifiers vs. multiple roles.
 That is, giving DBAs the ability to see and log who's using what kind
of verifier, and what account has what verifier(s) available, will make
more of a difference.
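
Today that visibility is limited to eyeballing the verifier format in
pg_authid; something like this rough, superuser-only query is about the
best a DBA can do:

    SELECT rolname,
           CASE WHEN rolpassword IS NULL     THEN 'none'
                WHEN rolpassword LIKE 'md5%' THEN 'md5'
                ELSE 'other' END AS verifier_type
    FROM pg_authid
    WHERE rolcanlogin;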

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SCRAM authentication

2015-08-12 Thread Josh Berkus
On 08/12/2015 01:37 PM, Stephen Frost wrote:
> Would be great to get comments on the other comments, specifically that
> adding SCRAM's password verifier won't seriously change the security of
> a user's account or password based on an attack vector where the
> contents of pg_authid is compromised.  I do agree with the general
> concern that the additional complexity involved in supporting multiple
> password verifiers may result in bugs, and likely security ones, but I
> really expect the larger risk to be from the SCRAM implementation itself
> than how we get data into and back out of our own catalogs.

There's also the concern that the additional complexity will cause
*users* to make security-compromising mistakes, which I think is the
greater risk.  Robert has mostly won me over to his point of view on this.

The only case where I can see multiple verifiers per role making a real
difference in migrations is for PGAAS hosting.  But the folks from
Heroku and AWS have been notably silent on this; lemme ping them.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SCRAM authentication

2015-08-11 Thread Josh Berkus
On 08/11/2015 07:28 AM, Robert Haas wrote:
> There may be a good answer to this question, but I don't think I've
> seen it spelled out clearly.

Please see my follow-up post about making by-login-role migration easier
for users.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SCRAM authentication

2015-08-11 Thread Josh Berkus
On 08/11/2015 09:35 AM, Robert Haas wrote:
> On Tue, Aug 11, 2015 at 12:29 PM, Josh Berkus <j...@agliodbs.com> wrote:
>> On 08/11/2015 07:28 AM, Robert Haas wrote:
>>> There may be a good answer to this question, but I don't think I've
>>> seen it spelled out clearly.
>>
>> Please see my follow-up post about making by-login-role migration easier
>> for users.
>
> I read it, and now I've reread it, but I don't see how it addresses
> the points I raised.

I'm not disagreeing with your security argument, BTW, which is why I'm
trying to come up with ways that make it easy for users to switch to
SCRAM via gradual rollout.

You're suggesting, then, that the switchover should be relatively easy,
because drivers will support both MD5 and SCRAM, and once all drivers
support both, the DBA can just swap verifiers?

That makes sense if drivers go that way.  I'm concerned that some
drivers will have a different call for a SCRAM connection than for an
MD5 one; we'd want to exert our project influence to prevent that from
happening.

That also makes it a bit harder to test the new auth on a few app
servers before a general rollout, but there's ways around that.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SCRAM authentication

2015-08-11 Thread Josh Berkus
On 08/11/2015 10:06 AM, Robert Haas wrote:
> On Tue, Aug 11, 2015 at 12:49 PM, Josh Berkus <j...@agliodbs.com> wrote:
>> That makes sense if drivers go that way.  I'm concerned that some
>> drivers will have a different call for a SCRAM connection than for an
>> MD5 one; we'd want to exert our project influence to prevent that from
>> happening.
>
> I'm not sure that would be a disaster, but do any existing drivers
> have a different call for a cleartext password
> (pg_hba.conf='password') than they do for an MD5 password
> (pg_hba.conf='md5')?  If not, I'm not sure why they'd add that just
> because there is now a third way of doing password-based
> authentication.

Well, there is a different send-and-response cycle to the SCRAM
approach, no?  Plus, I've seen driver authors do strange things in the
past, including PHP's various drivers and pypgsql, which IIRC required
you to manually pick a protocol version.  I'm not saying we should plan
for bad design, we should just get the word out to driver authors that
we think it would be a good idea to support both methods transparently.

 That also makes it a bit harder to test the new auth on a few app
 servers before a general rollout, but there's ways around that.
 
 Well, staging servers are a good idea...

Don't get me started. :-b

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing

2015-08-10 Thread Josh Berkus
Simon,

Thank you for this summary!  I was losing track, myself.

On 08/09/2015 11:03 PM, Simon Riggs wrote:
> Freezing is painful for VLDBs and high transaction rate systems. We have
> a number of proposals to improve things...
>
> 3. Speed up autovacuums when they are triggered to avoid wraparounds (Simon)
> Idea is to do a VACUUM scan which only freezes tuples. If we dirty a
> page from freezing we then also prune it, but don't attempt to scan
> indexes to remove the now-truncated dead tuples.
> This looks very straightforward, no technical issues. Might even be able
> to backpatch it.
> [patch investigated but not finished yet]

There's a lesser version of this item which remains relevant unless we
implement (5).  That is, currently the same autovacuum_vacuum_cost_delay
(AVVD) applies to regular autovacuums as to anti-wraparound
autovacuums.  If the user has set AV with a high delay, this means that
anti-wraparound AV may never complete.  For that reason, we ought to
have a separate parameter for AVVD, which defaults to a lower number
(like 5ms), or even to zero.
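
A sketch of the split, expressed as settings (the second parameter is
purely hypothetical and does not exist today):

    ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '20ms';
    -- hypothetical new knob applied only to wraparound-driven vacuums:
    -- ALTER SYSTEM SET autovacuum_antiwraparound_cost_delay = '5ms';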

Of course, if we implement (5), that's not necessary, since AV will
never trigger an anti-wraparound freeze.

> Having a freeze map would be wholly unnecessary if we don't ever need to
> freeze whole tables again. Freezing would still be needed on individual
> blocks where an old row has been updated or deleted; a freeze map would
> not help there either.
>
> So there is no conflict, but options 2) and 3) are completely redundant
> if we go for 5). After investigation, I now think 5) is achievable in
> 9.6, but if I am wrong for whatever reason, we have 2) as a backstop.

It's not redundant.  Users may still want to freeze for two reasons:

1. to shrink the clog and multixact logs

2. to support INDEX-ONLY SCAN

In both of those cases, having a freeze map would speed up the manual
vacuum freeze considerably.  Otherwise, we're just punting on the
problem, and making it worse for users who wait too long.

Now, it might still be the case that the *overhead* of a freeze map is a
bad tradeoff if we don't have to worry about forced wraparound.  But
that's a different argument.

BTW, has it occured to anyone that implementing XID epoch headers is
going to mean messing with multixact logs again?  Just thought I'd open
a papercut and pour some lemon juice on it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Summary of plans to avoid the annoyance of Freezing

2015-08-10 Thread Josh Berkus
On 08/10/2015 10:31 AM, Simon Riggs wrote:
> Freezing is not a necessary pre-condition for either of those things, I
> am happy to say. There is confusion here because for ( 1 ) the shrink
> was performed after freezing, but when you have access to the epoch
> there is no need for exhaustive freezing - only in special cases, as
> noted. If we are lucky those special cases will mean a massive reduction
> in I/O. For ( 2 ) a normal VACUUM is sufficient and as Robert observes,
> maybe just HOT is enough.

Yeah, saw your explanation on this on the other thread.  Good point.

Question: does regular vacuum update the visibility map in the same way
vacuum freeze does?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SCRAM authentication

2015-08-10 Thread Josh Berkus
On 08/09/2015 07:19 PM, Michael Paquier wrote:
>> ... during my exchange with Michael, I was thinking about the bug
>> potential of taking the password field and multiplexing it in some way,
>> which is significant.  There is a definite risk of making this too
>> complicated and we'll need to contrast that against ease-of-migration,
>> because complicated mechanisms tend to be less secure due to user error.
>
> Sure. That's why I am all in for adding a compatibility GUC or similar
> that enforces the removal of old verifier types after marking those as
> deprecated for a couple of years as there's surely a significant risk
> to keep old passwords around or bad pg_hba entries. Still we need IMO
> a way for a user to save multiple verifiers generated from a client to
> manage carefully the password verifier aging, deprecations and support
> removal.

That still falls under the heading of possibly too complicated though.

As I see it, there's two potential migration paths:

1. Allow multiple verifiers for each login role (Heikki's plan).

2. Set verifier type per login role.

(2) has the advantage of not requiring a bunch of new scaffolding
(although it will require a little) and thus being less likely to
introduce new bugs.  It also doesn't require adding new catalog
structures which are really only needed for the migration period and
which would become a wart afterward (i.e. having multiple verifiers per
login role).

In real migration terms, though, (2) has some major drawbacks in terms
of making migration much harder.

a) it won't work for Heroku and other 1-login-per-database hosting.

b) moving to multiple roles from single roles per app is a painful
process currently.

For (a), one could argue that these are good candidates for all at
once migrations, and that moving to SCRAM will depend on the host
supporting it.  Someone from Heroku could speak up here.

For (b), there are a lot of things we could do to make migrating to a
multiple-role infra much easier for users, which would continue to be
useful even after the migration to SCRAM is history:

* remove the role requirement for ALTER DEFAULT PRIVILEGES, and ...

* add ALTER ROLE ___ ALTER DEFAULT OWNER, a command which sets the
default owner of newly created objects by that login role to a different
role of which they are a member.  Alternatively, add a way to make a
default SET ROLE whenever a login role logs in.

These two changes, or changes like them that serve the same purpose,
would allow us to prescribe the following migration path for most users:

1. add a new login role which is a member of the old login role and uses
SCRAM;

2. set the defaults for that role so that its objects and permissions
belong to the parent role;

3. move all applications to using SCRAM and the new role;

4. disable logins on the old, parent role.

It's currently (2) which is painfully difficult, and could be made less
so via the two features recommended above.
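
Roughly, the prescription would look like this (the ALTER DEFAULT OWNER
command is the hypothetical one proposed above; role names are examples):

    -- 1. new SCRAM-capable login role, member of the old login role
    --    (verifier type per whatever mechanism we settle on)
    CREATE ROLE app_scram LOGIN PASSWORD 'secret' IN ROLE app_legacy;

    -- 2. defaults so its objects and permissions belong to the parent role
    ALTER ROLE app_scram ALTER DEFAULT OWNER TO app_legacy;  -- hypothetical

    -- 3. repoint applications at app_scram, then
    -- 4. disable logins on the old role
    ALTER ROLE app_legacy NOLOGIN;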

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SCRAM authentication

2015-08-09 Thread Josh Berkus
On 08/09/2015 08:09 AM, Robert Haas wrote:
> Why do we need to be able to authenticate using more than one
> mechanism?  If you have some clients that can't support SCRAM yet, you
> might as well continue using MD5 across the board until that changes.
> You're not going to get much real security out of using MD5 for some
> authentication attempts and SCRAM for other ones,

Speaking as someone who has sheperded several clients through
infrastructure upgrades, I have to disagree with this.

First, people don't upgrade large infrastructures with multiple
applications, ETL processes and APIs which connect with the database all
at once.  They do it one component at a time, verify that component is
working, and then move on to the next one.  Even within a single
application, there could be many servers to upgrade, and you can't do
them all simultaneously.

Now, for shops where they've had the foresight to set up group roles
which own objects so that a new user with SCRAM can be assigned in the
group role, this is no problem.  But for the other 98% of our large-app
users, setting up that kind of infrastructure would itself require a
weekend-long downtime, due to the locking required to reassign object
permissions and all of the app testing required.

Second, you're forgetting hosted PostgreSQL, where there may be only one
user available to each database owner.  So assigning a new login role
for SCRAM isn't even an option.

Plus all of the above requires that some login roles have a SCRAM
verifier, and others have MD5, for some period.  Even if we don't
support multiple verifiers for one login, that still means we need to
deal with what verifier gets created for a new role and the required
support functions and GUCs for that.  Switching across the board on a
per-installation basis is a complete nonstarter for any running application.

Frankly, switching on a per-postmaster basis isn't even worth discussing
in my book, because some languages/platforms will take years longer than
others to support SCRAM.

Overall, it's to the PostgreSQL project's benefit to have users switch
to SCRAM once we have it available.  For that reason, we should try to
make it easy for them to make the switch.

However ...

> and the amount of
> infrastructure we're proposing to introduce to support that is pretty
> substantial.

... during my exchange with Michael, I was thinking about the bug
potential of taking the password field and multiplexing it in some way,
which is significant.  There is a definite risk of making this too
complicated and we'll need to contrast that against ease-of-migration,
because complicated mechanisms tend to be less secure due to user error.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Test code is worth the space

2015-08-08 Thread Josh Berkus
On 08/08/2015 12:24 PM, Peter Geoghegan wrote:
> I think that there needs to be a way of running an extended set of
> regression tests. I could definitely respect the desire for minimalism
> when it comes to adding tests to the regression tests proper if there
> was an extended set of tests that could be run during development less
> frequently. I thought about doing the extended set as a satellite
> project, but that may not be workable.

There already is, isn't there?  All of those named sets of regression
tests which aren't run by default.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SCRAM authentication

2015-08-08 Thread Josh Berkus
On 08/08/2015 10:23 AM, Heikki Linnakangas wrote:
> On 08/08/2015 04:27 PM, Robert Haas wrote:
>> I don't see that there's any good reason to allow the same password to
>> be stored in the catalog encrypted more than one way,
>
> Sure there is. If you want to be able to authenticate using different
> mechanism, you need the same password encrypted in different ways.
> SCRAM uses verifier that's derived from the password in one way, MD5
> authentication needs an MD5 hash, and yet other protocols have other
> requirements.

That's correct.  However, one of the goals of implementing SCRAM
authentication is to allow security-conscious users to get rid of those
reusable md5 hashes, no?

Obviously the backwards-compatibility issues are pretty major ... it'll
be years before all drivers support SCRAM ... but we also want to
provide a path forwards for secure installations in which no md5 hashes
are stored.

This says backwards-compatible GUC to me.  Here's one idea on how to
handle this:

1. we drop the parameter password_encryption

2. we add the parameter password_storage, which takes a list:
   - plain : plain text
   - md5 : current md5 hashes
   - scram : new scram hashed passwords
   This defaults to 'md5, scram' if not specified.
   This list might be extended in the future.

3. All password types in the list are generated.  This means having
multiple columns in pg_shadow, or an array.  An array would support the
addition of future password storage methods.

4. CREATE ROLE / ALTER ROLE syntax is changed to accept a parameter to
ENCRYPTED in order to support md5, scram, and future methods.  If no
parameter is supplied, ENCRYPTED will default to 'md5, scram'.

5. we add the superuser-only function pg_apply_password_policy().  This
applies the policy expressed by password_storage, generating or erasing
passwords for each user.

6. We add a new connection error for authentication __method__ not
supported for user

7. Two versions from now, we change the defaults.

I thought about the idea of determining password storage based on what's
in pg_hba.conf, but that seems like way too much implied authorization
to me, and liable to be a big foot-gun.
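
Put together, items (2), (4) and (5) might look something like this (all
hypothetical syntax, sketched only to show the shape of the proposal):

    -- postgresql.conf
    --   password_storage = 'md5, scram'

    -- CREATE/ALTER ROLE with an explicit method list for ENCRYPTED
    CREATE ROLE app_user LOGIN ENCRYPTED ('md5', 'scram') PASSWORD 'secret';

    -- regenerate/erase stored verifiers to match the current policy
    SELECT pg_apply_password_policy();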

--Josh Berkus

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SCRAM authentication

2015-08-08 Thread Josh Berkus
On 08/08/2015 03:21 PM, Michael Paquier wrote:
> On Sun, Aug 9, 2015 at 6:51 AM, Josh Berkus <j...@agliodbs.com> wrote:
>> 1. we drop the parameter password_encryption
>> 2. we add the parameter password_storage, which takes a list:
>>    - plain : plain text
>>    - md5 : current md5 hashes
>>    - scram : new scram hashed passwords
>>    This defaults to 'md5, scram' if not specified.
>>    This list might be extended in the future.
>
> Perhaps using a different GUC than password_encryption is safer... I
> am not that sure. Still that's how I switched password_encryption to
> actually handle a list. Default is 'md5' with the first patch, and
> 'md5,scram' with the scram patch added and it sets the list of
> password verifiers created when PASSWORD with ENCRYPTED/UNENCRYPTED is
> used.

Well, generally I feel like if we're going to change the *type* of a GUC
parameter, we ought to change the *name*.  It's far easier for users to
figure out that the contents of a parameter need to change if the name
is also changed.

In other words, I think invalid parameter 'password_encryption' is an
easier to understand error message than invalid password_encryption
type 'on'.  Besides which, password_encryption was always a misnomer.

Unless you're going to still accept 'on'/'off' in some kind of weird
backwards-compatibility mode?  If so, how does that work?

> Like password ENCRYPTED (md5,scram) or similar? If no method is
> passed, I think that we should default to password_storage instead.

Makes sense.

> Also, I still think that something like PASSWORD VERIFIERS is needed,
> users may want to set the verifier user for each method after
> calculating it on client-side: we authorize that for md5 even now, and
> that's not something this spec authorizes.

I don't follow this.  Mind you, I'm not sure that I need to.

>> 5. we add the superuser-only function pg_apply_password_policy().  This
>> applies the policy expressed by password_storage, generating or erasing
>> passwords for each user.
>
> pg_upgrade could make use of that to control password aging with an
> option to do the cleanup or not. Not sure what the default should be
> though. pg_apply_password_policy(roleid) would be useful as well to do
> it on a role base.

No objections to an optional roleid parameter, if you think people will
use it.

>> 6. We add a new connection error for authentication __method__ not
>> supported for user
>
> Hm? This would let any user trying to connect with a given method know
> that if a method is used or not. What's wrong with failing as we do
> now. In case of PASSWORD NULL for example, an attempt of connection
> fails all the time with incorrect password or similar.

So, the DBA sets password_storage = 'scram', but doesn't take the md5
lines out of pg_hba.conf.

The app dev tries to connect using a driver which only supports md5.
What error should they get?  A user/DBA who is getting invalid
password is going to spend a long time debugging it.  Also, it would be
very useful to have a distinctive error in the log, so that DBAs could
see who is *trying* to connect with the wrong verifier.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows

2015-08-07 Thread Josh Berkus
Petr,

Just user-tested SYSTEM_ROWS and SYSTEM_TIME.  They work as expected.
Useful!
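
For anyone else who wants to try them, something along these lines
exercises both (they ship as the tsm_system_rows and tsm_system_time
contrib modules; the table is whatever you have handy):

  CREATE EXTENSION tsm_system_rows;
  CREATE EXTENSION tsm_system_time;

  -- up to 100 rows, regardless of table size
  SELECT count(*) FROM some_table TABLESAMPLE SYSTEM_ROWS(100);

  -- however many rows can be read in about 20 milliseconds
  SELECT count(*) FROM some_table TABLESAMPLE SYSTEM_TIME(20);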

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows

2015-08-06 Thread Josh Berkus
On 08/06/2015 01:10 PM, Simon Riggs wrote:
 Given a user-stated probability P of accessing a block, and N total
 blocks, there are a few ways to implement block sampling.
 
 1. Test P for each block individually. This gives a range of possible
 results, with 0 blocks being a possible outcome, though decreasing in
 probability as P increases for fixed N. This is the same way BERNOULLI
 works; we just do it for blocks rather than rows.
 
 2. We calculate P/N at the start of the scan and deliver this number of
 blocks by random selection from the N available blocks.
 
 At present we do (1), exactly as documented. (2) is slightly harder
 since we'd need to track which blocks have been selected already so we
 can use a random selection with no replacement algorithm. On a table
 with uneven distribution of rows this would still return a variable
 sample size, so it didn't seem worth changing.

Aha, thanks!

So, seems like this is just a doc issue? That is, we just need to
document that using SYSTEM on very small sample sizes may return
unexpected numbers of results ... and maybe also how the algorithm
actually works.
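
A doc example along these lines would probably get the point across
(table name and numbers are mine, purely illustrative):

  -- 100 small rows fit in a single block, so SYSTEM (50) effectively
  -- flips one 50% coin: about half the runs return all 100 rows and
  -- the other half return none
  CREATE TABLE tiny AS SELECT g AS id FROM generate_series(1, 100) g;
  SELECT count(*) FROM tiny TABLESAMPLE SYSTEM (50);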

I agree that implementing (2) makes more sense as an additional
algorithm for someone to write in the future.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows

2015-08-06 Thread Josh Berkus
On 08/06/2015 01:14 PM, Josh Berkus wrote:
 On 08/06/2015 01:10 PM, Simon Riggs wrote:
 Given a user-stated probability P of accessing a block, and N total
 blocks, there are a few ways to implement block sampling.

 1. Test P for each block individually. This gives a range of possible
 results, with 0 blocks being a possible outcome, though decreasing in
 probability as P increases for fixed N. This is the same way BERNOULLI
 works; we just do it for blocks rather than rows.

 2. We calculate P/N at the start of the scan and deliver this number of
 blocks by random selection from the N available blocks.

 At present we do (1), exactly as documented. (2) is slightly harder
 since we'd need to track which blocks have been selected already so we
 can use a random selection with no replacement algorithm. On a table
 with uneven distribution of rows this would still return a variable
 sample size, so it didn't seem worth changing.
 
 Aha, thanks!
 
 So, seems like this is just a doc issue? That is, we just need to
 document that using SYSTEM on very small sample sizes may return
 unexpected numbers of results ... and maybe also how the algorithm
 actually works.

Following up on this ... where is TABLESAMPLE documented other than in
the SELECT command?  Doc search on the website is having issues right
now.  I'm happy to write a doc patch.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

