date:20140617

Re: [HACKERS] gaussian distribution pgbench

2014-06-17 Thread Mitsumasa KONDO

Hello Fabien-san,

I have checked your v13 patch, and tested the new exponential distribution
generating algorithm. It works fine and less or no overhead than previous
version.
Great work! And I agree with your proposal.

And I'm also interested in your "decile percents" output like under
followings,

> [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=20
> ~
> decile percents: 86.5% 11.7% 1.6% 0.2% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%
> ~
> [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=10
> ~
> decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
> ~
> [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=5
> ~
> decile percents: 39.6% 24.0% 14.6% 8.8% 5.4% 3.3% 2.0% 1.2% 0.7% 0.4%
> ~

I think that it is easy to understand exponential distribution when I check
the exponential parameter. I also agree with it. So I create decile
percents output
 in gaussian distribution.
Here are the examples.

> [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --gaussian=20
> ~
> decile percents: 0.0% 0.0% 0.0% 0.0% 50.0% 50.0% 0.0% 0.0% 0.0% 0.0%
> ~
> [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --gaussian=10
> ~
> decile percents: 0.0% 0.0% 0.0% 2.3% 47.7% 47.7% 2.3% 0.0% 0.0% 0.0%
> ~
> [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --gaussian=5
> ~
> decile percents: 0.0% 0.1% 2.1% 13.6% 34.1% 34.1% 13.6% 2.1% 0.1% 0.0%

I think that it is easier than before. Sum of decile percents is just 100%.


However, I don't prefer "highest/lowest percentage" because it will be
confused
 with decile percentage for users, and anyone cannot understand this
digits.

Here is example when sets exponential=5,
> [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=5
> ~
> decile percents: 39.6% 24.0% 14.6% 8.8% 5.4% 3.3% 2.0% 1.2% 0.7% 0.4%
> highest/lowest percent of the range: 4.9% 0.0%
> ~

I cannot understand "4.9%, 0.0%" when I see the first time.
Then, I checked the source code, I understood it:( It's not good design...
#Why this parameter use 100?
So I'd like to remove it if you like. It will be more simple.

Attached patch is fixed version, please confirm it.
#Of course, World Cup is being held now. I'm not hurry at all.

Best regards,
-- 
Mitsumasa KONDO
*** a/contrib/pgbench/pgbench.c
--- b/contrib/pgbench/pgbench.c
***
*** 41,46 
--- 41,47 
  #include 
  #include 
  #include 
+ #include 
  #ifdef HAVE_SYS_SELECT_H
  #include 
  #endif
***
*** 98,103  static int	pthread_join(pthread_t th, void **thread_return);
--- 99,106 
  #define LOG_STEP_SECONDS	5	/* seconds between log messages */
  #define DEFAULT_NXACTS	10		/* default nxacts */
  
+ #define MIN_GAUSSIAN_THRESHOLD		2.0	/* minimum threshold for gauss */
+ 
  int			nxacts = 0;			/* number of transactions per client */
  int			duration = 0;		/* duration in seconds */
  
***
*** 171,176  bool		is_connect;			/* establish connection for each transaction */
--- 174,187 
  bool		is_latencies;		/* report per-command latencies */
  int			main_pid;			/* main process id used in log filename */
  
+ /* gaussian distribution tests: */
+ double		stdev_threshold;   /* standard deviation threshold */
+ booluse_gaussian = false;
+ 
+ /* exponential distribution tests: */
+ double		exp_threshold;   /* threshold for exponential */
+ bool		use_exponential = false;
+ 
  char	   *pghost = "";
  char	   *pgport = "";
  char	   *login = NULL;
***
*** 332,337  static char *select_only = {
--- 343,430 
  	"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
  };
  
+ /* --exponential case */
+ static char *exponential_tpc_b = {
+ 	"\\set nbranches " CppAsString2(nbranches) " * :scale\n"
+ 	"\\set ntellers " CppAsString2(ntellers) " * :scale\n"
+ 	"\\set naccounts " CppAsString2(naccounts) " * :scale\n"
+ 	"\\setrandom aid 1 :naccounts exponential :exp_threshold\n"
+ 	"\\setrandom bid 1 :nbranches\n"
+ 	"\\setrandom tid 1 :ntellers\n"
+ 	"\\setrandom delta -5000 5000\n"
+ 	"BEGIN;\n"
+ 	"UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;\n"
+ 	"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
+ 	"UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;\n"
+ 	"UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;\n"
+ 	"INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);\n"
+ 	"END;\n"
+ };
+ 
+ /* --exponential with -N case */
+ static char *exponential_simple_update = {
+ 	"\\set nbranches " CppAsString2(nbranches) " * :scale\n"
+ 	"\\set ntellers " CppAsString2(ntellers) " * :scale\n"
+ 	"\\set naccounts " CppAsString2(naccounts) " * :scale\n"
+ 	"\\setrandom aid 1 :naccounts exponential :exp_threshold\n"
+ 	"\\setrandom bid 1 :nbranches\n"
+ 	"\\setrandom tid 1 :ntellers\n"
+ 	"\\setrandom delta -5000 5000\n"
+ 	"BEGIN;\n"
+ 	"UPDATE pgbench_accounts SET abalance = a

Re: [HACKERS] Built-in support for a memory consumption ulimit?

2014-06-17 Thread Tom Lane

Amit Kapila  writes:
> On Wed, Jun 18, 2014 at 2:09 AM, Tom Lane  wrote:
>> On the other hand, this approach would entirely fail to account for
>> non-palloc'd allocations, which could be a significant issue in some
>> contexts.

> Won't it be possible if we convert malloc calls in backend code to
> go through wrapper, we already have some precedents of same like
> guc_malloc, pg_malloc?

We do not have control over mallocs done by third-party code
(think pl/perl for example).  mallocs done by our own code are
fairly insignificant, I would hope.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] postgresql.auto.conf read from wrong directory

2014-06-17 Thread Abhijit Menon-Sen

At 2014-06-18 12:55:36 +0900, masao.fu...@gmail.com wrote:
>
> Also I'm thinking to backpatch this to 9.4 because this is a bugfix
> rather than new feature.

That sounds reasonable, thanks.

-- Abhijit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] postgresql.auto.conf read from wrong directory

2014-06-17 Thread Fujii Masao

On Tue, Jun 17, 2014 at 2:08 PM, Abhijit Menon-Sen  wrote:
> At 2014-06-17 09:49:35 +0530, amit.kapil...@gmail.com wrote:
>>
>> By above do you mean that the patch should allow GUC_NOT_IN_SAMPLE or
>> something else, if earlier then I have removed it as per comment from
>> Fujji-san.
>
> Yes, what you've done in v3 of the patch is what I meant. I've marked
> this as ready for committer now.

Barring objections, I will commit this patch.

Also I'm thinking to backpatch this to 9.4 because this is a bugfix rather
than new feature.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat directory and pg_stat_statements

2014-06-17 Thread Fujii Masao

On Tue, Jun 17, 2014 at 2:11 PM, Shigeru Hanada
 wrote:
> Fujii-san,
>
> I agree not to backpatch, but I noticed that the 9.3 document about
> stats collector doesn't mention $PGDATA/pg_stat.
>
> http://www.postgresql.org/docs/current/static/monitoring-stats.html
>
> It just says:
>> When the server shuts down, a permanent copy of the statistics data is 
>> stored in the global subdirectory, so that statistics can be retained across 
>> server restarts.
>
> I'm not sure whether we should modify the 9.3 document at the moment,
> but actually the description made me confused a bit.

Could you check 8dc90b9c4c45fa30a8f59313e21d294529ef7182 in REL9_3_STABLE
branch? You can see that I added the description of pg_stat directory
into the document
in 9.3.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Built-in support for a memory consumption ulimit?

2014-06-17 Thread Amit Kapila

On Wed, Jun 18, 2014 at 2:09 AM, Tom Lane  wrote:
>
> Robert Haas  writes:
> > We could do better by accounting for memory usage ourselves, inside
> > the memory-context system, but that'd probably impose some overhead we
> > don't have today.
>
> Hm.  We could minimize the overhead if we just accounted for entire
> malloc chunks and not individual palloc allocations.  That would make
> the overhead not zero, but yet probably small enough to ignore.
>
> On the other hand, this approach would entirely fail to account for
> non-palloc'd allocations, which could be a significant issue in some
> contexts.

Won't it be possible if we convert malloc calls in backend code to
go through wrapper, we already have some precedents of same like
guc_malloc, pg_malloc?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] How to implement the skip errors for copy from ?

2014-06-17 Thread xbzhang

Use subtraction is very inefficient, a project called pg_bulkload support 
the?skip errors ,and it does not useing subtraction. It?performance is very 
good.??So I want to imitate?pg_bulkload?to?implementation?skip errors of 
copy.if i do the following thing to copy :1. disable all of trigger of table;2. 
only skip the following errors:? ? ?* format error of tuple;? ? ?* check 
constraint?violation;? ? ?* unique or primary key?constraint?violation;? ? 
other errors will be abort current transcation except above three errors.3. 
?bad tuple will be deleted and the?resource owner per tuples is reseased?? ? 
when?a unique key?violation, ?xmax of the tuple will?be marked to current? ? 
transaction id as it?not be seen as good, and all of index?entries?of the bad? 
? tuple will?be real delete at next VACUUM.Is it right to?skip errors for copy 
from?

From:?Tom LaneDate:?2014-06-17?21:40To:?xbzhangCC:?Amit Kapila; Pavel Stehule; 
Alvaro Herrera; pgsql-hackersSubject:?Re: [HACKERS] How to implement the skip 
errors for copy from ?xbzhang  writes:
> LWlocks can record in resource owner per tuples, so they can be released��at 
> rigth way, but the memory allocated on memory context��is one problem.Are 
> there any others problems?
?
See AbortSubTransaction(), CleanupSubTransaction(), and the rather large
number of subroutines they call.? Almost everything that code does is
connected to cleaning up something that might have been left unfinished
after an elog(ERROR) took control away in the middle of some code
sequence.
?
In addition, you can't just wave your hands and presto the bad tuple is
not there anymore.? For example, the failure might have been a unique key
violation in some index or other.? Not only is the bad tuple already on
disk, but possibly so are index entries for it in other indexes.? In
general the only way to get rid of those index entries is a VACUUM.
So you really have to have a subtransaction whose XID is what you mark
the new tuple with, and then rolling back the subtransaction is what
causes the new tuple to not be seen as good.? (Actually getting rid of
it will be left for the next VACUUM.)
?
regards, tom lane
?
?
?
?
-
???
:AVG - www.avg.com
??:2013.0.3480 / ?:3955/7691 - :06/17/14
?

Re: [HACKERS] BUG #10680 - ldapbindpasswd leaks to postgresql log

2014-06-17 Thread Tom Lane

Steven Siebert  writes:
> Attached is a proposed patch for BUG #10680.

> It's a simple fix to the problem of the ldapbindpasswd leaking in
> clear text to the postgresql log.  The patch simply removes the raw
> pg_hba.conf line from the log message, but retains the log line number
> to assist admins in troubleshooting.

You haven't exactly explained why this is a problem.  The proposed patch
would impede diagnosing of many other problems, so it's not going to get
committed without a thoroughly compelling rationale.

Hint: "I don't store my postmaster log securely" is not compelling.
We've been over that ground before; there are far too many reasons
why access to the postmaster log is a potential security hazard
to justify concluding that this particular one is worse.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-17 Thread Stephen Frost

* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Jun 16, 2014 at 1:15 AM, Stephen Frost  wrote:
> > I understand that there are performance implications.  As mentioned to
> > Tom, realistically, there's zero way to optimized at least some of these
> > use-cases because they require a completely external module (eg:
> > SELlinux) to be involved in the decision about who can view what
> > records.  If we can optimize that, it'd be by a completely different
> > approach whereby we pull up the qual higher because we know the whole
> > query only involves leakproof functions or similar, allowing us to only
> > apply the filter to the final set of records prior to them being
> > returned to the user.  The point being that such optimizations would
> > happen independently and regardless of the quals or user-defined
> > functions involved.  At the end of the day, I can't think of a better
> > optimization for such a case (where we have to ask an external security
> > module if a row is acceptable to return to the user) than that.  Is
> > there something specific you're thinking about that we'd be missing out
> > on?
> 
> Yeah, if we have to ask an external security module a question for
> each row, there's little hope of any real optimization.  However, I
> think there will be a significant number of cases where people will
> want filtering clauses that can be realized by doing an index scan
> instead of a sequential scan, and if we end up forcing a sequential
> scan anyway, the feature will be useless to those people.

I agree that we want to support that, if we can do so reasonably.  What
I was trying to get at is simply this- don't we provide that already
with the leakproof attribute and functions?  If we don't have enough
there to allow index scans then we should be looking to add more, I'm
thinking.

> > Perhaps it's just my experience, but I've been focused on the main core
> > feature for quite some time and it feels like we're really close to
> > having it there.  I agree that a few additional bits would be nice to
> > have but these strike me as relatively straight-forward to implement
> > overtop of this general construct.  I do see value in documenting these
> > concerns and will see about making that happen, along with what the
> > general viewpoints and thoughts are about how to address the concern.
> 
> I feel like there's quite a bit of work left to do around these
> issues.  The technical bits may not be too hard, but deciding what we
> want will take some thought and discussion.

I agree on this point, but I'm still hopeful that we'll be able to get a
good feature into 9.5.  There are quite a few resources available for
the 'just programming' part, so the long pole in the tent here is
absolutely hashing out what we want and how it should function.

I'd be happy to host or participate in a conference call or similar if
that would be useful to move this along- or we can continue to
communicate via email.  There's a bit of a lull in conferences to which
I'm going to right now, so in person is unlikely, unless folks want to
get together somewhere on the east coast (I'd be happy to travel to
Philly, Pittsburgh, NYC, etc, if it'd help..).

> > That's not actualy true today, is it?  Given our leak-proof attribute,
> > if the qual is "WHERE somecomplexfunction() AND leakprooffunctionx()"
> > then we would be able to push down the leak-proof function and not
> > necessairly run a straight sequential scan, no?  Even so, though, we've
> > had users who have tested exactly what this patch implements and they've
> > been happy with their real-world use-cases.  I'm certainly all for
> > optimization and would love to see us make this better for everyone, but
> > I don't view that as a reason to delay this particular feature which is
> > really just bringing us up to parity with other RDMBS's.
> 
> I'm a bit confused here, because your example seems to be totally
> different from my example.  In my example, somecomplexfunction() will
> get pushed down because it's the security qual; that needs to be
> inside the security_barrier view, or a malicious user can subvert the
> system by getting some other qual evaluated first.  In your example,
> you seem to be imagining WHERE somecomplexfunction() AND
> leakprooffunctionx() as queries sent by the untrusted user, in which
> case, yet, the leak-proof one will get pushed down and the other one
> will not.

Right- my point there was that the leakproof one might allow an index
scan to be run.  This is all pretty hand-wavey, I admit, so I'll see if
I can get more details about how the currently-proposed patch is
performing for the users who are testing it and what kind of plans
they're seeing.  If that falls through, I'll try and build up my own set
of realistic-looking (to myself and the users who are testing) example.

> > What solution did you come up with for this case, which performed well
> > and was also secure..?
> 
> I put the logic in the client.  :-(

Wel

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-17 Thread Stephen Frost

* Robert Haas (robertmh...@gmail.com) wrote:
> On Fri, Jun 13, 2014 at 3:11 AM, Dean Rasheed  
> wrote:
> > Yeah, I was thinking something like this could work, but I would go
> > further. Suppose you had separate GRANTable privileges for direct
> > access to individual tables, bypassing RLS, e.g.
> >
> >   GRANT DIRECT SELECT|INSERT|UPDATE|DELETE ON table_name TO role_name
> 
> So, is this one new privilege (DIRECT) or four separate new privileges
> that are variants of the existing privileges (DIRECT SELECT, DIRECT
> INSERT, DIRECT UPDATE, DIRECT DELETE)?

I had taken it to be a single privilege, but you're right, it could be
done for each of those..  I really don't think we have the bits for more
than one case here though (if that) without a fair bit of additional
rework.  I'm not against that rework (and called for it wayyy back when
I proposed the TRUNCATE privilege, as I recall) but that's a whole
different challenge and no small bit of work..

> > Actually, given the fact that the majority of users won't be using
> > RLS, I would be tempted to invert the above logic and have the new
> > privilege be for LIMITED access (via RLS quals). So a user granted the
> > normal SELECT privilege would be able to bypass RLS, but a user only
> > granted LIMITED SELECT wouldn't.
> 
> Well, for the people who are not using RLS, there's no difference
> anyway.  I think it matters more what users of RLS will expect from a
> command like GRANT SELECT ... and I'm guessing they'll prefer that RLS
> always apply unless they very specifically grant the right for RLS to
> not apply.  I might be wrong, though.

The preference from the folks using RLS that I've talked to is
absolutely that it be applied by default for all 'normal' (eg:
non-pg_dump) sessions.

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-17 Thread Stephen Frost

* Robert Haas (robertmh...@gmail.com) wrote:
> On Thu, Jun 12, 2014 at 8:13 PM, Stephen Frost  wrote:
> > This approach was suggested by an existing user testing out this RLS
> > approach, to be fair, but it looks pretty sane to me as a way to address
> > some of these concerns. Certainly open to other ideas and thoughts though.
> 
> In general, I agree that this is a good approach.  I think it will be
> difficult to have a GUC with three values, one of which is
> superuser-only and the other two of which are not.  I don't think
> there's any precedent for something like that in the existing
> framework, and I think it's likely we'll run into unpleasant corner
> cases if we try to graft it in.  Also, I think we need to separate
> things: whether the system is willing to allow the user to access the
> table without RLS, and whether the user is willing to accept RLS if
> the system deems it necessary.

Good point- I agree that it's best to avoid having to support individual
superuser-only only options on a GUC.  Also, addressing the issues
independently also makes sense to me.

> For the first one, two solutions have been proposed.  The initial
> proposal was to insist on RLS except for the superuser (and maybe the
> table owner?).  Having a separate grantable privilege, as Dean
> suggests, may be better.  I'll reply separately to that email also, as
> I have a question about what he's proposing.

I like the idea of a grantable privilege as it allows the granularity
that some users may require (or be frustrated that we don't have it).

> For the second one, I think the two most useful behaviors are "normal
> mode" - i.e. allow access to the table, applying RLS predicates if
> required and not applying them if I am exempt - and "error-instead"
> mode - i.e. if my access to this table would be mediated by an RLS
> predicate, then throw an error instead.  

Right, makes sense.

> There's a third mode which
> might be useful as well, which is "even though I have the *right* to
> bypass the RLS predicate on this table, please apply the predicate
> anyway".  This could be used by the table owner in testing, for
> example.  

Agreed, this sounds very useful too.

> Here again, the level of granularity we want to provide is
> an interesting question.  Having a GUC (e.g. enable_row_level_security
> = on, off, force) would be adequate for pg_dump, but allowing the
> table name to be qualified in the query, as proposed downthread, would
> be more granular, admittedly at some parser cost.  I'm personally of
> the view that we *at least* need the GUC, because that seems like the
> best way to secure pg_dump, and perhaps other applications.  We can
> and should give pg_dump an--allow-row-level-security flag, I think,
> but pg_dump's default behavior should be to configure the system in
> such a way that the dump will fail rather than complete with a subset
> of the data.  

This sounds good to me.

> I'm less sure whether we should have something that can
> be used to qualify table names in particular queries.  I think it
> would be really useful, but I'm concerned that it will require
> creating additional fully-reserved keywords, which are somewhat
> painful for users.

I've been trying to think of the use-case for this.  It certainly
*sounds* nice, but on reflection, the use-case for this seems to me to
be that you're trying to develop some application which will be
constrained by RLS totally and therefore want to flip back-and-forth
between "RLS on" and "RLS off" (for the tables involved).  When would
you really need, in the same query, to have RLS enabled for table X but
disabled for table Y?  I do like the idea of an *independent* option to
(just) COPY which says "give me all the data or error, independent of
the GUC for the same purpose".  Would be curious to hear what others
think of that proposal.

Thanks,

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-17 Thread Brightwell, Adam

Robert,

However, I believe that
> there has been a lack of focus in the development of the patch thus
> far in a couple of key areas - first in terms of articulating how it
> is different from and better than a writeable security barrier view,
> and second on how to manage the security and operational aspects of
> having a feature like this.  I think that the discussion subsequent to
> my June 10th email has let to some good discussion on both points,
> which was my intent, but I still think much more time and thought
> needs to be spent on those issues if we are to have a feature which is
> up to our usual standards.  I do apologize to anyone who interpreted
> that initial as a pure rant, because it really wasn't intended that
> way.  Contrariwise, I hope that the people defending this patch will
> admit that the issues I am raising are real and focus on whether and
> how those concerns can be addressed.


I absolutely appreciate all of the feedback that has been provided.  It has
been educational.  To your point above, I started putting together a wiki
page, as Stephen has spoken to, that is meant to capture these concerns and
considerations as well as to capture ideas around solutions.

https://wiki.postgresql.org/wiki/Row_Security_Considerations

This page is obviously not complete, but I think it is a good start.
Hopefully this document will help to continue the conversation and assist
in addressing all the concerns that have been brought to the table.  As
well, I hope that this document serves to demonstrate our intent and that
we *are* taking these concerns seriously.  I assure you that as one of the
individuals who is working towards the acceptance of this feature/patch, I
am very much concerned about meeting the expected standards of quality and
security.

Thanks,
Adam

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-06-17 Thread Abhijit Menon-Sen

At 2014-06-17 15:31:33 -0300, klaussfre...@gmail.com wrote:
>
> On Tue, Jun 17, 2014 at 8:47 AM, Abhijit Menon-Sen  
> wrote:
> > if (compress_backup_block = BACKUP_BLOCK_COMPRESSION_SNAPPY)
> 
> You mean == right?

Of course. Thanks.

-- Abhijit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] BUG #10680 - ldapbindpasswd leaks to postgresql log

2014-06-17 Thread Steven Siebert

Hello,

Attached is a proposed patch for BUG #10680.

It's a simple fix to the problem of the ldapbindpasswd leaking in
clear text to the postgresql log.  The patch simply removes the raw
pg_hba.conf line from the log message, but retains the log line number
to assist admins in troubleshooting.

The patch is against the master branch and compiles/tests green.

Please let me know if there is anything I can do to get this worked
into the next (or perhaps current?) commit fest.  This is a critical
issue for us to meet government accreditation (security) requirements.

Thanks,

Steve


bug_10680_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-17 Thread Peter Geoghegan

On Tue, Jun 17, 2014 at 5:18 PM, Robert Haas  wrote:
> What bash does is annoying and stupid, and any time I find a system
> with that obnoxious behavior enabled I immediately disable it, so I
> don't consider that a good precedent for anything.

I happen to totally agree with you here. Bash sometimes does awful
things with its completion.

>> Another issue is whether to print only those having exactly the minimum
>> observed Levenshtein distance, or to print everything less than some
>> cutoff.  The former approach seems to me to be placing a great deal of
>> faith in something that's only a heuristic.
>
> Well, we've got lots of heuristics.  Many of them serve us quite well.
> I might do something like this:
>
> (1) Set the maximum levenshtein distance to half the length of the
> string, rounded down, but not more than 3.
> (2) If there are more than 2 matches, reduce the maximum distance by 1
> and repeat this step.
> (3) If there are no remaining matches, print no hint; else print the 1
> or 2 matching items.

I could do that. I can prepare a revision if others feel that's
acceptable. My only concern with this is that a more sophisticated
scheme implies more clutter in the parser, although it should not
imply wasted cycles.

What I particularly wanted to avoid in our choice of completion scheme
is doing nothing because there is an ambiguity about what is best,
which Tom suggested. In practice, that ambiguity will frequently be
something that our users will not care about, and not really see as an
ambiguity, as in my "o.orderid or ol.orderid?" example. However, if
there are 3 equally distant Vars, and not just 2, that's very probably
because none are useful, and so we really ought to show nothing. This
seems most sensible.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal for CSN based snapshots

2014-06-17 Thread Craig Ringer

On 06/18/2014 12:41 AM, Robert Haas wrote:
> On Mon, Jun 16, 2014 at 12:58 AM, Craig Ringer  wrote:
>> > On 05/30/2014 11:14 PM, Heikki Linnakangas wrote:
>>> >> Yeah. To recap, the failure mode is that if the master crashes and
>>> >> restarts, the transaction becomes visible in the master even though it
>>> >> was never replicated.
>> >
>> > Wouldn't another pg_clog bit for the transaction be able to sort that out?
> How?

A flag to indicate that the tx is locally committed but hasn't been
confirmed by a streaming synchronous replica, so it must not become
visible until the replica confirms it or SR is disabled.

Then scan pg_clog on start / replica connect and ask the replica to
confirm local commit for those tx's.

No?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] btreecheck extension

2014-06-17 Thread Peter Geoghegan

On Tue, Jun 17, 2014 at 5:11 PM, Robert Haas  wrote:
> I think there's something to be said for that, but I think at the
> moment I like the idea of a functional interface better.  The reason
> is that I'm not sure we can predict all of the checks we're going to
> want to add.

That's true. Clearly this is very hand-wavy, and as I said I think
it's appropriate that the tool evolve in response to our testing
requirements, but I would like to figure out a way to have
verification tooling available for some kind of semi-standard tests,
possibly even including the standard regression tests. It's probably
too early to discuss, though.

> Now, we could.  We could come up with an extensible syntax, like this:
>
> CHECK relation [ USING { checktype [ '(' arg [, ...] '}' [, ...] ];

That's what I had in mind. Using the same trick that you came up with
for EXPLAIN, to avoid grammar bloat and let the am figure out for
itself what to name the various check types, with a generic default
check.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_control is missing a field for LOBLKSIZE

2014-06-17 Thread Bruce Momjian

On Tue, Jun 17, 2014 at 07:12:16PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Tue, Jun 17, 2014 at 03:55:02PM -0400, Alvaro Herrera wrote:
> >> Can't you compare it to the historic default value?  I mean, add an
> >> assumption that people thus far has never tweaked it.
> 
> > Well, if they did tweak it, then they would be unable to use pg_upgrade
> > because it would complain about a mismatch if they actually matched the
> > old and new servers.
> 
> What about comparing to the symbolic value LOBLKSIZE?  This would make
> pg_upgrade assume that the old installation had been tweaked the same
> as in its own build.  This ends up being the same as what you said,
> ie, effectively no comparison ... but it might be less complicated to
> code/understand.

OK, assume the compiled-in default is the value for an old cluster that
has no value --- yeah, I could do that.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [REVIEW] psql tab completion for DROP TRIGGER/RULE and ALTER TABLE ... DISABLE/ENABLE

2014-06-17 Thread Andreas Karlsson


On 06/18/2014 02:34 AM, Ian Barwick wrote:

On 14/06/18 7:51, Andreas Karlsson wrote:

On 06/17/2014 01:36 PM, Ian Barwick wrote:

One issue - the table's internal triggers will also be listed. which can
result in
something like this:

This is a bit of an extreme case, but I don't think manually manipulating
internal triggers (which can only be done as a superuser) is a common
enough
operation to justify their inclusion. I suggest adding
'AND tgisinternal is FALSE' to 'Query_for_trigger_of_table' to hide them.


Good suggestion. I have attached a patch which filters out the internal 
triggers,
both for ALTER TABLE and DROP TRIGGER. I am not entirely sure about the DROP 
TRIGGER
case but I think I prefer no auto completion of RI triggers.


Thanks, looks good. Another reason for not autocompleting RI triggers is that
the names are all auto-generated; on the offchance you are manually manipulating
them individually, you'd have to have a pretty good idea of which ones you're
working with anyway.


Thanks, could you update the status of the patch in the commitfest app 
to "Ready for Committer"?



Personally I think this patch could go into 9.4, as it's not introducing any
new features and doesn't depend on any 9.5 syntax.


I think the feature freeze is strict to avoid having to think about what 
is an exception and what is not.


--
Andreas Karlsson


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [REVIEW] psql tab completion for DROP TRIGGER/RULE and ALTER TABLE ... DISABLE/ENABLE

2014-06-17 Thread Ian Barwick

On 14/06/18 7:51, Andreas Karlsson wrote:
> On 06/17/2014 01:36 PM, Ian Barwick wrote:
>> One issue - the table's internal triggers will also be listed. which can
>> result in
>> something like this:
>>
>> This is a bit of an extreme case, but I don't think manually manipulating
>> internal triggers (which can only be done as a superuser) is a common
>> enough
>> operation to justify their inclusion. I suggest adding
>> 'AND tgisinternal is FALSE' to 'Query_for_trigger_of_table' to hide them.
> 
> Good suggestion. I have attached a patch which filters out the internal 
> triggers, 
> both for ALTER TABLE and DROP TRIGGER. I am not entirely sure about the DROP 
> TRIGGER
> case but I think I prefer no auto completion of RI triggers.

Thanks, looks good. Another reason for not autocompleting RI triggers is that
the names are all auto-generated; on the offchance you are manually manipulating
them individually, you'd have to have a pretty good idea of which ones you're
working with anyway.

Personally I think this patch could go into 9.4, as it's not introducing any
new features and doesn't depend on any 9.5 syntax.

Regards

Ian Barwick

-- 
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-17 Thread Robert Haas

On Tue, Jun 17, 2014 at 5:36 PM, Tom Lane  wrote:
> Josh Berkus  writes:
>> (2) If there are multiple columns with the same levenschtien distance,
>> which one do you suggest?  The current code picks a random one, which
>> I'm OK with.  The other option would be to list all of the columns.
>
> I objected to that upthread.  I don't think that picking a random one is
> sane at all.  Listing them all might be OK (I notice that that seems to be
> what both bash and git do).

What bash does is annoying and stupid, and any time I find a system
with that obnoxious behavior enabled I immediately disable it, so I
don't consider that a good precedent for anything.  I think what the
bash algorithm demonstrates is that while it may be sane to list more
than one option, listing 10 or 20 or 150 is unbearably obnoxious.
Filling the user's *entire terminal window* with a list of suggestions
when they make a minor typo is more like a punishment than an aid.
git's behavior of limiting itself to one or two options, while
somewhat useless, is at least not annoying.

> Another issue is whether to print only those having exactly the minimum
> observed Levenshtein distance, or to print everything less than some
> cutoff.  The former approach seems to me to be placing a great deal of
> faith in something that's only a heuristic.

Well, we've got lots of heuristics.  Many of them serve us quite well.
I might do something like this:

(1) Set the maximum levenshtein distance to half the length of the
string, rounded down, but not more than 3.
(2) If there are more than 2 matches, reduce the maximum distance by 1
and repeat this step.
(3) If there are no remaining matches, print no hint; else print the 1
or 2 matching items.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] comparison operators

2014-06-17 Thread Andrew Dunstan



On 06/17/2014 07:25 PM, Andres Freund wrote:

On 2014-06-17 19:22:07 -0400, Tom Lane wrote:

Andrew Dunstan  writes:

I went to have a look at documenting the jsonb comparison operators, and
found that the docs on comparison operators contain this:
 Comparison operators are available for all relevant data types.
They neglect to specify further, however. This doesn't seem very
satisfactory. How is a user to know which are relevant? I know they are
not available for xml and json, but are for jsonb. Just talking about
"all relevant types" seems rather hand-wavy.

Well, there are 38 default btree opclasses in the standard system ATM.
Are we worried enough about this to list them all explicitly?  Given the
lack of complaints to date, I'm not.

However, if we try to fudge it by saying something like "available for
all data types for which there is a natural linear order", I'm not
sure that that's 100% true; and it's certainly not complete, since
for instance jsonb's ordering is rather artificial, and the area-based
orderings of the built-in geometric types are even more so.

It's not true for e.g. xid (which is rather annoying btw).




I think I'd rather just say "for many data types" or something along 
those lines, rather than imply that there is some obvious rule that 
users should be able to intuit.


For json/jsonb I think I'll just add a para saying we have them for 
jsonb and not for json.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] btreecheck extension

2014-06-17 Thread Robert Haas

On Tue, Jun 17, 2014 at 5:10 PM, Peter Geoghegan  wrote:
> On Tue, Jun 17, 2014 at 1:16 PM, Robert Haas  wrote:
>> I don't feel qualified to comment on any of the substantive issues you
>> raise, so instead I'd like to bikeshed the name.  I suggest that we
>> create one extension to be a repository for index-checking machinery
>> (and perhaps also heap-checking machinery) and view this as the first
>> of possibly several checkers to live there.  Otherwise, we may
>> eventually end up with separate extensions for btreecheck, hashcheck,
>> gistcheck, gincheck, spgistcheck, minmaxcheck, vodkacheck, heapcheck,
>> toastcheck, etc. which seems like excessive namespace pollution.
>
> I agree.
>
> I hope that we'll eventually be able to move code like this into each
> AM, with something like an amverifyintegrity pg_am entry optionally
> provided. There'd also be a utility statement that would perform this
> kind of verification. It seems natural to do this, as the patch I've
> posted arguably adds a big modularity violation. Besides, it seems
> worthwhile to pepper the regular regression tests with calls like
> these, at least in some places, and putting something in core is the
> most convenient way to do that.

I think there's something to be said for that, but I think at the
moment I like the idea of a functional interface better.  The reason
is that I'm not sure we can predict all of the checks we're going to
want to add.  For example, maybe somebody will come up with another
btree checker that's different from your btree checker, and maybe
there will be good reasons not to merge the two - e.g. different
locking requirements, or different runtimes, or whatever.  Even more
likely, I think there will be things we want to do that fall under the
broad umbrella of integrity checking that we just can't predict now:
scan the table and check whether all the keys are present in every
index, scan the TOAST table and make sure that all the chunks are the
right size, etc.  If we have a bunch of functions for this sort of
thing, it's easy and relatively uncontroversial to add more.  If we
put it in core and give it real grammar support, then we've got to
fight with keywords and contemplate grammar bloat and, basically, I
think every change will be likely to get a higher level of scrutiny.
I'd rather not go there.

Now, we could.  We could come up with an extensible syntax, like this:

CHECK relation [ USING { checktype [ '(' arg [, ...] '}' [, ...] ];

But frankly I'm kind of uncompelled by that.  This isn't a feature
that seems to me to really need to be in core.  It doesn't
particularly need grammar support, and it doesn't need WAL logging,
and not everyone needs it at all, and especially if we eventually end
up with a robust suite of tools in this area, not everyone may even
want it, if it means a bigger installed footprint or more security
concerns to worry about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Minmax indexes

2014-06-17 Thread Greg Stark

On Tue, Jun 17, 2014 at 11:16 AM, Andres Freund  wrote:
> Well, it might be doable to correlate them along one axis, but along
> both?  That's more complicated... And even alongside one axis you
> already get into problems if your geometries are irregularly sized.
> Asingle large polygon will completely destroy indexability for anything
> stored physically close by because suddently the minmax range will be
> huge... So you'll need to cleverly sort for that as well.

I think there's a misunderstanding here, possibly mine. My
understanding is that a min/max index will always be exactly the same
size for a given size table. It stores the minimum and maximum value
of the key for each page. Then you can do a bitmap scan by comparing
the search key with each page's minimum and maximum to see if that
page needs to be included in the scan. The failure mode is not that
the index is large but that a page that has an outlier will be
included in every scan as a false positive incurring an extra iop.

I don't think it's implausible at all that Geometric data would work
well. If you load Geometric data it's very common to load data by
geographic area so that all objects in San Francisco in one part of
the data load, probably even by zip code or census block.

What operations would an opclass for min/max need? I think it would be
pretty similar to the operators that GiST needs (thankfully minus the
baroque page split function):

An aggregate to generate a min/max "bounding box" from several values
A function which takes an "bounding box" and a new value and returns
the new "bounding box"
A function which tests if a value is in a "bounding box"
A function which tests if a "bounding box" overlaps a "bounding box"

The nice thing is this would let us add things like range @> (contains
element) to the plain integer min/max case.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] comparison operators

2014-06-17 Thread Andres Freund

On 2014-06-17 19:22:07 -0400, Tom Lane wrote:
> Andrew Dunstan  writes:
> > I went to have a look at documenting the jsonb comparison operators, and 
> > found that the docs on comparison operators contain this:
> 
> > Comparison operators are available for all relevant data types.
> 
> > They neglect to specify further, however. This doesn't seem very 
> > satisfactory. How is a user to know which are relevant? I know they are 
> > not available for xml and json, but are for jsonb. Just talking about 
> > "all relevant types" seems rather hand-wavy.
> 
> Well, there are 38 default btree opclasses in the standard system ATM.
> Are we worried enough about this to list them all explicitly?  Given the
> lack of complaints to date, I'm not.
> 
> However, if we try to fudge it by saying something like "available for
> all data types for which there is a natural linear order", I'm not
> sure that that's 100% true; and it's certainly not complete, since
> for instance jsonb's ordering is rather artificial, and the area-based
> orderings of the built-in geometric types are even more so.

It's not true for e.g. xid (which is rather annoying btw).

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] comparison operators

2014-06-17 Thread Tom Lane

Andrew Dunstan  writes:
> I went to have a look at documenting the jsonb comparison operators, and 
> found that the docs on comparison operators contain this:

> Comparison operators are available for all relevant data types.

> They neglect to specify further, however. This doesn't seem very 
> satisfactory. How is a user to know which are relevant? I know they are 
> not available for xml and json, but are for jsonb. Just talking about 
> "all relevant types" seems rather hand-wavy.

Well, there are 38 default btree opclasses in the standard system ATM.
Are we worried enough about this to list them all explicitly?  Given the
lack of complaints to date, I'm not.

However, if we try to fudge it by saying something like "available for
all data types for which there is a natural linear order", I'm not
sure that that's 100% true; and it's certainly not complete, since
for instance jsonb's ordering is rather artificial, and the area-based
orderings of the built-in geometric types are even more so.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: [BUGS] BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby

2014-06-17 Thread Jeff Janes

(Redirected from Bugs to Hackers, as it seems not to be the same bug
as the original one.)

On Mon, Jun 9, 2014 at 3:50 PM, Alvaro Herrera  wrote:
> Jeff Janes wrote:
>
>> This patch seems to solve a problem I've also been having with non-existent
>> "pg_multixact/members/13D35" files in my testing of torn page write and fk
>> locks again recent 9.4.  However, sometimes I've found the problem even
>> before multi wrap around occured, and my reading of this thread suggests
>> that the current issue would not show up in that case, so I don't know how
>> much comfort to take in the knowledge that the problem seems to have gone
>> away--as they say, things that go away by themselves without explanation
>> also tend to come back by themselves.
>
> I think this problem is rather related to the code touched by commit
> 722acf51a0d074d19782ad7e97ebe3fdb63fbb87, which is about removing
> unneeded files in pg_multixact/members.  I checked your files, and it
> seems that pg_multixact/members files that should still exist are gone;
> for instance, pg_controldata says that the oldest multi is 2586314407,
> but if I run
> select * from pg_get_multixact_members('that value')
> I get an error saying that pg_multixact/members/218B does not exist.
>
> It seems that there are a lot of files that were improperly removed.
> I expect that existing files should be sequential, but there are a lot
> of holes in a listing of pg_multixact/members.

I think the holes are because a truncation was in progress and
interrupted by a 'pg_ctl stop -mi' before it completed. The files are
considered in an apparently random order, so if it is interrupted
midway through it will leave a collection of holes that would have
been closed up had it been allowed to continue.

I instrumented the code to log file deletion (attached patch) and I
see that the it starts complaining about the absence of 5183 just a
fraction of a second after it was deleted, so the deletion must have
been inappropriate.

12538  2014-06-17 12:10:02.925 PDT:LOG:  JJ deleting 0xb66b20 5183
12498 UPDATE 2014-06-17 12:10:03.188 PDT:DETAIL:  Could not open file
"pg_multixact/members/5183": No such file or directory.
12561 UPDATE 2014-06-17 12:10:04.473 PDT:DETAIL:  Could not open file
"pg_multixact/members/5183": No such file or directory.
12572 UPDATE 2014-06-17 12:10:04.475 PDT:DETAIL:  Could not open file
"pg_multixact/members/5183": No such file or directory.

This problem was initially fairly easy to reproduce, but since I
started adding instrumentation specifically to catch it, it has become
devilishly hard to reproduce.

I think my next step will be to also log each of the values which goes
into the complex if (...) expression that decides on the deletion.

Cheers,

Jeff

member_delete_log.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] comparison operators

2014-06-17 Thread Andrew Dunstan



I went to have a look at documenting the jsonb comparison operators, and 
found that the docs on comparison operators contain this:


   Comparison operators are available for all relevant data types.

They neglect to specify further, however. This doesn't seem very 
satisfactory. How is a user to know which are relevant? I know they are 
not available for xml and json, but are for jsonb. Just talking about 
"all relevant types" seems rather hand-wavy.


Thoughts?

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_control is missing a field for LOBLKSIZE

2014-06-17 Thread Tom Lane

Bruce Momjian  writes:
> On Tue, Jun 17, 2014 at 03:55:02PM -0400, Alvaro Herrera wrote:
>> Can't you compare it to the historic default value?  I mean, add an
>> assumption that people thus far has never tweaked it.

> Well, if they did tweak it, then they would be unable to use pg_upgrade
> because it would complain about a mismatch if they actually matched the
> old and new servers.

What about comparing to the symbolic value LOBLKSIZE?  This would make
pg_upgrade assume that the old installation had been tweaked the same
as in its own build.  This ends up being the same as what you said,
ie, effectively no comparison ... but it might be less complicated to
code/understand.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Atomics hardware support table & supported architectures

2014-06-17 Thread Alvaro Herrera

Kevin Grittner wrote:
> Andres Freund  wrote:

> The release of version n doesn't mean that version n-1 is no longer
> supported.  As of today, this page:
> 
> http://www.oracle.com/technetwork/server-storage/solaris10/overview/index-138972.html
> 
> says:
> 
> > The Oracle Solaris 9 operating system is now in the Extended
> > Support period of it's life cycle. Customers with support
> > contracts can still access patches and log new service requests,
> > although support contracts carry a surcharge for Oracle Solaris 9
> > support.

Most likely, people will not be installing Postgres 9.5 in that
platform, though, which is what matters for the purposes of the proposal
at hand.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_control is missing a field for LOBLKSIZE

2014-06-17 Thread Bruce Momjian

On Tue, Jun 17, 2014 at 03:55:02PM -0400, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > On Tue, Jun 17, 2014 at 12:28:46PM -0400, Tom Lane wrote:
> > > Bruce Momjian  writes:
> > > > Uh, I think pg_upgrade needs to check that they match too.
> > > 
> > > Possibly.  What do you think it should do when examining a pg_control
> > > version that lacks the field?
> > 
> > Good question.  I have existing cases where fields were removed, but not
> > ones that were added.  As we have no way to query the old cluster's
> > value for LOBLKSIZE, I think I will just add code to compare them if
> > they _both_ exist.
> 
> Can't you compare it to the historic default value?  I mean, add an
> assumption that people thus far has never tweaked it.

Well, if they did tweak it, then they would be unable to use pg_upgrade
because it would complain about a mismatch if they actually matched the
old and new servers.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: [REVIEW] psql tab completion for DROP TRIGGER/RULE and ALTER TABLE ... DISABLE/ENABLE

2014-06-17 Thread Andreas Karlsson


On 06/17/2014 01:36 PM, Ian Barwick wrote:

Thanks for this patch; I'm playing around with rules at the moment and
it was
very useful. A quick review:

- applies cleanly to HEAD

- does what it claims, i.e. adds tab completion support for this syntax:

 ALTER TABLE table { ENABLE | DISABLE } [ ALWAYS | REPLICA ] { RULE
| TRIGGER } rule_or_trigger
 DROP TRIGGER trigger ON relation { CASCADE | RESTRICT }
 DROP RULE rule ON relation { CASCADE | RESTRICT }

- code style is consistent with the project style


Thanks for the review.


One issue - the table's internal triggers will also be listed. which can
result in
something like this:

This is a bit of an extreme case, but I don't think manually manipulating
internal triggers (which can only be done as a superuser) is a common
enough
operation to justify their inclusion. I suggest adding
'AND tgisinternal is FALSE' to 'Query_for_trigger_of_table' to hide them.


Good suggestion. I have attached a patch which filters out the internal 
triggers, both for ALTER TABLE and DROP TRIGGER. I am not entirely sure 
about the DROP TRIGGER case but I think I prefer no auto completion of 
RI triggers.


--
Andreas Karlsson

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
new file mode 100644
index 3bb727f..be5c3c5
*** a/src/bin/psql/tab-complete.c
--- b/src/bin/psql/tab-complete.c
*** static const SchemaQuery Query_for_list_
*** 626,631 
--- 626,639 
  " WHERE pg_catalog.quote_ident(conname)='%s')"
  
  /* the silly-looking length condition is just to eat up the current word */
+ #define Query_for_rule_of_table \
+ "SELECT pg_catalog.quote_ident(rulename) "\
+ "  FROM pg_catalog.pg_class c1, pg_catalog.pg_rewrite "\
+ " WHERE c1.oid=ev_class and (%d = pg_catalog.length('%s'))"\
+ "   and pg_catalog.quote_ident(c1.relname)='%s'"\
+ "   and pg_catalog.pg_table_is_visible(c1.oid)"
+ 
+ /* the silly-looking length condition is just to eat up the current word */
  #define Query_for_list_of_tables_for_rule \
  "SELECT pg_catalog.quote_ident(relname) "\
  "  FROM pg_catalog.pg_class"\
*** static const SchemaQuery Query_for_list_
*** 635,640 
--- 643,657 
  " WHERE pg_catalog.quote_ident(rulename)='%s')"
  
  /* the silly-looking length condition is just to eat up the current word */
+ #define Query_for_trigger_of_table \
+ "SELECT pg_catalog.quote_ident(tgname) "\
+ "  FROM pg_catalog.pg_class c1, pg_catalog.pg_trigger "\
+ " WHERE c1.oid=tgrelid and (%d = pg_catalog.length('%s'))"\
+ "   and pg_catalog.quote_ident(c1.relname)='%s'"\
+ "   and pg_catalog.pg_table_is_visible(c1.oid)"\
+ "   and not tgisinternal"
+ 
+ /* the silly-looking length condition is just to eat up the current word */
  #define Query_for_list_of_tables_for_trigger \
  "SELECT pg_catalog.quote_ident(relname) "\
  "  FROM pg_catalog.pg_class"\
*** static const pgsql_thing_t words_after_c
*** 774,780 
  	{"TEMP", NULL, NULL, THING_NO_DROP},		/* for CREATE TEMP TABLE ... */
  	{"TEMPLATE", Query_for_list_of_ts_templates, NULL, THING_NO_SHOW},
  	{"TEXT SEARCH", NULL, NULL},
! 	{"TRIGGER", "SELECT pg_catalog.quote_ident(tgname) FROM pg_catalog.pg_trigger WHERE substring(pg_catalog.quote_ident(tgname),1,%d)='%s'"},
  	{"TYPE", NULL, &Query_for_list_of_datatypes},
  	{"UNIQUE", NULL, NULL, THING_NO_DROP},		/* for CREATE UNIQUE INDEX ... */
  	{"UNLOGGED", NULL, NULL, THING_NO_DROP},	/* for CREATE UNLOGGED TABLE
--- 791,797 
  	{"TEMP", NULL, NULL, THING_NO_DROP},		/* for CREATE TEMP TABLE ... */
  	{"TEMPLATE", Query_for_list_of_ts_templates, NULL, THING_NO_SHOW},
  	{"TEXT SEARCH", NULL, NULL},
! 	{"TRIGGER", "SELECT pg_catalog.quote_ident(tgname) FROM pg_catalog.pg_trigger WHERE substring(pg_catalog.quote_ident(tgname),1,%d)='%s' AND NOT tgisinternal"},
  	{"TYPE", NULL, &Query_for_list_of_datatypes},
  	{"UNIQUE", NULL, NULL, THING_NO_DROP},		/* for CREATE UNIQUE INDEX ... */
  	{"UNLOGGED", NULL, NULL, THING_NO_DROP},	/* for CREATE UNLOGGED TABLE
*** psql_completion(const char *text, int st
*** 1409,1414 
--- 1426,1463 
  
  		COMPLETE_WITH_LIST(list_ALTERENABLE2);
  	}
+ 	else if (pg_strcasecmp(prev5_wd, "ALTER") == 0 &&
+ 			 pg_strcasecmp(prev4_wd, "TABLE") == 0 &&
+ 			 pg_strcasecmp(prev2_wd, "ENABLE") == 0 &&
+ 			 pg_strcasecmp(prev_wd, "RULE") == 0)
+ 	{
+ 		completion_info_charp = prev3_wd;
+ 		COMPLETE_WITH_QUERY(Query_for_rule_of_table);
+ 	}
+ 	else if (pg_strcasecmp(prev6_wd, "ALTER") == 0 &&
+ 			 pg_strcasecmp(prev5_wd, "TABLE") == 0 &&
+ 			 pg_strcasecmp(prev3_wd, "ENABLE") == 0 &&
+ 			 pg_strcasecmp(prev_wd, "RULE") == 0)
+ 	{
+ 		completion_info_charp = prev4_wd;
+ 		COMPLETE_WITH_QUERY(Query_for_rule_of_table);
+ 	}
+ 	else if (pg_strcasecmp(prev5_wd, "ALTER") == 0 &&
+ 			 pg_strcasecmp(prev4_wd, "TABLE") == 0 &&
+ 			 pg_strcasecmp(prev2_wd, "ENABLE") == 0 &&
+ 			 pg_strcasecmp(prev_wd, "TRIGGER") == 0)
+ 	{
+ 		completion_info_charp = p

Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-06-17 Thread MauMau


From: "Joe Conway" 

That first hunk refers to dblink -- I'm guessing it does not belong with
this patch.


Ouch, what a careless mistake.  The attached one is clean.  I'll update the 
CommitFest entry when I'm back home from work.


Regards
MauMau



pg_copy_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Atomics hardware support table & supported architectures

2014-06-17 Thread Petr Jelinek


On 17/06/14 22:32, Kevin Grittner wrote:

The release of version n doesn't mean that version n-1 is no longer
supported.  As of today, this page:

http://www.oracle.com/technetwork/server-storage/solaris10/overview/index-138972.html

says:


The Oracle Solaris 9 operating system is now in the Extended
Support period of it's life cycle. Customers with support
contracts can still access patches and log new service requests,
although support contracts carry a surcharge for Oracle Solaris 9
support.




That will change in October when the Extended Support ends and Solaris 9 
enters Sustaining Support (same as 8 is in now).



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-17 Thread Gavin Flower


On 18/06/14 10:05, Peter Geoghegan wrote:

On Tue, Jun 17, 2014 at 2:53 PM, Tom Lane  wrote:

I'm not proposing an immutable cutoff.  Something that scales with the
string length might be a good idea, or we could make it a multiple of
the minimum observed distance, or probably there are a dozen other things
we could do.  I'm just saying that if we have an alternative at distance
3, and another one at distance 4, it's not clear to me that we should
assume that the first one is certainly what the user had in mind.
Especially not if all the other alternatives are distance 10 or more.

The patch just looks for the match with the lowest distance, passing
the lowest observed distance so far as a "max" to the distance
calculation function. That could have some value in certain cases.
People have already raised general concerns about added cycles and/or
clutter.


How about a list of miss spellings and the likely targets.
  (grop, grap, ...) ==> (grep, grape, grope...)
type of thing? Possibly with some kind of adaptive learning algorithm.

I suspect that while this might be a useful research project, it is out 
of scope for the current discussion!



Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-17 Thread Peter Geoghegan

On Tue, Jun 17, 2014 at 2:53 PM, Tom Lane  wrote:
> I'm not proposing an immutable cutoff.  Something that scales with the
> string length might be a good idea, or we could make it a multiple of
> the minimum observed distance, or probably there are a dozen other things
> we could do.  I'm just saying that if we have an alternative at distance
> 3, and another one at distance 4, it's not clear to me that we should
> assume that the first one is certainly what the user had in mind.
> Especially not if all the other alternatives are distance 10 or more.

The patch just looks for the match with the lowest distance, passing
the lowest observed distance so far as a "max" to the distance
calculation function. That could have some value in certain cases.
People have already raised general concerns about added cycles and/or
clutter.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Atomics hardware support table & supported architectures

2014-06-17 Thread Andres Freund

On 2014-06-17 13:32:51 -0700, Kevin Grittner wrote:
> Andres Freund  wrote:
> > On 2014-06-17 13:14:26 -0400, Robert Haas wrote:
> >> On Sat, Jun 14, 2014 at 9:12 PM, Andres Freund  
> >> wrote:
> 
> >>> 3) sparcv8: Last released model 1997.
> >>
> >> I seem to recall hearing about this in a customer situation
> >> relatively recently, so there may be a few of these still
> >> kicking around out there.
> >
> > Really? As I'd written in a reply solaris 10 (released 2005)
> > dropped support for it. Dropping support for a platform that's
> > been desupported 10 years ago by it's manufacturer doesn't sound
> > bad imo...
> 
> The release of version n doesn't mean that version n-1 is no longer
> supported.  As of today, this page:
> 
> http://www.oracle.com/technetwork/server-storage/solaris10/overview/index-138972.html
> 
> says:
> 
> > The Oracle Solaris 9 operating system is now in the Extended
> > Support period of it's life cycle. Customers with support
> > contracts can still access patches and log new service requests,
> > although support contracts carry a surcharge for Oracle Solaris 9
> > support.

Yea, I know - for another 3 months or so. But that's not actually my
point. It's rather that I think that 10 years after the vendor itself
dropped support for an old cpu version in their new major software
release we can think about dropping it too. In our newest release, not
the old ones.
To be clear I *don't* mean that we shouldn't support solaris 9
anymore. Just that we don't need to support <= 1995 CPUs. That's pre
ultrasparc...

Note that freebsd, debian
(https://www.debian.org/devel/debian-installer/News/2007/20070721),
fedora (http://fedoraproject.org/wiki/Architectures/SPARC), gentoo,
firefox, and lots of others all don't have sparcv8 support anymore.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-17 Thread Josh Berkus

On 06/17/2014 02:53 PM, Tom Lane wrote:
> Josh Berkus  writes:
>> On 06/17/2014 02:36 PM, Tom Lane wrote:
>>> Another issue is whether to print only those having exactly the minimum
>>> observed Levenshtein distance, or to print everything less than some
>>> cutoff.  The former approach seems to me to be placing a great deal of
>>> faith in something that's only a heuristic.
> 
>> Well, that depends on what the cutoff is.  If it's high, like 0.5, that
>> could be a LOT of columns.  Like, I plan to test this feature with a
>> 3-table join that has a combined 300 columns.  I can completely imagine
>> coming up with a string which is within 0.5 or even 0.3 of 40 columns names.
> 
> I think Levenshtein distances are integers, though that's just a minor
> point.

I was giving distance/length ratios.  That is, 0.5 would mean that up to
50% of the characters could be replaced/changed.  0.2 would mean that
only one character could be changed at lengths of five characters.  Etc.

The problem with these ratios is that they behave differently with long
strings than short ones.  I think realistically we'd need a double
threshold, i.e. ( distance >= 2 OR ratio <= 0.4 ).  Otherwise the
obvious case, getting two characters wrong in a 4-character column name
(or one in a two character name), doesn't get a HINT.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-17 Thread Tom Lane

Josh Berkus  writes:
> On 06/17/2014 02:36 PM, Tom Lane wrote:
>> Another issue is whether to print only those having exactly the minimum
>> observed Levenshtein distance, or to print everything less than some
>> cutoff.  The former approach seems to me to be placing a great deal of
>> faith in something that's only a heuristic.

> Well, that depends on what the cutoff is.  If it's high, like 0.5, that
> could be a LOT of columns.  Like, I plan to test this feature with a
> 3-table join that has a combined 300 columns.  I can completely imagine
> coming up with a string which is within 0.5 or even 0.3 of 40 columns names.

I think Levenshtein distances are integers, though that's just a minor
point.

> So if we want to list everything below a cutoff, we'd need to make that
> cutoff fairly narrow, like 0.2.  But that means we'd miss a lot of
> potential matches on short column names.

I'm not proposing an immutable cutoff.  Something that scales with the
string length might be a good idea, or we could make it a multiple of
the minimum observed distance, or probably there are a dozen other things
we could do.  I'm just saying that if we have an alternative at distance
3, and another one at distance 4, it's not clear to me that we should
assume that the first one is certainly what the user had in mind.
Especially not if all the other alternatives are distance 10 or more.

> I really think we're overthinking this: it is just a HINT, and we can
> improve it in future PostgreSQL versions, and most of our users will
> ignore it anyway because they'll be using a client which doesn't display
> HINTs.

Agreed that we can make it better later.  But whether it prints exactly
one suggestion, and whether it does that no matter how silly the
suggestion is, are rather fundamental decisions.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-17 Thread Tom Lane

Peter Geoghegan  writes:
> Maybe that's just a matter of phrasing the message appropriately. A
> more guarded message, that suggests that "foobar" is the *best* match
> is correct at least on its own terms (terms that are self evident).
> This does pretty effectively communicate to the user that they should
> totally rethink not just the column name, but perhaps the entire
> query. On the other hand, showing nothing communicates nothing.

I don't especially buy that argument.  As soon as the user's gotten used
to hints of this sort, the absence of a hint communicates plenty.

In any case, people have now cited two different systems with suggestion
capability, and neither of them behaves as you're arguing for.  The lack
of precedent should give you pause, unless you can point to widely-used
systems that do what you have in mind.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-17 Thread Josh Berkus

On 06/17/2014 02:36 PM, Tom Lane wrote:
> Josh Berkus  writes:
>> (2) If there are multiple columns with the same levenschtien distance,
>> which one do you suggest?  The current code picks a random one, which
>> I'm OK with.  The other option would be to list all of the columns.
> 
> I objected to that upthread.  I don't think that picking a random one is
> sane at all.  Listing them all might be OK (I notice that that seems to be
> what both bash and git do).
> 
> Another issue is whether to print only those having exactly the minimum
> observed Levenshtein distance, or to print everything less than some
> cutoff.  The former approach seems to me to be placing a great deal of
> faith in something that's only a heuristic.

Well, that depends on what the cutoff is.  If it's high, like 0.5, that
could be a LOT of columns.  Like, I plan to test this feature with a
3-table join that has a combined 300 columns.  I can completely imagine
coming up with a string which is within 0.5 or even 0.3 of 40 columns names.

So if we want to list everything below a cutoff, we'd need to make that
cutoff fairly narrow, like 0.2.  But that means we'd miss a lot of
potential matches on short column names.

I really think we're overthinking this: it is just a HINT, and we can
improve it in future PostgreSQL versions, and most of our users will
ignore it anyway because they'll be using a client which doesn't display
HINTs.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-06-17 Thread Michael Paquier

On Wed, Jun 18, 2014 at 1:40 AM, Robert Haas  wrote:
> On Mon, Jun 2, 2014 at 8:55 AM, Michael Paquier
>  wrote:
> I'm not sure if this is reasonably possible, but one thing that would
> make this tool a whole lot easier to use would be if you could make
> all the magic happen in a single server.  For example, suppose you had
> a background process that somehow got access to the pre and post
> images for every buffer change, and the associated WAL record, and
> tried applying the WAL record to the pre-image to see whether it got
> the corresponding post-image.  Then you could run 'make check' or so
> and afterwards do something like psql -c 'SELECT * FROM
> wal_replay_problems()' and hopefully get no rows back.
So your point is to have a 3rd independent server in the process that
would compare images taken from a master and its standby? Seems to
complicate the machinery.

> Don't get me wrong, having this tool at all sounds great.  But I think
> to really get the full benefit out of it we need to be able to run it
> in the buildfarm, so that if people break stuff it gets noticed
> quickly.
The patch I sent has included a regression test suite making the tests
rather facilitated: that's only a matter of running actually "make
check" in the contrib repository containing the binary able to compare
buffer captures between a master and a standby.

Thanks,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-17 Thread Tom Lane

Josh Berkus  writes:
> (2) If there are multiple columns with the same levenschtien distance,
> which one do you suggest?  The current code picks a random one, which
> I'm OK with.  The other option would be to list all of the columns.

I objected to that upthread.  I don't think that picking a random one is
sane at all.  Listing them all might be OK (I notice that that seems to be
what both bash and git do).

Another issue is whether to print only those having exactly the minimum
observed Levenshtein distance, or to print everything less than some
cutoff.  The former approach seems to me to be placing a great deal of
faith in something that's only a heuristic.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-17 Thread Peter Geoghegan

On Tue, Jun 17, 2014 at 1:59 PM, Tom Lane  wrote:
> Yeah, that's my point exactly.  There's no very good reason to assume that
> the intended answer is in fact among the set of column names we can see;
> and if it *is* there, the Levenshtein distance to it isn't going to be
> all that large.  I think that suggesting "foobar" when the user typed
> "glorp" is not only not helpful, but makes us look like idiots.

Maybe that's just a matter of phrasing the message appropriately. A
more guarded message, that suggests that "foobar" is the *best* match
is correct at least on its own terms (terms that are self evident).
This does pretty effectively communicate to the user that they should
totally rethink not just the column name, but perhaps the entire
query. On the other hand, showing nothing communicates nothing.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Minmax indexes

2014-06-17 Thread Josh Berkus

On 06/17/2014 09:14 AM, Robert Haas wrote:
> Well, I don't know: suppose you're loading geospatial data showing the
> location of every building in some country.  It might easily be the
> case that the data is or can be loaded in an order that provides
> pretty good spatial locality, leading to tight bounding boxes over
> physically consecutive data ranges.

I admin a production application which has exactly this.  However, that
application doesn't have big enough data to benefit from minmax indexes;
it uses the basic spatial indexes.

So, my $0.02: bounding box minmax falls under the heading of "would be
nice to have, but not if it delays the feature".

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-17 Thread Josh Berkus

On 06/17/2014 01:59 PM, Tom Lane wrote:
> Robert Haas  writes:
>> Emitting a suggestion with a large distance seems like it could be
>> rather irritating.  If the user types in SELECT prodct_id FROM orders,
>> and that column does not exist, suggesting "product_id", if such a
>> column exists, will likely be well-received.  Suggesting a column
>> named, say, "price", however, will likely make at least some users say
>> "no I didn't mean that you stupid @%!#" - because probably the issue
>> there is that the user selected from the completely wrong table,
>> rather than getting 6 of the 9 characters they typed incorrect.
> 
> Yeah, that's my point exactly.  There's no very good reason to assume that
> the intended answer is in fact among the set of column names we can see;
> and if it *is* there, the Levenshtein distance to it isn't going to be
> all that large.  I think that suggesting "foobar" when the user typed
> "glorp" is not only not helpful, but makes us look like idiots.

Well, there's two different issues:

(1) offering a suggestion which is too different from what the user
typed.  This is easily limited by having a max distance (most likely a
distance/length ratio, with a max of say, 0.5).  The only drawback of
this would be the extra cpu cycles to calculate it, and some arguments
about what the max distance should be.  But for the sake of the
children, let's not have a GUC for it.

(2) If there are multiple columns with the same levenschtien distance,
which one do you suggest?  The current code picks a random one, which
I'm OK with.  The other option would be to list all of the columns.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-17 Thread Kevin Grittner

Tom Lane  wrote:

> I wouldn't necessarily hold up git as a model of user interface
> engineering ;-) ... but still, it might be interesting to take a
> look at exactly what heuristics they used here.  I'm sure there
> are other precedents we could look at, too.

On my Ubuntu machine, bash does something similar.  A few examples
chosen completely arbitrarily:

kgrittn@Kevin-Desktop:~$ got
No command 'got' found, did you mean:
 Command 'go' from package 'golang-go' (universe)
 Command 'gout' from package 'scotch' (universe)
 Command 'jot' from package 'athena-jot' (universe)
 Command 'go2' from package 'go2' (universe)
 Command 'git' from package 'git' (main)
 Command 'gpt' from package 'gpt' (universe)
 Command 'gom' from package 'gom' (universe)
 Command 'goo' from package 'goo' (universe)
 Command 'gst' from package 'gnu-smalltalk' (universe)
 Command 'dot' from package 'graphviz' (main)
 Command 'god' from package 'god' (universe)
 Command 'god' from package 'ruby-god' (universe)
got: command not found
kgrittn@Kevin-Desktop:~$ groupad
No command 'groupad' found, did you mean:
 Command 'groupadd' from package 'passwd' (main)
 Command 'groupd' from package 'cman' (main)
groupad: command not found
kgrittn@Kevin-Desktop:~$ asdf
No command 'asdf' found, did you mean:
 Command 'asdfg' from package 'aoeui' (universe)
 Command 'sadf' from package 'sysstat' (main)
 Command 'sdf' from package 'sdf' (universe)
asdf: command not found
kgrittn@Kevin-Desktop:~$ zxcv
zxcv: command not found

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] btreecheck extension

2014-06-17 Thread Peter Geoghegan

On Tue, Jun 17, 2014 at 1:16 PM, Robert Haas  wrote:
> I don't feel qualified to comment on any of the substantive issues you
> raise, so instead I'd like to bikeshed the name.  I suggest that we
> create one extension to be a repository for index-checking machinery
> (and perhaps also heap-checking machinery) and view this as the first
> of possibly several checkers to live there.  Otherwise, we may
> eventually end up with separate extensions for btreecheck, hashcheck,
> gistcheck, gincheck, spgistcheck, minmaxcheck, vodkacheck, heapcheck,
> toastcheck, etc. which seems like excessive namespace pollution.

I agree.

I hope that we'll eventually be able to move code like this into each
AM, with something like an amverifyintegrity pg_am entry optionally
provided. There'd also be a utility statement that would perform this
kind of verification. It seems natural to do this, as the patch I've
posted arguably adds a big modularity violation. Besides, it seems
worthwhile to pepper the regular regression tests with calls like
these, at least in some places, and putting something in core is the
most convenient way to do that.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-17 Thread Tom Lane

Robert Haas  writes:
> On Tue, Jun 17, 2014 at 12:51 AM, Peter Geoghegan  wrote:
>> I disagree. I happen to think that making some guess is better than no
>> guess at all here, given the fact that there aren't too many
>> possibilities to choose from.

> Emitting a suggestion with a large distance seems like it could be
> rather irritating.  If the user types in SELECT prodct_id FROM orders,
> and that column does not exist, suggesting "product_id", if such a
> column exists, will likely be well-received.  Suggesting a column
> named, say, "price", however, will likely make at least some users say
> "no I didn't mean that you stupid @%!#" - because probably the issue
> there is that the user selected from the completely wrong table,
> rather than getting 6 of the 9 characters they typed incorrect.

Yeah, that's my point exactly.  There's no very good reason to assume that
the intended answer is in fact among the set of column names we can see;
and if it *is* there, the Levenshtein distance to it isn't going to be
all that large.  I think that suggesting "foobar" when the user typed
"glorp" is not only not helpful, but makes us look like idiots.

> One existing tool that does something along these lines is 'git',
> which seems to have some kind of a heuristic to know when to give up:

I wouldn't necessarily hold up git as a model of user interface
engineering ;-) ... but still, it might be interesting to take a look
at exactly what heuristics they used here.  I'm sure there are other
precedents we could look at, too.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Built-in support for a memory consumption ulimit?

2014-06-17 Thread Robert Haas

On Tue, Jun 17, 2014 at 4:39 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> We could do better by accounting for memory usage ourselves, inside
>> the memory-context system, but that'd probably impose some overhead we
>> don't have today.
>
> Hm.  We could minimize the overhead if we just accounted for entire
> malloc chunks and not individual palloc allocations.  That would make
> the overhead not zero, but yet probably small enough to ignore.

Yeah, although it might expose more details of aset.c's allocation
behavior than we want users to have to know about.

> Another idea would be to do the enforcement ourselves by means of
> measuring the change in "sbrk(0)"'s reported value since startup, which we
> could check whenever, say, we're about to request a large malloc chunk in
> aset.c.  I'm not sure about the cost of that function though, nor about
> whether this measurement method is meaningful in modern systems.

I wouldn't like to count on that method.  I believe some malloc()
implementations implement large allocations via mmap().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-17 Thread Robert Haas

On Tue, Jun 17, 2014 at 12:51 AM, Peter Geoghegan  wrote:
> On Mon, Jun 16, 2014 at 8:56 PM, Tom Lane  wrote:
>> Not having looked at the patch, but: I think the probability of
>> useless-noise HINTs could be substantially reduced if the code prints a
>> HINT only when there is a single available alternative that is clearly
>> better than the others in Levenshtein distance.  I'm not sure how much
>> better is "clearly better", but I exclude "zero" from that.  I see that
>> the original description of the patch says that it will arbitrarily
>> choose one alternative when there are several with equal Levenshtein
>> distance, and I'd say that's a bad idea.
>
> I disagree. I happen to think that making some guess is better than no
> guess at all here, given the fact that there aren't too many
> possibilities to choose from. I think that it might be particularly
> annoying to not show some suggestion in the event of a would-be
> ambiguous column reference where the column name is itself wrong,
> since both mistakes are common. For example, "order_id" was specified
> instead of one of either "o.orderid" or "ol.orderid", as in my
> original examples. If some correct alias was specified, that would
> make the new code prefer the appropriate Var, but it might not be, and
> that should be okay in my view.
>
> I'm not trying to remove the need for human judgement here. We've all
> heard stories about people who did things like input "Portland" into
> their GPS only to end up in Maine rather than Oregon, but I think in
> general you can only go so far in worrying about those cases.

Emitting a suggestion with a large distance seems like it could be
rather irritating.  If the user types in SELECT prodct_id FROM orders,
and that column does not exist, suggesting "product_id", if such a
column exists, will likely be well-received.  Suggesting a column
named, say, "price", however, will likely make at least some users say
"no I didn't mean that you stupid @%!#" - because probably the issue
there is that the user selected from the completely wrong table,
rather than getting 6 of the 9 characters they typed incorrect.

One existing tool that does something along these lines is 'git',
which seems to have some kind of a heuristic to know when to give up:

[rhaas pgsql]$ git gorp
git: 'gorp' is not a git command. See 'git --help'.

Did you mean this?
grep
[rhaas pgsql]$ git goop
git: 'goop' is not a git command. See 'git --help'.

Did you mean this?
grep
[rhaas pgsql]$ git good
git: 'good' is not a git command. See 'git --help'.
[rhaas pgsql]$ git puma
git: 'puma' is not a git command. See 'git --help'.

Did you mean one of these?
pull
push

I suspect that the maximum useful distance is a function of the string
length.  Certainly, if the distance is greater than or equal to the
length of one of the strings involved, it's just a totally unrelated
string and thus not worth suggesting.  A useful heuristic might be
something like "distance at most 3, or at most half the string length,
whichever is less".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Built-in support for a memory consumption ulimit?

2014-06-17 Thread Tom Lane

Robert Haas  writes:
> We could do better by accounting for memory usage ourselves, inside
> the memory-context system, but that'd probably impose some overhead we
> don't have today.

Hm.  We could minimize the overhead if we just accounted for entire
malloc chunks and not individual palloc allocations.  That would make
the overhead not zero, but yet probably small enough to ignore.

On the other hand, this approach would entirely fail to account for
non-palloc'd allocations, which could be a significant issue in some
contexts.

I wonder how practical it would be to forestall Noah's scenario by
preallocating all the stack space we want before enabling the rlimit.

Another idea would be to do the enforcement ourselves by means of
measuring the change in "sbrk(0)"'s reported value since startup, which we
could check whenever, say, we're about to request a large malloc chunk in
aset.c.  I'm not sure about the cost of that function though, nor about
whether this measurement method is meaningful in modern systems.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Atomics hardware support table & supported architectures

2014-06-17 Thread Kevin Grittner

Andres Freund  wrote:
> On 2014-06-17 13:14:26 -0400, Robert Haas wrote:
>> On Sat, Jun 14, 2014 at 9:12 PM, Andres Freund  
>> wrote:

>>> 3) sparcv8: Last released model 1997.
>>
>> I seem to recall hearing about this in a customer situation
>> relatively recently, so there may be a few of these still
>> kicking around out there.
>
> Really? As I'd written in a reply solaris 10 (released 2005)
> dropped support for it. Dropping support for a platform that's
> been desupported 10 years ago by it's manufacturer doesn't sound
> bad imo...

The release of version n doesn't mean that version n-1 is no longer
supported.  As of today, this page:

http://www.oracle.com/technetwork/server-storage/solaris10/overview/index-138972.html

says:

> The Oracle Solaris 9 operating system is now in the Extended
> Support period of it's life cycle. Customers with support
> contracts can still access patches and log new service requests,
> although support contracts carry a surcharge for Oracle Solaris 9
> support.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Audit of logout

2014-06-17 Thread Robert Haas

On Mon, Jun 16, 2014 at 4:14 PM, Stephen Frost  wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Fujii Masao  writes:
>> > That's harmful for audit purpose. I think that we should make
>> > log_disconnections PGC_SUSET rather than PGC_BACKEND in order
>> > to forbid non-superusers from changing its setting. Attached
>> > patch does this.
>
> I tend to agree with this- those are things which regular users really
> shouldn't be able to modify.  Policy enforcement can be done when there
> isn't system enforcement, but you really don't want to fall back to
> policy enforcement when you don't need to.

+1.

>> TBH, this complaint seems entirely artificial.  If somebody needs to audit
>> disconnections, and they see a connection record with no disconnection
>> record but the session's no longer there, they certainly know that a
>> disconnection occurred.  And if there wasn't a server crash to explain it,
>> they go slap the wrist of whoever violated corporate policy by turning off
>> log_disconnections.  Why do we need system-level enforcement of this?
>
> Going through every log file, and writing something to address log file
> changes, to hunt for those cases is no small amount of effort which
> you're asking every administrator with an audit requirement to write
> code to do to provide something which we make it appear as if we're
> doing for them.  It's certainly a violation of POLA that users can
> decide on their own that their disconnections don't get logged.

+1.

>> I wonder whether we should just get rid of log_disconnections as a
>> separate variable, instead logging disconnections when log_connections
>> is set.
>>
>> Another answer is to make both variables PGC_SIGHUP, on the grounds
>> that it doesn't make much sense for them not to be applied system-wide;
>> except that I think there was some idea that logging might be enabled
>> per-user or per-database using ALTER ROLE/DATABASE.
>
> Both of these options look pretty reasonable to me as a way to improve
> the current situation.  I'm not really sure that I see the use-case for
> only logging connections/disconnections on a per-user or per-database
> basis; in my experience it's not a lot of traffic to log it all and I
> don't recall ever seeing anyone set those anything other than
> system-wide.
>
> I like the idea of flexibility in what's logged, I just don't see this
> specific case as really needing it.

I don't really like either of these ideas much, and I don't really see
the problem with Fujii Masao's original proposal.  It's true that some
installations may find it valuable to preserve the property that a
disconnect is logged whenever they connect is logged, but if that's
really what's at issue, then presumably the superuser is not going to
be flipping these settings on and off on the fly *anyway*.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Built-in support for a memory consumption ulimit?

2014-06-17 Thread Robert Haas

On Mon, Jun 16, 2014 at 10:16 PM, Noah Misch  wrote:
> On Sat, Jun 14, 2014 at 10:37:36AM -0400, Tom Lane wrote:
>> After giving somebody advice, for the Nth time, to install a
>> memory-consumption ulimit instead of leaving his database to the tender
>> mercies of the Linux OOM killer, it occurred to me to wonder why we don't
>> provide a built-in feature for that, comparable to the "ulimit -c max"
>> option that already exists in pg_ctl.
>
> In principle, I would like to have such a feature.  The only limit I've found
> reliable on the occasions I wanted this was RLIMIT_AS; RLIMIT_DATA has been
> ineffective on Linux+GNU libc.  RLIMIT_AS has its own problems, of course:
> address space usage is only tenuously connected to the definition of "memory
> consumption" folks actually want.  Worse, one can often construct a query to
> crash an RLIMIT_AS-affected backend.  Make the query use heap space until just
> shy of the limit, then burn stack space until RLIMIT_AS halts stack growth.

Ouch.  Having a feature that causes excessive memory utilization to
produce an ERROR that halts query execution and returns us to the top
level, as Tom and Greg were proposing, sounds nice, though even there
I wonder what happens if the memory exhaustion is due to things like
relcache bloat which will not be ameliorated by error recovery.  But
having a feature that crashes the backend in similar circumstances
doesn't sound nearly as nice.

We could do better by accounting for memory usage ourselves, inside
the memory-context system, but that'd probably impose some overhead we
don't have today.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] btreecheck extension

2014-06-17 Thread Robert Haas

On Mon, Jun 16, 2014 at 9:47 PM, Peter Geoghegan  wrote:
> As discussed at the developer meeting at pgCon, I think that there is
> a lot to be said for a tool that checks nbtree index invariants on
> live systems.

Me too.

> Attached prototype patch adds contrib extension, btreecheck.

I don't feel qualified to comment on any of the substantive issues you
raise, so instead I'd like to bikeshed the name.  I suggest that we
create one extension to be a repository for index-checking machinery
(and perhaps also heap-checking machinery) and view this as the first
of possibly several checkers to live there.  Otherwise, we may
eventually end up with separate extensions for btreecheck, hashcheck,
gistcheck, gincheck, spgistcheck, minmaxcheck, vodkacheck, heapcheck,
toastcheck, etc. which seems like excessive namespace pollution.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] delta relations in AFTER triggers

2014-06-17 Thread Robert Haas

On Sat, Jun 14, 2014 at 7:56 PM, Kevin Grittner  wrote:
> I looked at the standard, and initially tried to implement the
> standard syntax for this; however, it appeared that the reasons
> given for not using standard syntax for the row variables also
> apply to the transition relations (the term used by the standard).
> There isn't an obvious way to tie that in to all the PLs we
> support.  It could be done, but it seems like it would intolerably
> ugly, and more fragile than what we have done so far.

I'm not too familiar with this area.  Can you describe what the
standard syntax for the row variables is, as opposed to our syntax?
Also, what's the standard syntax for the the transition relations?

> Some things which I *did* follow from the standard: these new
> relations are only allowed within AFTER triggers, but are available
> in both AFTER STATEMENT and AFTER ROW triggers.  That is, an AFTER
> UPDATE ... FOR EACH ROW trigger could use both the OLD and NEW row
> variables as well as the delta relations (under whatever names we
> pick).  That probably won't be used very often, but I can imagine
> some cases where it might be useful.  I expect that these will
> normally be used in FOR EACH STATEMENT triggers.

I'm concerned about the performance implications of capturing the
delta relations unconditionally.  If a particular trigger actually
needs the delta relations, then the time spent capturing that
information is well spent; but if it doesn't, then it's a waste.
There are lots of people using FOR EACH STATEMENT triggers right now
who won't be happy if those start requiring O(n) additional temporary
storage, where n is the number of rows modified by the statement.
This is likely an even bigger issue for per-row triggers, of course,
where as you say, it probably won't be used often.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_control is missing a field for LOBLKSIZE

2014-06-17 Thread Alvaro Herrera

Bruce Momjian wrote:
> On Tue, Jun 17, 2014 at 12:28:46PM -0400, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > Uh, I think pg_upgrade needs to check that they match too.
> > 
> > Possibly.  What do you think it should do when examining a pg_control
> > version that lacks the field?
> 
> Good question.  I have existing cases where fields were removed, but not
> ones that were added.  As we have no way to query the old cluster's
> value for LOBLKSIZE, I think I will just add code to compare them if
> they _both_ exist.

Can't you compare it to the historic default value?  I mean, add an
assumption that people thus far has never tweaked it.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-17 Thread Robert Haas

On Mon, Jun 16, 2014 at 1:15 AM, Stephen Frost  wrote:
>> I'm not referring to the proposed implementation particularly; or at
>> least not that aspect of it.  I don't think trying to run the view
>> quals as the defining user is likely to be very appealing, because I
>> think it's going to hurt performance, for example by preventing
>> function inlining and requiring lots of user-ID switches.
>
> I understand that there are performance implications.  As mentioned to
> Tom, realistically, there's zero way to optimized at least some of these
> use-cases because they require a completely external module (eg:
> SELlinux) to be involved in the decision about who can view what
> records.  If we can optimize that, it'd be by a completely different
> approach whereby we pull up the qual higher because we know the whole
> query only involves leakproof functions or similar, allowing us to only
> apply the filter to the final set of records prior to them being
> returned to the user.  The point being that such optimizations would
> happen independently and regardless of the quals or user-defined
> functions involved.  At the end of the day, I can't think of a better
> optimization for such a case (where we have to ask an external security
> module if a row is acceptable to return to the user) than that.  Is
> there something specific you're thinking about that we'd be missing out
> on?

Yeah, if we have to ask an external security module a question for
each row, there's little hope of any real optimization.  However, I
think there will be a significant number of cases where people will
want filtering clauses that can be realized by doing an index scan
instead of a sequential scan, and if we end up forcing a sequential
scan anyway, the feature will be useless to those people.

>> But I'm not
>> gonna complain if someone wants to mull it over and make a proposal
>> for how to make it work.  Rather, my concern is that all we've got is
>> what might be called the core of the feature; the actual guts of it.
>> There are a lot of ancillary details that seem to me to be not worked
>> out at all yet, or only half-baked.
>
> Perhaps it's just my experience, but I've been focused on the main core
> feature for quite some time and it feels like we're really close to
> having it there.  I agree that a few additional bits would be nice to
> have but these strike me as relatively straight-forward to implement
> overtop of this general construct.  I do see value in documenting these
> concerns and will see about making that happen, along with what the
> general viewpoints and thoughts are about how to address the concern.

I feel like there's quite a bit of work left to do around these
issues.  The technical bits may not be too hard, but deciding what we
want will take some thought and discussion.

>> > The current approach allows a nearly unlimited level of flexibility,
>> > should the user wish it, by being able to run user-defined code.
>> > Perhaps that would be considered 'one policy', but it could certainly
>> > take under consideration the calling user, the object being queried
>> > (if a function is defined per table, or if we provide a way to get
>> > that information in the function), etc.
>>
>> In theory, that's true.  But in practice, performance will suck unless
>> the security qual is easily optimizable.  If your security qual is
>> WHERE somecomplexfunction() you're going to have to implement that by
>> sequential-scanning the table and evaluating the function for each
>> row.
>
> That's not actualy true today, is it?  Given our leak-proof attribute,
> if the qual is "WHERE somecomplexfunction() AND leakprooffunctionx()"
> then we would be able to push down the leak-proof function and not
> necessairly run a straight sequential scan, no?  Even so, though, we've
> had users who have tested exactly what this patch implements and they've
> been happy with their real-world use-cases.  I'm certainly all for
> optimization and would love to see us make this better for everyone, but
> I don't view that as a reason to delay this particular feature which is
> really just bringing us up to parity with other RDMBS's.

I'm a bit confused here, because your example seems to be totally
different from my example.  In my example, somecomplexfunction() will
get pushed down because it's the security qual; that needs to be
inside the security_barrier view, or a malicious user can subvert the
system by getting some other qual evaluated first.  In your example,
you seem to be imagining WHERE somecomplexfunction() AND
leakprooffunctionx() as queries sent by the untrusted user, in which
case, yet, the leak-proof one will get pushed down and the other one
will not.

>> But that's probably not going to perform very well, because to match
>> an index on sales_rep_id, or an index on partner_id, that's going to
>> have to get simplified a whole lot, and that's probably not going to
>> happen.  If we've only got one branch of the OR, I think we

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-17 Thread Stephen Frost

Robert,

On Tuesday, June 17, 2014, Robert Haas  wrote:
>
> After sending that one (1) email, I was promptly told that "I'm very
> disappointed to hear that the mechanical pieces around making RLS easy
> for users to use ... is receiving such push-back."  The push-back, at
> that point in time, consisted of one (1) email.  Several more emails
> have been sent that time, including the above-quoted text, seeming to
> me to imply that the people who are concerned about this feature are
> being unreasonable.  I don't believe I am the only such person,
> although I may be the main one right at the moment, and you may not be
> entirely surprised to hear that I don't think I'm being unreasonable.

I'm on my phone at the moment but that looks like a quote from me. My email
and concern there was regarding the specific suggestion that we could check
off the "RLS" capability which users have been asking us to provide nearly
since I started with PG by saying that they could use Updatable SB views. I
did not intend it as a comment regarding the specific technical concerns
raised and have been responding to and trying to address those
independently and openly.

I've expressed elsewhere on this thread my gratitude that the technical
concerns are being brought up now, near the beginning of the cycle, so we
can address them. I've been working with others who are interested in RLS
on a wiki page to outline and understand the options and identify
dependencies and priorities. Hopefully the link will be posted shortly
(again, not at a computer right now) and we can get comments back. There
are some very specific questions which really need to be addressed and
which I've mentioned before (in particular the question of what user the
functions in a view definition should run as, both for "normal" views, for
SB views, and for when an RLS qual is included and run through that
framework, and if doing so would address some of the concerns which have
been raised regarding selects running code).

Thanks,

Stephen

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-17 Thread Robert Haas

On Thu, Jun 12, 2014 at 8:13 PM, Stephen Frost  wrote:
>> I'm in full agreement we should clearly communicate the issues around
>> pg_dump in particular, because they can't necessarily be eliminated
>> altogether without some major work that's going to take a while to finish.
>> And if the work-around is some sort of GUC for killing RLS altogether,
>> that's ugly but not unacceptable to me as a short-term fix.
>
> A GUC which is enable / disable / error-instead may work quiet well, with
> error-instead for pg_dump default if people really want it (there would have
> to be a way to disable that though, imv).
>
> Note that enable is default in general, disable would be for superuser only
> (or on start-up) to disable everything, and error-instead anyone could use
> but it would error instead of implementing RLS when querying an RLS-enabled
> table.
>
> This approach was suggested by an existing user testing out this RLS
> approach, to be fair, but it looks pretty sane to me as a way to address
> some of these concerns. Certainly open to other ideas and thoughts though.

In general, I agree that this is a good approach.  I think it will be
difficult to have a GUC with three values, one of which is
superuser-only and the other two of which are not.  I don't think
there's any precedent for something like that in the existing
framework, and I think it's likely we'll run into unpleasant corner
cases if we try to graft it in.  Also, I think we need to separate
things: whether the system is willing to allow the user to access the
table without RLS, and whether the user is willing to accept RLS if
the system deems it necessary.

For the first one, two solutions have been proposed.  The initial
proposal was to insist on RLS except for the superuser (and maybe the
table owner?).  Having a separate grantable privilege, as Dean
suggests, may be better.  I'll reply separately to that email also, as
I have a question about what he's proposing.

For the second one, I think the two most useful behaviors are "normal
mode" - i.e. allow access to the table, applying RLS predicates if
required and not applying them if I am exempt - and "error-instead"
mode - i.e. if my access to this table would be mediated by an RLS
predicate, then throw an error instead.  There's a third mode which
might be useful as well, which is "even though I have the *right* to
bypass the RLS predicate on this table, please apply the predicate
anyway".  This could be used by the table owner in testing, for
example.  Here again, the level of granularity we want to provide is
an interesting question.  Having a GUC (e.g. enable_row_level_security
= on, off, force) would be adequate for pg_dump, but allowing the
table name to be qualified in the query, as proposed downthread, would
be more granular, admittedly at some parser cost.  I'm personally of
the view that we *at least* need the GUC, because that seems like the
best way to secure pg_dump, and perhaps other applications.  We can
and should give pg_dump an--allow-row-level-security flag, I think,
but pg_dump's default behavior should be to configure the system in
such a way that the dump will fail rather than complete with a subset
of the data.  I'm less sure whether we should have something that can
be used to qualify table names in particular queries.  I think it
would be really useful, but I'm concerned that it will require
creating additional fully-reserved keywords, which are somewhat
painful for users.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-17 Thread Robert Haas

On Fri, Jun 13, 2014 at 3:11 AM, Dean Rasheed  wrote:
> Yeah, I was thinking something like this could work, but I would go
> further. Suppose you had separate GRANTable privileges for direct
> access to individual tables, bypassing RLS, e.g.
>
>   GRANT DIRECT SELECT|INSERT|UPDATE|DELETE ON table_name TO role_name

So, is this one new privilege (DIRECT) or four separate new privileges
that are variants of the existing privileges (DIRECT SELECT, DIRECT
INSERT, DIRECT UPDATE, DIRECT DELETE)?

> Actually, given the fact that the majority of users won't be using
> RLS, I would be tempted to invert the above logic and have the new
> privilege be for LIMITED access (via RLS quals). So a user granted the
> normal SELECT privilege would be able to bypass RLS, but a user only
> granted LIMITED SELECT wouldn't.

Well, for the people who are not using RLS, there's no difference
anyway.  I think it matters more what users of RLS will expect from a
command like GRANT SELECT ... and I'm guessing they'll prefer that RLS
always apply unless they very specifically grant the right for RLS to
not apply.  I might be wrong, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-17 Thread Robert Haas

On Thu, Jun 12, 2014 at 6:33 PM, Gregory Smith  wrote:
> I'm kind of surprised to see this turn into a hot button all of the sudden
> though, because my thought on all that so far has been a giant so what?
> This is what PostgreSQL does.
[...]
> But let's not act like RLS is a scary bogeyman because it introduces a new
> way to hack the server or get surprising side-effects.  That's expected and
> possibly unavoidable behavior in a feature like this, and there are much
> worse instances of arbitrary function risk throughout the core code already.

I have some technical comments on later emails in this thread, but
first let me address this point.  In the past, people have sometimes
complained that reviewers waited until very late in the cycle to
complain about issues which they found problematic.  By the time the
issues were pointed out, insufficient time remained before feature
freeze to get those issues addressed, causing the patch to slip out of
the release and provoking developer frustration.  It has therefore
been requested numerous times by numerous people that potential issues
be raised as early as possible.

The concerns that I have raised in this thread are not new; I have
raised them before.  However, we are now at the beginning of a new
development cycle, and it seems fair to assume that the people who are
working on this patch are hoping very much that something will get
committed to 9.5.  Since it seems to me that there are unaddressed
issues with the design of this patch, I felt that it was a good idea
to make sure that those concerns were on the table right from the
beginning of the process, rather than waiting until the patch was on
the verge of commit or, indeed, already committed.  That is why, when
this thread was revived on June 10th, I decide that it was a good time
to again comment on the design points about which I was concerned.

After sending that one (1) email, I was promptly told that "I'm very
disappointed to hear that the mechanical pieces around making RLS easy
for users to use ... is receiving such push-back."  The push-back, at
that point in time, consisted of one (1) email.  Several more emails
have been sent that time, including the above-quoted text, seeming to
me to imply that the people who are concerned about this feature are
being unreasonable.  I don't believe I am the only such person,
although I may be the main one right at the moment, and you may not be
entirely surprised to hear that I don't think I'm being unreasonable.

I will admit that my initial email may have contained just a touch of
hyperbole.  But I won't admit to more than a touch, and frankly, I
think it was warranted.  I perfectly well understand that people
really, really, really want this feature, and if I hadn't understood
that before, I certainly understand it now.  However, I believe that
there has been a lack of focus in the development of the patch thus
far in a couple of key areas - first in terms of articulating how it
is different from and better than a writeable security barrier view,
and second on how to manage the security and operational aspects of
having a feature like this.  I think that the discussion subsequent to
my June 10th email has let to some good discussion on both points,
which was my intent, but I still think much more time and thought
needs to be spent on those issues if we are to have a feature which is
up to our usual standards.  I do apologize to anyone who interpreted
that initial as a pure rant, because it really wasn't intended that
way.  Contrariwise, I hope that the people defending this patch will
admit that the issues I am raising are real and focus on whether and
how those concerns can be addressed.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-06-17 Thread Claudio Freire

On Tue, Jun 17, 2014 at 8:47 AM, Abhijit Menon-Sen  wrote:
> if (compress_backup_block = BACKUP_BLOCK_COMPRESSION_SNAPPY)


You mean == right?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Minmax indexes

2014-06-17 Thread Claudio Freire

On Tue, Jun 17, 2014 at 1:04 PM, Andres Freund  wrote:
> For me minmax indexes are helpful because they allow to generate *small*
> 'coarse' indexes over large volumes of data. From my pov that's possible
> possible because they don't contain item pointers for every contained
> row.


But minmax is just a specific form of bloom filter.

This could certainly be generalized to a bloom filter index with some
set of bloom&hashing operators (minmax being just one).


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Minmax indexes

2014-06-17 Thread Andres Freund

On 2014-06-17 12:14:00 -0400, Robert Haas wrote:
> On Tue, Jun 17, 2014 at 12:04 PM, Andres Freund  
> wrote:
> >> Well, I'm not the guy who does things with geometric data, but I don't
> >> want to ignore the significant percentage of our users who are.  As
> >> you must surely know, the GIST implementations for geometric data
> >> types store bounding boxes on internal pages, and that seems to be
> >> useful to people.  What is your reason for thinking that it would be
> >> any less useful in this context?
> >
> > For me minmax indexes are helpful because they allow to generate *small*
> > 'coarse' indexes over large volumes of data. From my pov that's possible
> > possible because they don't contain item pointers for every contained
> > row.
> > That'ill imo work well if there are consecutive rows in the table that
> > can be summarized into one min/max range. That's quite likely to happen
> > for common applications of number of scalar datatypes. But the
> > likelihood of placing sufficiently many rows with very similar bounding
> > boxes close together seems much less relevant in practice. And I think
> > that's generally likely for operations which can't be well represented
> > as btree opclasses - the substructure that implies inside a Datum will
> > make correlation between consecutive rows less likely.
> 
> Well, I don't know: suppose you're loading geospatial data showing the
> location of every building in some country.  It might easily be the
> case that the data is or can be loaded in an order that provides
> pretty good spatial locality, leading to tight bounding boxes over
> physically consecutive data ranges.

Well, it might be doable to correlate them along one axis, but along
both?  That's more complicated... And even alongside one axis you
already get into problems if your geometries are irregularly sized.
Asingle large polygon will completely destroy indexability for anything
stored physically close by because suddently the minmax range will be
huge... So you'll need to cleverly sort for that as well.

I think hierarchical datastructures are so much better suited for this,
that there's little point trying to fit them into minmax. I can very
well imagine that there's benefit in a gist support for only storing one
pointer per block instead of one pointer per item or such. But it seems
like separate feature.

> But I'm not trying to say that we absolutely have to support that kind
> of thing; what I am trying to say is that there should be a README or
> a mailing list post or some such that says: "We thought about how
> generic to make this.  We considered A, B, and C.  We rejected C as
> too narrow, and A because if we made it that general it would have
> greatly enlarged the disk footprint for the following reasons.
> Therefore we selected B."

Isn't 'simpler implementation' a valid reason that's already been
discussed onlist? Obviously simpler implementation doesn't trump
everything, but it's one valid reason...
Note that I have zap to do with the design of this feature. I work for
the same company as Alvaro, but that's pretty much it...

> Basically, I think Heikki asked a good
> question - which was "could we abstract this more?" - and I can't
> recall seeing a clear answer explaining why we could or couldn't and
> what the trade-offs would be.

'could we abstract more' imo is a pretty bad design guideline. It's 'is
there benefit in abstracting more'. Otherwise you end up with way to
complicated systems.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Atomics hardware support table & supported architectures

2014-06-17 Thread Andres Freund

On 2014-06-17 13:14:26 -0400, Robert Haas wrote:
> On Sat, Jun 14, 2014 at 9:12 PM, Andres Freund  wrote:
> > At this year developer's meeting we'd discussed the atomics abstraction
> > which is necessary for some future improvements. We'd concluded that a
> > overview over the hardware capabilities of the supported platforms would
> > be helpful. I've started with that at:
> > https://wiki.postgresql.org/wiki/Atomics
> 
> That looks like a great start.  It could use a key explaining what the
> columns mean.  Here are some guesses, and some comments.
> 
> single-copy r/w atomicity = For what word sizes does this platform
> have the ability to do atomic reads and writes of values?

Yes. It's a term found in literature, I'm happy to replace it by
something else. It's essentially defined to mean that a read after one
or more writes will only be influenced by one of the last writes, not a
mixture of them. I.e. whether there ever will be intermediate states
visible.

> I don't
> understand how "1" can ever NOT be supported - does any architecture
> write data to memory a nibble at a time?

Actually there seem to be some where that's possible for some operations
(VAX). But the concern is more whether 1 byte can actually be written
without also writing neighbouring values. I.e. there's hardware out
there that'll implement a 1byte store as reading 4 bytes, changing one
of the bytes in a register, and then write the 4 bytes out again. Which
would mean that a neighbouring write will possibly cause a wrong value
to be written...

> I also don't really
> understand the *** footnote; how can you do kernel emulation of an
> atomic load or store.

What happens is that gcc will do a syscall triggering the kernel to turn
of scheduling; perform the math and store the result; turn scheduling on
again. That way there cannot be a other operation influencing the
calculation/store. Imagine if you have hardware that, internally, only
does stores in 4 byte units. Even if it's a single CPU machine, which
most of those are, the kernel could schedule a separate process after
the first 4bytes have been written. Oops. The kernel has ways to prevent
that, userspace doesn't...

> TAS = Does this platform support test and set?  Items in parentheses
> are the instructions that implement this.

> cmpxchg = Does this platform support compare and swap?  If so, for
> what word sizes?  I assume "n" means "not supported at all" and "y"
> means "supported but we don't know for which word sizes".  Maybe call
> this column "CAS" or "Compare and Swap" rather than using an
> Intel-ism.

Ok.

> gcc support from version = Does this mean the version from which GCC
> supported the architecture, or from which it supported atomics (which
> ones?) on that architecture?

It means from which version on gcc has support for the __sync intrinsics
on that platform. I've only added versions where support for all atomics
of the supported sizes were available.
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html#g_t_005f_005fsync-Builtins

Once these answers satisfy you I'll adapt the wiki.

> > Does somebody want other columns in there?
> 
> I think the main question at the developer meeting was how far we want
> to go with supporting primitives like atomic add, atomic and, atomic
> or, etc.  So I think we should add columns for those.

Well, once CAS is available, atomic add etc is all trivially
implementable - without further hardware support. It might be more
efficient to use the native instruction (e.g. xadd can be much better
than a cmpxchg loop because there's no retries), but that's just
optimization that won't matter unless you have a fair bit of
concurrency.

There's currently fallbacks like:
#ifndef PG_HAS_ATOMIC_FETCH_ADD_U32
#define PG_HAS_ATOMIC_FETCH_ADD_U32
STATIC_IF_INLINE uint32
pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 add_)
{
uint32 old;
while (true)
{
old = pg_atomic_read_u32_impl(ptr);
if (pg_atomic_compare_exchange_u32_impl(ptr, &old, old + add_))
break;
}
return old;
}

> > 3) sparcv8: Last released model 1997.
> 
> I seem to recall hearing about this in a customer situation relatively
> recently, so there may be a few of these still kicking around out
> there.

Really? As I'd written in a reply solaris 10 (released 2005) dropped
support for it. Dropping support for a platform that's been desupported
10 years ago by it's manufacturer doesn't sound bad imo...

> > 4) i386: Support dropped from windows 98 (yes, really), linux, openbsd
> >(yes, really), netbsd (yes, really). No code changes needed.
> 
> Wow, OK.  In that case, yeah, let's dump it.  But let's make sure we
> adequately document that someplace in the code comments, along with
> the reasons, because not everyone may realize how dead it is.

I'm generally wondering how to better document the supported os/platform
combinations. E.g. it's not apparent

Re: [HACKERS] 9.5 CF1

2014-06-17 Thread Alvaro Herrera

Abhijit Menon-Sen wrote:

> I find it hard to believe that gmail is incapable of threading messages
> using In-Reply-To/References header fields, especially given that mail
> subjects are changed all the time in the normal course of events. But
> I'll take your word for it and reply to the original message (as I've
> always done) without changing the Subject for any reviews I post.

They do it -- I experimented with it awhile ago.  Gmail threads emails
perfectly well, but it seems they break threads on purpose when the
subject changes.  Apparently, more people is used to starting a new
thread by replying to an existing email and changing to a new subject,
than people "tweak" subjects by adding tags and such.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Atomics hardware support table & supported architectures

2014-06-17 Thread Robert Haas

On Sat, Jun 14, 2014 at 9:12 PM, Andres Freund  wrote:
> At this year developer's meeting we'd discussed the atomics abstraction
> which is necessary for some future improvements. We'd concluded that a
> overview over the hardware capabilities of the supported platforms would
> be helpful. I've started with that at:
> https://wiki.postgresql.org/wiki/Atomics

That looks like a great start.  It could use a key explaining what the
columns mean.  Here are some guesses, and some comments.

single-copy r/w atomicity = For what word sizes does this platform
have the ability to do atomic reads and writes of values?  I don't
understand how "1" can ever NOT be supported - does any architecture
write data to memory a nibble at a time?  I also don't really
understand the *** footnote; how can you do kernel emulation of an
atomic load or store.
TAS = Does this platform support test and set?  Items in parentheses
are the instructions that implement this.
cmpxchg = Does this platform support compare and swap?  If so, for
what word sizes?  I assume "n" means "not supported at all" and "y"
means "supported but we don't know for which word sizes".  Maybe call
this column "CAS" or "Compare and Swap" rather than using an
Intel-ism.
gcc support from version = Does this mean the version from which GCC
supported the architecture, or from which it supported atomics (which
ones?) on that architecture?

> Does somebody want other columns in there?

I think the main question at the developer meeting was how far we want
to go with supporting primitives like atomic add, atomic and, atomic
or, etc.  So I think we should add columns for those.

> From that and previous discussions
> (e.g. 
> http://archives.postgresql.org/message-id/20131013004658.GG4056218%40alap2.anarazel.de
> ) I think we should definitely remove some platforms:
> 1) VAX. Production ceased 1997. We've never supported OpenVMS, just
> netbsd (and probably openbsd)

Based on the letter you linked from the Wiki page, I am inclined to
agree.  The letter lists the final ship date for the last surviving
type of VAX as December 31, 2000, with support ending in 2010.  It
seems unlikely that anyone will wish to run a version of PostgreSQL
released in 2015 on such hardware (and if they do, we can make that
their problem rather than ours).

> 2) m32r. Our implementation depends on an *unmerged* glibc header. The
> website was dead for several *years*, even after the restore patches
> can't be downloaded anymore (404).

That is also an excellent argument for deprecation.

> 3) sparcv8: Last released model 1997.

I seem to recall hearing about this in a customer situation relatively
recently, so there may be a few of these still kicking around out
there.

> 4) i386: Support dropped from windows 98 (yes, really), linux, openbsd
>(yes, really), netbsd (yes, really). No code changes needed.

Wow, OK.  In that case, yeah, let's dump it.  But let's make sure we
adequately document that someplace in the code comments, along with
the reasons, because not everyone may realize how dead it is.

> 5) pa-risc.

This is a bit less dead than the other ones; support for existing
system only stopped at the end of 2013.  But I don't personally have a
reason to keep it alive.

> 6) armv-v5

I think this is also a bit less dead than the other ones; Red Hat's
shows Bugzilla shows people filing bugs for platform-specific problems
as recently as January of 2013:

https://bugzilla.redhat.com/show_bug.cgi?id=892378

> Note that this is *not* a requirement for the atomics abstraction - it
> now has a fallback to spinlocks if atomics aren't available.

That seems great.  Hopefully with a configure option to disable
atomics so that it's easy to test the fallback.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_control is missing a field for LOBLKSIZE

2014-06-17 Thread Bruce Momjian

On Tue, Jun 17, 2014 at 12:28:46PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > Uh, I think pg_upgrade needs to check that they match too.
> 
> Possibly.  What do you think it should do when examining a pg_control
> version that lacks the field?

Good question.  I have existing cases where fields were removed, but not
ones that were added.  As we have no way to query the old cluster's
value for LOBLKSIZE, I think I will just add code to compare them if
they _both_ exist.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-06-17 Thread Tom Lane

Noah Misch  writes:
> Here's a fix to make the MSVC build process account for the addition of
> HAVE_UUID_OSSP.  (None of the MSVC buildfarm members enable uuid-ossp.)

Looks reasonable.  I'm unable to test this scenario, but if you have,
please commit.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.5 CF1

2014-06-17 Thread Abhijit Menon-Sen

At 2014-06-17 12:47:19 -0400, alvhe...@2ndquadrant.com wrote:
>
> > > P.S. If you tag your reviews with [REVIEW] in the Subject, it'll
> > > be easier to keep track of them.
> > 
> > I and, I believe, various other people hate that style, because at
> > least in Gmail, it breaks the threading.  It is much easier to find
> > things if they are all posted on one thread.
> 
> Yes, please don't do that.  A simple, normal reply to the message that
> submits the patch is much better from my point of view as a subsequent
> reviewer and committer.

I find it hard to believe that gmail is incapable of threading messages
using In-Reply-To/References header fields, especially given that mail
subjects are changed all the time in the normal course of events. But
I'll take your word for it and reply to the original message (as I've
always done) without changing the Subject for any reviews I post.

Only one other person took my suggestion so far, but if anyone else is
tempted to do the same: consider the recommendation withdrawn.

-- Abhijit

P.S. I also realise now that some people may not {,be able to} change
the Subject of a message without creating an entirely new thread.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.5 CF1

2014-06-17 Thread Andres Freund

On 2014-06-17 12:47:19 -0400, Alvaro Herrera wrote:
> Robert Haas wrote:
> > On Mon, Jun 16, 2014 at 2:36 AM, Abhijit Menon-Sen  
> > wrote:
> > > P.S. If you tag your reviews with [REVIEW] in the Subject, it'll be
> > > easier to keep track of them.
> > 
> > I and, I believe, various other people hate that style, because at
> > least in Gmail, it breaks the threading.  It is much easier to find
> > things if they are all posted on one thread.
> 
> Yes, please don't do that.  A simple, normal reply to the message that
> submits the patch is much better from my point of view as a subsequent
> reviewer and committer.

Very much agreed. Especially as many patches have several reviews over
the course of their integration. I think the separate thread suggestion
was made by some former CF manager?

I think it's sometimes, for larger/hotly debated patches, useful to
start anew at significant new version though... 300 message deep threads
get unwiedly.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.5 CF1

2014-06-17 Thread Tom Lane

Alvaro Herrera  writes:
> Robert Haas wrote:
>> On Mon, Jun 16, 2014 at 2:36 AM, Abhijit Menon-Sen  
>> wrote:
>>> P.S. If you tag your reviews with [REVIEW] in the Subject, it'll be
>>> easier to keep track of them.

>> I and, I believe, various other people hate that style, because at
>> least in Gmail, it breaks the threading.  It is much easier to find
>> things if they are all posted on one thread.

> Yes, please don't do that.  A simple, normal reply to the message that
> submits the patch is much better from my point of view as a subsequent
> reviewer and committer.

Worth noting also is that Magnus is working on a new version of the
commitfest app that will be able to automatically keep track of threads
about patches --- so long as they *are* threads according to our mailing
list archives.  I'm not sure if the archives recognize replies with a
changed Subject: as being the same thread or not.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-17 Thread Andres Freund

On 2014-06-17 12:07:04 -0400, Robert Haas wrote:
> On Tue, Jun 17, 2014 at 10:33 AM, Petr Jelinek  wrote:
> > On 17/06/14 16:18, Robert Haas wrote:
> >> On Fri, Jun 13, 2014 at 8:31 PM, Petr Jelinek 
> >> wrote:
> >>> attached is a simple patch which makes it possible to change the system
> >>> identifier of the cluster in pg_control. This is useful for
> >>> individualization of the instance that is started on top of data
> >>> directory
> >>> produced by pg_basebackup - something that's helpful for logical
> >>> replication
> >>> setup where you need to easily identify each node (it's used by
> >>> Bidirectional Replication for example).
> >>
> >>
> >> I can clearly understand the utility of being able to reset the system
> >> ID to a new, randomly-generated system ID - but giving the user the
> >> ability to set a particular value of their own choosing seems like a
> >> pretty sharp tool.  What is the use case for that?

I've previously hacked this up adhoc during data recovery when I needed
to make another cluster similar enough that I could replay WAL.

Another usecase is to mark a database as independent from its
origin. Imagine a database that gets sharded across several
servers. It's not uncommon to do that by initially basebackup'ing the
database to several nodes and then use them separately from
thereon. It's quite useful to actually mark them as being
distinct. Especially as several of them right now would end up with the
same timeline id...

> But it seems to me that we might need to have a process discussion
> here, because, while I'm all in favor of incremental feature proposals
> that build towards a larger goal, it currently appears that the larger
> goal toward which you are building is not something that's been
> publicly discussed and debated on this list.  And I really think we
> need to have that conversation.  Obviously, individual patches will
> still need to be debated, but I feel like 2ndQuadrant is trying to
> construct a castle without showing the community the floor plan.  I
> believe that there is relatively broad agreement that we would all
> like a castle, but different people may have legitimately different
> ideas about how it should be constructed.  If the work arrives as a
> series of disconnected pieces (user-specified system ID, event
> triggers for CREATE, etc.), then everyone outside of 2ndQuadrant has
> to take it on faith that those pieces are going to eventually fit
> together in a way that we'll all be happy with.  In some cases, that's
> fine, because the feature is useful on its own merits whether it ends
> up being part of the castle or not.
>

Uh. Right now this patch has been written because it's needed for a out
of core replication solution. That's what BDR is at this point. The
patch is unobtrusive, has other usecases than just our internal one and
doesn't make pg_resetxlog even more dangerous than it already is. I
don't see much problem with considering it on it's own cost/benefit?

So this seems to be a concern that's relatively independent of this
patch. Am I seing that right?

I think one very important point here is that BDR is *not* the proposed
in core solution. I think a reasonable community perspective - besides
also being useful on it's own - is to view it as a *prototype* for a in
core solution. And e.g. logical decoding would have looked much worse -
and likely not have been integrated - without externally already being
used for BDR.

I'm not sure how we can ease or even resolve your conerns when talking
about pretty independent and general pieces of functionality like the
DDL even trigger stuff. We needed to actually *write* those to see how
BDR will look like. And the communities feedback heavily influenced how
BDR looks like by accepting some pieces, demanding others, and outright
rejecting the remainder.

I think there's some pieces that need to consider them on their own
merit. Logical decoding is useful on it's own. The ability for out of
core systems to do DDL replication is another piece (that you referred
to above).
I think the likelihood of success if we were to try to design a in-core
system from ground up first and then follow through prety exactly along
those lines is minimal.

So, what I think we can do is to continue trying to build independent,
generally useful bits. Which imo all the stuff that's been integrated
is. Then, somewhat soon I think, we'll have to come up with a proposal
how the parts that are *not* necessarily useful outside of in-core
logical rep. might look like. Which will likely trigger some long long
discussions that turn that design around a couple of times. Which is
fine. I *don't* think that's going to be a trimmed down version of
todays BDR.

> But in other cases, like this one, if the premise that the slot name
> should match the system identifier isn't something the community wants
> to accept, then taking a patch that lets people do that is probably a
> bad idea, because at least one person will use it to

Re: [HACKERS] 9.5 CF1

2014-06-17 Thread Alvaro Herrera

Robert Haas wrote:
> On Mon, Jun 16, 2014 at 2:36 AM, Abhijit Menon-Sen  
> wrote:
> > P.S. If you tag your reviews with [REVIEW] in the Subject, it'll be
> > easier to keep track of them.
> 
> I and, I believe, various other people hate that style, because at
> least in Gmail, it breaks the threading.  It is much easier to find
> things if they are all posted on one thread.

Yes, please don't do that.  A simple, normal reply to the message that
submits the patch is much better from my point of view as a subsequent
reviewer and committer.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.5 CF1

2014-06-17 Thread Robert Haas

On Mon, Jun 16, 2014 at 2:36 AM, Abhijit Menon-Sen  wrote:
> P.S. If you tag your reviews with [REVIEW] in the Subject, it'll be
> easier to keep track of them.

I and, I believe, various other people hate that style, because at
least in Gmail, it breaks the threading.  It is much easier to find
things if they are all posted on one thread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal for CSN based snapshots

2014-06-17 Thread Robert Haas

On Mon, Jun 16, 2014 at 12:58 AM, Craig Ringer  wrote:
> On 05/30/2014 11:14 PM, Heikki Linnakangas wrote:
>> Yeah. To recap, the failure mode is that if the master crashes and
>> restarts, the transaction becomes visible in the master even though it
>> was never replicated.
>
> Wouldn't another pg_clog bit for the transaction be able to sort that out?

How?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL replay bugs

2014-06-17 Thread Robert Haas

On Mon, Jun 2, 2014 at 8:55 AM, Michael Paquier
 wrote:
> On Wed, Apr 23, 2014 at 9:43 PM, Heikki Linnakangas
>  wrote:
>> And here is the tool itself. It consists of two parts:
>>
>> 1. Modifications to the backend to write the page images
>> 2. A post-processing tool to compare the logged images between master and
>> standby.
> Having that into Postgres at the disposition of developers would be
> great, and I believe that it would greatly reduce the occurrence of
> bugs caused by WAL replay during recovery. So, with the permission of
> the author, I have been looking at this facility for a cleaner
> integration into Postgres.

I'm not sure if this is reasonably possible, but one thing that would
make this tool a whole lot easier to use would be if you could make
all the magic happen in a single server.  For example, suppose you had
a background process that somehow got access to the pre and post
images for every buffer change, and the associated WAL record, and
tried applying the WAL record to the pre-image to see whether it got
the corresponding post-image.  Then you could run 'make check' or so
and afterwards do something like psql -c 'SELECT * FROM
wal_replay_problems()' and hopefully get no rows back.

Don't get me wrong, having this tool at all sounds great.  But I think
to really get the full benefit out of it we need to be able to run it
in the buildfarm, so that if people break stuff it gets noticed
quickly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD

2014-06-17 Thread Noah Misch

On Tue, May 27, 2014 at 07:46:41PM -0400, Tom Lane wrote:
> Pushed; thanks for working on this!

Here's a fix to make the MSVC build process account for the addition of
HAVE_UUID_OSSP.  (None of the MSVC buildfarm members enable uuid-ossp.)

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index c0d7f38..f7a5abb 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -214,6 +214,7 @@ s{PG_VERSION_STR "[^"]+"}{__STRINGIFY(x) #x\n#define 
__STRINGIFY2(z) __STRINGIFY
 
if ($self->{options}->{uuid})
{
+   print O "#define HAVE_UUID_OSSP\n";
print O "#define HAVE_UUID_H\n";
}
if ($self->{options}->{xml})

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_control is missing a field for LOBLKSIZE

2014-06-17 Thread Tom Lane

Bruce Momjian  writes:
> Uh, I think pg_upgrade needs to check that they match too.

Possibly.  What do you think it should do when examining a pg_control
version that lacks the field?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Memory deallocation after calling cast function

2014-06-17 Thread Tom Lane

Andres Freund  writes:
> On 2014-06-17 21:09:25 +0530, Abhijit Menon-Sen wrote:
>> At 2014-06-17 11:32:37 -0400, br...@momjian.us wrote:
>>> Uh, what is replacing SRFs?  CTEs?

>> I don't think Tom was referring to SRFs in general, only putting them
>> directly into the targetlist of a SELECT.

> And the primary reason for using SRFs in the targetlist has gone away
> due to LATERAL.

Exactly.  LATERAL does what you usually want from a SRF in the
targetlist, and it doesn't have bizarre semantics when you use more than
one.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PL/pgSQL support to define multi variables once

2014-06-17 Thread Robert Haas

On Fri, Jun 13, 2014 at 11:57 AM, David Johnston
 wrote:
>> That's not the reading I want, and it's not the reading you want either,
>> but there is nothing in the existing text that justifies single
>> evaluation.  So I think we'd be well advised to sit on our hands until
>> the committee clarifies that.  It's not like there is some urgent reason
>> to have this feature.
>
> Agreed.
>
> I don't suppose there is any support or prohibition on the :
>
> one,two,three integer := generate_series(1,3);
>
> interpretation...not that I can actually come up with a good use case that
> wouldn't be better implemented via a loop in the main body.

Based on these comments and the remarks by Alvaro and Andres, I think
it's clear that we should reject this patch.  The number of patches
that get through with -1 votes from 3 committers is very small, if not
zero.  While I like the feature in the abstract, I agree with Tom that
it would be better to wait until we have more clarity about what the
semantics are supposed to be.

I will update the CommitFest app accordingly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_control is missing a field for LOBLKSIZE

2014-06-17 Thread Bruce Momjian

On Wed, Jun  4, 2014 at 06:57:31PM -0400, Tom Lane wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> There are at least two places in inv_api.c where we have
> >> "Assert(pagelen <= LOBLKSIZE)" that is protecting a subsequent memcpy
> >> into a local variable of size LOBLKSIZE, so that the only thing standing
> >> between us and a stack-smash security issue that's trivially exploitable
> >> in production builds is that on-disk data conforms to our expectation
> >> about LOBLKSIZE.  I think it's definitely worth promoting these checks
> >> to regular runtime-if-test-and-elog.
> 
> > Agreed.  Promoting that to a run-time check seems well worth it to me.
> 
> Here's a draft patch for this.  Barring objections I'll commit the whole
> thing to HEAD, and the inv_api.c changes to the back branches as well.

Uh, I think pg_upgrade needs to check that they match too.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Minmax indexes

2014-06-17 Thread Robert Haas

On Tue, Jun 17, 2014 at 12:04 PM, Andres Freund  wrote:
>> Well, I'm not the guy who does things with geometric data, but I don't
>> want to ignore the significant percentage of our users who are.  As
>> you must surely know, the GIST implementations for geometric data
>> types store bounding boxes on internal pages, and that seems to be
>> useful to people.  What is your reason for thinking that it would be
>> any less useful in this context?
>
> For me minmax indexes are helpful because they allow to generate *small*
> 'coarse' indexes over large volumes of data. From my pov that's possible
> possible because they don't contain item pointers for every contained
> row.
> That'ill imo work well if there are consecutive rows in the table that
> can be summarized into one min/max range. That's quite likely to happen
> for common applications of number of scalar datatypes. But the
> likelihood of placing sufficiently many rows with very similar bounding
> boxes close together seems much less relevant in practice. And I think
> that's generally likely for operations which can't be well represented
> as btree opclasses - the substructure that implies inside a Datum will
> make correlation between consecutive rows less likely.

Well, I don't know: suppose you're loading geospatial data showing the
location of every building in some country.  It might easily be the
case that the data is or can be loaded in an order that provides
pretty good spatial locality, leading to tight bounding boxes over
physically consecutive data ranges.

But I'm not trying to say that we absolutely have to support that kind
of thing; what I am trying to say is that there should be a README or
a mailing list post or some such that says: "We thought about how
generic to make this.   We considered A, B, and C.  We rejected C as
too narrow, and A because if we made it that general it would have
greatly enlarged the disk footprint for the following reasons.
Therefore we selected B."  Basically, I think Heikki asked a good
question - which was "could we abstract this more?" - and I can't
recall seeing a clear answer explaining why we could or couldn't and
what the trade-offs would be.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-17 Thread Robert Haas

On Tue, Jun 17, 2014 at 10:33 AM, Petr Jelinek  wrote:
> On 17/06/14 16:18, Robert Haas wrote:
>> On Fri, Jun 13, 2014 at 8:31 PM, Petr Jelinek 
>> wrote:
>>> attached is a simple patch which makes it possible to change the system
>>> identifier of the cluster in pg_control. This is useful for
>>> individualization of the instance that is started on top of data
>>> directory
>>> produced by pg_basebackup - something that's helpful for logical
>>> replication
>>> setup where you need to easily identify each node (it's used by
>>> Bidirectional Replication for example).
>>
>>
>> I can clearly understand the utility of being able to reset the system
>> ID to a new, randomly-generated system ID - but giving the user the
>> ability to set a particular value of their own choosing seems like a
>> pretty sharp tool.  What is the use case for that?
>
> Let's say you want to initialize new logical replication node via
> pg_basebackup and you want your replication slots to be easily identifiable
> so you use your local system id as part of the slot name.
>
> In that case you need to know the future system id of the node because you
> need to register the slot before consistent point to which you replay via
> streaming replication (and you can't replay anymore once you changed the
> system id). Which means you need to generate your system id in advance and
> be able to change it in pg_control later.

Hmm.  I guess that makes sense.

But it seems to me that we might need to have a process discussion
here, because, while I'm all in favor of incremental feature proposals
that build towards a larger goal, it currently appears that the larger
goal toward which you are building is not something that's been
publicly discussed and debated on this list.  And I really think we
need to have that conversation.  Obviously, individual patches will
still need to be debated, but I feel like 2ndQuadrant is trying to
construct a castle without showing the community the floor plan.  I
believe that there is relatively broad agreement that we would all
like a castle, but different people may have legitimately different
ideas about how it should be constructed.  If the work arrives as a
series of disconnected pieces (user-specified system ID, event
triggers for CREATE, etc.), then everyone outside of 2ndQuadrant has
to take it on faith that those pieces are going to eventually fit
together in a way that we'll all be happy with.  In some cases, that's
fine, because the feature is useful on its own merits whether it ends
up being part of the castle or not.

But in other cases, like this one, if the premise that the slot name
should match the system identifier isn't something the community wants
to accept, then taking a patch that lets people do that is probably a
bad idea, because at least one person will use it to set the system
identifier of a system to a value that enables physical replication to
take place when that is actually totally unsafe, and we don't want to
enable that for no reason.  Maybe the slot name should match the
replication identifier rather than the standby system ID, for example.
There are conflicting proposals for how replication identifiers should
work, but one of those proposals limits it to 16 bits.  If we're going
to have multiple identifiers floating around anyway, I'd rather have a
slot called 7 than one called 6024402925054484590.  On the other hand,
maybe there's going to be a new proposal to use the database system
identifier as a replication identifier, which might be a fine idea and
which would demolish that argument.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Minmax indexes

2014-06-17 Thread Andres Freund

On 2014-06-17 11:48:10 -0400, Robert Haas wrote:
> On Tue, Jun 17, 2014 at 10:31 AM, Andres Freund  
> wrote:
> > On 2014-06-17 10:26:11 -0400, Robert Haas wrote:
> >> On Sat, Jun 14, 2014 at 10:34 PM, Alvaro Herrera
> >>  wrote:
> >> > Robert Haas wrote:
> >> >> On Wed, Sep 25, 2013 at 4:34 PM, Alvaro Herrera
> >> >>  wrote:
> >> >> > Here's an updated version of this patch, with fixes to all the bugs
> >> >> > reported so far.  Thanks to Thom Brown, Jaime Casanova, Erik Rijkers 
> >> >> > and
> >> >> > Amit Kapila for the reports.
> >> >>
> >> >> I'm not very happy with the use of a separate relation fork for
> >> >> storing this data.
> >> >
> >> > Here's a new version of this patch.  Now the revmap is not stored in a
> >> > separate fork, but together with all the regular data, as explained
> >> > elsewhere in the thread.
> >>
> >> Cool.
> >>
> >> Have you thought more about this comment from Heikki?
> >>
> >> http://www.postgresql.org/message-id/52495dd3.9010...@vmware.com
> >
> > Is there actually a significant usecase behind that wish or just a
> > general demand for being generic? To me it seems fairly unlikely you'd
> > end up with something useful by doing a minmax index over bounding
> > boxes.
> 
> Well, I'm not the guy who does things with geometric data, but I don't
> want to ignore the significant percentage of our users who are.  As
> you must surely know, the GIST implementations for geometric data
> types store bounding boxes on internal pages, and that seems to be
> useful to people.  What is your reason for thinking that it would be
> any less useful in this context?

For me minmax indexes are helpful because they allow to generate *small*
'coarse' indexes over large volumes of data. From my pov that's possible
possible because they don't contain item pointers for every contained
row.
That'ill imo work well if there are consecutive rows in the table that
can be summarized into one min/max range. That's quite likely to happen
for common applications of number of scalar datatypes. But the
likelihood of placing sufficiently many rows with very similar bounding
boxes close together seems much less relevant in practice. And I think
that's generally likely for operations which can't be well represented
as btree opclasses - the substructure that implies inside a Datum will
make correlation between consecutive rows less likely.

Maybe I've a major intuition failure here though...

> I do also think that a general demand for being generic ought to carry
> some weight.

Agreed. It's always a balance act. But it's not like this doesn't use a
datatype abstraction concept...

> We have gone to great lengths to make sure that our
> indexing can handle more than just < and >, where a lot of other
> products have not bothered.  I think we have gotten a lot of mileage
> out of that decision and feel that we shouldn't casually back away
> from it.

I don't see this as a case of backing away from that though?

> we shouldn't accept a less-generic
> approach blindly, without questioning whether it's possible to do
> better.

But the aim shouldn't be to add genericity that's not going to be used,
but to add it where it's somewhat likely to help...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Minmax indexes

2014-06-17 Thread Robert Haas

On Tue, Jun 17, 2014 at 10:31 AM, Andres Freund  wrote:
> On 2014-06-17 10:26:11 -0400, Robert Haas wrote:
>> On Sat, Jun 14, 2014 at 10:34 PM, Alvaro Herrera
>>  wrote:
>> > Robert Haas wrote:
>> >> On Wed, Sep 25, 2013 at 4:34 PM, Alvaro Herrera
>> >>  wrote:
>> >> > Here's an updated version of this patch, with fixes to all the bugs
>> >> > reported so far.  Thanks to Thom Brown, Jaime Casanova, Erik Rijkers and
>> >> > Amit Kapila for the reports.
>> >>
>> >> I'm not very happy with the use of a separate relation fork for
>> >> storing this data.
>> >
>> > Here's a new version of this patch.  Now the revmap is not stored in a
>> > separate fork, but together with all the regular data, as explained
>> > elsewhere in the thread.
>>
>> Cool.
>>
>> Have you thought more about this comment from Heikki?
>>
>> http://www.postgresql.org/message-id/52495dd3.9010...@vmware.com
>
> Is there actually a significant usecase behind that wish or just a
> general demand for being generic? To me it seems fairly unlikely you'd
> end up with something useful by doing a minmax index over bounding
> boxes.

Well, I'm not the guy who does things with geometric data, but I don't
want to ignore the significant percentage of our users who are.  As
you must surely know, the GIST implementations for geometric data
types store bounding boxes on internal pages, and that seems to be
useful to people.  What is your reason for thinking that it would be
any less useful in this context?

I do also think that a general demand for being generic ought to carry
some weight.  We have gone to great lengths to make sure that our
indexing can handle more than just < and >, where a lot of other
products have not bothered.  I think we have gotten a lot of mileage
out of that decision and feel that we shouldn't casually back away
from it.  Obviously, we do already have some special-case
optimizations and will likely have more in the future, and there are
can certainly be valid reasons for taking that approach. But it needs
to be justified in some way; we shouldn't accept a less-generic
approach blindly, without questioning whether it's possible to do
better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Memory deallocation after calling cast function

2014-06-17 Thread Andres Freund

On 2014-06-17 21:09:25 +0530, Abhijit Menon-Sen wrote:
> At 2014-06-17 11:32:37 -0400, br...@momjian.us wrote:
> >
> > > SRFs in the SELECT targetlist tend to leak memory; this is not easily
> > > fixable, and nobody is likely to try hard considering the feature's on
> > > the edge of deprecation anyhow.
> > 
> > Uh, what is replacing SRFs?  CTEs?
> 
> I don't think Tom was referring to SRFs in general, only putting them
> directly into the targetlist of a SELECT.

And the primary reason for using SRFs in the targetlist has gone away
due to LATERAL. It's now possible to pass data from tables to a SRF,
that previously wasn't possibly unless you'd used a SRF in the targetlist.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Memory deallocation after calling cast function

2014-06-17 Thread Abhijit Menon-Sen

At 2014-06-17 11:32:37 -0400, br...@momjian.us wrote:
>
> > SRFs in the SELECT targetlist tend to leak memory; this is not easily
> > fixable, and nobody is likely to try hard considering the feature's on
> > the edge of deprecation anyhow.
> 
> Uh, what is replacing SRFs?  CTEs?

I don't think Tom was referring to SRFs in general, only putting them
directly into the targetlist of a SELECT.

-- Abhijit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] How about a proper TEMPORARY TABLESPACE?

2014-06-17 Thread Matheus de Oliveira

Hi Hackers,

I was facing a situation were we wanted to set temp_tablespaces to a
tablespace on a ephemeral disk (yes, it is AWS ephemeral disk), and I know
many users have faced the same situation. Although it seems safe to create
a tablespace on ephemeral disks if you use it to store only temporary files
(either created by queries or temp tables), PostgreSQL does not have such
information, and can't safely prevent a careless user of creating a
non-temporary relation on this tablespace.

Also, in case of a lost of this tablespace, its PG_TEMP_FILES_DIR should be
created by hand after recovering. That is fine actually, but many users may
not even noticed that.

So, what you guys think about letting PG know somehow that a tablespace is
temporary?

I have took some small time to make a PoC just to see if that is doable.
And so I did a new syntax like:

CREATE TABLESPACE spcname [TEMP | TEMPORARY] LOCATION ...

So, if TEMP or TEMPORARY is present, I mark a new column at pg_tablespace
as true. On every table creation or moving to a new tablespace, I just
check this, and fails if the tablespace is "temporary" but the
"relpersistence" says the table is not.

The other part is, every time some query or relation is asked to be stored
on this tablespace, I create (on-demand) the PG_TEMP_FILES_DIR inside of it
(also if it is temporary).

The attached patch (and also on my Github, [1]), shows the PoC. For now,
I'm not worried about the code quality, there are yet a lot of work to be
done there (like ALTER TABLESPACE, better testing, use relcache, etc...),
and it is my first hacking on PG (so I'm a newbie). But I'd like to hear
from you guys if such feature is desirable and if I could starting working
on that for real. Also some thoughts about better way of implementing it.

[1] https://github.com/matheusoliveira/postgres/compare/my_temptablespace

Waiting for thoughts on that.

Best regards,
-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres
*** a/src/backend/commands/dbcommands.c
--- b/src/backend/commands/dbcommands.c
***
*** 388,399  createdb(const CreatedbStmt *stmt)
--- 388,406 
  			aclcheck_error(aclresult, ACL_KIND_TABLESPACE,
  		   tablespacename);
  
+ 
  		/* pg_global must never be the default tablespace */
  		if (dst_deftablespace == GLOBALTABLESPACE_OID)
  			ereport(ERROR,
  	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
    errmsg("pg_global cannot be used as default tablespace")));
  
+ 		/* can't create a database on temporary tablespace */
+ 		if (is_tablespace_storage_temporary(dst_deftablespace))
+ 			ereport(ERROR,
+ 	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+   errmsg("cannot create a database on a tablespace in temporary storage")));
+ 
  		/*
  		 * If we are trying to change the default tablespace of the template,
  		 * we require that the template not have any files in the new default
***
*** 1083,1088  movedb(const char *dbname, const char *tblspcname)
--- 1090,1101 
  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
   errmsg("pg_global cannot be used as default tablespace")));
  
+ 	/* can't create a database on temporary tablespace */
+ 	if (is_tablespace_storage_temporary(dst_tblspcoid))
+ 		ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 			  errmsg("cannot create a database on a tablespace in temporary storage")));
+ 
  	/*
  	 * No-op if same tablespace
  	 */
*** a/src/backend/commands/indexcmds.c
--- b/src/backend/commands/indexcmds.c
***
*** 432,437  DefineIndex(Oid relationId,
--- 432,446 
  		   get_tablespace_name(tablespaceId));
  	}
  
+ 	/* Can't save relations on temporary tablespace */
+ 	if (rel->rd_rel->relpersistence != RELPERSISTENCE_TEMP &&
+ 		is_tablespace_storage_temporary(OidIsValid(tablespaceId) ? tablespaceId : MyDatabaseTableSpace))
+ 	{
+ 		ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 			  errmsg("cannot save relation on a tablespace in temporary storage")));
+ 	}
+ 
  	/*
  	 * Force shared indexes into the pg_global tablespace.  This is a bit of a
  	 * hack but seems simpler than marking them in the BKI commands.  On the
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***
*** 523,528  DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
--- 523,537 
  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
   errmsg("only shared relations can be placed in pg_global tablespace")));
  
+ 	/* Can't save relations on temporary tablespace */
+ 	if (stmt->relation->relpersistence != RELPERSISTENCE_TEMP &&
+ 		is_tablespace_storage_temporary(OidIsValid(tablespaceId) ? tablespaceId : MyDatabaseTableSpace))
+ 	{
+ 		ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 			  errmsg("cannot save relation on a tablespace in temporary storage")));
+ 	}
+ 
  	/* Identify user ID that will own the table */
  	if (!OidIs

Re: [HACKERS] Memory deallocation after calling cast function

2014-06-17 Thread Bruce Momjian

On Tue, Jun  3, 2014 at 03:59:45PM -0400, Tom Lane wrote:
> Soroosh Sardari  writes:
> > I have problem with memory deallocation. look at the following queries
> 
> > 1- create table test01(a) as select generate_series(1,1)::int8 ;
> 
> Do it as, eg,
> 
> create table test01(a) as select g::int8 from generate_series(1,1) g;
> 
> SRFs in the SELECT targetlist tend to leak memory; this is not easily
> fixable, and nobody is likely to try hard considering the feature's on
> the edge of deprecation anyhow.

Uh, what is replacing SRFs?  CTEs?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.4 release notes

2014-06-17 Thread Bruce Momjian

On Tue, Jun  3, 2014 at 01:21:51AM -0700, Peter Geoghegan wrote:
> On Sun, May 4, 2014 at 5:46 AM, Bruce Momjian  wrote:
> > Feedback expected and welcomed.
> 
> One item currently reads "Improve valgrind error reporting".  I
> suggest this be changed to "Add support for Valgrind memcheck memory
> error detector". It was possible to use the tool before, but the lack
> of cooperation from Postgres made this far less useful.

I have applied the attached patch to improve this.  I didn't use "memory
error" because it is not checking for errors in the memory, but rather
invalid memory usage.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/release-9.4.sgml b/doc/src/sgml/release-9.4.sgml
new file mode 100644
index d9f9828..5868529
*** a/doc/src/sgml/release-9.4.sgml
--- b/doc/src/sgml/release-9.4.sgml
***
*** 2229,2235 
  

 
! Improve Valgrind error reporting (Noah Misch)
 

  
--- 2229,2236 
  

 
! Improve Valgrind detection of invalid memory usage
! (Noah Misch)
 

  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Minmax indexes

2014-06-17 Thread Greg Stark

On Tue, Jun 17, 2014 at 3:31 PM, Andres Freund  wrote:
> Is there actually a significant usecase behind that wish or just a
> general demand for being generic? To me it seems fairly unlikely you'd
> end up with something useful by doing a minmax index over bounding
> boxes.

Isn't min/max just a 2d bounding box? If you do a bulk data load of
something like the census data then sure, every page will have data
points for some geometrically clustered set of data.

I had in mind to do a small bloom filter per block. In general any
kind of predicate like bounding box should work.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-17 Thread Andres Freund

On 2014-06-17 20:47:51 +0530, Amit Kapila wrote:
> On Tue, Jun 17, 2014 at 6:35 PM, Andres Freund 
> wrote:
> > On 2014-06-17 18:01:58 +0530, Amit Kapila wrote:
> > > On Tue, Jun 17, 2014 at 3:56 PM, Andres Freund 
> > > > On 2014-06-17 12:41:26 +0530, Amit Kapila wrote:
> > I unfortunately still can't follow.
> 
> You have followed it pretty well as far as I can understand from your
> replies, as there is no reproducible test (which I think is bit tricky to
> prepare), so it becomes difficult to explain by theory.

I'm working an updated patch that moves the releaseOK into the
spinlocks. Maybe that's the problem already - it's certainly not correct
as is.

> > If Session-1 woke up some previous
> > waiter the woken up process will set releaseOK to true again when it
> > loops to acquire the lock?
> 
> You are right, it will wakeup the existing waiters, but I think the
> new logic has one difference which is that it can allow the backend to
> take Exclusive lock when there are already waiters in queue.  As per
> above example even though Session-2 and Session-3 are in wait
> queue, Session-4 will be able to acquire Exclusive lock which I think
> was previously not possible.

I think that was previously possible as well - in a slightly different
set of circumstances though. After a process releases a lock and wakes
up some of several waiters another process can come in and acquire the
lock. Before the woken up process gets scheduled again. lwlocks aren't
fair locks...

> > Somewhat unrelated:
> >
> > I have a fair amount of doubt about the effectiveness of the releaseOK
> > logic (which imo also is pretty poorly documented).
> > Essentially its intent is to avoid unneccessary scheduling when other
> > processes have already been woken up (i.e. releaseOK has been set to
> > false). I believe the theory is that if any process has already been
> > woken up it's pointless to wake up additional processes
> > (i.e. PGSemaphoreUnlock()) because the originally woken up process will
> > wake up at some point. But if the to-be-woken up process is scheduled
> > out because it used all his last timeslices fully that means we'll not
> > wakeup other waiters for a relatively long time.
> 
> I think it will also maintain that the wokedup process won't stall for
> very long time, because if we wake new waiters, then previously woked
> process can again enter into wait queue and similar thing can repeat
> for long time.

I don't think it effectively does that - newly incoming lockers ignore
the queue and just acquire the lock. Even if there's some other backend
scheduled to wake up. And shared locks can be acquired when there's
exclusive locks waiting.

I think both are actually critical for performance... Otherwise even a
only lightly contended lock would require scheduler activity when a
processes tries to lock something twice. Given the frequency we acquire
some locks with that'd be disastrous...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2

2014-06-17 Thread Amit Kapila

On Tue, Jun 17, 2014 at 6:35 PM, Andres Freund 
wrote:
>
> On 2014-06-17 18:01:58 +0530, Amit Kapila wrote:
> > On Tue, Jun 17, 2014 at 3:56 PM, Andres Freund 
> > > On 2014-06-17 12:41:26 +0530, Amit Kapila wrote:
> > > > 2.
> > > > Handling of potentialy_spurious case seems to be pending
> > > > in LWLock functions like LWLockAcquireCommon().
> > > >
> > > > LWLockAcquireCommon()
> > > > {
> > > > ..
> > > > /* add to the queue */
> > > > LWLockQueueSelf(l, mode);
> > > >
> > > > /* we're now guaranteed to be woken up if necessary */
> > > > mustwait = LWLockAttemptLock(l, mode, false, &potentially_spurious);
> > > >
> > > > }
> > > >
> > > > I think it can lead to some problems, example:
> > > >
> > > > Session -1
> > > > 1. Acquire Exclusive LWlock
> > > >
> > > > Session -2
> > > > 1. Acquire Shared LWlock
> > > >
> > > > 1a. Unconditionally incrementing shared count by session-2
> > > >
> > > > Session -1
> > > > 2. Release Exclusive lock
> > > >
> > > > Session -3
> > > > 1. Acquire Exclusive LWlock
> > > > It will put itself to wait queue by seeing the lock count
incremented
> > > > by Session-2
> > > >
> > > > Session-2
> > > > 1b. Decrement the shared count and add it to wait queue.
> > > >
> > > > Session-4
> > > > 1. Acquire Exclusive lock
> > > >This session will get the exclusive lock, because even
> > > >though other lockers are waiting, lockcount is zero.
> > > >
> > > > Session-2
> > > > 2. Try second time to take shared lock, it won't get
> > > >as session-4 already has an exclusive lock, so it will
> > > >start waiting
> > > >
> > > > Session-4
> > > > 2. Release Exclusive lock
> > > >it will not wake the waiters because waiters have been added
> > > >before acquiring this lock.
> > >
> > > I don't understand this step here? When releasing the lock it'll
notice
> > > that the waiters is <> 0 and acquire the spinlock which should protect
> > > against badness here?
> >
> > While Releasing lock, I think it will not go to Wakeup waiters
> > (LWLockWakeup), because releaseOK will be false.  releaseOK
> > can be set as false when Session-1 has Released Exclusive lock
> > and wakedup some previous waiter.  Once it is set to false, it can
> > be reset to true only for retry logic(after getting semaphore).
>
> I unfortunately still can't follow.

You have followed it pretty well as far as I can understand from your
replies, as there is no reproducible test (which I think is bit tricky to
prepare), so it becomes difficult to explain by theory.

> If Session-1 woke up some previous
> waiter the woken up process will set releaseOK to true again when it
> loops to acquire the lock?

You are right, it will wakeup the existing waiters, but I think the
new logic has one difference which is that it can allow the backend to
take Exclusive lock when there are already waiters in queue.  As per
above example even though Session-2 and Session-3 are in wait
queue, Session-4 will be able to acquire Exclusive lock which I think
was previously not possible.


> Somewhat unrelated:
>
> I have a fair amount of doubt about the effectiveness of the releaseOK
> logic (which imo also is pretty poorly documented).
> Essentially its intent is to avoid unneccessary scheduling when other
> processes have already been woken up (i.e. releaseOK has been set to
> false). I believe the theory is that if any process has already been
> woken up it's pointless to wake up additional processes
> (i.e. PGSemaphoreUnlock()) because the originally woken up process will
> wake up at some point. But if the to-be-woken up process is scheduled
> out because it used all his last timeslices fully that means we'll not
> wakeup other waiters for a relatively long time.

I think it will also maintain that the wokedup process won't stall for
very long time, because if we wake new waiters, then previously woked
process can again enter into wait queue and similar thing can repeat
for long time.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-06-17 Thread Abhijit Menon-Sen

At 2014-06-17 08:02:59 -0700, m...@joeconway.com wrote:
>
> That first hunk refers to dblink -- I'm guessing it does not belong
> with this patch.

Yes, it's a leftover of the dblink memory leak patch that's in this CF.

-- Abhijit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-06-17 Thread Joe Conway

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 06/17/2014 07:26 AM, MauMau wrote:
> Hello,
> 
> As I proposed before in the thread below, I've implemented a simple
> command for reliable WAL archiving.  I would appreciate it if you could
> review and test the patch.
> 
> http://www.postgresql.org/message-id/9C1EB95CA1F34DAB93DF549A51E3E874@maumau

That first hunk refers to dblink -- I'm guessing it does not belong with
this patch.

Joe


- -- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJToFiiAAoJEDfy90M199hlseMP/Rpwneo5la7mJkHJA0BWUtj/
Hh8yNzzyBVPbKYuTEOZWi2yblFW6yA2dknrYD8RigS1qJFwLw+drFt5673Vi6jKW
Pf7Qn62Ck/U0lZTGXUU9akOhSx7BsKVwH8HvdARIp5DSV2n7/HFDtazi3hTtFq31
GHKA84lPwuynODN42eVez0YXdeRUX7K/s5rCRq154q3BrLPhEEvQwo/kfhet3F0A
utf0ymSPuX3RvpGDDPAZ1oQxTd/xA+qJhX0jrsRoJVaY40rufTgXPmAnNFOdWEds
fXs6ObRm+tHmeLhYVUXhODa1HPHBQMvElVeB3CGxhPUvhP2494sfoLOd7qs8Lblg
oUkYbIJee8Ir7NN34Gc69YW58sekPSVI4vXKu0PwF++Ubs4MYNNd7nP4Wu9N2ISw
p/GPIbx0uR3NbFGCoOLD0K3QHnX/b0FTWHzTTboZ+b69rNIDePpn3eGO2QEOLL/R
P/YkPma8SLyDvNnCzuSU3XDkFmWQsgK7xa7qpsBR1mbdF7zKPfOxCtoby/enSeuk
z7NJxtpHeUQkO7Pb3ZNk6gL+E8UAQihvdBgdBwBzj4qaZyAM5ec29aya3TtBbF+3
UoFX3m4tthR6mMWqizsdadSPvozDjMrhqSRFAjdBSX80Nfs2DVN1Hepp8NXvjRzV
b5RfV7x4yvtr92FFQboj
=+TWh
-END PGP SIGNATURE-


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-06-17 Thread Abhijit Menon-Sen

At 2014-06-17 23:26:37 +0900, maumau...@gmail.com wrote:
>
> Hello,
> 
> As I proposed before in the thread below, I've implemented a simple
> command for reliable WAL archiving.  I would appreciate it if you
> could review and test the patch.

Please add your patch to the next CF (i.e. 2014-08), so that it gets
review attention at the appropriate time.

-- Abhijit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-06-17 Thread Petr Jelinek


On 17/06/14 16:18, Robert Haas wrote:

On Fri, Jun 13, 2014 at 8:31 PM, Petr Jelinek  wrote:

attached is a simple patch which makes it possible to change the system
identifier of the cluster in pg_control. This is useful for
individualization of the instance that is started on top of data directory
produced by pg_basebackup - something that's helpful for logical replication
setup where you need to easily identify each node (it's used by
Bidirectional Replication for example).


I can clearly understand the utility of being able to reset the system
ID to a new, randomly-generated system ID - but giving the user the
ability to set a particular value of their own choosing seems like a
pretty sharp tool.  What is the use case for that?



Let's say you want to initialize new logical replication node via 
pg_basebackup and you want your replication slots to be easily 
identifiable so you use your local system id as part of the slot name.


In that case you need to know the future system id of the node because 
you need to register the slot before consistent point to which you 
replay via streaming replication (and you can't replay anymore once you 
changed the system id). Which means you need to generate your system id 
in advance and be able to change it in pg_control later.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

1 2 >

1 - 100 of 134 matches

Mail list logo