Re: [HACKERS] Storing the password in .pgpass file in an encrypted format
On Fri, Feb 21, 2014 at 10:42 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
> I think this thread deserves more attention:
> http://www.postgresql.org/message-id/caazkufajufddfp1_vghbdfyru0sj6msovvkrp87acq53ov6...@mail.gmail.com

(I wrote that mail.)

I'm still interested in this idea and haven't found a good reason to rescind the general thinking there. Some of my colleagues are thinking along similar lines outside the Postgres context, and they seem happy with how that is shaping up. So, if there is some will for revival, that would be grand.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Storing the password in .pgpass file in an encrypted format
On Fri, Feb 21, 2014 at 6:15 PM, Greg Stark st...@mit.edu wrote:
> On Fri, Feb 21, 2014 at 10:18 PM, Daniel Farina dan...@heroku.com wrote:
>> I'm still interested in this idea and haven't found a good reason to
>> rescind the general thinking there.
>
> It's an interesting idea. I wonder if it would be possible to make it
> compatible with existing tools like ssh-agent instead of inventing our
> own?

I don't understand what you mean: the aesthetic of that proposal was to act as pure delegation, insomuch as possible, precisely so as to integrate with other programs, and the supplementary programs I provided, written just for the purposes of demonstration, are short. (https://github.com/fdr/pq-resolvers, if you want to read the program texts)
Re: [HACKERS] pg_stat_statements fingerprinting logic and ArrayExpr
On Tue, Dec 10, 2013 at 3:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> So my objection to what Peter is suggesting is not that it's a bad idea
> in isolation, but that I don't see where he's going to stop, short of
> reinventing every query-normalization behavior that exists in the
> planner. If this particular case is worthy of fixing with a hack in the
> fingerprinter, aren't there going to be dozens more with just as good
> claims? (Perhaps not, but what's the basis for thinking this is far
> worse than any other normalization issue?)

Qualitatively, the dynamic-length VALUES list is the worst offender. There is no algebraic answer to where to stop with normalizations, but as Peter points out, that bridge has been crossed already: assumptions have already been made that toss some potentially useful information, and yet the program is undoubtedly practical.

Based on my own experience (which sounds similar to Andres's), I am completely convinced that the canonicalization he proposes here is a great deal more practical than the current definition, so I suggest the goal is worthwhile.
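For onlookers, the flavor of normalization under discussion can be sketched with a toy textual fingerprinter. This is purely illustrative and entirely my own construction: pg_stat_statements actually jumbles the post-parse-analysis tree, not the query text.

```python
import re

def fingerprint(sql: str) -> str:
    """Toy canonicalizer: replace literal constants with '?', then
    collapse variable-length lists of constants -- the "dynamic length
    values list" offender -- to a single placeholder, so that list
    arity no longer splits otherwise-identical queries apart."""
    s = re.sub(r"\b\d+\b|'[^']*'", "?", sql)             # constants -> ?
    s = re.sub(r"\(\s*\?(\s*,\s*\?)*\s*\)", "(?)", s)    # (?, ?, ?) -> (?)
    return s

a = fingerprint("SELECT * FROM t WHERE id IN (1, 2, 3)")
b = fingerprint("SELECT * FROM t WHERE id IN (42, 7)")
print(a)        # SELECT * FROM t WHERE id IN (?)
print(a == b)   # True: the arity of the constant list no longer matters
```

With plain constant replacement alone, the two queries above would fingerprint differently; collapsing the list is exactly the extra canonicalization step at issue.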
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Nov 14, 2013 at 7:25 PM, Peter Geoghegan p...@heroku.com wrote:
> On Tue, Nov 5, 2013 at 5:30 AM, Sameer Thakur samthaku...@gmail.com wrote:
>> Hello,
>> Please find attached pg_stat_statements-identification-v9.patch.
>
> I took a quick look. Observations:
>
> + /* Making query ID dependent on PG version */
> + query->queryId |= PG_VERSION_NUM << 16;
>
> If you want to do something like this, make the value of
> PGSS_FILE_HEADER incorporate (PG_VERSION_NUM / 100) or something. Why
> are you doing this?

The destabilization of the query_id is to attempt to skirt the objection that instability in the query_id would otherwise be unpredictable and present a false contract, particularly in regard to point-release upgrades. Earliest drafts of mine destabilized it every session, but that kills off a nice use case of correlating query ids between primaries and standbys, if memory serves. This version has the semantics of destabilizing -- for sure -- on every point release.

> @@ -128,6 +146,7 @@ typedef struct pgssEntry
>   pgssHashKey key;       /* hash key of entry - MUST BE FIRST */
>   Counters    counters;  /* the statistics for this query */
>   int         query_len; /* # of valid bytes in query string */
> + uint32      query_id;  /* jumble value for this entry */
>
> query_id is already in key.
>
> Not sure I like the idea of the new enum at all, but in any case you
> shouldn't have a PGSS_TUP_LATEST constant - should someone go update
> all usage of that constant only when your version isn't the latest?
> Like here:
>
> + if (detected_version >= PGSS_TUP_LATEST)

It is roughly modeled after the column-count versioning of the same thing that pre-existed in pg_stat_statements. The only reason to have a LATEST is for some of the invariants, though; otherwise, most uses should use the version-stamped symbol. I did this wrongly a bunch of times, as spotted by Alvaro, I believe.
> I forget why Daniel originally altered the min value of
> pg_stat_statements.max to 1 (I just remember that he did), but I don't
> think it holds that you should keep it there. Have you considered the
> failure modes when it is actually set to 1?

Testing. It was very useful to set it to small numbers, like two or three. It's not crucial to the patch at all, but having to whack it back all the time to keep the patch submission devoid of it seemed impractically tedious during revisions.

> This is what I call a "can't happen" error, or a defensive one:
>
> + else
> + {
> +     /*
> +      * Couldn't identify the tuple format.  Raise error.
> +      *
> +      * This is an exceptional case that may only happen in bizarre
> +      * situations, since it is thought that every released version
> +      * of pg_stat_statements has a matching schema.
> +      */
> +     ereport(ERROR,
> +             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +              errmsg("pg_stat_statements schema is not supported "
> +                     "by its installed binary")));
> + }
>
> I'll generally make these simple elog()s, which are more terse. No one
> is going to find all that dressing useful.

That seems reasonable. Yes, it's basically a soft assert. I hit this one in development a few times; it was handy.

> It probably isn't useful to comment random code that isn't affected by
> your patch - I don't find this new refactoring useful, and am surprised
> to see it in your patch:
>
> +  /* Check header existence and magic number match. */
>    if (fread(&header, sizeof(uint32), 1, file) != 1 ||
> -      header != PGSS_FILE_HEADER ||
> -      fread(&num, sizeof(int32), 1, file) != 1)
> +      header != PGSS_FILE_HEADER)
> +      goto error;
> +
> +  /* Read how many table entries there are. */
> +  if (fread(&num, sizeof(int32), 1, file) != 1)
>        goto error;
>
> Did you mean to add all this, or is it left over from Daniel's patch?

The whitespace changes are not intentional, but I think the comments help a reader track what's going on, which becomes more important if one adds more fields.
It can be broken out into a separate patch, but it didn't seem like that bookkeeping was necessary.

> @@ -43,6 +43,7 @@
>   */
>  #include "postgres.h"
>
> +#include <time.h>
>  #include <unistd.h>
>
>  #include "access/hash.h"
> @@ -59,15 +60,18 @@
>  #include "storage/spin.h"
>  #include "tcop/utility.h"
>  #include "utils/builtins.h"
> +#include "utils/timestamp.h"
>
> Final thought: I think the order in the pg_stat_statements view is
> wrong. It ought to be like a composite primary key - (userid, dbid,
> query_id).

I made no design decisions here...no complaints from me.
Re: [HACKERS] Save Hash Indexes
On Fri, Nov 1, 2013 at 6:31 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
> Hi,
>
> Here's an idea: when a user asks for a Hash Index, transparently build
> a BTree index over a hash function instead.
>
> Advantages:
>  - it works
>  - it's crash safe
>  - it's (much?) faster than a hash index anyway
>
> Drawbacks:
>  - root access concurrency
>  - we need a hash_any function stable against pg_upgrade
>
> After talking about it with Heikki, we don't seem to find ways in which
> the root access concurrency pattern would be visible enough to matter.
> Also, talking with Peter Geoghegan, it's unclear that there's a use
> case where a hash index would be faster than a btree index over the
> hash function.
>
> Comments?

I haven't met a single Heroku user who has stumbled into this landmine. Normally I am very wary of such landmine-laden features, but this one is there and doesn't seem to have detonated at any point. I guess the obscure nature of those indexes and the stern warning in the documentation have sufficed to discourage their use.

Given that experience, I think foreclosing on fixing hash indexes is premature: the status quo doesn't seem to be hurting inexperienced users of this stripe. Maybe there are yet other reasons to argue the subject, though...it's not like the current user base relies on the behavior as-is either, having seemingly steered clear just about altogether.
Re: [HACKERS] Save Hash Indexes
On Fri, Nov 1, 2013 at 8:52 AM, Daniel Farina dan...@heroku.com wrote:
> On Fri, Nov 1, 2013 at 6:31 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
>> Also, talking with Peter Geoghegan, it's unclear that there's a use
>> case where a hash index would be faster than a btree index over the
>> hash function.
>>
>> Comments?
>
> I haven't met a single Heroku user who has stumbled into this
> landmine. [...]

As an addendum, though: some users *have* done the hash-function-plus-btree trick to make keys smaller. They tend to resort to full-blown cryptographic hashes and assume/enforce non-collision, but it's somewhat awkward and finicky. So while perhaps commandeering the 'hash' index type is a bit over-aggressive, making that use case work better would probably carry positive impact.

I can say most of my indexes over text are *never* used for range queries, so hashing them down, one would think, would be great. The fiddliness of applying expression indexes over all that forecloses getting the benefits that might be possible there: only certain heavy workloads will receive that level of care.
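The hash-plus-recheck pattern described above can be sketched in a few lines. This toy Python example is my own construction (truncated SHA-256 standing in for whatever hash a user might index on); it shows why collisions cost only a wasted recheck, never a wrong answer:

```python
import hashlib

def short_key(text: str) -> int:
    # Hash a long text key down to 64 bits. An arbitrary construction
    # for illustration; Postgres's own hash functions differ.
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")

class HashKeyedIndex:
    """A toy 'btree over a hash' lookup: the index is keyed only by the
    64-bit hash, and equality probes recheck the full key, so a hash
    collision costs a wasted comparison rather than a wrong answer."""

    def __init__(self):
        self.by_hash = {}  # short hash -> list of (full_key, value)

    def insert(self, key: str, value):
        self.by_hash.setdefault(short_key(key), []).append((key, value))

    def lookup(self, key: str):
        for full, value in self.by_hash.get(short_key(key), []):
            if full == key:  # recheck: tolerate collisions
                return value
        return None

idx = HashKeyedIndex()
idx.insert("https://example.com/some/very/long/url", 1)
print(idx.lookup("https://example.com/some/very/long/url"))  # 1
print(idx.lookup("https://example.com/other"))               # None
```

The "assume/enforce non-collision" shortcut mentioned above corresponds to skipping the recheck in lookup(), which is exactly the finicky part.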
[HACKERS] What hook would you recommend for one time, post authentication?
What hook would you recommend that matches these criteria:

* Runs post-authentication
* Runs once

I was putting together a little extension module[0] intended to do connection limits out-of-band with the catalog (so that hot standbys and primaries can have different imposed connection limits), but am stymied because I can't locate a hook matching this description.

My general approach has been to try to use GetUserNameFromId(GetSessionUserId()), but this requires InitializeSessionUserId to be called first, and that has been causing me some trouble.

ClientAuthentication_hook is too early (authentication has not yet happened, and InitializeSessionUserId has not been called). Many of the other hooks are run per query (like the Executor hooks). And postinit.c is not giving me a lot of clues; nothing with the lexeme 'hook', as seen in typedefs.list/grep, is giving me a lot of confidence.

I appreciate any advice one can supply. Thank you.

[0]: https://github.com/fdr/pg_connlimit
Re: [HACKERS] What hook would you recommend for one time, post authentication?
On Mon, Oct 28, 2013 at 6:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> roleid = get_role_oid(port->user_name, true);

Thank you for that; it appears to work very well for my purpose, as does ClientAuthentication_hook, now.
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Tue, Oct 22, 2013 at 2:56 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
> Tom Lane t...@sss.pgh.pa.us writes:
>> Hm. It's been a long time since college statistics, but doesn't the
>> entire concept of standard deviation depend on the assumption that the
>> underlying distribution is more-or-less normal (Gaussian)? Is there a
>
> I just had a quick chat with a statistician friend of mine on that
> topic, and it seems that the only way to make sense of an average is if
> you already know the distribution.
>
> In our case, what I keep experiencing with tuning queries is that we
> have like 99% of them running under an acceptable threshold and 1% of
> them taking more and more time.

Agreed. In a lot of Heroku's performance work, the p99 and p95 have provided a lot more value than stddev, although stddev is a lot better than nothing and probably easier to implement. There are apparently high-quality statistical approximations of these that are inexpensive to compute and small in memory representation. That said, I'd take stddev over nothing, for sure.

Handily for stddev, by keeping snapshots of count(x), sum(x), and sum(x**2) (which I understand to be the components of stddev), one can compute stddevs across different time spans using auxiliary tools that sample this triplet on occasion. That's a handy property that I'm not sure the percentile approximations can match easily.
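A minimal sketch of the triplet idea, assuming cumulative counters that are only sampled (never reset) between the two snapshots: take the element-wise difference of the (count, sum, sum-of-squares) triplets for the interval and derive the population stddev from that.

```python
import math

def window_stddev(snap_a, snap_b):
    """Population stddev over the interval between two cumulative
    snapshots, each a (count, sum, sum_of_squares) triplet. Assumes
    snap_b was taken after snap_a with no counter reset in between."""
    n = snap_b[0] - snap_a[0]
    s = snap_b[1] - snap_a[1]
    q = snap_b[2] - snap_a[2]
    if n <= 0:
        return None
    mean = s / n
    # Var(X) = E[X^2] - E[X]^2; clamp tiny negative float error to 0.
    return math.sqrt(max(q / n - mean * mean, 0.0))

def snapshot(xs):
    # The cumulative counters a stats extension might keep per query.
    return (len(xs), sum(xs), sum(x * x for x in xs))

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
a = snapshot(xs[:3])  # earlier sample of the counters
b = snapshot(xs)      # later sample
print(window_stddev((0, 0, 0), snapshot(xs)))  # 2.0, over all executions
print(window_stddev(a, b))  # ~1.7889, over just the interval a..b
```

This is the property claimed above: external tooling can recover the stddev of any sampled window without the server having to track windows itself.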
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao masao.fu...@gmail.com wrote:
> On Thu, Oct 10, 2013 at 7:20 PM, Sameer Thakur samthaku...@gmail.com wrote:
>> Please find attached a patch which adds documentation for the
>> session_start and introduced fields, and corrects the documentation
>> for queryid to be query_id. session_start remains in the view as
>> agreed.
>
> Thanks for updating the document! I'm not clear about the use case of
> 'session_start' and 'introduced' yet. Could you elaborate it? Maybe we
> should add that explanation into the document?

Probably. The idea is that without those fields it is, to wit, impossible to explain non-monotonic movement in the metrics of those queries, for precise use in tools that insist on monotonicity of the fields, which are all cumulative to date, I think.

pg_stat_statements_reset() or crashing loses the session, so one expects ncalls may decrease. In addition, a query that is bouncing in and out of the hash table will have its statistics lost, so its numbers can also decrease. This can cause unresolvable artifacts in analysis tools. The two fields allow a precise understanding of whether the statistics for a given query_id have been continuous since the last time a program inspected them.

> In my test, I found that pg_stat_statements.total_time always indicates
> zero. I guess that the patch might handle pg_stat_statements.total_time
> wrongly.
>
> +values[i++] = DatumGetTimestamp(
> +    instr_get_timestamptz(pgss->session_start));
> +values[i++] = DatumGetTimestamp(
> +    instr_get_timestamptz(entry->introduced));
>
> These should be executed only when detected_version >= PGSS_TUP_LATEST?

Yes. Oversight.

> + <entry><structfield>session_start</structfield></entry>
> + <entry><type>text</type></entry>
> + <entry></entry>
> + <entry>Start time of a statistics session.</entry>
> + </row>
> +
> + <row>
> + <entry><structfield>introduced</structfield></entry>
> + <entry><type>text</type></entry>
>
> The data type of these columns is not text.
Oops.

> +instr_time session_start;
> +uint64     private_stat_session_key;
>
> It's better to add comments explaining these fields.

Yeah.

> +microsec = INSTR_TIME_GET_MICROSEC(t) -
> +    ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * USECS_PER_DAY);
>
> HAVE_INT64_TIMESTAMP should be considered here.

That's not a bad idea.

> +if (log_cannot_read)
> +    ereport(LOG,
> +            (errcode_for_file_access(),
> +             errmsg("could not read pg_stat_statement file \"%s\": %m",
> +                    PGSS_DUMP_FILE)));
>
> Is it better to use WARNING instead of LOG as the log level here?

Is this new code? Why did I add it again? It seems reasonable, though, to call it a WARNING.

> +/*
> + * The role calling this function is unable to see
> + * sensitive aspects of this tuple.
> + *
> + * Nullify everything except the "insufficient privilege"
> + * message for this entry
> + */
> +memset(nulls, 1, sizeof nulls);
> +
> +nulls[i] = 0;
> +values[i] = CStringGetTextDatum("<insufficient privilege>");
>
> Why do we need to hide *all* the fields in pg_stat_statements when it's
> accessed by an improper user? This is a big change of
> pg_stat_statements behavior, and it might break compatibility.

It seems like an information leak, one that grows more major if query_id is exposed and, at any point, one can determine the query_id for a query text.

> This is not directly related to the patch itself, but why does the
> queryid need to be calculated based on also the statistics session?
>
>> If we expose the hash value of the query tree without using the
>> statistics session, it is possible that users might make the wrong
>> assumption that this value remains stable across version upgrades,
>> when in reality it depends on whether the version has made changes to
>> query tree internals. So, to explicitly ensure that users do not make
>> this wrong assumption, hash value generation uses the statistics
>> session id, which is newly created under the situations described
>> above.
>
> So, ISTM that we can use, for example, the version number to calculate
> the query_id. Right?

That does seem like it may be a more reasonable stability vs. instability tradeoff than once per statistics session, because then the use case with standbys works somewhat better. That said, the general concern was accidental contracts being assumed by consuming code, so this approach is tuned to make the query_id as unstable as possible while still being useful: stable, but only within one statistics-gathering session.

I did not raise the objection about over-aggressive contracts being exposed, although I think the concern has merit...but given the use case involving standbys, I am for now charitable to increasing the stability to the level you indicate: a guaranteed break every point release.
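The epoch arithmetic in the microsec fragment quoted above rebases a Unix-epoch microsecond count onto the Postgres epoch (2000-01-01). A quick Python check of the constants involved (the names mirror the C macros, but this is just illustrative arithmetic, not the extension's code):

```python
from datetime import datetime, timezone

# Days between the Unix epoch (1970-01-01) and the Postgres epoch
# (2000-01-01); mirrors (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE).
EPOCH_SHIFT_DAYS = (datetime(2000, 1, 1, tzinfo=timezone.utc)
                    - datetime(1970, 1, 1, tzinfo=timezone.utc)).days
USECS_PER_DAY = 86400 * 1_000_000

def unix_usecs_to_pg_timestamp(us_since_unix_epoch: int) -> int:
    """Rebase microseconds-since-Unix-epoch onto the Postgres epoch,
    as the quoted fragment does in an integer-timestamp build (the
    HAVE_INT64_TIMESTAMP point is that float-timestamp builds need a
    different representation)."""
    return us_since_unix_epoch - EPOCH_SHIFT_DAYS * USECS_PER_DAY

print(EPOCH_SHIFT_DAYS)  # 10957
# 2000-01-01 00:00:00 UTC maps to 0 in Postgres-epoch microseconds:
print(unix_usecs_to_pg_timestamp(946_684_800 * 1_000_000))  # 0
```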
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Oct 10, 2013 at 10:12 AM, Fujii Masao masao.fu...@gmail.com wrote:
> On Fri, Oct 11, 2013 at 12:19 AM, Daniel Farina dan...@heroku.com wrote:
>> Probably. The idea is that without those fields it is impossible to
>> explain non-monotonic movement in the metrics of those queries, for
>> precise use in tools that insist on monotonicity of the fields, which
>> are all cumulative to date, I think.
>>
>> pg_stat_statements_reset() or crashing loses the session, so one
>> expects ncalls may decrease. In addition, a query that is bouncing in
>> and out of the hash table will have its statistics lost, so its
>> numbers can also decrease. This can cause unresolvable artifacts in
>> analysis tools. The two fields allow a precise understanding of
>> whether the statistics for a given query_id have been continuous
>> since the last time a program inspected them.
>
> Thanks for elaborating! Since 'introduced' is reset even when the
> statistics are reset, maybe we can do without 'session_start' for that
> purpose?

There is a small loss of precision. The original reason was that one wanted to know, given two samples of pg_stat_statements, whether the query_id would remain stable for a given query. For example: if the session changes on account of a reset, then it is known that all the query_ids one's external program is tracking are no longer going to be updated, and the program can account for the fact that the same query text is going to have a new query_id.

Given the new idea of mixing in the point-release number: if the code is changed to instead mix in the full version of Postgres, as you suggested recently, session_start can probably be removed. The caveat is that the client is then going to have to do something a bit weird, like ask for the point release (and perhaps even the compile options) of Postgres to know when the query_id is going to have a different value for a given query text. But maybe this is an acceptable compromise to remove one field.
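The compromise discussed here -- a query id stable within a release but guaranteed to change across point releases -- can be sketched as mixing the version into the raw tree hash. This is an illustrative sketch, not the patch's actual code (which XORs in a per-statistics-session key instead); PG_VERSION_NUM-style integers such as 90302 denote 9.3.2.

```python
MASK64 = (1 << 64) - 1

def exposed_query_id(raw_jumble: int, version_num: int) -> int:
    """Mix the server version into the internal query-tree hash so the
    exposed id is stable within one release but guaranteed to differ
    across releases, heading off false stability contracts."""
    return (raw_jumble ^ (version_num << 32)) & MASK64

jumble = 0x5EED_F00D_CAFE  # stand-in for a query-tree hash

# Stable within a release...
print(exposed_query_id(jumble, 90302) == exposed_query_id(jumble, 90302))  # True
# ...but guaranteed different across point releases (90302 vs. 90303):
print(exposed_query_id(jumble, 90302) == exposed_query_id(jumble, 90303))  # False
```

Because the mixed-in value is a function of the server version rather than a random per-session key, a primary and its standbys on the same minor release would still agree on ids, which is the standby use case argued for above.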
>>> +/*
>>> + * The role calling this function is unable to see
>>> + * sensitive aspects of this tuple.
>>> + *
>>> + * Nullify everything except the "insufficient privilege"
>>> + * message for this entry
>>> + */
>>> +memset(nulls, 1, sizeof nulls);
>>> +
>>> +nulls[i] = 0;
>>> +values[i] = CStringGetTextDatum("<insufficient privilege>");
>>>
>>> Why do we need to hide *all* the fields in pg_stat_statements when
>>> it's accessed by an improper user? This is a big change of
>>> pg_stat_statements behavior, and it might break compatibility.
>>
>> It seems like an information leak, one that grows more major if
>> query_id is exposed and, at any point, one can determine the query_id
>> for a query text.
>
> So hiding only query and query_id is enough?

Yeah, I think so. The other fields feel a bit weird to leave hanging around as well, so I thought I'd just fix it in one shot, but doing the minimum, or only applying this idea to new fields, is safer. It's shorter to hit all the fields, though, which is why I was tempted to do that. Perhaps not a good economy given potential subtle breaks in the next version.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Oct 10, 2013 at 10:49 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
> Daniel Farina escribió:
>> On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao masao.fu...@gmail.com wrote:
>>> In my test, I found that pg_stat_statements.total_time always
>>> indicates zero. I guess that the patch might handle
>>> pg_stat_statements.total_time wrongly.
>>>
>>> +values[i++] = DatumGetTimestamp(
>>> +    instr_get_timestamptz(pgss->session_start));
>>> +values[i++] = DatumGetTimestamp(
>>> +    instr_get_timestamptz(entry->introduced));
>>>
>>> These should be executed only when detected_version >= PGSS_TUP_LATEST?
>>
>> Yes. Oversight.
>
> Hmm, shouldn't this be conditional on a new PGSS_TUP_V1_2? I mean, if
> later somebody patches pgss to have a PGSS_TUP_V2 or V1_3 and that
> becomes latest, somebody running the current definition with the
> updated .so will no longer get these values. Or is the intention that
> PGSS_TUP_LATEST will never be updated again, and future versions will
> get PGSS_TUP_REALLY_LATEST and PGSS_TUP_REALLY_LATEST_HONEST and so on?
> I mean, surely we can always come up with new symbols to use, but it
> seems hard to follow.
>
> There's one other use of PGSS_TUP_LATEST here which I think should also
> be changed to >= SOME_SPECIFIC_VERSION:
>
> + if (detected_version >= PGSS_TUP_LATEST)
> + {
> +     uint64 qid = pgss->private_stat_session_key;
> +
> +     qid ^= (uint64) entry->query_id;
> +     qid ^= ((uint64) entry->query_id) << 32;
> +
> +     values[i++] = Int64GetDatumFast(qid);
> + }

I made some confusing mistakes here in using the newer tuple versioning. Let me try to explain what the motivation was: I was adding new fields to pg_stat_statements and was afraid that it'd be hard to get a very clear view that all the set fields were in alignment and there were no accidental overruns, with the problem getting more convoluted as more versions are added.
The idea of PGSS_TUP_LATEST is useful for catching common programmer error by testing some invariants, and it'd be nice not to have to thrash the invariant-checking code every release, which would probably defeat the point of such oops-prevention code. But the fact that I went on to rampantly do questionable things with PGSS_TUP_LATEST is a bad sign.

By example, here are the two uses that have served me very well:

    /* Perform version detection */
    if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_0)
        detected_version = PGSS_TUP_V1_0;
    else if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_1)
        detected_version = PGSS_TUP_V1_1;
    else if (tupdesc->natts == PG_STAT_STATEMENTS_COLS)
        detected_version = PGSS_TUP_LATEST;
    else
    {
        /*
         * Couldn't identify the tuple format.  Raise error.
         *
         * This is an exceptional case that may only happen in bizarre
         * situations, since it is thought that every released version
         * of pg_stat_statements has a matching schema.
         */
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("pg_stat_statements schema is not supported "
                        "by its installed binary")));
    }

And:

    #ifdef USE_ASSERT_CHECKING
        /* Check that every column appears to be filled */
        switch (detected_version)
        {
            case PGSS_TUP_V1_0:
                Assert(i == PG_STAT_STATEMENTS_COLS_V1_0);
                break;
            case PGSS_TUP_V1_1:
                Assert(i == PG_STAT_STATEMENTS_COLS_V1_1);
                break;
            case PGSS_TUP_LATEST:
                Assert(i == PG_STAT_STATEMENTS_COLS);
                break;
            default:
                Assert(false);
        }
    #endif

Given that, perhaps a way to fix this is something like this patch fragment:

     {
         PGSS_TUP_V1_0 = 1,
         PGSS_TUP_V1_1,
    -    PGSS_TUP_LATEST
    +    PGSS_TUP_V1_2
     } pgssTupVersion;

    +#define PGSS_TUP_LATEST PGSS_TUP_V1_2

This way, when a programmer is making new tuple versions, they are much more likely to see the immediate need to teach those two sites about the new tuple size. But the fact that the invariants do not get updated in a completely compulsory way may push the value of this checking under water.
I'd be sad to see it go; it has saved me a lot of effort when returning to the code after a while. What do you think?

> The instr_time thingy being used for these times maps to
> QueryPerformanceCounter() on Windows, and I'm not sure how useful this
> is for long-term time tracking; see
> http://stackoverflow.com/questions/5296990/queryperformancecounter-and-overflows#5297163
> for instance. I think it'd be better to use TimestampTz and
> GetCurrentTimestamp() for this.

Ah. I was going to do that, but thought it'd be nice to merely push down the already-extant Instr struct in most cases, so as to get the 'start' time stored there for free. But if it can't
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Thu, Oct 10, 2013 at 1:30 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
> Just noticed that you changed the timer to struct Instrumentation. Not
> really sure about that change. Since you seem to be using only the
> start time and counter, wouldn't it be better to store only those?
> Particularly unsure about passing INSTRUMENT_ALL to InstrAlloc().

Yeah, I was unsure about that too. The motivation was that I need one more piece of information in pgss_store (the absolute start time). I was going to widen the argument list, but it was looking pretty long, so instead I thought it'd be more concise to push the entire, typically extant Instrumentation struct pointer down.

> Would it work to define your own struct to pass around?

Absolutely; I was just hoping to spare the code another abstraction if an existing one was a precise superset. Looks like that isn't going to happen, though, so a pgss-oriented struct is likely what it will have to be.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Fri, Oct 4, 2013 at 7:22 AM, Fujii Masao masao.fu...@gmail.com wrote:
> On Thu, Oct 3, 2013 at 5:11 PM, Sameer Thakur samthaku...@gmail.com wrote:
>> On Wed, Oct 2, 2013 at 6:40 PM, Sameer Thakur samthaku...@gmail.com wrote:
>>>> Looks pretty good. Do you want to package up the patch with your
>>>> change and do the honors and re-submit it? Thanks for helping out
>>>> so much!
>>> Sure, will do. Need to add a bit of documentation explaining
>>> statistics sessions as well.
>> I did some more basic testing around pg_stat_statements.max, now that
>> we have clarity from Peter about its value being legitimate below
>> 100. It seems to work fine: with pg_stat_statements.max = 4, the max
>> unique queries in the view are 4. On the 5th query, the view holds
>> just the latest unique query, discarding the previous 4. Fujii had
>> reported a segmentation fault in this scenario. Thank you for the
>> patch. Please find the patch attached.
>
> Thanks for the patch! Here are the review comments:
>
> +OUT session_start timestamptz,
> +OUT introduced timestamptz,
>
> The patch exposes these columns in the pg_stat_statements view. These
> should be documented.
>
> I don't think that session_start should be exposed in every row of
> pg_stat_statements, because it's updated only when all statistics are
> reset, i.e., session_start of all entries in pg_stat_statements
> indicates the same value.

Dunno. I agree it'd be less query traffic and noise. Maybe hidden behind a UDF? I thought stats_reset on pg_database might be prior art, but realized that the statistics there differ depending on per-database stats resets (maybe a name change of 'session' to 'stats_reset' would be useful to avoid too much incohesion, though). I didn't want to bloat the taxonomy of exposed API/symbols too much for pg_stat_statements, but perhaps in this instance it is reasonable. Also, isn't the interlock with the result set perhaps more precise/fine-grained with the current solution? Yet, that's awfully corner-casey.
I'm on the fence because the simplicity and precision of the current regime for aggregation tools is nice, but avoiding the noise for inspecting humans in the common case is also nice. I don't see a reason right now to go strongly either way, so if you feel moderately strongly that the repetitive column should be stripped then I am happy to relent and help out. Let me know your detailed thoughts (or modify the patch with your idea).

> +OUT query_id int8,
>
> query_id or queryid? I like the latter. Also, the document uses the
> latter.

Not much opinion. I prefer expending the '_' character, though, because most compound words in Postgres performance-statistics catalogs seem to use an underscore.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Tue, Oct 1, 2013 at 5:32 AM, Sameer Thakur samthaku...@gmail.com wrote:
> On Tue, Oct 1, 2013 at 12:48 AM, Daniel Farina-5 [via PostgreSQL]
> [hidden email] wrote:
>> On Sep 30, 2013 4:39 AM, Sameer Thakur [hidden email] wrote:
>>
>> Also, for onlookers, I have changed this patch around to do the
>> date-oriented stuff but want to look it over before stapling it up
>> and sending it.
>>
>> If one cannot wait, one can look at
>> https://github.com/fdr/postgres/tree/queryid. The squashed version of
>> that history contains a reasonable patch, I think, but a re-read
>> often finds something for me and I've only just completed it
>> yesterday.
>
> I did the following:
>
> 1. Forked from fdr/postgres
> 2. Cloned branch queryid
> 3. Squashed 22899c802571a57cfaf0df38e6c5c366b5430c74 and
>    d813096e29049667151a49fc5e5cf3d6bbe55702; picked
>    be2671a4a6aa355c5e8ae646210e6c8e0b84ecb5
> 4. Usual make / make install / create extension pg_stat_statements
>    (pg_stat_statements.max = 100)
> 5. select * from pg_stat_statements_reset(); select * from
>    pgbench_tellers. Result below (remaining counter columns trimmed):
>
>  userid | dbid  |          session_start           |        introduced         |                   query                    |      query_id       | calls | rows
> --------+-------+----------------------------------+---------------------------+--------------------------------------------+---------------------+-------+------
>      10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01 05:30:00+05:30 | select * from pg_stat_statements_reset();  | 2531907647060518039 |     1 |    1
>      10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01 05:30:00+05:30 | select * from pgbench_tellers ;            | 7580333025384382649 |     1 |   10
> (2 rows)
>
> I understand session_start and verified that it changes with each
> database restart to reflect current time.

It should only restart when the statistics file cannot be loaded. This seems to work fine:

1. Started the instance
2. Executed pg_stat_statements_reset(), select * from pgbench_history,
   select * from pgbench_tellers. Got the following in the
   pg_stat_statements view (remaining counter columns trimmed):

 userid | dbid  |          session_start           |            introduced            |                   query                    |       query_id       | calls | rows
--------+-------+----------------------------------+----------------------------------+--------------------------------------------+----------------------+-------+------
     10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01 17:43:43.724301+05:30 | select * from pgbench_history;             |  -165801328395488047 |     1 |    0
     10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01 17:43:37.379785+05:30 | select * from pgbench_tellers;             |  8376871363863945311 |     1 |   10
     10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01 17:43:26.667178+05:30 | select * from pg_stat_statements_reset();  | -1061018443194138344 |     1 |    1
(3 rows)

Then restarted the server
Re: [HACKERS] pluggable compression support
On Mon, Sep 30, 2013 at 1:49 PM, Huchev hugochevr...@gmail.com wrote: How come any compressor which could put up some competition to pglz is systematically pushed out of the field on the grounds of unverifiable legal risks?

Because pglz has been around for a while and has not caused patent trouble. The risks have been accepted and the downsides have not materialized. Were pglz being written and distributed starting today, perhaps your reasoning would be more compelling, but as-is the pglz ship has sailed for quite some time and empirically it has not been a problem.

That said, I hope the findings are in favor of lz4 or snappy integration. It does seem lz4 has picked up a slight edge. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sun, Sep 29, 2013 at 10:25 AM, Sameer Thakur samthaku...@gmail.com wrote: Yes I was. Just saw a warning when pg_stat_statements is loaded that valid values for pg_stat_statements.max are between 100 and 2147483647. Not sure why though.

I remember hacking that out for testing's sake. I can only justify the lower bound as a guard against a foot-gun, to prevent someone from being stuck restarting the database to get a reasonable number in there. Let's CC Peter; maybe he can remember some thoughts about that.

Also, for onlookers, I have changed this patch around to do the date-oriented stuff but want to look it over before stapling it up and sending it. If one cannot wait, one can look at https://github.com/fdr/postgres/tree/queryid. The squashed version of that history contains a reasonable patch, I think, but a re-read often finds something for me and I've only just completed it yesterday.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sep 30, 2013 4:39 AM, Sameer Thakur samthaku...@gmail.com wrote:

Also, for onlookers, I have changed this patch around to do the date-oriented stuff but want to look it over before stapling it up and sending it. If one cannot wait, one can look at https://github.com/fdr/postgres/tree/queryid. The squashed version of that history contains a reasonable patch, I think, but a re-read often finds something for me and I've only just completed it yesterday.

I did the following:
1. Forked from fdr/postgres
2. Cloned branch queryid
3. Squashed 22899c802571a57cfaf0df38e6c5c366b5430c74 and d813096e29049667151a49fc5e5cf3d6bbe55702, picked be2671a4a6aa355c5e8ae646210e6c8e0b84ecb5
4. Usual make / make install / create extension pg_stat_statements (pg_stat_statements.max=100)
5. select * from pg_stat_statements_reset(); select * from pgbench_tellers;

Result below (total_time and the remaining counter columns, all zero, are elided for width):

 userid | dbid  |          session_start           |        introduced         |                   query                   |      query_id       | calls | rows | shared_blks_hit
--------+-------+----------------------------------+---------------------------+-------------------------------------------+---------------------+-------+------+-----------------
     10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01 05:30:00+05:30 | select * from pg_stat_statements_reset(); | 2531907647060518039 |     1 |    1 |               0
     10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01 05:30:00+05:30 | select * from pgbench_tellers ;           | 7580333025384382649 |     1 |   10 |               1
(2 rows)

I understand session_start and verified that it changes with each database restart to reflect the current time.

It should only restart when the statistics file cannot be loaded.

I am not sure why introduced keeps showing the same 1970-01-01 05:30:00+05:30 value. I thought it reflected the (most recent) time a query's statistics were added to the hashtable. Is this a bug? Will continue to test and try to understand the code.

Yes, a bug. There are a few calls to pgss_store and I must be submitting a zero value for the introduction time in one of those cases. Heh, I thought that was fixed, but maybe I broke something. Like I said: preliminary. At the earliest I can look at this Wednesday, but feel free to amend and resubmit including my changes if you feel inclined and get to it first.
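As an aside on that suspicious value: a zero (unset) introduction time rendered in the session's +05:30 zone comes out as exactly the constant seen in the view output. A quick illustration in plain Python (not the pg_stat_statements code, just the rendering arithmetic):

```python
from datetime import datetime, timezone, timedelta

# An "introduced" time of zero, displayed in IST (+05:30), renders as
# the suspicious constant seen in the view output above.
ist = timezone(timedelta(hours=5, minutes=30))
rendered = datetime.fromtimestamp(0, tz=ist)
print(rendered)  # 1970-01-01 05:30:00+05:30
```

So an epoch-looking "introduced" is a reliable tell that a zero timestamp was stored rather than the real entry time.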
Re: [HACKERS] Some interesting news about Linux 3.12 OOM
On Wed, Sep 25, 2013 at 8:00 AM, Greg Stark st...@mit.edu wrote: On Wed, Sep 25, 2013 at 12:15 AM, Daniel Farina dan...@heroku.com wrote: Enable the memcg OOM killer only for user faults, where it's really the only option available.

Is this really a big deal? I would expect most faults to be user faults. It's certainly a big deal that we need to ensure we can handle ENOMEM from syscalls and library functions we weren't expecting to return it. But I don't expect it to actually reduce the OOM killing sprees by much.

Hmm, I see what you mean. I have been reading through the mechanism: I got too excited about 'allocations by system calls', because I thought that might mean brk and friends, except that's not much of an allocation at all, just a reservation, I think.

There is some interesting stuff coming in along with these patches in bringing the user-space memcg OOM handlers up to snuff that may make it profitable to issue SIGTERM to backends when a safety margin is crossed (too bad the error messages will be confusing in that case). I was rather hoping that a regular ENOMEM could be injected by this mechanism the next time a syscall is touched (unknown), but I'm not confident whether this is made easier or not, one way or another. One could imagine the kernel injecting such a fault when the amount of memory being consumed starts to look hairy, but I surmise part of the impetus for userspace handling of that is to avoid getting into that particular heuristics game.

Anyway, I did do some extensive study of cgroups and memcg's implementation in particular and found it not really practical for Postgres use unless one was happy with lots and lots of database restarts, and this work still gives me some hope to try again, even if smaller modifications still seem necessary.
Re: [HACKERS] Some interesting news about Linux 3.12 OOM
On Sep 24, 2013 10:12 AM, Josh Berkus j...@agliodbs.com wrote: All, I've sent kernel.org a message that we're keen on seeing these changes become committed.

I thought it was merged already in 3.12. There are a few related patches, but here's one:

commit 519e52473ebe9db5cdef44670d5a97f1fd53d721
Author: Johannes Weiner han...@cmpxchg.org
Date: Thu Sep 12 15:13:42 2013 -0700

    mm: memcg: enable memcg OOM killer only for user faults

    System calls and kernel faults (uaccess, gup) can handle an out of
    memory situation gracefully and just return -ENOMEM. Enable the memcg
    OOM killer only for user faults, where it's really the only option
    available.

    Signed-off-by: Johannes Weiner han...@cmpxchg.org
    Acked-by: Michal Hocko mho...@suse.cz
    Cc: David Rientjes rient...@google.com
    Cc: KAMEZAWA Hiroyuki kamezawa.hir...@jp.fujitsu.com
    Cc: azurIt azu...@pobox.sk
    Cc: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
    Signed-off-by: Andrew Morton a...@linux-foundation.org
    Signed-off-by: Linus Torvalds torva...@linux-foundation.org

$ git tag --contains 519e52473ebe9db5cdef44670d5a97f1fd53d721
v3.12-rc1
v3.12-rc2

Searching for recent work by Johannes Weiner shows the pertinent stuff more exhaustively.

BTW, in the future if anyone sees kernel.org contemplating a patch which helps or hurts Postgres, don't hesitate to speak up to them. They don't get nearly enough feedback from DB developers.

I don't hesitate; most of the time I simply don't know.
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Fri, Sep 20, 2013 at 1:11 AM, Daniel Farina dan...@fdr.io wrote: I think the n-call underestimation propagation may not be quite precise for various detailed reasons (having to do with 'sticky' queries) and making it precise is probably more work than it's worth. And, on more reflection, I'm also having a hard time imagining people intuiting that value usefully. So, here's a version removing that.

I forgot about removal of the relevant SGML, amended here in v6.

pg_stat_statements-identification-v6.patch (Description: Binary data)
[HACKERS] Some interesting news about Linux 3.12 OOM
I'm not sure how many of you have been tracking this, but courtesy of lwn.net I have learned that the OOM killer behavior in Linux 3.12 will be significantly different, and by description it sounds like an improvement. I thought some people reading -hackers might be interested.

Based on the description at lwn, excerpted below, it sounds like the news might be that systems with overcommit on might see ENOMEM returned when a non-outlandish request for memory is made to the kernel:

"Johannes Weiner has posted a set of patches aimed at improving this situation. Following a bunch of cleanup work, these patches make two fundamental changes to how OOM conditions are handled in the kernel. The first of those is perhaps the most visible: it causes the kernel to avoid calling the OOM killer altogether for most memory allocation failures. In particular, if the allocation is being made in response to a system call, the kernel will just cause the system call to fail with an ENOMEM error rather than trying to find a process to kill. That may cause system call failures to happen more often and in different contexts than they used to. But, naturally, that will not be a problem since all user-space code diligently checks the return status of every system call and responds with well-tested error-handling code when things go wrong."

Subject to experiment, this may be some good news, as many programs, libraries, and runtime environments that may run parallel to Postgres on a machine are pretty lackadaisical about limiting the amount of virtual memory charged to them, and overcommit off is somewhat punishing in those situations if one really needed a large hash table from Postgres or whatever. I've seen some cases here where a good amount of VM has been reserved and caused apparent memory pressure that cut throughput short of what ought to be possible.
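The excerpt's sarcasm aside, the graceful path does exist for code that bothers to take it. A toy sketch in plain Python (the cache-building function is invented for illustration; the point is only that a failed allocation can be a recoverable error instead of an OOM kill):

```python
def load_cache(nbytes):
    """Try to build a large in-memory buffer, degrading on allocation failure."""
    try:
        return bytearray(nbytes)
    except MemoryError:
        # The "well-tested error-handling code" the excerpt jokes about:
        # fall back to a small buffer instead of being OOM-killed.
        return bytearray(4096)

# An outlandish request (1 EiB) fails to allocate; we degrade, not crash.
cache = load_cache(1 << 60)
print(len(cache))  # 4096
```

Whether the processes sharing a box with Postgres actually contain such handling is, of course, the experiment in question.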
[HACKERS] Unpacking scalar JSON values
Per report of Armin Ronacher, it's not clear how to take a scalar JSON string and unquote it into a regular Postgres text value, given what I can see here: http://www.postgresql.org/docs/9.3/static/functions-json.html

Example:

SELECT '"a json string"'::json;

(Although the same problem could play out with other scalar JSON types):

SELECT '4'::json;
SELECT '2.0'::json;

This use case arises from some of the extant unpacking operations, such as json_array_elements. It's not that strange to have a value something like this in a JSON document:

'{"tags": ["a \" string", "b", "c"]}'

Thoughts?
Re: [HACKERS] Unpacking scalar JSON values
On Sat, Aug 24, 2013 at 3:09 PM, Hannu Krosing ha...@2ndquadrant.com wrote: On 08/24/2013 11:36 PM, Daniel Farina wrote:

Per report of Armin Ronacher, it's not clear how to take a scalar JSON string and unquote it into a regular Postgres text value, given what I can see here: http://www.postgresql.org/docs/9.3/static/functions-json.html

Example: SELECT '"a json string"'::json; (Although the same problem could play out with other scalar JSON types): SELECT '4'::json; SELECT '2.0'::json; This use case arises from some of the extant unpacking operations, such as json_array_elements. It's not that strange to have a value something like this in a JSON document: '{"tags": ["a \" string", "b", "c"]}' Thoughts?

This was discussed to death at some point during development, and the prevailing consensus was that the json type is not representing the underlying structure/class instance/object but a string which encodes this object, so if you convert a restricted (must comply with the JSON spec) string to an unrestricted string you really just do a no-op cast.

This doesn't make a lot of sense to me.

select * from json_each_text('{"key": "va\"lue"}');

is handy and gives one the json value as text -- that is to say, dequoted. So it's not as though unquoting is not already an operation seen in some of the operators:

select * from json_each_text('{"key": "va\"lue"}');
 key | value
-----+--------
 key | va"lue
(1 row)

But there's no good way I can find from the documentation to do it with a scalar:

select ('"va\"lue"'::json)::text;
Re: [HACKERS] Unpacking scalar JSON values
On Sat, Aug 24, 2013 at 6:04 PM, Daniel Farina dan...@fdr.io wrote: But there's no good way I can find from the documentation to do it with a scalar: select ('"va\"lue"'::json)::text;

Triggered send by accident:

select ('"va\"lue"'::json)::text;
   text
-----------
 "va\"lue"
(1 row)

The JSON escaping is retained. That may be reasonable for a text cast, so I'm not suggesting its reinterpretation, but there is no operator I can identify immediately from the documentation that converts a JSON string value into a Postgres text value the way json_each_text does, only operating on a json value that contains a scalar JSON string.
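For what it's worth, this is precisely the scalar-unpacking operation that general-purpose JSON libraries expose. A quick illustration in Python (outside Postgres, just to pin down the semantics being asked for):

```python
import json

# Dequoting a scalar JSON string yields the underlying text value,
# with escapes interpreted -- the missing operation for json scalars.
assert json.loads('"a json string"') == "a json string"
assert json.loads('"va\\"lue"') == 'va"lue'

# Other JSON scalars unpack analogously.
assert json.loads("4") == 4
assert json.loads("2.0") == 2.0
print("ok")
```

The request amounts to exposing this dequoting for a json value that holds a lone scalar, the way json_each_text already does for object members.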
Re: [HACKERS] [9.4 CF] Free VMs for Reviewers Testers
On Tue, Jul 9, 2013 at 12:24 AM, Jesper Krogh jes...@krogh.cc wrote:

The really, really big ones are useful even for pushing limits, such as cr1.8xlarge, with 32 CPUs and 244GiB memory. Current spot instance price (the heavily discounted can-die-at-any-time one) is $0.343/hr. Otherwise, it's $3.500/hr.

Just to keep in mind, CPUs are similarly throttled: "One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. This is also the equivalent to an early-2006 1.7 GHz Xeon processor referenced in our original documentation."

This is only a statement of measurement (notably, it is also a metric that is SMP-processor-count-oblivious), for lack of a more sensible metric (certainly not clock cycles nor bogomips) in common use.

Who knows what that does to memory bandwidth / context switches etc.

Virtualization adds complexity, that is true, but so does a new version of Linux or comparing across microprocessors, motherboards, or operating systems. I don't see a good reason to be put off in the common cases, especially since I can't think of another way to produce such large machines on a short-term-obligation basis. The advantages are probably diminished (except for testing virtualization overhead on common platforms) for smaller machines that can be located sans virtualization more routinely.
Re: [HACKERS] [9.4 CF] Free VMs for Reviewers Testers
On Mon, Jul 8, 2013 at 7:25 PM, Craig Ringer cr...@2ndquadrant.com wrote: On 07/09/2013 08:35 AM, Josh Berkus wrote: Since these are cloud servers, they won't work well for performance testing.

I did some work on that a while ago, and found that I was able to get _astonishingly_ stable performance results out of EC2 EBS instances using provisioned IOPS volumes. Running them as EBS Optimized didn't seem to be required for the workloads I was testing on.

My colleague, Greg Burek, has done similar measurements and has reached an overall similar conclusion: the EBS PIOPS product delivers exactly what it says on the tin...even under random access. The volumes can be striped with software RAID.

These VMs aren't well suited to vertical-scaling performance tests and pushing extremes, but they're really, really good for "what impact does this patch have on regular real-world workloads?"

The really, really big ones are useful even for pushing limits, such as cr1.8xlarge, with 32 CPUs and 244GiB memory. Current spot instance price (the heavily discounted can-die-at-any-time one) is $0.343/hr. Otherwise, it's $3.500/hr.

Another instance offering that -- unlike the former -- I have yet to experience in any way at all is the high-storage one, hs1.8xlarge, with 16 CPUs, 117GB RAM, and 24 instance-local rotational disks of 2TB apiece. That is enough to deliver sequential reads measured in a couple of GB/s. I think this is a workhorse behind the AWS Redshift data warehousing offering. I don't think this one has spot pricing either: my guess is availability is low.
Re: [HACKERS] askpass program for libpq
On Fri, May 17, 2013 at 2:03 PM, Daniel Farina dan...@heroku.com wrote: On Wed, Jan 9, 2013 at 5:17 AM, Peter Eisentraut pete...@gmx.net wrote:

I would like to have something like ssh-askpass for libpq. The main reason is that I don't want to have passwords in plain text on disk, even if .pgpass is read-protected. By getting the password from an external program, I can integrate libpq tools with the host system's keychain or wallet thing, which stores passwords encrypted. I'm thinking about adding a new connection option askpass with environment variable PGASKPASS. One thing I haven't quite figured out is how to make this ask for passwords only if needed. Maybe it needs two connection options, one to say which program to use and one to say whether to use it. Ideas?

Okay, I have a patch that does something *like* (but not the same as) this, and whose implementation is totally unreasonable, but it's enough to get a sense of how the whole thing feels. Critically, it goes beyond askpass, instead allowing a shell-command-based hook for arbitrary interpretation and rewriting of connection info...such as the 'host' libpq keyword. I have called it, without much thought, a 'resolver'. In this way, it's closer to the libpq 'service' facility, except with the addition of complete control of the interpretation of user-provided notation.

Hello everyone, I'm sort of thinking of attacking this problem again; does anyone have an opinion or any words of (en/dis)couragement to continue? The implementation I posted is bogus but is reasonable to feel around with, and I'm curious, its obvious defects aside, what the temperature of opinion is. Most generally, I think the benefits are strongest in dealing with:

* Security: out-of-band secrets will just prevent people from pasting important stuff all over the place, as I see despairingly often today.
* Client-side proxies: pgbouncer comes to mind; a variation is being used on production applications right now that does full-blown preprocessing of the user environment (only possible in an environment with certain assumptions, like Heroku). https://github.com/gregburek/heroku-buildpack-pgbouncer seems very promising and effective, but it'd be nice to confer the same benefits to everyone else, too.

* HA: one of the most annoying problems in HA is naming things. Yes, this could be solved with other forms of common dynamic binding, DNS or Virtual IP (sometimes), but these are both pretty complicated and carry baggage and pitfalls; as long as there is dynamic binding of the credentials, I'm thinking it may make sense to have dynamic binding of net locations, too.

* Cross-server references: this is basically the issues seen in HA and Security, but on (horrible) steroids: the spate of features making Postgres work cross-server (older features like dblink, but now also new ones like FDWs and writable FDWs) makes complex interconnection between servers more likely and problematic, especially if one has standbys where there is a delay in catalog propagation from a primary to a standby with new connection info. So, an out-of-band way to adjust the dynamic binding seems useful there.

Knowing those, am I barking up the wrong tree? Can I do something else entirely? I've considered DNS and SSL certs, but these seem much, much harder, and limited, too.
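To make the delegation shape of the 'resolver' idea concrete, here is a toy sketch (entirely my own illustration, not the posted patch; the logical name, registry, and values are invented): a small program maps a user-facing name to libpq key=value conninfo, so the secret and the net location both live behind the name.

```python
import shlex

# Toy "resolver": map a user-facing logical name to libpq key=value
# conninfo. A real resolver might consult a keychain, a service
# registry, or an HA manager instead of a hard-coded table.
REGISTRY = {
    "billing-follower": {
        "host": "10.0.0.7",    # rebind here on failover
        "port": "6432",        # e.g. a local pgbouncer
        "user": "app",
        "password": "s3cret",  # out-of-band secret, never pasted around
    },
}

def resolve(name):
    settings = REGISTRY[name]
    return " ".join(f"{k}={shlex.quote(v)}" for k, v in sorted(settings.items()))

print(resolve("billing-follower"))
```

Clients keep saying "billing-follower"; failover, credential rotation, and proxy insertion all happen behind the resolver.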
Re: [HACKERS] askpass program for libpq
On Sat, Jun 15, 2013 at 8:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes:

Okay, I have a patch that does something *like* (but not the same as) this, and whose implementation is totally unreasonable, but it's enough to get a sense of how the whole thing feels. Critically, it goes beyond askpass, instead allowing a shell-command-based hook for arbitrary interpretation and rewriting of connection info...such as the 'host' libpq keyword. I have called it, without much thought, a 'resolver'. In this way, it's closer to the libpq 'service' facility, except with the addition of complete control of the interpretation of user-provided notation.

Hello everyone, I'm sort of thinking of attacking this problem again; does anyone have an opinion or any words of (en/dis)couragement to continue? The implementation I posted is bogus but is reasonable to feel around with, and I'm curious, its obvious defects aside, what the temperature of opinion is. Most generally, I think the benefits are strongest in dealing with:

* Security: out-of-band secrets will just prevent people from pasting important stuff all over the place, as I see despairingly often today.

* Client-side proxies: pgbouncer comes to mind; a variation is being used on production applications right now that does full-blown preprocessing of the user environment (only possible in an environment with certain assumptions, like Heroku). https://github.com/gregburek/heroku-buildpack-pgbouncer seems very promising and effective, but it'd be nice to confer the same benefits to everyone else, too.

* HA: one of the most annoying problems in HA is naming things. Yes, this could be solved with other forms of common dynamic binding, DNS or Virtual IP (sometimes), but these are both pretty complicated and carry baggage and pitfalls; as long as there is dynamic binding of the credentials, I'm thinking it may make sense to have dynamic binding of net locations, too.

* Cross-server references: this is basically the issues seen in HA and Security, but on (horrible) steroids: the spate of features making Postgres work cross-server (older features like dblink, but now also new ones like FDWs and writable FDWs) makes complex interconnection between servers more likely and problematic, especially if one has standbys where there is a delay in catalog propagation from a primary to a standby with new connection info. So, an out-of-band way to adjust the dynamic binding seems useful there.

TBH, I see no clear reason to think that a connection-string rewriter solves any of those problems. At best it would move them somewhere else.

Yes, that's exactly what I want to achieve: moving them somewhere else that can be held in common by client applications.

Nor is it clear that any of this should be libpq's business, as opposed to something an application might do before invoking libpq.

Yes, it's unclear. I have only arrived at seriously exploring this after trying my best to meditate on other options. In addition, sometimes 'the application' is Postgres, and with diverse access paths like FDWs and dblink, which may not be so easy to adjust, it would seem strange to adjust them in a way that can't be shared in common with regular non-backend-linked client applications.

Also, it seems like a very high bar to set for an application to make use of environment keychains or environment-specific high-availability retargeting. The general approach you mention is used in Greg Burek's heroku-pgbouncer buildpack, but it took significant work to iron out (and probably still needs more ironing, although it seems to work great) and only serves the Heroku-verse. Basically, the needs seem very similar, so abstracting seems profitable to me. This is not that different in principle from pgpass and the services file in that regard, except it takes the final step of delegating their function...and delivering full control over notation.

Although I don't know much about it, I seem to recall that VMWare felt inclined to instigate some kind of vaguely related solution to a similar problem in a custom libpq. After initial recoil and over a year of contemplation, I think the reasons are more well justified than I originally thought, and it'd be nice to de-weirdify such approaches.

Also, I think a facility dependent on invoking a shell command is (a) wide open for security problems, and (b) not likely to be portable to Windows.

Yeah, those things occurred to me; I think a dlopen-based mechanism is a more likely solution than the shell-command one. The latter just let me get started quickly to experiment. Would a rigorous proposal about how to do that help the matter? I mostly wanted to get the temperature before thinking about Real Mechanisms.
Re: [HACKERS] updated emacs configuration
On Thu, Jun 13, 2013 at 6:27 PM, Peter Eisentraut pete...@gmx.net wrote: First, I propose adding a .dir-locals.el file to the top-level directory with basic emacs settings. These get applied automatically. This especially covers the particular tab and indentation settings that PostgreSQL uses. With this, casual developers will not need to modify any of their emacs settings.

Yes please. I've had the pgsql stuff in my .emacs forever (ever since I was a student compelled to do homework on Postgres) and knew the magical rules about naming the directory, but it always felt so dirty and very much a 'you need to know the trick' level of intimacy.
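For readers who haven't seen one: a .dir-locals.el at the top of the tree is just an alist of per-mode variable settings that Emacs applies automatically to files opened beneath it. A sketch along the lines of the PostgreSQL C conventions (4-column hard tabs, BSD brace style; the exact file proposed may differ):

;; .dir-locals.el -- applied automatically, no per-user .emacs hacking
((c-mode . ((c-basic-offset . 4)
            (c-file-style . "bsd")
            (fill-column . 78)
            (indent-tabs-mode . t)
            (tab-width . 4)))
 (sgml-mode . ((fill-column . 78)
               (indent-tabs-mode . nil))))

This is what replaces the old "magical rules about naming the directory" trick: the settings travel with the checkout.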
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On Mon, Jun 10, 2013 at 11:59 AM, Josh Berkus j...@agliodbs.com wrote: Anyway, what I'm pointing out is that this is a business decision, and there is no way that we can make a decision for the users what to do when we run out of WAL space. And that the "stop archiving" option needs to be there for users, as well as the "shut down" option, *without* requiring users to learn the internals of the archiving system to implement it, or to know the implied effects of non-obvious PostgreSQL settings.

I don't doubt this; that's why I do have a no-op fallback for emergencies. The discussion was about defaults, and I still think that drop-WAL-from-archiving-whenever is not a good one. You may have noticed I also wrote that a neater, common way to drop WAL when under pressure might be nice, to avoid having it ad hoc and all over, so it's not as though I wanted to suggest that a Postgres feature to this effect would be an anti-feature. And, as I wrote before, it's much easier to teach an external system to drop WAL than it is to teach Postgres to attenuate, hence the repeated correspondence from my fellows and myself about the attenuation side of the equation. Hope that clears things up about where I stand on the matter.
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On Mon, Jun 10, 2013 at 4:42 PM, Josh Berkus j...@agliodbs.com wrote: Daniel, Jeff: I don't doubt this; that's why I do have a no-op fallback for emergencies. The discussion was about defaults, and I still think that drop-WAL-from-archiving-whenever is not a good one.

Yeah, we can argue defaults for a long time. What would be better is some way to actually determine what the user is trying to do, or wants to happen. That's why I'd be in favor of an explicit setting; if there's a setting which says: on_archive_failure=shutdown ... then it's a LOT clearer to the user what will happen if the archive runs out of space, even if we make no change to the defaults. And if that setting is changeable on reload, it even becomes a way for users to get out of tight spots.

I like your suggestion, save one thing: it's not a 'failure' of archiving if it cannot keep up, provided one subscribes to the view that archiving is not elective. I nitpick at this because one might think it has something to do with a non-zero return code from the archiving program, which already has a pretty alarmist message in the event of transient failures (I think someone brought this up on -hackers a few months ago...can't remember if that resulted in a change). I don't have a better suggestion that is less jargonrific, though, but I wanted to express my general appreciation as to the shape of the suggestion.
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On Fri, Jun 7, 2013 at 12:14 PM, Josh Berkus j...@agliodbs.com wrote: Right now, what we're telling users is "You can have continuous backup with Postgres, but you'd better hire an expensive consultant to set it up for you, or use this external tool of dubious provenance which there are no packages for, or you might accidentally cause your database to shut down in the middle of the night."

Inverted and just as well supported: "if you want to not accidentally lose data, you had better hire an expensive consultant to check your systems for all sorts of default 'safety = off' features." This being but the hypothetical first one.

Furthermore, I see no reason why high-quality external archiving software cannot exist. Maybe some even exists already, and no doubt it can be improved and the contract with Postgres enriched for that purpose. Contrast: JSON, where the stable OID in the core distribution helps pragmatically punt on a particularly sticky problem (extension dependencies and non-system OIDs). I can't think of a reason an external archiver couldn't do its job well right now.

At which point most sensible users say "no thanks, I'll use something else."

"Oh, I lost some disks. Well, no big deal, I'll use the archives. Surprise!" Forensic analysis ensues. So, as it turns out, it had been dropping segments at times because of systemic backlog for months/years.

Alternative ending: "Hey, I restored the database." Later: "Why is the state so old? Why are customers getting warnings that their (thought paid) invoices are overdue? Oh crap, the restore was cut short by this stupid option and this database lives in the past!" Fin.

I have a clear bias in experience here, but I can't relate to someone who sets up archives but is totally okay losing a segment unceremoniously, because it only takes one of those once in a while to make a really, really bad day. Who is this person that lackadaisically archives, and are they just fooling themselves?
And where are these archivers that enjoy even a modicum of long-term success that are not reliable? If one wants to casually drop archives, how is someone going to find out and freak out a bit? Per experience, logs are pretty clearly hazardous to the purpose. Basically, I think a default that opts one into danger is not good, especially since the system is starting from a position of do too much stuff and you'll crash. Finally, it's not that hard to teach any archiver how to no-op at user-peril, or perhaps Postgres can learn a way to do this expressly, to standardize the procedure a bit and ease publicly shared recipes. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Redesigning checkpoint_segments
On Wed, Jun 5, 2013 at 10:27 PM, Joshua D. Drake j...@commandprompt.com wrote: On 6/5/2013 10:07 PM, Daniel Farina wrote: If I told you there were some of us who would prefer to attenuate the rate that things get written rather than cancel or delay archiving for a long period of time, would that explain the framing of the problem? I understand that based on what you said above. Or, is it that you understand that's what I want, but find the notion of such an operation hard to relate to? I think this is where I am at. To me, you don't attenuate the rate that things get written, you fix the problem of needing to do so. The problem is one of provisioning. Please note that I am not suggesting there aren't improvements to be made, there absolutely are. I just wonder if we are looking in the right place (outside of some obvious badness like the PANIC running out of disk space). Okay, well, I don't see the fact that the block device is faster than the archive command as a problem, it's just an artifact of the ratios of performance of stuff in the system. If one views archives as a must-have, there's not much other choice than to attenuate. An alternative is to buy a slower block device. That'd accomplish the same effect, but it's a pretty bizarre and heavy-handed way to go about it, and not easily adaptive to, say, if I made the archive command faster (in my case, I well could, with some work). So, I don't think it's all that unnatural to allow for the flexibility of a neat attenuation technique, and it's pretty important too. Methinks. Disagree? Final thought: I can't really tell users to knock off what they're doing on a large scale. It's better to not provide abrupt changes in service (like crashing or turning off everything for extended periods while the archive uploads). So, smoothness and predictability are desirable. 
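For what it's worth, the attenuation being argued for here is mechanically just rate limiting. A token-bucket sketch of the idea (a hypothetical illustration only, not code from any patch in this thread; all names are invented), in which writers stall in proportion to how far ahead of the archiver's sustained rate they have gotten:

```python
class TokenBucket:
    """Toy model of attenuating WAL writes to an archiver's sustained rate.

    rate is the budget refill in bytes/second; capacity is the burst
    allowance. The clock is passed in explicitly so behavior is
    deterministic and easy to reason about.
    """

    def __init__(self, rate, capacity, now=0.0):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)
        self.last = now

    def delay_for(self, nbytes, now):
        """Seconds a writer of nbytes should stall before proceeding."""
        # Refill the budget for elapsed time, capped at the burst allowance.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        return -self.tokens / self.rate  # stall until the deficit is repaid
```

The point of this shape is exactly the smoothness described above: writers see many small stalls as backlog builds, rather than one abrupt outage when the disk fills.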
Re: [HACKERS] Redesigning checkpoint_segments
On Wed, Jun 5, 2013 at 11:05 PM, Joshua D. Drake j...@commandprompt.com wrote: On 6/5/2013 10:54 PM, Peter Geoghegan wrote: On Wed, Jun 5, 2013 at 10:27 PM, Joshua D. Drake j...@commandprompt.com wrote: I just wonder if we are looking in the right place (outside of some obvious badness like the PANIC running out of disk space). So you don't think we should PANIC on running out of disk space? If you don't think we should do that, and you don't think that WAL writing should be throttled, what's the alternative? As I mentioned in my previous email: Instead of a PANIC on running out of disk space, we should just write to an emergency location within PGDATA and log very loudly that the SA isn't paying attention. Perhaps if that area starts to get to an unhappy place we immediately bounce into read-only mode and log even more loudly that the SA should be fired. I would think read-only mode is safer and more polite than a PANIC crash. I do not think we should worry about filling up the hard disk except to protect against data loss in the event. It is not user unfriendly to assume that a user will pay attention to disk space. Really? Okay, then I will say it's user unfriendly, especially for a transient use of space, and particularly if there's no knob for said SA to attenuate what's going on. You appear to assume the SA can lean on the application to knock off whatever is going on or provision more disk in time, or that disk is reliable enough to meet one's goals. In my case, none of these precepts are true or desirable.
Re: [HACKERS] Hard limit on WAL space used (because PANIC sucks)
On Thu, Jun 6, 2013 at 9:30 PM, Jeff Janes jeff.ja...@gmail.com wrote: I would oppose that as the solution, either an unconditional one, or configurable with it as the default. Those segments are not unneeded. I need them. That is why I set up archiving in the first place. If you need to shut down the database rather than violate my established retention policy, then shut down the database. Same boat. My archives are the real storage. The disks are write-back caching. That's because the storage of my archives is probably three to five orders of magnitude more reliable.
Re: [HACKERS] Redesigning checkpoint_segments
On Wed, Jun 5, 2013 at 6:00 PM, Joshua D. Drake j...@commandprompt.com wrote: I didn't see that proposal, link? Because the idea of slowing down wal-writing sounds insane. It's not as insane as introducing an archiving gap, PANICing and crashing, or running this hunk o junk I wrote http://github.com/fdr/ratchet
Re: [HACKERS] Redesigning checkpoint_segments
On Wed, Jun 5, 2013 at 8:23 PM, Joshua D. Drake j...@commandprompt.com wrote: It's not as insane as introducing an archiving gap, PANICing and crashing, or running this hunk o junk I wrote http://github.com/fdr/ratchet Well certainly we shouldn't PANIC and crash but that is a simple fix. You have a backup write location and start logging really loudly that you are using it. If I told you there were some of us who would prefer to attenuate the rate that things get written rather than cancel or delay archiving for a long period of time, would that explain the framing of the problem? Or, is it that you understand that's what I want, but find the notion of such an operation hard to relate to? Or, am I misunderstanding your confusion? Or, none of the above?
Re: [HACKERS] Planning incompatibilities for Postgres 10.0
On Mon, May 27, 2013 at 9:41 AM, Simon Riggs si...@2ndquadrant.com wrote: On 27 May 2013 15:36, Tom Lane t...@sss.pgh.pa.us wrote: Bruce Momjian br...@momjian.us writes: On Mon, May 27, 2013 at 08:26:48AM -0400, Stephen Frost wrote: That said, many discussions and ideas do get shut down, perhaps too early, because of pg_upgrade considerations. If we had a plan to have an incompatible release in the future, those ideas and discussions might be able to progress to a point where we determine it's worth it to take the pain of a non-pg_upgrade-supported release. That's a bit of a stretch, in my view, but I suppose it's possible. Even so though, I would suggest that we put together a wiki page to list out those items and encourage people to add to such a list; perhaps having an item on that list would make discussion about it progress beyond "it breaks pg_upgrade". Yes, we should be collecting things we want to do for a pg_upgrade break so we can see the list all in one place. Precisely. We've said right along that we reserve the right to have a non-upgradable disk format change whenever sufficiently many reasons accumulate to do that. Here's one that's come up a few times: being able to tweak the out-of-line storage strategy, e.g. change the compression format used. I think some folks were lamenting the lack of a convenient byte in the right place for that one.
Re: [HACKERS] Better handling of archive_command problems
On Thu, May 16, 2013 at 9:13 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, May 16, 2013 at 10:06 PM, Daniel Farina dan...@heroku.com wrote: Do you have a sketch about mechanism to not encounter that problem? I didn't until just now, but see my email to Peter. That idea might be all wet, but off-hand it seems like it might work... However little it may matter, I would like to disagree with your opinion on this one: the current situation as I imagine encountered by *all* users of archiving is really unpleasant, 'my' shop or no. It would probably not be inaccurate to say that 99.% of archiving users have to live with a hazy control over the amount of data loss, only bounded by how long it takes for the system to fill up the WAL file system and then for PostgreSQL to PANIC and crash (hence, no more writes are processed, and no more data can be lost). I'm really not trying to pick a fight here. What I am saying is that the problem is not so bad that we should accept the first design proposed, despite the obvious problems with it, without trying to find a better one. The first post to -hackers on this was 4 days ago. Feature freeze for the first release that could conceivably contain this feature will most likely occur about 8 months from now. We can take a few more days or weeks to explore alternatives, and I believe it will be possible to find good ones. If I were trying to kill this proposal, I could find better ways to do it than clearly articulating my concerns at the outset. I know you are not trying to kill it or anything, and I didn't mean to pick a fight either: my point is, without rancor, that I think that as-is the state of affairs fails the sniff test on dimensions like availability and exposure to data loss, much worse than some of the negative effects of hobbling cancellations. And I completely agree with you that the timing means one can let this issue marinate for a while. 
And, I hope it goes without saying, I am grateful for your noodling on the matter, as always.
Re: [HACKERS] askpass program for libpq
On Wed, Jan 9, 2013 at 5:17 AM, Peter Eisentraut pete...@gmx.net wrote: I would like to have something like ssh-askpass for libpq. The main reason is that I don't want to have passwords in plain text on disk, even if .pgpass is read protected. By getting the password from an external program, I can integrate libpq tools with the host system's key chain or wallet thing, which stores passwords encrypted. I'm thinking about adding a new connection option askpass with environment variable PGASKPASS. One thing I haven't quite figured out is how to make this ask for passwords only if needed. Maybe it needs two connection options, one to say which program to use and one to say whether to use it. Ideas? Okay, I have a patch that does something *like* (but not the same) as this, and whose implementation is totally unreasonable, but it's enough to get a sense of how the whole thing feels. Critically, it goes beyond askpass, instead allowing a shell-command based hook for arbitrary interpretation and rewriting of connection info...such as the 'host' libpq keyword. I have called it, without much thought, a 'resolver'. In this way, it's closer to the libpq 'service' facility, except with addition of complete control of the interpretation of user-provided notation. I think it would be useful to have an in-process equivalent (e.g. loading hooks via dynamic linking, like the backend), if the functionality seems worthwhile, mostly for performance and a richer protocol between libpq and resolvers (error messages come to mind). I think with a bit of care this also would allow other drivers to more easily implement some of the features grown into libpq (by re-using the shared-object's protocol, or a generic wrapper that can load the shared object to provide a command line interface), like the .pgpass file and the services file, which are often missing from new driver implementations. With these disclaimers out of the way, here's a 'demo'. 
It's long, but hopefully not as dense as it first appears, since it's mostly a narration of walking through using this. Included alongside the patch is a short program that integrates with the freedesktop.org Secret Service standard, realized in the form of libsecret (I don't know exactly what the relationship is, but I used libsecret's documentation[0]). This resolver I wrote is not a solid piece of work either, much like the libpq patch I've attached...but I probably will try to actually make personal use of this for a time after fixing a handful of things, to feel it out over an extended period. On Debian, one needs gir1.2-secret-1 and Python 2.7. I run this on Ubuntu Raring, and have traced my steps using the GNOME Keyring used there, Seahorse.

# This tool has some help:
$ ./pq-secret-service -h
$ ./pq-secret-service store -h
$ ./pq-secret-service recall -h

# Store a secret:
$ printf 'dbname=regression host=localhost port=5432 user=user'\
' password=password' | ./pq-secret-service store

Now you can check your keyring manager and search for 'postgres' or 'postgresql'. You should see a stored PostgreSQL secret, probably labeled "PostgreSQL Database Information". After having compiled a libpq with my patch applied and making sure a 'psql' is using that libpq...

# Set the 'resolve' command:
$ export PGRESOLVE='/path/to/pq-secret-service recall'

# Attempt to connect to regression database. The credentials are
# probably not enabled on your database, and so you'll see the
# appropriate error message instead.
$ psql regression

What this does is search the keyrings in the most straightforward way given the Secret Service protocol, trying to 'complete' the input passed, based on any of the (non-secret) parts of the conninfo passed to 'pq-secret-service store'. 
Here's an example where the resolver opts not to load something from the keyring:

$ psql 'dbname=regression user=not-the-one-stored'

Now the resolver doesn't find a secret, so it just passes through the original connection info. Ditto if one passes a password. Here are more examples, this time explored by invoking the resolver command directly, without having to go through libpq. This makes testing and experimentation somewhat easier:

$ echo $(printf 'regression' | ./pq-secret-service recall)
$ echo $(printf 'dbname=regression' | ./pq-secret-service recall)
$ echo $(printf 'host=localhost port=5432' | ./pq-secret-service recall)

# Show failing to match the secret (the input provided will be
# returned unchanged):
$ echo $(printf 'host=elsewhere port=5432' | ./pq-secret-service recall)

In some aspects the loose matching is kind of neat, because usually some subset of database names, role names, and host names is sufficient to find the right thing, but this is beside the point: the resolver program can be made very strict, or lax, or can even try opening a database connection to feel things out. I do want to make sure that at least some useful baseline functionality can be implemented, though,
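The resolver contract exercised above — read a conninfo string on stdin, write a possibly-completed conninfo to stdout, and pass the input through unchanged when nothing matches — is small enough to sketch. This is a hypothetical stand-in, with a hardcoded list in place of a real keyring backend; in real use the function would be wired between stdin and stdout the way pq-secret-service is:

```python
# Hypothetical sketch of the resolver contract: complete a partial conninfo
# from stored secrets, or pass it through untouched. The SECRETS list stands
# in for a real keyring (e.g. a Secret Service backend).

SECRETS = [
    'dbname=regression host=localhost port=5432 user=user password=password',
]

def resolve(conninfo):
    words = set(conninfo.split())
    for full in SECRETS:
        # Loose matching, as in the demo: every word of the input must
        # appear among the stored conninfo's parts.
        if words and words.issubset(set(full.split())):
            return full
    return conninfo  # no match: pass through the original connection info
```

A stricter resolver would match on designated keys only, or refuse ambiguous inputs; the contract itself does not care.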
Re: [HACKERS] askpass program for libpq
On Fri, May 17, 2013 at 2:03 PM, Daniel Farina dan...@heroku.com wrote: Thanks for getting through all that text. Fin. And, thoughts? I have uploaded the resolvers, the last mail, and the patch to github: https://github.com/fdr/pq-resolvers So, if one prefers to use git to get this and track potential changes, or add contributions, please do feel encouraged to do so.
Re: [HACKERS] Better handling of archive_command problems
On Thu, May 16, 2013 at 5:43 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, May 16, 2013 at 2:42 PM, Peter Geoghegan p...@heroku.com wrote: On Thu, May 16, 2013 at 11:16 AM, Robert Haas robertmh...@gmail.com wrote: Well, I think it IS a Postgres precept that interrupts should get a timely response. You don't have to agree, but I think that's important. Well, yes, but the fact of the matter is that it is taking high single digit numbers of seconds to get a response at times, so I don't think that there is any reasonable expectation that that be almost instantaneous. I don't want to make that worse, but then it might be worth it in order to ameliorate a particular pain point for users. At times, like when the system is under really heavy load? Or at times, like depending on what the backend is doing? We can't do a whole lot about the fact that it's possible to beat a system to death so that, at the OS level, it stops responding. Linux is unfriendly enough to put processes into non-interruptible kernel wait states when they're waiting on the disk, a decision that I suspect to have been made by a sadomasochist. But if there are times when a system that is not particularly heavily loaded fails to respond to cancels in under a second, I would consider that a bug, and we should fix it. There is a setting called zero_damaged_pages, and enabling it causes data loss. I've seen cases where it was enabled within postgresql.conf for years. That is both true and bad, but it is not a reason to do more bad things. I don't think it's bad. I think that we shouldn't be paternalistic towards our users. If anyone enables a setting like zero_damaged_pages (or, say, wal_write_throttle) within their postgresql.conf indefinitely for no good reason, then they're incompetent. End of story. That's a pretty user-hostile attitude. Configuration mistakes are a very common user error. 
If those configuration mistakes hose the system, users expect to be able to change them back, hit reload, and get things back on track. But you're proposing a GUC that, if set to a bad value, will very plausibly cause the entire system to freeze up in such a way that it won't respond to a reload request - or for that matter a fast shutdown request. I think that's 100% unacceptable. Despite what you seem to think, we've put a lot of work into ensuring interruptibility, and it does not make sense to abandon that principle for this or any other feature. The inability to shut down in such a situation is not happy at all, as you say, and the problems with whacking the GUC around due to non-interruptability are pretty bad too. Would you feel better about it if the setting had a time-out? Say, the user had to explicitly re-enable it after one hour at the most? No, but I'd feel better about it if you figured out a way to avoid creating a scenario where it might lock up the entire database cluster. I am convinced that it is possible to avoid that Do you have a sketch of a mechanism to not encounter that problem? and that without that this is not a feature worthy of being included in PostgreSQL. Yeah, it's more work that way. But that's the difference between a quick hack that is useful in our shop and a production-quality feature ready for a general audience. However little it may matter, I would like to disagree with your opinion on this one: the current situation as I imagine encountered by *all* users of archiving is really unpleasant, 'my' shop or no. It would probably not be inaccurate to say that 99.% of archiving users have to live with a hazy control over the amount of data loss, only bounded by how long it takes for the system to fill up the WAL file system and then for PostgreSQL to PANIC and crash (hence, no more writes are processed, and no more data can be lost). 
Once one factors in the human cost of having to deal with that down time or monitor it to circumvent this, I feel as though the bar for quality should be lowered. As you see, we've had to resort to horrific techniques to get around this problem. I think this is something serious enough that it is worth doing better, but the bind that people doing archiving find themselves in is much worse at the margins -- involving data loss and loss of availability -- and accordingly, I think the bar for some kind of solution should be lowered, so long as the interface is at least right enough to not be an albatross later (a bar this proposal may not meet). That said, there is probably a way to please everyone and do something better. Any ideas?
Re: [HACKERS] on existing update construct
On Wed, May 15, 2013 at 11:44 AM, Dev Kumkar devdas.kum...@gmail.com wrote: Hello, Is there an alternative of Sybase on existing update construct in pgsql. ON DUPLICATE KEY UPDATE doesn't work. Thanks in advance! No, you'll have to either handle this in the application or use a stored procedure at this time. The omission of such a construct from psql's \h command and the manual is not in error.
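The application-side approach alluded to is the familiar retry loop: try the update; if no row was touched, try the insert; if a concurrent session wins the insert race, loop and update. Sketched here against an in-memory stand-in for a uniquely keyed table, since the shape of the loop is the point; with a real driver, UniqueViolation below corresponds to the unique-violation error the server raises:

```python
class UniqueViolation(Exception):
    """Stands in for the server's unique-constraint error."""

class FakeTable:
    """In-memory stand-in for a table with a unique key."""
    def __init__(self):
        self.rows = {}

    def update(self, key, val):
        if key in self.rows:
            self.rows[key] = val
            return 1  # rows affected
        return 0

    def insert(self, key, val):
        if key in self.rows:
            raise UniqueViolation(key)
        self.rows[key] = val

def upsert(table, key, val):
    while True:
        if table.update(key, val):    # common case: the row already exists
            return 'updated'
        try:
            table.insert(key, val)    # no row yet: try to create it
            return 'inserted'
        except UniqueViolation:
            continue                  # a concurrent insert won the race; retry
```

The loop is necessary because another session can insert the key between the failed update and the insert; retrying converts that race into an eventual update.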
Re: [HACKERS] Better LWLocks with compare-and-swap (9.4)
On Mon, May 13, 2013 at 5:50 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: pgbench -S is such a workload. With 9.3beta1, I'm seeing this profile, when I run pgbench -S -c64 -j64 -T60 -M prepared on a 32-core Linux machine:

- 64.09% postgres postgres [.] tas
   - tas
      - 99.83% s_lock
         - 53.22% LWLockAcquire
            + 99.87% GetSnapshotData
         - 46.78% LWLockRelease
              GetSnapshotData
            + GetTransactionSnapshot
+ 2.97% postgres postgres [.] tas
+ 1.53% postgres libc-2.13.so [.] 0x119873
+ 1.44% postgres postgres [.] GetSnapshotData
+ 1.29% postgres [kernel.kallsyms] [k] arch_local_irq_enable
+ 1.18% postgres postgres [.] AllocSetAlloc
...

So, on this test, a lot of time is wasted spinning on the mutex of ProcArrayLock. If you plot a graph of TPS vs. # of clients, there is a surprisingly steep drop in performance once you go beyond 29 clients (attached, pgbench-lwlock-cas-local-clients-sets.png, red line). My theory is that after that point all the cores are busy, and processes start to be sometimes context switched while holding the spinlock, which kills performance. I have, I also used Linux perf to come to this conclusion, and my determination was similar: a system was undergoing increasingly heavy load, in this case with processes far outnumbering processors. It was also a phase-change type of event: at one moment everything would be going great, but once a critical threshold was hit, s_lock would consume an enormous amount of CPU time. I figured preemption while in the spinlock was to blame at the time, given the extreme nature.
Re: [HACKERS] Better LWLocks with compare-and-swap (9.4)
On Wed, May 15, 2013 at 3:08 PM, Daniel Farina dan...@heroku.com wrote: On Mon, May 13, 2013 at 5:50 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: pgbench -S is such a workload. With 9.3beta1, I'm seeing this profile, when I run pgbench -S -c64 -j64 -T60 -M prepared on a 32-core Linux machine:

- 64.09% postgres postgres [.] tas
   - tas
      - 99.83% s_lock
         - 53.22% LWLockAcquire
            + 99.87% GetSnapshotData
         - 46.78% LWLockRelease
              GetSnapshotData
            + GetTransactionSnapshot
+ 2.97% postgres postgres [.] tas
+ 1.53% postgres libc-2.13.so [.] 0x119873
+ 1.44% postgres postgres [.] GetSnapshotData
+ 1.29% postgres [kernel.kallsyms] [k] arch_local_irq_enable
+ 1.18% postgres postgres [.] AllocSetAlloc
...

So, on this test, a lot of time is wasted spinning on the mutex of ProcArrayLock. If you plot a graph of TPS vs. # of clients, there is a surprisingly steep drop in performance once you go beyond 29 clients (attached, pgbench-lwlock-cas-local-clients-sets.png, red line). My theory is that after that point all the cores are busy, and processes start to be sometimes context switched while holding the spinlock, which kills performance. I accidentally dropped some important words from the end of Heikki's mail, which makes my correspondence pretty bizarre to understand at the outset. Apologies. He wrote: [...] Has anyone else seen that pattern? I have, I also used Linux perf to come to this conclusion, and my determination was similar: a system was undergoing increasingly heavy load, in this case with processes far outnumbering processors. It was also a phase-change type of event: at one moment everything would be going great, but once a critical threshold was hit, s_lock would consume an enormous amount of CPU time. I figured preemption while in the spinlock was to blame at the time, given the extreme nature.
Re: [HACKERS] Better handling of archive_command problems
On Mon, May 13, 2013 at 3:02 PM, Peter Geoghegan p...@heroku.com wrote: Has anyone else thought about approaches to mitigating the problems that arise when an archive_command continually fails, and the DBA must manually clean up the mess? Notably, the most common problem in this vein suffered at Heroku has nothing to do with archive_command failing, and everything to do with the ratio of block device write performance (hence, backlog) versus the archiving performance. When CPU is uncontended it's not a huge deficit, but it is there and it causes quite a bit of stress. Archive commands failing are definitely a special case there, where it might be nice to bring write traffic to exactly zero for a time.
Re: [HACKERS] pg_controldata gobbledygook
On Thu, Apr 25, 2013 at 9:34 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: I think I've heard of scripts grepping the output of pg_controldata for this, that, or the other. Any rewording of the labels would break that. While I'm not opposed to improving the labels, I would vote against your second, abbreviated scheme because it would make things ambiguous for simple grep-based scripts. We could provide two alternative outputs, one for human consumption with the proposed format and something else that uses, say, shell assignment syntax. (I did propose this years ago and I might have an unfinished patch still lingering about somewhere.) And a script would use that how? pg_controldata --machine-friendly would fail outright on older versions. I think it's okay to ask script writers to write pg_controldata | grep -e 'old label|new label' but not okay to ask them to deal with anything as complicated as trying a switch to see if it works or not. From what I'm reading, it seems like the main benefit of the changes is to make things easier for humans to skim over. Automated programs that care about precise meanings of each field are awkwardly but otherwise well-served by the precise output as rendered right now. What about doing something similar but different from the --machine-readable proposal, such as adding an option for the *human*-readable variant that is guaranteed to mercilessly change as human readers/hackers see fit on a whim? It's a bit of a kludge that this is not the default, but it would prevent having to serve two quite different masters with the same output. Although I'm not seriously proposing explicitly -h (as seen in some GNU programs in rendering byte sizes and the like...yet could be confused for 'help'), something like that may serve as prior art.
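For the grep-based consumers under discussion, splitting each line on the first colon is about as much structure as the current output offers; a sketch of that parse (the sample labels in the test are illustrative, not a promise about pg_controldata's exact wording):

```python
def parse_controldata(text):
    """Turn 'Label: value' lines, as pg_controldata prints them, into a dict.

    Splits on the first colon only, since values may themselves contain
    colons (e.g. timestamps).
    """
    fields = {}
    for line in text.splitlines():
        if ':' in line:
            label, _, value = line.partition(':')
            fields[label.strip()] = value.strip()
    return fields
```

Note that scripts written this way still break on label rewording, which is exactly the hazard Tom describes; only a stable machine-oriented output mode would remove it.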
Re: [HACKERS] 9.3 release notes suggestions
On Wed, Apr 24, 2013 at 6:30 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Apr 23, 2013 at 11:41 PM, Bruce Momjian br...@momjian.us wrote: Thanks for the many suggestions on improving the 9.3 release notes. There were many ideas I would have never thought of. Please keep the suggestions coming. Bruce, Thanks for writing them! Consider the sentiment duplicated. Thank you, Bruce.
Re: [HACKERS] confusing message about archive failures
On Wed, Apr 17, 2013 at 7:33 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Wednesday, April 17, 2013, Peter Eisentraut wrote: When archive_command fails three times, it prints this message into the logs: transaction log file \%s\ could not be archived: too many failures This leaves it open what happens next. What will actually happen is that it will usually try again after 60 seconds or so, but the message indicates something much more fatal than that. Could we rephrase this a little bit to make it less dramatic, like ... too many failures, will try again later ? +1 I've found the current message alarming/confusing as well. But I don't really understand the logic behind bursting the attempts, 3 of them one second apart, then sleeping 57 seconds, in the first place. Same. By now I am numb, but when I was first rolling out archives ages ago the message was cause for much more alarm than was indicated.
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 11:08 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-04-17 18:16:36 -0700, Daniel Farina wrote: The original paper is often shorthanded Castagnoli 93, but it exists in the IEEE's sphere of influence and is hard to find a copy of. Luckily, a pretty interesting survey paper discussing some of the issues was written by Koopman in 2002 and is available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.8323 As a pedagogical note, it's a pretty interesting and accessible piece of writing (for me, as someone who knows little of error detection/correction) and explains some of the engineering reasons that provoke such exercises. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=231911&userType=inst There's also a Koopman paper from 2004 that's interesting. Having read the 2002 paper more, it seems that the current CRC32 doesn't have a whole lot going for it: CRC32C pretty much cleans its clock across the board (I don't understand detected Hamming Distances that seem greater than the information content of the message, e.g. HD 14 with 8-bit messages as seen in CRC32C: that's where CRC32 can win). CRC32C looks, all in all, the most flexible, because detection of Hamming Distance 4 spans from 5244-131072 bits (the upper range of which is a full 16KiB!) and there is superior Hamming Distance detection on shorter messages up until the point where it seems like the Hamming Distance able to be detected is larger than the message size itself (e.g. HD 13 on an 8-bit message). I'm not sure if this is an error in my understanding, or what. Also, larger runs (16KB) are better served by CRC32C: even the probably-best contender I can see (0xD419CC15) drops to Hamming Distance 2-detection right after 65505 bits. CRC32C has the biggest range at HD4, although Koopman 0xBA0DC66B comes close, gaining superior Hamming distance detection for 178-16360 bits (the upper end of this range is short of 2KiB by 3 bytes). 
All in all, there is no reason I can see to keep CRC32 at all versus CRC32C on the basis of error detection alone, so putting aside all the business about instruction set architecture, I think a software CRC32C in a vacuum can be seen as a robustness improvement. There may be polynomials that are not CRC32 or CRC32C that one might view as having slightly better tradeoffs as seen in Table 1 of Koopman 2002, but it's kind of a stretch: being able to handle 8KB and 16KB at HD4, as CRC32C does, is awfully compelling to me. Koopman 0xBA0DC66B can admirably reach HD6 on a much larger range, up to 16360 bits, which is ever so shy of 2KiB. Castagnoli 0xD419CC15 can detect HD 5 up to 31 bits short of 8KB. Corrections welcome on my interpretations of Table 1.
Re: [HACKERS] Enabling Checksums
On Wed, Apr 17, 2013 at 5:21 PM, Greg Smith g...@2ndquadrant.com wrote: Let me see if I can summarize where the messages flying by are at since you'd like to close this topic for now: -Original checksum feature used Fletcher checksums. Its main problems, to quote wikipedia, include that it cannot distinguish between blocks of all 0 bits and blocks of all 1 bits. -Committed checksum feature uses truncated CRC-32. This has known good error detection properties, but is expensive to compute. There's reason to believe that particular computation will become cheaper on future platforms though. But taking full advantage of that will require adding CPU-specific code to the database. -The latest idea is using the Fowler–Noll–Vo hash function: https://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash There's 20 years of research around when that is good or bad. The exact properties depend on magic FNV primes: http://isthe.com/chongo/tech/comp/fnv/#fnv-prime that can vary based on both your target block size and how many bytes you'll process at a time. For PostgreSQL checksums, one of the common problems--getting an even distribution of the hashed values--isn't important the way it is for other types of hashes. Ants and Florian have now dug into how exactly that and specific CPU optimization concerns impact the best approach for 8K database pages. This is very clearly a 9.4 project that is just getting started. I was curious about the activity in this thread and wanted to understand the tradeoffs, and came to the same understanding as you when poking around. It seems the tough aspect of the equation is that the most well-studied option is slow (CRC-32C) unless you have special ISA support. Trying to find as much information and conclusive research on FNV was a lot more challenging. Fletcher is similar in that regard. 
Given my hasty attempt to understand each of the alternatives, my qualitative judgement is that, strangely enough, the most conservative choice of the three (in terms of being understood and treated in the literature more than ten times over) is CRC-32C, but it's also the one being cast as only suitable inside micro-optimization. To add another, theoretically-oriented dimension to the discussion, I'd like to suggest it's also the most thoroughly studied of all the alternatives. I really had a hard time finding follow-up papers about the two alternatives, but to be fair, I didn't try very hard...then again, I didn't try very hard for any of the three, it's just that CRC32C was by far the easiest to find materials on. The original paper is often shorthanded Castagnoli 93, but it exists in the IEEE's sphere of influence and is hard to find a copy of. Luckily, a pretty interesting survey paper discussing some of the issues was written by Koopman in 2002 and is available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.8323 As a pedagogical note, it's a pretty interesting and accessible piece of writing (for me, as someone who knows little of error detection/correction) and explains some of the engineering reasons that provoke such exercises. Basically...if it comes down to understanding what the heck is going on and what the trade-offs are, it was a lot easier to brush up on CRC32-C in my meandering around the Internet. One might think this level of scrutiny would constitute a viable explanation of why CRC32C found its way into several standards and then finally into silicon. All in all, if the real-world costs of CRC32C on not-SSE4.2 are allowable, I think it's the most researched and conservative option, although perhaps some of the other polynomials seen in Koopman could also be desirable. It seems there's a tradeoff in CRC polynomials between long-message and short-message error detection, and the paper above may allow for a more informed selection. 
CRC32C is considered a good trade-off for both, but I haven't assessed the paper in enough detail to suggest whether there are specialized long-run polynomials that may be better still (although, then, there is also the microoptimization question, which postdates the literature I was looking at by a lot).
Re: [HACKERS] Interesting post-mortem on a near disaster with git
On Wed, Apr 3, 2013 at 6:18 PM, Jim Nasby j...@nasby.net wrote: What about rdiff-backup? I set it up for personal use years ago (via the handy open-source bash script backupninja) and it is a pretty nice no-frills point-in-time, self-expiring, file-based automatic backup program that works well with file synchronization like rsync (I rdiff-backup to one disk and rsync the entire rdiff-backup output to another disk). I've enjoyed using it quite a bit during my own personal-computer emergencies, and though the maintenance required from me has been zero, I have used it from time to time to restore, proving it even works. Hardlinks can be used to tag versions of a file-directory tree recursively and relatively compactly. It won't be as compact as a git-aware solution (since git tends to rewrite entire files, which will confuse file-based incremental differential backup), but the amount of data we are talking about is pretty small, and as far as a lowest-common-denominator tradeoff for use in emergencies, I have to give it a lot of praise. The main advantage it has here is it implements point-in-time recovery operations that are easy to use and actually seem to work. That said, I've mostly done targeted recoveries rather than trying to recover entire trees. I have the same set up, and same feedback. I had the same setup, but got tired of how rdiff-backup behaved when a backup was interrupted (very lengthy cleanup process). Since then I've switched to an rsync setup that does essentially the same thing as rdiff-backup (uses hardlinks between multiple copies). The only downside I'm aware of is that my rsync backups aren't guaranteed to be consistent (for however consistent a backup of an active FS would be...). I forgot to add one more thing to my first mail, although it's very important to my feeble recommendation: the problem is that blind synchronization is a great way to propagate destruction. 
rdiff-backup (but perhaps others, too) has a file/directory structure that is, as far as I know, additive, and the pruning can be done independently at different replicas that can have different retention...and if done just right (I'm not sure about the case of concurrent backups being taken) one can write a re-check that no files are to be modified or deleted by the synchronization as a safeguard. -- fdr
Re: [HACKERS] Getting to 9.3 beta
On Fri, Mar 29, 2013 at 8:22 AM, Andres Freund and...@2ndquadrant.com wrote: - pg_stat_statements: query, session, and eviction identification: Seems to need at least docs => wait for author, seems to be easy enough? I would have responded by now, but recent events have unfortunately made me put a lot of things that take longer than fifteen minutes on hold. Still, I am watching. Like you say, it's not terribly invasive. The under-estimation error propagation is perhaps a bit too weird to justify (even though it's a simple implementation). -- fdr
Re: [HACKERS] Default connection parameters for postgres_fdw and dblink
On Wed, Mar 27, 2013 at 8:22 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Mar 27, 2013 at 10:47 AM, Tom Lane t...@sss.pgh.pa.us wrote: Use a service file maybe? But you can't have it both ways: either we like the behavior of libpq absorbing defaults from the postmaster environment, or we don't. You were just arguing this was a bug, and now you're calling it a feature. Maybe we could compromise and call it a beature. Or a fug. In all seriousness, it's in that grey area where most people would probably consider this a surprising and unwanted behavior, but there might be a few out there who had a problem and discovered that they could use this trick to solve it. In terms of a different solution, what about a GUC that can contain a connection string which is used to set connection defaults, but which can still be overridden? So it would mimic the semantics of setting an environment variable, but it would be better, because you could do all of the usual GUC things with it instead of having to hack on the postmaster startup environment. In my meta-experience, nobody really wants defaults in that configuration anyway, and if they do, most of them would appreciate being positively notified of the problem of relying on connection defaults (maybe with a warning at first, at most?). In this case, such an abrupt change in contrib is probably fine. From the vantage point I have: full steam ahead, with the sans-GUC solution...it's one less detailed GUC in the world, and its desired outcome can be achieved otherwise. I'd personally rather open myself up for dealing with the consequences of this narrow change in my own corner of the world. This anecdote carries a standard disclaimer: my meta-experience doesn't include all use cases for everyone everywhere. -- fdr
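For completeness, the service-file route Tom mentions keeps connection defaults out of both the postmaster environment and any GUC: libpq reads ~/.pg_service.conf (or the file named by PGSERVICEFILE), and clients select an entry with service=NAME in their conninfo string. A hypothetical entry (all names here are made up for illustration):

```ini
# ~/.pg_service.conf — libpq connection service file
[reporting]
host=db.example.com
port=5432
dbname=reports
user=report_reader
```

A client would then connect with a conninfo like service=reporting, and any parameter given explicitly in the conninfo still overrides the service-file value.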
Re: [HACKERS] GSoC project : K-medoids clustering in Madlib
On Tue, Mar 26, 2013 at 10:27 PM, Tom Lane t...@sss.pgh.pa.us wrote: Atri Sharma atri.j...@gmail.com writes: I suggested a couple of algorithms to be implemented in MADLib(apart from K Medoids). You could pick some(or all) of them, which would require 3 months to be completed. As for more information on index, you can refer http://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.1 along with the postgres wiki. The wiki is the standard for anything postgres. pg_trgm used KNN, but I believe it uses its own implementation of the algorithm. The idea I proposed aims at writing an implementation in the MADlib so that any client program can use the algorithm(s) in their code directly, using MADlib functions. I'm a bit confused as to why this is being proposed as a Postgres-related project. I don't even know what MADlib is, but I'm pretty darn sure that no part of Postgres uses it. KNNGist certainly doesn't. It's a reasonably well established extension for Postgres for statistical and machine learning methods. Rather neat, but as you indicate, it's not part of Postgres proper. http://madlib.net/ https://github.com/madlib/madlib/ -- fdr
Re: [HACKERS] Interesting post-mortem on a near disaster with git
On Mon, Mar 25, 2013 at 11:07 AM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: Back when we used CVS for quite a few years I kept 7 day rolling snapshots of the CVS repo, against just such a difficulty as this. But we seem to be much better organized with infrastructure these days so I haven't done that for a long time. well there is always room for improvement(and for learning from others) - but I agree that this proposal seems way overkill. If people think we should keep online delayed mirrors we certainly have the resources to do that on our own if we want though... What about rdiff-backup? I set it up for personal use years ago (via the handy open-source bash script backupninja) and it is a pretty nice no-frills point-in-time, self-expiring, file-based automatic backup program that works well with file synchronization like rsync (I rdiff-backup to one disk and rsync the entire rdiff-backup output to another disk). I've enjoyed using it quite a bit during my own personal-computer emergencies, and though the maintenance required from me has been zero, I have used it from time to time to restore, proving it even works. Hardlinks can be used to tag versions of a file-directory tree recursively and relatively compactly. It won't be as compact as a git-aware solution (since git tends to rewrite entire files, which will confuse file-based incremental differential backup), but the amount of data we are talking about is pretty small, and as far as a lowest-common-denominator tradeoff for use in emergencies, I have to give it a lot of praise. The main advantage it has here is it implements point-in-time recovery operations that are easy to use and actually seem to work. That said, I've mostly done targeted recoveries rather than trying to recover entire trees. -- fdr
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Fri, Mar 22, 2013 at 12:29 PM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: This contains some edits to comments that referred to the obsolete and bogus TupleDesc scanning. No mechanical alterations. Applied with some substantial revisions. I didn't like where you'd put the apply/restore calls, for one thing --- we need to wait to do the applies until we have the PGresult in hand, else we might be applying stale values of the remote's GUCs. Also, adding a call that could throw errors right before materializeResult() won't do, because that would result in leaking the PGresult on error. Good catches. The struct for state seemed a bit of a mess too, given that you couldn't always initialize it in one place. Yeah, I had to give that up when pushing things around, unless I wanted to push more state down. It used to be neater. (In hindsight I could have left that alone given where I ended up putting the calls, but it didn't seem to be providing any useful isolation.) I studied your commit. Yeah, the idea I had was to try to avoid pushing down as loaded a value as a PGconn into the lower-level helper functions, but perhaps that economy was a false one after the modifications. Earlier versions used to push down the RemoteGucs struct instead of a full-blown conn to hint at the restricted purpose of that reference. By conceding to this pushdown I think the struct could have remained, as you said, but the difference to clarity is likely marginal. I thought I found a way to not have to widen the parameter list at all, so I preferred that one, but clearly it is wrong w.r.t. leaks and the not-up-to-date protocol state. Sorry you had to root around so much in there to get something you liked, but thanks for going through it. -- fdr
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
This contains some edits to comments that referred to the obsolete and bogus TupleDesc scanning. No mechanical alterations. --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -2961,9 +2961,8 @@ initRemoteGucs(remoteGucs *rgs, PGconn *conn) } /* - * Scan a TupleDesc and, should it contain types that are sensitive to - * GUCs, acquire remote GUCs and set them in a new GUC nesting level. - * This is undone with restoreLocalGucs. + * Acquire remote GUCs that may affect type parsing and set them in a + * new GUC nesting level. */ static void applyRemoteGucs(remoteGucs *rgs) @@ -2974,11 +2973,8 @@ applyRemoteGucs(remoteGucs *rgs) int addedGucNesting = false; /* -* Affected types require local GUC manipulations. Create a new -* GUC NestLevel to overlay the remote settings. -* -* Also, this nesting is done exactly once per remoteGucInfo -* structure, so expect it to come with an invalid NestLevel. +* This nesting is done exactly once per remoteGucInfo structure, +* so expect it to come with an invalid NestLevel. */ Assert(rgs->localGUCNestLevel == -1); diff --git a/contrib/dblink/expected/dblink.out b/contrib/dblink/expected/dblink.out index 3946485..579664e 100644 --- a/contrib/dblink/expected/dblink.out +++ b/contrib/dblink/expected/dblink.out @@ -930,9 +930,8 @@ SELECT dblink_exec('myconn', 'SET datestyle = GERMAN, DMY;'); SET (1 row) --- The following attempt test various paths at which TupleDescs are --- formed and inspected for containment of types requiring local GUC --- setting. +-- The following attempt test various paths at which tuples are formed +-- and inspected for containment of types requiring local GUC setting. 
-- single row synchronous case SELECT * FROM dblink('myconn', diff --git a/contrib/dblink/sql/dblink.sql b/contrib/dblink/sql/dblink.sql index de925eb..7ff43fd 100644 --- a/contrib/dblink/sql/dblink.sql +++ b/contrib/dblink/sql/dblink.sql @@ -435,9 +435,8 @@ SET timezone = UTC; SELECT dblink_connect('myconn','dbname=contrib_regression'); SELECT dblink_exec('myconn', 'SET datestyle = GERMAN, DMY;'); --- The following attempt test various paths at which TupleDescs are --- formed and inspected for containment of types requiring local GUC --- setting. +-- The following attempt test various paths at which tuples are formed +-- and inspected for containment of types requiring local GUC setting. -- single row synchronous case SELECT * -- fdr dblink-guc-sensitive-types-v7.patch Description: Binary data
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Tue, Mar 19, 2013 at 10:37 PM, Daniel Farina dan...@heroku.com wrote: I added programming around various NULL returns reading GUCs in this revision, v4. Okay, one more of those fridge-logic bugs. Sorry for the noise. v5. A missing PG_RE_THROW to get the properly finally-esque semantics: --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -642,7 +642,10 @@ dblink_fetch(PG_FUNCTION_ARGS) } PG_CATCH(); { + /* Pop any set GUCs, if necessary */ restoreLocalGucs(rgs); + + PG_RE_THROW(); } PG_END_TRY(); This was easy to add a regression test to exercise, and so I did (not displayed here). -- fdr dblink-guc-sensitive-types-v5.patch Description: Binary data
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Wed, Mar 20, 2013 at 7:43 AM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: Okay, one more of those fridge-logic bugs. Sorry for the noise. v5. A missing PG_RE_THROW to get the properly finally-esque semantics: --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -642,7 +642,10 @@ dblink_fetch(PG_FUNCTION_ARGS) } PG_CATCH(); { + /* Pop any set GUCs, if necessary */ restoreLocalGucs(rgs); + + PG_RE_THROW(); } PG_END_TRY(); Um ... you shouldn't need a PG_TRY for that at all. guc.c will take care of popping the values on transaction abort --- that's really rather the whole point of having that mechanism. Hmm, well, merely raising the error doesn't reset the GUCs, so I was rather thinking that this was a good idea to compose more neatly in the case of nested exception processing, e.g.: PG_TRY(); { PG_TRY(); { elog(NOTICE, "pre: %s", GetConfigOption("DateStyle", false, true)); materializeResult(fcinfo, res); } PG_CATCH(); { elog(NOTICE, "inner catch: %s", GetConfigOption("DateStyle", false, true)); PG_RE_THROW(); } PG_END_TRY(); } PG_CATCH(); { elog(NOTICE, "outer catch: %s", GetConfigOption("DateStyle", false, true)); restoreLocalGucs(rgs); elog(NOTICE, "restored: %s", GetConfigOption("DateStyle", false, true)); PG_RE_THROW(); } PG_END_TRY(); I don't think this paranoia is taken at other call sites for AtEOXact, so included is a version that takes the same stance. This one shows up best with the whitespace-insensitive option for git: --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -634,21 +634,8 @@ dblink_fetch(PG_FUNCTION_ARGS) * affect parsing and then un-set them afterwards. 
*/ initRemoteGucs(rgs, conn); - - PG_TRY(); - { applyRemoteGucs(rgs); materializeResult(fcinfo, res); - } - PG_CATCH(); - { - /* Pop any set GUCs, if necessary */ - restoreLocalGucs(rgs); - - PG_RE_THROW(); - } - PG_END_TRY(); - restoreLocalGucs(rgs); return (Datum) 0; @@ -823,9 +810,6 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async) if (freeconn) PQfinish(conn); - /* Pop any set GUCs, if necessary */ - restoreLocalGucs(rgs); - PG_RE_THROW(); } PG_END_TRY(); -- fdr dblink-guc-sensitive-types-v6.patch Description: Binary data
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Thu, Mar 14, 2013 at 8:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: On Tue, Mar 12, 2013 at 11:51 AM, Tom Lane t...@sss.pgh.pa.us wrote: Yeah, watching the remote side's datestyle and intervalstyle and matching them (for both input and output) would probably work. Alright, so I've been whacking at this and there's one interesting thing to ask about: saving and restoring the local GUCs. There are a bunch of things about GUCs besides their value that are maintained, such as their 'source', so writing a little ad-hoc save/restore is not going to do the right thing. Right, you should use NewGUCNestLevel/AtEOXact_GUC. Look at the fixes I committed in postgres_fdw a day or two ago for an example. Thanks. Here's a patch. Here are the comments on top of the patch file inlined: Similar in purpose to cc3f281ffb0a5d9b187e7a7b7de4a045809ff683, but taking into account that a dblink caller may choose to cause arbitrary changes to DateStyle and IntervalStyle. To handle this, it is necessary to use PQparameterStatus before parsing any input, every time. This is avoided in cases that do not include the two problematic types treated -- interval and timestamptz -- by scanning the TupleDesc's types first. Although it has been suggested that extra_float_digits would need similar treatment as IntervalStyle and DateStyle (as it's seen in pg_dump), extra_float_digits is not a GUC_REPORT-ed GUC and is not able to be checked in PQparameterStatus, and furthermore, the float4 and float8 parsers are not sensitive to the GUC anyway: both accept as much precision as is provided in an unambiguous way. So, I can add one more such use of AtEOXact_GUC for the dblink fix, but would it also be appropriate to find some new names for the concepts (instead of AtEOXact_GUC and isCommit) here to more accurately express what's going on? Meh. I guess we could invent an EndGUCNestLevel that's a wrapper around AtEOXact_GUC, but I'm not that excited about it ... 
Per your sentiment, I won't pursue that then. Overall, the patch I think has reasonably thorough regression testing (I tried to hit the several code paths whereby one would have to scan TupleDescs, as well as a few other edge cases) and has some of the obvious optimizations in place: it doesn't call PQparameterStatus more than once per vulnerable type per TupleDesc scan, and avoids using the parameter status procedure at all if there are no affected types. The mechanisms may be overwrought though, since it was originally intended to generalize to three types before I realized that extra_float_digits is another kind of problem entirely, leaving just IntervalStyle and DateStyle remaining, which perhaps could have been treated even more specially to make the patch more terse. I'll add it to the commitfest. -- fdr dblink-guc-sensitive-types-v1.patch Description: Binary data
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Tue, Mar 19, 2013 at 1:16 PM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: Similar in purpose to cc3f281ffb0a5d9b187e7a7b7de4a045809ff683, but taking into account that a dblink caller may choose to cause arbitrary changes to DateStyle and IntervalStyle. To handle this, it is necessary to use PQparameterStatus before parsing any input, every time. This is avoided in cases that do not include the two problematic types treated -- interval and timestamptz -- by scanning the TupleDesc's types first. Hm. Is that really a win? Examining the tupdesc wouldn't be free either, and what's more, I think you're making false assumptions about which types have behavior dependent on the GUCs. You definitely missed out on date and plain timestamp, and what of domains over these types? Yes, I also forgot about arrays, and nested composites in arrays or nested composites. As soon as recursive types are introduced the scan is not sufficient for sure: it's necessary to copy every GUC unless one wants to recurse through the catalogs, which feels like a heavy loss. I presumed at the time that skimming over the tupdescs to compare a few Oids would be significantly cheaper than the guts of PQparameterStatus, which I believe to be a linked list traversal. I mostly wrote that optimization because it was easy coincidentally from how I chose to structure the code rather than belief in its absolute necessity. And, as you said, I forgot a few types even for the simple case, which is a bit of a relief because some of the machinery I wrote for the n = 3 case will come in useful for that. I'd be inclined to eat the cost of calling PQparameterStatus every time (which won't be that much) and instead try to optimize by avoiding the GUC-setting overhead if the current value matches the local setting. But even that might be premature optimization. Did you do any performance testing to see if there was a problem worth avoiding? 
Nope; should I invent a new way to do that, or would it be up to commit standard based on inspection alone? I'm okay either way, let me know. I'll take into account these problems and post a new version. -- fdr
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Tue, Mar 19, 2013 at 2:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: On Tue, Mar 19, 2013 at 1:16 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'd be inclined to eat the cost of calling PQparameterStatus every time (which won't be that much) and instead try to optimize by avoiding the GUC-setting overhead if the current value matches the local setting. But even that might be premature optimization. Did you do any performance testing to see if there was a problem worth avoiding? Nope; should I invent a new way to do that, or would it be up to commit standard based on inspection alone? I'm okay either way, let me know. Doesn't seem that hard to test: run a dblink query that pulls back a bunch of data under best-case conditions (ie, local not remote server), and time it with and without the proposed fix. If the difference is marginal then it's not worth working hard to optimize. Okay, will do, and here's the shorter and less mechanically intensive naive version that I think is the baseline: it doesn't try to optimize out any GUC settings and sets up the GUCs before the two materialization paths in dblink. Something I forgot to ask about is whether a strangely-minded type input function could whack around the GUC as records are being emitted, which would necessitate per-tuple polling of PQparameterStatus (e.g. in the inner loop of a materialization). That seemed pretty out there, but I'm broaching it for completeness. I'll see how much of a penalty it is vs. not applying any patch at all next. -- fdr dblink-guc-sensitive-types-v2.patch Description: Binary data
Re: [HACKERS] Enabling Checksums
On Tue, Mar 19, 2013 at 3:52 PM, Greg Smith g...@2ndquadrant.com wrote: On 3/19/13 6:08 PM, Ants Aasma wrote: My main worry is that there is a reasonably large population of users out there that don't have that acceleration capability and will have to settle for performance overhead 4x worse than what you currently measured for a shared buffer swapping workload. That would be very bad. I want to keep hammering on this part of the implementation. If the only style of checksum that's computationally feasible is the Fletcher one that's already been done--if that approach is basically the most expensive one that's practical to use--I'd still consider that a major win over doing nothing. While being a lazy researcher today instead of writing code, I discovered that the PNG file format includes a CRC-32 on its data chunks, and to support that there's a CRC32 function inside of zlib: http://www.zlib.net/zlib_tech.html Is there anywhere that compiles a PostgreSQL --without-zlib that matters? I'm confused. Postgres includes a CRC32 implementation for WAL, does it not? Are you referring to something else? I happen to remember this because I moved some things around to enable third party programs (like xlogdump) to be separately compiled: http://www.postgresql.org/message-id/e1s2xo0-0004uv...@gemulon.postgresql.org -- fdr
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Tue, Mar 19, 2013 at 3:06 PM, Daniel Farina dan...@heroku.com wrote: On Tue, Mar 19, 2013 at 2:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: On Tue, Mar 19, 2013 at 1:16 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'd be inclined to eat the cost of calling PQparameterStatus every time (which won't be that much) and instead try to optimize by avoiding the GUC-setting overhead if the current value matches the local setting. But even that might be premature optimization. Did you do any performance testing to see if there was a problem worth avoiding? Nope; should I invent a new way to do that, or would it be up to commit standard based on inspection alone? I'm okay either way, let me know. Doesn't seem that hard to test: run a dblink query that pulls back a bunch of data under best-case conditions (ie, local not remote server), and time it with and without the proposed fix. If the difference is marginal then it's not worth working hard to optimize. Okay, will do, and here's the shorter and less mechanically intensive naive version that I think is the baseline: it doesn't try to optimize out any GUC settings and sets up the GUCs before the two materialization paths in dblink. The results. Summary: seems like grabbing the GUC and strcmp-ing is worthwhile, but the amount of ping-ponging between processes adds some noise to the timing results: utilization is far short of 100% on either processor involved. Attached is a cumulative diff of the new version, and also reproduced below are the changes to v2 that make up v3. 
## Benchmark

SELECT dblink_connect('benchconn','dbname=contrib_regression');

CREATE FUNCTION bench() RETURNS integer AS $$
DECLARE
    iterations integer;
BEGIN
    iterations := 0;
    WHILE iterations < 30 LOOP
        PERFORM * FROM dblink('benchconn', 'SELECT 1') AS t(a int);
        iterations := iterations + 1;
    END LOOP;
    RETURN iterations;
END;
$$ LANGUAGE plpgsql;

SELECT clock_timestamp() INTO TEMP beginning;
SELECT bench();
SELECT clock_timestamp() INTO TEMP ending;

SELECT 'dblink-benchmark-lines';
SELECT ending.clock_timestamp - beginning.clock_timestamp
FROM beginning, ending;

## Data Setup

CREATE TEMP TABLE bench_results (version text, span interval);

COPY bench_results FROM STDIN WITH CSV;
no-patch,@ 41.308122 secs
no-patch,@ 36.63597 secs
no-patch,@ 34.264119 secs
no-patch,@ 34.760179 secs
no-patch,@ 32.991257 secs
no-patch,@ 34.538258 secs
no-patch,@ 42.576354 secs
no-patch,@ 39.335557 secs
no-patch,@ 37.493206 secs
no-patch,@ 37.812205 secs
v2-applied,@ 36.550553 secs
v2-applied,@ 38.608723 secs
v2-applied,@ 39.415744 secs
v2-applied,@ 46.091052 secs
v2-applied,@ 45.425438 secs
v2-applied,@ 48.219506 secs
v2-applied,@ 43.514878 secs
v2-applied,@ 45.892302 secs
v2-applied,@ 48.479335 secs
v2-applied,@ 47.632041 secs
v3-strcmp,@ 32.524385 secs
v3-strcmp,@ 34.982098 secs
v3-strcmp,@ 34.487222 secs
v3-strcmp,@ 44.394681 secs
v3-strcmp,@ 44.638309 secs
v3-strcmp,@ 44.113088 secs
v3-strcmp,@ 45.497769 secs
v3-strcmp,@ 33.530176 secs
v3-strcmp,@ 32.9704 secs
v3-strcmp,@ 40.84764 secs
\.
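[Editorial aside: the timings above can be cross-checked outside the database. A throwaway sketch, with the values transcribed from the COPY block above; `stdev` is the sample standard deviation, matching SQL `stddev()`:]

```python
from statistics import mean, stdev

# Seconds per 30-iteration run, transcribed from the bench_results data
no_patch = [41.308122, 36.63597, 34.264119, 34.760179, 32.991257,
            34.538258, 42.576354, 39.335557, 37.493206, 37.812205]
v2_applied = [36.550553, 38.608723, 39.415744, 46.091052, 45.425438,
              48.219506, 43.514878, 45.892302, 48.479335, 47.632041]
v3_strcmp = [32.524385, 34.982098, 34.487222, 44.394681, 44.638309,
             44.113088, 45.497769, 33.530176, 32.9704, 40.84764]

for label, runs in [("no-patch", no_patch), ("v2-applied", v2_applied),
                    ("v3-strcmp", v3_strcmp)]:
    print(f"{label:>10}: avg {mean(runs):.7f}  stddev {stdev(runs):.6f}")
```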
= SELECT version,
         avg(extract(epoch from span)),
         stddev(extract(epoch from span))
  FROM bench_results GROUP BY version;

  version   |    avg     |      stddev
------------+------------+------------------
 no-patch   | 37.1715227 | 3.17076487912923
 v2-applied | 43.9829572 | 4.30572672565711
 v3-strcmp  | 38.7985768 | 5.54760393720725

## Changes to v2:

--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2981,9 +2981,11 @@ initRemoteGucs(remoteGucs *rgs, PGconn *conn)
 static void
 applyRemoteGucs(remoteGucs *rgs)
 {
-	int			i;
 	const int	numGucs = sizeof parseAffectingGucs / sizeof *parseAffectingGucs;
+	int			i;
+	int			addedGucNesting = false;
+
 	/*
 	 * Affected types require local GUC manipulations.  Create a new
 	 * GUC NestLevel to overlay the remote settings.
@@ -2992,14 +2994,27 @@ applyRemoteGucs(remoteGucs *rgs)
 	 * structure, so expect it to come with an invalid NestLevel.
 	 */
 	Assert(rgs->localGUCNestLevel == -1);
-	rgs->localGUCNestLevel = NewGUCNestLevel();

 	for (i = 0; i < numGucs; i += 1)
 	{
+		int			gucApplyStatus;
+
 		const char *gucName = parseAffectingGucs[i];
 		const char *remoteVal = PQparameterStatus(rgs->conn, gucName);
+		const char *localVal = GetConfigOption(gucName, true, true);

-		int			gucApplyStatus;
+		/*
+		 * Attempt to avoid GUC setting if the remote and local GUCs
+		 * already have the same value.
+		 */
+		if (strcmp(remoteVal, localVal) == 0)
+			continue;
+
+		if (!addedGucNesting)
+		{
+			rgs->localGUCNestLevel = NewGUCNestLevel();
+			addedGucNesting = true;
+		}

 		gucApplyStatus = set_config_option(gucName, remoteVal,
 										   PGC_USERSET, PGC_S_SESSION,

-- fdr

dblink-guc-sensitive-types-v3.patch
Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Tue, Mar 19, 2013 at 6:10 PM, Daniel Farina dan...@heroku.com wrote: On Tue, Mar 19, 2013 at 3:06 PM, Daniel Farina dan...@heroku.com wrote: On Tue, Mar 19, 2013 at 2:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: On Tue, Mar 19, 2013 at 1:16 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'd be inclined to eat the cost of calling PQparameterStatus every time (which won't be that much) and instead try to optimize by avoiding the GUC-setting overhead if the current value matches the local setting. But even that might be premature optimization. Did you do any performance testing to see if there was a problem worth avoiding? Nope; should I invent a new way to do that, or would it be up to commit standard based on inspection alone? I'm okay either way, let me know. Doesn't seem that hard to test: run a dblink query that pulls back a bunch of data under best-case conditions (ie, local not remote server), and time it with and without the proposed fix. If the difference is marginal then it's not worth working hard to optimize. Okay, will do, and here's the shorter and less mechanically intensive naive version that I think is the baseline: it doesn't try to optimize out any GUC settings and sets up the GUCs before the two materialization paths in dblink. The results. Summary: seems like grabbing the GUC and strcmp-ing is worthwhile, but the amount of ping-ponging between processes adds some noise to the timing results: utilization is far short of 100% on either processor involved. Attached is a cumulative diff of the new version, and also reproduced below are the changes to v2 that make up v3. I added programming around various NULL returns reading GUCs in this revision, v4. The non-cumulative changes:

--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -3005,8 +3005,22 @@ applyRemoteGucs(remoteGucs *rgs)
 		/*
 		 * Attempt to avoid GUC setting if the remote and local GUCs
 		 * already have the same value.
+		 *
+		 * NB: Must error if the GUC is not found.
 		 */
-		localVal = GetConfigOption(gucName, true, true);
+		localVal = GetConfigOption(gucName, false, true);
+
+		if (remoteVal == NULL)
+			ereport(ERROR,
+					(errmsg("could not load parameter status of \"%s\"",
+							gucName)));
+
+		/*
+		 * An error must have been raised by now if GUC values could
+		 * not be loaded for any reason.
+		 */
+		Assert(localVal != NULL);
+		Assert(remoteVal != NULL);

 		if (strcmp(remoteVal, localVal) == 0)
 			continue;

-- fdr

dblink-guc-sensitive-types-v4.patch
Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Mon, Mar 18, 2013 at 1:31 PM, Greg Smith g...@2ndquadrant.com wrote: On 3/18/13 10:52 AM, Bruce Momjian wrote: With a potential 10-20% overhead, I am unclear who would enable this at initdb time. If you survey people who are running PostgreSQL on cloud hardware, be it Amazon's EC2 or similar options from other vendors, you will find a high percentage of them would pay quite a bit of performance to make their storage more reliable. To pick one common measurement for popularity, a Google search on ebs corruption returns 17 million hits. To quote one of those, Baron Schwartz of Percona talking about MySQL on EC2: BTW, I have seen data corruption on EBS volumes. It’s not clear whether it was InnoDB’s fault (extremely unlikely IMO), the operating system’s fault, EBS’s fault, or something else. Clarification, because I think this assessment as delivered feeds some unnecessary FUD about EBS: EBS is quite reliable. Presuming that all noticed corruptions are strictly EBS's problem (that's quite a stretch), I'd say the mean time between defects falls somewhere in the range of volume-centuries. I want to point this out because I think EBS gets an outsized amount of public flogging, and not all of it is deserved. My assessment of the situation at hand: I care about this feature not because EBS sucks more than anything else by a large degree, but because there's an ever-mounting number of EBS volumes whose defects are under the responsibility of comparatively few individuals. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Mon, Mar 18, 2013 at 7:13 PM, Greg Smith g...@2ndquadrant.com wrote: I wasn't trying to flog EBS as any more or less reliable than other types of storage. What I was trying to emphasize, similarly to your quite a stretch comment, was the uncertainty involved when such deployments fail. Failures happen due to many causes outside of just EBS itself. But people are so far removed from the physical objects that fail, it's harder now to point blame the right way when things fail. I didn't mean to imply you personally were going out of your way to flog EBS, but there is a sufficient vacuum in the narrative that someone could reasonably have interpreted it that way, so I want to set it straight. The problem is the quantity of databases per human. The Pythons said it best: 'A simple question of weight ratios.' A quick example will demonstrate what I mean. Let's say my server at home dies. There's some terrible log messages, it crashes, and when it comes back up it's broken. Troubleshooting and possibly replacement parts follow. I will normally expect an eventual resolution that includes data like the drive showed X SMART errors or I swapped the memory with a similar system and the problem followed the RAM. I'll learn something about what failed that I might use as feedback to adjust my practices. But an EC2+EBS failure doesn't let you get to the root cause effectively most of the time, and that makes people nervous. Yes, the layering makes it tougher to do vertical treatment of obscure issues. Redundancy has often been the preferred solution here: bugs come and go all the time, and everyone at each level tries to fix what they can without much coordination from the layer above or below. There are hopefully benefits in throughput of progress at each level from this abstraction, but predicting when any one particular issue will go understood top to bottom is even harder than it already was.
Also, I think the line of reasoning presented is biased towards a certain class of database: there are many, many databases with minimal funding and oversight being run in the traditional way, and the odds they'll get a vigorous root cause analysis in the event of an obscure issue are already close to nil. Although there are other considerations at play (like not just leaving those users with nothing more than a bad block message), checksums open some avenues to gradually benefit those use cases, too. I can already see "how do checksums alone help narrow the blame?" as the next question. I'll post something summarizing how I use them for that tomorrow, just out of juice for that tonight. Not from me. It seems pretty intuitive from here how database-maintained checksums assist in partitioning the problem. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimizing pglz compressor
On Wed, Mar 6, 2013 at 6:32 AM, Joachim Wieland j...@mcknight.de wrote: On Tue, Mar 5, 2013 at 8:32 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: With these tweaks, I was able to make pglz-based delta encoding perform roughly as well as Amit's patch. Out of curiosity, do we know how pglz compares with other algorithms, e.g. lz4 ? This one is for the archives, as I thought it surprising: the performance of these algorithms can differ by a surprisingly large factor depending on architecture. Here's a table reproduced from: http://www.reddit.com/r/programming/comments/1aim6s/lz4_extremely_fast_compression_algorithm/c8y0ew9

testdata/alice29.txt :
ZLIB:    [b 1M] bytes 152089 ->  54404  35.8% comp  0.8 MB/s uncomp   8.1 MB/s
LZO:     [b 1M] bytes 152089 ->  82721  54.4% comp 14.5 MB/s uncomp  43.0 MB/s
CSNAPPY: [b 1M] bytes 152089 ->  90965  59.8% comp  2.1 MB/s uncomp   4.4 MB/s
SNAPPY:  [b 4M] bytes 152089 ->  90965  59.8% comp  1.8 MB/s uncomp   2.8 MB/s

testdata/asyoulik.txt:
ZLIB:    [b 1M] bytes 125179 ->  48897  39.1% comp  0.8 MB/s uncomp   7.7 MB/s
LZO:     [b 1M] bytes 125179 ->  73224  58.5% comp 15.3 MB/s uncomp  42.4 MB/s
CSNAPPY: [b 1M] bytes 125179 ->  80207  64.1% comp  2.0 MB/s uncomp   4.2 MB/s
SNAPPY:  [b 4M] bytes 125179 ->  80207  64.1% comp  1.7 MB/s uncomp   2.7 MB/s

LZO was ~8x faster compressing and ~16x faster decompressing. Only on uncompressible data was Snappy faster:

testdata/house.jpg :
ZLIB:    [b 1M] bytes 126958 -> 126513  99.6% comp  1.2 MB/s uncomp   9.6 MB/s
LZO:     [b 1M] bytes 126958 -> 127173 100.2% comp  4.2 MB/s uncomp  74.9 MB/s
CSNAPPY: [b 1M] bytes 126958 -> 126803  99.9% comp 24.6 MB/s uncomp 381.2 MB/s
SNAPPY:  [b 4M] bytes 126958 -> 126803  99.9% comp 22.8 MB/s uncomp 354.4 MB/s

So that's one more gotcha to worry about, since I surmise most numbers are being taken on x86. Apparently this has something to do with alignment of accesses. Some of it may be fixable by tweaking the implementation rather than the compression encoding, although I am no expert in the matter.
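[Editorial aside: those figures come from someone else's hardware. A throwaway harness for sanity-checking ratios and timings locally might look like the sketch below — zlib only, since it ships with Python's stdlib; lz4 and snappy would need third-party bindings and are not shown.]

```python
import os
import time
import zlib

def bench(label: str, data: bytes) -> None:
    t0 = time.perf_counter()
    comp = zlib.compress(data, 6)          # default-ish compression level
    t1 = time.perf_counter()
    assert zlib.decompress(comp) == data   # round trip must be lossless
    t2 = time.perf_counter()
    print(f"{label}: bytes {len(data)} -> {len(comp)} "
          f"{len(comp) / len(data) * 100:.1f}% "
          f"comp {t1 - t0:.4f}s uncomp {t2 - t1:.4f}s")

# Compressible text vs. incompressible noise, mirroring the txt/jpg split
bench("text-like", b"the quick brown fox jumps over the lazy dog\n" * 3456)
bench("random   ", os.urandom(152089))
```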
-- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Sun, Mar 17, 2013 at 5:50 PM, Greg Smith g...@2ndquadrant.com wrote: On the testing front, we've seen on-list interest in this feature from companies like Heroku and Enova, who both have some resources and practice to help testing too. Heroku can spin up test instances with workloads any number of ways. Enova can make a Londiste standby with checksums turned on to hit it with a logically replicated workload, while the master stays un-checksummed. I was thinking about turning checksums on for all new databases as long as I am able to turn them off easily, per my earlier message: http://www.postgresql.org/message-id/caazkufzza+aw8zl4f_5c8t8zhrtjo3cm1ajqddglqcpez_3...@mail.gmail.com. An unstated assumption here was that I could apply the patch to 9.2 with some work. It seems the revitalized interest in the patch has raised a couple of issues on inspection that have yet to be resolved, so before moving I'd prefer to wait for a quiescence in the patch's evolution, as was the case for some time even after review. However, if we want to just hit 9.3dev with a bunch of synthetic traffic, that's probably doable also, and in some ways easier (or at least less risky). -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Tue, Mar 12, 2013 at 11:51 AM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: Okay, I see. So inverting the thinking I wrote earlier: how about hearkening carefully to any ParameterStatus messages on the local side before entering the inner loop of dblink.c:materializeResult as to set the local GUC (and carefully dropping it back off after materializeResult) so that the _in functions can evaluate the input in the same relevant GUC context as the remote side? Yeah, watching the remote side's datestyle and intervalstyle and matching them (for both input and output) would probably work. Alright, so I've been whacking at this and there's one interesting thing to ask about: saving and restoring the local GUCs. There are a bunch of things about GUCs besides their value that are maintained, such as their 'source', so writing a little ad-hoc save/restore is not going to do the right thing. Luckily, something much closer to the right thing is done for SET LOCAL with transactions and subtransactions, to push and pop GUC contexts:

guc.h:
extern int NewGUCNestLevel(void);
extern void AtEOXact_GUC(bool isCommit, int nestLevel);

By and large, looking at the mechanics of these two functions, the latter in particular has very little to do with the transaction machinery in general. It's already being called from a bunch of places that don't, to me, seem to really indicate an intrinsic connection to transactions, e.g. do_analyze_rel, ri_triggers, and postgres_fdw. I think in those cases the justification for settings of 'isCommit' is informed by the mechanism more than the symbol name or its comments. I count about ten call sites that are like this. So, I can add one more such use of AtEOXact_GUC for the dblink fix, but would it also be appropriate to find some new names for the concepts (instead of AtEOXact_GUC and isCommit) here to more accurately express what's going on?
I'll perhaps do that after finishing up the dblink fix if I receive some positive response on the matter. However, if for some reason I *shouldn't* use that machinery in dblink, I'd appreciate the guidance. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
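[Editorial aside: the push/pop discipline discussed above — open a nest level, overlay values, then restore everything wholesale on exit, commit or abort alike — is an ordinary scoped-override pattern. Here it is sketched as a Python context manager, purely for intuition: a toy dict stands in for the GUC machinery, and none of this corresponds to actual guc.c code.]

```python
from contextlib import contextmanager

# Toy stand-in for the GUC table; real GUCs also carry source, scope,
# and reset state, which is why ad-hoc save/restore falls short.
settings = {"datestyle": "ISO, MDY", "intervalstyle": "postgres"}

@contextmanager
def guc_nest_level(overrides):
    """Overlay settings and restore them on exit -- the same shape as
    NewGUCNestLevel() paired with AtEOXact_GUC()."""
    saved = {name: settings[name] for name in overrides}
    settings.update(overrides)
    try:
        yield
    finally:
        settings.update(saved)   # pop the nest level even on error

with guc_nest_level({"datestyle": "German, DMY"}):
    # ...materialize the remote result under the remote's parse settings...
    assert settings["datestyle"] == "German, DMY"

assert settings["datestyle"] == "ISO, MDY"   # back to the local value
```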
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Mon, Mar 11, 2013 at 7:04 PM, Tom Lane t...@sss.pgh.pa.us wrote: Daniel Farina dan...@heroku.com writes: I will try to make time for this, although it seems like the general approach should match pgsql_fdw if possible. Is the current thinking to forward the settings and then use the GUC hooks to track updates? That's not what I had in mind for postgres_fdw --- rather the idea is to avoid needing on-the-fly changes in remote-side settings, because those are so expensive to make. However, postgres_fdw is fortunate in that the SQL it expects to execute on the remote side is very constrained. dblink might need a different solution that would leave room for user-driven changes of remote-side settings. Okay, I see. So inverting the thinking I wrote earlier: how about hearkening carefully to any ParameterStatus messages on the local side before entering the inner loop of dblink.c:materializeResult as to set the local GUC (and carefully dropping it back off after materializeResult) so that the the _in functions can evaluate the input in the same relevant GUC context as the remote side? That should handle SET actions executed remotely. Otherwise it seems like a solution would have to be ambitious enough to encompass reifying the GUCs from the afflicted parsers, which I surmise is not something that we want to treat right now. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Mon, Mar 11, 2013 at 12:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: BTW, it strikes me that dblink is probably subject to at least some of these same failure modes. I'm not personally volunteering to fix any of this in dblink, but maybe someone ought to look into that. I will try to make time for this, although it seems like the general approach should match pgsql_fdw if possible. Is the current thinking to forward the settings and then use the GUC hooks to track updates? -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: postgres_fdw vs data formatting GUCs (was Re: [HACKERS] [v9.3] writable foreign tables)
On Sun, Mar 10, 2013 at 12:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: There's a lot left to do here of course. One thing I was wondering about was why we don't allow DEFAULTs to be attached to foreign-table columns. There was no use in it before, but it seems sensible enough now. Hmm ... the buildfarm just rubbed my nose in a more immediate issue, which is that postgres_fdw is vulnerable to problems if the remote server is using different GUC settings than it is for things like timezone and datestyle. The failure seen here: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rover_fireflydt=2013-03-10%2018%3A30%3A00 is basically just cosmetic, but it's not hard to imagine non-cosmetic problems coming up. For instance, suppose our instance is running in DMY datestyle and transmits an ambiguous date to a remote running in MDY datestyle. We could consider sending our settings to the remote at connection establishment, but that doesn't seem terribly bulletproof --- what if someone does a local SET later? What seems safer is to set the remote to ISO style always, but then we have to figure out how to get the local timestamptz_out to emit that style without touching our local GUC. Ugh. Forgive my naivety: why would a timestamptz value not be passed through the _in/_out function locally at least once (hence, respecting local GUCs) when using the FDW? Is the problem overhead in not wanting to parse and re-format the value on the local server? Although it seems that if GUCs affected *parsing* then the problem comes back again, since one may choose to canonicalize output from a remote server (e.g. via setting ISO time in all cases) but should the user have set up GUCs to interpret input in a particular way for a data type that is not enough. As-is this situation is already technically true for timestamptz in the case of time stamps without any explicit time offset or time zone, but since timestamptz_out doesn't ever render without the offset (right?) 
it's not a problem. Close shave, though, and one that an extension author could easily find themselves on the wrong side of. I suppose that means any non-immutable in/out function pair may have to be carefully considered, and the list is despairingly long...but finite:

SELECT proname
FROM pg_proc
WHERE EXISTS (SELECT 1
              FROM pg_type
              WHERE pg_proc.oid = pg_type.typinput
                 OR pg_proc.oid = pg_type.typoutput
                 OR pg_proc.oid = pg_type.typsend
                 OR pg_proc.oid = pg_type.typreceive)
  AND provolatile != 'i';

(One more reason why GUCs that affect application-visible semantics are dangerous.) :( -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Thu, Mar 7, 2013 at 7:31 PM, Bruce Momjian br...@momjian.us wrote: On Mon, Mar 4, 2013 at 05:04:27PM -0800, Daniel Farina wrote: Putting aside the not-so-rosy predictions seen elsewhere in this thread about the availability of a high performance, reliable checksumming file system available on common platforms, I'd like to express what benefit this feature will have to me: Corruption has easily occupied more than one person-month of time last year for us. This year to date I've burned two weeks, although admittedly this was probably the result of statistical clustering. Other colleagues of mine have probably put in a week or two in aggregate in this year to date. The ability to quickly, accurately, and maybe at some later date proactively finding good backups to run WAL recovery from is one of the biggest strides we can make in the operation of Postgres. The especially ugly cases are where the page header is not corrupt, so full page images can carry along malformed tuples...basically, when the corruption works its way into the WAL, we're in much worse shape. Checksums would hopefully prevent this case, converting them into corrupt pages that will not be modified. It would be better yet if I could write tools to find the last-good version of pages, and so I think tight integration with Postgres will see a lot of benefits that would be quite difficult and non-portable when relying on file system checksumming. I see Heroku has corruption experience, and I know Jim Nasby has struggled with corruption in the past. More than a little: it has entered the realm of the routine, and happens frequently enough that it has become worthwhile to start looking for patterns. Our methods so far rely heavily on our archives to deal with it: it's time consuming but the 'simple' case of replaying WAL from some earlier base backup resulting in a non-corrupt database is easily the most common. 
Interestingly, the WAL has never failed to recover halfway through because of CRC failures while treating corruption[0]. We know this fairly convincingly because we constantly sample txid and WAL positions while checking the database, as we typically do about every thirty seconds. I think the unreasonable effectiveness of this strategy of old backup and WAL replay might suggest that database checksums would prove useful. In my mind, the number of ways this formula could work so well if the bug were RAM- or CPU-based is slimmed considerably. [0] I have seen -- very rarely -- substantial periods of severe WAL corruption (files are not even remotely the correct size) propagated to the archives in the case of disaster recovery where the machine met its end because of the WAL disk being marked as dead. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Wed, Mar 6, 2013 at 8:17 PM, Greg Smith g...@2ndquadrant.com wrote: TL;DR summary: on a system I thought was a fair middle of the road server, pgbench tests are averaging about a 2% increase in WAL writes and a 2% slowdown when I turn on checksums. There are a small number of troublesome cases where that overhead rises to closer to 20%, an upper limit that's shown up in a few tests aiming to stress this feature now. I have only done some cursory research, but CPU time of around 20% seems to be expected for InnoDB's CRC computation[0]. Although it's a galling number, this comparison with other systems may be a way to see how much of that overhead is avoidable or just the price of entry. It's unclear how this 20% CPU time compares to your whole-system results above, but it's enough to suggest that nothing comes for (nearly) free. [0]: http://mysqlha.blogspot.com/2009/05/innodb-checksum-performance.html -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
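[Editorial aside: for a feel of why checksum choice matters for CPU cost, here is a textbook Fletcher-16 — the family mentioned earlier in the thread — next to zlib's CRC-32. Python, illustrative only: the checksum the patch actually uses differs, and a compiled C loop is orders of magnitude faster than this interpreted one, so the relative timings below say nothing about C implementations.]

```python
import time
import zlib

def fletcher16(data: bytes) -> int:
    """Textbook Fletcher-16: cheap per byte, needing only adds and an
    occasional modulo, no table lookups."""
    s1 = s2 = 0
    for b in data:
        s1 = (s1 + b) % 255
        s2 = (s2 + s1) % 255
    return (s2 << 8) | s1

page = bytes(range(256)) * 32            # 8 kB sample "page"

t0 = time.perf_counter()
fsum = fletcher16(page)
t1 = time.perf_counter()
csum = zlib.crc32(page) & 0xFFFFFFFF     # compiled code, unfairly fast here
t2 = time.perf_counter()

print(f"fletcher16={fsum:#06x} ({t1 - t0:.6f}s)  "
      f"crc32={csum:#010x} ({t2 - t1:.6f}s)")
```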
Re: [HACKERS] [GENERAL] Floating point error
On Mon, Mar 4, 2013 at 2:27 PM, Maciek Sakrejda m.sakre...@gmail.com wrote: On Sun, Mar 3, 2013 at 9:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: The real difficulty is that there may be more than one storable value that corresponds to 1.23456 to six decimal digits. To be certain that we can reproduce the stored value uniquely, we have to err in the other direction, and print *more* decimal digits than the underlying precision justifies, rather than a bit less. Some of those digits are going to look like garbage to the naked eye. I think part of the difficulty here is that psql (if I understand this correctly) conflates the wire-format text representations with what should be displayed to the user. E.g., a different driver might parse the wire representation into a native representation, and then format that native representation when it is to be displayed. That's what the JDBC driver does, so it doesn't care about how the wire format actually looks. pg_dump cares about reproducing values exactly, and not about whether things are nice-looking, so it cranks up extra_float_digits. The JDBC driver might be justified in doing likewise, to ensure that the identical binary float value is stored on both client and server --- but that isn't even a valid goal unless you assume that the server's float implementation is the same as Java's, which is a bit of a leap of faith, even if IEEE 754 is nigh universal these days. I would hope that any driver cares about reproducing values exactly (or at least as exactly as the semantics of the client and server representations of the data type allow). Once you start talking operations, sure, things get a lot more complicated and you're better off not relying on any particular semantics. But IEEE 754 unambiguously defines certain bit patterns to correspond to certain values, no? 
If both client and server talk IEEE 754 floating point, it should be possible to round-trip values with no fuss and end up with the same bits you started with (and as far as I can tell, it is, as long as extra_float_digits is set to the max), even if the implementations of actual operations on these numbers behave very differently on client and server. I think given that many ORMs can cause UPDATEs on tuple fields that have not changed as part of saving an object, stable round trips seem like a desirable feature. I also find the rationale for extra_float_digits quite mysterious for the same reason: why would most programs care about precision less than pg_dump does? If a client wants floating point numbers to look nice, I think the rendering should be on them (e.g. psql and pgadmin), and the default should be to expose whatever precision is available to clients that want an accurate representation of what is in the database. This kind of change may have many practical problems that may make it un-pragmatic to alter at this time (considering the workaround is to set extra_float_digits), but I can't quite grasp the rationale of "well, the only program that cares about the most precision available is pg_dump." It seems like most programs would care just as much. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
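[Editorial aside: the round-trip property under discussion is easy to demonstrate. In the Python sketch below, `float` is an IEEE 754 double, like PostgreSQL's float8: 17 significant digits always suffice to reproduce a double exactly, while 15 can collapse two distinct values into the same text.]

```python
import struct

x = 0.1 + 0.2        # 0.30000000000000004 as an IEEE 754 double
y = 0.3
assert x != y                                   # genuinely distinct doubles
assert struct.pack(">d", x) != struct.pack(">d", y)

# At 15 significant digits the two render identically: not round-trippable
assert f"{x:.15g}" == f"{y:.15g}"

# At 17 significant digits (roughly what a maxed-out extra_float_digits
# buys) each value round-trips to the exact same bits
assert float(f"{x:.17g}") == x
assert float(f"{y:.17g}") == y
```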
Re: [HACKERS] Enabling Checksums
On Mon, Mar 4, 2013 at 1:22 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 04.03.2013 23:00, Jeff Davis wrote: On Mon, 2013-03-04 at 22:27 +0200, Heikki Linnakangas wrote: Yeah, fragmentation will certainly hurt some workloads. But how badly, and which workloads, and how does that compare with the work that PostgreSQL has to do to maintain the checksums? I'd like to see some data on those things. I think we all would. Btrfs will be a major filesystem in a few years, and we should be ready to support it. Perhaps we should just wait a few years? If we suspect that this becomes obsolete in a few years, it's probably better to just wait, than add a feature we'll have to keep maintaining. Assuming it gets committed today, it's going to take a year or two for 9.3 to get released and all the bugs ironed out, anyway. Putting aside the not-so-rosy predictions seen elsewhere in this thread about the availability of a high performance, reliable checksumming file system available on common platforms, I'd like to express what benefit this feature will have to me: Corruption has easily occupied more than one person-month of time last year for us. This year to date I've burned two weeks, although admittedly this was probably the result of statistical clustering. Other colleagues of mine have probably put in a week or two in aggregate in this year to date. The ability to quickly, accurately, and maybe at some later date proactively finding good backups to run WAL recovery from is one of the biggest strides we can make in the operation of Postgres. The especially ugly cases are where the page header is not corrupt, so full page images can carry along malformed tuples...basically, when the corruption works its way into the WAL, we're in much worse shape. Checksums would hopefully prevent this case, converting them into corrupt pages that will not be modified. 
It would be better yet if I could write tools to find the last-good version of pages, and so I think tight integration with Postgres will see a lot of benefits that would be quite difficult and non-portable when relying on file system checksumming. You are among the most well-positioned to make assessments of the cost of the feature, but I thought you might appreciate a perspective of the benefits, too. I think they're large, and for me they are the highest pole in the tent for what makes Postgres stressful to operate as-is today. It's a testament to the quality of the programming in Postgres that Postgres programming error is not the largest problem. For sense of reference, I think the next largest operational problem is the disruption caused by logical backups, e.g. pg_dump, and in particular its long running transactions and sessions. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enabling Checksums
On Sun, Feb 24, 2013 at 10:30 PM, Greg Smith g...@2ndquadrant.com wrote: Attached are some bit-rot updates to the checksums patches. The replace-tli one still works fine. I rather badly want this feature, and if the open issues with the patch have hit zero, I'm thinking about applying it, shipping it, and turning it on. Given that the heap format has not changed, the main affordance I may check for is whether I can work in backwards compatibility (while not maintaining the checksums, of course) in case of an emergency. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unarchived WALs deleted after crash
On Thu, Feb 21, 2013 at 12:39 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 21.02.2013 02:59, Daniel Farina wrote: On Fri, Feb 15, 2013 at 9:29 AM, Simon Riggs si...@2ndquadrant.com wrote: On 15 February 2013 17:07, Heikki Linnakangas hlinnakan...@vmware.com wrote: Unfortunately in HEAD, xxx.done file is not created when restoring archived file because of absence of the patch. We need to implement that first. Ah yeah, that thing again.. (http://www.postgresql.org/message-id/50df5ba7.6070...@vmware.com) I'm going to forward-port that patch now, before it's forgotten again. It's not clear to me what the holdup was on this, but whatever the bigger patch we've been waiting for is, it can just as well be done on top of the forward-port. Agreed. I wouldn't wait for a better version now. Related to this, how is this going to affect point releases, and are there any lingering doubts about the mechanism of the fix? Are you talking about the patch to avoid restored WAL segments from being re-archived (commit 6f4b8a4f4f7a2d683ff79ab59d3693714b965e3d), or the bug that unarchived WALs were deleted after crash (commit b5ec56f664fa20d80fe752de494ec96362eff520)? The former was included in 9.2.0 already, and the latter will be included in the next point release. Unarchived WALs being deleted after a crash is the one that worries me. I actually presume re-archivals will happen anyway because I may lose connection to archive storage after the WAL has already been committed, hence b5ec56f664fa20d80fe752de494ec96362eff520. This is quite serious given my reliance on archiving, so unless the thinking for point releases is 'real soon' I must backpatch and release it of my own accord until then. I don't know what the release schedule is. I take that to be a request to put out a new minor release ASAP. Perhaps, but it's more of a concrete evaluation of how important archiving is to me and my affiliated operation.
An acceptable answer might be "yeah, backpatch if you feel it's that much of a rush." Clearly, my opinion is that a gap in the archives is pretty cringe-inducing. I hit it from an out-of-disk case, and you'd be surprised (or perhaps not?) how many people like to kill -9 processes on a whim. I already maintain other backpatches (not related to fixes), and this one is only temporary, so it's not too much trouble for me. -- fdr
Re: [HACKERS] Unarchived WALs deleted after crash
On Fri, Feb 15, 2013 at 9:29 AM, Simon Riggs si...@2ndquadrant.com wrote: On 15 February 2013 17:07, Heikki Linnakangas hlinnakan...@vmware.com wrote: Unfortunately in HEAD, xxx.done file is not created when restoring archived file because of absence of the patch. We need to implement that first. Ah yeah, that thing again.. (http://www.postgresql.org/message-id/50df5ba7.6070...@vmware.com) I'm going to forward-port that patch now, before it's forgotten again. It's not clear to me what the holdup was on this, but whatever the bigger patch we've been waiting for is, it can just as well be done on top of the forward-port. Agreed. I wouldn't wait for a better version now. Related to this, how is this going to affect point releases, and are there any lingering doubts about the mechanism of the fix? This is quite serious given my reliance on archiving, so unless the thinking for point releases is 'real soon' I must backpatch and release it of my own accord until then. Thanks for the attention paid to the bug report, as always. -- fdr
Re: [HACKERS] Unarchived WALs deleted after crash
On Thu, Feb 14, 2013 at 7:45 AM, Jehan-Guillaume de Rorthais j...@dalibo.com wrote: Hi, I am facing an unexpected behavior on a 9.2.2 cluster that I can reproduce on current HEAD. On a cluster with archive enabled but failing, after a crash of postmaster, the checkpoint occurring before leaving the recovery mode deletes any additional WALs, even those waiting to be archived. I believe I have encountered this recently, but didn't get enough of a chance to work with it to correspond. For me, the cause was out-of-disk on the file system that exclusively contained WAL, backlogged because archiving fell behind writing. This caused the cluster to crash -- par for the course -- but it also created an archive gap. At the time I thought there was some kind of bug in dealing with out-of-space issues in the archiver (the .ready bookkeeping), but the symptoms I saw seem like they might be explained by your report, too. -- fdr
Re: [HACKERS] Fractal tree indexing
On Wed, Feb 13, 2013 at 8:20 AM, Tom Lane t...@sss.pgh.pa.us wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: The basic idea of a fractal tree index is to attach a buffer to every non-leaf page. On insertion, instead of descending all the way down to the correct leaf page, the new tuple is put on the buffer at the root page. When that buffer fills up, all the tuples in the buffer are cascaded down to the buffers on the next level pages. And recursively, whenever a buffer fills up at any level, it's flushed to the next level. [ scratches head... ] What's fractal about that? Or is that just a content-free marketing name for this technique? The name in the literature is Cache Oblivious Lookahead Array, aka COLA. The authors are also founders of TokuTek, and seem to have taken pains to ring-fence mentions of the algorithm with references to its patent. Well, at least nobody can blame them for submarine patent action. -- fdr
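For readers curious about the structure being named, here is a toy sketch of the COLA idea: sorted levels of geometrically increasing size, where an insert merge-cascades downward whenever a level is already occupied, analogous to the buffer flushing Heikki describes. This is an illustrative simplification under my own assumptions, not TokuTek's patented implementation.

```python
import bisect

class COLA:
    """Toy cache-oblivious lookahead array: level i holds a sorted run
    of 2**i keys, or is empty."""

    def __init__(self):
        self.levels = []  # list of sorted lists

    def insert(self, key):
        carry = [key]
        for i, level in enumerate(self.levels):
            if not level:
                # Empty slot at this level: the cascading run lands here.
                self.levels[i] = carry
                return
            # Occupied: merge and cascade the doubled run to the next level.
            carry = sorted(carry + level)
            self.levels[i] = []
        self.levels.append(carry)

    def search(self, key):
        # A lookup probes each level's sorted run (binary search per level).
        for level in self.levels:
            i = bisect.bisect_left(level, key)
            if i < len(level) and level[i] == key:
                return True
        return False
```

The amortized insert cost is low because each key participates in only O(log n) merges, which is the same intuition behind the buffered non-leaf pages in the fractal-tree description above.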
Re: [HACKERS] Considering Gerrit for CFs
On Fri, Feb 8, 2013 at 2:23 AM, Magnus Hagander mag...@hagander.net wrote: On Fri, Feb 8, 2013 at 1:32 AM, Josh Berkus j...@agliodbs.com wrote: This is a few too many steps, and certainly appears completely broken to any newcomer. I agree it's way too many steps. Several of those can certainly be made more efficient now that we have more sane archives, well within the scope of the current system. I have thought it'd be 'nice' if pre-generated binaries and git repos were made for each submitted patch. Is there a nice-and-tidy way to paginate over list traffic from an external program over The Internet? Is there/could there be a nice pagination marker? Is that a reasonable step zero to making other things that consume email? -- fdr
Re: [HACKERS] Considering Gerrit for CFs
On Wed, Feb 6, 2013 at 3:00 PM, Joshua D. Drake j...@commandprompt.com wrote: On 02/06/2013 01:53 PM, Tom Lane wrote: ... if it's going to try to coerce us out of our email-centric habits, then I for one am very much against it. To me, the problems with the existing CF app are precisely that it's not well enough integrated with the email discussions. The way to fix that is not to abandon email (or at least, it's not a way I wish to pursue). The email-centric habits are by far the biggest limiting factor we have. Email was never designed for integral collaboration. That said, I see no way around it. I have brought up this idea before, but Redmine has by far served the purposes (with a little effort) of CMD and it also integrates with GIT. It also does not break the email work flow. It does not currently allow commands via email but that could be easily (when I say easily, I mean it) added. Just another thought. I don't think controlling things by email is the issue. I have used the usual litany of bug trackers and find the correspondence-driven model that -hackers and -bugs use pretty pleasant. If nothing else, the uniform, well-developed, addressable, and indexed nature of the archives definitely provides me a lot of value that I haven't really seen duplicated in other projects that use structured bug or patch tracking. The individual communications tend to be of higher quality -- particularly for the purpose of later reference -- maybe a byproduct of the fact that prose is on the pedestal. There are obvious tooling gaps (aren't there always?), but I don't really see the model as broken, and I don't think I've been around pgsql-hackers exclusively or extensively enough to have developed Stockholm syndrome. I also happen to feel that the commitfest application works rather well for me in general.
Sure, I wish that it might slurp up all submitted patches automatically with default titles, or identify new versions when they appear, but I've always felt that skimming the commitfest detail page for a patch was useful. It was perhaps harder to know whether the commitfest page I was looking at was complete or up to date, although it often is. Here's what I find most gut-wrenching in the whole submit-review-commit process:

* When a reviewer shows up a couple of weeks later and says "this patch doesn't apply," "make check crashes," or "it fails to compile." It's especially sad because the reviewer has presumably cut out a chunk of time -- now probably aborted -- to review the patch. Machines can know these things automatically.

* When on occasion patches are submitted with non-C/sh/perl suites that may not really be something that anyone wants in the build/source tree, but seem like they might do a better job testing the patch. The inevitable conclusion is that the automated test harness is tossed, or never written, because it is known it will have no place to live after a possible commit. Somehow I wish those could live somewhere and run against all submitted patches.

I've toyed a very, very tiny bit with running build farm clients on Heroku with both of these in mind, but haven't revisited it in a while. -- fdr
Re: [HACKERS] [v9.3] writable foreign tables
On Fri, Feb 1, 2013 at 2:22 AM, Daniel Farina dan...@heroku.com wrote: On Tue, Jan 29, 2013 at 1:19 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: I noticed the v10 patch cannot be applied on the latest master branch cleanly. The attached ones were rebased. Anyway, I'm looking at the first patch, which applies neatly. Thanks. Hello, My review is incomplete, but to the benefit of pipelining I wanted to ask a couple of things and submit some changes for your consideration while continuing to look at this. So far, I've been trying to understand in very close detail how your changes affect non-FDW related paths first, before delving into the actual writable FDW functionality. There's not that much in this category, but there's one thing that gave me pause: the fact that the target list (by convention, tlist) has to be pushed down from planmain.c/query_planner all the way to an fdwroutine->GetForeignRelWidth callsite in plancat.c/get_relation_info (so even in the end, it is just pushed down -- never inspected or modified). In relative terms, it's a significant widening of currently fairly short parameter lists in these procedures:

add_base_rels_to_query(PlannerInfo *root, List *tlist, Node *jtnode)
build_simple_rel(PlannerInfo *root, int relid, List *tlist, RelOptKind reloptkind)
get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent, List *tlist, RelOptInfo *rel)

In the case of all the other parameters except tlist, each is more intimately related with the inner workings and mechanisms of the procedure. I wonder if there's a nice way to train the reader's eye that the tlist is simply pushed down rather than actively managed at each level. However, I can't immediately produce a way to achieve that that doesn't seem overwrought. Perhaps a comment will suffice.
Related to this, I found that make_modifytable grew a dependency on PlannerInfo *root, and it seemed like a bunch of the arguments are functionally related to that, so the attached patch, which should apply on top of yours, tries to utilize that as often as possible. The weirdest thing in there is how make_modifytable has been taught to call SS_assign_special_param, which has a side effect on the PlannerInfo, but there exist exactly zero cases where one *doesn't* want to do that prior to the (exactly two) call sites of make_modifytable, so I pushed it in. The second weirdest thing is pushing in the handling of rowMarks (e.g. SELECT FOR UPDATE et al). Let me know your thoughts, otherwise I'll keep moving on... -- fdr wfdw-rely-on-plannerinfo.patch Description: Binary data
Re: [HACKERS] json api WIP patch
On Mon, Feb 4, 2013 at 11:38 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Feb 4, 2013 at 11:10 AM, Andrew Dunstan and...@dunslane.net wrote: On 02/04/2013 10:47 AM, Robert Haas wrote: The SQL standards considerations seem worth thinking about, too. We've certainly gone through a lot of pain working toward eliminating => as an operator name, and if the SQL standard has commandeered -> for some purpose or other, I'd really rather not add to the headaches involved should we ever decide to reclaim it. OK, but I'd like to know what is going to be safe. There's no way to future-proof the language. I'm quite prepared to replace -> with something else, and if I do then ->> will need to be adjusted accordingly, I think. My suggestion would be ~> and ~>>. I know David Wheeler didn't like that on the ground that some fonts elevate ~ rather than aligning it in the middle as most monospaced fonts do, but I'm tempted just to say then use a different font. Other possibilities that come to mind are +> and +>>, although I think they're less attractive. But I'll be guided by the consensus, assuming there is one ;-) I suspect both of those are pretty safe from an SQL standards point of view. Of course, as Tom is often wont to point out, the SQL standards committee sometimes does bizarre things, so nothing's perfect, but I'd be rather shocked if any of those got tapped to mean something else. That having been said, I still don't see value in adding operators at all. Good old function call notation seems perfectly adequate from where I sit. Sure, it's a little more verbose, but when you try too hard to make things concise then you end up having to explain to your users why \ditS is a sensible thing for them to type into psql, or why s@\W@sprintf%%%02x,ord($)@e in Perl. I recognize that I may lose this argument, but I've worked with a couple of languages where operators can be overloaded (C++) or defined (ML) and it's just never seemed to work out very well. YMMV, of course.
I also basically feel this way, although I know I tend more towards notational brutalism than many. I think we shouldn't kid ourselves that non-default operators will be used, and for current-implementation reasons (that maybe could be fixed by someone determined) it's not really at the pleasure of the author to use them via CREATE OPERATOR either. So, I basically subscribe to the view that we should investigate what total reliance on prefix syntax looks like. I guess it'll make nested navigation horribly ugly, though...positively lisp-esque. That's one consideration hstore doesn't have that may make use of infix notations considerably more useful for json than hstore. -- fdr
Re: [HACKERS] json api WIP patch
On Mon, Feb 4, 2013 at 12:37 PM, Andrew Dunstan and...@dunslane.net wrote: On 02/04/2013 03:16 PM, Daniel Farina wrote: On Mon, Feb 4, 2013 at 11:38 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Feb 4, 2013 at 11:10 AM, Andrew Dunstan and...@dunslane.net wrote: On 02/04/2013 10:47 AM, Robert Haas wrote: The SQL standards considerations seem worth thinking about, too. We've certainly gone through a lot of pain working toward eliminating => as an operator name, and if the SQL standard has commandeered -> for some purpose or other, I'd really rather not add to the headaches involved should we ever decide to reclaim it. OK, but I'd like to know what is going to be safe. There's no way to future-proof the language. I'm quite prepared to replace -> with something else, and if I do then ->> will need to be adjusted accordingly, I think. My suggestion would be ~> and ~>>. I know David Wheeler didn't like that on the ground that some fonts elevate ~ rather than aligning it in the middle as most monospaced fonts do, but I'm tempted just to say then use a different font. Other possibilities that come to mind are +> and +>>, although I think they're less attractive. But I'll be guided by the consensus, assuming there is one ;-) I suspect both of those are pretty safe from an SQL standards point of view. Of course, as Tom is often wont to point out, the SQL standards committee sometimes does bizarre things, so nothing's perfect, but I'd be rather shocked if any of those got tapped to mean something else. That having been said, I still don't see value in adding operators at all. Good old function call notation seems perfectly adequate from where I sit. Sure, it's a little more verbose, but when you try too hard to make things concise then you end up having to explain to your users why \ditS is a sensible thing for them to type into psql, or why s@\W@sprintf%%%02x,ord($)@e in Perl.
I recognize that I may lose this argument, but I've worked with a couple of languages where operators can be overloaded (C++) or defined (ML) and it's just never seemed to work out very well. YMMV, of course. I also basically feel this way, although I know I tend more towards notational brutalism than many. I think we shouldn't kid ourselves that non-default operators will be used, and for current-implementation reasons (that maybe could be fixed by someone determined) it's not really at the pleasure of the author to use them via CREATE OPERATOR either. So, I basically subscribe to the view that we should investigate what total reliance on prefix syntax looks like. I guess it'll make nested navigation horribly ugly, though...positively lisp-esque. That's one consideration hstore doesn't have that may make use of infix notations considerably more useful for json than hstore. We don't have the luxury of designing things like this in or out from scratch. Creation of operators has been a part of PostgreSQL for a good while longer than my involvement, and a great many people expect to be able to use it. I can just imagine the outrage at any suggestion of removing it. I am only referring to the restriction that the planner can't understand that fetchval() and '->' mean the same thing for, say, hstore. Hence, use of non-default CREATE OPERATOR may become more useful some day, instead of basically being a pitfall when someone reasonably thinks they could use either spelling of the same functionality and the optimizer will figure it out. I'm not suggesting removal of any feature. My reference to total reliance on prefix syntax refers only to the JSON operators, since the previous correspondence from Robert was about how function call syntax alone may be sufficient. This phrase refers to the same idea he is proposing. I also included a weakness of that idea, which is that nesting in JSON makes the situation worse than the commonly compared case, hstore.
-- fdr
Re: [HACKERS] [v9.3] writable foreign tables
On Tue, Jan 29, 2013 at 1:19 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: I noticed the v10 patch cannot be applied on the latest master branch cleanly. The attached ones were rebased. Hello, I'm just getting started looking at this, but I notice that the second patch relies on contrib/postgres_fdw to apply, and it's not clear to me where to get the correct version of that. One way to solve this would be to tell me where to get a version of that, and another that may work well would be to share a Git repository with postgres_fdw and the writable FDW changes accumulated. I poked around on git.postgresql.org and github and wasn't able to find a recently pushed copy of this. Anyway, I'm looking at the first patch, which applies neatly. Thanks. -- fdr
Re: [HACKERS] json api WIP patch
On Fri, Feb 1, 2013 at 2:08 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jan 31, 2013 at 7:12 PM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: On Thu, Jan 31, 2013 at 4:20 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/31/2013 05:06 PM, Peter Eisentraut wrote: I would like to not create any -> operators, so that that syntax could be used in the future for method invocation or something similar (it's in the SQL standard). This is the first time I have heard that we should stay away from this. We have operators with this name in hstore, which is why I chose it. I'm not happy about this either. It's bad enough that we're thinking about taking away =>, but to disallow -> as well? My inclination is to just say no, we're not implementing that. Even if we remove the contrib operators named that way, it's insane to suppose that nobody has chosen these names for user-defined operators in their applications. I think it's smarter for us to ship functions, and let users wrap them in operators if they so choose. It's not difficult for people who want it to do it, and that way we dodge this whole mess. Normally I'd agree with you, but I think there's a complexity here that is very frown-inducing, although I'm not positively inclined to state that it's so great that your suggested solution is not the least of all evils: http://www.postgresql.org/message-id/8551.1331580...@sss.pgh.pa.us The problem being: even though pg_operator resolves to functions in pg_proc, they have distinct identities as far as the planner is concerned w.r.t. selectivity estimation and index selection. Already there is a slight hazard that some ORMs that want to grow hstore support will spell it fetchval and others ->. So far, infix syntax seems to be the common default, but a minor disagreement among ORMs or different user preferences will be destructive.
Another way to look at this is that by depriving people of multiple choices in the default install, that hazard goes away. But it also means that, practically, CREATE OPERATOR is going to be hazardous to use, because almost all software is probably not going to assume the existence of any non-default installed operators for JSON, and those that do will not receive the benefit of indexes from software using the other notation. So, if one takes the 'when in doubt, leave it out' approach you seem to be proposing, I think very little profitable use of CREATE OPERATOR will take place -- ORMs et al will choose the lowest common denominator (for good sensible reason) and flirting with CREATE OPERATOR will probably cause surprising plans, so I doubt it'll take hold. -- fdr
Re: [HACKERS] patch to add \watch to psql
I have adjusted this patch a little bit to take care of the review issues, along with just doing a bit of review myself. On Thu, Oct 25, 2012 at 2:25 AM, Will Leinweber w...@heroku.com wrote: Thanks for the reviews and comments. Responses inline: On Sat, Oct 20, 2012 at 9:19 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: Maybe you should call it \repeat or something. I'm sure people would get around to using \watch that way eventually. :-) While I agree that clearing the screen would be nicer, I feel that watching the bottom of the screen instead of the top gets you 95% of the value of unix watch(1), and having the same name will greatly enhance discoverability. Perhaps later on if ncurses is linked for some other reason, this could take advantage of it then. That said, I'm not that strongly attached to the name one way or the other. The name \repeat has grown on me, but I haven't bothered renaming it for the time being. I think sameness with the familiar 'watch' program may not be such a big deal as I thought originally, but 'repeat' sounds a lot more like a kind of flow control for scripts, whereas \watch is more clearly for humans, which is the idea. On Wed, Oct 24, 2012 at 2:55 PM, Peter Eisentraut pete...@gmx.net wrote: This doesn't handle multiline queries:

=> \watch select 1 +
ERROR:  42601: syntax error at end of input
LINE 1: select
               ^

I think to make it cooperate better with psql syntax, put the \watch at the end, as a replacement for \g, like I have implemented some kind of multi-line support. The rough part is in this part of the patch:

+		if (query_buf && query_buf->len > 0)
+		{
+			/*
+			 * Check that the query in query_buf has been terminated.  This
+			 * is mostly consistent with behavior from commands like \g.
+			 * The reason this is here is to prevent terrible things from
+			 * occurring from submitting incomplete input of statements
+			 * like:
+			 *
+			 *     DELETE FROM foo
+			 *     \watch
+			 *     WHERE
+			 *
+			 * Wherein \watch will go ahead and send whatever has been
+			 * submitted so far.  So instead, insist that the user
+			 * terminate the query with a semicolon to be safe.
+			 */
+			if (query_buf->data[query_buf->len - 1] == ';')

What I found myself reaching for when giving up and writing this hack was a way to thread through the last lexer state of query_buf, which seems like it could stand to accrue a bit more information than being just a byte buffer. But this is the simplest possible thing, so I'll let others comment... -- fdr psql-watch-v2.patch.gz Description: GNU Zip compressed data
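To make the guard's intent concrete, here is a toy model of the behavior in Python -- the function name and parameters are hypothetical, purely for illustration: refuse input that isn't semicolon-terminated, then re-run the completed query at a fixed interval, as \watch does.

```python
import time

def watch(execute, query_buf, interval=2.0, iterations=1):
    """Re-run a completed query every `interval` seconds, \\watch-style.

    Refuses unterminated input, mirroring the semicolon check in the
    patch fragment: otherwise an incomplete statement such as
    "DELETE FROM foo" (its WHERE clause still unentered) would be sent.
    """
    query = query_buf.rstrip()
    if not query.endswith(";"):
        raise ValueError("terminate the query with a semicolon")
    results = []
    for i in range(iterations):
        results.append(execute(query))
        if i + 1 < iterations:  # sleep between runs, not after the last
            time.sleep(interval)
    return results
```

A byte-buffer check like this is the "simplest possible thing"; tracking the lexer's end state, as the message suggests, would distinguish a genuinely complete statement from, say, a semicolon inside a string literal.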
Re: [HACKERS] patch to add \watch to psql
On Thu, Jan 17, 2013 at 5:07 PM, Daniel Farina dan...@heroku.com wrote: I have adjusted this patch a little bit to take care of the review issues, along with just doing a bit of review myself. I realized while making my adjustments that I pointlessly grew some input checking in the inner loop. I just hoisted it out in this version. -- fdr psql-watch-v3.patch.gz Description: GNU Zip compressed data
Re: [HACKERS] Parallel query execution
On Tue, Jan 15, 2013 at 11:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: There are still 34 items needing attention in CF3. I suggest that, if you have some spare time, your help would be very much appreciated there. The commitfest that started on Jan 15th has 65 extra items. Anything currently listed in CF3 can rightfully be considered to be part of CF4, too. In case you hadn't noticed, we've totally lost control of the CF process. Quite aside from the lack of progress on closing CF3, major hackers who should know better are submitting significant new feature patches now, despite our agreement in Ottawa that nothing big would be accepted after CF3. At this point I'd bet against releasing 9.3 during 2013. I have been skimming the commitfest application, and unlike some of the previous commitfests a huge number of patches have had review at some point in time, but probably need more...so looking for the red "Nobody" in the 'reviewers' column probably understates the shortage of review. I'm curious what the qualitative feelings are on patches or clusters thereof, and what kind of review would be helpful in clearing the field. -- fdr
Re: [HACKERS] json api WIP patch
On Tue, Jan 15, 2013 at 12:17 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/15/2013 02:47 PM, Merlin Moncure wrote: On Tue, Jan 15, 2013 at 1:04 PM, David Fetter da...@fetter.org wrote: On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstan wrote: On 01/14/2013 07:36 PM, Merlin Moncure wrote: While testing this I noticed that integer based 'get' routines are zero based -- was this intentional? Virtually all other aspects of SQL are 1 based:

postgres=# select json_get('[1,2,3]', 1);
 json_get
----------
 2
(1 row)

postgres=# select json_get('[1,2,3]', 0);
 json_get
----------
 1
(1 row)

Yes, it's intentional. SQL arrays might be 1-based by default, but JavaScript arrays are not. JsonPath and similar gadgets treat the arrays as zero-based. I suspect the Json-using community would not thank us for being overly SQL-centric on this - and I say that as someone who has always thought zero based arrays were a major design mistake, responsible for countless off-by-one errors. Perhaps we could compromise by making arrays 0.5-based. Well, I'm not prepared to argue with Andrew on this one. It was surprising behavior to me, but that's sample size one. I doubt I'm very representative either. People like David Wheeler, Taras Mitran, Joe Van Dyk, and the Heroku guys would be better people to ask than me. I'm quite prepared to change it if that's the consensus. Hello. I'm inclined to go with the same gut feeling you had (zero-based indexing). Here is the background for my reasoning: The downside of zero-based indexing is that people who want to use multiple sequential container types will inevitably have to deal with detailed and not easily type-checked integer coordinates that mean different things in each domain, which will, no doubt, lead to a number of off-by-one errors.
Nevertheless, this cost is already paid, because one of the first things many people will do in programs generating SQL queries is try to zero-index a SQL array, swear a bit after figuring things out (because a NULL will be generated, not an error), and then adjust all the offsets. So, this is not a new problem. On many occasions I'm sure this has caused off-by-one bugs, or the NULLs slipped through testing and delivered funny results, yet the world moves on. On the other hand, if one goes down the road of 1-based indexing and attempts to attain relative sameness to SQL arrays, it would also feel like one would be obliged to implement SQL array infelicities like 'out of bounds' being SQL NULL rather than an error, along with other spectres like non-rectangular nested arrays. SQL array semantics are complex and The Committee can change them or -- slightly more likely -- add interactions, so it seems like a general expectation that Postgres container types that happen to have any reasonable ordinal addressing will implement some level of same-ness with SQL arrays is a very messy one. As such, if it becomes customary to implement one-based indexing of containers, I think such customs are best carefully circumscribed so that attempts to be 'like' SQL arrays are only as superficial as that. What made me come down on the side of zero-based indexing in spite of the weaknesses are these two reasons:

* The number of people who use JSON and zero-based indexing is very large, and furthermore, within that set the number that know that SQL even defines array support -- much less that Postgres implements it -- is much smaller. Thus, one is targeting cohesion with a fairly alien concept that is not understood by the audience.

* Maintaining PL-integrated code that uses both 1-based indexing in PG functions and 0-based indexing in embedded languages that are likely to be combined with JSON doesn't sound very palatable, and the use of such PLs (e.g.
plv8) seems pretty likely, too. That can probably be a rich source of bugs and frustration. If one wants SQL array semantics, it seems like the right way to get them is coercion to a SQL array value. Then one will receive SQL array semantics exactly. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
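The two conventions under discussion can be put side by side. This is an illustrative sketch, not PostgreSQL source: `sql_get` and `json_get` here are hypothetical stand-ins mimicking, respectively, SQL-array semantics (1-based, out of bounds quietly yielding NULL rather than an error) and JavaScript semantics (0-based, as the proposed json function behaves):

```python
arr = [1, 2, 3]

def sql_get(a, i):
    """Mimic SQL arrays: 1-based; out of bounds quietly yields NULL (None)."""
    if i < 1 or i > len(a):
        return None
    return a[i - 1]

def json_get(a, i):
    """Mimic JavaScript/JSON: 0-based, as json_get is proposed to behave."""
    if i < 0 or i >= len(a):
        return None
    return a[i]

# The same index means different things in each domain -- the source of
# the off-by-one errors discussed above.
assert json_get(arr, 1) == 2    # like: select json_get('[1,2,3]', 1)
assert sql_get(arr, 1) == 1     # SQL arrays start at 1
assert sql_get(arr, 0) is None  # zero-indexing a SQL array gives NULL, not an error
```

The last assertion is the "swear a bit after figuring things out" moment: the out-of-bounds access produces a NULL that can slip through testing rather than an error that fails loudly.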
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On Tue, Jan 8, 2013 at 11:51 AM, Simon Riggs si...@2ndquadrant.com wrote: On 8 January 2013 18:46, Josh Berkus j...@agliodbs.com wrote: On 1/5/13 1:21 PM, Peter Geoghegan wrote: On 21 December 2012 14:08, Robert Haas robertmh...@gmail.com wrote:

I'm sure it's possible; I don't *think* it's terribly easy.

I'm inclined to agree that this isn't a terribly pressing issue. Certainly, the need to introduce a bunch of new infrastructure to detect this case seems hard to justify.

Impossible to justify, I'd say.

Does anyone have any objections to my adding this to the TODO list, in case some clever GSOC student comes up with a way to do it *without* adding a bunch of infrastructure?

Daniel already did object.

To briefly reiterate my objection: I observed that one may want to enter a state of cyclicality on a temporary basis -- to assist with some intermediate states in remastering -- and it'd be nice if Postgres didn't try to get in the way of that.

I would like to have enough reporting to be able to write tools that detect cyclicity and other configuration errors, and I think that may exist already in recovery.conf/its successor in postgresql.conf. A notable problem here is that UDFs, by their mechanical nature, don't quite cover all the use cases, since they require the server to be running with hot standby available. It seems like reading recovery.conf or its successor is probably the best option here.

-- fdr
Re: [HACKERS] Cascading replication: should we detect/prevent cycles?
On Tue, Jan 8, 2013 at 5:51 PM, Josh Berkus j...@agliodbs.com wrote:

Daniel,

To briefly reiterate my objection, I observed that one may want to enter a case of cyclicality on a temporary basis -- to assist with some intermediate states in remastering, and it'd be nice if Postgres didn't try to get in the way of that.

I don't think it *should* fail. I think it should write a WARNING to the logs, to make it easy to debug if the cycle was created accidentally.

Well, in the conversation so long ago, outright failure was more openly considered -- which may not be true in the present era... I was just covering my old tracks exactly.

I would like to have enough reporting to be able to write tools that detect cyclicity and other configuration errors, and I think that may exist already in recovery.conf/its successor in postgresql.conf. A notable problem here is that UDFs, by their mechanical nature, don't quite cover all the use cases, since they require the server to be running with hot standby available. It seems like reading recovery.conf or its successor is probably the best option here.

Well, pg_conninfo will still be in postgresql.conf. But that doesn't help you if you're playing fast and loose with virtual IP addresses... and arguably, people using virtual IPs are more likely to accidentally create a cycle.

That's a good point. Even simpler than a virtual IP is DNS, wherein the resolution can be rebound but a pre-existing connection to an old IP will happily remain, and that will be hard to know from postgresql.conf and friends. I guess that means the hard case is when hot standby is not (yet) on but the server is actively recovering WAL: UDFs are out, and scanning postgresql.conf is not necessarily an accurate picture of the situation.

-- fdr
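The tool-side detection mentioned in this thread is straightforward once each node's upstream is known. A minimal sketch under that assumption -- a hypothetical dict mapping each node to the node it replicates from (gathered, say, from each standby's conninfo), with None marking a primary; this is monitoring-tool code, not anything in PostgreSQL itself:

```python
def find_cycle(upstream):
    """Return the cyclic portion of a replication topology, or None.

    upstream maps each node to the node it replicates from (None for a
    primary).  Walking each chain upstream either terminates at a primary
    or revisits a node, which reveals a cycle."""
    for start in upstream:
        path, node = [], start
        while node is not None and node not in path:
            path.append(node)
            node = upstream.get(node)
        if node is not None:
            # node was seen before: everything from its first visit onward
            # forms the cycle.
            return path[path.index(node):]
    return None

# A sane chain terminates at the primary ...
assert find_cycle({'standby1': 'primary', 'primary': None}) is None
# ... while a remastering mishap shows up as a cycle.
assert find_cycle({'a': 'b', 'b': 'c', 'c': 'a'}) == ['a', 'b', 'c']
```

Per the discussion, a tool built on this would merely warn rather than fail, since a temporary cycle can be a legitimate intermediate state during remastering; the hard part remains collecting accurate upstream information when DNS rebinding or virtual IPs make the configuration files lie.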
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
Attached is a cumulative patch attempting to address the below. One can see the deltas to get there at https://github.com/fdr/postgres.git on the branch error-prop-pg_stat_statements-v2.

On Fri, Dec 28, 2012 at 9:58 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:

However, with this approach, calls_underest values might appear to the user in what might be considered an astonishing order. Now, I'm not suggesting that that's a real problem -- just that they may not be the semantics we want, particularly as we can reasonably defer assigning a calls_underest until a sticky entry is unstuck, and an entry becomes user-visible, within pgss_store().

Fix for this as I understand it:

*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
*** 1036,1041 **** pgss_store(const char *query, uint32 queryId,
--- 1036,1042 ----
  		e->counters.usage = USAGE_INIT;

  	e->counters.calls += 1;
+ 	e->counters.calls_underest = pgss->calls_max_underest;
  	e->counters.total_time += total_time;
  	e->counters.rows += rows;
  	e->counters.shared_blks_hit += bufusage->shared_blks_hit;

*** 1264,1272 **** entry_alloc(pgssHashKey *key, const char *query, int query_len, bool sticky)
  	/* set the appropriate initial usage count */
  	entry->counters.usage = sticky ? pgss->cur_median_usage : USAGE_INIT;
- 	/* propagate calls under-estimation bound */
- 	entry->counters.calls_underest = pgss->calls_max_underest;
-
  	/* re-initialize the mutex each time ... we assume no one using it */
  	SpinLockInit(&entry->mutex);
  	/* ... and don't forget the query text */
--- 1265,1270 ----

Also, it seems like you should initialise pgss->calls_max_underest, within pgss_shmem_startup().

Easy enough. Somehow I wrongly thought zero-initialization was a given for the shmem functions.

*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
*** 426,431 **** pgss_shmem_startup(void)
--- 426,432 ----
  	{
  		/* First time through ... */
  		pgss->lock = LWLockAssign();
+ 		pgss->calls_max_underest = 0;
  		pgss->query_size = pgstat_track_activity_query_size;
  		pgss->cur_median_usage = ASSUMED_MEDIAN_INIT;
  	}

You should probably serialise the value to disk, and initialise it to 0 if there is no such value to begin with.

I prefer a different approach here: just compute it while loading the entries from disk, since calls plus under-estimation can be used to find a new pessimum under-estimation global value.

*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
*** 419,424 **** pgss_shmem_startup(void)
--- 419,425 ----
  	int query_size;
  	int buffer_size;
  	char *buffer = NULL;
+ 	int64 calls_max_underest = 0;

  	if (prev_shmem_startup_hook)
  		prev_shmem_startup_hook();

*** 440,446 **** pgss_shmem_startup(void)
  	{
  		/* First time through ... */
  		pgss->lock = LWLockAssign();
! 		pgss->calls_max_underest = 0;
  		pgss->query_size = pgstat_track_activity_query_size;
  		pgss->cur_median_usage = ASSUMED_MEDIAN_INIT;
  	}
--- 441,447 ----
  	{
  		/* First time through ... */
  		pgss->lock = LWLockAssign();
! 		pgss->calls_max_underest = calls_max_underest;
  		pgss->query_size = pgstat_track_activity_query_size;
  		pgss->cur_median_usage = ASSUMED_MEDIAN_INIT;
  	}

*** 528,533 **** pgss_shmem_startup(void)
--- 529,545 ----
  			temp.query_len, query_size - 1);

+ 		/*
+ 		 * Compute maxima of under-estimation over the read entries
+ 		 * for reinitializing pgss->calls_max_underest.
+ 		 */
+ 		{
+ 			int64 cur_underest;
+
+ 			cur_underest = temp.calls + temp.calls_underest;
+ 			calls_max_underest = Max(calls_max_underest, cur_underest);
+ 		}
+
  		/* make the hashtable entry (discards old entries if too many) */
  		entry = entry_alloc(temp.key, buffer, temp.query_len, false);

*** 535,540 **** pgss_shmem_startup(void)
--- 547,559 ----
  		entry->counters = temp.counters;
  	}

+ 	/*
+ 	 * Reinitialize global under-estimation
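The compute-it-while-loading approach preferred here amounts to taking a running maximum as entries are read back from disk. An illustrative Python sketch of that logic (a dict stands in for the on-disk entry struct; this mirrors the patch only in spirit, not source):

```python
def recompute_max_underest(saved_entries):
    """Sketch of the reload-time recomputation: the new global
    under-estimation bound is the pessimum (maximum) of
    calls + calls_underest over every entry read from disk."""
    calls_max_underest = 0
    for temp in saved_entries:
        cur_underest = temp['calls'] + temp['calls_underest']
        calls_max_underest = max(calls_max_underest, cur_underest)
    return calls_max_underest

# An entry's true call count lies somewhere in the range
# [calls, calls + calls_underest]; the global bound must cover
# the worst case among all entries.
assert recompute_max_underest([
    {'calls': 10, 'calls_underest': 2},
    {'calls': 3, 'calls_underest': 20},
]) == 23
```

The payoff is that nothing extra needs to be serialised: the per-entry counters already on disk carry enough information to rebuild a sound global bound, at worst a slightly pessimistic one.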
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sat, Dec 29, 2012 at 4:21 AM, Daniel Farina drfar...@acm.org wrote:

These were not express goals of the patch, but so long as you are inviting features, attached is a bonus patch that exposes the queryid and also the notion of a statistics session that is re-rolled whenever the stats file could not be read or the stats are reset, able to fully explain all obvious causes of retrograde motion in statistics. It too is cumulative, so it includes the under-estimation field.

Notably, I also opted to nullify the extra pg_stat_statements fields in the cases where the query text would show insufficient privileges (that indication itself is spared from this censorship), because I feel as though a bit too much information leaks from pg_stat_statements's statistics to ignore, especially after adding the query id.

Since the common theme here is identifying queries, I have called it pg_stat_statements-identification, and it can be found in the git repo above under the same name (...-v1).

A small amendment that doesn't really change the spirit of the narrative is attached.

-- fdr

Attachment: pg_stat_statements-identification-v2.patch.gz (GNU zip compressed data)
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
On Sat, Dec 29, 2012 at 6:37 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 29 December 2012 12:21, Daniel Farina drfar...@acm.org wrote:

These were not express goals of the patch, but so long as you are inviting features, attached is a bonus patch that exposes the queryid and also the notion of a statistics session that is re-rolled whenever the stats file could not be read or the stats are reset, able to fully explain all obvious causes of retrograde motion in statistics. It too is cumulative, so it includes the under-estimation field.

Cool.

I had a thought about Tom's objection to exposing the hash value. I would like to propose a compromise between exposing the hash value and not doing so at all. As I recall, the gist of his objection had to do with a false sense of stability of the hash value, and the desire to preserve the ability to alter it.

Here's an option: xor the hash value with the 'statistics session id', so it's *known* to be unstable between sessions. That gets you continuity in the common case and sound deprecation in the less-common cases (crashes, format upgrades, stat resetting). A change in hash function would then also necessitate changing the stat file header.

Another, more minor issue is the hard-wiring of the precision of the id. For that reason, I have thought it reasonable to expose it as a string also.

-- fdr
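The xor compromise is small enough to sketch directly. This is an illustration with made-up values, not the pg_stat_statements implementation: xoring the internal hash with the statistics session id keeps distinct queries distinguishable within a session while guaranteeing the exposed value changes whenever the session is re-rolled:

```python
def exposed_queryid(internal_hash, session_id):
    """Exposed id = internal hash xor statistics session id: stable within
    one statistics session, *known* to be unstable across sessions."""
    return internal_hash ^ session_id

h1, h2 = 0xDEADBEEF, 0xCAFEBABE   # two queries' internal hash values (made up)
s1, s2 = 101, 202                 # session ids before/after a stats reset (made up)

# Distinct queries stay distinguishable within one session ...
assert exposed_queryid(h1, s1) != exposed_queryid(h2, s1)
# ... but a query's exposed id changes when the session is re-rolled,
# soundly deprecating any values a user might have stashed away.
assert exposed_queryid(h1, s1) != exposed_queryid(h1, s2)
```

Because xor with a fixed session id is a bijection, no two queries ever collide in the exposed space unless their internal hashes already collided; the transformation costs nothing in discriminating power while making the instability contract explicit.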